Published on :

Product Data Analysis and Transformation

This code is also available in: Deutsch Español Français
Awaiting validation
The main objective of this script is to provide a deep understanding of a product database and to add relevant information to it. It starts by establishing a shortcut to an external folder via a LIBNAME statement. Then, it inspects the structure and content of the 'cadastro_produto' dataset. Descriptive statistical analyses (univariate and cross-tabulated frequencies) are performed on variables such as 'gênero', 'plataforma' and 'nome'. A DATA STEP is used to create a binary variable 'lancamento' based on the date, indicating whether a product is a 'launch'. Finally, the script saves an updated version of the dataset with this new variable, renamed for clarity, and improved labels to facilitate the interpretation of future reports.
Data Analysis

Type : EXTERNAL


The script uses the 'cadastro_produto' dataset which is loaded from the '/folders/myfolders/AluraPlay' folder via the 'alura' library. This dataset is not created or generated within the SAS script but is assumed to exist in the specified path.

1 Code Block
LIBNAME / PROC DATASETS
Explanation :
Declares a SAS library named 'alura' pointing to the specified folder path. Then, uses PROC DATASETS to list detailed information about the tables contained in this library, allowing verification of the presence of the expected datasets.
Copied!
1/* Declara o atalho para a minha pasta da AluraPlay */
2LIBNAME alura "/folders/myfolders/AluraPlay";
3 
4* Vou checar o meu diretório da Alura ;
5PROC DATASETS
6 lib=alura details;
7RUN;
2 Code Block
PROC CONTENTS
Explanation :
Displays metadata (dictionary) of the 'alura.cadastro_produto' dataset, including variable names, types, lengths, formats, and other attributes, providing an overview of the source dataset's structure.
Copied!
1* Checa o conteúdo da minha base de cadastro produto ;
2PROC CONTENTS
3 DATA=alura.cadastro_produto;
4RUN;
3 Code Block
PROC PRINT
Explanation :
Prints the entire 'alura.cadastro_produto' dataset to the default output, allowing a quick visual review of the first observations.
Copied!
1* Imprime a minha base de cadastro produto ;
2PROC PRINT
3 DATA=alura.cadastro_produto;
4RUN;
4 Code Block
PROC FREQ
Explanation :
Generates univariate frequency tables for the categorical variables 'genero', 'plataforma' and 'nome' from the 'alura.cadastro_produto' dataset. The NLEVELS option indicates the number of unique levels for each variable, useful for data exploration.
Copied!
1* Gera as frquencias das variáveis Gênero, Plataforma e Nome ;
2PROC FREQ
3 DATA=alura.cadastro_produto nlevels;
4 TABLE genero plataforma nome;
5RUN;
5 Code Block
DATA STEP Data
Explanation :
Creates a new temporary dataset named 'teste' by copying 'alura.cadastro_produto'. A new variable 'lancamento' is created: it takes the value 1 if the 'data' variable is greater than '201606' (indicating a recent launch), and 0 otherwise. This transforms a date into a binary indicator.
Copied!
1/* Cria uma nova base com a variável de flag de lançamento */
2DATA teste;
3SET alura.cadastro_produto;
4 
5IF DATA > 201606
6 THEN lancamento = 1;
7 ELSE lancamento = 0;
8 
9RUN;
6 Code Block
PROC PRINT
Explanation :
Prints the content of the 'teste' dataset, allowing visual verification of the correct creation of the 'lancamento' variable.
Copied!
1* Imprime minha nova base ;
2PROC PRINT
3 DATA=teste;
4RUN;
7 Code Block
PROC FREQ
Explanation :
Generates a frequency table for the new 'lancamento' variable in the 'teste' dataset, showing the distribution of products considered 'launches' versus others.
Copied!
1* Gera a tabela de frequência da minha variável de laçamento ;
2PROC FREQ
3 DATA=teste;
4 TABLE lancamento;
5RUN;
8 Code Block
PROC FREQ
Explanation :
Generates a cross-frequency table between 'genero' and 'lancamento' from the 'teste' dataset. The NOROW, NOCOL, and NOPERCENT options suppress row, column, and total percentages, displaying only counts for targeted analysis.
Copied!
1* Gera a frequencia cruzada das variáveis Gênero e Lançamento ;
2PROC FREQ
3 DATA=teste;
4 TABLE genero*lancamento
5 /norow nocol nopercent;
6RUN;
9 Code Block
PROC FREQ
Explanation :
Generates a cross-frequency table between 'nome' and 'genero' from the original 'alura.cadastro_produto' dataset. The LIST option displays the contingency table in a listed format, which is useful for combinations with many levels. NLEVELS displays the number of unique levels for each variable.
Copied!
1* Gera a lista cruzada das variáveis Nome e Gênero ;
2PROC FREQ
3 DATA=alura.cadastro_produto nlevels;
4 *table nome;
5 TABLE nome*genero
6 / list;
7RUN;
10 Code Block
DATA STEP Data
Explanation :
Creates the final 'alura.cadastro_produto_v2' dataset by copying 'teste'. The 'lancamento' variable is renamed to 'flag_lancamento' to improve clarity. Labels are assigned to the 'Genero' and 'lancamento' variables to enrich data documentation and improve readability of report outputs.
Copied!
1/* Salva a minha base intermendiária no diretório da AluraPlay */
2DATA alura.cadastro_produto_v2;
3SET teste;
4rename lancamento = flag_lancamento;
5label Genero = "Gênero"
6 lancamento = "Marque 1 pour les jeux qui sont un lancement et 0 sinon";
7RUN;
11 Code Block
PROC CONTENTS
Explanation :
Checks the metadata of the final 'alura.cadastro_produto_v2' dataset to confirm that all modifications (variable renaming, label application) have been performed correctly.
Copied!
1* Checar se minha base 'cadastro_produto_v2' foi criada corretamente ;
2PROC CONTENTS
3 DATA=alura.cadastro_produto_v2;
4RUN;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.