Descriptive analysis of categorical variables

The script begins by creating two datasets (`height_and_weight` and `height_and_weight_20`) via DATA steps using `datalines`. It then uses `PROC PRINT` to generate comprehensive or selective list reports, `PROC SQL` as an alternative for similar queries, and `PROC FREQ` to obtain frequency tables for categorical variables, with a demonstration of the `MISSING` option.
Data Analysis

Type : Internal creation


Data is entirely created within the SAS script via DATALINES statements in DATA steps.

1 Code Block
DATA STEP Data
Explanation :
Creates a SAS dataset named `height_and_weight` with variables `id` (character), `sex` (character), `ht_in` (numeric), and `wgt_lbs` (numeric) from provided in-line data.
DATA height_and_weight;
  INPUT id $ sex $ ht_in wgt_lbs;
  DATALINES;
001 Male 71 190
002 Male 69 176
003 Female 64 130
004 Female 65 154
;
RUN;
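To confirm that `id` and `sex` were stored as character variables and `ht_in` and `wgt_lbs` as numeric, one option is to inspect the dataset's metadata. This is a minimal sketch, assuming the DATA step above has already run in the current session; it is not part of the original script.

```sas
/* List variable names, types, and lengths for the dataset created above */
PROC CONTENTS DATA = height_and_weight;
RUN;
```

Because the variables are read with list input and no LENGTH statement, both character variables should be reported with the default length of 8.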
2 Code Block
PROC PRINT
Explanation :
Generates a list report displaying all observations and variables from the `height_and_weight` dataset.
PROC PRINT DATA = height_and_weight;
RUN;
3 Code Block
PROC SQL
Explanation :
Performs an SQL query to select and display all columns from the `height_and_weight` dataset.
PROC SQL;
  SELECT *
  FROM height_and_weight;
QUIT;
4 Code Block
PROC PRINT
Explanation :
Displays a list report for the `id` and `ht_in` variables from the `height_and_weight` dataset. The `NOOBS` option suppresses the observation number column.
PROC PRINT DATA = height_and_weight NOOBS;
  VAR id ht_in;
RUN;
5 Code Block
PROC SQL
Explanation :
Performs an SQL query to select and display only the `id` and `ht_in` columns from the `height_and_weight` dataset.
PROC SQL;
  SELECT id, ht_in
  FROM height_and_weight;
QUIT;
6 Code Block
DATA STEP Data
Explanation :
Creates a second SAS dataset named `height_and_weight_20` with 20 observations from in-line data. Two observations (ids 005 and 008) have a missing value for `sex`, entered as a single period.
DATA height_and_weight_20;
  /* With list input, a lone period is read as a missing value */
  INPUT id $ sex $ ht_in wgt_lbs;
  DATALINES;
001 Male 71 190
002 Male 69 176
003 Female 64 130
004 Female 65 154
005 . 73 173
006 Male 69 182
007 Female 68 140
008 . 73 185
009 Female 71 157
010 Male 66 155
011 Male 71 213
012 Female 69 151
013 Female 66 147
014 Female 68 196
015 Male 75 212
016 Female 69 190
017 Female 66 194
018 Female 65 176
019 Female 65 176
020 Female 65 102
;
RUN;
7 Code Block
PROC FREQ
Explanation :
Generates one-way frequency tables for all variables in the `height_and_weight_20` dataset, since no `TABLE` statement restricts the output.
PROC FREQ DATA = height_and_weight_20;
RUN;
8 Code Block
PROC FREQ
Explanation :
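Generates a frequency table for the `sex` variable only. By default, PROC FREQ leaves the two observations with a missing `sex` out of the table and the percentages, and reports them beneath the table as "Frequency Missing = 2".
PROC FREQ DATA = height_and_weight_20;
  TABLE sex;
RUN;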
9 Code Block
PROC FREQ
Explanation :
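Generates a frequency table for the `sex` variable in the `height_and_weight_20` dataset with the `MISSING` option. The option treats missing values as a valid level of `sex`, so they appear as a row in the table and count toward the percentages and cumulative statistics.
PROC FREQ DATA = height_and_weight_20;
  TABLE sex / MISSING;
RUN;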
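In keeping with the PROC SQL alternatives shown for PROC PRINT, similar counts can be obtained with a grouped query. This is a minimal sketch, not part of the original script, assuming `height_and_weight_20` exists in the WORK library; the column alias `frequency` is a name chosen here for illustration. PROC SQL places missing values in their own group, so the observations with a missing `sex` get a row of their own, much as they do with the `MISSING` option.

```sas
/* Count observations per level of sex; missing values form their own group */
PROC SQL;
  SELECT sex,
         COUNT(*) AS frequency
  FROM height_and_weight_20
  GROUP BY sex;
QUIT;
```

Unlike PROC FREQ, this query returns counts only; percentages and cumulative totals would have to be computed separately.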
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Expert Advice
Stéphanie
Machine Learning and AI Specialist.
« This SAS script demonstrates the three pillars of data handling: Ingestion, Selective Reporting, and Data Quality Auditing. By juxtaposing procedural SAS with declarative SQL, the script highlights the versatility required for effective data manipulation. »