Descriptive analysis of categorical variables

The script creates two datasets (`height_and_weight` and `height_and_weight_20`) in DATA steps that read in-line data with `DATALINES`. It uses `PROC PRINT` to produce list reports of all or selected variables, shows `PROC SQL` as an alternative for the same queries, and uses `PROC FREQ` to build frequency tables for categorical variables, including a demonstration of the `MISSING` option.

Data Analysis

Type: CREATION_INTERNE (internal creation)

Data is entirely created within the SAS script via DATALINES statements in DATA steps.

1 Code Block

DATA STEP Data

Explanation:
Creates a SAS dataset named `height_and_weight` with variables `id` (character), `sex` (character), `ht_in` (numeric), and `wgt_lbs` (numeric) from provided in-line data.

DATA height_and_weight;
  INPUT id $ sex $ ht_in wgt_lbs;
  DATALINES;
001 Male 71 190
002 Male 69 176
003 Female 64 130
004 Female 65 154
;
RUN;
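
The variable types described above can be checked directly. The following is a minimal sketch, assuming the `height_and_weight` dataset from the DATA step above already exists in the WORK library; it is not part of the original script. `PROC CONTENTS` lists each variable with its type (Char or Num) and length, and the `VARNUM` option orders the listing by position in the dataset rather than alphabetically.

```sas
/* Assumption: height_and_weight was created by the DATA step above. */
/* Lists variable names, types (Char/Num), and lengths.              */
PROC CONTENTS DATA = height_and_weight VARNUM;
RUN;
```

In the output, `id` and `sex` should appear as Char and `ht_in` and `wgt_lbs` as Num, because only the first two are followed by `$` in the `INPUT` statement.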

2 Code Block

PROC PRINT

Explanation:
Generates a list report displaying all observations and variables from the `height_and_weight` dataset.

PROC PRINT DATA = height_and_weight;
RUN;

3 Code Block

PROC SQL

Explanation:
Performs an SQL query to select and display all columns from the `height_and_weight` dataset.

PROC SQL;
  SELECT *
  FROM height_and_weight;
QUIT;

4 Code Block

PROC PRINT

Explanation:
Displays a list report for the `id` and `ht_in` variables from the `height_and_weight` dataset, without including the observation number column (`noobs`).

PROC PRINT DATA = height_and_weight NOOBS;
  VAR id ht_in;
RUN;

5 Code Block

PROC SQL

Explanation:
Performs an SQL query to select and display only the `id` and `ht_in` columns from the `height_and_weight` dataset.

PROC SQL;
  SELECT id, ht_in
  FROM height_and_weight;
QUIT;
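
The two preceding blocks select columns; the same pair of procedures can also select rows. The sketch below is an illustration rather than part of the original script, and assumes the `height_and_weight` dataset created earlier is available. It uses a `WHERE` statement in `PROC PRINT` and a `WHERE` clause in `PROC SQL` to keep only the observations whose `sex` value is "Female".

```sas
/* Assumption: height_and_weight was created by the first DATA step. */

/* PROC PRINT: restrict both the columns (VAR) and the rows (WHERE). */
PROC PRINT DATA = height_and_weight NOOBS;
  VAR id ht_in;
  WHERE sex = "Female";
RUN;

/* PROC SQL: the same selection written as a query. */
PROC SQL;
  SELECT id, ht_in
  FROM height_and_weight
  WHERE sex = "Female";
QUIT;
```

With the four observations entered above, both steps should list ids 003 and 004. Character comparisons in SAS are case-sensitive, so the literal must match the stored value exactly.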

6 Code Block

DATA STEP Data

Explanation:
Creates a second SAS dataset named `height_and_weight_20` with 20 observations from in-line data. Two observations (ids 005 and 008) have a missing `sex` value, entered as a period (`.`).

DATA height_and_weight_20;
  INPUT id $ sex $ ht_in wgt_lbs;
  /* In list input, a single period (.) is read as a missing value */
  DATALINES;
001 Male 71 190
002 Male 69 176
003 Female 64 130
004 Female 65 154
005 . 73 173
006 Male 69 182
007 Female 68 140
008 . 73 185
009 Female 71 157
010 Male 66 155
011 Male 71 213
012 Female 69 151
013 Female 66 147
014 Female 68 196
015 Male 75 212
016 Female 69 190
017 Female 66 194
018 Female 65 176
019 Female 65 176
020 Female 65 102
;
RUN;

7 Code Block

PROC FREQ

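Explanation:
With no `TABLES` statement, `PROC FREQ` generates a one-way frequency table for every variable in the `height_and_weight_20` dataset, including `id` and the numeric variables `ht_in` and `wgt_lbs`.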

PROC FREQ DATA = height_and_weight_20;
RUN;

8 Code Block

PROC FREQ

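Explanation:
Generates a frequency table for the `sex` variable only in the `height_and_weight_20` dataset. By default, the two observations with a missing `sex` value are left out of the table and its percentages, which are therefore based on 18 observations, and are reported beneath the table as `Frequency Missing = 2`.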

PROC FREQ DATA = height_and_weight_20;
  TABLE sex;
RUN;

9 Code Block

PROC FREQ

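Explanation:
Generates a frequency table for the `sex` variable in the `height_and_weight_20` dataset with the `MISSING` option, which treats missing values as a valid level. The two missing observations appear as their own row and are counted in the percentages and cumulative statistics, which are now based on all 20 observations.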

PROC FREQ DATA = height_and_weight_20;
  TABLE sex / MISSING;
RUN;
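
Two related ways of handling the missing `sex` values may be worth comparing with the `MISSING` option. The sketch below is not part of the original script and assumes `height_and_weight_20` exists as created above. The `MISSPRINT` option displays the missing level in the table but still excludes it from the percentages, and a `PROC SQL` query with `GROUP BY` counts missing values as a group of their own.

```sas
/* Assumption: height_and_weight_20 was created by the second DATA step. */

/* MISSPRINT: show the frequency of missing values in the table, */
/* but keep them out of the percentage calculations.             */
PROC FREQ DATA = height_and_weight_20;
  TABLES sex / MISSPRINT;
RUN;

/* PROC SQL alternative: GROUP BY places missing values in their own group. */
PROC SQL;
  SELECT sex, COUNT(*) AS n
  FROM height_and_weight_20
  GROUP BY sex;
QUIT;
```

For the 20 observations entered above, the query should return three groups: 2 missing, 12 Female, and 6 Male.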

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.

Expert Advice

Stéphanie

Machine Learning and AI specialist.

« This SAS script demonstrates the three pillars of data handling: Ingestion, Selective Reporting, and Data Quality Auditing. By juxtaposing procedural SAS with declarative SQL, the script highlights the versatility required for effective data manipulation. »