Analysis of Variance (ANOVA) and Covariance (ANCOVA)

The script is a series of statistical analysis examples. For each example, a dataset is first created in a DATA step from in-stream data (CARDS). For the medicine data, the visualization procedures PROC BOXPLOT and PROC SGPLOT are then used to explore the relationships between the variables. The core of the analysis relies on PROC GLM (General Linear Models), which is used to perform:
1) an ANOVA, to test the effect of a classification variable on a response variable;
2) an ANCOVA, to test the same effect after adjusting for a continuous variable (the covariate).
Least squares means (LSMEANS) with the TDIFF option are calculated to compare the groups pairwise. The ANOVA is run only on the medicine dataset, while the ANCOVA is repeated for each of the datasets medicine, data1, edu, na, and sale.

Data Analysis

Type: Internal creation

All datasets (medicine, data1, edu, na, sale) are created and populated directly within the script using DATA steps and the CARDS/DATALINES statement. No external data is required.

1 Code Block

DATA STEP Data

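Explanation:
Creation of the 'medicine' table. The double trailing @ (@@) at the end of the INPUT statement holds the data line across iterations of the DATA step, so SAS reads several observations from the same line (here two per line, giving 20 observations).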

DATA medicine;
INPUT trt x y @@;   /* @@ holds the line: two observations per data line */
CARDS;
1 27.2 32.6 1 22.0 36.6
1 33.0 37.7 1 26.8 31.0
2 28.6 33.8 2 26.8 31.7
2 26.5 30.7 2 26.8 30.4
3 28.6 35.2 3 22.4 29.1
3 23.2 28.9 3 24.4 30.2
4 29.3 35.0 4 21.8 27.0
4 30.3 36.4 4 24.3 30.5
5 20.4 24.6 5 19.6 23.4
5 25.1 30.3 5 18.1 21.8
;
RUN;
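
To make the effect of the double trailing @ concrete, here is a small sketch that reads the same data line twice, once with and once without @@. The data set names with_hold and without_hold are hypothetical and not part of the original script. With @@ the line is held across DATA step iterations and yields three observations; without it, each iteration starts a new data line, so only the first pair of values is read.

```sas
/* Hypothetical demo data sets: the same data line read with and without @@ */
DATA with_hold;
  INPUT g v @@;   /* line held: keep reading (g, v) pairs until the line is exhausted */
  CARDS;
1 10 1 11 2 12
;
RUN;              /* with_hold: 3 observations */

DATA without_hold;
  INPUT g v;      /* no hold: each DATA step iteration moves to a new data line */
  CARDS;
1 10 1 11 2 12
;
RUN;              /* without_hold: 1 observation (g=1, v=10) */
```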

2 Code Block

PROC SORT

Explanation:
Sorting the 'medicine' dataset by the treatment variable 'trt'. PROC BOXPLOT, used next, expects the observations to be grouped by the group variable, and sorting is also a prerequisite for BY-group processing.

PROC SORT DATA=medicine;
BY trt;
RUN;
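
As an illustration of the BY-group processing that the sort enables, the sketch below computes descriptive statistics of x and y within each treatment group. It assumes the medicine data set created and sorted above; the output data set name med_stats is hypothetical. A BY statement stops with an error if the data are not in ascending order of trt, which is why the sort comes first.

```sas
/* Assumes WORK.MEDICINE exists and is sorted BY trt (see PROC SORT above) */
PROC MEANS DATA=medicine N MEAN STD;
  BY trt;                                  /* one block of statistics per treatment */
  VAR x y;
  OUTPUT OUT=med_stats MEAN=mean_x mean_y; /* one row per trt level */
RUN;
```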

3 Code Block

PROC BOXPLOT

Explanation:
Generation of side-by-side box plots showing the distribution of 'y' for each treatment group 'trt'.

PROC BOXPLOT DATA=medicine;
plot y*trt;
RUN;
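
A possible alternative, not used in the original script, draws the same box plots with the VBOX statement of PROC SGPLOT. CATEGORY= creates one box per level of trt, and SGPLOT does not need the data to be sorted first. The sketch assumes the medicine data set above.

```sas
/* Alternative to PROC BOXPLOT: one vertical box of y per trt level */
PROC SGPLOT DATA=medicine;
  VBOX y / CATEGORY=trt;
RUN;
```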

4 Code Block

PROC SGPLOT

Explanation:
Creation of a scatter plot of 'y' against 'x', with the points distinguished by treatment group 'trt' through the GROUP= option.

PROC SGPLOT DATA=medicine;
scatter x=x y=y / group=trt;
RUN;
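
Since the ANCOVA models later in the script describe each group by a straight line in x, it can help to overlay a fitted line per treatment on this scatter plot. The sketch below uses the REG statement of PROC SGPLOT with GROUP=trt and assumes the medicine data set above. These lines are fitted separately for each group, so their slopes may differ, whereas the ANCOVA model y=trt x forces a common slope.

```sas
/* Scatter plot plus a separate least-squares line of y on x for each trt */
PROC SGPLOT DATA=medicine;
  REG X=x Y=y / GROUP=trt;   /* REG also draws the data points as markers */
RUN;
```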

5 Code Block

PROC GLM

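Explanation:
Analysis of Variance (ANOVA). This block tests whether the mean of the response variable 'y' differs significantly between the groups defined by 'trt'. The SOLUTION option prints the parameter estimates of the model, and LSMEANS with the TDIFF option computes the least squares mean of 'y' for each group together with t statistics and p-values for every pairwise difference.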

PROC GLM DATA=medicine;
CLASS trt;
MODEL y=trt / SOLUTION;   /* SOLUTION: print the parameter estimates */
LSMEANS trt / TDIFF;      /* pairwise t tests between the LS-means */
RUN;
QUIT;
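
With five treatment levels, TDIFF reports 10 pairwise t tests whose p-values are not adjusted for multiple comparisons. If family-wise error control is wanted, one option, not part of the original script, is to request Tukey-adjusted p-values and confidence limits. The sketch assumes the medicine data set above and the GLM ODS table name LSMeanDiffCL (ODS TRACE ON lists the exact table names in the log); med_diffs is a hypothetical data set name.

```sas
ODS OUTPUT LSMeanDiffCL=med_diffs;       /* keep the pairwise differences and CIs */
PROC GLM DATA=medicine;
  CLASS trt;
  MODEL y = trt;
  LSMEANS trt / PDIFF ADJUST=TUKEY CL;   /* Tukey-adjusted p-values and CIs */
RUN;
QUIT;
```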

6 Code Block

PROC GLM

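Explanation:
Analysis of Covariance (ANCOVA). This model tests for differences in 'y' between the 'trt' groups while controlling for the continuous covariate 'x'. It fits a common slope for 'x' in all groups, and the LS-means are adjusted means: each group mean of 'y' is evaluated at the overall mean of 'x'.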

PROC GLM DATA=medicine;
CLASS trt;
MODEL y=trt x / SOLUTION;   /* x enters as a continuous covariate */
LSMEANS trt / TDIFF;        /* LS-means adjusted to the mean of x */
RUN;
QUIT;
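
The ANCOVA model assumes that the slope of y on x is the same in every treatment group (parallel lines); only then does a single adjusted difference between two groups hold at every value of x. A common check, sketched here for the medicine data, adds the trt*x interaction: a non-significant Type III F test for the interaction is usually read as support for the common-slope model.

```sas
/* Homogeneity-of-slopes check: does the slope of y on x differ between trt groups? */
PROC GLM DATA=medicine;
  CLASS trt;
  MODEL y = trt x trt*x;   /* trt*x lets each treatment have its own slope */
RUN;
QUIT;
```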

7 Code Block

DATA STEP Data

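Explanation:
Creation of a second dataset, 'data1', with a treatment variable read as character ('A', 'B') because of the '$' after 'trt' in the INPUT statement, and two numeric variables. Each data line holds one observation, so no '@@' is needed.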

DATA data1;
INPUT trt $ x y;
CARDS;
A 5 20
A 10 23
A 12 30
A 9 25
A 23 34
A 21 40
A 14 27
A 18 38
A 6 24
A 13 31
B 7 19
B 12 26
B 27 33
B 24 35
B 18 30
B 22 31
B 26 34
B 21 28
B 14 23
B 9 22
;
RUN;

8 Code Block

PROC GLM

Explanation:
Execution of an Analysis of Covariance (ANCOVA) on the 'data1' dataset to evaluate the effect of 'trt' on 'y' after adjusting for 'x'.

PROC GLM DATA=data1;
CLASS trt;
MODEL y=trt x / SOLUTION;
LSMEANS trt / TDIFF;
RUN;
QUIT;
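
With only two treatment levels, the TDIFF output contains a single comparison. The same covariate-adjusted difference (A minus B) can also be requested explicitly with an ESTIMATE statement, which reports the estimate, its standard error and a t test. The label 'A - B' and the output data set name est_ab are illustrative, and the sketch assumes the data1 data set above and the GLM ODS table name Estimates.

```sas
ODS OUTPUT Estimates=est_ab;     /* one row per ESTIMATE statement */
PROC GLM DATA=data1;
  CLASS trt;
  MODEL y = trt x;
  ESTIMATE 'A - B' trt 1 -1;     /* coefficients follow the sorted levels of trt: A, B */
RUN;
QUIT;
```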

9 Code Block

DATA STEP Data

Explanation:
Creation of the 'edu' dataset to compare different methods. The '@@' line-hold specifier reads three observations from each data line (nine observations in total).

DATA edu;
INPUT method x y @@;   /* @@: three observations per data line */
CARDS;
1 29 39 1 4 34 1 18 36
2 17 35 2 35 38 2 3 32
3 1 38 3 15 43 3 32 44
;
RUN;

10 Code Block

PROC GLM

Explanation:
Analysis of Covariance (ANCOVA) on the 'edu' table to compare the effect of 'method' on 'y' while controlling for 'x'.

PROC GLM DATA=edu;
CLASS method;
MODEL y=method x / SOLUTION;
LSMEANS method / TDIFF;
RUN;
QUIT;
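
By default, LSMEANS evaluates each adjusted group mean at the overall mean of the covariate. The AT option moves this reference point; the sketch below, which assumes the edu data set above, uses x=20, an arbitrary illustrative value. Because the model has no method*x interaction, all adjusted means shift by the same amount, so their pairwise differences and TDIFF tests are unchanged.

```sas
PROC GLM DATA=edu;
  CLASS method;
  MODEL y = method x;
  LSMEANS method / AT x=20 TDIFF;   /* adjusted means at x = 20 instead of the mean of x */
RUN;
QUIT;
```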

11 Code Block

DATA STEP Data

Explanation:
Creation of the 'na' table with three treatment groups (A, B, C). Each data line holds ten observations for one group, read with '@@' (30 observations in total).

DATA na;
INPUT trt $ x y @@;   /* @@: ten observations per data line */
CARDS;
A 11 6 A 8 0 A 5 2 A 14 8 A 19 11 A 6 4 A 10 13 A 6 1 A 11 8 A 3 0
B 6 0 B 6 2 B 7 3 B 8 1 B 18 18 B 8 4 B 19 14 B 8 9 B 5 1 B 15 9
C 16 13 C 13 10 C 11 18 C 9 5 C 21 23 C 16 12 C 12 5 C 12 16 C 7 1 C 12 20
;
RUN;

12 Code Block

PROC GLM

Explanation:
Analysis of Covariance (ANCOVA) on the 'na' data, testing the effect of 'trt' on 'y' after adjusting for the covariate 'x'.

PROC GLM DATA=na;
CLASS trt;
MODEL y=trt x / SOLUTION;
LSMEANS trt / TDIFF;
RUN;
QUIT;
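
To reuse the adjusted means, for example in a report or a plot, the LS-means table can be captured as a data set with ODS OUTPUT. This sketch assumes the na data set above and the GLM ODS table name LSMeans; ODS TRACE ON writes the table names to the log so they can be checked. The data set name na_lsm is hypothetical.

```sas
ODS TRACE ON;                 /* log the names of the ODS tables produced */
ODS OUTPUT LSMeans=na_lsm;    /* one row per trt level */
PROC GLM DATA=na;
  CLASS trt;
  MODEL y = trt x;
  LSMEANS trt;
RUN;
QUIT;
ODS TRACE OFF;

PROC PRINT DATA=na_lsm;
RUN;
```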

13 Code Block

DATA STEP Data

Explanation:
Creation of the last example dataset, 'sale', with three levels of 'type' and five observations per level, read with '@@'.

DATA sale;
INPUT type x y @@;   /* @@: five observations per data line */
CARDS;
1 38 21 1 39 26 1 36 22 1 45 28 1 33 19
2 43 34 2 38 26 2 38 29 2 27 18 2 34 25
3 24 23 3 32 29 3 31 30 3 21 16 3 28 29
;
RUN;

14 Code Block

PROC GLM

Explanation:
Final analysis of covariance, on the 'sale' table, to evaluate the effect of 'type' on 'y' after adjusting for the covariate 'x'.

PROC GLM DATA=sale;
CLASS type;
MODEL y=type x / SOLUTION;
LSMEANS type / TDIFF;
RUN;
QUIT;
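
The F and t tests in these models assume approximately normal errors with constant variance. One way to check this, sketched below for the sale model, is to save predicted values and residuals with the OUTPUT statement of PROC GLM and examine them. The sketch assumes the sale data set above; the names sale_diag, pred and resid are hypothetical.

```sas
PROC GLM DATA=sale;
  CLASS type;
  MODEL y = type x;
  OUTPUT OUT=sale_diag P=pred R=resid;   /* add predicted values and residuals */
RUN;
QUIT;

PROC UNIVARIATE DATA=sale_diag NORMAL;   /* NORMAL requests tests for normality */
  VAR resid;
RUN;

PROC SGPLOT DATA=sale_diag;              /* residuals vs. predicted values */
  SCATTER X=pred Y=resid;
  REFLINE 0 / AXIS=Y;
RUN;
```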

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
