
Analysis of Variance (ANOVA) and Covariance (ANCOVA)

The script is a series of statistical analysis examples. For each example, a dataset is first created in a DATA step from in-stream data (CARDS). For the first dataset, PROC BOXPLOT and PROC SGPLOT are then used to explore the relationships between variables. The core of the analysis relies on PROC GLM (General Linear Models), which is used to perform: 1) an ANOVA, to test the effect of a classification variable on a response variable; 2) an ANCOVA, to test the same effect after adjusting for a continuous variable (covariate). Least squares means (LSMEANS) are calculated to compare groups. The ANCOVA is repeated for each of the datasets medicine, data1, edu, na, and sale.
Data Analysis

Type: internal creation


All datasets (medicine, data1, edu, na, sale) are created and populated directly within the script using DATA steps and the CARDS/DATALINES statement. No external data is required.

1 Code Block
DATA STEP Data
Explanation :
Creation of the 'medicine' dataset. The trailing '@@' in the INPUT statement holds the current data line across iterations of the DATA step, so SAS reads several observations from the same line.
DATA medicine;
  INPUT trt x y @@;
  CARDS;
1 27.2 32.6 1 22.0 36.6
1 33.0 37.7 1 26.8 31.0
2 28.6 33.8 2 26.8 31.7
2 26.5 30.7 2 26.8 30.4
3 28.6 35.2 3 22.4 29.1
3 23.2 28.9 3 24.4 30.2
4 29.3 35.0 4 21.8 27.0
4 30.3 36.4 4 24.3 30.5
5 20.4 24.6 5 19.6 23.4
5 25.1 30.3 5 18.1 21.8
;
RUN;
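A quick way to confirm that the trailing '@@' read the data as intended is sketched below, assuming the 'medicine' dataset created above. This check is an addition and is not part of the original script. With ten data lines holding two observations each, 20 observations are expected, four per treatment group.

```sas
/* List the observations read from the in-stream data */
PROC PRINT DATA=medicine;
RUN;

/* Count observations and summarize x and y within each treatment group */
PROC MEANS DATA=medicine N MEAN STD;
  CLASS trt;
  VAR x y;
RUN;
```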
2 Code Block
PROC SORT
Explanation :
Sorting the 'medicine' dataset by the treatment variable 'trt'. Sorting is a prerequisite for BY-group processing, and PROC BOXPLOT expects the input data to be ordered by the group variable.
PROC SORT DATA=medicine;
  BY trt;
RUN;
3 Code Block
PROC BOXPLOT
Explanation :
Generation of boxplots to visualize the distribution of variable 'y' for each treatment group 'trt'.
PROC BOXPLOT DATA=medicine;
  PLOT y*trt;
RUN;
4 Code Block
PROC SGPLOT
Explanation :
Creation of a scatter plot to visualize the relationship between variables 'x' and 'y', differentiating points by treatment group 'trt'.
PROC SGPLOT DATA=medicine;
  SCATTER X=x Y=y / GROUP=trt;
RUN;
5 Code Block
PROC GLM
Explanation :
Analysis of Variance (ANOVA). This block tests whether the mean of the response variable 'y' differs significantly between the groups defined by 'trt'. The SOLUTION option prints the parameter estimates, and LSMEANS with TDIFF reports a t test for the difference between each pair of group means.
PROC GLM DATA=medicine;
  CLASS trt;
  MODEL y = trt / SOLUTION;
  LSMEANS trt / TDIFF;
RUN;
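Unless an ADJUST= option is given, the pairwise t tests requested by TDIFF are not adjusted for multiple comparisons. With five treatment groups there are ten pairwise comparisons, so an adjustment may be worth considering. The following is a minimal sketch, assuming a Tukey adjustment suits the analysis; the original script does not use it.

```sas
PROC GLM DATA=medicine;
  CLASS trt;
  MODEL y = trt;
  /* PDIFF prints p-values for all pairwise differences;      */
  /* ADJUST=TUKEY adjusts them for multiple comparisons;      */
  /* CL adds confidence limits for the LS-means and the diffs */
  LSMEANS trt / PDIFF ADJUST=TUKEY CL;
RUN;
QUIT;
```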
6 Code Block
PROC GLM
Explanation :
Analysis of Covariance (ANCOVA). This model tests differences in 'y' between 'trt' groups while controlling for the effect of the continuous covariate 'x'.
PROC GLM DATA=medicine;
  CLASS trt;
  MODEL y = trt x / SOLUTION;
  LSMEANS trt / TDIFF;
RUN;
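The ANCOVA model above assumes that the slope of 'y' on 'x' is the same in every treatment group. One common way to check this assumption, sketched here as an addition that is not in the original script, is to fit a model with a trt*x interaction and examine its F test.

```sas
PROC GLM DATA=medicine;
  CLASS trt;
  /* trt*x lets each treatment group have its own slope on x. */
  /* A non-significant trt*x term is consistent with the      */
  /* common-slope ANCOVA model used above.                    */
  MODEL y = trt x trt*x;
RUN;
QUIT;
```

If the interaction were clearly significant, the adjusted group differences would depend on the value of 'x', and a single set of LS-means comparisons would be harder to interpret.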
7 Code Block
DATA STEP Data
Explanation :
Creation of a second dataset 'data1' with a character treatment variable ('A', 'B') and two numeric variables.
DATA data1;
  INPUT trt $ x y;
  CARDS;
A 5 20
A 10 23
A 12 30
A 9 25
A 23 34
A 21 40
A 14 27
A 18 38
A 6 24
A 13 31
B 7 19
B 12 26
B 27 33
B 24 35
B 18 30
B 22 31
B 26 34
B 21 28
B 14 23
B 9 22
;
RUN;
8 Code Block
PROC GLM
Explanation :
Analysis of Covariance (ANCOVA) on the 'data1' dataset to evaluate the effect of 'trt' on 'y' while adjusting for 'x'.
PROC GLM DATA=data1;
  CLASS trt;
  MODEL y = trt x / SOLUTION;
  LSMEANS trt / TDIFF;
RUN;
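The original script plots only the 'medicine' data. As a hedged illustration of the same exploratory step for 'data1', the sketch below overlays a fitted regression line per treatment group, which gives a visual impression of whether the two groups share a similar slope. This plot is an addition, not part of the original script.

```sas
PROC SGPLOT DATA=data1;
  /* REG draws the scatter points and a fitted line for each group */
  REG X=x Y=y / GROUP=trt;
RUN;
```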
9 Code Block
DATA STEP Data
Explanation :
Creation of the 'edu' dataset to compare three teaching methods. The trailing '@@' in the INPUT statement allows several observations to be read from each data line.
DATA edu;
  INPUT method x y @@;
  CARDS;
1 29 39 1 4 34 1 18 36
2 17 35 2 35 38 2 3 32
3 1 38 3 15 43 3 32 44
;
RUN;
10 Code Block
PROC GLM
Explanation :
Analysis of Covariance (ANCOVA) on the 'edu' table to compare the effect of 'method' on 'y' while controlling for 'x'.
PROC GLM DATA=edu;
  CLASS method;
  MODEL y = method x / SOLUTION;
  LSMEANS method / TDIFF;
RUN;
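By default, the LS-means in an ANCOVA are evaluated at the overall mean of the covariate. The AT option of the LSMEANS statement evaluates them at another covariate value instead. The sketch below is an addition to the original script, and the value x=20 is an arbitrary choice for illustration.

```sas
PROC GLM DATA=edu;
  CLASS method;
  MODEL y = method x / SOLUTION;
  /* Default: adjusted means evaluated at the overall mean of x */
  LSMEANS method / TDIFF;
  /* Adjusted means evaluated at x = 20 (illustrative value) */
  LSMEANS method / AT x=20 TDIFF;
RUN;
QUIT;
```

Because this model has a single common slope, changing the covariate value shifts every adjusted mean by the same amount, so the pairwise differences between methods stay the same.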
11 Code Block
DATA STEP Data
Explanation :
Creation of the 'na' dataset with three treatment groups (A, B, C) of ten observations each, read with the trailing '@@'.
DATA na;
  INPUT trt $ x y @@;
  CARDS;
A 11 6 A 8 0 A 5 2 A 14 8 A 19 11 A 6 4 A 10 13 A 6 1 A 11 8 A 3 0
B 6 0 B 6 2 B 7 3 B 8 1 B 18 18 B 8 4 B 19 14 B 8 9 B 5 1 B 15 9
C 16 13 C 13 10 C 11 18 C 9 5 C 21 23 C 16 12 C 12 5 C 12 16 C 7 1 C 12 20
;
RUN;
12 Code Block
PROC GLM
Explanation :
Analysis of Covariance (ANCOVA) on the 'na' dataset to test the effect of 'trt' on 'y' while adjusting for the covariate 'x'.
PROC GLM DATA=na;
  CLASS trt;
  MODEL y = trt x / SOLUTION;
  LSMEANS trt / TDIFF;
RUN;
13 Code Block
DATA STEP Data
Explanation :
Creation of the last example dataset, 'sale', with three levels of 'type' and five observations per level.
DATA sale;
  INPUT type x y @@;
  CARDS;
1 38 21 1 39 26 1 36 22 1 45 28 1 33 19
2 43 34 2 38 26 2 38 29 2 27 18 2 34 25
3 24 23 3 32 29 3 31 30 3 21 16 3 28 29
;
RUN;
14 Code Block
PROC GLM
Explanation :
Final Analysis of Covariance (ANCOVA), on the 'sale' dataset, to evaluate the effect of 'type' on 'y' while adjusting for the covariate 'x'.
PROC GLM DATA=sale;
  CLASS type;
  MODEL y = type x / SOLUTION;
  LSMEANS type / TDIFF;
RUN;
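The same ANCOVA step is written out five times in this script, differing only in the dataset and the classification variable. One possible way to avoid the repetition is a small macro, sketched below. The macro name 'ancova' and its parameters (ds, grp, x, y) are hypothetical and do not appear in the original script.

```sas
/* Hypothetical helper: run the ANCOVA used throughout this script */
%MACRO ancova(ds=, grp=, x=x, y=y);
  PROC GLM DATA=&ds;
    CLASS &grp;
    MODEL &y = &grp &x / SOLUTION;
    LSMEANS &grp / TDIFF;
  RUN;
  QUIT;
%MEND ancova;

/* One call per dataset created above */
%ancova(ds=medicine, grp=trt);
%ancova(ds=data1,    grp=trt);
%ancova(ds=edu,      grp=method);
%ancova(ds=na,       grp=trt);
%ancova(ds=sale,     grp=type);
```

The covariate and response default to 'x' and 'y' because every dataset in the script uses those names.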
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.