
Mean Difference Test (T-Test) on Simulated Data

This educational script illustrates statistical simulation. It first creates a synthetic population of 10,000 individuals with randomly generated sex and height values, then computes the mean height of each sex over the whole population. Next, it draws a simple random sample of 50 individuals and runs a two-sample t-test to determine whether the difference in mean height between the sexes is statistically significant in that sample.

Type : INTERNAL_CREATION


The 'random_pop1' data set is generated in the first DATA step with the random-number functions UNIFORM and NORMAL.

1 Code Block
DATA STEP Data
Explanation :
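Generates the table 'random_pop1' with 10,000 observations. The variable 'male' is set to 1 when a uniform draw exceeds 0.5 and to 0 otherwise, so about 50% of the population is male. The variable 'height' is drawn from a normal distribution whose parameters depend on sex (mean 71 and standard deviation 4.32 for males, mean 64.3 and standard deviation 2.11 for females) and rounded to two decimal places.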
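/* Simulate a population of 10,000 people with random sex and height */
DATA random_pop1 (drop = i x);
   DO i = 1 TO 10000;
      x = uniform(123456);            /* U(0,1) draw, seed 123456 */
      IF x > .5 THEN male = 1;        /* about half the population is male */
      ELSE male = 0;
      /* Height ~ Normal(71, 4.32) for males, Normal(64.3, 2.11) for females, */
      /* rounded to 0.01. A seed of 0 initializes NORMAL from the system      */
      /* clock, so the height values can differ from run to run.            */
      IF male = 1 THEN height = round(71 + 4.32*normal(0), .01);
      ELSE height = round(64.3 + 2.11*normal(0), .01);
      OUTPUT;                         /* write one observation per iteration */
   END;
RUN;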
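As an alternative sketch (not part of the original script), the same population model can be written with the newer RAND function. CALL STREAMINIT sets one seed for every RAND call in the step, which makes both the sex and the height draws reproducible. The output name 'random_pop1_rand' is hypothetical, and because the random streams differ from UNIFORM/NORMAL, the resulting values will not match 'random_pop1' exactly.

```sas
/* Hypothetical alternative: same population model using RAND.          */
/* CALL STREAMINIT fixes the seed for all RAND calls in this DATA step. */
data random_pop1_rand (drop = i);
   call streaminit(123456);
   do i = 1 to 10000;
      male = rand('bernoulli', 0.5);              /* 1 with probability 0.5, else 0 */
      if male = 1 then height = round(rand('normal', 71, 4.32), .01);
      else height = round(rand('normal', 64.3, 2.11), .01);
      output;                                     /* one observation per iteration */
   end;
run;
```

The model (Bernoulli sex, sex-specific normal height) is unchanged; only the random-number generator and the seeding differ.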
2 Code Block
PROC MEANS
Explanation :
PROC MEANS computes its default statistics (N, mean, standard deviation, minimum, maximum) for 'height' separately for each level of 'male' (0 = female, 1 = male) across the entire 10,000-observation population.
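/* Population summary statistics of height by sex */
PROC MEANS DATA = random_pop1;
   class male;          /* one row of statistics per level of male */
   var height;          /* analysis variable */
   title1 "Population Mean Height for Males and Females";
RUN;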
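For reference, the parameters hard-coded in the DATA step imply the values that the PROC MEANS output should approximate. The simulated means will differ slightly because of random variation, and each group should contain roughly 5,000 observations.

```latex
\begin{aligned}
\mu_{\text{male}} &= 71, & \sigma_{\text{male}} &= 4.32,\\
\mu_{\text{female}} &= 64.3, & \sigma_{\text{female}} &= 2.11,\\
\mu_{\text{male}} - \mu_{\text{female}} &= 71 - 64.3 = 6.7.
\end{aligned}
```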
3 Code Block
PROC SURVEYSELECT Data
Explanation :
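PROC SURVEYSELECT draws a simple random sample (SRS, without replacement) of 50 observations from the 'random_pop1' population. SEED=2001 makes the draw reproducible, NOPRINT suppresses the displayed summary, and the selected observations are stored in the output table 'random_subpop1'.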
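/* Draw a simple random sample of 50 observations from the population */
PROC SURVEYSELECT DATA=random_pop1 noprint
   seed = 2001          /* fixed seed: the same sample on every run */
   method = srs         /* simple random sampling without replacement */
   sampsize = 50        /* number of observations to select */
   out = random_subpop1;
RUN;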
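Because simple random sampling does not control how many males and females are drawn, the two group sizes in 'random_subpop1' vary from sample to sample (about 25 each on average). A quick check such as the following sketch (not in the original script) shows the split that PROC TTEST will work with; the title text is illustrative.

```sas
/* Hypothetical check (not in the original): sex split in the sample */
proc freq data = random_subpop1;
   tables male;      /* frequency and percent for male = 0 and male = 1 */
   title1 "Number of Males and Females in the Sample of 50";
run;
```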
4 Code Block
PROC TTEST
Explanation :
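Runs PROC TTEST on the 50-observation sample to compare mean height between males (male = 1) and females (male = 0), testing the null hypothesis that the two mean heights are equal. PROC TTEST reports both the pooled-variance (Student) t-test and the Satterthwaite (unequal-variance) t-test, together with a folded F test for equality of variances. Because the simulated standard deviations differ (4.32 versus 2.11), the Satterthwaite result is usually the more appropriate one to read.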
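/* Two-sample t-test of mean height, males versus females, in the sample */
PROC TTEST DATA = random_subpop1;
   class male;          /* grouping variable: must have exactly two levels */
   var height;          /* response compared between the two groups */
   title1 "T-Test for Difference in Mean Height of Males and Females in Random Population 1";
RUN;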
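The two t statistics that PROC TTEST reports correspond to the standard textbook formulas below, where $\bar{x}_1, \bar{x}_2$ are the group sample means, $s_1^2, s_2^2$ the sample variances, and $n_1, n_2$ the group sizes. The last line is only a rough expectation under stated assumptions (an even 25/25 split and sample standard deviations equal to the population values 4.32 and 2.11), not a result from the script.

```latex
\begin{gathered}
% Pooled-variance (Student) test, df = n_1 + n_2 - 2
s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}, \qquad
t_{\text{pooled}} = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}} \\[6pt]
% Satterthwaite (unequal-variance) test with approximate df \nu
t_{\text{Satt}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \qquad
\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1}} \\[6pt]
% Rough expectation, assuming n_1 = n_2 = 25, s_1 = 4.32, s_2 = 2.11, difference 6.7
\sqrt{\frac{4.32^2}{25} + \frac{2.11^2}{25}} = \sqrt{0.7465 + 0.1781} \approx 0.96, \qquad
|t| \approx \frac{6.7}{0.96} \approx 7.0
\end{gathered}
```

Under these assumptions the degrees of freedom are about 35 (Satterthwaite) to 48 (pooled), so an |t| near 7 would give a p-value far below 0.05. The actual statistic depends on the sample drawn with SEED=2001.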
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.