
Calculation of 95% Confidence Intervals and Sampling Simulation

This program first generates a synthetic population of 10,000 individuals with gender and height attributes. It computes descriptive statistics (count, mean, standard deviation, and 95% confidence limits for the mean) for the whole population. It then draws two simple random samples (SRS) of different sizes (10 and 1,000 individuals) with the SURVEYSELECT procedure to illustrate how increasing the sample size narrows the 95% confidence interval.
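The narrowing can be made concrete with the usual t-based interval for a mean, which is what the CLM keyword of PROC MEANS reports by default. The symbols below (n for sample size, x-bar for the sample mean, s for the sample standard deviation, W for the interval width) are notation introduced here, and the ratio assumes both samples have a similar s, which will only hold approximately in practice.

```latex
% 95% confidence interval for the mean of a sample of size n
\bar{x} \;\pm\; t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}

% Width of the interval
W_n = 2\,t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}

% Expected ratio of widths for n = 10 versus n = 1000, assuming similar s
\frac{W_{10}}{W_{1000}}
  = \frac{t_{0.975,\,9}}{t_{0.975,\,999}}\,\sqrt{\frac{1000}{10}}
  \approx \frac{2.262}{1.962}\times 10
  \approx 11.5
```

Under that assumption, the interval from the sample of 10 should be roughly eleven to twelve times wider than the one from the sample of 1,000.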
Data Analysis

Type: CREATION_INTERNE (internally generated data)


Data is entirely generated via a DATA step using random functions (uniform, normal) to simulate 10,000 observations.

1 Code Block
DATA STEP Data
Explanation :
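Creates the 'random10000' dataset with 10,000 simulated observations. The 'male' indicator is set to 1 when a uniform draw exceeds 0.5 (so about half of the individuals are male) and to 0 otherwise. The 'height' variable is drawn from a normal distribution whose parameters depend on gender (mean 71 and standard deviation 4.32 when male = 1, mean 64.3 and standard deviation 2.11 when male = 0) and is rounded to two decimals.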
DATA random10000 (drop = i x);
  DO i = 1 TO 10000;
    x = uniform(123456);            /* uniform draw on (0,1), seed 123456 */
    IF x > .5 THEN male = 1;        /* about half of the population is male */
    ELSE male = 0;
    /* height: gender-specific mean and SD, rounded to the nearest 0.01 */
    IF male = 1 THEN height = round(71 + 4.32*normal(0), .01);
    ELSE height = round(64.3 + 2.11*normal(0), .01);
    OUTPUT;
  END;
RUN;
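For comparison, the same population model can be sketched with the RAND function and CALL STREAMINIT, which SAS documents as the newer random-number interface. This is an illustrative alternative, not the page's own code: the dataset name random10000_rand is hypothetical, the seed is simply reused from the original step, and the generated values will differ from those of the original program.

```sas
/* Sketch only: alternative to the original step, using RAND and CALL STREAMINIT. */
/* random10000_rand is a hypothetical dataset name chosen for this example.       */
data random10000_rand (drop = i);
  call streaminit(123456);                 /* seed the RAND stream once */
  do i = 1 to 10000;
    male = rand('bernoulli', 0.5);         /* 1 = male, 0 = female, equal probability */
    /* gender-specific mean and standard deviation, rounded to 0.01 */
    if male = 1 then height = round(rand('normal', 71, 4.32), .01);
    else height = round(rand('normal', 64.3, 2.11), .01);
    output;
  end;
run;
```

Passing the mean and standard deviation directly to RAND keeps the two distributions visible in one place instead of spreading them across an arithmetic expression.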
2 Code Block
PROC MEANS
Explanation :
Calculation of descriptive statistics for the total population, including mean, standard deviation, and 95% confidence limits of the mean (CLM).
PROC MEANS DATA=random10000 n mean std clm;
  VAR height;
RUN;
3 Code Block
PROC SURVEYSELECT Data
Explanation :
Selection of a simple random sample (SRS) of 10 observations from the 'random10000' population, stored in the 'random10' table.
PROC SURVEYSELECT DATA=random10000
  method = srs
  sampsize = 10
  out = random10;
RUN;
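As written, the selection step sets no seed, so each run may draw a different sample and produce a different interval. If a repeatable sample is wanted, the SEED= option of PROC SURVEYSELECT can be added. The seed value and the output name random10_seeded below are arbitrary choices made for this sketch and do not come from the original program.

```sas
/* Sketch only: same SRS of 10, made repeatable with SEED=. */
/* The seed value and the output dataset name are illustrative assumptions. */
proc surveyselect data=random10000
  method = srs
  sampsize = 10
  seed = 20240101
  out = random10_seeded;
run;
```

Rerunning this step with the same seed and the same input dataset should return the same 10 observations.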
4 Code Block
PROC MEANS
Explanation :
Calculation of descriptive statistics and the confidence interval for the small sample of 10 individuals.
PROC MEANS DATA=random10 n mean std clm;
  VAR height;
RUN;
5 Code Block
PROC SURVEYSELECT Data
Explanation :
Selection of a larger simple random sample (SRS) of 1,000 observations from the 'random10000' population, stored in the 'random1000' table.
PROC SURVEYSELECT DATA=random10000
  method = srs
  sampsize = 1000
  out = random1000;
RUN;
6 Code Block
PROC MEANS
Explanation :
Calculation of descriptive statistics and the confidence interval for the large sample of 1,000 individuals, allowing comparison of precision with the previous sample.
PROC MEANS DATA=random1000 n mean std clm;
  VAR height;
RUN;
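To compare the two intervals side by side rather than reading them from separate listings, the confidence limits can be written to datasets with the OUTPUT statement of PROC MEANS and then stacked. This is a minimal sketch: the dataset names ci10, ci1000 and ci_compare and the variable names lower, upper, width and sample are hypothetical, and it assumes the random10 and random1000 samples already exist.

```sas
/* Sketch only: capture the 95% limits for each sample in a dataset. */
proc means data=random10 noprint;
  var height;
  output out=ci10 n=n mean=mean std=std lclm=lower uclm=upper;
run;

proc means data=random1000 noprint;
  var height;
  output out=ci1000 n=n mean=mean std=std lclm=lower uclm=upper;
run;

/* Stack both results and compute the width of each interval. */
data ci_compare;
  length sample $12;
  set ci10 (in=small) ci1000;
  if small then sample = 'n = 10';
  else sample = 'n = 1000';
  width = upper - lower;
  keep sample n mean std lower upper width;
run;

proc print data=ci_compare noobs;
run;
```

The width column makes the effect of sample size directly visible; the exact values depend on which observations were drawn.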
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.