This program first generates a synthetic population of 10,000 individuals with gender and height attributes and computes descriptive statistics (mean, standard deviation, and 95% confidence interval for the mean) for the whole population. It then uses the SURVEYSELECT procedure to draw two simple random samples (SRS) of different sizes, 10 and 1,000 individuals, to show how increasing the sample size narrows the 95% confidence interval.
Data Analysis
Type: Internally generated data
The data are generated entirely in a DATA step that uses random-number functions (UNIFORM and NORMAL) to simulate 10,000 observations.
1 Code Block
DATA STEP Data
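Explanation: Creates the 'random10000' dataset of 10,000 simulated observations. The 'male' variable is set to 1 when a uniform random draw exceeds 0.5 and to 0 otherwise, so each gender has probability 0.5. The 'height' variable is then drawn from a normal distribution whose parameters depend on gender (mean 71 and standard deviation 4.32 for males, mean 64.3 and standard deviation 2.11 for females) and rounded to two decimals.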
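data random10000 (drop = i x);
   do i = 1 to 10000;
      /* Gender: male = 1 when a uniform(0,1) draw exceeds 0.5, else 0 */
      x = uniform(123456);
      if x > .5 then male = 1;
      else male = 0;
      /* Height: normal draw whose mean and SD depend on gender, rounded to 0.01 */
      if male = 1 then height = round(71 + 4.32*normal(0), .01);
      else height = round(64.3 + 2.11*normal(0), .01);
      output;   /* write one observation per loop iteration */
   end;
run;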
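SAS documentation recommends the RAND function, seeded once with CALL STREAMINIT, over older random-number functions such as UNIFORM and NORMAL. The sketch below builds a population with the same design under that approach; it is an alternative, not the original program. It writes to a separate table, random10000_rand (a hypothetical name), reuses the seed 123456 only as an assumption, and draws 'male' directly from a Bernoulli(0.5) distribution instead of thresholding a uniform draw. The simulated values will therefore not match those produced by UNIFORM and NORMAL.

```sas
/* Sketch: same population design using RAND with CALL STREAMINIT.   */
/* The table name random10000_rand and the seed are assumptions.     */
data random10000_rand (drop = i);
   call streaminit(123456);              /* seed the RAND stream once */
   do i = 1 to 10000;
      male = rand('BERNOULLI', 0.5);     /* 1 with probability 0.5, otherwise 0 */
      /* Gender-specific normal heights, rounded to two decimals */
      if male = 1 then height = round(rand('NORMAL', 71, 4.32), .01);
      else height = round(rand('NORMAL', 64.3, 2.11), .01);
      output;
   end;
run;
```

Because RAND accepts the mean and standard deviation of the normal distribution directly, the location and scale no longer have to be applied by hand.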
2 Code Block
PROC MEANS
Explanation: Computes descriptive statistics of height for the whole population: the number of observations, the mean, the standard deviation, and the 95% confidence limits of the mean (CLM).
proc means data = random10000 n mean std clm;
var height;
run;
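By default, the CLM keyword of PROC MEANS requests a two-sided 95% confidence interval for the mean (ALPHA=0.05), based on Student's t distribution. With x̄ the sample mean, s the standard deviation, n the number of nonmissing values, and t(0.975, n-1) the 97.5th percentile of t with n - 1 degrees of freedom, the limits and the interval width W are:

```latex
\bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}},
\qquad
W = 2\,t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
```

W is proportional to s/√n, and the t quantile itself falls toward about 1.96 as n grows, so larger samples give narrower intervals.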
3 Code Block
PROC SURVEYSELECT Data
Explanation: Draws a simple random sample (SRS) of 10 observations from the 'random10000' population and stores it in the 'random10' table.
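A minimal sketch of this selection, assuming the standard PROC SURVEYSELECT options METHOD=SRS (simple random sampling without replacement) and SAMPSIZE= (number of units to select), and assuming the 'random10000' table from the DATA step above already exists. The SEED= value 12345 is hypothetical; the original seed is not shown.

```sas
/* Simple random sample without replacement: 10 of the 10,000 rows.  */
/* SEED=12345 is a hypothetical value chosen for reproducibility.    */
proc surveyselect data = random10000 out = random10
                  method = srs sampsize = 10 seed = 12345;
run;
```

Fixing SEED= makes the same 10 rows come back on every run, which keeps the confidence interval that follows reproducible.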
4 Code Block
PROC MEANS
Explanation: Computes descriptive statistics and the 95% confidence interval of the mean for the small sample of 10 individuals.
proc means data = random10 n mean std clm;
var height;
run;
5 Code Block
PROC SURVEYSELECT Data
Explanation: Draws a larger simple random sample (SRS) of 1,000 observations from the 'random10000' population and stores it in the 'random1000' table.
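A sketch of the larger selection, again assuming PROC SURVEYSELECT with METHOD=SRS and SAMPSIZE=, and that 'random10000' was created by the DATA step above. Only the sample size and output table change; the SEED= value 67890 is hypothetical.

```sas
/* Simple random sample without replacement: 1,000 of the 10,000 rows. */
/* SEED=67890 is a hypothetical value; any fixed seed makes it repeatable. */
proc surveyselect data = random10000 out = random1000
                  method = srs sampsize = 1000 seed = 67890;
run;
```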
6 Code Block
PROC MEANS
Explanation: Computes descriptive statistics and the 95% confidence interval of the mean for the large sample of 1,000 individuals, so that its precision can be compared with that of the 10-observation sample.
proc means data = random1000 n mean std clm;
var height;
run;
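The comparison can be put into numbers. The t-based interval has width W = 2·t(0.975, n-1)·s/√n, so if the two samples are assumed, for illustration only, to have the same standard deviation s, the ratio of their widths depends only on n and the t quantiles. The DATA step sketch below computes that ratio with the QUANTILE function; the table name width_ratio is hypothetical.

```sas
/* Expected ratio of 95% CI widths, n = 10 versus n = 1,000, assuming */
/* both samples share the same standard deviation s.                  */
data width_ratio;
   n_small = 10;
   n_large = 1000;
   /* 97.5th percentiles of Student's t with n - 1 degrees of freedom */
   t_small = quantile('T', 0.975, n_small - 1);
   t_large = quantile('T', 0.975, n_large - 1);
   /* W = 2*t*s/sqrt(n); the factor 2 and s cancel in the ratio */
   ratio = (t_small / sqrt(n_small)) / (t_large / sqrt(n_large));
run;

proc print data = width_ratio noobs;
run;
```

The ratio comes out at roughly 11.5: a factor of 10 from √(1000/10), plus about 15% because the t quantile for 999 degrees of freedom (about 1.96) is smaller than for 9 (about 2.26). With real draws, the standard deviation of a 10-observation sample can differ noticeably from that of the 1,000-observation sample, so the observed ratio will vary around this value.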
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.