The script begins by generating a synthetic population of 10,000 individuals with randomly assigned sexes and heights. Sex is coded as a binary variable (male/female), and height is drawn from a separate normal distribution for each sex. The script then uses PROC SURVEYSELECT to draw a simple random sample of 50 individuals from this population. Comments in the script indicate that PROC TTEST is meant to follow, comparing the average heights of males and females in the sample. The script is pedagogical: it illustrates how to create simulated data and sample from it before running a statistical analysis.
Data Analysis
Type : internal creation (CREATION_INTERNE)
All data used (random_pop1 and random_subpop1) are synthetically created within the script using a DATA STEP and a sampling procedure (PROC SURVEYSELECT). No external data is required.
1 Code Block
DATA STEP Data
Explanation : This DATA step generates a dataset named `random_pop1` containing 10,000 observations. Each observation represents an individual with a binary sex indicator (`male`, 1 for male and 0 for female) and a height (`height`). Sex is assigned at random with probability 0.5 each: `uniform(123456)` draws a value on (0, 1) and the individual is male when that value exceeds 0.5. Height is then drawn with the `normal` function from a normal distribution with mean 71 and standard deviation 4.32 for males, or mean 64.3 and standard deviation 2.11 for females, and rounded to two decimal places. The loop counter `i` and the temporary draw `x` are dropped from the final dataset.
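data random_pop1 (drop = i x);
  do i = 1 to 10000;
    /* Uniform draw on (0,1) used to assign sex */
    x = uniform(123456);
    if x > .5 then male = 1;
    else male = 0;
    /* Height: mean + SD * standard normal draw, rounded to 0.01 */
    if male = 1 then height = round(71 + 4.32*normal(0), .01);
    else if male = 0 then height = round(64.3 + 2.11*normal(0), .01);
    output;
  end;  /* closes the DO loop */
run;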
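A quick way to confirm that the simulated population behaves as described is to summarize height by sex. The following is only a sketch and is not part of the original script; it assumes `random_pop1` has been created by the DATA step above.

```sas
/* Sketch (not in the original script): summarize height by sex */
/* to check the simulated means and standard deviations.        */
proc means data=random_pop1 n mean std min max maxdec=2;
  class male;    /* 1 = male, 0 = female */
  var height;
run;
```

With 10,000 observations, the group sizes should be close to equal and the group means and standard deviations should sit near the values used in the generating code.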
2 Code Block
PROC SURVEYSELECT
Explanation : This `PROC SURVEYSELECT` procedure performs simple random sampling (`srs`) from the `random_pop1` dataset created previously. It selects 50 observations randomly and stores them in a new dataset named `random_subpop1`. The `seed = 2001` option ensures reproducibility of the sample, and `noprint` suppresses the display of the procedure's results in the SAS output.
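The code for this step does not appear on the page. A minimal sketch consistent with the explanation above might look like the following; the option values (`method=srs`, `n=50`, `seed=2001`, `noprint`) and dataset names come from that explanation, while the exact layout of the original statement is an assumption.

```sas
/* Sketch reconstructed from the explanation; the original statement */
/* is not shown, so the exact form here is an assumption.            */
proc surveyselect data=random_pop1     /* population created earlier    */
                  method=srs           /* simple random sampling        */
                  n=50                 /* sample size                   */
                  seed=2001            /* fixed seed for repeatability  */
                  out=random_subpop1   /* sampled observations          */
                  noprint;             /* suppress the summary output   */
run;
```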
3 Code Block
PROC TTEST
Explanation : Comments in the script indicate the intention to run `PROC TTEST` on the `random_subpop1` dataset to test for a significant difference between the average heights of the male and female groups. The SAS code for this procedure is not included in the provided script, but it is the script's final analytical objective.
/* PROC TTEST is suggested by the comments, but not provided in the script. */
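Since the script stops short of the test itself, the following sketch shows one way the two-sample comparison could be written. It is not the author's code; it assumes the sample dataset `random_subpop1` exists and that `male` is the grouping variable and `height` the analysis variable, as described earlier.

```sas
/* Sketch (not in the original script): two-sample t test of height by sex */
proc ttest data=random_subpop1;
  class male;    /* grouping variable: 1 = male, 0 = female */
  var height;    /* variable whose group means are compared */
run;
```

By default, `PROC TTEST` reports both the pooled (equal variances) and Satterthwaite (unequal variances) results, along with a folded F test for equality of variances. Because the two groups were simulated with different standard deviations, the Satterthwaite row is the more natural one to read here.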
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Practice: Test for a Difference Between Two Means. This code is posted for your benefit; however, I highly recommend that you practice typing your own SAS programs as well. With the SAS programming language, as with all new languages, immersion seems to be the best way to learn.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.