Analysis of the mean difference between two groups

The script begins by generating a synthetic population of 10,000 individuals, each with a randomly assigned sex and height. Sex is a binary indicator (male/female), and height is drawn from a separate normal distribution for each sex. The script then uses PROC SURVEYSELECT to draw a simple random sample of 50 individuals from this population. Comments in the script indicate that PROC TTEST is intended to compare the mean heights of males and females in this sample. The script is pedagogical: it illustrates how to create simulated data and draw a sample before running a statistical analysis.

Data Analysis

Type: Internal creation

All data used (random_pop1 and random_subpop1) are synthetically created within the script using a DATA STEP and a sampling procedure (PROC SURVEYSELECT). No external data is required.

1 Code Block

DATA STEP Data

Explanation :
This DATA step generates a dataset named `random_pop1` containing 10,000 observations. Each observation represents an individual with a binary sex indicator (`male`, 1 = male, 0 = female) and a height (`height`). Sex is assigned with equal probability by comparing a draw from the `uniform` function to 0.5. Height is generated with the `normal` function as a mean plus a standard deviation times a standard normal draw: mean 71 and standard deviation 4.32 for males, mean 64.3 and standard deviation 2.11 for females, rounded to the nearest 0.01. The temporary variables `i` and `x` used for generation are dropped from the final dataset.


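DATA random_pop1 (drop = i x);
  /* Generate 10,000 simulated individuals */
  DO i = 1 TO 10000;
    x = uniform(123456);      /* uniform draw used to assign sex */
    IF x > .5 THEN male = 1;
    ELSE male = 0;
    /* Height = mean + SD * standard normal draw, rounded to 0.01 */
    IF male = 1 THEN height = round(71 + 4.32*normal(0), .01);
    ELSE IF male = 0 THEN height = round(64.3 + 2.11*normal(0), .01);
    OUTPUT;                   /* write one observation per iteration */
  END;                        /* closes the DO loop */
RUN;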
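As a quick sanity check on the simulated population, the group sizes, means, and standard deviations can be compared with the parameters used in the DATA step. The following is a minimal sketch and is not part of the original script; it assumes `random_pop1` has already been created in the current session.

```sas
/* Sketch (not in the original script): summarize height by sex */
/* to compare with the generating means and standard deviations. */
PROC MEANS DATA=random_pop1 N MEAN STD MAXDEC=2;
  CLASS male;    /* 1 = male, 0 = female */
  VAR height;
RUN;
```

With 10,000 observations, the reported means and standard deviations should fall close to the values used for generation, and the two groups should be of roughly equal size.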

2 Code Block

PROC SURVEYSELECT

Explanation :
This `PROC SURVEYSELECT` step performs simple random sampling (`method = srs`, which selects without replacement) from the `random_pop1` dataset created previously. It selects 50 observations at random (`sampsize = 50`) and stores them in a new dataset named `random_subpop1`. The `seed = 2001` option makes the sample reproducible, and `noprint` suppresses the procedure's displayed output.


PROC SURVEYSELECT DATA=random_pop1 noprint
  seed = 2001
  method = srs
  sampsize = 50
  out = random_subpop1;
RUN;
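Because the sample is drawn from the whole population rather than separately by sex, the two groups in `random_subpop1` will generally not have exactly 25 members each. A minimal sketch for checking the group sizes, which is not part of the original script, could be:

```sas
/* Sketch (not in the original script): count males and females */
/* in the 50-observation sample.                                */
PROC FREQ DATA=random_subpop1;
  TABLES male;
RUN;
```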

3 Code Block

PROC TTEST

Explanation :
Comments indicate the intention to use `PROC TTEST` on the `random_subpop1` dataset to test for a significant difference between the average heights of 'male' and 'female' groups. Although the SAS code for this procedure is not included in the provided script, it is the script's final analytical objective.


/*
PROC TTEST is suggested by the comments but is not provided in the script. */
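Since the script itself does not include the test, the following is only a sketch of what the intended analysis might look like. It assumes the comparison is a two-sample t-test with `male` as the grouping variable and `height` as the analysis variable; neither statement comes from the original script.

```sas
/* Sketch (assumed, not in the original script): two-sample t-test */
/* comparing mean height between the two values of male.          */
PROC TTEST DATA=random_subpop1;
  CLASS male;    /* grouping variable: 1 = male, 0 = female */
  VAR height;    /* variable whose group means are compared */
RUN;
```

By default, `PROC TTEST` reports both the pooled (equal variances) and Satterthwaite (unequal variances) results, along with a test for equality of variances. Because the two simulated groups were generated with different standard deviations, the Satterthwaite row is the more natural one to read here.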

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.

Copyright Info : Practice: Test for a Difference Between Two Means. This code is posted for your benefit; however, I highly recommend that you practice typing your own SAS programs as well. With the SAS programming language, as with all new languages, immersion seems to be the best way to learn.
