Permutation Test for Mean Comparison

Difficulty Level
Beginner
The script begins by defining a `trauma` dataset with in-line data and running a t-test on the observed data. It then uses PROC IML to generate 5000 random permutations of the response variable and runs PROC TTEST on every permuted column. The mean differences from these permuted tests are stored in a `diff` dataset via ODS OUTPUT. Finally, it plots the distribution of the permuted differences with PROC UNIVARIATE and extracts the permuted differences that are at least as extreme as the observed difference. Their count, divided by the number of permutations, is the empirical p-value.
Data Analysis

Type : internal creation (CREATION_INTERNE)


The 'trauma' dataset is created directly within the script using 'datalines'. The 'newds', 'diff', and 'numdiffs' datasets are generated within the script by 'PROC IML', 'PROC TTEST' (through ODS OUTPUT), and a DATA step, respectively.

1 Code Block
DATA STEP Data
Explanation :
This DATA STEP block creates the 'trauma' dataset using in-line data (`datalines`). It contains two numeric variables: `state` (a group indicator, 0 = Non-Trauma, 1 = Trauma) and `kcal` (the measurement to compare). `state` is read as numeric because the later `PROC IML` step loads both variables into a single numeric matrix, which cannot mix character and numeric columns.
/* state: 0 = Non-Trauma, 1 = Trauma */
DATA trauma;
   INPUT state kcal;
   DATALINES;
0 19
0 20
0 20
0 21
0 21
0 21
1 22
0 23
0 23
1 23
1 25
1 26
1 30
1 38
1 39
;
RUN;
2 Code Block
PROC TTEST
Explanation :
This `PROC TTEST` procedure performs a standard t-test to compare the mean of the `kcal` variable between the two groups defined by the `state` variable. This is the initial observed test whose mean difference will be compared to the permutation results.
PROC TTEST DATA=trauma;
   class state;
   /* state must be numeric for the PROC IML step below; convert it if it was read as character */
   var kcal;
RUN;
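The observed difference is later typed into the script by hand. As a hedged alternative, the sketch below captures it from the same `ConfLimits` ODS table that the script uses for the permuted tests. The names `obs_cl` and `obs_diff` are illustrative assumptions and do not appear in the original script.

```sas
/* Sketch only: obs_cl and obs_diff are hypothetical names */
ods output ConfLimits=obs_cl;          /* capture the table holding the mean difference */
proc ttest data=trauma plots=none;
   class state;
   var kcal;
run;

proc sql noprint;
   /* The Pooled row of ConfLimits holds the difference between the two group means */
   select abs(Mean) format=best32.
      into :obs_diff trimmed
      from obs_cl
      where Method = "Pooled";
quit;

%put NOTE: Observed absolute mean difference = &obs_diff;
```

The macro variable `&obs_diff` could then stand in for a hard-coded threshold in a later comparison. For the `trauma` data listed above, the group means are 21.0 and 29.0, so the value should come out close to 8.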
3 Code Block
ODS
Explanation :
This ODS (Output Delivery System) statement temporarily suppresses the displayed output of subsequent SAS procedures, so that the results of the many permuted tests do not flood the output destinations. Tables requested with `ods output` are still written to datasets while the exclusion is active.
ods exclude all;
4 Code Block
PROC IML Data
Explanation :
This `PROC IML` (Interactive Matrix Language) block is the core of the permutation test. It reads the 'trauma' dataset, then performs 5000 random permutations of the `kcal` variable using the `ranperm` function. The permuted data is then combined with the original `state` variable and saved into a new dataset named `newds`. This `newds` dataset will be used for the permutation t-tests.
PROC IML;
   use trauma;
   /* Change the variable names here: class variable first, then the analysis variable. */
   read all var {state kcal} into x;
   close trauma;
   /* 5000 is the number of permutations; each column of p is one shuffled copy of kcal. */
   p = t(ranperm(x[, 2], 5000));
   /* Column 1 = state, columns 2 to 5001 = permuted kcal values. */
   paf = x[, 1] || p;
   create newds from paf;
   append from paf;
   close newds;
QUIT;
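To see what the transposed `ranperm` call produces, the following minimal sketch runs it on a toy vector. The four values and the seed are assumptions chosen for illustration; `call randseed` is added only so that the shuffles are repeatable between runs.

```sas
proc iml;
   call randseed(12345);          /* assumed seed; any fixed integer makes the shuffles repeatable */
   kcal = {19, 20, 21, 22};       /* toy column vector, not the full trauma data */
   p = t(ranperm(kcal, 3));       /* ranperm returns 3 rows; transposing gives 4 rows x 3 columns */
   print p;                       /* each column is one random ordering of the four values */
quit;
```

Each column of `p` plays the role of one permuted `kcal` variable, which is why the full script ends up with one dataset column per permutation.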
5 Code Block
ODS
Explanation :
This ODS command captures the `conflimits` output table from the next `PROC TTEST` and saves it into a SAS dataset named `diff`. This table contains the confidence intervals and mean differences for each permutation.
ods output conflimits=diff;
6 Code Block
PROC TTEST
Explanation :
This `PROC TTEST` is applied to the `newds` dataset (which contains the permuted data). It compares the means of the `col2` variable (representing permuted `kcal`) between groups defined by `col1` (representing `state`). The `plots=none` parameter suppresses the generation of graphs for these numerous tests. The results of the mean differences are captured in the `diff` dataset by the preceding ODS command.
PROC TTEST DATA=newds plots=none;
   class col1;
   /* col2 to col5001 hold the 5000 permuted copies of kcal created by PROC IML */
   var col2 - col5001;
RUN;
7 Code Block
ODS
Explanation :
This ODS statement restores normal output, so that subsequent SAS procedures display their results again.
ods exclude none;
8 Code Block
PROC UNIVARIATE
Explanation :
This `PROC UNIVARIATE` is used to analyze and visualize the distribution of mean differences obtained from the permutation t-tests (stored in the `diff` dataset). A histogram of the `mean` variable (the mean differences) is generated, showing the empirical null distribution.
PROC UNIVARIATE DATA=diff;
   where method="Pooled";
   var mean;
   histogram mean;
RUN;
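It can help to see where the observed difference falls within this null distribution. The sketch below is one possible way to do that with `PROC SGPLOT`, drawing reference lines at plus and minus `7.8089`, the threshold this script hard-codes in its comparison step. The axis label is an assumption, and the threshold should be replaced by your own observed difference.

```sas
proc sgplot data=diff;
   where Method = "Pooled";
   histogram Mean;
   /* 7.8089 is the threshold used in this script; substitute your own observed difference */
   refline -7.8089 7.8089 / axis=x lineattrs=(pattern=dash);
   xaxis label="Permuted mean difference";
run;
```

Bars at or beyond the dashed lines correspond to the permutations that count toward the two-sided empirical p-value.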
9 Code Block
DATA STEP Data
Explanation :
This DATA STEP creates the 'numdiffs' dataset. It keeps only the rows of `diff` in which the absolute value of the permuted mean difference (`mean`) is greater than or equal to the observed difference, hard-coded here as `7.8089`. Using the absolute value makes this a two-sided comparison. The number of observations in 'numdiffs', divided by the number of permutations, is the empirical p-value.
DATA numdiffs;
   SET diff;
   where method="Pooled";
   /* Replace 7.8089 with the observed mean difference reported by the first PROC TTEST.
      abs() gives a two-sided test; for a one-sided test, compare mean directly. */
   IF abs(mean) >= 7.8089;
RUN;
10 Code Block
PROC PRINT
Explanation :
This `PROC PRINT` displays the contents of the `numdiffs` dataset. It is used for a quick visual inspection of the permuted mean differences that were as extreme or more extreme than the observed difference.
PROC PRINT DATA=numdiffs;
   where method="Pooled";
RUN;
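The script stops at listing the extreme permutations and leaves the final division to the reader. A minimal sketch of that last step, assuming the `diff` dataset from the permuted t-tests and the same hard-coded threshold, could look like this. The column names `n_extreme`, `n_perm`, and `p_value` are illustrative.

```sas
proc sql;
   /* A logical expression evaluates to 0 or 1, so its sum counts the extreme permutations */
   select sum(abs(Mean) >= 7.8089) as n_extreme,
          count(*)                 as n_perm,
          calculated n_extreme / calculated n_perm as p_value format=6.4
   from diff
   where Method = "Pooled";
quit;
```

Counting the rows of `diff` for the denominator keeps the p-value correct even if the number of permutations is changed elsewhere in the script. Some authors prefer the slightly more conservative `(n_extreme + 1) / (n_perm + 1)`, which treats the observed arrangement as one of the permutations.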
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
