Monte Carlo Permutation Test with PROC IML

Difficulty Level: Beginner
The script starts by creating an internal dataset and runs an initial t-test to obtain the observed mean difference. It then uses PROC IML to generate 1000 random permutations of the target variable 'Money'. PROC TTEST analyzes each permutation, which yields an empirical distribution of mean differences under the null hypothesis. Finally, PROC UNIVARIATE summarizes that distribution, and a DATA step keeps the permutations whose difference is at least as extreme as the observed one; their share of the 1000 permutations is the empirical p-value.
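The empirical p-value described above can be written compactly. The notation is an assumption introduced here for illustration, not taken from the script: $B$ is the number of permutations (1000), $d_b$ is the pooled mean difference for permutation $b$, and $d_{\text{obs}}$ is the observed difference (114.6 in absolute value, per the filtering step of the script).

```latex
\hat{p} \;=\; \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}\!\left\{\, |d_b| \ge |d_{\text{obs}}| \,\right\},
\qquad B = 1000
```

Taking absolute values on both sides makes this a two-sided comparison, which matches the `abs(mean)` condition used in the script.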
Data Analysis

Type: INTERNAL_CREATION

Data is defined directly in the script, via the DATALINES statement of the DATA step that creates the 'cash' dataset.

1 Code Block
DATA STEP Data
Explanation :
Creation of the initial 'cash' dataset containing the variables 'School' and 'Money' with embedded data.
DATA cash;
 INPUT School Money;

DATALINES;
0 34
0 1200
...
1 3
1 0
;
2 Code Block
PROC TTEST
Explanation :
Runs the initial Student's t-test to obtain the observed mean difference between the two 'School' groups on the real data.
PROC TTEST DATA=cash;
 class School;
 var Money;
RUN;
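The filtering step later in the script hard-codes the observed difference (114.6). A minimal sketch of capturing it programmatically instead, assuming the same ConfLimits ODS table and the same 'Method' and 'Mean' columns that the script already uses for the permuted data. The dataset name `obs` and the macro variable `obsdiff` are hypothetical names chosen for this illustration.

```sas
/* Sketch: store the observed pooled mean difference in a macro variable. */
/* 'obs' and 'obsdiff' are illustrative names, not part of the script.    */
ods output ConfLimits=obs;

proc ttest data=cash plots=none;
   class School;
   var Money;
run;

data _null_;
   set obs;
   where Method = "Pooled";              /* row holding the pooled difference */
   call symputx("obsdiff", abs(Mean));   /* absolute observed difference      */
run;

%put NOTE: observed absolute mean difference = &obsdiff;
```

With this in place, the later condition could compare `abs(mean)` against `&obsdiff` rather than a typed-in constant.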
3 Code Block
PROC IML Data
Explanation :
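Uses the IML matrix language to read the data into matrix x, generate 1000 random permutations of the 'Money' column (x[,2]) while keeping 'School' (x[,1]) fixed, and save the result in 'newds'. Each permutation becomes one column, so 'newds' has 1001 columns: the group variable followed by 1000 shuffled copies of 'Money'.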
ods graphics off;
ods exclude all;

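PROC IML;
 use cash;
 read all var {School Money} into x;  /* column 1 = School, column 2 = Money */
 p = t(ranperm(x[, 2], 1000));        /* each column of p is one shuffle of Money */
 paf = x[, 1] || p;                   /* School stays fixed in column 1 */
 create newds from paf;               /* variables are named COL1-COL1001 */
 append from paf;
QUIT;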
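To see what the permutation step produces, here is a small standalone sketch. It uses only the four 'Money' values visible in the listing above and three permutations instead of 1000. The seed value is an arbitrary assumption, added so that the shuffles can be reproduced from run to run.

```sas
/* Sketch: RANPERM on a toy vector. Seed and sizes are illustrative only. */
proc iml;
   call randseed(12345);           /* arbitrary seed for reproducibility  */
   money = {34, 1200, 3, 0};       /* the four values shown in the data   */
   p = t(ranperm(money, 3));       /* 4 rows x 3 columns after transpose  */
   print p;                        /* each column is one random shuffle   */
quit;
```

Run this on its own rather than in the middle of the script, because the script has `ods exclude all` in effect at this point and the PRINT output would be suppressed. Adding a `call randseed` line to the main PROC IML step would make the 1000 permutations repeatable in the same way.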
4 Code Block
PROC TTEST
Explanation :
Runs a t-test for each of the 1000 permuted columns (col2 to col1001) against the group variable (col1). The ConfLimits ODS table, which holds the group means and the mean difference together with their confidence limits, is exported to the 'diff' table.
ods OUTPUT conflimits=diff;

PROC TTEST DATA=newds plots=none;
 class col1;
 var col2 - col1001;
RUN;

ods graphics on;
ods exclude none;
5 Code Block
PROC UNIVARIATE
Explanation :
Analyzes the distribution of the simulated mean differences (stored in the 'mean' variable of the 'diff' table). The WHERE statement keeps only the rows for the pooled-variance difference, one per permutation.
PROC UNIVARIATE DATA=diff;
 where method="Pooled";
 var mean;
 histogram mean;
RUN;
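Besides the histogram, the tails of the permutation distribution can be summarized numerically. A minimal sketch, assuming the same 'diff' table. The output dataset name `pctl` and the prefix `P_` are hypothetical choices for this illustration.

```sas
/* Sketch: 2.5th and 97.5th percentiles of the permuted mean differences. */
/* 'pctl' and the 'P_' prefix are illustrative names.                     */
proc univariate data=diff noprint;
   where method = "Pooled";
   var mean;
   output out=pctl pctlpts=2.5 97.5 pctlpre=P_;
run;

proc print data=pctl noobs;
run;
```

An observed difference that falls outside these two percentiles would be unusual under the null hypothesis at roughly the 5% level (two-sided).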
6 Code Block
DATA STEP Data
Explanation :
Filters the simulated results to keep only those whose absolute mean difference is greater than or equal to the observed value (114.6). The empirical p-value is the number of rows kept in 'numdiffs' divided by the number of permutations (1000).
DATA numdiffs;
 SET diff;
 where method="Pooled";

 IF abs(mean) >= 114.6;
RUN;
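The filtering step leaves the final division to the reader. A minimal sketch of computing the proportion directly from 'diff', assuming the same 'method' and 'mean' columns. The dataset name `pvalue` and the variables `extreme` and `p_value` are hypothetical names introduced here.

```sas
/* Sketch: empirical p-value as the share of extreme permutations.      */
/* 'pvalue', 'extreme' and 'p_value' are illustrative names.            */
data pvalue;
   set diff end=last;
   where method = "Pooled";           /* one row per permutation        */
   extreme + (abs(mean) >= 114.6);    /* running count of extreme rows  */
   if last then do;
      p_value = extreme / _n_;        /* _n_ = number of rows read      */
      output;
   end;
   keep extreme p_value;
run;

proc print data=pvalue noobs;
run;
```

Dividing by `_n_` instead of a literal 1000 keeps the calculation correct if the number of permutations is changed in the PROC IML step.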
7 Code Block
PROC PRINT
Explanation :
Display of permutations that satisfy the extremeness criterion for visual verification.
PROC PRINT DATA=numdiffs;
 where method="Pooled";
RUN;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
