Published on :
Statistical INTERNAL_CREATION

Monte Carlo Permutation Test with PROC IML and PROC TTEST

This code is also available in: Deutsch Español Français
Awaiting validation
The script first creates a manual dataset. It calculates the observed mean difference via a classic PROC TTEST. Then, it uses PROC IML to generate 1000 permutations of the response variable ('Money') while keeping the groups ('School') fixed. These permutations are analyzed en masse by PROC TTEST to generate a null distribution of mean differences. Finally, the script compares the observed difference to this distribution to estimate an empirical p-value.
Data Analysis

Type : INTERNAL_CREATION


The data is defined directly in the code via a Data Step using DATALINES (School and Money variables).

1 Code Block
DATA STEP Data
Explanation :
Creation of the 'cash' dataset containing observations for two schools (School 0 and 1) and associated amounts.
Copied!
1DATA cash;
2 INPUT School Money;
3 
4/* School = 0 = SMU
5 School = 1 = Seattle U */
6 
7DATALINES;
80 34
90 1200
100 23
110 50
120 60
130 50
140 0
150 0
160 30
170 89
180 0
190 300
200 400
210 20
220 10
230 0
241 20
251 10
261 5
271 0
281 30
291 50
301 0
311 100
321 110
331 0
341 40
351 10
361 3
371 0
38;
2 Code Block
PROC TTEST
Explanation :
Execution of the initial t-test on real data to obtain the observed statistic (the actual mean difference).
Copied!
1PROC TTEST DATA=cash;
2 class School;
3 *may need to convert School to numeric;
4 var Money;
5RUN;
3 Code Block
PROC IML Data
Explanation :
Use of the IML matrix language to read data, generate 1000 random permutations of the 'Money' column (ranperm function), and save the result in a wide table 'newds'. ODS outputs are disabled to improve performance.
Copied!
1ods OUTPUT off;
2ods exclude all;
3*borrowed code from internet ... randomizes observations and creates a matrix ... one row per randomization ;
4 
5PROC IML ;
6 use cash;
7 read all var{School Money} into x;
8 *change varibale names here ... make sure it is class then var ... in that order.;
9 p=t(ranperm(x[, 2], 1000));
10 *Note that the "1000" here is the number of permutations. ;
11 paf=x[, 1]||p;
12 create newds from paf;
13 append from paf;
14 QUIT;
15 *calculates differences and creates a histogram;
16 ods OUTPUT conflimits=diff;
4 Code Block
PROC TTEST Data
Explanation :
Massive execution of t-tests on the 1000 permuted columns. The 'newds' table contains the class in col1 and permutations in col2-col1001. The results (confidence limits/differences) are captured in the 'diff' table via the previously declared 'ods output conflimits=diff'.
Copied!
1PROC TTEST DATA=newds plots=none;
2 class col1;
3 var col2 - col1001;
4RUN;
5 
6ods OUTPUT on;
7ods exclude none;
5 Code Block
PROC UNIVARIATE
Explanation :
Analysis of the distribution of simulated mean differences (stored in the 'mean' variable of the 'diff' table) to visualize the null distribution.
Copied!
1PROC UNIVARIATE DATA=diff;
2 where method="Pooled";
3 var mean;
4 histogram mean;
5RUN;
6 Code Block
DATA STEP Data
Explanation :
Calculation of the empirical p-value: iterations where the simulated difference (in absolute value) is greater than or equal to the actually observed difference (hardcoded here at 114.6) are filtered.
Copied!
1DATA numdiffs;
2 SET diff;
3 where method="Pooled";
4 
5 IF abs(mean) >=114.6;
6 *if abs(mean) >=44.0667;
7 *you will need to put the observed difference you got from t test above here. note IF you have a one or two tailed test.;
8RUN;
7 Code Block
PROC PRINT
Explanation :
Display of identified extreme cases for verification.
Copied!
1 
2PROC PRINT
3DATA=numdiffs;
4where method="Pooled";
5RUN;
6 
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.