Monte Carlo Permutation Test with PROC IML and PROC TTEST

The script first creates a manual dataset. It calculates the observed mean difference via a classic PROC TTEST. Then, it uses PROC IML to generate 1000 permutations of the response variable ('Money') while keeping the groups ('School') fixed. These permutations are analyzed en masse by PROC TTEST to generate a null distribution of mean differences. Finally, the script compares the observed difference to this distribution to estimate an empirical p-value.

Data Analysis

Type : INTERNAL_CREATION

The data is defined directly in the code via a Data Step using DATALINES (School and Money variables).

1 Code Block

DATA STEP Data

Explanation :
Creation of the 'cash' dataset containing observations for two schools (School 0 and 1) and associated amounts.

Copied!

1	DATA cash;
2	INPUT School Money;
3
4	/* School = 0 = SMU
5	School = 1 = Seattle U */
6
7	DATALINES;
8	0 34
9	0 1200
10	0 23
11	0 50
12	0 60
13	0 50
14	0 0
15	0 0
16	0 30
17	0 89
18	0 0
19	0 300
20	0 400
21	0 20
22	0 10
23	0 0
24	1 20
25	1 10
26	1 5
27	1 0
28	1 30
29	1 50
30	1 0
31	1 100
32	1 110
33	1 0
34	1 40
35	1 10
36	1 3
37	1 0
38	;

2 Code Block

PROC TTEST

Explanation :
Execution of the initial t-test on real data to obtain the observed statistic (the actual mean difference).

Copied!

1	PROC TTEST DATA=cash;
2	class School;
3	*may need to convert School to numeric;
4	var Money;
5	RUN;

3 Code Block

PROC IML Data

Explanation :
Use of the IML matrix language to read data, generate 1000 random permutations of the 'Money' column (ranperm function), and save the result in a wide table 'newds'. ODS outputs are disabled to improve performance.

Copied!

1	ods OUTPUT off;
2	ods exclude all;
3	*borrowed code from internet ... randomizes observations and creates a matrix ... one row per randomization ;
4
5	PROC IML ;
6	use cash;
7	read all var{School Money} into x;
8	*change varibale names here ... make sure it is class then var ... in that order.;
9	p=t(ranperm(x[, 2], 1000));
10	*Note that the "1000" here is the number of permutations. ;
11	paf=x[, 1]\|\|p;
12	create newds from paf;
13	append from paf;
14	QUIT;
15	*calculates differences and creates a histogram;
16	ods OUTPUT conflimits=diff;

4 Code Block

PROC TTEST Data

Explanation :
Massive execution of t-tests on the 1000 permuted columns. The 'newds' table contains the class in col1 and permutations in col2-col1001. The results (confidence limits/differences) are captured in the 'diff' table via the previously declared 'ods output conflimits=diff'.

Copied!

1	PROC TTEST DATA=newds plots=none;
2	class col1;
3	var col2 - col1001;
4	RUN;
5
6	ods OUTPUT on;
7	ods exclude none;

5 Code Block

PROC UNIVARIATE

Explanation :
Analysis of the distribution of simulated mean differences (stored in the 'mean' variable of the 'diff' table) to visualize the null distribution.

Copied!

1	PROC UNIVARIATE DATA=diff;
2	where method="Pooled";
3	var mean;
4	histogram mean;
5	RUN;

6 Code Block

DATA STEP Data

Explanation :
Calculation of the empirical p-value: iterations where the simulated difference (in absolute value) is greater than or equal to the actually observed difference (hardcoded here at 114.6) are filtered.

Copied!

1	DATA numdiffs;
2	SET diff;
3	where method="Pooled";
4
5	IF abs(mean) >=114.6;
6	*if abs(mean) >=44.0667;
7	*you will need to put the observed difference you got from t test above here. note IF you have a one or two tailed test.;
8	RUN;

7 Code Block

PROC PRINT

Explanation :
Display of identified extreme cases for verification.

Copied!

1
2	PROC PRINT
3	DATA=numdiffs;
4	where method="Pooled";
5	RUN;
6

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.

Retour à la liste