Monte Carlo Permutation Test with PROC IML

Difficulty Level

Beginner

Published on: 08/12/2024

The script first creates a small dataset from in-line data. It runs a two-sample t-test to obtain the observed difference in mean 'Money' between the two 'School' groups. It then uses PROC IML to generate 1000 random permutations of the response variable 'Money' while keeping the group labels fixed. PROC TTEST analyzes each permuted column, which yields an empirical distribution of mean differences under the null hypothesis that 'School' has no effect on 'Money'. Finally, PROC UNIVARIATE plots this distribution, and a DATA step keeps the permutations whose absolute difference is at least as large as the observed one; their proportion among the 1000 permutations is the empirical p-value.

Data Analysis

Type: INTERNAL_CREATION

The data are defined directly in the script, through a DATALINES statement that populates the 'cash' dataset.

1 Code Block

DATA STEP Data

Explanation :
Creates the 'cash' dataset from in-line data, with the group variable 'School' (0 or 1) and the response variable 'Money'.

DATA cash;
   INPUT School Money;   /* School: group (0 or 1); Money: response */

/* The full data listing is abbreviated (...) in the source. */
DATALINES;
0 34
0 1200
...
1 3
1 0
;

2 Code Block

PROC TTEST

Explanation :
Runs a two-sample t-test on the original data to obtain the observed difference in mean 'Money' between the two 'School' groups.

PROC TTEST DATA=cash;
   class School;   /* two groups: School = 0 and School = 1 */
   var Money;      /* response variable */
RUN;
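The observed difference is later hard-coded as 114.6 in the filtering step. As a sketch, assuming the ConfLimits table exposes the Method and Mean columns that the later steps of this script already rely on, the value can instead be captured in a macro variable. The data set name obs and the macro variable name obsdiff are hypothetical.

```sas
/* Sketch: save the observed pooled mean difference in a macro variable.
   The names obs and obsdiff are illustrative, not from the original script. */
ods output ConfLimits=obs;
proc ttest data=cash plots=none;
   class School;
   var Money;
run;

proc sql noprint;
   select Mean into :obsdiff trimmed
   from obs
   where Method = "Pooled";   /* the row holding the group mean difference */
quit;

%put Observed mean difference: &obsdiff;
```

The filtering DATA step could then test abs(mean) >= abs(&obsdiff) instead of the literal 114.6, so the threshold always matches the data actually analyzed.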

3 Code Block

PROC IML Data

Explanation :
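Uses the SAS/IML matrix language to read the data into a matrix x, generate 1000 random permutations of the 'Money' column (x[, 2]) while keeping 'School' (x[, 1]) fixed, and save the result in the 'newds' dataset. Because each permutation is stored as a column, 'newds' contains COL1 (School) and COL2 to COL1001 (the permuted copies of Money).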

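/* Suppress displayed output; ODS OUTPUT requests issued later still create data sets */
ods graphics off;
ods exclude all;

PROC IML;
   use cash;
   read all var {School Money} into x;   /* n x 2 matrix: col 1 = School, col 2 = Money */
   p = t(ranperm(x[, 2], 1000));         /* n x 1000: each column is one permutation of Money */
   paf = x[, 1] || p;                    /* prepend the fixed School column */
   create newds from paf;                /* default variable names: COL1-COL1001 */
   append from paf;
   close newds;
QUIT;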
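RANPERM(v, k) returns a k x n matrix in which each row is a random permutation of the elements of v, which is why the script transposes the result with t() before appending it as columns. The script also does not set a random seed, so each run gives slightly different permutations. The small sketch below illustrates both points on a toy vector; the seed 12345 and the values in v are arbitrary.

```sas
/* Sketch: what RANPERM returns for a small vector (toy values, arbitrary seed). */
proc iml;
call randseed(12345);      /* fix the seed so the permutations are reproducible */
v = {10, 20, 30, 40};      /* 4 x 1 column vector, like x[, 2] in the script */
p = ranperm(v, 3);         /* 3 x 4: each ROW is one permutation of v */
q = t(p);                  /* 4 x 3: each COLUMN is one permutation, as in newds */
print p, q;
quit;
```

Adding a similar CALL RANDSEED statement before RANPERM in the main PROC IML step would make the whole permutation test reproducible.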

4 Code Block

PROC TTEST

Explanation :
Runs a t-test on each of the 1000 permuted columns (COL2 to COL1001), using COL1 (School) as the grouping variable. The ConfLimits table, which includes the difference in group means for each variable, is saved to the 'diff' dataset.

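ods OUTPUT conflimits=diff;    /* save the ConfLimits table to data set diff */

PROC TTEST DATA=newds plots=none;
   class col1;                 /* School */
   var col2 - col1001;         /* the 1000 permuted copies of Money */
RUN;

ods graphics on;
ods exclude none;              /* restore displayed output */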
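Running PROC TTEST on 1000 variables is convenient, but only the difference in group means is used afterwards. The sketch below swaps in a different technique: it computes that statistic for every permutation with matrix operations in SAS/IML and returns the two-sided p-value directly. It assumes the 'cash' dataset from the first step; the seed 2024 and the names obsDiff, d and pValue are illustrative choices, not part of the original script.

```sas
/* Sketch (alternative technique): compute the permuted mean differences
   directly in SAS/IML instead of running PROC TTEST on 1000 columns. */
proc iml;
call randseed(2024);                  /* arbitrary seed, for reproducibility */
use cash;
read all var {School Money} into x;
close cash;

y    = x[, 2];                        /* response: Money */
idx0 = loc(x[, 1] = 0);               /* rows with School = 0 */
idx1 = loc(x[, 1] = 1);               /* rows with School = 1 */

/* observed difference: mean(School = 0) - mean(School = 1) */
obsDiff = mean(y[idx0]) - mean(y[idx1]);

B = 1000;                             /* number of permutations */
p = t(ranperm(y, B));                 /* n x B: each column permutes Money */
d = p[idx0, ][:, ] - p[idx1, ][:, ];  /* 1 x B: permuted mean differences */

/* two-sided empirical p-value */
pValue = sum(abs(d) >= abs(obsDiff)) / B;
print obsDiff pValue;
quit;
```

Because the permutations are drawn independently, the p-value from this sketch can differ slightly from the PROC TTEST pipeline by Monte Carlo error; the absolute value makes the sign convention of the difference irrelevant.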

5 Code Block

PROC UNIVARIATE

Explanation :
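Summarizes and plots the distribution of the simulated mean differences (the 'mean' variable of the 'diff' table). The WHERE statement keeps only the Pooled rows, so each permutation contributes exactly one difference.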

PROC UNIVARIATE DATA=diff;
   where method="Pooled";      /* one difference row per permuted variable */
   var mean;
   histogram mean;
RUN;
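To see where the observed statistic falls in the permutation distribution, reference lines can be drawn on the histogram. A sketch with PROC SGPLOT, assuming the observed magnitude of 114.6 from the initial t-test; the line at -114.6 marks the other tail covered by the two-sided criterion.

```sas
/* Sketch: histogram of permuted differences with the observed value marked.
   114.6 is the observed difference used elsewhere in this script. */
proc sgplot data=diff(where=(Method = "Pooled"));
   histogram Mean;
   refline 114.6 -114.6 / axis=x lineattrs=(color=red pattern=dash);
run;
```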

6 Code Block

DATA STEP Data

Explanation :
Keeps only the simulated differences whose absolute value is greater than or equal to the magnitude of the observed difference (114.6). The number of retained rows divided by the number of permutations (1000) is the empirical two-sided p-value.

DATA numdiffs;
   SET diff;
   where method="Pooled";
   IF abs(mean) >= 114.6;      /* 114.6 = observed difference from the initial t-test */
RUN;
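The DATA step selects the extreme permutations but leaves the final division to the reader. A minimal sketch that performs the count and the division in one PROC SQL query, reusing the hard-coded threshold 114.6 from the DATA step above; the column aliases nExtreme, nPerm and pValue are illustrative.

```sas
/* Sketch: count the extreme permutations and convert the count to a p-value. */
proc sql;
   select sum(abs(Mean) >= 114.6)                as nExtreme,
          count(*)                               as nPerm,
          calculated nExtreme / calculated nPerm as pValue format=6.4
   from diff
   where Method = "Pooled";   /* one row per permutation */
quit;
```

Some analysts prefer (nExtreme + 1) / (nPerm + 1), which counts the observed arrangement as one of the permutations and never returns a p-value of exactly zero.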

7 Code Block

PROC PRINT

Explanation :
Prints the permutations that meet the extremeness criterion so that they can be checked and counted.

PROC PRINT DATA=numdiffs;
   where method="Pooled";
RUN;

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
