Simulation of Correlated Bernoulli Variables and Monte Carlo Analysis

The program first generates a dataset of 20,000 clusters, each containing 5 elemental units with a success probability of 0.6 and an intra-cluster correlation of 0.3. It then transposes the data to verify the correlation structure. Next, it performs aggregations by cluster and analyzes the variance. Finally, a Monte Carlo simulation is conducted to estimate the Type I error rate on over-dispersed data.

Data Analysis

Type : CREATION_INTERNE

All data is algorithmically generated via DATA steps using the 'uniform' random function.

1 Code Block

DATA STEP Data

Explanation :
Generation of 20,000 simulated observations structured into clusters (m=5) with a defined intra-class correlation (rho2=0.3).

Copied!

1	DATA correlated_bernoullis;
2	n = 20000; *--- Number of clusters;
3	m = 5; *--- Number of elemental units within the cluster;
4	pi = 0.6; *--- Probability of success of each elemental unit;
5	rho2 = 0.3; *--- Intra-cluster correlation;
6	seed = 16670;
7	rho = sqrt( rho2 );
8	DO subjid = 1 to n;
9	yy = 0; *--- Variable yy plays the role of y of eq. (1.7);
10	u = uniform( seed );
11	IF u < pi THEN yy = 1;
12	DO i=1 to m;
13	y = 0;
14	u = uniform( seed );
15	IF u < rho THEN y = yy;
16	ELSE DO;
17	uu = uniform( seed );
18	IF uu < pi THEN y = 1;
19	END;
20	OUTPUT;
21	END;
22	END;
23	keep subjid i y;
24	RUN;

2 Code Block

PROC TRANSPOSE Data

Explanation :
Data pivoting to transform intra-cluster observations into distinct columns (_1 to _5) to calculate correlations.

Copied!

1	PROC TRANSPOSE DATA=correlated_bernoullis out=new;
2	BY subjid;
3	id i;
4	var y;
5	RUN;

3 Code Block

PROC CORR

Explanation :
Calculation of the correlation matrix between cluster units to verify if the simulation respects the defined parameters.

Copied!

1	ods html;
2	PROC CORR DATA=new;
3	var _1 - _5;
4	RUN;
5	ods html close;

4 Code Block

PROC MEANS Data

Explanation :
Data aggregation: calculation of the sum of successes 'y' by cluster (subjid).

Copied!

1	PROC MEANS DATA=correlated_bernoullis sum noprint;
2	class subjid;
3	var y;
4	OUTPUT out=out1 sum=t;
5	RUN;
6
7	DATA out1;
8	SET out1;
9	IF _type_ = 1;
10	drop _type _freq_;
11	RUN;

5 Code Block

PROC MEANS

Explanation :
Descriptive analysis (mean and variance) of the aggregated variable 't'.

Copied!

1	ods html;
2	PROC MEANS DATA=out1 mean var maxdec=2;
3	var t;
4	RUN;
5	ods html close;

6 Code Block

DATA STEP Data

Explanation :
Preparation of Monte Carlo simulation: addition of a replication variable 'reps' to divide the data into 1000 subsets.

Copied!

1	DATA correlated_bernoullis; *--- This data set was created in Program 1.1;
2	SET correlated_bernoullis;
3	reps = ceil( subjid / 20 ); *--- Creates 1000 reps of n=20 clusters of size m=5 each;
4	RUN;

7 Code Block

PROC MEANS Data

Explanation :
Calculation of the estimated proportion (Pi_hat) for each replication.

Copied!

1	PROC MEANS DATA=correlated_bernoullis mean noprint;
2	class reps;
3	var y;
4	OUTPUT out=test mean=Pi_hat;
5	RUN;

8 Code Block

DATA STEP Data

Explanation :
Calculation of the Z-score and determination of significance (P-value) to test the null hypothesis (Pi=0.6) for each replication.

Copied!

1	DATA test;
2	SET test;
3	IF _type_ = 1;
4	z = (Pi_hat - 0.6) / sqrt( Pi_hat*(1-Pi_hat) / 100 );
5	P_val = 0;
6	IF z > 1.6449 THEN P_val = 1;
7	RUN;

9 Code Block

PROC MEANS

Explanation :
Final estimation of the Type I error rate by calculating the average of null hypothesis rejections across all simulations.

Copied!

1	title "Monte Carlo Estimate of Type I Error Rate";
2	PROC MEANS DATA=test n mean maxdec=3;
3	var P_val;
4	RUN;

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.

Retour à la liste