Published on :
Statistics CREATION_INTERNE

Simulation of Correlated Bernoulli Variables and Monte Carlo Analysis

This code is also available in: Deutsch Español Français
Awaiting validation
The program first generates a dataset of 20,000 clusters, each containing 5 elemental units with a success probability of 0.6 and an intra-cluster correlation of 0.3. It then transposes the data to verify the correlation structure. Next, it performs aggregations by cluster and analyzes the variance. Finally, a Monte Carlo simulation is conducted to estimate the Type I error rate on over-dispersed data.
Data Analysis

Type : CREATION_INTERNE


All data is algorithmically generated via DATA steps using the 'uniform' random function.

1 Code Block
DATA STEP Data
Explanation :
Generation of 20,000 simulated observations structured into clusters (m=5) with a defined intra-class correlation (rho2=0.3).
Copied!
1DATA correlated_bernoullis;
2 n = 20000; *--- Number of clusters;
3 m = 5; *--- Number of elemental units within the cluster;
4 pi = 0.6; *--- Probability of success of each elemental unit;
5 rho2 = 0.3; *--- Intra-cluster correlation;
6 seed = 16670;
7 rho = sqrt( rho2 );
8 DO subjid = 1 to n;
9 yy = 0; *--- Variable yy plays the role of y of eq. (1.7);
10 u = uniform( seed );
11 IF u < pi THEN yy = 1;
12 DO i=1 to m;
13 y = 0;
14 u = uniform( seed );
15 IF u < rho THEN y = yy;
16 ELSE DO;
17 uu = uniform( seed );
18 IF uu < pi THEN y = 1;
19 END;
20 OUTPUT;
21 END;
22 END;
23 keep subjid i y;
24RUN;
2 Code Block
PROC TRANSPOSE Data
Explanation :
Data pivoting to transform intra-cluster observations into distinct columns (_1 to _5) to calculate correlations.
Copied!
1PROC TRANSPOSE DATA=correlated_bernoullis out=new;
2 BY subjid;
3 id i;
4 var y;
5RUN;
3 Code Block
PROC CORR
Explanation :
Calculation of the correlation matrix between cluster units to verify if the simulation respects the defined parameters.
Copied!
1ods html;
2PROC CORR DATA=new;
3 var _1 - _5;
4RUN;
5ods html close;
4 Code Block
PROC MEANS Data
Explanation :
Data aggregation: calculation of the sum of successes 'y' by cluster (subjid).
Copied!
1PROC MEANS DATA=correlated_bernoullis sum noprint;
2 class subjid;
3 var y;
4 OUTPUT out=out1 sum=t;
5RUN;
6 
7DATA out1;
8 SET out1;
9 IF _type_ = 1;
10 drop _type _freq_;
11RUN;
5 Code Block
PROC MEANS
Explanation :
Descriptive analysis (mean and variance) of the aggregated variable 't'.
Copied!
1ods html;
2PROC MEANS DATA=out1 mean var maxdec=2;
3 var t;
4RUN;
5ods html close;
6 Code Block
DATA STEP Data
Explanation :
Preparation of Monte Carlo simulation: addition of a replication variable 'reps' to divide the data into 1000 subsets.
Copied!
1DATA correlated_bernoullis; *--- This data set was created in Program 1.1;
2 SET correlated_bernoullis;
3 reps = ceil( subjid / 20 ); *--- Creates 1000 reps of n=20 clusters of size m=5 each;
4RUN;
7 Code Block
PROC MEANS Data
Explanation :
Calculation of the estimated proportion (Pi_hat) for each replication.
Copied!
1PROC MEANS DATA=correlated_bernoullis mean noprint;
2 class reps;
3 var y;
4 OUTPUT out=test mean=Pi_hat;
5RUN;
8 Code Block
DATA STEP Data
Explanation :
Calculation of the Z-score and determination of significance (P-value) to test the null hypothesis (Pi=0.6) for each replication.
Copied!
1DATA test;
2SET test;
3 IF _type_ = 1;
4 z = (Pi_hat - 0.6) / sqrt( Pi_hat*(1-Pi_hat) / 100 );
5 P_val = 0;
6 IF z > 1.6449 THEN P_val = 1;
7RUN;
9 Code Block
PROC MEANS
Explanation :
Final estimation of the Type I error rate by calculating the average of null hypothesis rejections across all simulations.
Copied!
1title "Monte Carlo Estimate of Type I Error Rate";
2PROC MEANS DATA=test n mean maxdec=3;
3 var P_val;
4RUN;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.