The program first generates a dataset of 20,000 clusters, each containing 5 elemental units with a success probability of 0.6 and an intra-cluster correlation of 0.3. It then transposes the data to verify the correlation structure. Next, it performs aggregations by cluster and analyzes the variance. Finally, a Monte Carlo simulation is conducted to estimate the Type I error rate on over-dispersed data.
Data Analysis
Type : CREATION_INTERNE
All data is algorithmically generated via DATA steps using the 'uniform' random function.
1 Code Block
DATA STEP Data
Explanation : Generation of 20,000 simulated observations structured into clusters (m=5) with a defined intra-class correlation (rho2=0.3).
Copied!
data correlated_bernoullis;
n = 20000; *--- Number of clusters;
m = 5; *--- Number of elemental units within the cluster;
pi = 0.6; *--- Probability of success of each elemental unit;
rho2 = 0.3; *--- Intra-cluster correlation;
seed = 16670;
rho = sqrt( rho2 );
do subjid = 1 to n;
yy = 0; *--- Variable yy plays the role of y of eq. (1.7);
u = uniform( seed );
if u < pi then yy = 1;
do i=1 to m;
y = 0;
u = uniform( seed );
if u < rho then y = yy;
else do;
uu = uniform( seed );
if uu < pi then y = 1;
end;
output;
end;
end;
keep subjid i y;
run;
1
DATA correlated_bernoullis;
2
n = 20000; *--- Number of clusters;
3
m = 5; *--- Number of elemental units within the cluster;
4
pi = 0.6; *--- Probability of success of each elemental unit;
5
rho2 = 0.3; *--- Intra-cluster correlation;
6
seed = 16670;
7
rho = sqrt( rho2 );
8
DO subjid = 1 to n;
9
yy = 0; *--- Variable yy plays the role of y of eq. (1.7);
10
u = uniform( seed );
11
IF u < pi THEN yy = 1;
12
DO i=1 to m;
13
y = 0;
14
u = uniform( seed );
15
IF u < rho THEN y = yy;
16
ELSEDO;
17
uu = uniform( seed );
18
IF uu < pi THEN y = 1;
19
END;
20
OUTPUT;
21
END;
22
END;
23
keep subjid i y;
24
RUN;
2 Code Block
PROC TRANSPOSE Data
Explanation : Data pivoting to transform intra-cluster observations into distinct columns (_1 to _5) to calculate correlations.
Copied!
proc transpose data=correlated_bernoullis out=new;
by subjid;
id i;
var y;
run;
1
PROC TRANSPOSEDATA=correlated_bernoullis out=new;
2
BY subjid;
3
id i;
4
var y;
5
RUN;
3 Code Block
PROC CORR
Explanation : Calculation of the correlation matrix between cluster units to verify if the simulation respects the defined parameters.
Copied!
ods html;
proc corr data=new;
var _1 - _5;
run;
ods html close;
1
ods html;
2
PROC CORRDATA=new;
3
var _1 - _5;
4
RUN;
5
ods html close;
4 Code Block
PROC MEANS Data
Explanation : Data aggregation: calculation of the sum of successes 'y' by cluster (subjid).
Copied!
proc means data=correlated_bernoullis sum noprint;
class subjid;
var y;
output out=out1 sum=t;
run;
data out1;
set out1;
if _type_ = 1;
drop _type _freq_;
run;
1
PROC MEANSDATA=correlated_bernoullis sum noprint;
2
class subjid;
3
var y;
4
OUTPUT out=out1 sum=t;
5
RUN;
6
7
DATA out1;
8
SET out1;
9
IF _type_ = 1;
10
drop _type _freq_;
11
RUN;
5 Code Block
PROC MEANS
Explanation : Descriptive analysis (mean and variance) of the aggregated variable 't'.
Copied!
ods html;
proc means data=out1 mean var maxdec=2;
var t;
run;
ods html close;
1
ods html;
2
PROC MEANSDATA=out1 mean var maxdec=2;
3
var t;
4
RUN;
5
ods html close;
6 Code Block
DATA STEP Data
Explanation : Preparation of Monte Carlo simulation: addition of a replication variable 'reps' to divide the data into 1000 subsets.
Copied!
data correlated_bernoullis; *--- This data set was created in Program 1.1;
set correlated_bernoullis;
reps = ceil( subjid / 20 ); *--- Creates 1000 reps of n=20 clusters of size m=5 each;
run;
1
DATA correlated_bernoullis; *--- This data set was created in Program 1.1;
2
SET correlated_bernoullis;
3
reps = ceil( subjid / 20 ); *--- Creates 1000 reps of n=20 clusters of size m=5 each;
4
RUN;
7 Code Block
PROC MEANS Data
Explanation : Calculation of the estimated proportion (Pi_hat) for each replication.
Copied!
proc means data=correlated_bernoullis mean noprint;
class reps;
var y;
output out=test mean=Pi_hat;
run;
1
PROC MEANSDATA=correlated_bernoullis mean noprint;
2
class reps;
3
var y;
4
OUTPUT out=test mean=Pi_hat;
5
RUN;
8 Code Block
DATA STEP Data
Explanation : Calculation of the Z-score and determination of significance (P-value) to test the null hypothesis (Pi=0.6) for each replication.
Copied!
data test;
set test;
if _type_ = 1;
z = (Pi_hat - 0.6) / sqrt( Pi_hat*(1-Pi_hat) / 100 );
P_val = 0;
if z > 1.6449 then P_val = 1;
run;
Explanation : Final estimation of the Type I error rate by calculating the average of null hypothesis rejections across all simulations.
Copied!
title "Monte Carlo Estimate of Type I Error Rate";
proc means data=test n mean maxdec=3;
var P_val;
run;
1
title "Monte Carlo Estimate of Type I Error Rate";
2
PROC MEANSDATA=test n mean maxdec=3;
3
var P_val;
4
RUN;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.
This site uses technical and analytical cookies to improve your experience.
Read more.