This program generates two synthetic datasets of 8000 observations each to illustrate hurdle models. In the first, an observation is zero with probability p1 and is otherwise drawn from a zero-truncated Poisson distribution. The second has the same structure with a zero-truncated Negative-Binomial distribution. Each DATA step also computes the theoretical mean and variance of the hurdle distribution (the same on every row). PROC PRINT displays these theoretical values, and PROC MEANS reports the empirical mean and variance of the generated data for comparison.
Data Analysis
Type : Internal creation
Data is entirely generated algorithmically in DATA steps (using uniform, ranpoi, rangam random functions).
1 Code Block
ODS
Explanation : Initialization of HTML output.
ods html;
2 Code Block
DATA STEP Data
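Explanation : Generation of the 'Poisson_Hurdle_Data' dataset. Each observation is 0 with probability p1; otherwise the hurdle is crossed and the value is drawn from a zero-truncated Poisson distribution by rejection sampling (redrawing with ranpoi until the result is positive). The theoretical mean and variance are computed first and stored on every row.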
data Poisson_Hurdle_Data;
   n = 8000;
   p1 = 0.4;
   mu = 10;
   seed = 1979;

   *--- Underlying True Mean and Variance;
   p2 = exp(-mu);
   p1c = 1 - p1;
   p2c = 1 - p2;
   Phi = p1c / p2c;
   Mean = Phi*mu;
   Var = Phi*mu*(mu+1) - Mean*Mean;

   do j = 1 to n;
      u = uniform(seed);
      if u <= p1 then do;
         Y = 0;
         output;
      end;
      else do; *--- Crossing the hurdle;
         *--- Get Truncated Poisson using Rejection Method;
         do until (y > 0);
            y = ranpoi(seed, mu);
         end;
         output;
      end;
   end;
run;
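The Mean and Var assignments in this DATA step follow from the moments of a zero-truncated Poisson distribution. As a worked check, write $p_2 = e^{-\mu}$ for the probability that an untruncated Poisson draw is zero and $\Phi = (1-p_1)/(1-p_2)$, matching the variables p2 and Phi in the code:

```latex
\begin{aligned}
P(Y=0) &= p_1, \qquad
P(Y=k) = \Phi\,\frac{e^{-\mu}\mu^{k}}{k!}, \quad k \ge 1,\\
E[Y] &= \Phi \sum_{k \ge 1} k\,\frac{e^{-\mu}\mu^{k}}{k!} = \Phi\,\mu,\\
E[Y^{2}] &= \Phi\,(\mu + \mu^{2}) = \Phi\,\mu(\mu+1),\\
\operatorname{Var}(Y) &= \Phi\,\mu(\mu+1) - (\Phi\,\mu)^{2}.
\end{aligned}
```

With p1 = 0.4 and mu = 10, $p_2 = e^{-10} \approx 4.54 \times 10^{-5}$, so $\Phi \approx 0.60003$, which gives a mean of approximately 6.0003 and a variance of approximately 29.9997.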
3 Code Block
PROC PRINT
Explanation : Display of the theoretical Mean and Var computed in the DATA step. These values are identical on every row, so only the row with j=1 is printed.
title2 "Underlying True Mean and Variance";
proc print data=Poisson_Hurdle_Data noobs;
where j=1;
var Mean Var;
format _all_ 8.4;
run;
4 Code Block
PROC MEANS
Explanation : Calculation of the empirical mean and variance of the simulated y, for comparison with the theoretical values printed above.
title2 "Estimated Mean and Variance";
proc means data=Poisson_Hurdle_Data n mean var maxdec=4;
var y;
run;
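Beyond the mean and variance, one quick sanity check is the share of exact zeros. It should be close to p1 = 0.4, because a zero is written only when u <= p1 and the rejection loop never returns 0. The following is a minimal sketch, assuming the Poisson_Hurdle_Data dataset has already been created; the dataset name Zero_Check and the variable is_zero are hypothetical names introduced here for illustration.

```sas
*--- Flag the zero observations (is_zero is 1 for a zero, 0 otherwise);
data Zero_Check;
   set Poisson_Hurdle_Data;
   is_zero = (y = 0);
run;

*--- The mean of the flag is the empirical share of zeros;
title2 "Empirical Share of Zeros";
proc means data=Zero_Check n mean maxdec=4;
   var is_zero;
run;
```

With 8000 observations the standard error of this proportion is about 0.0055, so a reported mean within roughly 0.01 of 0.4 is consistent with the simulation design.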
5 Code Block
DATA STEP Data
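Explanation : Generation of the 'NB_Hurdle_Data' dataset. Simulates a Negative-Binomial hurdle model. The Negative-Binomial draw is a gamma-Poisson mixture: a rate uu is drawn from a Gamma distribution with shape alpha = 1/kappa and scale beta = kappa*mu, then y is drawn from a Poisson distribution with mean uu. The draw is repeated until y is positive.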
data NB_Hurdle_Data;
   n = 8000;
   p1 = 0.4;
   mu = 10;
   Kappa = 0.1;
   alpha = 1 / kappa;
   beta = kappa * mu;
   seed = 1983;

   *--- Underlying True Mean and Variance;
   p2 = (1/(1+kappa*mu))**(alpha);
   p1c = 1 - p1;
   p2c = 1 - p2;
   Phi = p1c / p2c;
   Mean = Phi*mu;
   Var = Phi*mu*(1+mu+kappa*mu) - Mean*Mean;

   do j = 1 to n;
      u = uniform(seed);
      if u <= p1 then do;
         Y = 0;
         output;
      end;
      else do; *--- Crossing the hurdle;
         *--- Get Truncated Neg-bin using Rejection Method;
         do until (y > 0);
            uu = beta * rangam(seed, alpha);
            y = ranpoi(seed, uu);
         end;
         output;
      end;
   end;
   drop alpha beta kappa;
run;
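The p2, Mean and Var expressions in this DATA step can be traced back to the gamma-Poisson mixture. As a worked check, let $\lambda$ denote the gamma-distributed rate (uu in the code), with shape $\alpha = 1/\kappa$ and scale $\beta = \kappa\mu$, and let $X$ be the untruncated Negative-Binomial draw:

```latex
\begin{aligned}
E[\lambda] &= \alpha\beta = \mu, \qquad
\operatorname{Var}(\lambda) = \alpha\beta^{2} = \kappa\mu^{2},\\
E[X] &= \mu, \qquad
\operatorname{Var}(X) = E[\lambda] + \operatorname{Var}(\lambda) = \mu + \kappa\mu^{2},\\
p_2 &= P(X=0) = E\!\left[e^{-\lambda}\right] = (1+\beta)^{-\alpha}
     = \left(\frac{1}{1+\kappa\mu}\right)^{1/\kappa},\\
E[X^{2}] &= \mu + \kappa\mu^{2} + \mu^{2} = \mu\,(1 + \mu + \kappa\mu),\\
E[Y] &= \Phi\,\mu, \qquad
\operatorname{Var}(Y) = \Phi\,\mu\,(1 + \mu + \kappa\mu) - (\Phi\,\mu)^{2},
\qquad \Phi = \frac{1-p_1}{1-p_2}.
\end{aligned}
```

With p1 = 0.4, mu = 10 and kappa = 0.1, $p_2 = 2^{-10} = 1/1024$, so $\Phi \approx 0.60059$, which gives a mean of approximately 6.0059 and a variance of approximately 36.0000.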
6 Code Block
PROC PRINT
Explanation : Display of theoretical parameters for the Negative-Binomial model.
title2 "Underlying True Mean and Variance";
proc print data=NB_Hurdle_Data noobs;
where j = 1;
var Mean Var;
format _all_ 8.4;
run;
7 Code Block
PROC MEANS
Explanation : Calculation of actual descriptive statistics on the simulated Negative-Binomial data.
title2 "Estimated Mean and Variance";
proc means data=NB_Hurdle_Data n mean var maxdec=4;
var y;
run;
8 Code Block
ODS
Explanation : Closing the HTML destination.
ods html close;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.