
Simulation of Hurdle Poisson and Negative-Binomial Models

This program generates two synthetic datasets of 8000 observations each to illustrate hurdle models. In the first, an observation is zero with probability p1; otherwise it is drawn from a zero-truncated Poisson distribution. The second has the same structure but draws the positive counts from a zero-truncated Negative-Binomial distribution. Each DATA step also computes the theoretical mean and variance of the model and stores them as constants on every observation. PROC PRINT displays these theoretical values, and PROC MEANS computes the empirical mean and variance of the simulated counts for comparison.
Data Analysis

Type: CREATION_INTERNE (internally generated data)


The data are generated entirely within DATA steps, using the uniform, ranpoi, and rangam random-number functions.
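The three functions named above belong to the older family of SAS random-number functions. As a hedged aside, the same kinds of draws can be sketched with CALL STREAMINIT and the RAND function, which SAS documentation generally recommends for new code. The dataset name rand_demo and the variables i, lambda, y_pois, and y_nb are illustrative names, not part of the original program, and the generated values will not match those of the legacy functions because the underlying random streams differ.

```sas
data rand_demo;
   call streaminit(1979);          /* seed value borrowed from the Poisson step; any positive integer works */
   mu    = 10;
   kappa = 0.1;
   alpha = 1 / kappa;              /* Gamma shape */
   beta  = kappa * mu;             /* Gamma scale */
   do i = 1 to 5;
      u      = rand('UNIFORM');             /* counterpart of uniform(seed)       */
      y_pois = rand('POISSON', mu);         /* counterpart of ranpoi(seed, mu)    */
      lambda = beta * rand('GAMMA', alpha); /* counterpart of rangam(seed, alpha) */
      y_nb   = rand('POISSON', lambda);     /* Gamma-Poisson mixture draw         */
      output;
   end;
run;
```

This sketch only demonstrates the individual draws; it does not apply the hurdle or the zero truncation.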

1 Code Block
ODS
Explanation :
Initialization of HTML output.
ods html;
2 Code Block
DATA STEP Data
Explanation :
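Generation of the 'Poisson_Hurdle_Data' dataset. Each observation is 0 with probability p1; otherwise the hurdle is crossed and the value is drawn from a zero-truncated Poisson distribution by rejection sampling (Poisson draws are repeated until a positive value is obtained). The theoretical mean and variance are computed once and stored on every observation.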
DATA Poisson_Hurdle_Data;
   n    = 8000;        *--- Number of observations;
   p1   = 0.4;         *--- Probability of a zero (hurdle not crossed);
   mu   = 10;          *--- Mean of the untruncated Poisson;
   seed = 1979;

   *--- Underlying True Mean and Variance;
   p2   = exp(-mu);    *--- P(0) under the untruncated Poisson;
   p1c  = 1 - p1;
   p2c  = 1 - p2;
   Phi  = p1c / p2c;
   Mean = Phi*mu;
   Var  = Phi*mu*(mu+1) - Mean*Mean;

   DO j = 1 to n;
      u = uniform(seed);
      IF u <= p1 THEN DO;
         Y = 0;
         OUTPUT;
      END;
      ELSE DO; *--- Crossing the hurdle;
         *--- Get Truncated Poisson using Rejection Method;
         DO until (y > 0);
            y = ranpoi(seed, mu);
         END;
         OUTPUT;
      END;
   END;
RUN;
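The constants Mean and Var in the step above follow from the moments of a hurdle Poisson variable. As a worked derivation (with p_1, mu, and Phi standing for the code variables p1, mu, and Phi): the zero term contributes nothing to the first two moments of a Poisson variable, so truncating at zero only rescales them by 1/(1 - e^{-mu}).

```latex
\begin{aligned}
P(Y=0) &= p_1, \qquad
P(Y=k) = (1-p_1)\,\frac{e^{-\mu}\mu^{k}/k!}{1-e^{-\mu}}, \quad k \ge 1,\\
\Phi &= \frac{1-p_1}{1-e^{-\mu}},\\
E[Y] &= \Phi\,\mu,\\
E[Y^{2}] &= \Phi\,\mu(\mu+1),\\
\operatorname{Var}(Y) &= \Phi\,\mu(\mu+1) - (\Phi\,\mu)^{2}.
\end{aligned}
```

With p1 = 0.4 and mu = 10, Phi is about 0.60003, which gives a mean of roughly 6.0003 and a variance of roughly 29.9997.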
3 Code Block
PROC PRINT
Explanation :
Displays the theoretical mean and variance stored in the dataset. Because these values are constant across observations, the WHERE j=1 clause prints them once, from the first observation only.
title2 "Underlying True Mean and Variance";
PROC PRINT DATA=Poisson_Hurdle_Data noobs;
   where j = 1;
   var Mean Var;
   FORMAT _all_ 8.4;
RUN;
4 Code Block
PROC MEANS
Explanation :
Calculation of the empirical count, mean, and variance of the simulated variable y, for comparison with the theoretical values.
title2 "Estimated Mean and Variance";
PROC MEANS DATA=Poisson_Hurdle_Data n mean var maxdec=4;
   var y;
RUN;
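To read the theoretical and empirical figures in a single row rather than in two separate outputs, one option is a PROC SQL summary. This is a sketch, not part of the original program; the column aliases emp_mean, emp_var, theo_mean, theo_var, and zero_share are names chosen here for illustration. Since Mean and Var are constant on every observation, MAX simply recovers their value.

```sas
proc sql;
   select mean(y)     as emp_mean   format=8.4,  /* empirical mean of the simulated counts */
          var(y)      as emp_var    format=8.4,  /* empirical variance                      */
          max(Mean)   as theo_mean  format=8.4,  /* theoretical mean (constant column)      */
          max(Var)    as theo_var   format=8.4,  /* theoretical variance (constant column)  */
          mean(y = 0) as zero_share format=8.4   /* share of zeros, expected near p1        */
   from Poisson_Hurdle_Data;
quit;
```

Swapping the table name for NB_Hurdle_Data should give the same comparison for the Negative-Binomial simulation, since both datasets carry the variables y, Mean, and Var.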
5 Code Block
DATA STEP Data
Explanation :
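Generation of the 'NB_Hurdle_Data' dataset. Simulates a Negative-Binomial hurdle model with the same structure as the Poisson case. The Negative-Binomial draw is obtained as a Gamma-Poisson mixture: a rate is drawn from a Gamma distribution with shape alpha = 1/kappa and scale beta = kappa*mu, and a Poisson count is then drawn with that rate. Draws are repeated until a positive value is obtained.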
DATA NB_Hurdle_Data;
   n     = 8000;         *--- Number of observations;
   p1    = 0.4;          *--- Probability of a zero (hurdle not crossed);
   mu    = 10;           *--- Mean of the untruncated Negative-Binomial;
   Kappa = 0.1;          *--- Dispersion parameter;
   alpha = 1 / kappa;    *--- Gamma shape;
   beta  = kappa * mu;   *--- Gamma scale;
   seed  = 1983;

   *--- Underlying True Mean and Variance;
   p2   = (1/(1+kappa*mu))**(alpha);   *--- P(0) under the untruncated Negative-Binomial;
   p1c  = 1 - p1;
   p2c  = 1 - p2;
   Phi  = p1c / p2c;
   Mean = Phi*mu;
   Var  = Phi*mu*(1+mu+kappa*mu) - Mean*Mean;

   DO j = 1 to n;
      u = uniform(seed);
      IF u <= p1 THEN DO;
         Y = 0;
         OUTPUT;
      END;
      ELSE DO; *--- Crossing the hurdle;
         *--- Get Truncated Neg-bin using Rejection Method;
         DO until (y > 0);
            uu = beta * rangam(seed, alpha);   *--- Gamma-distributed rate with mean mu;
            y  = ranpoi(seed, uu);             *--- Poisson count given that rate;
         END;
         OUTPUT;
      END;
   END;
   drop alpha beta kappa;
RUN;
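The formulas for p2, Mean, and Var in the step above can be traced back to the Gamma-Poisson mixture. As a worked derivation (lambda and Y* are notation introduced here for the Gamma-distributed rate, uu in the code, and for the untruncated count; kappa, mu, p_1, and Phi mirror the code variables):

```latex
\begin{aligned}
\lambda &\sim \mathrm{Gamma}(\text{shape}=1/\kappa,\ \text{scale}=\kappa\mu), \qquad
Y^{*}\mid\lambda \sim \mathrm{Poisson}(\lambda),\\
E[Y^{*}] &= \mu, \qquad
\operatorname{Var}(Y^{*}) = \mu + \kappa\mu^{2}, \qquad
P(Y^{*}=0) = (1+\kappa\mu)^{-1/\kappa},\\
\Phi &= \frac{1-p_1}{1-(1+\kappa\mu)^{-1/\kappa}},\\
E[Y] &= \Phi\,\mu,\\
E[Y^{2}] &= \Phi\,\mu(1+\mu+\kappa\mu),\\
\operatorname{Var}(Y) &= \Phi\,\mu(1+\mu+\kappa\mu) - (\Phi\,\mu)^{2}.
\end{aligned}
```

With p1 = 0.4, mu = 10, and kappa = 0.1, the untruncated zero probability is 2^{-10} = 1/1024, so Phi is about 0.60059, the mean is roughly 6.0059, and the variance is roughly 36.0000.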
6 Code Block
PROC PRINT
Explanation :
Display of theoretical parameters for the Negative-Binomial model.
title2 "Underlying True Mean and Variance";
PROC PRINT DATA=NB_Hurdle_Data noobs;
   where j = 1;
   var Mean Var;
   FORMAT _all_ 8.4;
RUN;
7 Code Block
PROC MEANS
Explanation :
Calculation of the empirical count, mean, and variance of the simulated Negative-Binomial variable y, for comparison with the theoretical values.
title2 "Estimated Mean and Variance";
PROC MEANS DATA=NB_Hurdle_Data n mean var maxdec=4;
   var y;
RUN;
8 Code Block
ODS
Explanation :
Closing the HTML destination.
ods html close;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.