countreg countregFitModel

High-Volume Zero-Inflated Model for E-Commerce

Test Scenario & Use Case

Business Context

A large e-commerce platform analyzes user sessions to predict the number of items purchased per session. Because 95% of browsing sessions end with zero purchases, a Zero-Inflated Negative Binomial (ZINB) model is used to separate 'browsers' from 'buyers' in a large dataset.
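
For reference, the mixture behind a ZINB model can be written as below. This is the standard textbook form with a logistic zero-inflation link and an NB2 count component. The symbols (\pi_i, \mu_i, \alpha, x_i, z_i, \beta, \gamma) are notation introduced here for illustration and are not parameter names of the countregFitModel action.

```latex
\begin{aligned}
\pi_i &= \frac{\exp(z_i^\top \gamma)}{1 + \exp(z_i^\top \gamma)}, \qquad \mu_i = \exp(x_i^\top \beta), \\
P(Y_i = 0) &= \pi_i + (1 - \pi_i)\,(1 + \alpha \mu_i)^{-1/\alpha}, \\
P(Y_i = y) &= (1 - \pi_i)\,\frac{\Gamma(y + 1/\alpha)}{y!\;\Gamma(1/\alpha)}
              \left(\frac{1}{1 + \alpha \mu_i}\right)^{1/\alpha}
              \left(\frac{\alpha \mu_i}{1 + \alpha \mu_i}\right)^{y}, \quad y = 1, 2, \dots \\
E[Y_i] &= (1 - \pi_i)\,\mu_i .
\end{aligned}
```

Here \pi_i is the probability that session i is a pure 'browser' (a structural zero), and \mu_i is the mean purchase count for a potential 'buyer'. As \alpha approaches 0, the count component reduces to a Poisson distribution.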

Data Preparation

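Simulate high-volume web traffic (50,000 rows). Each session is forced to zero purchases with probability 0.8; otherwise 'Purchases' is drawn from a Poisson distribution whose mean increases with 'SessionDuration' and 'IsMember'.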

DATA mycas.web_traffic;
    call streaminit(999);  /* fixed seed for reproducibility */
    DO session_id = 1 to 50000;
        SessionDuration = rand('EXPONENTIAL') * 5;  /* mean duration of 5 */
        IsMember = rand('BERNOULLI', 0.2);          /* 20% of sessions are members */
        /* 80% structural zeros; the rest follow a Poisson count */
        IF rand('UNIFORM') < 0.8 THEN Purchases = 0;
        ELSE Purchases = rand('POISSON', 1 + 0.1 * SessionDuration + 0.5 * IsMember);
        OUTPUT;
    END;
RUN;
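
As a sanity check on the simulated data, the expected share of zero-purchase sessions can be worked out from the DATA step above. This is a sketch that assumes the three random draws are independent and that rand('EXPONENTIAL') has unit mean; the symbols E (the exponential draw), M (IsMember) and \lambda (the Poisson mean) are notation introduced here.

```latex
\begin{aligned}
\lambda &= 1 + 0.1 \cdot 5E + 0.5\,M = 1 + 0.5\,E + 0.5\,M,
  \qquad E \sim \mathrm{Exp}(1),\; M \sim \mathrm{Bernoulli}(0.2), \\
\mathbb{E}\!\left[e^{-\lambda}\right]
  &= e^{-1} \cdot \mathbb{E}\!\left[e^{-0.5E}\right] \cdot \mathbb{E}\!\left[e^{-0.5M}\right]
   = e^{-1} \cdot \frac{1}{1 + 0.5} \cdot \left(0.8 + 0.2\,e^{-0.5}\right)
   \approx 0.3679 \cdot 0.6667 \cdot 0.9213 \approx 0.226, \\
P(\text{Purchases} = 0)
  &= 0.8 + 0.2 \cdot \mathbb{E}\!\left[e^{-\lambda}\right]
   \approx 0.8 + 0.2 \cdot 0.226 \approx 0.845 .
\end{aligned}
```

Under these assumptions roughly 84.5% of the simulated sessions have zero purchases, which is lower than the 95% quoted in the business context but still heavily zero-inflated.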

Implementation Steps

1. Fit a ZINB model on the large dataset, specifying both the count model and the zero-inflation model, and request output statistics.

PROC CAS;
    countreg.countregFitModel /
        TABLE={name='web_traffic'}
        model={
            depVars={{name='Purchases'}},
            effects={{vars={'SessionDuration', 'IsMember'}}},
            modeloptions={modeltype='ZINB'}
        }
        zeromodel={
            effects={{vars={'SessionDuration'}}},
            link='LOGISTIC'
        }
        OUTPUT={
            casout={name='scored_traffic', replace=true},
            pred='PredCounts',
            probzero='ProbZero'
        };
RUN;

2. Validate that the output table with predictions was generated.

PROC CAS;
    TABLE.tableInfo / name='scored_traffic';
RUN;
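
Beyond confirming that the table exists, it can help to look at its contents. The following is a minimal sketch, assuming the same CAS session and active caslib as the steps above; it uses the general-purpose table.fetch, simple.summary and simple.freq actions, which are not part of the original scenario.

```sas
PROC CAS;
    /* Show the first five scored rows */
    table.fetch / table={name='scored_traffic'} to=5;

    /* Summary statistics for the two requested output columns */
    simple.summary / table={name='scored_traffic'}
                     inputs={'PredCounts', 'ProbZero'};

    /* Observed distribution of purchase counts in the input data,
       useful for checking the share of zero-purchase sessions */
    simple.freq / table={name='web_traffic'} inputs={'Purchases'};
RUN;
```

The row count reported by simple.summary should match the 50,000 simulated sessions, and the frequency of Purchases = 0 gives the observed zero share to compare against the model's predictions.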

Expected Result


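The ZINB fit should complete without convergence errors despite the high share of zeros. The 'scored_traffic' table is created with one row for each of the 50,000 observations and contains 'PredCounts' (expected purchases) and 'ProbZero' (the zero probability; in PROC COUNTREG the corresponding PROBZERO= output is the probability of a zero from the zero-inflation process, so it may not equal the overall probability of no purchase).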