Standard Case: Assessing a Marketing Propensity Model

Business Context

A retail company has built a logistic regression model to predict which customers are likely to respond to a new promotional offer. The Data Science team needs to assess the model's performance to decide on a probability cutoff for the campaign, balancing reach and cost. They want standard classification metrics like ROC, AUC, and Lift.

About the Action Set: percentile

Precise calculation of percentiles and quantiles.

Data Preparation

Create a simulated dataset of customer profiles and model scores. 'CUST_ID' is the customer identifier, 'RESPONDED' is the actual outcome (1=yes, 0=no), and 'P_RESPONDED' is the model's predicted probability of responding.

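DATA casuser.marketing_scores;
    call streaminit(123);                      /* fixed seed for reproducibility */
    DO CUST_ID = 1 to 5000;
        AGE = 20 + rand('integer', 50);        /* ages 21 to 70 */
        INCOME = 30000 + rand('integer', 70000);
        /* Older, higher-income customers get a higher base response probability */
        IF AGE > 45 and INCOME > 60000 THEN base_prob = 0.6;
        ELSE base_prob = 0.1;
        /* Add uniform noise in [-0.15, 0.15], then keep the score inside (0, 1) */
        P_RESPONDED = base_prob + rand('uniform')*0.3 - 0.15;
        IF P_RESPONDED < 0 THEN P_RESPONDED = 0.01;
        IF P_RESPONDED > 1 THEN P_RESPONDED = 0.99;
        /* Draw the actual outcome as a single Bernoulli trial */
        RESPONDED = rand('binomial', P_RESPONDED, 1);
        OUTPUT;
    END;
RUN;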
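
The DATA step above writes through a libref named casuser, so it assumes that a CAS session is already connected and that the libref is assigned. A minimal sketch of that setup, followed by a quick sanity check on the simulated data, might look like this. The session name mysess is hypothetical, and the setup statements would need to run before the DATA step.

```sas
/* Assumed setup (run before the DATA step): start a CAS session and */
/* bind a libref to the casuser caslib. 'mysess' is a made-up name.  */
cas mysess;
libname casuser cas caslib="casuser";

/* Sanity check (run after the DATA step): the mean of RESPONDED is  */
/* the overall response rate; min and max of P_RESPONDED should fall */
/* inside (0, 1).                                                    */
proc means data=casuser.marketing_scores n mean min max;
    var P_RESPONDED RESPONDED;
run;
```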

Implementation Steps

Load the prepared data into an in-memory CAS table. No separate load is needed here: the data preparation code writes the table directly to the casuser caslib.

/* Data is already in casuser.marketing_scores from the data preparation step */
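
Although nothing has to be loaded, it can be worth confirming that the table really is in memory before assessing it. One way to do that, sketched here with the table.tableInfo action, is to list the table's row and column counts; 5000 rows are expected from the data preparation step.

```sas
proc cas;
    /* Report basic metadata (rows, columns) for the in-memory table */
    table.tableInfo / caslib='casuser', name='marketing_scores';
quit;
```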

Run the assess action to generate ROC, Lift, and Fit statistics. We define '1' as the event of interest and specify output tables for all metrics.

PROC CAS;
    percentile.assess /
        TABLE={name='marketing_scores', caslib='casuser'},
        response='RESPONDED',                 /* actual outcome */
        inputs={{name='P_RESPONDED'}},        /* predicted probability of the event */
        event='1',                            /* level of RESPONDED treated as the event */
        includeRoc=true,
        includeLift=true,
        includeFitStat=true,
        nBins=20,                             /* number of bins for the lift table */
        rocOut={name='campaign_roc', caslib='casuser', replace=true},
        casOut={name='campaign_lift', caslib='casuser', replace=true},
        fitStatOut={name='campaign_fitstat', caslib='casuser', replace=true};
QUIT;
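
Once the action has run, the output tables can be inspected like any other CAS table. The following is a sketch only: it assumes the casuser libref is assigned and that the ROC table exposes columns named _FPR_ and _Sensitivity_, which should be checked against the actual output before relying on them.

```sas
/* Print the fit statistics table as-is */
proc print data=casuser.campaign_fitstat;
run;

/* Copy the ROC points to WORK and sort them so the curve is drawn   */
/* left to right (CAS tables have no guaranteed row order).          */
data work.roc;
    set casuser.campaign_roc;
run;

proc sort data=work.roc;
    by _FPR_ _Sensitivity_;   /* assumed column names */
run;

proc sgplot data=work.roc aspect=1;
    series x=_FPR_ y=_Sensitivity_;
    lineparm x=0 y=0 slope=1 / lineattrs=(pattern=dash);  /* chance line */
    xaxis label='False positive rate' min=0 max=1;
    yaxis label='True positive rate (sensitivity)' min=0 max=1;
run;
```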

Expected Result

The action should execute successfully and create three output tables in the 'casuser' caslib: 'campaign_roc', 'campaign_lift', and 'campaign_fitstat'. The 'campaign_fitstat' table should contain fit statistics such as average squared error and misclassification rate. The 'campaign_roc' table should contain the data points needed to plot the ROC curve, along with summary measures such as AUC (the C statistic) and Gini. The 'campaign_lift' table should contain the data points for the Lift chart, with 20 bins as specified by nBins.
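
The business goal is to pick a probability cutoff, and the ROC table is a natural starting point for that. As an illustration only, the sketch below ranks cutoffs by Youden's J statistic (sensitivity + specificity - 1). This criterion is not part of the original scenario, and a cost-based rule that weighs reach against campaign cost may suit the campaign better. The column names _Cutoff_, _Sensitivity_, and _Specificity_ are assumptions about the ROC output and should be verified.

```sas
/* Compute Youden's J for every cutoff in the ROC table */
data work.roc_j;
    set casuser.campaign_roc;
    youden_j = _Sensitivity_ + _Specificity_ - 1;   /* assumed column names */
run;

/* Rank cutoffs from best to worst by J */
proc sort data=work.roc_j;
    by descending youden_j;
run;

/* Show the five highest-ranked candidate cutoffs */
proc print data=work.roc_j(obs=5);
    var _Cutoff_ _Sensitivity_ _Specificity_ youden_j;
run;
```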

See the technical documentation for the assess action.