Standard Churn Prediction with Gradient Boosting

Business Context

A retail bank wants to estimate the generalization error of a Gradient Boosting model that predicts customer churn. Before deploying the model to production, the bank needs to confirm that it performs consistently across different subsets of its customer base.

Data Preparation

Simulate 1,000 banking customers with demographic attributes and a binary churn target.

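/* Assumes an active CAS session and a CAS libref named casuser. */
DATA casuser.churn_data;
    call streaminit(12345);   /* fixed seed so the simulated draws are reproducible */
    DO i = 1 to 1000;
        age = rand('integer', 18, 80);
        balance = rand('uniform', 0, 50000);
        tenure = rand('integer', 1, 20);
        /* about 15% of customers churn, independently of the inputs */
        IF rand('uniform') < 0.15 THEN churn = 1;
        ELSE churn = 0;
        OUTPUT;
    END;
RUN;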
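A quick sanity check on the simulated table can catch problems before any modeling. The sketch below assumes that casuser is an assigned CAS libref, as in the DATA step above, and uses standard SAS procedures to confirm that the churn rate is roughly 15% and that the inputs fall in their intended ranges.

```sas
/* Share of churners: expect about 15% of the 1,000 rows */
PROC FREQ DATA=casuser.churn_data;
    TABLES churn;
RUN;

/* Ranges of the simulated inputs */
PROC MEANS DATA=casuser.churn_data N MIN MAX MEAN;
    VAR age balance tenure;
RUN;
```

The exact counts depend on the random stream, so only the approximate proportions are worth checking.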

Implementation Steps

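Load the data into CAS memory. This step applies when the table has been saved to the casuser caslib as churn_data.sashdat; if casuser in the DATA step above is a CAS libref, the table is already in memory and this step can be skipped.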

PROC CAS;
    /* r receives the action results, s the action status */
    TABLE.loadTable RESULT=r STATUS=s /
        caslib="casuser"
        path="churn_data.sashdat"
        casout={name="churn_data", replace=true};
QUIT;
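Before running cross-validation, it can help to confirm that the table really is in memory. The following is a minimal sketch using the table.tableExists and table.tableInfo actions; the result variable name chk is a hypothetical choice made for illustration.

```sas
PROC CAS;
    /* Ask the server whether the in-memory table exists */
    table.tableExists result=chk / caslib="casuser" name="churn_data";
    print chk;

    /* Show row and column counts for the loaded table */
    table.tableInfo / caslib="casuser" name="churn_data";
QUIT;
```

A nonzero exists value indicates that the table is available; a value of 0 means that it was not loaded.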

Execute crossValidate using Gradient Boosting with 5 folds and a fixed seed for reproducibility.

PROC CAS;
    mlTools.crossValidate RESULT=r /
        TABLE={name="churn_data"}
        modelType="GRADBOOST"
        kFolds=5
        seed=999
        casOut={name="cv_churn_results", replace=TRUE}
        trainOptions={
            target="churn",
            inputs={"age", "balance", "tenure"},
            nominals={"churn"},
            ntree=50
        };
QUIT;

Expected Result

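The action completes successfully and the log shows the progress of the 5 folds. A scored output table, cv_churn_results, is created containing predictions, and the model assessment metrics are returned, providing an estimate of the Gradient Boosting model's generalization performance. Because the simulated churn flag is drawn independently of age, balance, and tenure, the metrics should sit near chance level; the example exercises the workflow rather than model quality.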
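To inspect what the run produced, the output table can be previewed with general-purpose table actions. This is a sketch only: it assumes the table name cv_churn_results from the casOut parameter and makes no assumption about which columns crossValidate writes, which is why it lists them first.

```sas
PROC CAS;
    /* List the columns that the action actually produced */
    table.columnInfo / table={name="cv_churn_results"};

    /* Preview the first 10 rows of the output table */
    table.fetch / table={name="cv_churn_results"} to=10;
QUIT;
```

Listing the columns before fetching avoids guessing the names of the prediction variables.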

See the technical documentation for crossValidate