mlTools crossValidate

Standard Churn Prediction with Gradient Boosting

Test Scenario & Use Case

Business Context

A retail bank wants to estimate the generalization error of a Gradient Boosting model designed to predict customer churn. They need to ensure the model performs consistently across different subsets of their customer base before deploying it to production.
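For reference, the generalization error that k-fold cross-validation estimates can be written in standard notation (not taken from the mlTools documentation). The data are split into folds D_1 through D_k, the model \hat f^{(-j)} is trained on every fold except D_j, and L is a per-observation loss such as misclassification:

```latex
\mathrm{CV}_k = \frac{1}{k} \sum_{j=1}^{k} \frac{1}{|D_j|} \sum_{i \in D_j} L\bigl(y_i, \hat{f}^{(-j)}(x_i)\bigr),
\qquad k = 5 \text{ in this scenario.}
```

Comparing the k per-fold terms, not only their average, is what shows whether performance is consistent across customer subsets.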
Data Preparation

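Simulation of 1,000 bank customers with three numeric attributes (age, balance, tenure) and a binary churn target, where each customer churns with probability 0.15. Because churn is drawn independently of the inputs, the model has no real signal to learn, so this scenario exercises the mechanics of the action rather than model quality.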

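/* Start a CAS session (the name is arbitrary) and assign librefs   */
/* for all caslibs, so that CASUSER can be used as a DATA step libref */
cas mysess;
caslib _all_ assign;

DATA casuser.churn_data;
   call streaminit(12345);             /* fixed seed: reproducible data */
   DO i = 1 to 1000;                   /* i doubles as a customer ID    */
      age     = rand('integer', 18, 80);
      balance = rand('uniform', 0, 50000);
      tenure  = rand('integer', 1, 20);
      /* each customer churns with probability 0.15 */
      IF rand('uniform') < 0.15 THEN churn = 1;
      ELSE churn = 0;
      OUTPUT;
   END;
RUN;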
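Because each customer churns with probability 0.15, independently of age, balance, and tenure, roughly 150 of the 1,000 rows should have churn = 1. A quick sanity check with the simple.freq action, sketched under the assumption that the CAS session above is still active and the table sits in the casuser caslib:

```sas
PROC CAS;
   /* Frequency table of the target to confirm the ~15% churn rate */
   simple.freq /
      table={name="churn_data", caslib="casuser"}
      inputs={"churn"};
QUIT;
```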

Implementation Steps

1
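Load the data into CAS memory. This step assumes that churn_data.sashdat has already been saved in the casuser caslib; if the DATA step above wrote the table through a CAS engine libref, the table is already in memory and this load can be skipped.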
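PROC CAS;
   /* Load the saved churn_data.sashdat file from the casuser caslib */
   /* into an in-memory table named churn_data                       */
   TABLE.loadTable RESULT=r STATUS=s /
      caslib="casuser" path="churn_data.sashdat"
      casout={name="churn_data", replace=true};
QUIT;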
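table.loadTable reads a saved file, so churn_data.sashdat must exist in the casuser caslib before the load above can succeed. If the table currently exists only in memory, a minimal sketch of persisting it with the table.save action, to be run once before the load, is:

```sas
PROC CAS;
   /* Write the in-memory churn_data table to churn_data.sashdat */
   /* in the casuser caslib, overwriting any earlier copy         */
   table.save /
      caslib="casuser"
      name="churn_data.sashdat"
      table={name="churn_data", caslib="casuser"}
      replace=true;
QUIT;
```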
2
Run the crossValidate action with a Gradient Boosting model, 5 folds, and a fixed seed so that the results are reproducible.
PROC CAS;
   mlTools.crossValidate RESULT=r /
      TABLE={name="churn_data"}
      modelType="GRADBOOST"          /* gradient boosting model          */
      kFolds=5                       /* number of folds                  */
      seed=999                       /* fixed seed for reproducibility   */
      casOut={name="cv_churn_results", replace=TRUE}
      trainOptions={
         target="churn",
         inputs={"age", "balance", "tenure"},
         nominals={"churn"},         /* binary target treated as a class */
         ntree=50                    /* number of trees                  */
      };
QUIT;
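To make the fold mechanics concrete, the DATA step sketch below assigns each customer at random to one of 5 folds with a fixed seed, in the spirit of what kFolds=5 and seed=999 control. It is an illustration only: the fold variable and the work tables are hypothetical, and crossValidate performs its own partitioning internally, possibly with a different scheme. Rows are sorted by the customer ID i first, because the row order of a distributed CAS table is not guaranteed.

```sas
/* Hypothetical illustration of random 5-fold assignment (runs in the SAS client) */
PROC SORT data=casuser.churn_data out=work.churn_sorted;
   by i;                                   /* stable row order */
RUN;

DATA work.churn_folds;
   SET work.churn_sorted;
   IF _N_ = 1 THEN call streaminit(999);   /* same seed, same fold labels */
   fold = rand('integer', 1, 5);           /* fold label from 1 to 5      */
RUN;

/* Each fold should hold close to 1000 / 5 = 200 customers */
PROC FREQ data=work.churn_folds;
   tables fold;
RUN;
```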

Expected Result


The action completes successfully and creates the output table cv_churn_results, which contains the scored predictions. The log reports progress for each of the 5 folds, and the assessment metrics returned in the action result (r) provide an estimate of the Gradient Boosting model's generalization performance.
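To inspect what the run produced, a sketch such as the following lists the columns of cv_churn_results and prints its first rows. It uses only the generic table.columnInfo and table.fetch actions, so it makes no assumption about which prediction columns crossValidate writes.

```sas
PROC CAS;
   /* Column names and types of the cross-validation output table */
   table.columnInfo / table={name="cv_churn_results"};
   /* First 10 rows of the scored output */
   table.fetch / table={name="cv_churn_results"} to=10;
QUIT;
```

The assessment metrics themselves live in the result variable r, which exists only inside the PROC CAS step that ran crossValidate; adding `describe r;` or `print r;` after the action call in that step displays their structure or contents.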