Customer Churn Prediction and Latent Feature Extraction for Segmentation

Business Context

A telecom company wants to score new customers to predict how likely each one is to churn. It also wants to extract latent features from the neural network's hidden layer and use them as inputs to a subsequent customer segmentation analysis that identifies distinct behavioral profiles.

About the Action Set: neuralNet

Training of classical artificial neural networks.

Data Preparation
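
The listings in this example refer to a CAS libref named `mycas` and run actions against the session's active caslib. A minimal setup sketch is shown here under stated assumptions: the session name `mysess` is a placeholder, and `CASUSER` is assumed to be the active caslib, so tables written through `mycas` are visible to the actions called later. Adjust both to your environment.

```sas
/* Start a CAS session; the name mysess is a placeholder. */
cas mysess;

/* Bind the libref mycas to the CASUSER caslib.
   Assumption: CASUSER is also the session's active caslib. */
libname mycas cas caslib=casuser;
```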

Create a sample customer table, `new_customers`, to be scored. The data simulates new sign-ups and includes typical features such as tenure, contract type, and monthly charges. A trained model table named `churn_model` is assumed to already exist in the active caslib, and the libref `mycas` is assumed to point to that same caslib.

/* Build a small CAS table of new sign-ups to score. */
DATA mycas.new_customers(promote=yes);
   LENGTH CustomerID $10 Contract $15;
   INFILE DATALINES dsd;   /* comma-delimited input */
   INPUT CustomerID $ MonthlyCharges Tenure Contract $;
   DATALINES;
CUST001,75.50,1,Month-to-month
CUST002,99.95,24,One year
CUST003,20.05,60,Two year
CUST004,110.0,3,Month-to-month
;
RUN;
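
The scoring step relies on a model table called `churn_model` that this example does not build. For readers who do not have one yet, the following is a rough sketch of how such a table might be produced with `neuralNet.annTrain`. The training table `churn_train`, its `Churn` target column, the single hidden layer of three nodes, and the seed are all hypothetical choices made for illustration, not part of the original scenario. Optimizer settings are left at their defaults.

```sas
proc cas;
   loadactionset 'neuralNet';
   neuralNet.annTrain /
      table={name='churn_train'},       /* hypothetical labeled history */
      inputs={'MonthlyCharges', 'Tenure', 'Contract'},
      nominals={'Contract', 'Churn'},   /* categorical input and target */
      target='Churn',                   /* assumed levels: 'Yes' / 'No' */
      arch='MLP',
      hiddens={3},                      /* one hidden layer, 3 nodes (illustrative) */
      std='MIDRANGE',                   /* standardize interval inputs */
      seed=12345,
      casOut={name='churn_model', replace=true};
run;
quit;
```

The number of hidden nodes chosen at training time determines how many latent feature columns the scoring step can emit per customer.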

Implementation Steps

Score the new customers using the pre-trained `churn_model`. This step generates the churn prediction, extracts the hidden layer node values for segmentation, and includes detailed probabilities for each outcome ('Yes'/'No').

PROC CAS;
   neuralNet.annScore /
      table={name='new_customers'},
      modelTable={name='churn_model'},
      casOut={name='churn_scored_features', replace=true},
      copyVars={'CustomerID', 'MonthlyCharges', 'Tenure'},
      listNode='HIDDEN',     /* write hidden-layer node values to the output */
      assessOneRow=true,     /* include per-outcome probabilities on each row */
      modelId='Churn';
RUN;
QUIT;

Verify the output table structure and content. Check for the presence of original variables, the prediction column, probability columns, and hidden node activation columns.

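PROC CAS;
   /* List the columns of the scored table. */
   table.columnInfo / table={name='churn_scored_features'};
RUN;
   /* Fetch the first 5 rows; 'to' is a fetch parameter, not a table option. */
   table.fetch / table={name='churn_scored_features'}, to=5;
RUN;
QUIT;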
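
The same check can also be made from Base SAS procedures through the libref. This is only a sketch, and it assumes that `mycas` points to the caslib in which `churn_scored_features` was written.

```sas
/* Column names, types, and lengths of the scored table. */
proc contents data=mycas.churn_scored_features;
run;

/* First five scored rows. */
proc print data=mycas.churn_scored_features(obs=5);
run;
```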

Expected Result

The output table `mycas.churn_scored_features` should contain `CustomerID`, `MonthlyCharges`, and `Tenure`. It must also include the prediction in `Churn_PredName_`, probability columns (e.g., `_NN_P_Yes`, `_NN_P_No`), and several new numeric columns (e.g., `_NN_H_1_1`, `_NN_H_1_2`) holding the activation values of the hidden layer nodes. These hidden node values can then be used as inputs to a clustering algorithm.
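
As an illustration of that follow-on step, the sketch below feeds two hidden node columns to k-means clustering through the `clustering.kClus` action. The column names `_NN_H_1_1` and `_NN_H_1_2` are taken from the examples above and should be checked against the actual `columnInfo` output. The output table name `customer_segments`, the choice of two clusters, and the seed are hypothetical. With only four sample customers the result is not meaningful; the point is the shape of the call.

```sas
proc cas;
   loadactionset 'clustering';
   clustering.kClus /
      table={name='churn_scored_features'},
      inputs={'_NN_H_1_1', '_NN_H_1_2'},   /* hidden node columns; verify names first */
      nClusters=2,                          /* illustrative; tune on real data */
      maxIters=20,
      seed=12345,
      output={casOut={name='customer_segments', replace=true},
              copyVars={'CustomerID'}};     /* keep the key for joining back */
run;
quit;
```

Carrying `CustomerID` into the output makes it straightforward to join each customer's segment back to the churn scores.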

See the technical documentation for annScore