bayesianNetClassifier bnet

Standard Case: Telecom Customer Churn Prediction with a TAN Model

Scénario de test & Cas d'usage

Business Context

A telecommunications company wants to proactively identify customers at high risk of churning. By modeling customer behavior and contract attributes, they aim to build a predictive model to score their customer base and target retention efforts effectively.
About the Set : bayesianNetClassifier

Classification using Bayesian networks.

Discover all actions of bayesianNetClassifier
Data Preparation

Creation of a customer dataset with demographic, contract, and service information. The 'HasTechSupport' variable contains some missing values to test the handling of missing data. The target variable is 'Churn'.

Copied!
1DATA casuser.telecom_churn;
2 LENGTH Contract $12. HasTechSupport $3. Churn $3.;
3 INFILE DATALINES delimiter=',';
4 INPUT CustomerID $ TenureMonths MonthlyCharges Contract $ HasTechSupport $ Churn $;
5 DATALINES;
6C001,12,55.5,Month-to-Month,Yes,No
7C002,24,80.0,One Year,No,No
8C003,5,20.0,Month-to-Month,No,Yes
9C004,55,105.2,Two Year,Yes,No
10C005,3,45.0,Month-to-Month,,Yes
11C006,36,95.0,One Year,Yes,No
12C007,1,25.0,Month-to-Month,No,Yes
13C008,68,110.5,Two Year,,No
14C009,15,75.5,One Year,No,No
15C010,8,30.0,Month-to-Month,No,Yes
16;
17RUN;

Étapes de réalisation

1
Train a Tree-Augmented Naive (TAN) Bayes model. The data is partitioned into 70% for training and 30% for validation. Missing nominal values are treated as a separate level.
Copied!
1PROC CAS;
2 bayesianNetClassifier.bnet
3 TABLE={name='telecom_churn'}
4 target='Churn'
5 inputs={'TenureMonths', 'MonthlyCharges', 'Contract', 'HasTechSupport'}
6 nominals={'Contract', 'HasTechSupport', 'Churn'}
7 partByFrac={train=0.7, validate=0.3, seed=1234}
8 structures={'TAN'}
9 missingNom='LEVEL'
10 saveState={name='churn_model_state', replace=true}
11 OUTPUT={casout={name='churn_scored_output', replace=true}, copyVars={'CustomerID', 'Churn'}}
12 outNetwork={name='churn_network_structure', replace=true};
13RUN;
2
Verify the creation of the model state table, the scored output table, and the network structure table.
Copied!
1 
2PROC CAS;
3TABLE.tableInfo / caslib='casuser', name~='churn%';
4RUN;
5 

Expected Result


The action should successfully train a TAN model. Three tables must be created: 'churn_model_state' containing the binary model for scoring, 'churn_scored_output' with the original CustomerID, actual Churn value, and the predicted churn probabilities, and 'churn_network_structure' detailing the learned dependencies. The model should handle the missing 'HasTechSupport' values by treating them as a distinct category.