neuralNet annTrain

Performance Test: Training on Large-Scale Telecom Churn Data with SGD

Test Scenario & Use Case

Business Context

A major telecommunications provider needs to build a customer churn prediction model using a dataset of several million subscribers. Due to the data volume, training efficiency is critical. The model must be robust and trained using an algorithm suitable for large-scale data, like Stochastic Gradient Descent (SGD).
About the action set: neuralNet

Trains classical artificial neural networks.

Data Preparation

Creation of a large simulated dataset representing telecom customer profiles. The dataset includes service usage, contract details, and a binary 'Churn' target. A large number of records are generated to test performance.

DATA telco_churn_large;
    call streaminit(789);
    DO i = 1 TO 2000000;
        CustomerID = i;
        MonthlyCharges = 20 + rand('Uniform') * 100;
        Tenure = int(rand('Uniform') * 72);
        DataUsage = rand('Uniform') * 100;
        ContractType = ceil(rand('Uniform') * 3); /* 1=Month-to-month, 2=One year, 3=Two year */
        Churn = 0;
        IF (ContractType = 1 AND MonthlyCharges > 70 AND Tenure < 12) THEN DO;
            IF rand('Uniform') < 0.4 THEN Churn = 1;
        END;
        ELSE IF (ContractType = 2 AND Tenure < 24) THEN DO;
            IF rand('Uniform') < 0.1 THEN Churn = 1;
        END;
        OUTPUT;
    END;
RUN;

Implementation Steps

1
Load the large-scale churn data into a CAS table.
PROC CASUTIL;
    LOAD DATA=telco_churn_large CASOUT="telco_churn_large" REPLACE;
RUN;
2
Train an MLP using the SGD optimizer, which scales well to large datasets. Apply dropout for regularization and specify SGD options such as the learning rate, momentum, and mini-batch size.
PROC CAS;
    neuralNet.annTrain /
        table={name='telco_churn_large'},
        inputs={'MonthlyCharges', 'Tenure', 'DataUsage', 'ContractType'},
        target='Churn',
        nominals={'Churn', 'ContractType'},
        hiddens={50, 25},
        arch='MLP',
        std='MIDRANGE',
        dropOut=0.2,
        nloOpts={
            algorithm='SGD',
            maxIters=20,
            sgdOpt={learningRate=0.005, momentum=0.9, miniBatchSize=500}
        },
        seed=111,
        code={file='churn_score_code.sas'};
RUN;
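To make the sgdOpt parameters concrete, the sketch below shows the classical momentum SGD update they control, applied to a toy logistic model in NumPy. This is not annTrain's internal implementation; the data, model, and loop structure are illustrative assumptions, but the update rule (velocity v ← momentum·v − learningRate·gradient; weights w ← w + v) and the parameter values (learningRate=0.005, momentum=0.9, miniBatchSize=500, 20 iterations) match the action call above.

```python
import numpy as np

# Toy momentum-SGD loop illustrating learningRate, momentum, and
# miniBatchSize (assumption: logistic regression stands in for the MLP).
rng = np.random.default_rng(111)
X = rng.normal(size=(5000, 4))
true_w = np.array([1.5, -2.0, 0.5, 0.0])
y = (1 / (1 + np.exp(-X @ true_w)) > rng.uniform(size=5000)).astype(float)

lr, momentum, batch = 0.005, 0.9, 500
w = np.zeros(4)
v = np.zeros(4)  # velocity (momentum accumulator)

def log_loss(w):
    p = 1 / (1 + np.exp(-X @ w))
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

loss_before = log_loss(w)
for epoch in range(20):                      # maxIters=20 in the action call
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch):    # one pass = ceil(n/batch) updates
        b = idx[start:start + batch]
        p = 1 / (1 + np.exp(-X[b] @ w))
        grad = X[b].T @ (p - y[b]) / len(b)  # mini-batch gradient
        v = momentum * v - lr * grad         # velocity update
        w = w + v                            # parameter step
loss_after = log_loss(w)
print(f"log loss: {loss_before:.3f} -> {loss_after:.3f}")
```

Because each epoch makes one update per mini-batch rather than one per full pass, SGD reaches a usable solution on very large tables far sooner than full-batch optimizers, which is why it is the suggested algorithm here.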

Expected Result


The action should complete training on the large table without memory issues or excessive run time. The log should show the iteration history of the SGD optimization, and a SAS DATA step scoring file named 'churn_score_code.sas' should be written to the server's file system, ready for deployment.