neuralNet annTrain

Performance Test: Training on Large-Scale Telecom Churn Data with SGD

Test Scenario & Use Case

Business Context

A major telecommunications provider needs to build a customer churn prediction model using a dataset of several million subscribers. Due to the data volume, training efficiency is critical. The model must be robust and trained using an algorithm suitable for large-scale data, like Stochastic Gradient Descent (SGD).
About the action set: neuralNet

Trains classical artificial neural networks.

Data Preparation

Creation of a large simulated dataset representing telecom customer profiles. The dataset includes service usage, contract details, and a binary 'Churn' target. A large number of records are generated to test performance.

DATA telco_churn_large;
    call streaminit(789);
    DO i = 1 TO 2000000;
        CustomerID = i;
        MonthlyCharges = 20 + rand('Uniform') * 100;
        Tenure = int(rand('Uniform') * 72);
        DataUsage = rand('Uniform') * 100;
        ContractType = ceil(rand('Uniform') * 3); /* 1=Month-to-month, 2=One year, 3=Two year */
        Churn = 0;
        IF (ContractType = 1 AND MonthlyCharges > 70 AND Tenure < 12) THEN DO;
            IF rand('Uniform') < 0.4 THEN Churn = 1;
        END;
        ELSE IF (ContractType = 2 AND Tenure < 24) THEN DO;
            IF rand('Uniform') < 0.1 THEN Churn = 1;
        END;
        OUTPUT;
    END;
RUN;

Implementation Steps

1
Load the large-scale churn data into a CAS table.
PROC CASUTIL;
    LOAD DATA=telco_churn_large CASOUT="telco_churn_large" REPLACE;
RUN;
2
Train an MLP using the SGD optimizer, which scales well to large datasets. Apply dropout for regularization and specify SGD options such as the learning rate, momentum, and mini-batch size.
PROC CAS;
    neuralNet.annTrain /
        table={name='telco_churn_large'},
        inputs={'MonthlyCharges', 'Tenure', 'DataUsage', 'ContractType'},
        target='Churn',
        nominals={'Churn', 'ContractType'},
        hiddens={50, 25},
        arch='MLP',
        std='MIDRANGE',
        dropOut=0.2,
        nloOpts={
            algorithm='SGD',
            maxIters=20,
            sgdOpt={learningRate=0.005, momentum=0.9, miniBatchSize=500}
        },
        seed=111,
        code={file='churn_score_code.sas'};
RUN;
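To make the sgdOpt parameters concrete, the sketch below shows the classical momentum SGD update they control, applied to a toy logistic model in NumPy. This is not annTrain's internal implementation; the data, model, and loop structure are illustrative assumptions, but the update rule (velocity v ← momentum·v − learningRate·gradient; weights w ← w + v) and the parameter values (learningRate=0.005, momentum=0.9, miniBatchSize=500, 20 iterations) match the action call above.

```python
import numpy as np

# Toy momentum-SGD loop illustrating learningRate, momentum, and
# miniBatchSize (assumption: logistic regression stands in for the MLP).
rng = np.random.default_rng(111)
X = rng.normal(size=(5000, 4))
true_w = np.array([1.5, -2.0, 0.5, 0.0])
y = (1 / (1 + np.exp(-X @ true_w)) > rng.uniform(size=5000)).astype(float)

lr, momentum, batch = 0.005, 0.9, 500
w = np.zeros(4)
v = np.zeros(4)  # velocity (momentum accumulator)

def log_loss(w):
    p = 1 / (1 + np.exp(-X @ w))
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

loss_before = log_loss(w)
for epoch in range(20):                      # maxIters=20 in the action call
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch):    # one pass = ceil(n/batch) updates
        b = idx[start:start + batch]
        p = 1 / (1 + np.exp(-X[b] @ w))
        grad = X[b].T @ (p - y[b]) / len(b)  # mini-batch gradient
        v = momentum * v - lr * grad         # velocity update
        w = w + v                            # parameter step
loss_after = log_loss(w)
print(f"log loss: {loss_before:.3f} -> {loss_after:.3f}")
```

Because each epoch makes one update per mini-batch rather than one per full pass, SGD reaches a usable solution on very large tables far sooner than full-batch optimizers, which is why it is the suggested algorithm here.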

Expected Result


The action should complete training on the large table without memory issues or excessive run time. The log should show the iteration history of the SGD optimization, and a SAS DATA step scoring file named 'churn_score_code.sas' should be written to the server's file system, ready for deployment.