neuralNet

annTrain

At a glance
In SAS Viya, the annTrain action is the main tool for training artificial neural networks. As part of the neuralNet action set, it supports architectures such as the Multi-Layer Perceptron (MLP) and handles both regression and classification. Beyond basic training, it offers controls that reduce overfitting and improve model stability, including dropout (available with the SGD solver) and early stopping against a validation table. For deployment, it can generate SAS score code or save the trained model state for later scoring. The FAQ below collects common questions about configuring the action and troubleshooting typical implementation scenarios.

Description

The `annTrain` action, part of the `neuralNet` action set, is used to train an artificial neural network (ANN) in SAS Viya. This process involves adjusting the network's weights based on a given dataset to minimize prediction errors. The action supports various architectures like Multi-Layer Perceptrons (MLP), Generalized Linear Models (GLIM), and direct connection models. It offers extensive customization options, including different activation functions, optimization algorithms (like LBFGS and SGD), and data standardization methods, making it a versatile tool for building predictive models.
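The description above lists GLIM alongside MLP. The following minimal sketch trains a GLIM baseline. It assumes that the `my_hmeq` CAS table from the Data Preparation section is loaded and that `hiddens=` can be omitted because GLIM has no hidden layer. The output table name `ann_glim` is hypothetical.

```sas
PROC CAS;
   /* Sketch: GLIM connects inputs directly to the output layer, */
   /* so no hiddens= list is given (assumption: not required).    */
   ACTION neuralNet.annTrain /
      TABLE={name='my_hmeq'},
      inputs={'LOAN', 'MORTDUE', 'VALUE', 'DEBTINC'},
      target='BAD',
      nominals={'BAD'},
      arch='GLIM',
      std='STD',
      nloOpts={algorithm='LBFGS', maxIters=50},
      saveState={name='ann_glim', replace=true};   /* hypothetical name */
RUN;
QUIT;
```

A GLIM fit provides a quick linear reference point. You can use it to judge whether the additional hidden layers of an MLP improve fit enough to justify their cost.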

neuralNet.annTrain { acts={"EXP", "IDENTITY", "LOGISTIC", "RECTIFIER", "SIN", "SOFTPLUS", "TANH"}, applyRowOrder=TRUE | FALSE, arch="DIRECT" | "GLIM" | "MLP", attributes={{...}, ...}, bias=double, casOut={...}, code={...}, combs={"ADD", "LINEAR", "RADIAL"}, delta=double, dropOut=double, dropOutInput=double, errorFunc="ENTROPY" | "GAMMA" | "NORMAL" | "POISSON", freq="variable-name", fullWeights=TRUE | FALSE, hiddens={64-bit-integer-1, ...}, includeBias=TRUE | FALSE, inputs={{...}, ...}, inversePriors=TRUE | FALSE, listNode="ALL" | "HIDDEN" | "INPUT" | "OUTPUT", missing="MAX" | "MEAN" | "MIN" | "NONE", modelId="string", modelTable={...}, nAnns=64-bit-integer, nloOpts={...}, nominals={{...}, ...}, nTries=64-bit-integer, randDist="CAUCHY" | "MSRA" | "NORMAL" | "UNIFORM" | "XAVIER", resume=TRUE | FALSE, samplingRate=double, saveState={...}, scaleInit=64-bit-integer, seed=double, std="MIDRANGE" | "NONE" | "STD", step=double, t=double, table={...}, target="variable-name", targetAct="EXP" | "IDENTITY" | "LOGISTIC" | "SIN" | "SOFTMAX" | "TANH", targetComb="ADD" | "LINEAR" | "RADIAL", targetMissing="MAX" | "MEAN" | "MIN" | "NONE", targetStd="MIDRANGE" | "NONE" | "STD", validTable={...}, weight="variable-name" };
Settings
Parameter    Description
acts Specifies the activation function for the neurons on each hidden layer.
applyRowOrder Specifies that the action should use a prespecified row ordering.
arch Specifies the network architecture to be trained (MLP, GLIM, or DIRECT).
attributes Specifies temporary attributes, such as a format, to apply to input variables.
bias Specifies a fixed bias value for all hidden and output neurons, which will not be optimized.
casOut Specifies the output table for the trained model.
code Requests that the action produce SAS score code for deployment.
combs Specifies the combination function for the neurons on each hidden layer.
delta Specifies the annealing parameter for simulated annealing (SA) global optimization.
dropOut Specifies the dropout ratio for the hidden layers, valid only with SGD optimization and linear combinations.
dropOutInput Specifies the dropout ratio for the input layer, valid only with SGD optimization and linear combinations.
errorFunc Specifies the error function to train the network (e.g., ENTROPY, NORMAL).
freq Specifies a numeric variable that contains the frequency of occurrence of each observation.
fullWeights Generates the full weight model for LBFGS optimization.
hiddens Specifies the number of hidden neurons for each hidden layer in the model.
includeBias When set to False, bias parameters are not included for the hidden and output units.
inputs Specifies the input variables to use in the analysis.
inversePriors Calculates the weight for prediction error based on the inverse of class frequencies.
listNode Specifies which nodes (input, hidden, output, or all) to include in the scoring output table.
missing Specifies how to impute missing values for input or target variables.
modelId Specifies a model ID variable name to be included in the generated DATA step scoring code.
modelTable Specifies the table containing a pre-trained model whose weights are used to initialize the network.
nAnns Specifies the number of networks to select from multiple tries, based on the smallest error.
nloOpts Specifies the nonlinear optimization options.
nominals Specifies the nominal input and target variables to use in the analysis.
nTries Specifies the number of training attempts with random initial weights.
randDist Specifies the distribution for randomly generating initial network connection weights.
resume Resumes a training optimization using weights from a previous training session.
samplingRate Specifies the fraction of the data to use for training the neural network.
saveState Specifies the table in which to save the model state for future scoring.
scaleInit Specifies how to scale the initial weights.
seed Specifies the random number seed for initializing network weights.
std Specifies the standardization method to use on the interval variables.
step Specifies a step size for weight perturbations during Monte Carlo or simulated annealing.
t Specifies the artificial temperature parameter for Monte Carlo or simulated annealing.
table Specifies the input table containing the training data.
target Specifies the target or response variable for training.
targetAct Specifies the activation function for the neurons on the output layer.
targetComb Specifies the combination function for the neurons on the target output nodes.
targetMissing Specifies how to impute missing values for the target variable.
targetStd Specifies the standardization method to use on the target variable.
validTable Specifies the table with validation data for early stopping.
weight Specifies a variable to weight the prediction errors for each observation during training.
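dropOut and dropOutInput apply only with SGD optimization and linear combination functions, so a configuration that uses them must set both. The sketch below assumes the `my_hmeq` table from the Data Preparation section. The dropout ratios (0.2 and 0.1) and the table name `ann_model_dropout` are illustrative choices, not recommended defaults.

```sas
PROC CAS;
   ACTION neuralNet.annTrain /
      TABLE={name='my_hmeq'},
      inputs={'LOAN', 'MORTDUE', 'VALUE', 'YOJ', 'DEBTINC'},
      target='BAD',
      nominals={'BAD'},
      arch='MLP',
      hiddens={20},
      combs={'LINEAR'},          /* dropout requires linear combinations */
      dropOut=0.2,               /* illustrative hidden-layer dropout ratio */
      dropOutInput=0.1,          /* illustrative input-layer dropout ratio  */
      std='STD',
      nloOpts={algorithm='SGD', /* dropout requires SGD */
               maxIters=100,
               sgdOpt={learningRate=0.01, miniBatchSize=50}},
      seed=12345,
      saveState={name='ann_model_dropout', replace=true};  /* hypothetical name */
RUN;
QUIT;
```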
Data Preparation
Data Creation

This example uses the `HMEQ` dataset from the `SAMPSIO` library, which contains information about home equity loans. The goal is to predict loan default. The data is loaded into a CAS table named `my_hmeq`.

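/* Start a CAS session (skip if one is already active) */
cas mysess;

/* Copy the sample HMEQ data set into the WORK library */
DATA my_hmeq;
   SET sampsio.hmeq;
RUN;

/* Load the WORK table into the active caslib as the CAS table my_hmeq */
PROC CASUTIL;
   load DATA=my_hmeq casout='my_hmeq' replace;
RUN;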

Examples

This example trains a simple Multi-Layer Perceptron (MLP) with one hidden layer of 10 neurons to predict the binary target `BAD` using several interval inputs.

SAS® / CAS Code
PROC CAS;
   ACTION neuralNet.annTrain /
      TABLE={name='my_hmeq'},
      inputs={'LOAN', 'MORTDUE', 'VALUE', 'YOJ', 'DEROG', 'DELINQ', 'CLAGE', 'NINQ', 'CLNO', 'DEBTINC'},
      target='BAD',
      hiddens={10},
      arch='MLP',
      nominals={'BAD'},
      nloOpts={maxIters=50, algorithm='LBFGS'},
      saveState={name='ann_model', replace=true};
RUN;
QUIT;
Result:
The action trains the neural network and saves the model state (an analytic store) to a CAS table named 'ann_model' for later scoring. The results include model information, optimization details, and fit statistics.
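The state saved by saveState= can be applied to new data. This minimal sketch assumes that 'ann_model' is an analytic store that the score action of the astore action set can read. The output table name `hmeq_scored` is hypothetical. For simplicity, the example rescores the training table.

```sas
PROC CAS;
   loadactionset 'astore';

   /* Apply the saved model state to a table */
   astore.score /
      TABLE={name='my_hmeq'},
      rstore={name='ann_model'},
      out={name='hmeq_scored', replace=true},   /* hypothetical name */
      copyVars={'BAD'};                         /* keep the actual target */
RUN;

   /* Show the first few scored rows */
   table.fetch / TABLE={name='hmeq_scored'}, to=5;
RUN;
QUIT;
```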

This example demonstrates a more complex training scenario. It partitions the data into training and validation sets. It then trains an MLP with two hidden layers (20 and 15 neurons), uses the RECTIFIER activation function, and employs the Stochastic Gradient Descent (SGD) optimizer with a specific learning rate and momentum. Early stopping is enabled by referencing the validation data.

SAS® / CAS Code
PROC CAS;
   loadactionset 'sampling';

   /* Stratified 70/30 split on BAD: _PartInd_=1 marks the 70% training */
   /* sample, _PartInd_=0 the remaining 30% used for validation.        */
   sampling.stratified /
      TABLE={name='my_hmeq', groupBy={'BAD'}},
      samppct=70,
      partInd=true,
      seed=12345,
      output={casOut={name='my_hmeq_part', replace=true}, copyVars='ALL'};
RUN;

   ACTION neuralNet.annTrain /
      TABLE={name='my_hmeq_part', where='_PartInd_=1'},
      validTable={name='my_hmeq_part', where='_PartInd_=0'},
      inputs={'LOAN', 'MORTDUE', 'VALUE', 'YOJ', 'DEROG', 'DELINQ', 'CLAGE', 'NINQ', 'CLNO', 'DEBTINC'},
      target='BAD',
      hiddens={20, 15},
      acts={'RECTIFIER', 'RECTIFIER'},   /* one activation per hidden layer */
      arch='MLP',
      nominals={'BAD'},
      std='STD',
      nloOpts={
         algorithm='SGD',
         maxIters=100,
         sgdOpt={learningRate=0.01, momentum=0.5, miniBatchSize=50},
         validate={frequency=5, stagnation=10}
      },
      seed=12345,
      saveState={name='ann_model_sgd', replace=true};
RUN;
QUIT;
Result:
The action trains a more complex neural network using the training partition and uses the validation partition to monitor performance and stop training early if the validation error stops improving. The final model is saved to 'ann_model_sgd'.
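HMEQ has missing values in several inputs, and defaults (BAD=1) are the minority class. The following sketch assumes the `my_hmeq` table from the Data Preparation section. It combines missing='MEAN' imputation with inversePriors=TRUE and several random starts (nTries=). These settings and the table name `ann_model_imp` are illustrative assumptions, not tuned values. The sketch does not show how many rows the default missing handling would otherwise exclude.

```sas
PROC CAS;
   ACTION neuralNet.annTrain /
      TABLE={name='my_hmeq'},
      inputs={'LOAN', 'MORTDUE', 'VALUE', 'YOJ', 'DEROG', 'DELINQ', 'CLAGE', 'NINQ', 'CLNO', 'DEBTINC'},
      target='BAD',
      nominals={'BAD'},
      arch='MLP',
      hiddens={10},
      missing='MEAN',        /* impute missing inputs with the mean            */
      inversePriors=true,    /* weight errors by inverse class frequency       */
      std='STD',
      nTries=5,              /* five trainings from different random weights   */
      nloOpts={algorithm='LBFGS', maxIters=50},
      seed=12345,
      saveState={name='ann_model_imp', replace=true};   /* hypothetical name */
RUN;
QUIT;
```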

FAQ

What is the primary purpose of the annTrain action in SAS Viya?
What types of network architectures can be trained using the annTrain action?
How can I define the structure of the hidden layers in my neural network?
Which optimization algorithms are available for training the network?
How does the annTrain action handle missing values in the training data?
Is it possible to use a validation dataset to prevent overfitting during training?
What activation functions can be used for the hidden and target layers?
How can I save the state of my trained model for later use?

Associated Scenarios

Use Case
Standard Case: Predicting Industrial Machine Failure with an MLP

An industrial manufacturing company wants to implement a predictive maintenance program. The goal is to train a neural network to predict imminent machine failure based on real-...

Use Case
Performance Test: Training on Large-Scale Telecom Churn Data with SGD

A major telecommunications provider needs to build a customer churn prediction model using a dataset of several million subscribers. Due to the data volume, training efficiency ...

Use Case
Edge Case: Handling Missing Values and Imbalanced Classes in Clinical Data

A research organization is analyzing clinical trial data to predict patient response to a new treatment. The dataset is small, contains numerous missing values from incomplete l...