decisionTree

forestTrain

Description

The forestTrain action trains a forest model, an ensemble of decision trees used for classification, regression, or anomaly detection with an isolation forest. This action requires a SAS Visual Data Mining and Machine Learning license. It provides options for bootstrap sampling, several splitting criteria (CHAID, GINI, and others), pruning (C4.5 or cost-complexity), and handling missing values. It can also generate SAS score code and computed variables, and it can save the model as an analytic store (aStore).

Settings
Parameter: Description
alpha: Specifies the value to use for minimal cost-complexity pruning of regression trees.
applyRowOrder: Specifies that the action uses a prespecified row ordering. This requires a preliminary table.partition call that uses the orderBy and groupBy parameters.
attributes: Specifies temporary attributes, such as a format, to apply to input variables.
binOrder: When set to True (default), the bin order is preserved for numeric variables.
bootstrap: Specifies the fraction of the data to use for the bootstrap sample. Range: (0, 1].
casOut: Specifies the output table in which to store the forest model.
cfLev: Specifies the aggressiveness of tree pruning according to the C4.5 algorithm.
code: Requests that the action produce SAS score code.
codeInteractions: Requests that the action produce SAS score code that creates variables encoding interactions.
crit: Specifies the split criterion for each tree node (for example, GINI, CHAID, or VARIANCE).
encodeName: Specifies whether to encode variable names, such as the names of the predicted-probability variables for a binary or nominal target, in the generated casOut table.
event: Specifies the event values of the target variable, for use with eventFreq.
eventFreq: Specifies the frequency of each corresponding event in the event parameter (useful for rare-event sampling).
freq: Specifies a numeric variable that contains the frequency of occurrence of each observation.
greedy: When set to True (default), a greedy (exhaustive) search is used. When set to False, a faster clustering-based algorithm is used.
includeMissing: By default, observations with missing values are included. When set to False, they are ignored during scoring.
inputs: Specifies the input variables to use in the analysis.
isolation: When set to True, trains an isolation forest (default False).
leafSize: Specifies the minimum number of observations in each node (default 5).
loh: Specifies the number of variables to use for splitting with the LOH method.
m: Specifies the number of input variables to consider for splitting at each node.
maxBranch: Specifies the maximum number of children (branches) allowed for each level of the tree (default 2).
maxLevel: Specifies the maximum number of tree levels (default 6).
mergeBin: When set to True (default), merges a bin with its neighbor when the bin's largest value matches the neighbor's smallest value.
minUseInSearch: Specifies a threshold for using missing values in the split search when missing='USEINSEARCH'.
missing: Specifies the missing-value policy ('MACSMALL' or 'USEINSEARCH').
modelId: Specifies the model ID variable name to use when generating SAS score code.
nBins: Specifies the number of bins to use for numeric variables (default 50).
nBinsTarget: Specifies the number of bins to use for a numeric target variable.
nominals: Specifies the nominal input variables to use in the analysis.
nominalSearch: Specifies the method for finding a split on a nominal input (for example, handling='ENHANCED').
nTree: Specifies the number of trees to create (default 50).
oob: When set to True, the out-of-bag error is computed.
prune: When set to True, prunes each tree with the C4.5 method (classification trees) or minimal cost-complexity pruning (regression trees).
quantileBin: When set to True (default), places bin boundaries at quantiles of numeric inputs instead of creating bins of equal width.
rbaImp: Requests variable importance computed with the random branch assignments (RBA) method.
sampleN: Specifies the sample size (default 100).
saveState: Specifies the table in which to store the generated analytic store (aStore) model.
seed: Specifies the seed for the random number generator.
table: Specifies the settings for the input table.
target: Specifies the target (response) variable for training.
varImp: Specifies whether variable importance information is generated.
varIntImp: Requests variable interaction importance and specifies the maximum degree of interaction.
vote: Specifies the voting strategy for classification ('MAJORITY' or 'PROB').
weight: Specifies a numeric variable that contains the weight of each observation.
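As a hedged illustration of how several of the tree-growth settings above combine in one call, the sketch below grows shallower, pruned trees with the GINI criterion. It assumes casuser.class has been loaded as described under Data Preparation. The parameter names come from the list above; the specific values are arbitrary choices for the 19-row sashelp.class table, not recommended settings.

```sas
/* Sketch: tree-growth, splitting, pruning, and missing-value settings.   */
/* Values are illustrative only; assumes casuser.class is already loaded. */
PROC CAS;
   decisionTree.forestTrain /
      table={name="class", caslib="casuser"},
      target="Sex",
      inputs={"Height", "Weight", "Age"},
      crit="GINI",            /* split criterion for each node                   */
      maxLevel=4,             /* cap the number of tree levels                   */
      leafSize=3,             /* minimum number of observations in each node     */
      nBins=20,               /* number of bins for numeric inputs               */
      prune=TRUE,             /* C4.5 pruning, since the target is nominal       */
      missing="USEINSEARCH",  /* let missing values take part in the split search */
      seed=12345;             /* reproducible random sampling                    */
RUN;
QUIT;
```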
Data Preparation
Load Data to CAS

Loads the sashelp.class dataset into the 'casuser' caslib for analysis.

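CAS mySession;                           /* start a CAS session (skip if one is already active) */
LIBNAME casuser CAS CASLIB="casuser";    /* map the casuser caslib to a SAS libref */

DATA casuser.class;                      /* this DATA step writes the table into CAS */
   SET sashelp.class;
RUN;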
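Before training, it can help to confirm that the table actually landed in CAS. The optional check below is a sketch that uses the table.tableInfo and table.columnInfo actions from the CAS table action set, which this page does not document. It assumes the CAS session and the casuser libref created above.

```sas
/* Sketch: confirm that casuser.class exists and inspect its columns. */
PROC CAS;
   table.tableInfo / caslib="casuser", name="class";           /* row and column counts        */
   table.columnInfo / table={name="class", caslib="casuser"};  /* variable names and data types */
RUN;
QUIT;
```

Sex and Name should appear as character columns; character variables such as Sex can then serve as a nominal target in the examples below.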

Examples

Trains a forest to predict 'Sex' using 'Height' and 'Weight' with default settings.

PROC CAS;
   decisionTree.forestTrain / table={name="class", caslib="casuser"}, target="Sex", inputs={"Height", "Weight"};
RUN;
QUIT;
Result:
Generates a forest model for Sex classification.
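The example above relies on the default ensemble settings. A variant that sets them explicitly might look like the following sketch. The parameters bootstrap, m, vote, seed, and casOut are taken from the Settings list; their values and the output table name forest_model are hypothetical choices for illustration.

```sas
/* Sketch: classification forest with explicit ensemble settings.       */
/* forest_model is a hypothetical table name; values are illustrative.  */
PROC CAS;
   decisionTree.forestTrain /
      table={name="class", caslib="casuser"},
      target="Sex",
      inputs={"Height", "Weight", "Age"},
      nTree=100,          /* number of trees in the ensemble                */
      bootstrap=0.8,      /* fraction of the data used for bootstrap sampling */
      m=2,                /* inputs considered for splitting at each node   */
      vote="PROB",        /* probability-based voting instead of MAJORITY   */
      seed=12345,         /* reproducible random sampling                   */
      casOut={name="forest_model", caslib="casuser", replace=TRUE};  /* model table */
RUN;
QUIT;
```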

Trains a regression forest with 'Weight' as the numeric target, creates 100 trees, sets a random seed, requests variable importance and the out-of-bag error, and saves the model as an analytic store.

PROC CAS;
   decisionTree.forestTrain /
      table={name="class", caslib="casuser"}, target="Weight", inputs={"Height", "Age"},
      nTree=100, seed=12345, varImp=TRUE, oob=TRUE,
      saveState={name="forest_astore", caslib="casuser", replace=TRUE};
RUN;
QUIT;
Result:
Generates a regression forest, outputs variable importance and the out-of-bag (OOB) error, and saves the model to the 'forest_astore' table.
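The description also mentions isolation forests. A minimal sketch of that mode follows, under two loudly stated assumptions: that no target is needed when isolation=TRUE (isolation forests are unsupervised, but this page does not say how forestTrain treats target in this mode), and that sampleN, described above only as "the sample size", can be lowered below the 19 rows in casuser.class. The table name iforest_astore is hypothetical.

```sas
/* Sketch: isolation forest for anomaly detection on casuser.class.          */
/* ASSUMPTIONS: no target is required; sampleN may be set below the row count. */
PROC CAS;
   decisionTree.forestTrain /
      table={name="class", caslib="casuser"},
      inputs={"Height", "Weight", "Age"},
      isolation=TRUE,     /* train an isolation forest instead of a predictive forest */
      nTree=100,          /* number of trees                                           */
      sampleN=10,         /* sample size, kept below the 19 rows in the table          */
      seed=12345,         /* reproducible random sampling                              */
      saveState={name="iforest_astore", caslib="casuser", replace=TRUE};  /* hypothetical name */
RUN;
QUIT;
```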
