decisionTree

forestTrain

Description

The forestTrain action trains a forest model, which is an ensemble of decision trees used for classification, regression, or isolation forest tasks. This action requires a SAS Visual Data Mining and Machine Learning license. It provides options for bootstrap sampling, various splitting criteria (CHAID, GINI, etc.), pruning (C4.5 or cost-complexity), and handling missing values. It can generate SAS score code, computed variables, and save the model as an analytic store (aStore).

Settings
Parameter Description
alpha Specifies the value to use for minimal cost-complexity pruning for regression trees.
applyRowOrder Specifies that the action uses a prespecified row ordering. This requires using orderby and groupby in a preliminary table.partition call.
attributes Specifies temporary attributes, such as a format, to apply to input variables.
binOrder When set to True (default), the bin order is preserved for numeric variables.
bootstrap Specifies the fraction of the data for the bootstrap sample. Range (0-1].
casOut Specifies the table to store the decision tree model in.
cfLev Specifies the aggressiveness of tree pruning according to the C4.5 algorithm.
code Requests that the action produce SAS score code.
codeInteractions Requests that the action produce SAS score code to create variables encoding interactions.
crit Specifies the split criterion for each tree node (e.g., GINI, CHAID, VARIANCE).
encodeName Specifies whether to encode the variable names such as predicted probabilities of a binary or nominal target in the generated casout table.
event Specifies the event values of the target variable for use with eventFreq.
eventFreq Specifies the frequency for each corresponding event in the event parameter (useful for rare-event sampling).
freq Specifies a numeric variable that contains the frequency of occurrence of each observation.
greedy When set to True (default), a greedy/exhaustive search is used. False uses a fast clustering-based algorithm.
includeMissing By default, observations with missing values are included. If False, they are ignored during scoring.
inputs Specifies the input variables to use in the analysis.
isolation Specifies training an isolation forest (default False).
leafSize Specifies the minimum number of observations on each node (default 5).
loh Specifies the number of variables to split with when using the LOH method.
m Specifies the number of input variables to consider for splitting on a node.
maxBranch Specifies the maximum number of children (branches) allowed for each level of the tree (default 2).
maxLevel Specifies the maximum number of tree levels (default 6).
mergeBin When set to True (default), merges bins where the largest value matches the lowest value of the neighbor.
minUseInSearch Specifies a threshold for utilizing missing values in the split search when missing='USEINSEARCH'.
missing Specifies the missing policy ('MACSMALL' or 'USEINSEARCH').
modelId Specifies the model ID variable name to use when generating SAS score code.
nBins Specifies the number of bins to use for numeric variables (default 50).
nBinsTarget Specifies the number of bins to use for a numerical target variable.
nominals Specifies the nominal input variables to use in the analysis.
nominalSearch Specifies the method for finding a split on a nominal input (e.g., handling='ENHANCED').
nTree Specifies the number of trees to create (default 50).
oob When set to True, specifies that the out-of-bag error is computed.
prune When set to True, applies a C4.5 pruning method or minimal cost-complexity pruning.
quantileBin Specifies bin boundaries at quantiles of numerical inputs instead of bins of equal width (default True).
rbaImp Specifies variable importance using the random branch assignments (RBA) method.
sampleN Specifies the sample size (default 100).
saveState Specifies the table to store the generated aStore model.
seed Specifies the seed for the random number generator.
table Specifies the settings for the input table.
target Specifies the target or response variable for training.
varImp Specifies whether the variable importance information is generated.
varIntImp Requests variable interaction importance and specifies the maximum degree of interaction.
vote Specifies the vote strategy for classification ('MAJORITY' or 'PROB').
weight Specifies a numeric variable that contains the weight of each observation.
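As an illustration of how several of the settings above combine in a single call, here is a minimal sketch. It assumes the casuser.class table from the Data Preparation step is already loaded; the specific values (100 trees, a 0.6 bootstrap fraction, and so on) are arbitrary choices for demonstration, not recommendations.

```sas
PROC CAS;
   decisionTree.forestTrain /
      table={name="class", caslib="casuser"},
      target="Sex",
      inputs={"Height", "Weight", "Age"},
      nTree=100,        /* number of trees to grow (default 50) */
      bootstrap=0.6,    /* each tree sees 60% of the rows; must be in (0, 1] */
      m=2,              /* inputs considered for splitting at each node */
      maxLevel=10,      /* maximum number of tree levels (default 6) */
      leafSize=3,       /* minimum observations per node (default 5) */
      vote="PROB",      /* vote strategy for classification */
      oob=true,         /* compute the out-of-bag error */
      seed=12345;       /* fixed seed for the random number generator */
RUN;
QUIT;
```

Fixing seed is what makes repeated runs comparable, since the bootstrap samples and the inputs considered at each node are drawn at random.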
Data Preparation
Load Data to CAS

Loads the sashelp.class dataset into the 'casuser' caslib for analysis.

/* Assumes an active CAS session; the libref maps to the casuser caslib. */
LIBNAME casuser CAS caslib="casuser";

DATA casuser.class;
   SET sashelp.class;
RUN;

Examples

Trains a forest to predict 'Sex' using 'Height' and 'Weight' with default settings.

SAS® / CAS Code Code awaiting community validation
PROC CAS;
   /* Classification forest with default settings */
   decisionTree.forestTrain /
      table={name="class", caslib="casuser"},
      target="Sex",
      inputs={"Height", "Weight"};
RUN;
QUIT;

Result:
Generates a forest model for Sex classification.
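To reuse a trained forest, the model can be written to a CAS table with casOut and then applied to data. The following is a sketch under assumptions rather than a validated recipe: it uses forestScore, the companion scoring action in the decisionTree action set, and the table names forest_model and class_scored are hypothetical.

```sas
PROC CAS;
   /* Train and keep the model in a CAS table (name is hypothetical) */
   decisionTree.forestTrain /
      table={name="class", caslib="casuser"},
      target="Sex",
      inputs={"Height", "Weight"},
      seed=12345,
      casOut={name="forest_model", caslib="casuser", replace=true};

   /* Score the training table with the stored model */
   decisionTree.forestScore /
      table={name="class", caslib="casuser"},
      modelTable={name="forest_model", caslib="casuser"},
      casOut={name="class_scored", caslib="casuser", replace=true},
      copyVars={"Name", "Sex"};
RUN;
QUIT;
```

Scoring the training data only demonstrates the mechanics; for an honest error estimate, score a held-out table or request oob=true during training.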

Trains a forest on 'Weight' (numeric target), creates 100 trees, sets the seed, requests variable importance and the out-of-bag error, and saves the model as an analytic store.

SAS® / CAS Code Code awaiting community validation
PROC CAS;
   /* Regression forest saved as an analytic store */
   decisionTree.forestTrain /
      table={name="class", caslib="casuser"},
      target="Weight",
      inputs={"Height", "Age"},
      nTree=100,
      seed=12345,
      varImp=true,
      oob=true,
      saveState={name="forest_astore", caslib="casuser", replace=true};
RUN;
QUIT;

Result:
Generates a regression forest, outputs variable importance and the OOB error, and saves the 'forest_astore' table.
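The saved analytic store can then be used for scoring through the astore action set. This is a minimal sketch that assumes the 'forest_astore' table from the example above exists in casuser; the output table name class_astore_scored is hypothetical.

```sas
PROC CAS;
   loadactionset "astore";

   /* Apply the analytic store saved by forestTrain to the input table */
   astore.score /
      table={name="class", caslib="casuser"},
      rstore={name="forest_astore", caslib="casuser"},
      out={name="class_astore_scored", caslib="casuser", replace=true};
RUN;
QUIT;
```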
