The forestTrain action trains a forest model, which is an ensemble of decision trees used for classification, regression, or isolation forest tasks. This action requires a SAS Visual Data Mining and Machine Learning license. It provides options for bootstrap sampling, various splitting criteria (CHAID, GINI, etc.), pruning (C4.5 or cost-complexity), and handling missing values. It can generate SAS score code and computed variables, and it can save the model as an analytic store (aStore).
| Parameter | Description |
|---|---|
| alpha | Specifies the value to use for minimal cost-complexity pruning for regression trees. |
| applyRowOrder | Specifies that the action uses a prespecified row ordering. This requires a preliminary table.partition call that uses orderby and groupby. |
| attributes | Specifies temporary attributes, such as a format, to apply to input variables. |
| binOrder | When set to True (default), the bin order is preserved for numeric variables. |
| bootstrap | Specifies the fraction of the data to use for the bootstrap sample. Range: (0, 1]. |
| casOut | Specifies the output table in which to store the trained forest model. |
| cfLev | Specifies the aggressiveness of tree pruning according to the C4.5 algorithm. |
| code | Requests that the action produce SAS score code. |
| codeInteractions | Requests that the action produce SAS score code to create variables encoding interactions. |
| crit | Specifies the split criterion for each tree node (e.g., GINI, CHAID, VARIANCE). |
| encodeName | Specifies whether to encode variable names, such as the names of the predicted probabilities for a binary or nominal target, in the generated casOut table. |
| event | Specifies the event values of the target variable for use with eventFreq. |
| eventFreq | Specifies the frequency for each corresponding event in the event parameter (useful for rare-event sampling). |
| freq | Specifies a numeric variable that contains the frequency of occurrence of each observation. |
| greedy | When set to True (default), a greedy/exhaustive search is used. False uses a fast clustering-based algorithm. |
| includeMissing | By default, observations with missing values are included. If False, they are ignored during scoring. |
| inputs | Specifies the input variables to use in the analysis. |
| isolation | Specifies training an isolation forest (default False). |
| leafSize | Specifies the minimum number of observations on each node (default 5). |
| loh | Specifies the number of variables to split with when using the LOH method. |
| m | Specifies the number of input variables to consider for splitting on a node. |
| maxBranch | Specifies the maximum number of children (branches) allowed for each level of the tree (default 2). |
| maxLevel | Specifies the maximum number of tree levels (default 6). |
| mergeBin | When set to True (default), merges adjacent bins when the largest value in one bin equals the lowest value in its neighbor. |
| minUseInSearch | Specifies a threshold for utilizing missing values in the split search when missing='USEINSEARCH'. |
| missing | Specifies the policy for handling missing values ('MACSMALL' or 'USEINSEARCH'). |
| modelId | Specifies the model ID variable name to use when generating SAS score code. |
| nBins | Specifies the number of bins to use for numeric variables (default 50). |
| nBinsTarget | Specifies the number of bins to use for a numerical target variable. |
| nominals | Specifies the nominal input variables to use in the analysis. |
| nominalSearch | Specifies the method for finding a split on a nominal input (e.g., handling='ENHANCED'). |
| nTree | Specifies the number of trees to create (default 50). |
| oob | When set to True, specifies that the out-of-bag error is computed. |
| prune | When set to True, prunes the trees by using either C4.5 pruning (see cfLev) or minimal cost-complexity pruning (see alpha). |
| quantileBin | Specifies bin boundaries at quantiles of numerical inputs instead of bins of equal width (default True). |
| rbaImp | Specifies variable importance using the random branch assignments (RBA) method. |
| sampleN | Specifies the sample size (default 100). |
| saveState | Specifies the table in which to store the generated analytic store (aStore) model. |
| seed | Specifies the seed for the random number generator. |
| table | Specifies the settings for the input table. |
| target | Specifies the target or response variable for training. |
| varImp | Specifies whether the variable importance information is generated. |
| varIntImp | Requests variable interaction importance and specifies the maximum degree of interaction. |
| vote | Specifies the vote strategy for classification ('MAJORITY' or 'PROB'). |
| weight | Specifies a numeric variable that contains the weight of each observation. |
Loads the sashelp.class dataset into the 'casuser' caslib for analysis.
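```sas
/* Assumes an active CAS session; the libref maps to the casuser caslib. */
libname casuser cas caslib="casuser";

/* Copy sashelp.class into an in-memory CAS table. */
data casuser.class;
    set sashelp.class;
run;
```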
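
Before training, it can help to confirm that the table actually landed in CAS. The following is a minimal sketch, assuming an active CAS session and that the load step created `class` in the `casuser` caslib; `table.tableInfo` and `table.columnInfo` belong to the table action set and are not part of forestTrain itself.

```sas
proc cas;
    /* Report row and column counts for casuser.class; an error here
       suggests the load step did not create the table. */
    table.tableInfo / caslib="casuser" name="class";

    /* Show column names and types, e.g. to confirm that Sex is character. */
    table.columnInfo / table={name="class", caslib="casuser"};
run;
quit;
```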
Trains a forest to predict 'Sex' using 'Height' and 'Weight' with default settings.
```sas
proc cas;
    /* Classify Sex from Height and Weight; all other options use defaults. */
    decisionTree.forestTrain /
        table={name="class", caslib="casuser"}
        target="Sex" inputs={"Height", "Weight"};
run;
```
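
As a variation on the default call, the sketch below sets several of the tuning parameters from the parameter table explicitly and stores the trained model through `casOut`. The specific values are illustrative assumptions rather than recommendations, `forest_model` is a hypothetical output table name, and the explicit `loadactionset` call is included only in case the action set is not already loaded in the session.

```sas
proc cas;
    /* Load the action set explicitly (assumption: it may not be loaded yet). */
    loadactionset "decisionTree";

    /* Illustrative settings only:
       nTree     - more trees than the default of 50
       m         - number of inputs considered at each split
       bootstrap - fraction of rows sampled per tree, in (0, 1]
       maxLevel  - deeper than the default of 6
       leafSize  - smaller than the default of 5 (the table has few rows)
       crit      - GINI split criterion for the nominal target */
    decisionTree.forestTrain /
        table={name="class", caslib="casuser"}
        target="Sex"
        inputs={"Height", "Weight", "Age"}
        nTree=200
        m=2
        bootstrap=0.7
        maxLevel=8
        leafSize=3
        crit="GINI"
        oob=TRUE
        varImp=TRUE
        seed=2024
        casOut={name="forest_model", caslib="casuser", replace=TRUE};
run;
quit;
```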
Trains a forest on 'Weight' (numeric target), requests variable importance and the out-of-bag error, sets the seed, creates 100 trees, and saves the model as an analytic store.
```sas
proc cas;
    /* Regression forest: 100 trees, fixed seed, OOB error, variable importance. */
    decisionTree.forestTrain /
        table={name="class", caslib="casuser"}
        target="Weight" inputs={"Height", "Age"}
        nTree=100 seed=12345 varImp=TRUE oob=TRUE
        saveState={name="forest_astore", caslib="casuser", replace=TRUE};
run;
```
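
Once the analytic store exists, it can be used to score data. The sketch below assumes the `forest_astore` table from the preceding example is still in the `casuser` caslib and uses the `astore.score` action, which belongs to a separate action set that this page does not otherwise cover. The output table name `class_scored` is hypothetical, and the training table is scored here only to keep the example self-contained.

```sas
proc cas;
    /* The astore action set provides scoring for analytic stores. */
    loadactionset "astore";

    /* Score casuser.class with the saved forest and keep two
       identifying columns alongside the predictions. */
    astore.score /
        table={name="class", caslib="casuser"}
        rstore={name="forest_astore", caslib="casuser"}
        copyVars={"Name", "Weight"}
        out={name="class_scored", caslib="casuser", replace=TRUE};

    /* Preview the first few scored rows. */
    table.fetch / table={name="class_scored", caslib="casuser"} to=5;
run;
quit;
```

In practice the scored table would normally be new or held-out data with the same input columns (`Height` and `Age`) that the forest was trained on.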