Trains a gradient boosting tree. This action requires a SAS Visual Data Mining and Machine Learning license.
| Parameter | Description |
|---|---|
| applyRowOrder | specifies that the action use a prespecified row ordering. This requires the orderBy and groupBy parameters in a preceding table.partition action call. Alias: reproducibleRowOrder. Default: FALSE. |
| attributes | specifies temporary attributes, such as a format, to apply to input variables. For more information about specifying the attributes parameter, see the common casinvardesc parameter. Aliases: attribute, attrs, attr, varAttrs. Subparameters: format, formattedLength, label, name*, nfd, nfl. |
| auxData | specifies a variable for transfer learning that indicates which observations are from an auxiliary source. A value of 0 indicates a traditional training observation. Other values indicate auxiliary data. |
| binOrder | by default, the bin order is preserved for numeric variables. When set to False, the bin order is ignored for numeric variables. Default: TRUE. |
| casOut | specifies the table in which to store the trained model. When not specified, a random name is generated. For more information about specifying the casOut parameter, see the common casouttable parameter. Subparameters: caslib, compress, indexVars, label, lifetime, maxMemSize, memoryFormat, name, promote, replace, replication, tableRedistUpPolicy, threadBlockSize, timeStamp, where. |
| code | requests that the action produce SAS score code. The subparameters control where the code is stored and how it is formatted. For more information about specifying the code parameter, see the common codegen parameter. Subparameters: casOut (see casouttable), comment, fmtWdth, indentSize, labelId, lineSize, noTrim, tabForm. |
| codeInteractions | requests that the action produce SAS score code that creates variables encoding interactions. You must also request variable interactions of degree 2 or higher (see varIntImp). Subparameters: casOut (see casouttable), comment, fmtWdth, indentSize, labelId, lineSize, noTrim, tabForm. |
| distribution | specifies the distribution for the gradient boosting model. Default: BINARY. Values: BINARY (binary classification), GAUSSIAN (regression), MULTINOMIAL (classification with more than two classes), POISSON (Poisson-distributed targets, such as counts), TWEEDIE (Tweedie-distributed targets). |
| earlyStop | specifies early stopping criteria. Subparameters: metric (ASE, LOGLOSS, MCR), minimum (Alias: smallest, Default: FALSE), stagnation (Default: 0, Minimum value: 0), threshold (Default: 0, Minimum value: 0), thresholdIter (Default: 0, Minimum value: 0), tolerance (Default: 0, Minimum value: 0). |
| encodeName | specifies whether to encode the names of generated variables, such as the predicted probabilities of a binary or nominal target, in the output table. When names are encoded, the predicted probabilities use the prefix P_ instead of _DT_P_. Default: FALSE. |
| fcmpEvalMetric | specifies the FCMP evaluation metric for gradient boosting tree models. Alias: fcmpEvalFunc. |
| freq | specifies a numeric variable that contains the frequency of occurrence of each observation. |
| greedy | by default, a greedy search or exhaustive search is used to determine the best split for each variable of each tree node. When set to False, a fast and efficient algorithm that is based on clustering is applied. Setting this parameter to False is recommended for variables with high cardinality. Default: TRUE. |
| includeMissing | by default, observations with missing values are included. When set to False, observations with missing values for the variables used in the tree model are ignored when scoring. Default: TRUE. |
| initPred | Default: 0. |
| inputs | specifies the input variables to use in the analysis. For more information about specifying the inputs parameter, see the common casinvardesc parameter. Alias: input. Subparameters: format, formattedLength, label, name*, nfd, nfl. |
| lasso | specifies the L1 norm regularization on prediction. The value must be greater than or equal to zero. Default: 0. Minimum value: 0. |
| leafSize | specifies the minimum number of observations in each node. Default: 5. Minimum value: 1. |
| learningRate | specifies the learning rate of each tree. Default: 0.1. Range: (0–1]. |
| logLevel | Default: 0. Minimum value: 0. |
| m | specifies the number of input variables to consider for splitting on a node. The variables are selected at random from the input variables for each tree. By default, the forest action uses the square root of the number of input variables, rounded up to the nearest integer; gradient boosting uses all input variables. Minimum value: 1. |
| maxBranch | specifies the maximum number of children (branches) allowed for each level of the tree. Default: 2. Minimum value: 1. |
| maxLevel | specifies the maximum number of levels (depth) of each tree. Default: 5. Minimum value: 1. |
| mergeBin | by default, when the largest value in one bin matches the lowest value in a neighboring bin, the values are merged into the lower bin. When set to False, the action does not try to merge bins. Default: TRUE. |
| minHessian | Default: 0. Minimum value: 0. |
| minUseInSearch | specifies a threshold for utilizing missing values in the split search when the missing parameter is set to USEINSEARCH. If the number of observations in which the splitting variable has missing values in a node is greater than or equal to the specified value, then the action initiates the USEINSEARCH policy. Otherwise, the missing values are assigned to a popular branch. Default: 1. |
| missing | specifies the missing policy to handle missing values. Default: USEINSEARCH. Values: MACSMALL (treats missing values for numeric variables as the smallest machine value and for nominal variables as a separate level), USEINSEARCH (incorporates missing values in the calculation of the worth of a splitting rule). |
| modelId | specifies the model ID variable name to use when generating SAS score code. By default, DT_ is prefixed to the target variable name. |
| modelTable | specifies the table containing the model. For more information about specifying the modelTable parameter, see the common castable parameter. Subparameters: caslib, computedOnDemand, computedVars, computedVarsProgram, dataSourceOptions, groupBy, groupByMode, importOptions, name*, orderBy, singlePass, vars, where, whereTable. |
| monoDec | specifies interval inputs whose prediction should not increase when the input value increases. Perfect compliance is not guaranteed. Aliases: monotoneDecrease, monotoneDec, Dec. |
| monoInc | specifies interval inputs whose prediction should not decrease when the input value increases. Perfect compliance is not guaranteed. Aliases: monotoneIncrease, monotoneInc, Inc. |
| nBins | specifies the number of bins to use for numeric variables in the calculation of the decision tree. Default: 50. Minimum value: 1. |
| nominalHandling | Values: CLASSIC, ENHANCED. |
| nominals | specifies the nominal input variables to use in the analysis. For more information about specifying the nominals parameter, see the common casinvardesc parameter. Alias: nominal. Subparameters: format, formattedLength, label, name*, nfd, nfl. |
| nominalSearch | specifies the method for finding a split on a nominal input. Alias: nomSearch. Subparameters: handling (CLASSIC, ENHANCED), maxCategories (Aliases: maxCats, maxLevels, maxValues, cluster, minCardCluster. Default: 128, Minimum value: 0), shrinkage (Default: 10, Minimum value: 0), sort (Alias: minCardSort. Default: 10, Minimum value: 0), sortBy (COUNT, TARGET). |
| nTree | specifies the number of trees to create. Alias: nTrees. Default: 50. Minimum value: 1. |
| offset | specifies an offset variable to use with distribution=POISSON or TWEEDIE. |
| phi | specifies the scale (dispersion) parameter of the Tweedie distribution. Alias: scale. Minimum value (exclusive): 0. |
| power | specifies the power parameter of the Tweedie distribution. Default: 1.5. Range: (1, 2). |
| quantileBin | specifies bin boundaries at quantiles of numerical inputs instead of bins of equal width. Aliases: qbin, qtbin. Default: TRUE. |
| ridge | specifies the L2 norm regularization on prediction. The value must be greater than or equal to zero. Default: 1. Minimum value: 0. |
| saveState | specifies the table to store the generated aStore model. For more information about specifying the saveState parameter, see the common casouttable parameter. Subparameters: caslib, compress, indexVars, label, lifetime, maxMemSize, memoryFormat, name, promote, replace, replication, tableRedistUpPolicy, threadBlockSize, timeStamp, where. |
| seed | specifies the seed for the random number generator. By default, the random number stream is based on the computer clock. Negative values also result in random number streams based on the computer clock. If you want a reproducible random number sequence between runs, specify a value that is greater than zero. Default: 0. Range: 0–MACINT. |
| singular | specifies a small value to avoid zero in division. Default: 1E-12. Minimum value: 0. |
| subSampleRate | specifies the fraction of the data to use for building each tree. Aliases: subsample, samplingRate. Default: 0.5. Range: (0–1]. |
| table | specifies the input table. Long form: table={name="table-name"}. Shortcut form: table="table-name". Subparameters: caslib, computedOnDemand, computedVars, computedVarsProgram, dataSourceOptions, importOptions, name*, singlePass, vars, where, whereTable. |
| target | specifies the target or response variable for training. If the variable is numeric, is not listed in the nominals parameter, and nbinstarget is not specified, a regression tree is trained. |
| transLearnBurn | specifies, for transfer learning, the number of trees to create before down-weighting of auxiliary observations begins. Default: 0. Minimum value: 0. |
| transLearnShrink | specifies, for transfer learning, how much to down-weight unproductive auxiliary data. Default: 0.9. Range: 0–1. |
| transLearnTrim | specifies, for transfer learning, the fraction of the distribution of gradients on the training data beyond which auxiliary observations are down-weighted. Default: 0.01. Range: (0–0.5]. |
| validTable | specifies the validation table. Long form: validTable={name="table-name"}. Shortcut form: validTable="table-name". Subparameters: caslib, computedOnDemand, computedVars, computedVarsProgram, dataSourceOptions, importOptions, name*, singlePass, vars, where, whereTable. |
| varImp | specifies whether the variable importance information is generated. The importance value is determined by the total Gini reduction. Default: FALSE. |
| varIntImp | requests variable interaction importance and specifies the maximum degree of interaction. Default: 1. Range: 0–3. |
| weight | specifies a numeric variable that contains the weight of each observation. |
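When the action is called from Python, the parameters above become keyword arguments, and nested parameters such as table and casOut become dictionaries. The sketch below only assembles and range-checks those arguments against the documented defaults and ranges, so it runs without a CAS server. The helper name `build_gbtree_params` and the table and column names (`hmeq_train`, `BAD`, `LOAN`, and so on) are hypothetical.

```python
from typing import Any, Dict, List, Optional


def build_gbtree_params(
    table: str,
    target: str,
    inputs: List[str],
    nominals: Optional[List[str]] = None,
    n_tree: int = 50,
    learning_rate: float = 0.1,
    sub_sample_rate: float = 0.5,
    max_level: int = 5,
    seed: int = 0,
    model_name: str = "gbt_model",
) -> Dict[str, Any]:
    """Assemble keyword arguments for gbtreeTrain (hypothetical helper).

    Only builds a dict and checks documented ranges; no server is contacted.
    """
    # These checks mirror the Default/Range/Minimum entries in the parameter table.
    if n_tree < 1:
        raise ValueError("nTree must be at least 1")
    if not 0 < learning_rate <= 1:
        raise ValueError("learningRate must be in (0, 1]")
    if not 0 < sub_sample_rate <= 1:
        raise ValueError("subSampleRate must be in (0, 1]")
    if max_level < 1:
        raise ValueError("maxLevel must be at least 1")
    if seed < 0:
        raise ValueError("seed must be 0 or greater")

    return {
        "table": {"name": table},  # long form; the shortcut table="..." is also accepted
        "target": target,
        "inputs": list(inputs),
        "nominals": list(nominals or []),
        "nTree": n_tree,
        "learningRate": learning_rate,
        "subSampleRate": sub_sample_rate,
        "maxLevel": max_level,
        "seed": seed,  # a value greater than zero gives reproducible runs
        "casOut": {"name": model_name, "replace": True},
    }


if __name__ == "__main__":
    params = build_gbtree_params(
        table="hmeq_train",
        target="BAD",
        inputs=["LOAN", "DEBTINC", "REASON", "JOB"],
        # The numeric target BAD is listed as nominal so that a classifier,
        # not a regression tree, is trained.
        nominals=["BAD", "REASON", "JOB"],
        n_tree=100,
        seed=12345,
    )
    print(params)
```

Assuming the SWAT client and an open connection `conn`, the dictionary could then be passed as `conn.decisionTree.gbtreeTrain(**params)` after `conn.loadactionset("decisionTree")`; adjust to your own client setup.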
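The ridge, lasso, and learningRate parameters all act on the prediction that each tree adds. The table does not give the exact formula SAS uses, so the sketch below swaps in the widely published second-order formulation popularized by XGBoost: a leaf with gradient sum G and Hessian sum H receives the value -η·S(G, λ₁)/(H + λ₂), where S soft-thresholds G by the L1 penalty λ₁ (lasso), λ₂ is the L2 penalty (ridge), and η is the learning rate. Treat it as an illustration of the technique, not as the SAS implementation. The function names are hypothetical, and each "tree" here is a single leaf so the arithmetic stays visible.

```python
from typing import List


def leaf_value(grad_sum: float, hess_sum: float, ridge: float = 1.0,
               lasso: float = 0.0, learning_rate: float = 0.1) -> float:
    """XGBoost-style regularized leaf value, scaled by the learning rate."""
    # L1 (lasso): soft-threshold the gradient sum toward zero.
    if grad_sum > lasso:
        g = grad_sum - lasso
    elif grad_sum < -lasso:
        g = grad_sum + lasso
    else:
        g = 0.0
    # L2 (ridge): added to the Hessian sum, which shrinks the leaf value.
    return -learning_rate * g / (hess_sum + ridge)


def boost_single_leaf(ys: List[float], n_tree: int, ridge: float = 1.0,
                      lasso: float = 0.0, learning_rate: float = 0.1) -> float:
    """Boost one-leaf trees under squared-error loss; return the final prediction."""
    f = 0.0  # start from 0 (how initPred sets this in SAS is not documented here)
    for _ in range(n_tree):
        # For loss 0.5 * (y - f)^2, each observation has gradient f - y and Hessian 1.
        grad_sum = sum(f - y for y in ys)
        hess_sum = float(len(ys))
        f += leaf_value(grad_sum, hess_sum, ridge, lasso, learning_rate)
    return f


if __name__ == "__main__":
    # One tree, no shrinkage, no ridge: jumps straight to the mean.
    print(boost_single_leaf([1.0, 2.0, 3.0], n_tree=1, ridge=0.0, learning_rate=1.0))
    # ridge=1 shrinks the same step toward zero.
    print(boost_single_leaf([1.0, 2.0, 3.0], n_tree=1, ridge=1.0, learning_rate=1.0))
```

With ridge set to 0 and a learning rate of 1, a single step reaches the mean of the targets (2.0); ridge=1 shrinks that step to 1.5. A learning rate of 0.1 reaches the same point only gradually over many trees, which is why nTree and learningRate are usually tuned together.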
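The nBins and quantileBin parameters control how numeric inputs are bucketed before split search. The sketch below contrasts equal-width cut points with quantile cut points on a skewed variable. The function names are hypothetical, and the nearest-rank quantile rule is an assumption: the table does not state which quantile definition SAS uses.

```python
import math
from typing import List


def equal_width_edges(values: List[float], n_bins: int) -> List[float]:
    """Interior cut points that split [min, max] into n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * k for k in range(1, n_bins)]


def quantile_edges(values: List[float], n_bins: int) -> List[float]:
    """Interior cut points at the k/n_bins quantiles (nearest-rank rule, an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    edges: List[float] = []
    for k in range(1, n_bins):
        idx = max(math.ceil(k * n / n_bins) - 1, 0)
        edge = ordered[idx]
        # Tied values can yield the same cut point twice; keep each boundary once.
        if not edges or edge > edges[-1]:
            edges.append(edge)
    return edges


if __name__ == "__main__":
    skewed = [1, 1, 1, 1, 2, 3, 4, 100]
    print(equal_width_edges(skewed, 4))  # [25.75, 50.5, 75.25]: 7 of 8 values land in the first bin
    print(quantile_edges(skewed, 4))     # [1, 3]
```

With equal widths, a single outlier stretches the range so that almost every observation shares one bin and the split search has little to choose from; quantile boundaries follow the data instead.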
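The earlyStop subparameters are listed without descriptions. This sketch assumes that stagnation means "stop after this many consecutive trees without improvement in the monitored metric", that 0 disables the check, and that tolerance is the minimum decrease that counts as an improvement. These are common conventions, not confirmed by the source. All three metrics (ASE, LOGLOSS, MCR) are lower-is-better. The function name is hypothetical.

```python
from typing import List, Optional


def stop_iteration(metric_history: List[float], stagnation: int,
                   tolerance: float = 0.0) -> Optional[int]:
    """Return the 1-based tree count at which training would stop, or None.

    metric_history[i] is the (lower-is-better) metric after tree i + 1.
    """
    if stagnation <= 0:
        return None  # assumed: 0 disables the stagnation check
    best = float("inf")
    since_best = 0
    for i, value in enumerate(metric_history, start=1):
        if value < best - tolerance:
            best = value  # a real improvement resets the counter
            since_best = 0
        else:
            since_best += 1
            if since_best >= stagnation:
                return i
    return None


if __name__ == "__main__":
    history = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38]
    print(stop_iteration(history, stagnation=3))
```

In this history the metric bottoms out after tree 3, so with stagnation=3 the rule stops after tree 6.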
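For distribution=TWEEDIE, power values in (1, 2) describe a compound Poisson-gamma distribution, which allows exact zeros alongside positive continuous values (for example, insurance claim amounts). The sketch computes the standard Tweedie unit deviance for that power range. The deviance enters the log-likelihood divided by the dispersion phi, so phi does not change which mean mu minimizes it. The table does not state the exact loss SAS minimizes, so this is background rather than the action's implementation; the function name is hypothetical.

```python
def tweedie_deviance(y: float, mu: float, power: float = 1.5) -> float:
    """Unit deviance of the Tweedie distribution for 1 < power < 2."""
    if not 1 < power < 2:
        raise ValueError("power must be in (1, 2)")
    if y < 0 or mu <= 0:
        raise ValueError("need y >= 0 and mu > 0")
    p = power
    # d(y, mu) = 2 * [ y^(2-p)/((1-p)(2-p)) - y*mu^(1-p)/(1-p) + mu^(2-p)/(2-p) ]
    return 2.0 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )


if __name__ == "__main__":
    print(tweedie_deviance(0.0, 1.0))  # an exact zero is a valid observation
    print(tweedie_deviance(2.0, 2.0))  # zero (up to rounding) when mu equals y
```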