
gbtreeTrain

Description

Trains a gradient boosting tree. This action requires a SAS Visual Data Mining and Machine Learning license.
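The following is a minimal, generic sketch of gradient boosting for squared-error loss (the GAUSSIAN case), meant only to show how the nTree and learningRate parameters interact: each new tree is fit to the current residuals, and its output is scaled by the learning rate before being added to the model. It is not the action's implementation; the single-split stumps, the target-mean starting value, and all function names are hypothetical simplifications chosen for illustration.

```python
"""Toy gradient boosting for squared-error loss (illustration only, not SAS code)."""


def fit_stump(x, residuals):
    """Fit a one-split regression stump; return (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all x values identical: no split is possible
        mean = sum(residuals) / len(residuals)
        return max(x), mean, mean
    return best[1], best[2], best[3]


def stump_predict(stump, xi):
    t, lv, rv = stump
    return lv if xi <= t else rv


def train(x, y, n_tree=50, learning_rate=0.1):
    """Boost n_tree stumps; each stump's output is shrunk by learning_rate."""
    f0 = sum(y) / len(y)  # hypothetical starting value: the target mean
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_tree):
        # For squared error, the negative gradient equals the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump_predict(stump, xi) for pi, xi in zip(pred, x)]
    return {"f0": f0, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, xi):
    return model["f0"] + model["learning_rate"] * sum(
        stump_predict(s, xi) for s in model["stumps"]
    )


def mse(model, x, y):
    return sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
for n in (0, 5, 100):
    print(n, round(mse(train(x, y, n_tree=n), x, y), 4))
```

Lowering learningRate generally requires a larger nTree to reach the same training fit, which is the usual trade-off these two parameters control.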

decisionTree.gbtreeTrain <result=results> <status=rc> /
  applyRowOrder=TRUE | FALSE,
  attributes={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}},
  auxData="variable-name",
  binOrder=TRUE | FALSE,
  casOut={caslib="string", compress=TRUE | FALSE, indexVars={"variable-name-1" <, "variable-name-2", ...>}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR" | "INHERIT" | "STANDARD", name="table-name", promote=TRUE | FALSE, replace=TRUE | FALSE, replication=integer, tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1" <, "string-2", ...>}},
  code={casOut={caslib="string", compress=TRUE | FALSE, indexVars={"variable-name-1" <, "variable-name-2", ...>}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR" | "INHERIT" | "STANDARD", name="table-name", promote=TRUE | FALSE, replace=TRUE | FALSE, replication=integer, tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1" <, "string-2", ...>}},
    comment=TRUE | FALSE, fmtWdth=integer, indentSize=integer, labelId=integer, lineSize=integer, noTrim=TRUE | FALSE, tabForm=TRUE | FALSE},
  codeInteractions={casOut={caslib="string", compress=TRUE | FALSE, indexVars={"variable-name-1" <, "variable-name-2", ...>}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR" | "INHERIT" | "STANDARD", name="table-name", promote=TRUE | FALSE, replace=TRUE | FALSE, replication=integer, tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1" <, "string-2", ...>}},
    comment=TRUE | FALSE, fmtWdth=integer, indentSize=integer, labelId=integer, lineSize=integer, noTrim=TRUE | FALSE, tabForm=TRUE | FALSE},
  distribution="BINARY" | "GAUSSIAN" | "MULTINOMIAL" | "POISSON" | "TWEEDIE" | 64-bit-integer,
  earlyStop={metric="ASE" | "LOGLOSS" | "MCR", minimum=TRUE | FALSE, stagnation=64-bit-integer, threshold=double, thresholdIter=64-bit-integer, tolerance=double},
  encodeName=TRUE | FALSE,
  fcmpEvalMetric="string",
  freq="variable-name",
  greedy=TRUE | FALSE,
  includeMissing=TRUE | FALSE,
  initPred=double,
  inputs={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}},
  lasso=double,
  leafSize=integer,
  learningRate=double,
  logLevel=integer,
  m=integer,
  maxBranch=integer,
  maxLevel=integer,
  mergeBin=TRUE | FALSE,
  minHessian=double,
  minUseInSearch=integer,
  missing="MACSMALL" | "USEINSEARCH",
  modelId="string",
  modelTable={caslib="string",
    computedOnDemand=TRUE | FALSE,
    computedVars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}},
    computedVarsProgram="string",
    dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
    groupBy={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}},
    groupByMode="NOSORT" | "REDISTRIBUTE",
    importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
    name="table-name",
    orderBy={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}},
    singlePass=TRUE | FALSE,
    vars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}},
    where="where-expression",
    whereTable={casLib="string",
      dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
      importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
      name="table-name",
      vars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}},
      where="where-expression"}},
  monoDec={"string-1" <, "string-2", ...>},
  monoInc={"string-1" <, "string-2", ...>},
  nBins=integer,
  nominalHandling="CLASSIC" | "ENHANCED",
  nominals={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}},
  nominalSearch={handling="CLASSIC" | "ENHANCED", maxCategories=64-bit-integer, shrinkage=double, sort=64-bit-integer, sortBy="COUNT" | "TARGET"},
  nTree=integer,
  offset="variable-name",
  phi=double,
  power=double,
  quantileBin=TRUE | FALSE,
  ridge=double,
  saveState={caslib="string", compress=TRUE | FALSE, indexVars={"variable-name-1" <, "variable-name-2", ...>}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR" | "INHERIT" | "STANDARD", name="table-name", promote=TRUE | FALSE, replace=TRUE | FALSE, replication=integer, tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1" <, "string-2", ...>}},
  seed=double,
  singular=double,
  subSampleRate=double,
  table={caslib="string",
    computedOnDemand=TRUE | FALSE,
    computedVars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}},
    computedVarsProgram="string",
    dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
    importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
    name="table-name",
    singlePass=TRUE | FALSE,
    vars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}},
    where="where-expression",
    whereTable={casLib="string",
      dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
      importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
      name="table-name",
      vars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}},
      where="where-expression"}},
  target="variable-name",
  transLearnBurn=integer,
  transLearnShrink=double,
  transLearnTrim=double,
  validTable={caslib="string",
    computedOnDemand=TRUE | FALSE,
    computedVars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}},
    computedVarsProgram="string",
    dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
    importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
    name="table-name",
    singlePass=TRUE | FALSE,
    vars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}},
    where="where-expression",
    whereTable={casLib="string",
      dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
      importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
      name="table-name",
      vars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}},
      where="where-expression"}},
  varImp=TRUE | FALSE,
  varIntImp=integer,
  weight="variable-name";
Settings
Parameter    Description
applyRowOrder specifies whether the action uses a prespecified row ordering. This requires a preliminary table.partition action call that uses the orderBy and groupBy parameters. Alias: reproducibleRowOrder. Default: FALSE.
attributes specifies temporary attributes, such as a format, to apply to input variables. For more information about specifying the attributes parameter, see the common casinvardesc parameter. Aliases: attribute, attrs, attr, varAttrs. Subparameters: format, formattedLength, label, name*, nfd, nfl.
auxData specifies a variable for transfer learning that indicates which observations are from an auxiliary source. A value of 0 indicates a traditional training observation. Other values indicate auxiliary data.
binOrder by default, the bin order is preserved for numeric variables. When set to False, the bin order is ignored for numeric variables. Default: TRUE.
casOut specifies the table to store the decision tree model in. When not specified, a random name is generated. For more information about specifying the casOut parameter, see the common casouttable parameter. Subparameters: caslib, compress, indexVars, label, lifetime, maxMemSize, memoryFormat, name, promote, replace, replication, tableRedistUpPolicy, threadBlockSize, timeStamp, where.
code requests that the action produce SAS score code. Its subparameters control the output table and the layout of the generated code. For more information about specifying the code parameter, see the common codegen parameter. Subparameters: casOut (see casouttable), comment, fmtWdth, indentSize, labelId, lineSize, noTrim, tabForm.
codeInteractions requests that the action produce SAS score code that creates variables encoding interactions. You must also request variable interactions of at least degree 2 (see varIntImp). Subparameters: casOut (see casouttable), comment, fmtWdth, indentSize, labelId, lineSize, noTrim, tabForm.
distribution specifies the distribution for the gradient boosting tree model. Default: BINARY. Values: BINARY (for binary classification), GAUSSIAN (for regression), MULTINOMIAL (for classification with more than two classes), POISSON (for the Poisson distribution), TWEEDIE (for the Tweedie distribution).
earlyStop specifies early stopping criteria. Subparameters: metric (ASE, LOGLOSS, MCR), minimum (Alias: smallest, Default: FALSE), stagnation (Default: 0, Minimum value: 0), threshold (Default: 0, Minimum value: 0), thresholdIter (Default: 0, Minimum value: 0), tolerance (Default: 0, Minimum value: 0).
encodeName specifies whether to encode variable names, such as the predicted probabilities of a binary or nominal target, in the generated casOut table. When enabled, the predicted probabilities are named with the prefix P_ instead of _DT_P_. Default: FALSE.
fcmpEvalMetric specifies the FCMP evaluation metric for gradient boosting tree models. Alias: fcmpEvalFunc.
freq specifies a numeric variable that contains the frequency of occurrence of each observation.
greedy by default, a greedy search or exhaustive search is used to determine the best split for each variable of each tree node. When set to False, a fast and efficient algorithm that is based on clustering is applied. Setting this parameter to False is recommended for variables with high cardinality. Default: TRUE.
includeMissing by default, observations with missing values are included. When set to False, observations with missing values for the variables used in the tree model are ignored when scoring. Default: TRUE.
initPred Default: 0.
inputs specifies the input variables to use in the analysis. For more information about specifying the inputs parameter, see the common casinvardesc parameter. Alias: input. Subparameters: format, formattedLength, label, name*, nfd, nfl.
lasso specifies the L1-norm regularization on the prediction. Default: 0. Minimum value: 0.
leafSize specifies the minimum number of observations in each node. Default: 5. Minimum value: 1.
learningRate specifies the learning rate of each tree. Default: 0.1. Range: (0–1].
logLevel Default: 0. Minimum value: 0.
m specifies the number of input variables to consider for splitting on a node. The variables are selected at random from the input variables for each tree. By default, the forest action uses the square root of the number of input variables, rounded up to the nearest integer; gradient boosting uses all of the input variables. Minimum value: 1.
maxBranch specifies the maximum number of children (branches) allowed for each level of the tree. Default: 2. Minimum value: 1.
maxLevel specifies the maximum number of levels in each tree. Default: 5. Minimum value: 1.
mergeBin by default, when the largest value in one bin matches the lowest value in a neighboring bin, the values are merged into the lower bin. When set to False, the action does not try to merge bins. Default: TRUE.
minHessian Default: 0. Minimum value: 0.
minUseInSearch specifies a threshold for using missing values in the split search when the missing parameter is set to USEINSEARCH. If, in a node, the number of observations with a missing value for the splitting variable is greater than or equal to this threshold, the action applies the USEINSEARCH policy. Otherwise, those missing values are assigned to a popular branch. Default: 1.
missing specifies the missing policy to handle missing values. Default: USEINSEARCH. Values: MACSMALL (treats missing values for numeric variables as the smallest machine value and for nominal variables as a separate level), USEINSEARCH (incorporates missing values in the calculation of the worth of a splitting rule).
modelId specifies the model ID variable name to use when generating SAS score code. By default, DT_ is prefixed to the target variable name.
modelTable specifies the table containing the model. For more information about specifying the modelTable parameter, see the common castable parameter. Subparameters: caslib, computedOnDemand, computedVars, computedVarsProgram, dataSourceOptions, groupBy, groupByMode, importOptions, name*, orderBy, singlePass, vars, where, whereTable.
monoDec specifies interval inputs whose prediction should not increase when the input value increases. Perfect compliance is not guaranteed. Aliases: monotoneDecrease, monotoneDec, Dec.
monoInc specifies interval inputs whose prediction should not decrease when the input value increases. Perfect compliance is not guaranteed. Aliases: monotoneIncrease, monotoneInc, Inc.
nBins specifies the number of bins to use for numeric variables in the calculation of the decision tree. Default: 50. Minimum value: 1.
nominalHandling Values: CLASSIC, ENHANCED.
nominals specifies the nominal input variables to use in the analysis. For more information about specifying the nominals parameter, see the common casinvardesc parameter. Alias: nominal. Subparameters: format, formattedLength, label, name*, nfd, nfl.
nominalSearch specifies the method for finding a split on a nominal input. Alias: nomSearch. Subparameters: handling (CLASSIC, ENHANCED), maxCategories (Aliases: maxCats, maxLevels, maxValues, cluster, minCardCluster. Default: 128, Minimum value: 0), shrinkage (Default: 10, Minimum value: 0), sort (Alias: minCardSort. Default: 10, Minimum value: 0), sortBy (COUNT, TARGET).
nTree specifies the number of trees to create. Alias: nTrees. Default: 50. Minimum value: 1.
offset specifies an offset variable to use with distribution=POISSON or TWEEDIE.
phi specifies the scale (dispersion) parameter of the Tweedie distribution. Alias: scale. Minimum value (exclusive): 0.
power specifies the power parameter of the Tweedie distribution. Default: 1.5. Range: (1, 2).
quantileBin specifies bin boundaries at quantiles of numerical inputs instead of bins of equal width. Aliases: qbin, qtbin. Default: TRUE.
ridge specifies the L2-norm regularization on the prediction. Default: 1. Minimum value: 0.
saveState specifies the table to store the generated aStore model. For more information about specifying the saveState parameter, see the common casouttable parameter. Subparameters: caslib, compress, indexVars, label, lifetime, maxMemSize, memoryFormat, name, promote, replace, replication, tableRedistUpPolicy, threadBlockSize, timeStamp, where.
seed specifies the seed for the random number generator. By default, the random number stream is based on the computer clock. Negative values also result in random number streams based on the computer clock. If you want a reproducible random number sequence between runs, specify a value that is greater than zero. Default: 0. Range: 0–MACINT.
singular specifies a small value used to avoid division by zero. Default: 1E-12. Minimum value: 0.
subSampleRate specifies the fraction of the data to use for building each tree. Aliases: subsample, samplingRate. Default: 0.5. Range: (0–1].
table specifies the settings for the input table. Long form: table={name="table-name"}. Shortcut form: table="table-name". Subparameters: caslib, computedOnDemand, computedVars, computedVarsProgram, dataSourceOptions, importOptions, name*, singlePass, vars, where, whereTable.
target specifies the target (response) variable for training. If the variable is numeric, is not listed in the nominals parameter, and nbinstarget= is not specified, a regression tree is trained.
transLearnBurn specifies, during transfer learning, the number of trees to create before down-weighting of auxiliary observations begins. Default: 0. Minimum value: 0.
transLearnShrink specifies, during transfer learning, how much to down-weight unproductive auxiliary data. Default: 0.9. Range: 0–1.
transLearnTrim specifies, during transfer learning, the fraction of the distribution of gradients on the training data beyond which auxiliary observations are down-weighted. Default: 0.01. Range: (0–0.5].
validTable specifies the settings for the validation table. Long form: validTable={name="table-name"}. Shortcut form: validTable="table-name". Subparameters: caslib, computedOnDemand, computedVars, computedVarsProgram, dataSourceOptions, importOptions, name*, singlePass, vars, where, whereTable.
varImp specifies whether the variable importance information is generated. The importance value is determined by the total Gini reduction. Default: FALSE.
varIntImp requests variable interaction importance and specifies the maximum degree of interaction. Default: 1. Range: 0–3.
weight specifies a numeric variable that contains the weight of each observation.
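The earlyStop subparameters are listed above without their exact semantics. The sketch below shows one common reading of stagnation and tolerance, assumed rather than taken from the SAS documentation: training stops once the validation metric (for example ASE) has failed to improve on its best value by more than tolerance for stagnation consecutive trees, and stagnation=0 disables the check. The function name and its return value are hypothetical.

```python
def early_stop(valid_metric, stagnation=0, tolerance=0.0):
    """Return (trees_built, best_tree) for a per-tree validation metric.

    Assumed rule (not confirmed by the source): stop after `stagnation`
    consecutive trees fail to beat the best metric by more than `tolerance`;
    a stagnation of 0 never stops early. Lower metric values are better.
    """
    best = float("inf")
    best_tree = 0
    since_best = 0
    for tree, value in enumerate(valid_metric, start=1):
        if value < best - tolerance:
            best, best_tree, since_best = value, tree, 0
        else:
            since_best += 1
            if stagnation > 0 and since_best >= stagnation:
                return tree, best_tree
    return len(valid_metric), best_tree


# Hypothetical validation ASE recorded after each tree.
ase = [0.30, 0.25, 0.22, 0.21, 0.215, 0.212, 0.213, 0.214]
print(early_stop(ase, stagnation=3))
print(early_stop(ase))
```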
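The lasso and ridge rows say only that they regularize the prediction. A common way to apply L1 and L2 penalties in gradient boosting is the XGBoost-style leaf value w = -soft(G, lasso) / (H + ridge), where G and H are the sums of the first and second derivatives of the loss over the observations in a leaf and soft() shrinks G toward zero by lasso. Whether gbtreeTrain uses exactly this formula is an assumption; the sketch only shows how each penalty pulls a leaf value toward zero.

```python
import math


def leaf_value(gradients, hessians, lasso=0.0, ridge=1.0):
    """XGBoost-style regularized leaf value (assumed, not SAS's documented formula)."""
    g = sum(gradients)
    h = sum(hessians)
    # L1 (lasso): soft-threshold the gradient sum; small sums become exactly zero.
    g_shrunk = math.copysign(max(abs(g) - lasso, 0.0), g)
    # L2 (ridge): enlarge the denominator, shrinking the value toward zero.
    return -g_shrunk / (h + ridge)


# Squared-error loss: gradient = prediction - target, hessian = 1.
residuals = [2.0, 2.0, 2.0, 2.0]  # target minus current prediction
grads = [-r for r in residuals]
hess = [1.0] * len(residuals)
for lasso, ridge in [(0, 0), (0, 1), (4, 1), (10, 1)]:
    print(lasso, ridge, leaf_value(grads, hess, lasso, ridge))
```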
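The m, subSampleRate, and seed rows describe per-tree randomization. A minimal sketch, assuming rows are drawn without replacement for each tree and using Python's random module in place of SAS's random number generator: the default for m follows the rule stated in the m row (ceil(sqrt(p)) for forests, p for gradient boosting), and a fixed positive seed makes the draws repeatable across runs. The function names are hypothetical.

```python
import math
import random


def default_m(n_inputs, forest=False):
    """Default number of candidate inputs, per the m parameter description."""
    return math.ceil(math.sqrt(n_inputs)) if forest else n_inputs


def draw_tree_sample(n_obs, inputs, sub_sample_rate, m, rng):
    """Pick the rows and candidate inputs for one tree (assumed: without replacement)."""
    n_rows = max(1, round(sub_sample_rate * n_obs))
    rows = sorted(rng.sample(range(n_obs), n_rows))
    candidates = sorted(rng.sample(inputs, m))
    return rows, candidates


inputs = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10"]
rng = random.Random(12345)  # fixed positive seed: reproducible draws
rows, cands = draw_tree_sample(100, inputs, 0.5, default_m(len(inputs)), rng)
print(len(rows), len(cands))
```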
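The missing and minUseInSearch rows combine into a simple decision at each node. The sketch below assumes that "smallest machine value" means the most negative finite double, and it labels the non-search case only as "a popular branch", as the description does, because how that branch is chosen is not specified there. Nominal inputs (where MACSMALL uses a separate level) are not shown, and all names are hypothetical.

```python
import math
import sys

# Assumption: "smallest machine value" = most negative finite double.
MACSMALL_VALUE = -sys.float_info.max


def macsmall(value):
    """missing=MACSMALL for a numeric input: missing sorts below every real value."""
    return MACSMALL_VALUE if math.isnan(value) else value


def useinsearch_rule(n_missing_in_node, min_use_in_search=1):
    """missing=USEINSEARCH: decide how missing values are handled at one node."""
    if n_missing_in_node >= min_use_in_search:
        return "search"  # missing values take part in the split search
    return "popular branch"  # too few missing values: send them to a popular branch


values = [3.2, float("nan"), -1.5, 0.0]
print(sorted(macsmall(v) for v in values))
print(useinsearch_rule(4, min_use_in_search=10), useinsearch_rule(12, min_use_in_search=10))
```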
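The nBins and quantileBin rows contrast equal-width bins with bins whose boundaries sit at quantiles. The sketch below uses a simple order-statistic rule for the quantiles (an assumption; the exact rule the action uses is not stated) to show why quantile bins spread skewed data more evenly: with one large outlier, equal-width bins put nearly every value in the first bin.

```python
import bisect
from collections import Counter


def equal_width_edges(values, n_bins):
    """Interior bin boundaries for n_bins bins of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def quantile_edges(values, n_bins):
    """Interior boundaries at approximate quantiles; tied boundaries are dropped."""
    s = sorted(values)
    n = len(s)
    edges = [s[min(n - 1, i * n // n_bins)] for i in range(1, n_bins)]
    return sorted(set(edges))


def bin_counts(values, edges):
    """Count observations per bin; a value equal to an edge goes to the upper bin."""
    return Counter(bisect.bisect_right(edges, v) for v in values)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(sorted(bin_counts(data, equal_width_edges(data, 4)).items()))
print(sorted(bin_counts(data, quantile_edges(data, 4)).items()))
```

Because tied boundaries are dropped, heavily tied data can yield fewer than nBins bins in this sketch.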
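The power and phi rows refer to the Tweedie distribution. For 1 < p < 2 its unit deviance is d(y, mu) = 2 [ y^(2-p) / ((1-p)(2-p)) - y mu^(1-p) / (1-p) + mu^(2-p) / (2-p) ], and dividing by the scale phi gives the scaled deviance. The sketch computes this standard formula; treating it as the action's training loss, and applying the offset variable additively under a log link, are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def tweedie_deviance(y, mu, power=1.5, phi=1.0):
    """Scaled Tweedie deviance for one observation, valid for 1 < power < 2."""
    if not 1.0 < power < 2.0:
        raise ValueError("power must lie in the open interval (1, 2)")
    if phi <= 0 or mu <= 0 or y < 0:
        raise ValueError("need phi > 0, mu > 0, and y >= 0")
    p = power
    unit = 2.0 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )
    return unit / phi


def mean_from_score(score, offset=0.0):
    """Assumed log link: the offset is added on the log scale before exponentiating."""
    return math.exp(score + offset)


# Exposure-style offset: log of a hypothetical exposure of 2 units.
mu = mean_from_score(0.3, offset=math.log(2.0))
print(round(mu, 4), tweedie_deviance(0.0, 1.0), tweedie_deviance(3.0, mu))
```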

Examples
