
gbtreeTrain

Description

Trains a gradient boosting tree. This action requires a SAS Visual Data Mining and Machine Learning license.

decisionTree.gbtreeTrain <result=results> <status=rc> /
  applyRowOrder=TRUE | FALSE,
  attributes={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}},
  auxData="variable-name",
  binOrder=TRUE | FALSE,
  casOut={caslib="string", compress=TRUE | FALSE, indexVars={"variable-name-1" <, "variable-name-2", ...>}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR" | "INHERIT" | "STANDARD", name="table-name", promote=TRUE | FALSE, replace=TRUE | FALSE, replication=integer, tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1" <, "string-2", ...>}},
  code={casOut={caslib="string", compress=TRUE | FALSE, indexVars={"variable-name-1" <, "variable-name-2", ...>}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR" | "INHERIT" | "STANDARD", name="table-name", promote=TRUE | FALSE, replace=TRUE | FALSE, replication=integer, tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1" <, "string-2", ...>}}, comment=TRUE | FALSE, fmtWdth=integer, indentSize=integer, labelId=integer, lineSize=integer, noTrim=TRUE | FALSE, tabForm=TRUE | FALSE},
  codeInteractions={casOut={caslib="string", compress=TRUE | FALSE, indexVars={"variable-name-1" <, "variable-name-2", ...>}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR" | "INHERIT" | "STANDARD", name="table-name", promote=TRUE | FALSE, replace=TRUE | FALSE, replication=integer, tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1" <, "string-2", ...>}}, comment=TRUE | FALSE, fmtWdth=integer, indentSize=integer, labelId=integer, lineSize=integer, noTrim=TRUE | FALSE, tabForm=TRUE | FALSE},
  distribution="BINARY" | "GAUSSIAN" | "MULTINOMIAL" | "POISSON" | "TWEEDIE" | 64-bit-integer,
  earlyStop={metric="ASE" | "LOGLOSS" | "MCR", minimum=TRUE | FALSE, stagnation=64-bit-integer, threshold=double, thresholdIter=64-bit-integer, tolerance=double},
  encodeName=TRUE | FALSE,
  fcmpEvalMetric="string",
  freq="variable-name",
  greedy=TRUE | FALSE,
  includeMissing=TRUE | FALSE,
  initPred=double,
  inputs={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}},
  lasso=double,
  leafSize=integer,
  learningRate=double,
  logLevel=integer,
  m=integer,
  maxBranch=integer,
  maxLevel=integer,
  mergeBin=TRUE | FALSE,
  minHessian=double,
  minUseInSearch=integer,
  missing="MACSMALL" | "USEINSEARCH",
  modelId="string",
  modelTable={caslib="string", computedOnDemand=TRUE | FALSE, computedVars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}}, computedVarsProgram="string", dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>}, groupBy={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}}, groupByMode="NOSORT" | "REDISTRIBUTE", importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}, name="table-name", orderBy={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}}, singlePass=TRUE | FALSE, vars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}}, where="where-expression", whereTable={casLib="string", dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}, importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}, name="table-name", vars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}}, where="where-expression"}},
  monoDec={"string-1" <, "string-2", ...>},
  monoInc={"string-1" <, "string-2", ...>},
  nBins=integer,
  nominalHandling="CLASSIC" | "ENHANCED",
  nominals={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}},
  nominalSearch={handling="CLASSIC" | "ENHANCED", maxCategories=64-bit-integer, shrinkage=double, sort=64-bit-integer, sortBy="COUNT" | "TARGET"},
  nTree=integer,
  offset="variable-name",
  phi=double,
  power=double,
  quantileBin=TRUE | FALSE,
  ridge=double,
  saveState={caslib="string", compress=TRUE | FALSE, indexVars={"variable-name-1" <, "variable-name-2", ...>}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR" | "INHERIT" | "STANDARD", name="table-name", promote=TRUE | FALSE, replace=TRUE | FALSE, replication=integer, tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1" <, "string-2", ...>}},
  seed=double,
  singular=double,
  subSampleRate=double,
  table={caslib="string", computedOnDemand=TRUE | FALSE, computedVars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}}, computedVarsProgram="string", dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>}, importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}, name="table-name", singlePass=TRUE | FALSE, vars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}}, where="where-expression", whereTable={casLib="string", dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}, importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}, name="table-name", vars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}}, where="where-expression"}},
  target="variable-name",
  transLearnBurn=integer,
  transLearnShrink=double,
  transLearnTrim=double,
  validTable={caslib="string", computedOnDemand=TRUE | FALSE, computedVars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}}, computedVarsProgram="string", dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>}, importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}, name="table-name", singlePass=TRUE | FALSE, vars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}}, where="where-expression", whereTable={casLib="string", dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}, importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}, name="table-name", vars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer}, {...}}, where="where-expression"}},
  varImp=TRUE | FALSE,
  varIntImp=integer,
  weight="variable-name";
Parameters
applyRowOrder: specifies that the action uses a prespecified row ordering. This requires a preliminary call to the table.partition action that uses the orderBy and groupBy parameters. Alias: reproducibleRowOrder. Default: FALSE.
attributes: specifies temporary attributes, such as a format, to apply to input variables. For more information about specifying the attributes parameter, see the common casinvardesc parameter. Aliases: attribute, attrs, attr, varAttrs. Subparameters: format, formattedLength, label, name*, nfd, nfl.
auxData: specifies a variable for transfer learning that indicates which observations come from an auxiliary source. A value of 0 indicates a traditional training observation; any other value indicates auxiliary data.
binOrder: by default, the bin order is preserved for numeric variables. When set to FALSE, the bin order is ignored for numeric variables. Default: TRUE.
casOut: specifies the table in which to store the gradient boosting tree model. When not specified, a random name is generated. For more information about specifying the casOut parameter, see the common casouttable parameter. Subparameters: caslib, compress, indexVars, label, lifetime, maxMemSize, memoryFormat, name, promote, replace, replication, tableRedistUpPolicy, threadBlockSize, timeStamp, where.
code: requests that the action produce SAS score code; the subparameters control where the code is stored and how it is formatted. For more information about specifying the code parameter, see the common codegen parameter. Subparameters: casOut (see casouttable), comment, fmtWdth, indentSize, labelId, lineSize, noTrim, tabForm.
codeInteractions: requests that the action produce SAS score code that creates variables encoding interactions. You must also request variable interactions of degree 2 or higher (see varIntImp). Subparameters: casOut (see casouttable), comment, fmtWdth, indentSize, labelId, lineSize, noTrim, tabForm.
distribution: specifies the response distribution for the gradient boosting model. Default: BINARY. Values: BINARY (binary classification), GAUSSIAN (regression), MULTINOMIAL (classification with more than two classes), POISSON (Poisson-distributed responses such as counts), TWEEDIE (Tweedie-distributed responses).
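Each boosting iteration fits a new tree to the negative gradient (the pseudo-residuals) of the loss that corresponds to the chosen distribution. The Python sketch below assumes textbook losses and links: squared error on the identity scale for GAUSSIAN, logistic loss on the log-odds scale for BINARY, and the Poisson negative log-likelihood with a log link for POISSON. The function name negative_gradient is hypothetical, and the action's internal parameterization is not documented here. MULTINOMIAL and TWEEDIE are left out to keep the sketch short.

```python
import math


def negative_gradient(distribution: str, y: float, f: float) -> float:
    """Return the pseudo-residual -dL/df for one observation.

    f is the current raw prediction (assumed links): the mean for GAUSSIAN,
    the log-odds for BINARY, and the log of the mean for POISSON.
    """
    if distribution == "GAUSSIAN":
        # L = 0.5 * (y - f)^2  ->  -dL/df = y - f
        return y - f
    if distribution == "BINARY":
        # L = -[y*log(p) + (1 - y)*log(1 - p)] with p = sigmoid(f)  ->  y - p
        p = 1.0 / (1.0 + math.exp(-f))
        return y - p
    if distribution == "POISSON":
        # L = exp(f) - y*f (Poisson negative log-likelihood up to a constant)
        return y - math.exp(f)
    raise ValueError(f"distribution not covered by this sketch: {distribution}")


# Pseudo-residuals at a starting prediction of 0
print(negative_gradient("GAUSSIAN", 3.0, 0.0))  # 3.0
print(negative_gradient("BINARY", 1.0, 0.0))    # 0.5
print(negative_gradient("POISSON", 2.0, 0.0))   # 1.0
```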
earlyStop: specifies early stopping criteria. Subparameters: metric (ASE, LOGLOSS, MCR), minimum (Alias: smallest. Default: FALSE), stagnation (Default: 0. Minimum value: 0), threshold (Default: 0. Minimum value: 0), thresholdIter (Default: 0. Minimum value: 0), tolerance (Default: 0. Minimum value: 0).
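The earlyStop subparameters are listed without definitions, so the following sketch encodes only one plausible reading of stagnation and tolerance. Under that reading, training stops once the validation metric (for example, ASE on validTable) has failed to beat its best value by more than tolerance for stagnation consecutive trees. Both the rule and the function name early_stop_iteration are assumptions, not the action's documented behavior. The threshold, thresholdIter, and minimum subparameters are not modeled.

```python
def early_stop_iteration(metric_history, stagnation, tolerance=0.0):
    """Return the 1-based iteration at which training would stop, or None.

    Assumed rule (illustrative only): stop once the validation metric has not
    improved on the best value so far by more than `tolerance` for
    `stagnation` consecutive iterations. stagnation=0 means "never stop early".
    Lower values are better, as for ASE, LOGLOSS, and MCR.
    """
    if stagnation <= 0:
        return None
    best = float("inf")
    stalled = 0
    for i, value in enumerate(metric_history, start=1):
        if value < best - tolerance:
            best, stalled = value, 0  # meaningful improvement: reset the counter
        else:
            stalled += 1              # no meaningful improvement for this tree
            if stalled >= stagnation:
                return i
    return None


# Hypothetical validation ASE after each tree: improves, then plateaus
val_ase = [0.30, 0.25, 0.22, 0.221, 0.223, 0.224, 0.222]
print(early_stop_iteration(val_ase, stagnation=3))  # 6
print(early_stop_iteration(val_ase, stagnation=0))  # None
```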
encodeName: specifies whether to encode the names of generated variables, such as the predicted probabilities of a binary or nominal target, in the output casOut table. The predicted probabilities are then named with the prefix P_ instead of _DT_P_. Default: FALSE.
fcmpEvalMetric: specifies the FCMP evaluation metric for gradient boosting tree models. Alias: fcmpEvalFunc.
freq: specifies a numeric variable that contains the frequency of occurrence of each observation.
greedy: by default, a greedy (exhaustive) search is used to determine the best split for each variable in each tree node. When set to FALSE, a faster algorithm based on clustering is applied; this setting is recommended for variables with high cardinality. Default: TRUE.
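To make the exhaustive search concrete, the sketch below scans every cut point of one numeric input. It keeps the cut point that minimizes the summed squared error of the two child nodes, a criterion that suits a regression target. The split criterion, the candidate set (distinct observed values), and the name best_split are illustrative assumptions. The action searches over binned values (see nBins), and its criterion may differ.

```python
def best_split(x, y):
    """Exhaustively search cut points of numeric input x for target y.

    Returns (threshold, sse): observations with x <= threshold go left.
    Candidate thresholds are the distinct sorted values of x except the largest.
    """
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = (None, sse(y))  # baseline: no split, SSE of the whole node
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (t, total)
    return best


x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]
threshold, total_sse = best_split(x, y)
print(threshold, round(total_sse, 4))  # 3 0.0933
```

This brute-force scan evaluates every distinct value as a cut point, with one pass over the data for each.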
includeMissing: by default, observations with missing values are included. When set to FALSE, observations that have missing values for the variables used in the tree model are ignored during scoring. Default: TRUE.
initPred: Default: 0.
inputs: specifies the input variables to use in the analysis. For more information about specifying the inputs parameter, see the common casinvardesc parameter. Alias: input. Subparameters: format, formattedLength, label, name*, nfd, nfl.
lasso: specifies the L1-norm regularization on the predictions. The value must be greater than or equal to zero. Default: 0. Minimum value: 0.
leafSize: specifies the minimum number of observations in each node. Default: 5. Minimum value: 1.
learningRate: specifies the learning rate (shrinkage) applied to each tree. Default: 0.1. Range: (0–1].
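The learning rate scales every tree's contribution before it is added to the ensemble, so a smaller learningRate typically needs a larger nTree to reach the same fit. The sketch below is a minimal squared-error boosting loop with one-split trees (stumps) on a single input. It assumes the textbook update F_m(x) = F_{m-1}(x) + learningRate * h_m(x) and an initial prediction of 0. The helpers fit_stump, boost, and train_ase are hypothetical names, not part of the action.

```python
def fit_stump(x, r):
    """Fit a one-split regression tree to residuals r; return its predict function."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # constant input: predict the mean residual everywhere
        mean_r = sum(r) / len(r)
        return lambda v: mean_r
    _, cut, lm, rm = best
    return lambda v: lm if v <= cut else rm


def boost(x, y, n_tree=50, learning_rate=0.1, init_pred=0.0):
    """Squared-error gradient boosting with stumps.

    Returns (predict, fitted): the ensemble's predict function and the
    in-sample predictions after the last tree.
    """
    fitted = [init_pred] * len(y)
    trees = []
    for _ in range(n_tree):
        residuals = [yi - fi for yi, fi in zip(y, fitted)]  # negative gradient
        tree = fit_stump(x, residuals)
        trees.append(tree)
        # Each tree's contribution is shrunk by the learning rate
        fitted = [fi + learning_rate * tree(xi) for fi, xi in zip(fitted, x)]

    def predict(v):
        return init_pred + learning_rate * sum(h(v) for h in trees)

    return predict, fitted


def train_ase(y, fitted):
    """Average squared error on the training data."""
    return sum((a - b) ** 2 for a, b in zip(y, fitted)) / len(y)


x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]
for lr, n in [(0.1, 10), (0.1, 50), (0.5, 10)]:
    _, fitted = boost(x, y, n_tree=n, learning_rate=lr)
    print(f"learningRate={lr}, nTree={n}: training ASE={train_ase(y, fitted):.4f}")
```

With squared error, adding a least-squares stump scaled by a rate in (0, 1] cannot raise the training error, so the printed ASE falls as nTree grows. Held-out data such as validTable is what reveals when extra trees stop helping.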
logLevel: Default: 0. Minimum value: 0.
m: specifies the number of input variables to consider for splitting on a node. The variables are selected at random from the input variables for each tree. For forests, the default is the square root of the number of input variables, rounded up to the nearest integer; for gradient boosting, the default is the number of input variables. Minimum value: 1.
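The default for m follows directly from the rule stated in the m entry. The sketch computes that default and draws one random candidate set. default_m and candidate_inputs are hypothetical helper names. Sampling without replacement through Python's random.Random.sample is an assumption about how the selection is done.

```python
import math
import random


def default_m(n_inputs: int, method: str) -> int:
    """Default m: ceil(sqrt(p)) for forests, p (all inputs) for gradient boosting."""
    if method == "forest":
        return math.ceil(math.sqrt(n_inputs))
    return n_inputs


def candidate_inputs(inputs, m, rng):
    """Randomly select m distinct inputs (without replacement) for one tree."""
    return rng.sample(inputs, m)


inputs = [f"x{i}" for i in range(1, 11)]  # 10 hypothetical input names
print(default_m(len(inputs), "forest"))    # 4
print(default_m(len(inputs), "gbtree"))    # 10
rng = random.Random(42)
print(candidate_inputs(inputs, default_m(len(inputs), "forest"), rng))
```

Because the gradient boosting default uses every input, the random draw only changes which inputs a tree can use when m is set below the number of inputs.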
maxBranch: specifies the maximum number of children (branches) allowed for each level of the tree. Default: 2. Minimum value: 1.
maxLevel: specifies the maximum number of levels (the depth) of each tree. Default: 5. Minimum value: 1.
mergeBin: by default, when the largest value in one bin matches the lowest value in a neighboring bin, the values are merged into the lower bin. When set to FALSE, the action does not try to merge bins. Default: TRUE.
minHessian: Default: 0. Minimum value: 0.
minUseInSearch: specifies a threshold for using missing values in the split search when the missing parameter is set to USEINSEARCH. If the number of observations in a node that have missing values for the splitting variable is greater than or equal to the specified value, the action applies the USEINSEARCH policy. Otherwise, the missing values are assigned to a popular branch. Default: 1.
missing: specifies the policy for handling missing values. Default: USEINSEARCH. Values: MACSMALL (treats missing values of numeric variables as the smallest machine value and missing values of nominal variables as a separate level), USEINSEARCH (incorporates missing values in the calculation of the worth of a splitting rule).
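The two missing-value policies can be sketched as follows. Two interpretations here are assumptions. "The smallest machine value" is taken to be the most negative finite double. "A popular branch" is taken to be the branch that holds more non-missing observations. The helper names are hypothetical, and nominal inputs (where MACSMALL uses a separate level) are not modeled.

```python
import math
import sys

# Assumed stand-in for "the smallest machine value": most negative finite double
MACSMALL = -sys.float_info.max


def apply_macsmall(values):
    """MACSMALL policy for a numeric input: replace missing (NaN) with MACSMALL."""
    return [MACSMALL if math.isnan(v) else v for v in values]


def missing_handling(n_missing, min_use_in_search, n_left, n_right):
    """USEINSEARCH gate controlled by minUseInSearch (illustrative).

    Returns "search" when missing values take part in the split search;
    otherwise returns the branch they are assigned to (the larger one).
    """
    if n_missing >= min_use_in_search:
        return "search"
    return "left" if n_left >= n_right else "right"


x = apply_macsmall([2.5, float("nan"), 7.0])
# Any split of the form "x <= t" now sends the formerly missing value left
print([v <= 0.0 for v in x])                          # [False, True, False]
print(missing_handling(3, 1, n_left=40, n_right=60))  # search
print(missing_handling(3, 5, n_left=40, n_right=60))  # right
```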
modelId: specifies the model ID variable name to use when generating SAS score code. By default, DT_ is prefixed to the target variable name.
modelTable: specifies the table that contains the model. For more information about specifying the modelTable parameter, see the common castable parameter. Subparameters: caslib, computedOnDemand, computedVars, computedVarsProgram, dataSourceOptions, groupBy, groupByMode, importOptions, name*, orderBy, singlePass, vars, where, whereTable.
monoDec: specifies interval inputs for which the prediction should not increase when the input value increases. Perfect compliance is not guaranteed. Aliases: monotoneDecrease, monotoneDec, Dec.
monoInc: specifies interval inputs for which the prediction should not decrease when the input value increases. Perfect compliance is not guaranteed. Aliases: monotoneIncrease, monotoneInc, Inc.
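Because perfect compliance is not guaranteed, it can be worth checking a trained model. The sketch below varies one input over a grid while all other inputs stay fixed. It then counts adjacent grid points where the predictions move against the requested direction. Here, predict stands in for however you score the model, and monotone_violations is a hypothetical name.

```python
def monotone_violations(predict, grid, increasing=True):
    """Count adjacent grid points where predict() moves against the constraint.

    predict: any callable mapping an input value to a prediction, with all
    other inputs held fixed.
    """
    preds = [predict(v) for v in sorted(grid)]
    pairs = list(zip(preds, preds[1:]))
    if increasing:  # monoInc: the prediction must not decrease
        return sum(1 for a, b in pairs if b < a)
    return sum(1 for a, b in pairs if b > a)  # monoDec: must not increase


# Hypothetical step-function predictions with one small dip at x = 3
steps = {1: 0.10, 2: 0.20, 3: 0.19, 4: 0.40}
print(monotone_violations(steps.get, [1, 2, 3, 4], increasing=True))   # 1
print(monotone_violations(steps.get, [1, 2, 3, 4], increasing=False))  # 2
```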
nBins: specifies the number of bins to use for numeric variables in the calculation of the decision tree. Default: 50. Minimum value: 1.
nominalHandling: Values: CLASSIC, ENHANCED.
nominals: specifies the nominal input variables to use in the analysis. For more information about specifying the nominals parameter, see the common casinvardesc parameter. Alias: nominal. Subparameters: format, formattedLength, label, name*, nfd, nfl.
nominalSearch: specifies the method for finding a split on a nominal input. Alias: nomSearch. Subparameters: handling (CLASSIC, ENHANCED), maxCategories (Aliases: maxCats, maxLevels, maxValues, cluster, minCardCluster. Default: 128. Minimum value: 0), shrinkage (Default: 10. Minimum value: 0), sort (Alias: minCardSort. Default: 10. Minimum value: 0), sortBy (COUNT, TARGET).
nTree: specifies the number of trees to create. Alias: nTrees. Default: 50. Minimum value: 1.
offset: specifies an offset variable to use with distribution=POISSON or distribution=TWEEDIE.
phi: specifies the dispersion (scale) parameter of the Tweedie distribution. Alias: scale. Minimum value (exclusive): 0.
power: specifies the power parameter of the Tweedie distribution. Default: 1.5. Range: (1, 2).
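For 1 < power < 2, the Tweedie distribution is a compound Poisson-gamma distribution, which suits nonnegative responses with many exact zeros. The sketch below writes out the Tweedie negative log-likelihood (dropping terms that do not depend on the mean) and its negative gradient. It assumes a log link with the offset added on the link scale, and the function name is hypothetical. In this formulation phi only rescales the loss and gradient, so it does not move the optimum.

```python
import math


def tweedie_loss_and_grad(y, f, power=1.5, phi=1.0, offset=0.0):
    """Tweedie negative log-likelihood (up to terms free of mu) and its
    negative gradient with respect to the raw score f.

    Assumes a log link: mu = exp(offset + f). Valid for 1 < power < 2.
    """
    if not 1.0 < power < 2.0:
        raise ValueError("power must lie in (1, 2)")
    if phi <= 0.0:
        raise ValueError("phi must be positive")
    mu = math.exp(offset + f)
    loss = (-y * mu ** (1 - power) / (1 - power) + mu ** (2 - power) / (2 - power)) / phi
    neg_grad = (y * mu ** (1 - power) - mu ** (2 - power)) / phi
    return loss, neg_grad


# The negative gradient vanishes when the predicted mean equals the response
print(tweedie_loss_and_grad(y=2.0, f=math.log(2.0)))
# An exposure offset of log(4) turns a raw mean of 0.5 into a prediction of 2
print(tweedie_loss_and_grad(y=2.0, f=math.log(0.5), offset=math.log(4.0)))
```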
quantileBin: specifies that bin boundaries are placed at quantiles of numeric inputs instead of dividing them into bins of equal width. Aliases: qbin, qtbin. Default: TRUE.
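The difference between the two binning schemes shows up on skewed data. The sketch below computes nBins - 1 interior cut points both ways and counts the observations in each bin. The nearest-rank quantile rule and the helper names are assumptions, and the action may estimate quantiles differently.

```python
import math
from bisect import bisect_left


def equal_width_edges(values, n_bins):
    """Interior cut points that split [min, max] into n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + k * width for k in range(1, n_bins)]


def quantile_edges(values, n_bins):
    """Interior cut points at the k/n_bins empirical quantiles (nearest rank)."""
    s = sorted(values)
    n = len(s)
    return [s[math.ceil(k * n / n_bins) - 1] for k in range(1, n_bins)]


def bin_counts(values, edges):
    """Count observations per bin; a value equal to an edge falls in the lower bin."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_left(edges, v)] += 1
    return counts


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]  # skewed: one extreme value
for name, edges in [("equal width", equal_width_edges(data, 4)),
                    ("quantile", quantile_edges(data, 4))]:
    print(name, edges, bin_counts(data, edges))
# equal width [250.75, 500.5, 750.25] [9, 0, 0, 1]
# quantile [3, 5, 8] [3, 2, 3, 2]
```

With equal-width bins, a single extreme value leaves most bins empty, while quantile bins keep the counts balanced.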
ridge: specifies the L2-norm regularization on the predictions. The value must be greater than or equal to zero. Default: 1. Minimum value: 0.
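The source does not give the formula behind lasso and ridge. A common second-order formulation, popularized by XGBoost, computes a leaf's prediction from the summed gradients G and summed Hessians H of its observations as -soft(G, lasso) / (H + ridge). Here soft shrinks |G| toward zero by lasso. The sketch below implements that formula as an assumption, not as the action's documented method, and leaf_value is a hypothetical name.

```python
import math


def leaf_value(grads, hessians, lasso=0.0, ridge=1.0):
    """Regularized leaf prediction: -soft_threshold(G, lasso) / (H + ridge)."""
    G = sum(grads)
    H = sum(hessians)
    # L1 (lasso) soft-thresholds the summed gradient toward zero
    shrunk = math.copysign(max(abs(G) - lasso, 0.0), G)
    # L2 (ridge) enlarges the denominator, shrinking the leaf value
    return -shrunk / (H + ridge)


# Squared-error loss: gradient = prediction - target, Hessian = 1
residuals = [2.0, 3.0, 1.0, 2.0]  # target minus current prediction
grads = [-r for r in residuals]
hess = [1.0] * len(residuals)
print(leaf_value(grads, hess, lasso=0.0, ridge=0.0))  # 2.0 (the mean residual)
print(leaf_value(grads, hess, lasso=0.0, ridge=1.0))  # 1.6
print(leaf_value(grads, hess, lasso=3.0, ridge=1.0))  # 1.0
```

Under this formulation, the default ridge=1 pulls each leaf value slightly toward zero even without lasso. A lasso value at least as large as |G| sets the leaf to exactly zero.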
saveState: specifies the table in which to store the generated analytic store (astore) model. For more information about specifying the saveState parameter, see the common casouttable parameter. Subparameters: caslib, compress, indexVars, label, lifetime, maxMemSize, memoryFormat, name, promote, replace, replication, tableRedistUpPolicy, threadBlockSize, timeStamp, where.
seed: specifies the seed for the random number generator. By default, the random number stream is based on the computer clock; negative values also result in random number streams based on the computer clock. For a random number sequence that is reproducible between runs, specify a value greater than zero. Default: 0. Range: 0–MACINT.
singular: specifies a small value to avoid division by zero. Default: 1E-12. Minimum value: 0.
subSampleRate: specifies the fraction of the data to use for building each tree. Aliases: subsample, samplingRate. Default: 0.5. Range: (0–1].
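Row subsampling draws a fresh subset of observations for each tree. The sketch assumes sampling without replacement and a sample size of subSampleRate times the number of rows, rounded to the nearest integer. Both are assumptions, and subsample_indices is a hypothetical name. A fixed positive seed makes the draws reproducible, in the spirit of the seed parameter.

```python
import random


def subsample_indices(n_obs, rate, rng):
    """Sorted row indices sampled without replacement for one tree (assumed scheme)."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must lie in (0, 1]")
    k = max(1, round(rate * n_obs))  # at least one row per tree
    return sorted(rng.sample(range(n_obs), k))


rng = random.Random(2024)  # fixed positive seed: same draws on every run
for tree in range(3):
    print(tree, subsample_indices(10, 0.5, rng))
```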
table: specifies the settings for an input table. Long form: table={name="table-name"}. Shortcut form: table="table-name". Subparameters: caslib, computedOnDemand, computedVars, computedVarsProgram, dataSourceOptions, importOptions, name*, singlePass, vars, where, whereTable.
target: specifies the target (response) variable for training. If the variable is numeric but is not listed in the nominals parameter, and nbinstarget= is not specified, a regression tree is trained.
transLearnBurn: for transfer learning, specifies the number of trees to create before down-weighting of auxiliary observations begins. Default: 0. Minimum value: 0.
transLearnShrink: for transfer learning, specifies how much to down-weight unproductive auxiliary data. Default: 0.9. Range: 0–1.
transLearnTrim: for transfer learning, specifies the fraction of the distribution of gradients on the training data beyond which auxiliary observations are down-weighted. Default: 0.01. Range: (0–0.5].
validTable: specifies the settings for a validation table. Long form: validTable={name="table-name"}. Shortcut form: validTable="table-name". Subparameters: caslib, computedOnDemand, computedVars, computedVarsProgram, dataSourceOptions, importOptions, name*, singlePass, vars, where, whereTable.
varImp: specifies whether variable importance information is generated. The importance value is determined by the total Gini reduction. Default: FALSE.
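The Gini impurity of a node is 1 minus the sum of squared class proportions. A split's reduction is the parent impurity minus the observation-weighted impurities of its children. The sketch below computes that reduction and sums it per variable over a list of splits. Weighting by proportions, summing across trees, and the helper names are illustrative assumptions, and interval targets are not covered.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_reduction(parent, left, right):
    """Decrease in impurity from splitting parent into left and right,
    with each child weighted by its share of the parent's observations."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)


def variable_importance(splits):
    """Total Gini reduction per variable over (variable, reduction) pairs."""
    totals = defaultdict(float)
    for var, reduction in splits:
        totals[var] += reduction
    return dict(totals)


parent = [0, 0, 0, 0, 1, 1, 1, 1]
left, right = [0, 0, 0, 1], [0, 1, 1, 1]
r = gini_reduction(parent, left, right)
print(r)  # 0.125
# Hypothetical splits collected from several trees
print(variable_importance([("x1", r), ("x2", 0.25), ("x1", r)]))  # {'x1': 0.25, 'x2': 0.25}
```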
varIntImp: requests variable interaction importance and specifies the maximum degree of interaction. Default: 1. Range: 0–3.
weight: specifies a numeric variable that contains the weight of each observation.
