dataPreprocess

impute

Description

Performs data matrix (variable) imputation.

dataPreprocess.impute <result=results> <status=rc> /
   casOut={caslib="string", compress=TRUE | FALSE, indexVars={"variable-name-1" <, "variable-name-2", ...>}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR" | "INHERIT" | "STANDARD", name="table-name", promote=TRUE | FALSE, replace=TRUE | FALSE, replication=integer, tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1" <, "string-2", ...>}},
   casOutImputeInformation={caslib="string", compress=TRUE | FALSE, indexVars={"variable-name-1" <, "variable-name-2", ...>}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR" | "INHERIT" | "STANDARD", name="table-name", promote=TRUE | FALSE, replace=TRUE | FALSE, replication=integer, tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1" <, "string-2", ...>}},
   code={
      casOut={caslib="string", compress=TRUE | FALSE, indexVars={"variable-name-1" <, "variable-name-2", ...>}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR" | "INHERIT" | "STANDARD", name="table-name", promote=TRUE | FALSE, replace=TRUE | FALSE, replication=integer, tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1" <, "string-2", ...>}},
      comment=TRUE | FALSE,
      fmtWdth=integer,
      indentSize=integer,
      labelId=integer,
      lineSize=integer,
      noTrim=TRUE | FALSE,
      tabForm=TRUE | FALSE},
   copyAllVars=TRUE | FALSE,
   copyVars={"variable-name-1" <, "variable-name-2", ...>},
   distinctCountLimit=integer,
   forceMissingCount=TRUE | FALSE,
   freq="variable-name",
   fuzzyCompare=double,
   includeInputVars=TRUE | FALSE,
   includeMissingGroup=TRUE | FALSE,
   inputs={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer} <, {...}>},
   maxRandom=double,
   methodInterval="MAX" | "MEAN" | "MEDIAN" | "MIDRANGE" | "MIN" | "RANDOM" | "VALUE",
   methodNominal="MODE" | "VALUE",
   minRandom=double,
   nNominalVars=integer,
   nominalVarsIndices={integer-1 <, integer-2, ...>},
   outputTableOptions={forceTableReturn=TRUE | FALSE, tableNames={"string-1" <, "string-2", ...>}},
   outVarsNamePrefix="string",
   outVarsNameSuffix="string",
   percentileDefinition=integer,
   percentileMaxIterations=integer,
   percentileTolerance=double,
   sasVarNameLength=TRUE | FALSE,
   seed=integer,
   table={
      caslib="string",
      computedOnDemand=TRUE | FALSE,
      computedVars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer} <, {...}>},
      computedVarsProgram="string",
      dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
      groupBy={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer} <, {...}>},
      groupByMode="NOSORT" | "REDISTRIBUTE",
      importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
      name="table-name",
      orderBy={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer} <, {...}>},
      singlePass=TRUE | FALSE,
      vars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer} <, {...}>},
      where="where-expression",
      whereTable={
         casLib="string",
         dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
         importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
         name="table-name",
         vars={{format="string", formattedLength=integer, label="string", name="variable-name", nfd=integer, nfl=integer} <, {...}>},
         where="where-expression"}},
   valuesInterval={double-1 <, double-2, ...>},
   valuesNominal={"string-1" <, "string-2", ...>},
   weight="variable-name"
;
Settings
Parameter: Description
casOut: scores the input table and saves the scoring results as a table. For more information about specifying the casOut parameter, see the common casouttable parameter.
casOutImputeInformation: specifies the settings for an output table that includes information about the results of the impute action. For more information about specifying the casOutImputeInformation parameter, see the common casouttable parameter. Alias: casOutImputeInfo
code: specifies the settings for generating SAS DATA step scoring code. For more information about specifying the code parameter, see the common codegen parameter.
copyAllVars: when set to TRUE, all the variables from the input table are copied to the scored output table. Alias: allIdVars. Default: FALSE
copyVars: specifies the names of variables in the input table to use for identifying scored observations in the output table. The specified variables are copied to the output table.
distinctCountLimit: specifies the distinct count limit.
forceMissingCount: when set to TRUE, methodInterval is VALUE or RANDOM, and casOut is not specified, the server returns the row count and the missing count, even if doing so requires an additional pass through the data. Leaving this parameter set to FALSE is more efficient for large tables. Default: FALSE
freq: specifies the frequency variable. Alias: frequency
fuzzyCompare: specifies the fuzzy comparison threshold that is used to determine the distinctness of numeric values. Alias: precision. Range: 0–1E-05
includeInputVars: when set to TRUE, the analysis variables from the input table that are specified in the inputs parameter are copied to the output table. Default: FALSE
includeMissingGroup: when set to TRUE, missing values are allowed as group-by keys. Default: FALSE
inputs: specifies the variables to use for the analysis. You can specify a subset of the variables from the input table. For more information about specifying the inputs parameter, see the common casinvardesc parameter. Alias: vars
maxRandom: specifies the maximum random number to generate.
methodInterval: specifies the imputation technique for interval variables. You can treat numeric variables as nominal by using the nominalVarsIndices parameter. Alias: methodContinuous. Default: MEAN. Options: MAX (replaces missing values with the maximum value), MEAN (replaces missing values with the mean), MEDIAN (replaces missing values with the median), MIDRANGE (replaces missing values with the mean of the maximum value and the minimum value), MIN (replaces missing values with the minimum value), RANDOM (replaces missing values with uniform random numbers), VALUE (replaces missing values with the values specified in the valuesInterval and valuesNominal parameters).
methodNominal: specifies the imputation technique for nominal variables. Options: MODE (replaces missing values with the mode), VALUE (replaces missing values with the values specified in the valuesInterval and valuesNominal parameters).
minRandom: specifies the minimum random number to generate.
nNominalVars: specifies that the last nNominalVars variables are treated as nominal when you do not provide a value for the nominalVarsIndices parameter. Minimum value (exclusive): 0
nominalVarsIndices: specifies the indices of the variables to treat as nominal variables.
outputTableOptions: specifies options for result tables. You can specify which result tables the server returns and how group-by results are handled. Alias: tblOpts. Subparameters: forceTableReturn (when set to TRUE, result tables are returned to the client even if the output is also saved as an output table. Default: FALSE), tableNames (specifies the names of result tables to generate. By default, all result tables are returned. Alias: outputTables)
outVarsNamePrefix: specifies a prefix to apply to the names of output variables. If a variable named 'x' results in a new variable, the generated name is <prefix>_x_<suffix>. You can use this parameter and the outVarsNameSuffix parameter at the same time. Default: "imp"
outVarsNameSuffix: specifies a suffix to apply to the names of output variables. If a variable named 'x' results in a new variable, the generated name is <prefix>_x_<suffix>. You can use this parameter and the outVarsNamePrefix parameter at the same time.
percentileDefinition: specifies the percentile definition to use. The definitions are numbered 1 to 6. Alias: pctlDef. Default: 6. Range: 1–6
percentileMaxIterations: specifies the maximum number of iterations for percentile computation. Alias: pctlMaxIters
percentileTolerance: specifies the tolerance for percentile computation. Alias: pctlEpsilon. Default: 1E-05
sasVarNameLength: when set to TRUE, the names of the output variables are constrained to be less than or equal to 32 characters long. Default: FALSE
seed: specifies a seed value. The seed is used to generate random values. Default: 0
table: specifies the table name, caslib, and other common parameters. For more information about specifying the table parameter, see the common castable parameter.
valuesInterval: specifies a list of double values for imputation for the interval variables. Aliases: valuesContinuous, valuesNumeric
valuesNominal: specifies a list of string values for imputation for the nominal variables. Alias: valuesNonNumeric
weight: specifies the weight variable.
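
The methodInterval options described above can be illustrated with a small stand-alone Python sketch. It is not the server's implementation: the function name impute_interval is hypothetical, None stands in for a missing value, a single replacement value is used for one column (valuesInterval is a list), and the median uses Python's statistics.median rather than any of the numbered percentile definitions selected by percentileDefinition.

```python
import random
import statistics


def impute_interval(values, method="MEAN", value=None,
                    min_random=None, max_random=None, seed=0):
    """Replace None entries in one column according to a methodInterval option.

    Hypothetical helper for illustration only; it mirrors the option
    descriptions, not the action's internal algorithm.
    """
    observed = [v for v in values if v is not None]
    method = method.upper()

    if method == "RANDOM":
        # One uniform draw per missing entry, bounded by min_random and max_random.
        rng = random.Random(seed)
        return [rng.uniform(min_random, max_random) if v is None else v
                for v in values]

    # Each option maps to a single replacement computed from the observed values.
    fillers = {
        "MAX": lambda: max(observed),
        "MEAN": lambda: statistics.mean(observed),
        "MEDIAN": lambda: statistics.median(observed),
        "MIDRANGE": lambda: (max(observed) + min(observed)) / 2,
        "MIN": lambda: min(observed),
        "VALUE": lambda: value,  # assumption: one value per column
    }
    fill = fillers[method]()
    return [fill if v is None else v for v in values]


column = [1.0, None, 3.0, 8.0]
print(impute_interval(column, "MEAN"))      # [1.0, 4.0, 3.0, 8.0]
print(impute_interval(column, "MIDRANGE"))  # [1.0, 4.5, 3.0, 8.0]
```

MIDRANGE differs from MEAN whenever the observed values are not symmetric around their center, as in the sample column, where the mean is 4.0 and the midrange is 4.5.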

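The relationship between nominalVarsIndices and nNominalVars, together with the MODE option of methodNominal, can be sketched in the same way. The helper names are hypothetical, positions are treated as zero-based purely for the sake of the sketch (the index base that the action expects is not stated here), and ties for the mode are broken by first occurrence, which is an assumption rather than documented behavior.

```python
from collections import Counter


def nominal_positions(n_inputs, nominal_vars_indices=None, n_nominal_vars=None):
    """Return the positions of the input variables to treat as nominal.

    Explicit indices win; otherwise the last n_nominal_vars inputs are used.
    Zero-based positions are an assumption made for this sketch.
    """
    if nominal_vars_indices:
        return sorted(nominal_vars_indices)
    if n_nominal_vars:
        return list(range(n_inputs - n_nominal_vars, n_inputs))
    return []


def impute_mode(values):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    # most_common orders equal counts by first occurrence (assumed tie-break).
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


print(nominal_positions(5, n_nominal_vars=2))   # [3, 4]
print(impute_mode(["a", None, "b", "a"]))       # ['a', 'a', 'b', 'a']
```
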
Examples
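
As a starting point, the parameters for a typical call can be assembled as a plain Python dictionary. The parameter names come from the syntax and the Settings table; the table names, caslib, and column names are hypothetical, and the check_methods helper is only an illustrative client-side guard, not part of the action. The commented lines show how such a dictionary might be passed to the action through the SWAT Python client, assuming a reachable CAS server.

```python
INTERVAL_METHODS = {"MAX", "MEAN", "MEDIAN", "MIDRANGE", "MIN", "RANDOM", "VALUE"}
NOMINAL_METHODS = {"MODE", "VALUE"}


def check_methods(params):
    """Raise ValueError if a method option is not one of the documented values."""
    interval = params.get("methodInterval", "MEAN").upper()
    if interval not in INTERVAL_METHODS:
        raise ValueError(f"unknown methodInterval: {interval}")
    nominal = params.get("methodNominal")
    if nominal is not None and nominal.upper() not in NOMINAL_METHODS:
        raise ValueError(f"unknown methodNominal: {nominal}")
    return params


# Hypothetical table, caslib, and column names.
impute_params = check_methods({
    "table": {"name": "loans", "caslib": "casuser"},
    "inputs": [{"name": "income"}, {"name": "debt"}],
    "methodInterval": "MEDIAN",
    "copyAllVars": True,
    "casOut": {"name": "loans_imputed", "caslib": "casuser", "replace": True},
})

# With a live CAS session the call might look like this (not executed here):
#   import swat
#   conn = swat.CAS(hostname, port)
#   conn.loadactionset("dataPreprocess")
#   result = conn.dataPreprocess.impute(**impute_params)
print(sorted(impute_params))
```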

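The naming pattern <prefix>_x_<suffix> for output variables can be sketched as follows. The helper names are hypothetical, and the sketch assumes that an empty prefix or suffix is dropped together with its underscore, so the default prefix "imp" with no suffix yields imp_x. How the server shortens names when sasVarNameLength is TRUE is not described in this section, so the sketch only checks the 32-character limit.

```python
def output_var_name(name, prefix="imp", suffix=""):
    """Build an output variable name from the documented pattern.

    Assumption: empty parts are omitted along with their underscore.
    """
    parts = [part for part in (prefix, name, suffix) if part]
    return "_".join(parts)


def fits_sas_name_length(name, limit=32):
    """Return True if the name satisfies the sasVarNameLength constraint."""
    return len(name) <= limit


print(output_var_name("x"))                       # imp_x
print(output_var_name("x", suffix="v2"))          # imp_x_v2
print(fits_sas_name_length(output_var_name("a" * 29)))  # False
```
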
FAQ

casOut={casouttable}
casOutImputeInformation={casouttable}
code={codegen}
copyAllVars=TRUE | FALSE
copyVars={"variable-name-1" <, "variable-name-2", ...>}
distinctCountLimit=integer
forceMissingCount=TRUE | FALSE
freq="variable-name"
fuzzyCompare=double
includeInputVars=TRUE | FALSE
includeMissingGroup=TRUE | FALSE
inputs={{casinvardesc-1} <, {casinvardesc-2}, ...>}
maxRandom=double
methodInterval="MAX" | "MEAN" | "MEDIAN" | "MIDRANGE" | "MIN" | "RANDOM" | "VALUE"
methodNominal="MODE" | "VALUE"
minRandom=double
nNominalVars=integer
nominalVarsIndices={integer-1 <, integer-2, ...>}
outputTableOptions={outputTableOptions}
outVarsNamePrefix="string"
outVarsNameSuffix="string"
percentileDefinition=integer
percentileMaxIterations=integer
percentileTolerance=double
sasVarNameLength=TRUE | FALSE
seed=integer
table={castable}
valuesInterval={double-1 <, double-2, ...>}
valuesNominal={"string-1" <, "string-2", ...>}
weight="variable-name"