percentile

assess

Description

The `assess` action in the Percentile action set is a powerful tool for evaluating and comparing the performance of predictive models in SAS Viya. It is particularly useful in machine learning workflows to understand how well a model's predictions align with actual outcomes. This action can handle both classification (binary/nominal targets) and regression (interval targets) models. For classification, it computes essential metrics like ROC (Receiver Operating Characteristic) curves, lift charts, and various fit statistics (e.g., accuracy, misclassification rate). For regression, it calculates error metrics like Mean Squared Error (MSE). The action allows for detailed analysis by providing options to bin data, handle missing values, and partition data for validation, making it a cornerstone for robust model assessment.

percentile.assess { attributes={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, binNum=64-bit-integer, casOut={caslib="string", compress=TRUE|FALSE, indexVars={"variable-name-1", ...}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR"|"INHERIT"|"STANDARD", name="table-name", promote=TRUE|FALSE, replace=TRUE|FALSE, replication=integer, tableRedistUpPolicy="DEFER"|"NOREDIST"|"REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1", ...}}, cutStep=double, epsilon=double, event="string", fitStatOut={caslib="string", compress=TRUE|FALSE, indexVars={"variable-name-1", ...}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR"|"INHERIT"|"STANDARD", name="table-name", promote=TRUE|FALSE, replace=TRUE|FALSE, replication=integer, tableRedistUpPolicy="DEFER"|"NOREDIST"|"REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1", ...}}, freq="variable-name", groupByLimit=64-bit-integer, includeCutoffOne=TRUE|FALSE, includeFitStat=TRUE|FALSE, includeLift=TRUE|FALSE, includeRoc=TRUE|FALSE, includeZeroDepth=TRUE|FALSE, inputs={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, maxIters=integer, method="EXACT"|"ITERATIVE", nBins=integer, noMissingTarget=TRUE|FALSE, partition=TRUE|FALSE, partKey={"string-1", ...}, pEvent={"string-1", ...}, pResponse="variable-name", pVar={"variable-name-1", ...}, response="variable-name", responseFmt="string", rocOut={caslib="string", compress=TRUE|FALSE, indexVars={"variable-name-1", ...}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR"|"INHERIT"|"STANDARD", name="table-name", promote=TRUE|FALSE, replace=TRUE|FALSE, replication=integer, tableRedistUpPolicy="DEFER"|"NOREDIST"|"REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1", ...}}, 
table={caslib="string", computedOnDemand=TRUE|FALSE, computedVars={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, computedVarsProgram="string", dataSourceOptions={key-1=any-list-or-data-type-1, ...}, groupBy={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, groupByMode="NOSORT"|"REDISTRIBUTE", importOptions={fileType="ANY"|"AUDIO"|"AUTO"|"BASESAS"|"CSV"|"DELIMITED"|"DOCUMENT"|"DTA"|"ESP"|"EXCEL"|"FMT"|"HDAT"|"IMAGE"|"JMP"|"LASR"|"PARQUET"|"SOUND"|"SPSS"|"VIDEO"|"XLS", fileType-specific-parameters}, name="table-name", orderBy={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, singlePass=TRUE|FALSE, where="where-expression", whereTable={casLib="string", dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}, importOptions={fileType="ANY"|"AUDIO"|"AUTO"|"BASESAS"|"CSV"|"DELIMITED"|"DOCUMENT"|"DTA"|"ESP"|"EXCEL"|"FMT"|"HDAT"|"IMAGE"|"JMP"|"LASR"|"PARQUET"|"SOUND"|"SPSS"|"VIDEO"|"XLS", fileType-specific-parameters}, name="table-name", vars={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, where="where-expression"}}, useRawPResponse=TRUE|FALSE, userCutoff=double, 
weight="variable-name" }
Settings
Parameter: Description

attributes: Specifies temporary attributes, such as a format, to apply to input variables.
binNum: Specifies the bin number for a three-pass iterative assessment method.
casOut: Specifies the output table for lift calculations.
cutStep: Specifies the step size to use for the ROC calculations.
epsilon: Specifies the tolerance used in determining the convergence of the iterative algorithm for percentile calculation.
event: Specifies the formatted value of the response variable that represents the event of interest in a classification model.
fitStatOut: Specifies the output table for fit statistics.
freq: Specifies a variable that contains the frequency of each observation.
groupByLimit: Specifies the maximum number of levels in a group-by set to prevent creating excessively large result sets.
includeCutoffOne: When set to True, includes a row for cutoff=1 in the ROC statistics to simplify plotting the ROC curve.
includeFitStat: When set to False, fit statistics are not generated.
includeLift: When set to False, lift calculations are not generated.
includeRoc: When set to False, ROC calculations are not generated.
includeZeroDepth: When set to True, includes a row for depth=0 in the lift statistics to simplify plotting the lift curve.
inputs: Specifies the input variables to use in the analysis.
maxIters: Specifies the maximum number of iterations for the iterative percentile calculation algorithm.
method: Specifies the algorithm for the percentile analysis, either 'EXACT' or 'ITERATIVE'.
nBins: Specifies the number of bins to use for lift calculations.
noMissingTarget: When set to True, excludes observations where the target variable has a missing value.
partition: When set to True for a partitioned table, results are calculated efficiently for each partition.
partKey: Specifies a partition key to compute results for a single partition of a partitioned table.
pEvent: Specifies the event levels corresponding to each probability variable in `pVar`.
pResponse: Specifies the predicted response variable for model assessment.
pVar: Specifies the event probability variables for assessment.
response: Specifies the actual outcome or response variable for model assessment.
responseFmt: Specifies a temporary format for the response variable to produce the specified event.
rocOut: Specifies the output table for ROC curve calculations.
table: Specifies the input CAS table containing the data for assessment.
useRawPResponse: When set to True, uses raw values of the predicted response variable to filter observations.
userCutoff: Specifies a user-defined cutoff value for generating a confusion matrix.
weight: Specifies a variable to use for weighting each observation in the analysis.
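
Several of these optional parameters can be combined in a single call. The sketch below is one possible combination, assuming the scored table `hmeq_scored` with actual response `BAD` and predicted probability `P_BAD1` produced in the Data Preparation section; the specific values (20 bins, a 0.3 cutoff) are illustrative, not defaults:

```sas
PROC CAS;
   percentile.assess TABLE={name='hmeq_scored'},
      response='BAD',
      inputs={{name='P_BAD1'}},
      event='1',
      nBins=20,              /* 20 bins for the lift calculation */
      includeZeroDepth=true, /* add a depth=0 row to the lift statistics */
      includeCutoffOne=true, /* add a cutoff=1 row to the ROC statistics */
      userCutoff=0.3;        /* confusion matrix at a 0.3 cutoff */
QUIT;
```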
Data Preparation
Data Creation

This example uses the `HMEQ` dataset, which contains information about home equity loans. We will first load this data into a CAS table. Then, we'll run a logistic regression to predict loan default (`BAD`) and store the predicted probabilities in a new table called `HMEQ_SCORED`. This scored table will be the input for the `assess` action.

/* Load HMEQ data into CAS */
PROC CASUTIL;
   load DATA=sampsio.hmeq outcaslib="casuser" casout="hmeq" replace;
QUIT;

/* Run logistic regression and score the data */
PROC CAS;
   logistic.regress TABLE={name='hmeq'},
      class={'JOB', 'REASON'},
      model={depvar='BAD', effects={'LOAN', 'MORTDUE', 'VALUE', 'REASON', 'JOB', 'YOJ', 'DEROG', 'DELINQ', 'CLAGE', 'NINQ', 'CLNO', 'DEBTINC'}},
      OUTPUT={casout={name='hmeq_scored', replace=true}, into={'P_BAD1'='P_BAD1', 'P_BAD0'='P_BAD0'}};
QUIT;

Examples

This is a basic example of using the `assess` action to evaluate a binary classification model. We specify the actual response (`BAD`), the predicted event probability (`P_BAD1`), and the event level ('1'). The action will compute default assessment statistics like ROC and Lift information.

SAS® / CAS Code
PROC CAS;
   percentile.assess TABLE={name='hmeq_scored'},
      response='BAD',
      inputs={{name='P_BAD1'}},
      event='1';
QUIT;
Result:
The action returns several result tables, including 'ROCInfo' with ROC curve data, 'LiftInfo' with lift chart data, and 'FitStat' with overall model fit statistics like Misclassification Rate and AUC (Area Under Curve).
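
When the scored table carries probability columns for more than one response level, the `pVar` and `pEvent` parameters listed above can supply the additional probabilities alongside `inputs`. A sketch, assuming the scored table also contains `P_BAD0`, the probability of the non-event level '0' created in the Data Preparation step:

```sas
PROC CAS;
   percentile.assess TABLE={name='hmeq_scored'},
      response='BAD',
      inputs={{name='P_BAD1'}},  /* probability of the event level */
      event='1',
      pVar={'P_BAD0'},           /* probability column(s) for the other level(s) */
      pEvent={'0'};              /* level predicted by each pVar column */
QUIT;
```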

This example demonstrates a more comprehensive use of the `assess` action. We assess the model for the event '1' using the predicted probability `P_BAD1`. We explicitly request ROC and Lift calculations (`includeRoc=true`, `includeLift=true`) and specify output tables (`rocOut`, `casOut`, `fitStatOut`) to store the results persistently in the `casuser` caslib. This allows for further analysis or visualization of the assessment metrics.

SAS® / CAS Code
PROC CAS;
   percentile.assess TABLE={name='hmeq_scored'},
      response='BAD',
      inputs={{name='P_BAD1'}},
      event='1',
      includeRoc=true,
      includeLift=true,
      includeFitStat=true,
      rocOut={name='roc_results', caslib='casuser', replace=true},
      casOut={name='lift_results', caslib='casuser', replace=true},
      fitStatOut={name='fit_statistics', caslib='casuser', replace=true};
QUIT;
Result:
Three new tables are created in the 'casuser' caslib: `roc_results` containing the data points for the ROC curve, `lift_results` containing data for the lift chart and other quantile-based statistics, and `fit_statistics` containing a summary of model performance metrics. The CAS log will confirm the creation of these tables.
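
Once saved, the output tables can be inspected like any other CAS table. For example, the first rows of the saved ROC table can be retrieved with the `table.fetch` action:

```sas
PROC CAS;
   table.fetch table={name='roc_results', caslib='casuser'}, to=10;
QUIT;
```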

The `assess` action can also be used for interval-target (regression) models. In this case, you provide the actual response variable and the predicted response variable (`pResponse`). The action calculates regression-specific fit statistics such as Average Squared Error (ASE) and Root Average Squared Error (RASE).

SAS® / CAS Code
/* First, create a scored table from a regression model */
PROC CAS;
   decisionTree.gbtreeTrain TABLE='hmeq'
      inputs={'LOAN', 'MORTDUE', 'YOJ', 'DEROG', 'DELINQ', 'CLAGE', 'NINQ', 'CLNO'}
      target='DEBTINC'
      casOut={name='hmeq_gbtree_model', replace=true}; /* model table consumed by gbtreeScore */
   decisionTree.gbtreeScore TABLE='hmeq'
      modelTable='hmeq_gbtree_model'
      casout={name='hmeq_scored_reg', replace=true};
QUIT;

/* Now, assess the regression model */
PROC CAS;
   percentile.assess TABLE='hmeq_scored_reg',
      response='DEBTINC',
      pResponse='_GBT_Pred_';
QUIT;
Result:
The result will include a 'FitStat' table containing regression fit statistics such as _ASSESS_ASE (Average Squared Error), _ASSESS_RASE (Root Average Squared Error), and _ASSESS_MALE (Mean Absolute Log Error), among others. No ROC or Lift tables are produced for interval targets.
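
Parameters such as `weight` and `noMissingTarget` from the table above apply to interval targets as well. A sketch, assuming the scored table carries a hypothetical observation-weight column `WT`:

```sas
PROC CAS;
   percentile.assess TABLE='hmeq_scored_reg',
      response='DEBTINC',
      pResponse='_GBT_Pred_',
      weight='WT',          /* hypothetical weight column */
      noMissingTarget=true; /* drop rows with a missing DEBTINC */
QUIT;
```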

FAQ

What is the primary purpose of the `percentile.assess` action?
What are the mandatory parameters for using the `assess` action?
How do I define the event of interest for a classification model assessment?
Can the `assess` action generate ROC and Lift chart data?
What calculation methods does the `assess` action support for percentiles?
How can I output fit statistics to a separate table?

Associated Scenarios

Use Case
Standard Case: Assessing a Marketing Propensity Model

A retail company has built a logistic regression model to predict which customers are likely to respond to a new promotional offer. The Data Science team needs to assess the mod...

Use Case
Performance Case: Assessing a Fraud Model on Partitioned Data

A financial services company needs to evaluate a fraud detection model across millions of transactions. To speed up the process and get segment-specific insights, they want to a...

Use Case
Edge Case: Handling Missing Data and Weights in Readmission Model

A healthcare provider is assessing a model that predicts the likelihood of patient readmission within 30 days. The dataset is imperfect, with some missing readmission statuses (...