percentile

assess

At a glance
Data engineers and statisticians use the assess action to benchmark predictive model performance across distributed CAS environments. It computes error metrics for regression models and produces chart data, such as Receiver Operating Characteristic (ROC) curves, for classification tasks, supporting sound model governance. The action is flexible, allowing fine-grained evaluation through specific data partitions and observation weighting. A dedicated FAQ section below addresses common technical questions about syntax, output tables, and model comparison strategies within the SAS Viya framework.

Description

The `assess` action in the Percentile action set is a powerful tool for evaluating and comparing the performance of predictive models in SAS Viya. It is particularly useful in machine learning workflows to understand how well a model's predictions align with actual outcomes. This action can handle both classification (binary/nominal targets) and regression (interval targets) models. For classification, it computes essential metrics like ROC (Receiver Operating Characteristic) curves, lift charts, and various fit statistics (e.g., accuracy, misclassification rate). For regression, it calculates error metrics like Mean Squared Error (MSE). The action allows for detailed analysis by providing options to bin data, handle missing values, and partition data for validation, making it a cornerstone for robust model assessment.
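The quantities behind these metrics are simple to state. As a rough illustration only (toy data, plain Python, not the distributed CAS implementation), a classification assessment at a fixed cutoff reduces to a confusion matrix and a misclassification rate:

```python
# Toy illustration of classification assessment at a fixed cutoff
# (analogous to the userCutoff parameter); plain Python on made-up
# data, not the distributed CAS implementation.
actual  = [1, 0, 1, 1, 0, 0]                 # binary target, event = '1'
p_event = [0.9, 0.4, 0.3, 0.8, 0.6, 0.1]     # predicted event probabilities
cutoff = 0.5

tp = sum(1 for a, p in zip(actual, p_event) if a == 1 and p >= cutoff)
fp = sum(1 for a, p in zip(actual, p_event) if a == 0 and p >= cutoff)
fn = sum(1 for a, p in zip(actual, p_event) if a == 1 and p < cutoff)
tn = sum(1 for a, p in zip(actual, p_event) if a == 0 and p < cutoff)

misclassification = (fp + fn) / len(actual)
print(tp, fp, fn, tn, round(misclassification, 4))  # → 2 1 1 2 0.3333
```

The action computes these counts (and many more statistics) in a single distributed pass over the CAS table; the sketch only shows what the reported numbers mean.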

percentile.assess { attributes={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, binNum=64-bit-integer, casOut={caslib="string", compress=TRUE|FALSE, indexVars={"variable-name-1", ...}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR"|"INHERIT"|"STANDARD", name="table-name", promote=TRUE|FALSE, replace=TRUE|FALSE, replication=integer, tableRedistUpPolicy="DEFER"|"NOREDIST"|"REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1", ...}}, cutStep=double, epsilon=double, event="string", fitStatOut={caslib="string", compress=TRUE|FALSE, indexVars={"variable-name-1", ...}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR"|"INHERIT"|"STANDARD", name="table-name", promote=TRUE|FALSE, replace=TRUE|FALSE, replication=integer, tableRedistUpPolicy="DEFER"|"NOREDIST"|"REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1", ...}}, freq="variable-name", groupByLimit=64-bit-integer, includeCutoffOne=TRUE|FALSE, includeFitStat=TRUE|FALSE, includeLift=TRUE|FALSE, includeRoc=TRUE|FALSE, includeZeroDepth=TRUE|FALSE, inputs={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, maxIters=integer, method="EXACT"|"ITERATIVE", nBins=integer, noMissingTarget=TRUE|FALSE, partition=TRUE|FALSE, partKey={"string-1", ...}, pEvent={"string-1", ...}, pResponse="variable-name", pVar={"variable-name-1", ...}, response="variable-name", responseFmt="string", rocOut={caslib="string", compress=TRUE|FALSE, indexVars={"variable-name-1", ...}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR"|"INHERIT"|"STANDARD", name="table-name", promote=TRUE|FALSE, replace=TRUE|FALSE, replication=integer, tableRedistUpPolicy="DEFER"|"NOREDIST"|"REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1", ...}}, 
table={caslib="string", computedOnDemand=TRUE|FALSE, computedVars={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, computedVarsProgram="string", dataSourceOptions={key-1=any-list-or-data-type-1, ...}, groupBy={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, groupByMode="NOSORT"|"REDISTRIBUTE", importOptions={fileType="ANY"|"AUDIO"|"AUTO"|"BASESAS"|"CSV"|"DELIMITED"|"DOCUMENT"|"DTA"|"ESP"|"EXCEL"|"FMT"|"HDAT"|"IMAGE"|"JMP"|"LASR"|"PARQUET"|"SOUND"|"SPSS"|"VIDEO"|"XLS", fileType-specific-parameters}, name="table-name", orderBy={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, singlePass=TRUE|FALSE, where="where-expression", whereTable={casLib="string", dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}, importOptions={fileType="ANY"|"AUDIO"|"AUTO"|"BASESAS"|"CSV"|"DELIMITED"|"DOCUMENT"|"DTA"|"ESP"|"EXCEL"|"FMT"|"HDAT"|"IMAGE"|"JMP"|"LASR"|"PARQUET"|"SOUND"|"SPSS"|"VIDEO"|"XLS", fileType-specific-parameters}, name="table-name", vars={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, where="where-expression"}}, useRawPResponse=TRUE|FALSE, userCutoff=double, 
weight="variable-name" }
Settings
Parameter  Description
attributes Specifies temporary attributes, such as a format, to apply to input variables.
binNum Specifies the bin number for a three-pass iterative assessment method.
casOut Specifies the output table for lift calculations.
cutStep Specifies the step size to use for the ROC calculations.
epsilon Specifies the tolerance used in determining the convergence of the iterative algorithm for percentile calculation.
event Specifies the formatted value of the response variable that represents the event of interest in a classification model.
fitStatOut Specifies the output table for fit statistics.
freq Specifies a variable that contains the frequency of each observation.
groupByLimit Specifies the maximum number of levels in a group-by set to prevent creating excessively large result sets.
includeCutoffOne When set to True, includes a row for cutoff=1 in the ROC statistics to simplify plotting the ROC curve.
includeFitStat When set to False, fit statistics are not generated.
includeLift When set to False, lift calculations are not generated.
includeRoc When set to False, ROC calculations are not generated.
includeZeroDepth When set to True, includes a row for depth=0 in the lift statistics to simplify plotting the lift curve.
inputs Specifies the input variables to use in the analysis.
maxIters Specifies the maximum number of iterations for the iterative percentile calculation algorithm.
method Specifies the algorithm for the percentile analysis, either 'EXACT' or 'ITERATIVE'.
nBins Specifies the number of bins to use for lift calculations.
noMissingTarget When set to True, excludes observations where the target variable has a missing value.
partition When set to True for a partitioned table, results are calculated efficiently for each partition.
partKey Specifies a partition key to compute results for a single partition of a partitioned table.
pEvent Specifies the event levels corresponding to each probability variable in `pVar`.
pResponse Specifies the predicted response variable for model assessment.
pVar Specifies the event probability variables for assessment.
response Specifies the actual outcome or response variable for model assessment.
responseFmt Specifies a temporary format for the response variable to produce the specified event.
rocOut Specifies the output table for ROC curve calculations.
table Specifies the input CAS table containing the data for assessment.
useRawPResponse When set to True, uses raw values of the predicted response variable to filter observations.
userCutoff Specifies a user-defined cutoff value for generating a confusion matrix.
weight Specifies a variable to use for weighting each observation in the analysis.
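Several of these parameters (`nBins`, `includeZeroDepth`) govern the lift calculation, which ranks observations by predicted event probability and compares the cumulative event rate at each depth with the overall event rate. A toy Python sketch of that logic (illustrative only; the action's exact binning may differ in detail):

```python
# Toy sketch of the lift-by-depth logic (illustrative only; the
# action's exact binning controlled by nBins may differ in detail).
actual  = [1, 1, 0, 1, 0, 0, 0, 0]          # 3 events out of 8
p_event = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

# Rank observations by descending predicted event probability.
ranked = sorted(zip(p_event, actual), reverse=True)
overall_rate = sum(actual) / len(actual)    # 0.375

# Cumulative lift at each depth quartile (an nBins=4 analogue).
n = len(ranked)
lifts = []
for depth in (0.25, 0.5, 0.75, 1.0):
    k = int(n * depth)
    rate_at_depth = sum(a for _, a in ranked[:k]) / k
    lifts.append(round(rate_at_depth / overall_rate, 3))

print(lifts)  # → [2.667, 2.0, 1.333, 1.0]
```

A lift above 1 at a given depth means the model concentrates events in its top-ranked observations; lift always converges to 1 at depth 1.0.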
Data Preparation
Data Creation

This example uses the `HMEQ` dataset, which contains information about home equity loans. We will first load this data into a CAS table. Then, we'll run a logistic regression to predict loan default (`BAD`) and store the predicted probabilities in a new table called `HMEQ_SCORED`. This scored table will be the input for the `assess` action.

/* Load HMEQ data into CAS */
proc casutil;
   load data=sampsio.hmeq outcaslib="casuser" casout="hmeq" replace;
quit;

/* Run logistic regression and score the data */
proc cas;
   regression.logistic table={name='hmeq'},
      class={'JOB', 'REASON'},
      model={depvar='BAD', effects={'LOAN', 'MORTDUE', 'VALUE', 'REASON', 'JOB', 'YOJ', 'DEROG', 'DELINQ', 'CLAGE', 'NINQ', 'CLNO', 'DEBTINC'}},
      /* copy the actual target into the scored table for assessment */
      output={casout={name='hmeq_scored', replace=true}, copyVars={'BAD'}, into={'P_BAD1'='P_BAD1', 'P_BAD0'='P_BAD0'}};
quit;

Examples

This is a basic example of using the `assess` action to evaluate a binary classification model. We specify the actual response (`BAD`), the predicted event probability (`P_BAD1`), and the event level ('1'). The action will compute default assessment statistics like ROC and Lift information.

SAS® / CAS Code
proc cas;
   percentile.assess table={name='hmeq_scored'},
      response='BAD',
      inputs={{name='P_BAD1'}},
      event='1';
quit;
Result:
The action returns several result tables, including 'ROCInfo' with ROC curve data, 'LiftInfo' with lift chart data, and 'FitStat' with overall model fit statistics like Misclassification Rate and AUC (Area Under Curve).
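The numbers in 'ROCInfo' follow from a straightforward construction: at each cutoff on a grid (compare the `cutStep` parameter), count true and false positives, then integrate the resulting curve to obtain AUC. A toy Python sketch, not the action's implementation:

```python
# Toy sketch of the ROCInfo construction: sensitivity (TPR) and
# 1-specificity (FPR) at a grid of cutoffs (a cutStep=0.1 analogue),
# with AUC by the trapezoidal rule. Illustrative only.
actual  = [1, 1, 1, 0, 0, 0]
p_event = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
pos = sum(actual)
neg = len(actual) - pos

def roc_point(cutoff):
    tp = sum(1 for a, p in zip(actual, p_event) if a == 1 and p >= cutoff)
    fp = sum(1 for a, p in zip(actual, p_event) if a == 0 and p >= cutoff)
    return fp / neg, tp / pos                # (FPR, TPR)

points = sorted(roc_point(i / 10) for i in range(11))
auc = sum((x2 - x1) * (y1 + y2) / 2          # trapezoidal area
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(round(auc, 3))  # → 0.889
```

The `includeCutoffOne` parameter exists for exactly this reason: guaranteeing a point at cutoff=1 (the (0, 0) corner) makes the curve close properly when plotted.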

This example demonstrates a more comprehensive use of the `assess` action. We assess the model for the event '1' using the predicted probability `P_BAD1`. We explicitly request ROC and Lift calculations (`includeRoc=true`, `includeLift=true`) and specify output tables (`rocOut`, `casOut`, `fitStatOut`) to store the results persistently in the `casuser` caslib. This allows for further analysis or visualization of the assessment metrics.

SAS® / CAS Code
proc cas;
   percentile.assess table={name='hmeq_scored'},
      response='BAD',
      inputs={{name='P_BAD1'}},
      event='1',
      includeRoc=true,
      includeLift=true,
      includeFitStat=true,
      rocOut={name='roc_results', caslib='casuser', replace=true},
      casOut={name='lift_results', caslib='casuser', replace=true},
      fitStatOut={name='fit_statistics', caslib='casuser', replace=true};
quit;
Result:
Three new tables are created in the 'casuser' caslib: `roc_results` containing the data points for the ROC curve, `lift_results` containing data for the lift chart and other quantile-based statistics, and `fit_statistics` containing a summary of model performance metrics. The CAS log will confirm the creation of these tables.

The `assess` action can also be used for interval target (regression) models. In this case, you provide the actual response variable and the predicted response variable (`pResponse`). The action calculates regression-specific fit statistics like Mean Square Error (MSE), Root Mean Square Error (RMSE), and R-Square.

SAS® / CAS Code
/* First, train a gradient boosting regression model and score the data */
proc cas;
   decisionTree.gbtreeTrain table='hmeq',
      inputs={'LOAN', 'MORTDUE', 'YOJ', 'DEROG', 'DELINQ', 'CLAGE', 'NINQ', 'CLNO'},
      target='DEBTINC',
      casOut={name='hmeq_gbtree_model', replace=true};
   decisionTree.gbtreeScore table='hmeq',
      modelTable='hmeq_gbtree_model',
      /* copy the actual target into the scored table for assessment */
      copyVars={'DEBTINC'},
      casout={name='hmeq_scored_reg', replace=true};
quit;

/* Now, assess the regression model */
proc cas;
   percentile.assess table='hmeq_scored_reg',
      response='DEBTINC',
      pResponse='_GBT_Pred_';
quit;
Result:
The result will include a 'FitStat' table containing regression fit statistics such as _ASSESS_ASE (Average Squared Error), _ASSESS_RASE (Root Average Squared Error), and _ASSESS_MALE (Mean Absolute Log Error), among others. No ROC or Lift tables are produced for interval targets.
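For reference, the regression statistics listed above are elementary to compute. A toy Python sketch on made-up values (illustrative only; the data and variable names here are hypothetical, not output from the action):

```python
# Toy computation of regression fit statistics of the kind reported
# in FitStat; made-up data, not the action's implementation.
import math

y     = [33.8, 29.1, 41.2, 36.5]   # actual interval target values
y_hat = [32.0, 30.0, 40.0, 38.0]   # model predictions

n = len(y)
ase  = sum((a - p) ** 2 for a, p in zip(y, y_hat)) / n   # Average Squared Error
rase = math.sqrt(ase)                                    # Root Average Squared Error
mae  = sum(abs(a - p) for a, p in zip(y, y_hat)) / n     # Mean Absolute Error

print(round(ase, 4), round(rase, 4), round(mae, 4))
```

Observation weights (the `weight` parameter) turn each of these simple averages into a weighted average; frequencies (the `freq` parameter) replicate observations in the counts.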

FAQ

What is the primary purpose of the `percentile.assess` action?
What are the mandatory parameters for using the `assess` action?
How do I define the event of interest for a classification model assessment?
Can the `assess` action generate ROC and Lift chart data?
What calculation methods does the `assess` action support for percentiles?
How can I output fit statistics to a separate table?

Associated Scenarios

Use Case
Standard Case: Assessing a Marketing Propensity Model

A retail company has built a logistic regression model to predict which customers are likely to respond to a new promotional offer. The Data Science team needs to assess the mod...

Use Case
Performance Case: Assessing a Fraud Model on Partitioned Data

A financial services company needs to evaluate a fraud detection model across millions of transactions. To speed up the process and get segment-specific insights, they want to a...

Use Case
Edge Case: Handling Missing Data and Weights in Readmission Model

A healthcare provider is assessing a model that predicts the likelihood of patient readmission within 30 days. The dataset is imperfect, with some missing readmission statuses (...