The `assess` action in the Percentile action set evaluates and compares the performance of predictive models in SAS Viya by measuring how well a model's predictions align with actual outcomes. It handles both classification models (binary or nominal targets) and regression models (interval targets). For classification, it computes ROC (Receiver Operating Characteristic) curves, lift charts, and fit statistics such as accuracy and misclassification rate. For regression, it calculates error metrics such as Mean Squared Error (MSE). Options let you bin the data, exclude observations with missing target values, and compute results for each partition of a partitioned table.
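For reference, the quantities named above have the following standard textbook definitions. These are general formulas, not taken from the action's documentation, and the symbols ($c$ for a probability cutoff, $d$ for a depth percentage, $y_i$ and $\hat{y}_i$ for actual and predicted values over $n$ observations) are introduced here for illustration only. The statistics that the action actually reports may follow slightly different conventions.

```latex
% ROC: true and false positive rates at a probability cutoff c
\mathrm{TPR}(c) = \frac{TP(c)}{TP(c) + FN(c)}, \qquad
\mathrm{FPR}(c) = \frac{FP(c)}{FP(c) + TN(c)}

% Cumulative lift at depth d (observations ranked by predicted event probability)
\mathrm{Lift}(d) = \frac{\text{event rate among the top } d\% \text{ of observations}}
                        {\text{overall event rate}}

% Regression error metrics
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```

The ROC curve plots TPR against FPR as the cutoff $c$ varies.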
| Parameter | Description |
|---|---|
| attributes | Specifies temporary attributes, such as a format, to apply to input variables. |
| binNum | Specifies the bin number for a three-pass iterative assessment method. |
| casOut | Specifies the output table for lift calculations. |
| cutStep | Specifies the step size to use for the ROC calculations. |
| epsilon | Specifies the tolerance used in determining the convergence of the iterative algorithm for percentile calculation. |
| event | Specifies the formatted value of the response variable that represents the event of interest in a classification model. |
| fitStatOut | Specifies the output table for fit statistics. |
| freq | Specifies a variable that contains the frequency of each observation. |
| groupByLimit | Specifies the maximum number of levels in a group-by set to prevent creating excessively large result sets. |
| includeCutoffOne | When set to True, includes a row for cutoff=1 in the ROC statistics to simplify plotting the ROC curve. |
| includeFitStat | When set to False, fit statistics are not generated. |
| includeLift | When set to False, lift calculations are not generated. |
| includeRoc | When set to False, ROC calculations are not generated. |
| includeZeroDepth | When set to True, includes a row for depth=0 in the lift statistics to simplify plotting the lift curve. |
| inputs | Specifies the input variables to use in the analysis. |
| maxIters | Specifies the maximum number of iterations for the iterative percentile calculation algorithm. |
| method | Specifies the algorithm for the percentile analysis, either 'EXACT' or 'ITERATIVE'. |
| nBins | Specifies the number of bins to use for lift calculations. |
| noMissingTarget | When set to True, excludes observations where the target variable has a missing value. |
| partition | When set to True for a partitioned table, results are calculated efficiently for each partition. |
| partKey | Specifies a partition key to compute results for a single partition of a partitioned table. |
| pEvent | Specifies the event levels corresponding to each probability variable in `pVar`. |
| pResponse | Specifies the predicted response variable for model assessment. |
| pVar | Specifies the event probability variables for assessment. |
| response | Specifies the actual outcome or response variable for model assessment. |
| responseFmt | Specifies a temporary format for the response variable to produce the specified event. |
| rocOut | Specifies the output table for ROC curve calculations. |
| table | Specifies the input CAS table containing the data for assessment. |
| useRawPResponse | When set to True, uses raw values of the predicted response variable to filter observations. |
| userCutoff | Specifies a user-defined cutoff value for generating a confusion matrix. |
| weight | Specifies a variable to use for weighting each observation in the analysis. |
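As a rough illustration of how several of these parameters combine in one call, the sketch below assumes a scored table named `hmeq_scored` with an actual response `BAD` and a predicted probability `P_BAD1` (the names used in the examples that follow). The values chosen for `nBins`, `cutStep`, and `userCutoff` are arbitrary, and the value types are inferred from the descriptions in the table, so treat this as a starting point, not a recommended configuration.

```sas
proc cas;
   percentile.assess /
      table={name='hmeq_scored'},
      inputs={{name='P_BAD1'}},   /* predicted event probability */
      response='BAD',             /* actual outcome */
      event='1',                  /* formatted value of the event level */
      nBins=20,                   /* illustrative: number of bins for lift calculations */
      cutStep=0.05,               /* illustrative: step size for ROC cutoffs */
      userCutoff=0.5,             /* illustrative: cutoff for the confusion matrix */
      noMissingTarget=true;       /* exclude observations with a missing target */
quit;
```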
This example uses the `HMEQ` dataset, which contains information about home equity loans. We will first load this data into a CAS table. Then, we'll run a logistic regression to predict loan default (`BAD`) and store the predicted probabilities in a new table called `HMEQ_SCORED`. This scored table will be the input for the `assess` action.
```sas
/* Load HMEQ data into CAS */
proc casutil;
   load data=sampsio.hmeq outcaslib="casuser" casout="hmeq" replace;
quit;

/* Run logistic regression and score the data */
proc cas;
   regression.logistic /                 /* logistic action in the regression action set */
      table={name='hmeq'},
      class={'JOB', 'REASON'},
      /* event='1' requests a model for P(BAD=1); verify the option against your release */
      model={depVars={{name='BAD', options={event='1'}}},
             effects={'LOAN', 'MORTDUE', 'VALUE', 'REASON', 'JOB', 'YOJ', 'DEROG',
                      'DELINQ', 'CLAGE', 'NINQ', 'CLNO', 'DEBTINC'}},
      output={casOut={name='hmeq_scored', replace=true},
              pred='P_BAD1',             /* predicted probability of the event */
              copyVars={'BAD'}};         /* keep the actual response for assess */
quit;
```
This is a basic example of using the `assess` action to evaluate a binary classification model. We specify the actual response (`BAD`), the predicted event probability (`P_BAD1`), and the event level ('1'). The action will compute default assessment statistics like ROC and Lift information.
```sas
proc cas;
   percentile.assess /
      table={name='hmeq_scored'},
      response='BAD',
      inputs={{name='P_BAD1'}},
      event='1';
quit;
```
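To work with the statistics programmatically instead of only reading them in the output, the same call can capture its results in a CASL variable. The sketch below is a minimal variation of the basic example; the variable name `r` is arbitrary, and the names of the result tables are not assumed here, which is why `describe` is used to list them first.

```sas
proc cas;
   percentile.assess result=r /
      table={name='hmeq_scored'},
      response='BAD',
      inputs={{name='P_BAD1'}},
      event='1';
   describe r;   /* show the structure of the returned results */
   print r;      /* print every returned result table */
quit;
```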
This example demonstrates a more comprehensive use of the `assess` action. We assess the model for the event '1' using the predicted probability `P_BAD1`. We explicitly request ROC, lift, and fit-statistic calculations (`includeRoc=true`, `includeLift=true`, `includeFitStat=true`) and name output tables (`rocOut`, `casOut`, `fitStatOut`) so that the results are written to CAS tables in the `casuser` caslib, where they can be used for further analysis or visualization.
```sas
proc cas;
   percentile.assess /
      table={name='hmeq_scored'},
      response='BAD',
      inputs={{name='P_BAD1'}},
      event='1',
      includeRoc=true,
      includeLift=true,
      includeFitStat=true,
      rocOut={name='roc_results', caslib='casuser', replace=true},
      casOut={name='lift_results', caslib='casuser', replace=true},
      fitStatOut={name='fit_statistics', caslib='casuser', replace=true};
quit;
```
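Once the output tables exist, they can be inspected like any other CAS table. The sketch below assumes the previous example has run in the same session and uses the general-purpose `table.columnInfo` and `table.fetch` actions; the row limit of 10 is arbitrary, and no particular column names are assumed in the output tables.

```sas
proc cas;
   /* List the columns that assess wrote to the lift table */
   table.columnInfo / table={name='lift_results', caslib='casuser'};

   /* Preview the first rows of the ROC table */
   table.fetch /
      table={name='roc_results', caslib='casuser'},
      to=10;

   /* Show the fit statistics */
   table.fetch / table={name='fit_statistics', caslib='casuser'};
quit;
```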
The `assess` action can also be used for interval target (regression) models. In this case, you provide the actual response variable and the predicted response variable (`pResponse`). The action calculates regression-specific fit statistics like Mean Square Error (MSE), Root Mean Square Error (RMSE), and R-Square.
```sas
/* First, create a scored table from a regression model */
proc cas;
   decisionTree.gbtreeTrain /
      table={name='hmeq'},
      inputs={'LOAN', 'MORTDUE', 'YOJ', 'DEROG', 'DELINQ', 'CLAGE', 'NINQ', 'CLNO'},
      target='DEBTINC',
      casOut={name='hmeq_gbtree_model', replace=true};   /* model table read by gbtreeScore */
   decisionTree.gbtreeScore /
      table={name='hmeq'},
      modelTable={name='hmeq_gbtree_model'},
      copyVars={'DEBTINC'},                               /* keep the actual response for assess */
      casOut={name='hmeq_scored_reg', replace=true};
quit;

/* Now, assess the regression model */
proc cas;
   /* The prediction column name below is kept from the original example.        */
   /* Check the scored table and adjust it if the score action names the column  */
   /* differently (for example, _GBT_PredMean_).                                 */
   percentile.assess /
      table={name='hmeq_scored_reg'},
      response='DEBTINC',
      pResponse='_GBT_Pred_';
quit;
```
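Because `assess` needs the exact name of the prediction column, and the scored table must also contain the actual response, it can help to list the columns of the scored table before running the assessment. The sketch below uses the general-purpose `table.columnInfo` action and assumes the scored table `hmeq_scored_reg` from the example above.

```sas
proc cas;
   /* Confirm that DEBTINC and the prediction column are both present */
   table.columnInfo / table={name='hmeq_scored_reg'};
quit;
```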
A retail company has built a logistic regression model to predict which customers are likely to respond to a new promotional offer. The Data Science team needs to assess the mod...
A financial services company needs to evaluate a fraud detection model across millions of transactions. To speed up the process and get segment-specific insights, they want to a...
A healthcare provider is assessing a model that predicts the likelihood of patient readmission within 30 days. The dataset is imperfect, with some missing readmission statuses (...