assess - WeAreCAS

Q: What is the primary purpose of the `percentile.assess` action?

The `percentile.assess` action assesses and compares models by generating model assessment statistics. It belongs to the Percentile action set, which also calculates percentiles and box plot values.

Q: What are the mandatory parameters for using the `assess` action?

To run the `assess` action, you must specify the `table` parameter, which defines the input data table, and the `response` parameter, which indicates the response variable for the model assessment.

Q: How do I define the event of interest for a classification model assessment?

You can specify the event of interest using the `event` parameter, which takes the formatted value of the response variable that represents the event. If you don't specify this and the response is numeric, the action performs an assessment for a regression model.

Q: Can the `assess` action generate ROC and Lift chart data?

Yes. To get ROC (Receiver Operating Characteristic) data, set `includeRoc` to TRUE and specify an output table with the `rocOut` parameter. For Lift data, set `includeLift` to TRUE and use the `casOut` parameter to specify the output table. You can control the granularity of ROC calculations with `cutStep` and Lift calculations with `nBins`.
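As a minimal sketch of these options, the call below assumes a scored table named `hmeq_scored` with an actual response `BAD` and a predicted event probability `P_BAD1` (the names used in the examples on this page). The values for `cutStep` and `nBins` and the output table names `roc_demo` and `lift_demo` are illustrative assumptions, not recommended settings.

```sas
proc cas;
   percentile.assess /
      table={name='hmeq_scored'},
      response='BAD',
      inputs={{name='P_BAD1'}},
      event='1',
      includeRoc=true,
      cutStep=0.05,                             /* illustrative step size for the ROC calculations */
      rocOut={name='roc_demo', replace=true},   /* hypothetical output table name */
      includeLift=true,
      nBins=20,                                 /* illustrative number of bins for the lift calculations */
      casOut={name='lift_demo', replace=true};  /* hypothetical output table name */
quit;
```

A smaller `cutStep` or a larger `nBins` gives finer-grained chart data at the cost of larger output tables.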

Q: What calculation methods does the `assess` action support for percentiles?

The `assess` action supports two algorithms for percentile analysis, specified via the `method` parameter: 'ITERATIVE' (the default) and 'EXACT'. The iterative method's convergence can be fine-tuned using the `maxIters` and `epsilon` parameters.
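The following sketch shows where these parameters sit in a call. It assumes the `hmeq_scored` table, `BAD` response, and `P_BAD1` probability from the examples on this page; the `maxIters` and `epsilon` values are placeholders chosen for illustration only.

```sas
proc cas;
   percentile.assess /
      table={name='hmeq_scored'},
      response='BAD',
      inputs={{name='P_BAD1'}},
      event='1',
      method='ITERATIVE',   /* or 'EXACT' */
      maxIters=20,          /* illustrative cap on iterations */
      epsilon=0.00001;      /* illustrative convergence tolerance */
quit;
```

With `method='EXACT'`, the `maxIters` and `epsilon` settings would not be expected to matter, since they tune the iterative algorithm.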

Q: How can I output fit statistics to a separate table?

You can save the fit statistics to a separate CAS output table with the `fitStatOut` parameter. For a nominal response variable, you must also specify the event levels (`pEvent`) and the corresponding event probability variables (`pVar`).
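A hedged sketch of such a call follows. It assumes the scored table `hmeq_scored` also carries a column `P_BAD0` holding the predicted probability of `BAD` = 0; that column name and the output table name `fit_demo` are assumptions made for illustration.

```sas
proc cas;
   percentile.assess /
      table={name='hmeq_scored'},
      response='BAD',
      inputs={{name='P_BAD1'}},
      event='1',
      pVar={'P_BAD0'},    /* assumed column: predicted probability of BAD = 0 */
      pEvent={'0'},       /* event level that P_BAD0 corresponds to */
      fitStatOut={name='fit_demo', replace=true};   /* hypothetical output table name */
quit;
```

Each entry in `pEvent` is matched with the entry in the same position in `pVar`.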

At a glance

Data engineers and statisticians use the assess action to benchmark predictive model performance in distributed CAS environments. It calculates error metrics for regression models and produces chart data, such as Receiver Operating Characteristic (ROC) curves, for classification models. Assessments can be restricted to specific data partitions, and observations can be weighted. The questions and answers at the top of this page cover common points about syntax, output tables, and model comparison in SAS Viya.

Description

The `assess` action in the Percentile action set is a powerful tool for evaluating and comparing the performance of predictive models in SAS Viya. It is particularly useful in machine learning workflows to understand how well a model's predictions align with actual outcomes. This action can handle both classification (binary/nominal targets) and regression (interval targets) models. For classification, it computes essential metrics like ROC (Receiver Operating Characteristic) curves, lift charts, and various fit statistics (e.g., accuracy, misclassification rate). For regression, it calculates error metrics like Mean Squared Error (MSE). The action allows for detailed analysis by providing options to bin data, handle missing values, and partition data for validation, making it a cornerstone for robust model assessment.

percentile.assess { attributes={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, binNum=64-bit-integer, casOut={caslib="string", compress=TRUE|FALSE, indexVars={"variable-name-1", ...}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR"|"INHERIT"|"STANDARD", name="table-name", promote=TRUE|FALSE, replace=TRUE|FALSE, replication=integer, tableRedistUpPolicy="DEFER"|"NOREDIST"|"REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1", ...}}, cutStep=double, epsilon=double, event="string", fitStatOut={caslib="string", compress=TRUE|FALSE, indexVars={"variable-name-1", ...}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR"|"INHERIT"|"STANDARD", name="table-name", promote=TRUE|FALSE, replace=TRUE|FALSE, replication=integer, tableRedistUpPolicy="DEFER"|"NOREDIST"|"REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1", ...}}, freq="variable-name", groupByLimit=64-bit-integer, includeCutoffOne=TRUE|FALSE, includeFitStat=TRUE|FALSE, includeLift=TRUE|FALSE, includeRoc=TRUE|FALSE, includeZeroDepth=TRUE|FALSE, inputs={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, maxIters=integer, method="EXACT"|"ITERATIVE", nBins=integer, noMissingTarget=TRUE|FALSE, partition=TRUE|FALSE, partKey={"string-1", ...}, pEvent={"string-1", ...}, pResponse="variable-name", pVar={"variable-name-1", ...}, response="variable-name", responseFmt="string", rocOut={caslib="string", compress=TRUE|FALSE, indexVars={"variable-name-1", ...}, label="string", lifetime=64-bit-integer, maxMemSize=64-bit-integer, memoryFormat="DVR"|"INHERIT"|"STANDARD", name="table-name", promote=TRUE|FALSE, replace=TRUE|FALSE, replication=integer, tableRedistUpPolicy="DEFER"|"NOREDIST"|"REBALANCE", threadBlockSize=64-bit-integer, timeStamp="string", where={"string-1", ...}}, 
table={caslib="string", computedOnDemand=TRUE|FALSE, computedVars={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, computedVarsProgram="string", dataSourceOptions={key-1=any-list-or-data-type-1, ...}, groupBy={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, groupByMode="NOSORT"|"REDISTRIBUTE", importOptions={fileType="ANY"|"AUDIO"|"AUTO"|"BASESAS"|"CSV"|"DELIMITED"|"DOCUMENT"|"DTA"|"ESP"|"EXCEL"|"FMT"|"HDAT"|"IMAGE"|"JMP"|"LASR"|"PARQUET"|"SOUND"|"SPSS"|"VIDEO"|"XLS", fileType-specific-parameters}, name="table-name", orderBy={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, singlePass=TRUE|FALSE, where="where-expression", whereTable={casLib="string", dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}, importOptions={fileType="ANY"|"AUDIO"|"AUTO"|"BASESAS"|"CSV"|"DELIMITED"|"DOCUMENT"|"DTA"|"ESP"|"EXCEL"|"FMT"|"HDAT"|"IMAGE"|"JMP"|"LASR"|"PARQUET"|"SOUND"|"SPSS"|"VIDEO"|"XLS", fileType-specific-parameters}, name="table-name", vars={{name="variable-name", format="string", formattedLength=integer, label="string", nfd=integer, nfl=integer}, ...}, where="where-expression"}}, useRawPResponse=TRUE|FALSE, userCutoff=double, 
weight="variable-name" }

Settings

Parameter	Description
attributes	Specifies temporary attributes, such as a format, to apply to input variables.
binNum	Specifies the bin number for a three-pass iterative assessment method.
casOut	Specifies the output table for lift calculations.
cutStep	Specifies the step size to use for the ROC calculations.
epsilon	Specifies the tolerance used in determining the convergence of the iterative algorithm for percentile calculation.
event	Specifies the formatted value of the response variable that represents the event of interest in a classification model.
fitStatOut	Specifies the output table for fit statistics.
freq	Specifies a variable that contains the frequency of each observation.
groupByLimit	Specifies the maximum number of levels in a group-by set to prevent creating excessively large result sets.
includeCutoffOne	When set to True, includes a row for cutoff=1 in the ROC statistics to simplify plotting the ROC curve.
includeFitStat	When set to False, fit statistics are not generated.
includeLift	When set to False, lift calculations are not generated.
includeRoc	When set to False, ROC calculations are not generated.
includeZeroDepth	When set to True, includes a row for depth=0 in the lift statistics to simplify plotting the lift curve.
inputs	Specifies the input variables to use in the analysis.
maxIters	Specifies the maximum number of iterations for the iterative percentile calculation algorithm.
method	Specifies the algorithm for the percentile analysis, either 'EXACT' or 'ITERATIVE'.
nBins	Specifies the number of bins to use for lift calculations.
noMissingTarget	When set to True, excludes observations where the target variable has a missing value.
partition	When set to True for a partitioned table, results are calculated efficiently for each partition.
partKey	Specifies a partition key to compute results for a single partition of a partitioned table.
pEvent	Specifies the event levels corresponding to each probability variable in `pVar`.
pResponse	Specifies the predicted response variable for model assessment.
pVar	Specifies the event probability variables for assessment.
response	Specifies the actual outcome or response variable for model assessment.
responseFmt	Specifies a temporary format for the response variable to produce the specified event.
rocOut	Specifies the output table for ROC curve calculations.
table	Specifies the input CAS table containing the data for assessment.
useRawPResponse	When set to True, uses raw values of the predicted response variable to filter observations.
userCutoff	Specifies a user-defined cutoff value for generating a confusion matrix.
weight	Specifies a variable to use for weighting each observation in the analysis.
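Several of the settings above can be combined in a single call. The sketch below assumes the `hmeq_scored` table, `BAD` response, and `P_BAD1` probability used in the examples on this page; the cutoff value 0.3 is purely illustrative.

```sas
proc cas;
   percentile.assess /
      table={name='hmeq_scored'},
      response='BAD',
      inputs={{name='P_BAD1'}},
      event='1',
      noMissingTarget=true,    /* exclude observations where BAD is missing */
      userCutoff=0.3,          /* illustrative cutoff for the confusion matrix */
      includeCutoffOne=true,   /* add the cutoff=1 row to the ROC statistics */
      includeZeroDepth=true;   /* add the depth=0 row to the lift statistics */
quit;
```

The two `include...` rows are plotting conveniences: they give the ROC and lift curves explicit end points.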

Data Preparation

Data Creation

This example uses the `HMEQ` dataset, which contains information about home equity loans. We will first load this data into a CAS table. Then, we'll run a logistic regression to predict loan default (`BAD`) and store the predicted probabilities in a new table called `HMEQ_SCORED`. This scored table will be the input for the `assess` action.

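/* Load HMEQ data into CAS */
proc casutil;
   load data=sampsio.hmeq outcaslib="casuser" casout="hmeq" replace;
quit;

/* Run logistic regression and score the data */
proc cas;
   /* Action name and option syntax are written as best understood here; */
   /* verify them against the regression.logistic reference.             */
   regression.logistic /
      table={name='hmeq'},
      class={'JOB', 'REASON'},
      /* Model the probability that BAD = 1 */
      model={depVars={{name='BAD', options={event='1'}}},
             effects={'LOAN', 'MORTDUE', 'VALUE', 'REASON', 'JOB', 'YOJ',
                      'DEROG', 'DELINQ', 'CLAGE', 'NINQ', 'CLNO', 'DEBTINC'}},
      /* pred= names the predicted event probability; copyVars= keeps BAD for assess */
      output={casOut={name='hmeq_scored', replace=true},
              pred='P_BAD1', copyVars={'BAD'}};
quit;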

Examples

Basic Model Assessment

This basic example uses the `assess` action to evaluate a binary classification model. We specify the actual response (`BAD`), the predicted event probability (`P_BAD1`), and the event level ('1'). The action computes the default assessment statistics, such as ROC and lift information.

SAS® / CAS Code (awaiting community validation)

proc cas;
   percentile.assess /
      table={name='hmeq_scored'},
      response='BAD',             /* actual outcome */
      inputs={{name='P_BAD1'}},   /* predicted event probability */
      event='1';                  /* level of BAD treated as the event */
quit;

Result:
The action returns several result tables, including 'ROCInfo' with ROC curve data, 'LiftInfo' with lift chart data, and 'FitStat' with overall model fit statistics like Misclassification Rate and AUC (Area Under Curve).

Detailed Assessment with Output Tables

This example shows a more complete use of the `assess` action. We assess the model for the event '1' using the predicted probability `P_BAD1`. We explicitly request ROC, lift, and fit statistic calculations (`includeRoc=true`, `includeLift=true`, `includeFitStat=true`) and name output tables (`rocOut`, `casOut`, `fitStatOut`) so that the results are stored as CAS tables in the `casuser` caslib. The tables can then be used for further analysis or for plotting the assessment metrics.

SAS® / CAS Code (awaiting community validation)

proc cas;
   percentile.assess /
      table={name='hmeq_scored'},
      response='BAD',
      inputs={{name='P_BAD1'}},
      event='1',
      includeRoc=true,
      includeLift=true,
      includeFitStat=true,
      rocOut={name='roc_results', caslib='casuser', replace=true},          /* ROC curve data */
      casOut={name='lift_results', caslib='casuser', replace=true},         /* lift data */
      fitStatOut={name='fit_statistics', caslib='casuser', replace=true};   /* fit statistics */
quit;

Result:
Three new tables are created in the 'casuser' caslib: `roc_results` containing the data points for the ROC curve, `lift_results` containing data for the lift chart and other quantile-based statistics, and `fit_statistics` containing a summary of model performance metrics. The CAS log will confirm the creation of these tables.

Assessment for Regression Models

The `assess` action can also be used for interval target (regression) models. In this case, you provide the actual response variable (`response`) and the predicted response variable (`pResponse`), and you omit `event`. The action calculates regression fit statistics such as Average Squared Error (ASE) and Root Average Squared Error (RASE).

SAS® / CAS Code (awaiting community validation)

/* First, create a scored table from a regression model */
proc cas;
   decisionTree.gbtreeTrain /
      table='hmeq',
      inputs={'LOAN', 'MORTDUE', 'YOJ', 'DEROG', 'DELINQ', 'CLAGE', 'NINQ', 'CLNO'},
      target='DEBTINC',
      /* gbtreeScore reads the model from a CAS table, so save it with casOut */
      casOut={name='hmeq_gbtree_model', replace=true};
   decisionTree.gbtreeScore /
      table='hmeq',
      modelTable='hmeq_gbtree_model',
      /* copyVars keeps the actual response next to the prediction */
      copyVars={'DEBTINC'},
      casOut={name='hmeq_scored_reg', replace=true};
quit;

/* Now, assess the regression model */
proc cas;
   /* _GBT_PredMean_ is assumed to be the prediction column that gbtreeScore */
   /* writes for an interval target; confirm with table.columnInfo.          */
   percentile.assess /
      table='hmeq_scored_reg',
      response='DEBTINC',
      pResponse='_GBT_PredMean_';
quit;

Result:
The result will include a 'FitStat' table containing regression fit statistics such as _ASSESS_ASE (Average Squared Error), _ASSESS_RASE (Root Average Squared Error), and _ASSESS_MALE (Mean Absolute Log Error), among others. No ROC or Lift tables are produced for interval targets.

Associated Scenarios

Use Case

Standard Case: Assessing a Marketing Propensity Model

A retail company has built a logistic regression model to predict which customers are likely to respond to a new promotional offer. The Data Science team needs to assess the mod...

Use Case

Performance Case: Assessing a Fraud Model on Partitioned Data

A financial services company needs to evaluate a fraud detection model across millions of transactions. To speed up the process and get segment-specific insights, they want to a...

Use Case

Edge Case: Handling Missing Data and Weights in Readmission Model

A healthcare provider is assessing a model that predicts the likelihood of patient readmission within 30 days. The dataset is imperfect, with some missing readmission statuses (...

Related Actions

percentile

boxPlot

The boxPlot action calculates quantiles, high and low whiskers, and outliers ...
