fairAITools

assessBias

Description

The assessBias action calculates bias metrics for predictive models. This is a crucial step in ensuring fairness in artificial intelligence by identifying whether a model produces different outcomes for different subgroups, particularly those defined by sensitive variables like race or gender. The action can handle models saved as analytic stores (ASTORE) or as SAS DATA step code.

fairAITools.assessBias {
   code="string",
   cutoff=double,
   event="string",
   frequency={casvardesc},
   modelTable={castable},
   modelTables={{castable-1} <, {castable-2}, ...>},
   modelTableType="ASTORE" | "DATASTEP" | "NONE",
   nBins=64-bit-integer,
   predictedVariables={{casvardesc-1} <, {casvardesc-2}, ...>},
   referenceLevel="string",
   response={casvardesc},
   responseLevels={"string-1" <, "string-2", ...>},
   rocStep=double,
   scoredTable={casouttable},
   selectionDepth=64-bit-integer,
   sensitiveVariable={casvardesc},
   table={castable},
   weight={casvardesc}
};
Settings
| Parameter | Description |
| --- | --- |
| code | Specifies the DATA step code that describes the model or the DS2 code used with an analytic store. |
| cutoff | Specifies the probability cutoff for classifying an observation as an event in the confusion matrix. Default is 0.5. |
| event | Specifies the formatted value of the response variable that represents the event of interest. |
| frequency | Specifies the variable that contains the frequency of occurrence for each observation. |
| modelTable | Specifies the input table containing the model to be assessed, which can be an analytic store or DATA step scoring code. |
| modelTables | Specifies multiple input tables containing model components, typically used with DS2 code. |
| modelTableType | Specifies the type of scoring model provided: ASTORE, DATASTEP, or NONE. Default is ASTORE. |
| nBins | Specifies the number of bins to use for lift calculations. Default is 20. |
| predictedVariables | Specifies the list of variables that contain the model's predictions. |
| referenceLevel | Specifies the reference level for the sensitive variable, which acts as the baseline for comparison. |
| response | Specifies the response (target) variable. |
| responseLevels | Specifies the list of formatted values for the response variable. |
| rocStep | Specifies the step size for Receiver Operating Characteristic (ROC) calculations. Default is 0.05. |
| scoredTable | Specifies the output table to store the scored results. |
| selectionDepth | Specifies the depth to use in lift calculations. Default is 10. |
| sensitiveVariable | Specifies the sensitive variable (e.g., gender, race) to use for bias assessment. |
| table | Specifies the input data table for assessment. |
| weight | Specifies the variable that contains observation weights. |
Data Preparation
Data Creation for Bias Assessment

This example first loads the `HMEQ` dataset, which contains home equity loan data. Then, a gradient boosting model is trained to predict loan defaults (`BAD`). The model's predictions are saved as `P_BAD1` and `P_BAD0`. This scored table, `HMEQ_SCORED`, will be used as input for the bias assessment.

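/* Assumes an active CAS session; mycas points to the CASUSER caslib */
LIBNAME mycas cas caslib=casuser;

/* Load the SAMPSIO.HMEQ sample data into CAS as CASUSER.HMEQ */
PROC CASUTIL;
   load DATA=sampsio.hmeq outcaslib="casuser" casout="hmeq" replace;
QUIT;

/* Train a gradient boosting model and write the scored table */
PROC GRADBOOST DATA=mycas.hmeq seed=12345;
   INPUT LOAN MORTDUE VALUE YOJ DEROG DELINQ CLAGE NINQ CLNO DEBTINC / level=interval;
   INPUT REASON JOB / level=nominal;
   target BAD / level=nominal;
   /* The scored output contains the predicted probabilities P_BAD1 and P_BAD0 */
   OUTPUT out=mycas.hmeq_scored copyvars=(_all_);
RUN;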
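
The data preparation above produces a pre-scored table. The description also mentions models saved as analytic stores, so the following is a minimal sketch of that path, not a validated recipe. The table name `hmeq_astore` is hypothetical, and the pairing of `predictedVariables` with `responseLevels` in the same order is an assumption to check against the action's reference documentation.

```sas
/* Sketch: refit the model and save it as an analytic store (ASTORE). */
/* The table name hmeq_astore is hypothetical.                        */
PROC GRADBOOST DATA=mycas.hmeq seed=12345;
   INPUT LOAN MORTDUE VALUE YOJ DEROG DELINQ CLAGE NINQ CLNO DEBTINC / level=interval;
   INPUT REASON JOB / level=nominal;
   target BAD / level=nominal;
   SAVESTATE rstore=mycas.hmeq_astore;
RUN;

PROC CAS;
   loadactionset 'fairAITools';
   /* Assumption: with an ASTORE the action scores the unscored input table itself, */
   /* and predictedVariables is listed in the same order as responseLevels.         */
   fairAITools.assessBias /
      table={name='hmeq'},
      modelTable={name='hmeq_astore'},
      modelTableType='ASTORE',
      response={name='BAD'},
      sensitiveVariable={name='JOB'},
      predictedVariables={{name='P_BAD1'}, {name='P_BAD0'}},
      responseLevels={'1', '0'},
      event='1';
RUN;
```

With this variant the input is the unscored `hmeq` table rather than `hmeq_scored`, because the analytic store supplies the scoring logic.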

Examples

This example performs a basic bias assessment on a pre-scored table. It uses the `JOB` variable as the sensitive attribute and `BAD` as the response variable. The model's predicted probabilities for the event '1' are in the `P_BAD1` variable.

SAS® / CAS Code (awaiting community validation)
PROC CAS;
   fairAITools.assessBias /
      TABLE={name='hmeq_scored'},
      modelTableType='NONE',   /* the table is already scored, so no model table is supplied */
      response={name='BAD'},
      sensitiveVariable={name='JOB'},
      predictedVariables={{name='P_BAD1'}},
      event='1';
RUN;
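
To see what the action returns, one option is to capture its results in a CASL variable and inspect them. The sketch below reuses the parameters of the basic example and sets `modelTableType='NONE'` explicitly, on the assumption that a pre-scored table needs no model table. The variable name `r` is arbitrary, and no result table names are assumed: `describe` lists whatever the action returns.

```sas
PROC CAS;
   loadactionset 'fairAITools';   /* make sure the action set is available */

   /* Keep the returned results in the CASL variable r */
   fairAITools.assessBias result=r /
      table={name='hmeq_scored'},
      modelTableType='NONE',
      response={name='BAD'},
      sensitiveVariable={name='JOB'},
      predictedVariables={{name='P_BAD1'}},
      event='1';

   describe r;   /* list the names and structure of the returned result tables */
   print r;      /* print the contents of the result tables */
RUN;
```

Once the table names are known from the `describe` output, an individual table can be referenced as `r['name']` for further processing.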

This example demonstrates a more detailed bias assessment. It explicitly sets 'Other' as the reference level for the `JOB` sensitive variable, so every other `JOB` level is compared against it. It also specifies a custom probability cutoff of 0.6 for the confusion matrix and uses the scoredTable parameter to write the scored output to a CAS table named `BIAS_ASSESSMENT_RESULTS`.

SAS® / CAS Code (awaiting community validation)
PROC CAS;
   fairAITools.assessBias /
      TABLE={name='hmeq_scored'},
      modelTableType='NONE',   /* the table is already scored, so no model table is supplied */
      response={name='BAD'},
      sensitiveVariable={name='JOB'},
      predictedVariables={{name='P_BAD1'}},
      event='1',
      referenceLevel='Other',   /* baseline group for comparisons */
      cutoff=0.6,               /* probability cutoff for the confusion matrix */
      scoredTable={name='BIAS_ASSESSMENT_RESULTS', replace=true};
RUN;
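
To make the roles of `cutoff` and `referenceLevel` concrete, the following sketch computes, for each `JOB` level, the share of observations that would be classified as events at the cutoff, and the difference from the reference level's share. It is an illustration of one group-comparison idea only, not the action's implementation or its full set of metrics. The macro variables, the `work` table names, and the choice of `>=` at the cutoff boundary are assumptions made for the example.

```sas
/* Illustration only: per-group predicted event rate at a fixed cutoff, */
/* compared against a reference level.                                  */
%let cutoff   = 0.6;     /* same cutoff as the example above          */
%let refLevel = Other;   /* same reference level as the example above */

data work.flagged;
   set mycas.hmeq_scored(keep=JOB P_BAD1);
   where not missing(JOB) and not missing(P_BAD1);
   /* 1 if classified as an event; using >= at the boundary is an assumption */
   predicted_event = (P_BAD1 >= &cutoff);
run;

proc sql;
   /* Share of observations classified as events, per JOB level */
   create table work.group_rates as
   select JOB,
          count(*)              as n_obs,
          mean(predicted_event) as predicted_event_rate
   from work.flagged
   group by JOB;

   /* Difference of each group's rate from the reference level's rate */
   select g.JOB,
          g.n_obs,
          g.predicted_event_rate,
          g.predicted_event_rate - r.predicted_event_rate as diff_from_reference
   from work.group_rates as g,
        (select predicted_event_rate
         from work.group_rates
         where JOB = "&refLevel") as r;
quit;
```

The row for the reference level itself shows a difference of zero. Larger absolute differences for other levels indicate groups that the model flags more or less often than the baseline at this cutoff.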

FAQ

What is the purpose of the fairAITools.assessBias action?
What is the 'code' parameter used for in the assessBias action?
How is the 'cutoff' parameter used in the assessBias action?
What does the 'event' parameter signify?
How can I specify frequency values for the analysis?
What is the purpose of the 'modelTable' parameter?
When should I use the 'modelTables' parameter?
What are the possible values for the 'modelTableType' parameter?
What does the 'nBins' parameter control?
How do I specify the model's prediction variables?
What is the 'referenceLevel' parameter for?
How is the response or target variable specified?
What is the 'responseLevels' parameter?
What does the 'rocStep' parameter do?
How can I save the scored outputs?
What is the 'selectionDepth' parameter?
Which parameter is required for specifying the sensitive variable?
How do I specify the input data table for the assessBias action?

Associated Scenarios

Use Case
Standard Case: Assessing Gender Bias in a Loan Approval Model

A retail bank has developed a machine learning model to predict the likelihood of loan default. To comply with fair lending regulations, the bank needs to assess whether the mod...

Use Case
Performance Case: Bias Assessment on a Large Dataset with an ASTORE Model

An insurance company uses a gradient boosting model (ASTORE) to flag potentially fraudulent claims. They need to ensure the model is not unfairly flagging claims from certain ge...

Use Case
Edge Case: Handling Missing Data and Weights in Patient Readmission Model

A healthcare provider wants to assess a model that predicts patient readmission within 30 days. The goal is to check for bias related to the patient's preferred language. The da...