fairAITools

mitigateBias

Description

The mitigateBias action uses the exponentiated gradient reduction algorithm to mitigate bias in predictive models. On each iteration it adjusts the observation weights and retrains the model, aiming to satisfy a specified fairness constraint such as demographic parity or equalized odds. It can wrap any CAS action that supports a weight parameter for model training.
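To make the iteration more concrete, here is a minimal Python sketch of the multiplier update from the published exponentiated gradient reduction (Agarwal et al., 2018). It is an assumption that mitigateBias follows this exact update. The names `bound` and `learning_rate` echo the action's bound and learningRate parameters by name only, and the violation values are made up for illustration. The step that turns multipliers into observation weights is omitted.

```python
import math

def multipliers(theta, bound):
    """Return lambda_k = bound * exp(theta_k) / (1 + sum_j exp(theta_j))."""
    shift = max(0.0, max(theta))          # subtract the largest exponent to avoid overflow
    exps = [math.exp(t - shift) for t in theta]
    denom = math.exp(-shift) + sum(exps)  # exp(-shift) is the shifted "1 +" term
    return [bound * e / denom for e in exps]

def update_theta(theta, violations, learning_rate):
    """Raise theta_k when constraint k is violated (positive value), lower it otherwise."""
    return [t + learning_rate * v for t, v in zip(theta, violations)]

# Hypothetical constraint violations measured after one training pass.
theta = [0.0, 0.0]
theta = update_theta(theta, violations=[0.10, -0.10], learning_rate=0.5)
lam = multipliers(theta, bound=3.0)
print(lam)  # the multiplier for the violated constraint is the larger of the two
```

Because of the "1 +" term in the denominator, the multipliers always sum to less than `bound`, which is why a larger bound lets the fairness constraint weigh more heavily against accuracy in this formulation.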

fairAITools.mitigateBias <result=results> <status=rc> /
   biasMetric="string",
   bound=double,
   copyVarsCASLVariable="string",
   cutoff=double,
   event="string",
   frequency={casvardesc},
   iterationCASLVariable="string",
   learningRate=double,
   logLevel=64-bit-integer,
   maxIters=64-bit-integer,
   nBins=64-bit-integer,
   predictedVariables={{casvardesc-1} <, {casvardesc-2}, ...>},
   predictedVariablesResultKey="string",
   response={casvardesc},
   responseLevels={"string-1" <, "string-2", ...>},
   responseLevelsResultKey="string",
   rocStep=double,
   scoredCASLVariable="string",
   seed=double,
   selectionDepth=64-bit-integer,
   sensitiveVariable={casvardesc},
   table={castable},
   tableCASLVariable="string",
   tableModList={{fairaitools_mitigateBias_tableModList-1} <, {fairaitools_mitigateBias_tableModList-2}, ...>},
   tableSaveList={{fairaitools_mitigateBias_tableSaveList-1} <, {fairaitools_mitigateBias_tableSaveList-2}, ...>},
   tolerance=double,
   trainProgram="string",
   tuneBound=TRUE | FALSE,
   vars={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
   weight={casvardesc},
   weightCASLVariable="string";

Settings
Parameter: Description
biasMetric: Specifies the type of bias measurement. Valid values include 'DEMOGRAPHICPARITY', 'EQUALIZEDODDS', 'EQUALOPPORTUNITY', and 'PREDICTIVEPARITY'.
bound: Specifies the bound value for the exponentiated gradient reduction algorithm.
copyVarsCASLVariable: Specifies the name of the CASL variable passed to the training program that contains the copyVars list for scored table creation.
cutoff: Specifies the cutoff for the confusion matrix.
event: Specifies the formatted value of the response (target) variable that represents the event of interest.
frequency: Specifies the variable that contains frequency values.
iterationCASLVariable: Specifies the name of the CASL variable passed to the training program that contains the current iteration number.
learningRate: Specifies the step size for updates in the exponentiated gradient reduction algorithm.
logLevel: Specifies the level of log information to print. Higher levels provide more detail.
maxIters: Specifies the maximum number of iterations for the exponentiated gradient reduction algorithm.
nBins: Specifies the number of bins to use in lift calculations.
predictedVariables: Specifies the list of variables that contain the model's predictions. The order must match the responseLevels parameter.
predictedVariablesResultKey: Specifies the results key from the training program that identifies the predicted variable names.
response: Specifies the response (target) variable for supervised learning.
responseLevels: Specifies the formatted values of the response variable. The order must match the predictedVariables parameter.
responseLevelsResultKey: Specifies the results key from the training program that identifies the response variable levels.
rocStep: Specifies the step size for Receiver Operating Characteristic (ROC) calculations.
scoredCASLVariable: Specifies the name of the CASL variable passed to the training program that contains the output specification for the scored table.
seed: Specifies the seed for the random number generator, for reproducibility.
selectionDepth: Specifies the depth to use in lift calculations.
sensitiveVariable: Specifies the sensitive variable to use in bias calculations.
table: Specifies the input data table for the mitigation process.
tableCASLVariable: Specifies the name of the CASL variable passed to the training program that contains the modified input data table information.
tableModList: Specifies a list of tables to modify and pass to the training program. The main input table is automatically appended to this list.
tableSaveList: Specifies a list of tables to save after running the training program. The tables are saved only if the specified biasMetric improves.
tolerance: Specifies the parity constraint violation tolerance. A value of 0 disables early stopping, so the algorithm runs for the full maxIters iterations.
trainProgram: Specifies the CASL code block for training a model. The mitigation algorithm executes this block on every iteration.
tuneBound: When set to TRUE, specifies that the bound value should be tuned.
vars: Specifies additional variables to pass to the training program.
weight: Specifies a variable of pre-existing weights. The algorithm-generated weights are multiplied by these values.
weightCASLVariable: Specifies the name of the CASL variable passed to the training program that contains the name of the weight variable.
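The four biasMetric values correspond to standard group fairness criteria. The Python sketch below shows one common way to measure each as the largest gap in a per-group rate. The exact formulas that mitigateBias uses are not given here, so treat the definitions, the function names, and the toy data as assumptions for illustration.

```python
def rate(values):
    """Mean of a 0/1 list; raises ZeroDivisionError if the list is empty."""
    return sum(values) / len(values)

def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, true/false positive rate, and precision."""
    stats = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        stats[g] = {
            "selection_rate": rate([y_pred[i] for i in idx]),
            "tpr": rate([y_pred[i] for i in idx if y_true[i] == 1]),
            "fpr": rate([y_pred[i] for i in idx if y_true[i] == 0]),
            "ppv": rate([y_true[i] for i in idx if y_pred[i] == 1]),
        }
    return stats

def gap(stats, key):
    """Largest difference in one rate across groups."""
    vals = [s[key] for s in stats.values()]
    return max(vals) - min(vals)

def bias_metrics(y_true, y_pred, groups):
    s = group_rates(y_true, y_pred, groups)
    return {
        "DEMOGRAPHICPARITY": gap(s, "selection_rate"),  # equal predicted-event rates
        "EQUALOPPORTUNITY": gap(s, "tpr"),               # equal true positive rates
        "EQUALIZEDODDS": max(gap(s, "tpr"), gap(s, "fpr")),
        "PREDICTIVEPARITY": gap(s, "ppv"),               # equal precision
    }

# Toy data: 4 observations per group, predictions already thresholded at a cutoff.
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
metrics = bias_metrics(y_true, y_pred, groups)
print(metrics)
```

In this sketch a value of 0 means the groups are treated identically under that criterion, which is the sense in which a tolerance bounds the allowed gap.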
Data Preparation
Create Biased Loan Application Data

This SAS DATA step creates a synthetic dataset named 'applicant_data' in the CASUSER caslib. It simulates loan applicant information, including a 'Gender' variable, and intentionally introduces bias: male applicants with a credit score below 680 default with probability 0.5, while female applicants default with probability 0.4 only when their score is below 640. A second, gender-neutral rule adds defaults for scores below 600. This table is used to demonstrate bias mitigation.

/* Assumes an active CAS session and a CAS libref named casuser. */
DATA casuser.applicant_data;
   /* Declare lengths first; otherwise Gender gets length 4 and 'Female' is truncated. */
   length Gender $6 HomeOwner $3;
   call streaminit(123);
   DO i = 1 to 2000;
      IF rand('UNIFORM') > 0.6 THEN Gender = 'Male';
      ELSE Gender = 'Female';
      IF rand('UNIFORM') > 0.3 THEN HomeOwner = 'Yes';
      ELSE HomeOwner = 'No';
      Income = round(30000 + rand('UNIFORM') * 90000, 1);
      CreditScore = 500 + floor(rand('UNIFORM') * 350);
      Loan_Default = 0;
      /* Gender-dependent rule: this is the source of the bias. */
      IF (Gender = 'Male' and CreditScore < 680 and rand('UNIFORM') > 0.5)
         or (Gender = 'Female' and CreditScore < 640 and rand('UNIFORM') > 0.6)
         THEN Loan_Default = 1;
      /* Gender-neutral rule for low credit scores. */
      IF CreditScore < 600 and rand('UNIFORM') > 0.4 THEN Loan_Default = 1;
      OUTPUT;
   END;
   drop i;
RUN;
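Before mitigating, it helps to know how large the built-in gap is. The Python sketch below re-implements the default logic described above and estimates the default rate per gender. It is an illustration only: Python's generator does not reproduce the SAS random stream, the function name is hypothetical, and the HomeOwner and Income draws are skipped because they do not affect the default flag.

```python
import random

def simulate_default_rates(n=2000, seed=123):
    """Estimate the default rate by gender under the data step's rules."""
    rng = random.Random(seed)
    totals = {"Male": 0, "Female": 0}
    defaults = {"Male": 0, "Female": 0}
    for _ in range(n):
        gender = "Male" if rng.random() > 0.6 else "Female"
        credit = 500 + int(rng.random() * 350)  # integer score in 500..849
        default = 0
        # Gender-dependent rule (the injected bias).
        if gender == "Male" and credit < 680 and rng.random() > 0.5:
            default = 1
        elif gender == "Female" and credit < 640 and rng.random() > 0.6:
            default = 1
        # Gender-neutral rule for low scores.
        if credit < 600 and rng.random() > 0.4:
            default = 1
        totals[gender] += 1
        defaults[gender] += default
    return {g: defaults[g] / totals[g] for g in totals}

rates = simulate_default_rates(n=50000)
print(rates)
```

Working through the rules by hand gives an expected default rate of about 0.34 for male applicants and about 0.26 for female applicants, so a model trained on this table without mitigation has a gap of roughly eight percentage points to learn from.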

Examples

This example demonstrates a basic application of the mitigateBias action. It uses a logistic regression model defined in the 'trainPgm' source block. The goal is to reduce the 'DEMOGRAPHICPARITY' bias related to the 'Gender' variable, with a tolerance of 0.05 over a maximum of 10 iterations.

SAS / CAS code (awaiting community validation)
/* Assumes the fairAITools and regression action sets are loaded. */
PROC CAS;
   /* Training program that mitigateBias runs on every iteration. */
   SOURCE trainPgm;
      regression.logistic /
         table=table,
         class={'Gender', 'HomeOwner'},
         model={depvar='Loan_Default', effects={'Gender', 'HomeOwner', 'Income', 'CreditScore'}},
         weight=weight,
         output={casOut=casout, copyVars={'Loan_Default', 'Gender'}},
         store={name='logistic_model', replace=true};
   ENDSOURCE;

   fairAITools.mitigateBias /
      table={name='applicant_data', caslib='CASUSER'},
      response={name='Loan_Default'},
      event='1',   /* event is a separate parameter, not an option of response */
      sensitiveVariable={name='Gender'},
      trainProgram=trainPgm,
      biasMetric='DEMOGRAPHICPARITY',
      tolerance=0.05,
      maxIters=10,
      /* Names of the CASL variables that trainPgm reads. */
      tableCASLVariable='table',
      scoredCASLVariable='casout',
      weightCASLVariable='weight';
RUN;
QUIT;
Result:
The action returns tables detailing the mitigation progress, including the bias metric value at each iteration. The observation weights, and therefore the fitted model, are adjusted to better satisfy the demographic parity constraint within the specified tolerance.

This detailed example mitigates bias for a Gradient Boosting Tree model, targeting the 'EQUALIZEDODDS' metric. It specifies a CASL source block ('trainPgm_gb') that first trains a model and then scores the data. The mitigateBias action iteratively calls this block, adjusting weights to meet the fairness constraint. It also uses the 'tableSaveList' parameter to save the best-performing model's store ('mitigated_gbtree_model') based on the mitigation progress.

SAS / CAS code (awaiting community validation)
/* Assumes the fairAITools, decisionTree, and astore action sets are loaded. */
PROC CAS;
   SOURCE trainPgm_gb;
      /* Train with the current weights and save the model as an analytic store. */
      decisionTree.gbtreeTrain /
         table=table,
         target='Loan_Default',
         inputs={'Gender', 'HomeOwner', 'Income', 'CreditScore'},
         nominals={'Gender', 'HomeOwner', 'Loan_Default'},
         weight=weight,
         saveState={name='gbtree_model', caslib='CASUSER', replace=true};
      /* saveState writes an analytic store, so score it with astore.score
         rather than gbtreeScore, which expects a casOut model table. */
      astore.score /
         table=table,
         rstore={name='gbtree_model', caslib='CASUSER'},
         casOut=casout,
         copyVars={'Loan_Default', 'Gender'};
   ENDSOURCE;
   fairAITools.mitigateBias /
      table={name='applicant_data', caslib='CASUSER'},
      response={name='Loan_Default'},
      event='1',   /* event is a separate parameter, not an option of response */
      sensitiveVariable={name='Gender'},
      trainProgram=trainPgm_gb,
      biasMetric='EQUALIZEDODDS',
      tolerance=0.01, maxIters=50, learningRate=0.02, bound=75, seed=456,
      tableCASLVariable='table', weightCASLVariable='weight', scoredCASLVariable='casout',
      predictedVariablesResultKey='Scored_CAS_Table_Vars',
      responseLevelsResultKey='Scored_Target_Levels',
      tableSaveList={{key='State', casout={name='mitigated_gbtree_model', caslib='CASUSER', replace=true}}};
RUN;
QUIT;
Result:
The action produces several output tables, including 'MitigationHistory' which tracks the fairness metric and model performance over iterations, and 'BestModelInfo' which provides details about the iteration that yielded the best result. A new CAS table named 'mitigated_gbtree_model' is created in the CASUSER caslib, containing the analytic store of the fairest model found.

FAQ

What is the purpose of the mitigateBias action?
Which bias metrics can be used with the mitigateBias action?
What are the mandatory parameters for the mitigateBias action?
How does the `trainProgram` parameter work?
What is the role of the `tolerance` parameter?
How can I control the iterations of the mitigation algorithm?
Is it possible to save intermediate results during the mitigation process?