fairAITools

mitigateBias

Description

The mitigateBias action uses the exponentiated gradient reduction algorithm to mitigate bias in predictive models. The algorithm is iterative: at each step it adjusts the observation weights and retrains the model, producing a series of models that aim to satisfy a specified fairness constraint such as demographic parity or equalized odds. Because the model is trained by a user-supplied program, the action can wrap any CAS action that supports a weight parameter for model training.
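To make the idea of an exponentiated gradient step more concrete, the following Python sketch shows one multiplier update in the style of the published reduction by Agarwal et al. (2018). It is an illustration only: whether mitigateBias implements the update in exactly this form is an assumption, as is the mapping of `learning_rate` and `bound` to the learningRate and bound parameters. The function name `update_multipliers` and the sample numbers are hypothetical.

```python
import math

def update_multipliers(theta, violations, learning_rate, bound):
    """One exponentiated-gradient step on the fairness-constraint multipliers.

    theta         -- running log-scale scores, one per fairness constraint
    violations    -- current constraint violations (positive means violated)
    learning_rate -- step size (assumed to correspond to learningRate)
    bound         -- cap on the total multiplier mass (assumed to correspond to bound)
    """
    # An additive step in log space is a multiplicative step on the multipliers.
    new_theta = [t + learning_rate * v for t, v in zip(theta, violations)]
    # Normalise so every multiplier is positive and their sum stays below `bound`.
    denom = 1.0 + sum(math.exp(t) for t in new_theta)
    multipliers = [bound * math.exp(t) / denom for t in new_theta]
    return new_theta, multipliers

# Two constraints: the first is violated, the second has slack.
theta = [0.0, 0.0]
for _ in range(3):
    theta, multipliers = update_multipliers(
        theta, [0.10, -0.10], learning_rate=1.0, bound=100.0
    )
print(multipliers)
```

In the reduction, multipliers like these are translated into the observation weights for the next training round, which is why the wrapped training action must accept a weight variable.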

fairAITools.mitigateBias <result=results> <status=rc> /
   biasMetric="string",
   bound=double,
   copyVarsCASLVariable="string",
   cutoff=double,
   event="string",
   frequency={casvardesc},
   iterationCASLVariable="string",
   learningRate=double,
   logLevel=64-bit-integer,
   maxIters=64-bit-integer,
   nBins=64-bit-integer,
   predictedVariables={{casvardesc-1} <, {casvardesc-2}, ...>},
   predictedVariablesResultKey="string",
   response={casvardesc},
   responseLevels={"string-1" <, "string-2", ...>},
   responseLevelsResultKey="string",
   rocStep=double,
   scoredCASLVariable="string",
   seed=double,
   selectionDepth=64-bit-integer,
   sensitiveVariable={casvardesc},
   table={castable},
   tableCASLVariable="string",
   tableModList={{fairaitools_mitigateBias_tableModList-1} <, {fairaitools_mitigateBias_tableModList-2}, ...>},
   tableSaveList={{fairaitools_mitigateBias_tableSaveList-1} <, {fairaitools_mitigateBias_tableSaveList-2}, ...>},
   tolerance=double,
   trainProgram="string",
   tuneBound=TRUE | FALSE,
   vars={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
   weight={casvardesc},
   weightCASLVariable="string";

Settings
Parameter: Description
biasMetric: Specifies the type of bias measurement. Valid values include 'DEMOGRAPHICPARITY', 'EQUALIZEDODDS', 'EQUALOPPORTUNITY', and 'PREDICTIVEPARITY'.
bound: Specifies the bound value for the exponentiated gradient reduction algorithm.
copyVarsCASLVariable: Specifies the name of the CASL variable passed to the training program that contains the copyVars list for scored table creation.
cutoff: Specifies the cutoff for the confusion matrix.
event: Specifies the formatted value of the response (target) variable that represents the event of interest.
frequency: Specifies the variable that contains frequency values.
iterationCASLVariable: Specifies the name of the CASL variable passed to the training program that contains the current iteration number.
learningRate: Specifies the step size for updating the exponentiated gradient reduction algorithm.
logLevel: Specifies the level of log information to print, with higher levels providing more detail.
maxIters: Specifies the maximum number of iterations for the exponentiated gradient reduction algorithm.
nBins: Specifies the number of bins to use in lift calculations.
predictedVariables: Specifies the list of variables that contain the model's predictions. The order must match the responseLevels parameter.
predictedVariablesResultKey: Specifies the results key from the training program that identifies the predicted variable names.
response: Specifies the response (target) variable for supervised learning.
responseLevels: Specifies the formatted values of the response variable. The order must match the predictedVariables parameter.
responseLevelsResultKey: Specifies the results key from the training program that identifies the response variable levels.
rocStep: Specifies the step size for Receiver Operating Characteristic (ROC) calculations.
scoredCASLVariable: Specifies the name of the CASL variable passed to the training program that contains the output specification for the scored table.
seed: Specifies the seed for the random number generator for reproducibility.
selectionDepth: Specifies the depth to use in lift calculations.
sensitiveVariable: Specifies the sensitive variable to use in bias calculations.
table: Specifies the input data table for the mitigation process.
tableCASLVariable: Specifies the name of the CASL variable passed to the training program that contains the modified input data table information.
tableModList: Specifies a list of tables to modify and pass to the training program. The main input table is automatically appended to this list.
tableSaveList: Specifies a list of tables to save after running the training program, saved only if the specified biasMetric improves.
tolerance: Specifies the parity constraint violation tolerance. A value of 0 runs for maxIters.
trainProgram: Specifies the CASL code block for training a model, which will be executed iteratively by the mitigation algorithm.
tuneBound: When set to True, specifies that the bound value should be tuned.
vars: Specifies additional variables to pass to the training program.
weight: Specifies a variable of pre-existing weights. The algorithm-generated weights will be multiplied by these values.
weightCASLVariable: Specifies the name of the CASL variable passed to the training program that contains the name of the weight variable.
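
The four biasMetric values correspond to widely used group-fairness definitions. The Python sketch below computes each one as the largest gap between groups, using hard 0/1 predictions on a tiny made-up sample. This is a hedged illustration of the textbook definitions, not the action's own computation: the exact formulas that mitigateBias uses may differ (for example, they may work on predicted probabilities together with the cutoff parameter). The helper names `group_rates` and `max_gap` and all data values are hypothetical.

```python
def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, TPR, FPR and positive predictive value."""
    stats = {}
    for g in sorted(set(groups)):
        rows = [(t, p) for t, p, s in zip(y_true, y_pred, groups) if s == g]
        pos = [p for t, p in rows if t == 1]       # predictions for actual events
        neg = [p for t, p in rows if t == 0]       # predictions for actual non-events
        flagged = [t for t, p in rows if p == 1]   # actual outcomes of predicted events
        stats[g] = {
            "selection_rate": sum(p for _, p in rows) / len(rows),
            "tpr": sum(pos) / len(pos) if pos else float("nan"),
            "fpr": sum(neg) / len(neg) if neg else float("nan"),
            "ppv": sum(flagged) / len(flagged) if flagged else float("nan"),
        }
    return stats

def max_gap(stats, key):
    """Largest difference in one rate across the groups."""
    values = [s[key] for s in stats.values()]
    return max(values) - min(values)

# Made-up labels and predictions for eight applicants.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
gender = ["Male"] * 4 + ["Female"] * 4

stats = group_rates(y_true, y_pred, gender)
demographic_parity_gap = max_gap(stats, "selection_rate")
equal_opportunity_gap = max_gap(stats, "tpr")
equalized_odds_gap = max(max_gap(stats, "tpr"), max_gap(stats, "fpr"))
predictive_parity_gap = max_gap(stats, "ppv")
```

Under this reading, a tolerance of 0.05 would mean that the chosen gap should fall to 0.05 or below before the algorithm stops.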
Data Preparation
Create Biased Loan Application Data

This SAS DATA step creates a synthetic dataset named 'applicant_data' in the CASUSER caslib. It simulates loan applicant information, including a 'Gender' variable, and intentionally introduces bias by applying different default logic based on gender and credit score. This table will be used to demonstrate bias mitigation.

data casuser.applicant_data;
   /* Declare lengths up front. Otherwise Gender takes length 4 from its
      first assignment ('Male') and 'Female' is truncated to 'Fema'. */
   length Gender $6 HomeOwner $3;
   call streaminit(123);
   do i = 1 to 2000;
      if rand('UNIFORM') > 0.6 then Gender = 'Male';
      else Gender = 'Female';
      if rand('UNIFORM') > 0.3 then HomeOwner = 'Yes';
      else HomeOwner = 'No';
      Income = round(30000 + rand('UNIFORM') * 90000, 1);
      CreditScore = 500 + floor(rand('UNIFORM') * 350);
      Loan_Default = 0;
      /* Gender-dependent rule: this is where the bias is introduced */
      if (Gender = 'Male' and CreditScore < 680 and rand('UNIFORM') > 0.5)
         or (Gender = 'Female' and CreditScore < 640 and rand('UNIFORM') > 0.6)
         then Loan_Default = 1;
      /* Gender-neutral rule for low credit scores */
      if CreditScore < 600 and rand('UNIFORM') > 0.4 then Loan_Default = 1;
      output;
   end;
   drop i;
run;
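
To see how much bias the rules in this DATA step build into the outcome, the expected default rate for each gender can be worked out directly from the thresholds. The Python sketch below does that arithmetic. It assumes that every rand('UNIFORM') call is an independent draw and that CreditScore is uniform over the integers 500 to 849; the function name `default_probability` is hypothetical, and the result describes the generating rules rather than any particular random sample.

```python
def default_probability(gender, credit_score):
    """Probability that Loan_Default = 1 for one applicant, given the rules."""
    # Rule 1 is gender-specific: different score thresholds and chances.
    if gender == "Male":
        p_rule1 = 0.5 if credit_score < 680 else 0.0   # rand > 0.5
    else:
        p_rule1 = 0.4 if credit_score < 640 else 0.0   # rand > 0.6
    # Rule 2 is the same for everyone.
    p_rule2 = 0.6 if credit_score < 600 else 0.0       # rand > 0.4
    # Default occurs if either rule fires (draws assumed independent).
    return 1.0 - (1.0 - p_rule1) * (1.0 - p_rule2)

scores = range(500, 850)  # 500 + floor(U * 350) yields 500..849
expected_rate = {
    gender: sum(default_probability(gender, s) for s in scores) / len(scores)
    for gender in ("Male", "Female")
}
gap = expected_rate["Male"] - expected_rate["Female"]
print(expected_rate, gap)
```

Under these assumptions the expected default rate is about 0.343 for male applicants and about 0.263 for female applicants, a gap of 0.08 that comes entirely from the gender-specific rule.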

Examples

This example demonstrates a basic application of the mitigateBias action. It uses a logistic regression model defined in the 'trainPgm' source block. The goal is to reduce the 'DEMOGRAPHICPARITY' bias related to the 'Gender' variable, with a tolerance of 0.05 over a maximum of 10 iterations.

SAS® / CAS Code (awaiting community validation)
proc cas;
   /* Training program that mitigateBias runs at each iteration. weight and
      casout are the CASL variables named by weightCASLVariable and
      scoredCASLVariable in the mitigateBias call. */
   source trainPgm;
      regression.logistic /
         table=table,
         class={'Gender', 'HomeOwner'},
         model={depvar='Loan_Default', effects={'Gender', 'HomeOwner', 'Income', 'CreditScore'}},
         weight=weight,
         output={casOut=casout, copyVars={'Loan_Default', 'Gender'}},
         store={name='logistic_model', replace=true};
   endsource;

   fairAITools.mitigateBias /
      table={name='applicant_data', caslib='CASUSER'},
      response={name='Loan_Default', options={event='1'}},
      sensitiveVariable={name='Gender'},
      trainProgram=trainPgm,
      biasMetric='DEMOGRAPHICPARITY', tolerance=0.05, maxIters=10,
      scoredCASLVariable='casout', weightCASLVariable='weight';
run;
quit;
Result:
The action returns tables detailing the mitigation progress, including the bias metric value at each iteration. The final model weights and parameters are adjusted to better satisfy the Demographic Parity constraint within the specified tolerance.
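
The interaction between tolerance and maxIters described above can be pictured as a simple control loop. The Python sketch below is an assumption about the general shape of that loop, based only on the parameter descriptions (a tolerance of 0 runs for maxIters, and tables are saved only when the bias metric improves); it is not the action's actual implementation. The callback `train_and_measure` and the stand-in `fake_training` are hypothetical.

```python
def run_mitigation(train_and_measure, max_iters, tolerance):
    """Repeat training until the bias is within tolerance or max_iters is reached.

    train_and_measure -- hypothetical callback that trains on the reweighted
                         data for the given iteration and returns the bias value
    """
    history = []
    best = None  # (iteration, bias) with the smallest absolute bias so far
    for iteration in range(1, max_iters + 1):
        bias = train_and_measure(iteration)
        history.append(bias)
        # Keep track of the iteration that improved the metric the most.
        if best is None or abs(bias) < abs(best[1]):
            best = (iteration, bias)
        # tolerance=0 disables early stopping, so all max_iters iterations run.
        if tolerance > 0 and abs(bias) <= tolerance:
            break
    return best, history

# Stand-in for real training: the bias halves at every iteration.
def fake_training(iteration):
    return 0.16 / (2 ** (iteration - 1))

best, history = run_mitigation(fake_training, max_iters=10, tolerance=0.05)
best_full, history_full = run_mitigation(fake_training, max_iters=10, tolerance=0)
print(best, len(history), len(history_full))
```

With the stand-in training function, the first call stops as soon as the bias drops to 0.05 or below, while the second call with a tolerance of 0 uses every iteration.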

This detailed example mitigates bias for a Gradient Boosting Tree model, targeting the 'EQUALIZEDODDS' metric. It specifies a CASL source block ('trainPgm_gb') that first trains a model and then scores the data. The mitigateBias action iteratively calls this block, adjusting weights to meet the fairness constraint. It also uses the 'tableSaveList' parameter to save the best-performing model's store ('mitigated_gbtree_model') based on the mitigation progress.

SAS® / CAS Code (awaiting community validation)
proc cas;
   /* Training program: train a gradient boosting model, then score the data */
   source trainPgm_gb;
      decisionTree.gbtreeTrain /
         table=table,
         target='Loan_Default',
         inputs={'Gender', 'HomeOwner', 'Income', 'CreditScore'},
         nominals={'Gender', 'HomeOwner', 'Loan_Default'},
         weight=weight,
         saveState={name='gbtree_model', caslib='CASUSER', replace=true};
      /* Caution: gbtreeScore expects modelTable to be the model table that
         gbtreeTrain writes with casOut=. Verify that 'gbtree_model' is usable
         here, because saveState= writes an analytic store. */
      decisionTree.gbtreeScore /
         table=table,
         modelTable={name='gbtree_model', caslib='CASUSER'},
         casOut=casout,
         copyVars={'Loan_Default', 'Gender'};
   endsource;

   fairAITools.mitigateBias /
      table={name='applicant_data', caslib='CASUSER'},
      response={name='Loan_Default', options={event='1'}},
      sensitiveVariable={name='Gender'},
      trainProgram=trainPgm_gb,
      biasMetric='EQUALIZEDODDS', tolerance=0.01, maxIters=50,
      learningRate=0.02, bound=75, seed=456,
      tableCASLVariable='table', weightCASLVariable='weight', scoredCASLVariable='casout',
      predictedVariablesResultKey='Scored_CAS_Table_Vars',
      responseLevelsResultKey='Scored_Target_Levels',
      tableSaveList={{key='State', casout={name='mitigated_gbtree_model', caslib='CASUSER', replace=true}}};
run;
quit;
Result:
The action produces several output tables, including 'MitigationHistory' which tracks the fairness metric and model performance over iterations, and 'BestModelInfo' which provides details about the iteration that yielded the best result. A new CAS table named 'mitigated_gbtree_model' is created in the CASUSER caslib, containing the analytic store of the fairest model found.

FAQ

What is the purpose of the mitigateBias action?
Which bias metrics can be used with the mitigateBias action?
What are the mandatory parameters for the mitigateBias action?
How does the `trainProgram` parameter work?
What is the role of the `tolerance` parameter?
How can I control the iterations of the mitigation algorithm?
Is it possible to save intermediate results during the mitigation process?