mitigateBias - WeAreCAS

Q: What is the purpose of the mitigateBias action?

The mitigateBias action, part of the Fair AI Tools Action Set, is used to mitigate bias during the training of predictive models using the exponentiated gradient reduction algorithm.

Q: Which bias metrics can be used with the mitigateBias action?

The `biasMetric` parameter allows you to specify one of four types of bias measurements: 'DEMOGRAPHICPARITY', 'EQUALIZEDODDS', 'EQUALOPPORTUNITY', or 'PREDICTIVEPARITY'. The default is 'PREDICTIVEPARITY'.
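For intuition, the four metrics can be computed from binary labels and predictions using their textbook definitions. The sketch below is plain Python, not CAS code. Two choices in it are assumptions and may differ from how mitigateBias computes the metrics: each metric is reported as the largest gap between groups (max minus min), and an empty denominator yields 0.0.

```python
def _rate(num, den):
    # Empty denominators (e.g. a group with no positives) are reported as 0.0 here;
    # this convention is an assumption of the sketch.
    return num / den if den else 0.0


def group_rates(y_true, y_pred, group):
    """Per-group selection rate, TPR, FPR and precision (PPV) for 0/1 labels."""
    stats = {}
    for g in sorted(set(group)):
        idx = [i for i, gi in enumerate(group) if gi == g]
        tp = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 1)
        fp = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 1)
        pos = sum(1 for i in idx if y_true[i] == 1)
        stats[g] = {
            "selection": _rate(tp + fp, len(idx)),
            "tpr": _rate(tp, pos),
            "fpr": _rate(fp, len(idx) - pos),
            "ppv": _rate(tp, tp + fp),
        }
    return stats


def spread(stats, key):
    """Largest difference of one rate across groups (max minus min)."""
    values = [s[key] for s in stats.values()]
    return max(values) - min(values)


def bias_metrics(y_true, y_pred, group):
    s = group_rates(y_true, y_pred, group)
    return {
        "DEMOGRAPHICPARITY": spread(s, "selection"),  # P(pred=1 | group)
        "EQUALOPPORTUNITY": spread(s, "tpr"),         # P(pred=1 | y=1, group)
        "EQUALIZEDODDS": max(spread(s, "tpr"), spread(s, "fpr")),
        "PREDICTIVEPARITY": spread(s, "ppv"),         # P(y=1 | pred=1, group)
    }
```

Equalized odds is taken here as the larger of the TPR gap and the FPR gap, which is a common convention.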

Q: What are the mandatory parameters for the mitigateBias action?

The required parameters are `table` to specify the input data, `response` to specify the target variable, `sensitiveVariable` to define the variable for bias calculation, and `trainProgram` which contains the CASL code for model training.

Q: How does the `trainProgram` parameter work?

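The `trainProgram` parameter contains the CASL training code that the action runs at each iteration. Before each run, the action defines CASL variables that the program can reference: the modified input table, the name of the weight variable, and the current iteration number. The `tableCASLVariable`, `weightCASLVariable`, and `iterationCASLVariable` parameters set the names of these variables, so the training code must use the same names (for example, `weight=weight` when `weightCASLVariable='weight'`).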
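The contract is easiest to see through an analogy. In the plain-Python sketch below, the training program is a string of code executed in a namespace that the driver fills in. The `*_var` arguments choose the variable names, much as `tableCASLVariable`, `weightCASLVariable`, and `iterationCASLVariable` do, while the driver supplies the values. The function, its default names, and the example values are hypothetical, and this is not how CAS runs CASL internally.

```python
def run_train_program(program, table, weight_name, iteration,
                      table_var="table", weight_var="weight", iteration_var="iteration"):
    """Hypothetical driver: expose values under caller-chosen names, then run the code."""
    # The *_var arguments play the role of the *CASLVariable parameters:
    # they pick the names; the driver provides the values for this iteration.
    namespace = {table_var: table, weight_var: weight_name, iteration_var: iteration}
    exec(program, namespace)  # analogous to running the CASL source block
    return namespace.get("result")


if __name__ == "__main__":
    program = "result = 'iter %d: fit %s weighted by %s' % (it, tbl, w)"
    for it in range(1, 3):
        print(run_train_program(program, "applicant_data_mod", "_mitig_weight_", it,
                                table_var="tbl", weight_var="w", iteration_var="it"))
```

If the program refers to a name that the driver did not define, it fails. This is why the names used inside `trainProgram` must match the `*CASLVariable` settings.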

Q: What is the role of the `tolerance` parameter?

The `tolerance` parameter specifies the parity constraint violation tolerance. The mitigation process stops as soon as the constraint violation falls below this tolerance; otherwise it stops after `maxIters` iterations. If `tolerance` is set to 0, the action always runs for the full number of iterations specified by `maxIters`.
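A minimal plain-Python sketch of this stopping rule follows. It assumes that the comparison is strict (`<`) and uses a hypothetical `step` callable to stand in for one mitigation iteration that returns the current constraint violation. Both are assumptions, not the action's documented internals.

```python
from typing import Callable, Tuple


def run_until_tolerance(step: Callable[[int], float], tolerance: float,
                        max_iters: int) -> Tuple[int, float]:
    """Call step(iteration) until its violation drops below tolerance.

    With tolerance == 0, no (non-negative) violation can drop below it,
    so the loop always runs max_iters times.
    """
    violation = float("inf")
    for iteration in range(1, max_iters + 1):
        violation = step(iteration)
        if violation < tolerance:
            return iteration, violation
    return max_iters, violation
```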

Q: How can I control the iterations of the mitigation algorithm?

You can use the `maxIters` parameter to set the maximum number of iterations for the exponentiated gradient reduction algorithm (default is 10) and the `learningRate` parameter to define the step size for updates (default is 0.01).
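In the textbook formulation of the algorithm (Agarwal et al., 2018), the learning rate scales a gradient step on an unconstrained vector θ. The Lagrange multipliers are then recovered from θ so that their total never exceeds a bound B. The plain-Python sketch below shows those two updates. Treating `learningRate` as the step size and `bound` as B is an assumption about how the action maps its parameters.

```python
import math


def multipliers(theta, bound):
    """Map unconstrained theta to nonnegative multipliers whose sum stays below bound."""
    exps = [math.exp(t) for t in theta]
    denom = 1.0 + sum(exps)
    return [bound * e / denom for e in exps]


def update_theta(theta, violations, learning_rate):
    """One gradient step: constraints with positive violation gain weight."""
    return [t + learning_rate * v for t, v in zip(theta, violations)]


if __name__ == "__main__":
    theta = [0.0, 0.0]
    for _ in range(10):  # maxIters bounds how many such steps are taken
        theta = update_theta(theta, [0.1, -0.1], learning_rate=0.01)
    print(multipliers(theta, bound=75.0))
```

A small learning rate, such as the default of 0.01, moves the multipliers slowly. This is one reason the number of iterations matters.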

Q: Is it possible to save intermediate results during the mitigation process?

Yes, the `tableSaveList` parameter allows you to specify a list of tables to save after running the training program in an iteration. These tables are saved only if the specified `biasMetric` improves during that iteration.
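The keep-the-best behavior can be sketched in a few lines of plain Python. The sketch makes three assumptions: lower metric values are better, only strict improvements trigger a save, and the first iteration always counts as an improvement.

```python
def iterations_saved(metric_history):
    """Return the 1-based iterations at which the saved tables would be refreshed."""
    best = float("inf")
    saved = []
    for iteration, value in enumerate(metric_history, start=1):
        if value < best:  # strict improvement of a lower-is-better metric
            best = value
            saved.append(iteration)
    return saved
```

With this rule, the saved tables always reflect the iteration with the best bias metric seen so far, not necessarily the last iteration.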

Description

The mitigateBias action uses the exponentiated gradient reduction algorithm to mitigate bias in predictive models. This iterative process adjusts observation weights and retrains the model at each iteration, aiming to satisfy the fairness constraint selected by `biasMetric`, such as demographic parity or equalized odds. Because the algorithm influences the model only through observation weights, it can wrap any CAS training action that supports a weight parameter.
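To make the loop concrete, here is a toy, pure-Python sketch of the exponentiated gradient reduction for demographic parity, following Agarwal et al. (2018), "A Reductions Approach to Fair Classification". It is not SAS's implementation, and several of its choices are simplifications:

- A group-specific threshold learner stands in for the model that `trainProgram` would train.
- The constraint slack is fixed at 0.
- The loop stops when the mixture's selection-rate gap drops below `tolerance`, rather than using the paper's duality-gap test.
- `bound` and `learning_rate` are chosen so the toy converges quickly and are not the action's defaults.

```python
import math


def best_group_thresholds(x, labels, weights, group):
    """Weighted base learner: one cutoff per group, h(x, a) = 1 if x >= t[a]."""
    thresholds = {}
    for a in sorted(set(group)):
        idx = [i for i, g in enumerate(group) if g == a]
        # Candidates: every observed x in the group, plus +inf ("predict 0 for all").
        candidates = sorted({x[i] for i in idx}) + [math.inf]

        def weighted_error(t):
            return sum(weights[i] for i in idx if (x[i] >= t) != (labels[i] == 1))

        thresholds[a] = min(candidates, key=weighted_error)
    return thresholds


def predict(thresholds, x, group):
    return [1 if xi >= thresholds[a] else 0 for xi, a in zip(x, group)]


def selection_rates(preds, group):
    """Mean prediction per group (a randomized mixture may give fractional values)."""
    rates = {}
    for a in sorted(set(group)):
        idx = [i for i, g in enumerate(group) if g == a]
        rates[a] = sum(preds[i] for i in idx) / len(idx)
    return rates


def exponentiated_gradient_dp(x, y, group, bound=10.0, learning_rate=1.0,
                              tolerance=0.06, max_iters=50):
    groups = sorted(set(group))
    n = len(y)
    n_a = {a: group.count(a) for a in groups}
    theta = [0.0] * (2 * len(groups))  # one (+, -) constraint pair per group
    history = []                       # predictions of each best-response model h_t
    gap = float("inf")
    for it in range(1, max_iters + 1):
        # Multipliers: nonnegative, total at most `bound`.
        exps = [math.exp(t) for t in theta]
        lam = [bound * e / (1.0 + sum(exps)) for e in exps]
        lam_a = {a: lam[2 * k] - lam[2 * k + 1] for k, a in enumerate(groups)}
        lam_sum = sum(lam_a.values())
        # Reduction to weighted classification: the sign of cost(0) - cost(1) picks
        # the label to fit, and its magnitude becomes the observation weight.
        signed = [(2 * y[i] - 1) / n - lam_a[group[i]] / n_a[group[i]] + lam_sum / n
                  for i in range(n)]
        labels = [1 if s > 0 else 0 for s in signed]
        weights = [abs(s) for s in signed]
        h = predict(best_group_thresholds(x, labels, weights, group), x, group)
        history.append(h)
        # The returned model is the uniform mixture of h_1..h_t.
        q = [sum(col) / len(history) for col in zip(*history)]
        rates = selection_rates(q, group)
        gap = max(rates.values()) - min(rates.values())
        if gap < tolerance:
            return history, gap, it
        # Exponentiated-gradient step on the violations of h_t (slack fixed at 0).
        overall = sum(h) / n
        for k, rate in enumerate(selection_rates(h, group).values()):
            theta[2 * k] += learning_rate * (rate - overall)
            theta[2 * k + 1] -= learning_rate * (rate - overall)
    return history, gap, max_iters
```

Take a toy dataset with x = [1, 2, 3, 4] in each of two groups and labels [0, 1, 1, 1] for group M and [0, 0, 0, 1] for group F. The first, unconstrained model selects 75% of M and 25% of F. The reweighted models then alternate between favoring one group and the other, so the mixture's gap shrinks as 0.5/t and falls below 0.06 at iteration 9.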

fairAITools.mitigateBias <result=results> <status=rc> / biasMetric="string", bound=double, copyVarsCASLVariable="string", cutoff=double, event="string", frequency={casvardesc}, iterationCASLVariable="string", learningRate=double, logLevel=64-bit-integer, maxIters=64-bit-integer, nBins=64-bit-integer, predictedVariables={{casvardesc-1} <, {casvardesc-2}, ...>}, predictedVariablesResultKey="string", response={casvardesc}, responseLevels={"string-1" <, "string-2", ...>}, responseLevelsResultKey="string", rocStep=double, scoredCASLVariable="string", seed=double, selectionDepth=64-bit-integer, sensitiveVariable={casvardesc}, table={castable}, tableCASLVariable="string", tableModList={{fairaitools_mitigateBias_tableModList-1} <, {fairaitools_mitigateBias_tableModList-2}, ...>}, tableSaveList={{fairaitools_mitigateBias_tableSaveList-1} <, {fairaitools_mitigateBias_tableSaveList-2}, ...>}, tolerance=double, trainProgram="string", tuneBound=TRUE | FALSE, vars={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>}, weight={casvardesc}, weightCASLVariable="string";

Settings

Parameter	Description
biasMetric	Specifies the type of bias measurement. Valid values include 'DEMOGRAPHICPARITY', 'EQUALIZEDODDS', 'EQUALOPPORTUNITY', and 'PREDICTIVEPARITY'.
bound	Specifies the bound value for the exponentiated gradient reduction algorithm.
copyVarsCASLVariable	Specifies the name of the CASL variable passed to the training program that contains the copyVars list for scored table creation.
cutoff	Specifies the cutoff for the confusion matrix.
event	Specifies the formatted value of the response (target) variable that represents the event of interest.
frequency	Specifies the variable that contains frequency values.
iterationCASLVariable	Specifies the name of the CASL variable passed to the training program that contains the current iteration number.
learningRate	Specifies the step size for updating the exponentiated gradient reduction algorithm.
logLevel	Specifies the level of log information to print, with higher levels providing more detail.
maxIters	Specifies the maximum number of iterations for the exponentiated gradient reduction algorithm.
nBins	Specifies the number of bins to use in lift calculations.
predictedVariables	Specifies the list of variables that contain the model's predictions. The order must match the responseLevels parameter.
predictedVariablesResultKey	Specifies the results key from the training program that identifies the predicted variable names.
response	Specifies the response (target) variable for supervised learning.
responseLevels	Specifies the formatted values of the response variable. The order must match the predictedVariables parameter.
responseLevelsResultKey	Specifies the results key from the training program that identifies the response variable levels.
rocStep	Specifies the step size for Receiver Operating Characteristic (ROC) calculations.
scoredCASLVariable	Specifies the name of the CASL variable passed to the training program that contains the output specification for the scored table.
seed	Specifies the seed for the random number generator for reproducibility.
selectionDepth	Specifies the depth to use in lift calculations.
sensitiveVariable	Specifies the sensitive variable to use in bias calculations.
table	Specifies the input data table for the mitigation process.
tableCASLVariable	Specifies the name of the CASL variable passed to the training program that contains the modified input data table information.
tableModList	Specifies a list of tables to modify and pass to the training program. The main input table is automatically appended to this list.
tableSaveList	Specifies a list of tables to save after running the training program, saved only if the specified biasMetric improves.
tolerance	Specifies the parity constraint violation tolerance. A value of 0 runs for maxIters.
trainProgram	Specifies the CASL code block for training a model, which will be executed iteratively by the mitigation algorithm.
tuneBound	When set to True, specifies that the bound value should be tuned.
vars	Specifies additional variables to pass to the training program.
weight	Specifies a variable of pre-existing weights. The algorithm-generated weights will be multiplied by these values.
weightCASLVariable	Specifies the name of the CASL variable passed to the training program that contains the name of the weight variable.
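Several of these settings work together: `predictedVariables` and `responseLevels` are paired by position, `event` selects the level of interest, `cutoff` turns a probability into a predicted class, and `weight` multiplies the algorithm's weights. The small plain-Python sketch below illustrates these pairing rules. The column names `P_Loan_Default1` and `P_Loan_Default0` are hypothetical. Two behaviors are also assumptions about how the action uses these settings: a probability equal to `cutoff` counts as an event, and the combined weight is a simple per-row product.

```python
def event_probability(row, predicted_variables, response_levels, event):
    """Return the predicted probability of the event level for one scored row.

    predicted_variables[i] is assumed to hold the probability of response_levels[i],
    so the two lists are paired by position and must be in the same order.
    """
    if len(predicted_variables) != len(response_levels):
        raise ValueError("predictedVariables and responseLevels must have equal length")
    column_for_level = dict(zip(response_levels, predicted_variables))
    return row[column_for_level[event]]


def predicted_event(row, predicted_variables, response_levels, event, cutoff=0.5):
    # Assumption: a probability equal to the cutoff counts as a predicted event.
    return event_probability(row, predicted_variables, response_levels, event) >= cutoff


def effective_weight(algorithm_weight, prior_weight=1.0):
    # Pre-existing weights from the `weight` parameter multiply the algorithm's weights.
    return algorithm_weight * prior_weight
```

Listing the levels in the wrong order silently reads the wrong column. In the row `{"P_Loan_Default1": 0.7, "P_Loan_Default0": 0.3}`, swapping `responseLevels` to `["0", "1"]` makes the event probability 0.3 instead of 0.7.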

Data Preparation

Create Biased Loan Application Data

This SAS DATA step creates a synthetic dataset named 'applicant_data' in the CASUSER caslib. It simulates loan applicant information, including a 'Gender' variable, and intentionally introduces bias by applying different default logic based on gender and credit score. This table will be used to demonstrate bias mitigation.


/* Requires an active CAS session with the CASUSER libref assigned. */
DATA casuser.applicant_data;
   /* Declare lengths first: otherwise Gender takes the length of 'Male' (4)
      and every 'Female' value is truncated to 'Fema'. */
   LENGTH Gender $6 HomeOwner $3;
   call streaminit(123);
   DO i = 1 to 2000;
      IF rand('UNIFORM') > 0.6 THEN Gender = 'Male';
      ELSE Gender = 'Female';
      IF rand('UNIFORM') > 0.3 THEN HomeOwner = 'Yes';
      ELSE HomeOwner = 'No';
      Income = round(30000 + rand('UNIFORM') * 90000, 1);
      CreditScore = 500 + floor(rand('UNIFORM') * 350);
      Loan_Default = 0;
      /* Gender-specific default rule: the source of the intended bias. */
      IF (Gender = 'Male' and CreditScore < 680 and rand('UNIFORM') > 0.5)
         or (Gender = 'Female' and CreditScore < 640 and rand('UNIFORM') > 0.6) THEN Loan_Default = 1;
      /* Low credit scores raise default risk for every applicant. */
      IF CreditScore < 600 and rand('UNIFORM') > 0.4 THEN Loan_Default = 1;
      OUTPUT;
   END;
   drop i;
RUN;
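Because every random draw in the DATA step is uniform, the expected default rate for each gender can be worked out exactly. The sketch below does that arithmetic in plain Python using only the standard library. It assumes `Gender` holds the full values 'Male' and 'Female'. The result is 12/35 (about 0.343) for men and 46/175 (about 0.263) for women, a gap of 0.08 in the labels themselves before any model is trained. The 2,000 simulated rows will only approximate these values.

```python
from fractions import Fraction


def expected_default_rate(gender):
    """Exact P(Loan_Default = 1 | Gender) implied by the DATA step's rules."""
    total = Fraction(0)
    # CreditScore = 500 + floor(U * 350) is uniform on the integers 500..849.
    for score in range(500, 850):
        # Gender-specific rule, with its own uniform draw.
        if gender == "Male" and score < 680:
            p_rule1 = Fraction(1, 2)  # rand('UNIFORM') > 0.5
        elif gender == "Female" and score < 640:
            p_rule1 = Fraction(2, 5)  # rand('UNIFORM') > 0.6
        else:
            p_rule1 = Fraction(0)
        # Low-score rule for everyone, with an independent draw.
        p_rule2 = Fraction(3, 5) if score < 600 else Fraction(0)  # rand('UNIFORM') > 0.4
        # Loan_Default is 1 if either rule fires.
        total += 1 - (1 - p_rule1) * (1 - p_rule2)
    return total / 350
```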

Examples

Simple Mitigation with Logistic Regression

This example demonstrates a basic application of the mitigateBias action. It uses a logistic regression model defined in the 'trainPgm' source block. The goal is to reduce the 'DEMOGRAPHICPARITY' bias related to the 'Gender' variable, with a tolerance of 0.05 over a maximum of 10 iterations.

SAS® / CAS Code (awaiting community validation)


PROC CAS;
   /* Training program run by mitigateBias at each iteration. 'table', 'weight', and
      'casout' are CASL variables defined by the action; their names are set by
      tableCASLVariable, weightCASLVariable, and scoredCASLVariable below. */
   SOURCE trainPgm;
      regression.logistic TABLE=TABLE, class={'Gender','HomeOwner'},
         model={depvar='Loan_Default', effects={'Gender', 'HomeOwner', 'Income', 'CreditScore'}},
         weight=weight,
         OUTPUT={casOut=casout, copyVars={'Loan_Default', 'Gender'}},
         store={name='logistic_model', replace=true};
   ENDSOURCE;

   /* The event level is set with the action's event parameter, not inside response. */
   fairAITools.mitigateBias TABLE={name='applicant_data', caslib='CASUSER'},
      response={name='Loan_Default'}, event='1',
      sensitiveVariable={name='Gender'},
      trainProgram=trainPgm,
      biasMetric='DEMOGRAPHICPARITY', tolerance=0.05, maxIters=10,
      tableCASLVariable='table', weightCASLVariable='weight', scoredCASLVariable='casout';
RUN;
QUIT;

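Result:
The action returns tables detailing the mitigation progress, including the bias metric value at each iteration. From one iteration to the next, the observation weights, and therefore the fitted logistic model, are adjusted to better satisfy the Demographic Parity constraint. The run ends when the violation falls below the 0.05 tolerance or after 10 iterations, whichever comes first.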

Detailed Mitigation for Equalized Odds with Gradient Boosting

This detailed example mitigates bias for a gradient boosting tree model, targeting the 'EQUALIZEDODDS' metric. It specifies a CASL source block ('trainPgm_gb') that first trains a model and then scores the data. The mitigateBias action runs this block at each iteration, adjusting the observation weights to meet the fairness constraint. It also uses the 'tableSaveList' parameter to save the model's analytic store as 'mitigated_gbtree_model'. Because tables in this list are saved only when the bias metric improves, the saved store comes from the iteration with the best bias metric.

SAS® / CAS Code (awaiting community validation)


PROC CAS;
   /* Training program: train a gradient boosting model on the reweighted table, then score it.
      'table', 'weight', and 'casout' are the CASL variables named by tableCASLVariable,
      weightCASLVariable, and scoredCASLVariable in the action call below.
      gbtreeScore needs the model table written by casOut; saveState writes an analytic store. */
   SOURCE trainPgm_gb;
      decisionTree.gbtreeTrain TABLE=TABLE, target='Loan_Default',
         inputs={'Gender', 'HomeOwner', 'Income', 'CreditScore'},
         nominals={'Gender', 'HomeOwner', 'Loan_Default'},
         weight=weight,
         casOut={name='gbtree_modeltbl', caslib='CASUSER', replace=true},
         saveState={name='gbtree_model', caslib='CASUSER', replace=true};
      decisionTree.gbtreeScore TABLE=TABLE, modelTable={name='gbtree_modeltbl', caslib='CASUSER'},
         casOut=casout, copyVars={'Loan_Default', 'Gender'};
   ENDSOURCE;

   fairAITools.mitigateBias TABLE={name='applicant_data', caslib='CASUSER'},
      response={name='Loan_Default'}, event='1',
      sensitiveVariable={name='Gender'},
      trainProgram=trainPgm_gb,
      biasMetric='EQUALIZEDODDS', tolerance=0.01, maxIters=50, learningRate=0.02,
      bound=75, seed=456,
      tableCASLVariable='table', weightCASLVariable='weight', scoredCASLVariable='casout',
      predictedVariablesResultKey='Scored_CAS_Table_Vars',
      responseLevelsResultKey='Scored_Target_Levels',
      tableSaveList={{key='State', casout={name='mitigated_gbtree_model', caslib='CASUSER', replace=true}}};
RUN;
QUIT;

Result:
The action produces several output tables, including 'MitigationHistory' which tracks the fairness metric and model performance over iterations, and 'BestModelInfo' which provides details about the iteration that yielded the best result. A new CAS table named 'mitigated_gbtree_model' is created in the CASUSER caslib, containing the analytic store of the fairest model found.
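In the reductions framework that the exponentiated gradient algorithm comes from, equalized odds is enforced as demographic parity within each true-label stratum. For every group a and label y, the constraint compares E[h | A=a, Y=y] with E[h | Y=y]. The plain-Python sketch below computes those per-stratum differences. It follows that textbook formulation and is not taken from the action's implementation.

```python
def equalized_odds_violations(y_true, y_pred, group):
    """Return {(a, y): E[h | A=a, Y=y] - E[h | Y=y]} for 0/1 labels and predictions."""
    out = {}
    for label in (0, 1):
        idx_y = [i for i, yt in enumerate(y_true) if yt == label]
        if not idx_y:
            continue
        # Selection rate among all rows with this true label.
        base = sum(y_pred[i] for i in idx_y) / len(idx_y)
        for a in sorted({group[i] for i in idx_y}):
            idx = [i for i in idx_y if group[i] == a]
            out[(a, label)] = sum(y_pred[i] for i in idx) / len(idx) - base
    return out
```

The label-1 entries measure true positive rate gaps and the label-0 entries measure false positive rate gaps. This is why equalized odds constrains both kinds of error at once.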


Related Actions

fairAITools

assessBias

The assessBias action calculates bias metrics for predictive models. This is ...
