The mitigateBias action in the fairAITools action set uses the exponentiated gradient reduction algorithm to mitigate bias in predictive models. The algorithm is iterative: at each step it adjusts the observation weights and retrains the model, producing a series of models that aim to satisfy a specified fairness constraint such as demographic parity or equalized odds. The action can wrap any CAS action that supports a weight parameter for model training, so the same mitigation procedure applies across model types.
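For orientation, the two constraints named above are usually stated as follows. This is the standard textbook formulation, given here as an assumption; this page does not spell out the exact definitions the action uses internally. The symbols are introduced only for this illustration: $\hat{Y}$ is the predicted label, $Y$ the observed response, and $A$ the sensitive variable.

```latex
% Demographic parity: the positive prediction rate is the same in every group
P(\hat{Y} = 1 \mid A = a) = P(\hat{Y} = 1) \quad \text{for all } a

% Equalized odds: the same equality holds within each level of the observed response
P(\hat{Y} = 1 \mid A = a,\, Y = y) = P(\hat{Y} = 1 \mid Y = y) \quad \text{for all } a,\ y \in \{0, 1\}
```

Under this reading, the tolerance parameter in the table below would bound how far these equalities may be violated before the iterations stop.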
| Parameter | Description |
|---|---|
| biasMetric | Specifies the type of bias measurement. Valid values include 'DEMOGRAPHICPARITY', 'EQUALIZEDODDS', 'EQUALOPPORTUNITY', and 'PREDICTIVEPARITY'. |
| bound | Specifies the bound value for the exponentiated gradient reduction algorithm. |
| copyVarsCASLVariable | Specifies the name of the CASL variable passed to the training program that contains the copyVars list for scored table creation. |
| cutoff | Specifies the cutoff for the confusion matrix. |
| event | Specifies the formatted value of the response (target) variable that represents the event of interest. |
| frequency | Specifies the variable that contains frequency values. |
| iterationCASLVariable | Specifies the name of the CASL variable passed to the training program that contains the current iteration number. |
| learningRate | Specifies the step size for updating the exponentiated gradient reduction algorithm. |
| logLevel | Specifies the level of log information to print, with higher levels providing more detail. |
| maxIters | Specifies the maximum number of iterations for the exponentiated gradient reduction algorithm. |
| nBins | Specifies the number of bins to use in lift calculations. |
| predictedVariables | Specifies the list of variables that contain the model's predictions. The order must match the responseLevels parameter. |
| predictedVariablesResultKey | Specifies the results key from the training program that identifies the predicted variable names. |
| response | Specifies the response (target) variable for supervised learning. |
| responseLevels | Specifies the formatted values of the response variable. The order must match the predictedVariables parameter. |
| responseLevelsResultKey | Specifies the results key from the training program that identifies the response variable levels. |
| rocStep | Specifies the step size for Receiver Operating Characteristic (ROC) calculations. |
| scoredCASLVariable | Specifies the name of the CASL variable passed to the training program that contains the output specification for the scored table. |
| seed | Specifies the seed for the random number generator for reproducibility. |
| selectionDepth | Specifies the depth to use in lift calculations. |
| sensitiveVariable | Specifies the sensitive variable to use in bias calculations. |
| table | Specifies the input data table for the mitigation process. |
| tableCASLVariable | Specifies the name of the CASL variable passed to the training program that contains the modified input data table information. |
| tableModList | Specifies a list of tables to modify and pass to the training program. The main input table is automatically appended to this list. |
| tableSaveList | Specifies a list of tables to save after running the training program. The tables are saved only if the specified biasMetric improves. |
| tolerance | Specifies the tolerance for violation of the parity constraint. With a value of 0, the algorithm runs for the full maxIters iterations. |
| trainProgram | Specifies the CASL code block for training a model, which will be executed iteratively by the mitigation algorithm. |
| tuneBound | When set to True, specifies that the bound value should be tuned. |
| vars | Specifies additional variables to pass to the training program. |
| weight | Specifies a variable of pre-existing weights. The algorithm-generated weights will be multiplied by these values. |
| weightCASLVariable | Specifies the name of the CASL variable passed to the training program that contains the name of the weight variable. |
The following SAS DATA step creates a synthetic table named 'applicant_data' in the CASUSER caslib. It simulates loan applicant information, including a 'Gender' variable, and intentionally introduces bias by applying different default rules depending on gender and credit score. The examples that follow use this table to demonstrate bias mitigation.

    /* Assumes the casuser libref points to the CASUSER caslib of an active CAS session. */
    data casuser.applicant_data;
       /* Set lengths first; otherwise Gender takes length 4 from 'Male' and 'Female' is truncated. */
       length Gender $6 HomeOwner $3;
       call streaminit(123);
       do i = 1 to 2000;
          if rand('UNIFORM') > 0.6 then Gender = 'Male';
          else Gender = 'Female';
          if rand('UNIFORM') > 0.3 then HomeOwner = 'Yes';
          else HomeOwner = 'No';
          Income = round(30000 + rand('UNIFORM') * 90000, 1);
          CreditScore = 500 + floor(rand('UNIFORM') * 350);
          Loan_Default = 0;
          /* Biased rule: the credit score threshold and default probability differ by gender. */
          if (Gender = 'Male' and CreditScore < 680 and rand('UNIFORM') > 0.5)
             or (Gender = 'Female' and CreditScore < 640 and rand('UNIFORM') > 0.6)
             then Loan_Default = 1;
          /* Gender-independent rule for low credit scores. */
          if CreditScore < 600 and rand('UNIFORM') > 0.4 then Loan_Default = 1;
          output;
       end;
       drop i;
    run;
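Before mitigating anything, it is worth confirming that the simulated bias is actually present. The following is a minimal check, assuming the DATA step above has run and the 'casuser' libref is still assigned. Because Loan_Default is coded 0/1, its mean within each gender is that group's default rate.

```sas
/* Default rate by gender: the mean of a 0/1 variable is the share of 1s. */
proc means data=casuser.applicant_data n mean maxdec=3;
   class Gender;
   var Loan_Default;
run;
```

If the gender-specific rule took effect, the two group means should differ noticeably.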
This example demonstrates a basic application of the mitigateBias action. The 'trainPgm' source block defines a logistic regression model, which the action retrains at each iteration. The goal is to reduce the 'DEMOGRAPHICPARITY' bias related to the 'Gender' variable, with a tolerance of 0.05 and a maximum of 10 iterations.

    proc cas;
       /* Training program run by mitigateBias at every iteration. The CASL
          variables table, weight, and casout are passed in by the action. */
       source trainPgm;
          regression.logistic /
             table=table,
             class={'Gender', 'HomeOwner'},
             model={depvar='Loan_Default', effects={'Gender', 'HomeOwner', 'Income', 'CreditScore'}},
             weight=weight,
             output={casOut=casout, copyVars={'Loan_Default', 'Gender'}},
             store={name='logistic_model', replace=true};
       endsource;

       fairAITools.mitigateBias /
          table={name='applicant_data', caslib='CASUSER'},
          response={name='Loan_Default', options={event='1'}},
          sensitiveVariable={name='Gender'},
          trainProgram=trainPgm,
          biasMetric='DEMOGRAPHICPARITY',
          tolerance=0.05, maxIters=10,
          scoredCASLVariable='casout', weightCASLVariable='weight';
    run;
    quit;
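To judge what mitigation changes, it can help to look at an unmitigated baseline first. The sketch below fits the same logistic regression once, without mitigation weights, and compares the mean predicted probability by gender; a large gap between the groups suggests a demographic parity violation. The table name 'baseline_scored' and the variable name 'P_Default' are illustrative choices that do not appear in the original example, and the sketch assumes the 'casuser' libref is assigned.

```sas
proc cas;
   /* Baseline fit with no weight= parameter, so no mitigation is applied. */
   regression.logistic /
      table={name='applicant_data', caslib='CASUSER'},
      class={'Gender', 'HomeOwner'},
      model={depvar='Loan_Default', effects={'Gender', 'HomeOwner', 'Income', 'CreditScore'}},
      /* pred= names the predicted probability column (hypothetical name). */
      output={casOut={name='baseline_scored', caslib='CASUSER', replace=true},
              copyVars={'Loan_Default', 'Gender'},
              pred='P_Default'};
run;
quit;

/* Mean predicted probability per group. Which response level the probability
   refers to depends on the action's event ordering, but the size of the gap
   between groups is the same either way. */
proc means data=casuser.baseline_scored n mean maxdec=3;
   class Gender;
   var P_Default;
run;
```

The same comparison on the scored output of the mitigated model gives a rough before-and-after view.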
This more detailed example mitigates bias for a gradient boosting tree model, targeting the 'EQUALIZEDODDS' metric. The CASL source block 'trainPgm_gb' first trains a model and then scores the data. The mitigateBias action calls this block at each iteration, adjusting the weights to meet the fairness constraint. The 'tableSaveList' parameter saves the model table as 'mitigated_gbtree_model'; tables in this list are saved only when the specified bias metric improves.

    proc cas;
       /* Training program: fit a gradient boosting model with the current weights,
          then score the table. casOut= writes the model table that gbtreeScore
          reads through modelTable=. */
       source trainPgm_gb;
          decisionTree.gbtreeTrain /
             table=table, target='Loan_Default',
             inputs={'Gender', 'HomeOwner', 'Income', 'CreditScore'},
             nominals={'Gender', 'HomeOwner', 'Loan_Default'},
             weight=weight,
             casOut={name='gbtree_model', caslib='CASUSER', replace=true};
          decisionTree.gbtreeScore /
             table=table,
             modelTable={name='gbtree_model', caslib='CASUSER'},
             casOut=casout, copyVars={'Loan_Default', 'Gender'};
       endsource;

       fairAITools.mitigateBias /
          table={name='applicant_data', caslib='CASUSER'},
          response={name='Loan_Default', options={event='1'}},
          sensitiveVariable={name='Gender'}, trainProgram=trainPgm_gb,
          biasMetric='EQUALIZEDODDS', tolerance=0.01, maxIters=50,
          learningRate=0.02, bound=75, seed=456,
          tableCASLVariable='table', weightCASLVariable='weight', scoredCASLVariable='casout',
          predictedVariablesResultKey='Scored_CAS_Table_Vars',
          responseLevelsResultKey='Scored_Target_Levels',
          tableSaveList={{key='State', casout={name='mitigated_gbtree_model', caslib='CASUSER', replace=true}}};
    run;
    quit;
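Because tables in 'tableSaveList' are saved only when the bias metric improves, the output table is not guaranteed to exist after the run. A minimal sketch of a follow-up check, assuming the example above has just run in the same CAS session; the result variable 'r' is an arbitrary name chosen for this illustration.

```sas
proc cas;
   /* table.tableExists returns exists=0 when the table is not found. */
   table.tableExists result=r /
      caslib='CASUSER', name='mitigated_gbtree_model';

   if (r.exists > 0) then
      print 'mitigated_gbtree_model was saved.';
   else
      print 'mitigated_gbtree_model was not found; the bias metric may not have improved.';
run;
quit;
```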