Performance Case: Bias Assessment on a Large Dataset with an ASTORE Model

Business Context

An insurance company uses a gradient boosting model, stored as an analytic store (ASTORE), to flag potentially fraudulent claims. It needs to ensure the model is not unfairly flagging claims from certain geographical regions. This test evaluates the performance of the assessBias action and its ability to use a pre-compiled ASTORE model on a large dataset, a common production scenario.

About the Action Set: fairAITools

Bias detection and mitigation in AI models.

Data Preparation

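First, train a gradient boosting model to predict fraud and save it as an ASTORE. The example trains on the HMEQ sample data (sampsio.hmeq) as a stand-in for claims data, so the model's inputs are LOAN, MORTDUE, VALUE, and JOB, and its target is BAD. Then, create a large (1M records) unscored dataset of insurance claims with the same inputs and a 'REGION' sensitive variable.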
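The code in this case refers to a CAS libref named mycas without defining it. A minimal setup sketch follows, assuming a session named mysess and the CASUSER caslib; both names are illustrative choices and do not come from the original test.

```sas
/* Start a CAS session and make CASUSER the active caslib.
   The session name mysess and the caslib CASUSER are assumptions. */
cas mysess sessopts=(caslib=casuser);

/* Bind the libref mycas to the same caslib so that DATA steps and
   procedures can read and write CAS tables through it. */
libname mycas cas caslib=casuser;
```

Keeping the libref and the active caslib aligned lets the actions later in this case find tables such as large_claims_unscored by name alone.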

/* Assumes an active CAS session and a CAS engine libref named mycas */

/* Step 1: Load the training data into CAS, train the model, and create the ASTORE */
DATA mycas.hmeq;
   SET sampsio.hmeq;
RUN;

PROC GRADBOOST DATA=mycas.hmeq SEED=123;
   INPUT LOAN MORTDUE VALUE / LEVEL=interval;
   INPUT JOB / LEVEL=nominal;
   TARGET BAD / LEVEL=nominal;
   SAVESTATE RSTORE=mycas.fraud_model_astore;
RUN;

/* Step 2: Generate large unscored dataset */
DATA mycas.large_claims_unscored(keep=CLAIM_ID REGION LOAN MORTDUE VALUE JOB BAD);
   CALL STREAMINIT(456);
   /* Lookup array of region codes; its helper variables are dropped by KEEP= */
   ARRAY REGIONS[4] $ 4 ('NE', 'SW', 'MW', 'SE');
   DO CLAIM_ID = 1 TO 1000000;
      REGION = REGIONS[RAND('INTEGER', 1, 4)];
      LOAN = 1000 + RAND('UNIFORM') * 50000;
      MORTDUE = 50000 + RAND('UNIFORM') * 200000;
      VALUE = 80000 + RAND('UNIFORM') * 400000;
      JOB = 'Other';
      BAD = RAND('BERNOULLI', 0.05); /* Actual fraud status */
      OUTPUT;
   END;
RUN;
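Before running the assessment, it may be worth confirming that the generated table looks as intended. The sketch below assumes the table sits in the active caslib. Because the DATA step draws REGION uniformly from four codes and BAD from a Bernoulli(0.05) distribution, each region should hold roughly a quarter of the rows and BAD=1 should appear in roughly 5% of them.

```sas
proc cas;
   /* Row count: the DATA step loop is written to produce 1,000,000 rows */
   table.recordCount / table={name='large_claims_unscored'};

   /* Frequency of each REGION level and of the BAD flag */
   simple.freq /
      table={name='large_claims_unscored'},
      inputs={'REGION', 'BAD'};
run;
quit;
```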

Execution Steps

Execute the assessBias action using the ASTORE model. The action must first score the large input table and then perform the bias assessment across the levels of the 'REGION' variable. nBins and rocStep are set to non-default values (50 and 0.01).

PROC CAS;
   /* Load the action set in case it is not already loaded in this session */
   loadactionset 'fairAITools';

   fairAITools.assessBias /
      table={name='large_claims_unscored'},
      response={name='BAD'},
      sensitiveVariable={name='REGION'},
      modelTable={name='fraud_model_astore'},
      modelTableType='ASTORE',
      event='1',
      nBins=50,
      rocStep=0.01,
      scoredTable={name='FRAUD_BIAS_RESULTS', replace=true};
RUN;
QUIT;
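Since this is a performance case, one possible way to record the elapsed time and the completion status is to capture the action's result and status in CASL variables. This is a sketch, not part of the original test: the variable names t0, elapsed, r, and rc are hypothetical, and the action parameters simply repeat the call above.

```sas
proc cas;
   loadactionset 'fairAITools';

   /* Wall-clock time before the call (t0 is a hypothetical name) */
   t0 = datetime();

   /* result= and status= capture the action's output and return status */
   fairAITools.assessBias result=r status=rc /
      table={name='large_claims_unscored'},
      response={name='BAD'},
      sensitiveVariable={name='REGION'},
      modelTable={name='fraud_model_astore'},
      modelTableType='ASTORE',
      event='1',
      nBins=50,
      rocStep=0.01,
      scoredTable={name='FRAUD_BIAS_RESULTS', replace=true};

   elapsed = datetime() - t0;
   print "assessBias elapsed seconds: " elapsed;
   print "Status severity (0 means success): " rc.severity;

   /* Show the structure of the returned result tables */
   describe r;
run;
quit;
```

Timing measured this way covers both the on-the-fly scoring and the bias assessment, which is the combined cost this case is meant to exercise.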

Expected Result

The action should complete without errors and in a reasonable time on the 1M-row table, demonstrating its ability to handle large data volumes and on-the-fly scoring with an ASTORE model. The output 'scoredTable' (FRAUD_BIAS_RESULTS) will contain the scored data plus bias assessment columns. The 'LiftInfo' and 'ROCInfo' output tables should reflect the custom settings, with 50 bins for lift and a 0.01 step for ROC calculations. Bias metrics will be generated for all four regions (NE, SW, MW, SE).
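The scored output can be checked with standard table actions. The sketch below assumes FRAUD_BIAS_RESULTS was written to the active caslib. If the scored table holds one row per input claim, the record count should match the 1,000,000 input rows, and the column listing shows which columns the action added.

```sas
proc cas;
   /* Number of rows in the scored output table */
   table.recordCount / table={name='FRAUD_BIAS_RESULTS'};

   /* Column names and types, to see which columns were added to the input */
   table.columnInfo / table={name='FRAUD_BIAS_RESULTS'};
run;
quit;
```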
