Test Scenario & Use Case
Bias detection and mitigation in AI models.
Discover all actions of the fairAITools action set. First, train a gradient boosting model to predict fraud and save it as an ASTORE. Then create a large (1M-record) unscored dataset of insurance claims that includes a REGION sensitive variable.
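The discovery step can be done with the builtins action set, which lists every action in a named action set; a minimal sketch, assuming a CAS session is already active:

```sas
/* List all actions available in the fairAITools action set */
PROC CAS;
   builtins.help / actionSet='fairAITools';
RUN;
QUIT;
```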
/* Step 1: Load training data into CAS, train the model, and create the ASTORE */
DATA mycas.hmeq;
   SET sampsio.hmeq;
RUN;

PROC GRADBOOST DATA=mycas.hmeq SEED=123;
   INPUT LOAN MORTDUE VALUE / LEVEL=interval;
   INPUT JOB / LEVEL=nominal;
   TARGET BAD / LEVEL=nominal;
   SAVESTATE RSTORE=mycas.fraud_model_astore;
RUN;

/* Step 2: Generate a large (1M-row) unscored dataset */
DATA mycas.large_claims_unscored(KEEP=CLAIM_ID REGION LOAN MORTDUE VALUE JOB BAD);
   CALL STREAMINIT(456);
   LENGTH REGION $ 2;
   ARRAY regions[4] $ 2 _TEMPORARY_ ('NE', 'SW', 'MW', 'SE');
   DO CLAIM_ID = 1 TO 1000000;
      REGION = regions[RAND('INTEGER', 1, 4)];
      LOAN = 1000 + RAND('UNIFORM') * 50000;
      MORTDUE = 50000 + RAND('UNIFORM') * 200000;
      VALUE = 80000 + RAND('UNIFORM') * 400000;
      JOB = 'Other';
      BAD = RAND('BERNOULLI', 0.05); /* Actual fraud status */
      OUTPUT;
   END;
RUN;
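Before running the assessment, a quick sanity check on the simulated table can confirm that the four regions are roughly uniform and that BAD has an event rate near 5%; a sketch using the simple.freq action in the same CAS session:

```sas
/* Spot-check the simulated data: REGION should be roughly uniform
   across NE/SW/MW/SE, and BAD should be about 5% ones */
PROC CAS;
   simple.freq / table={name='LARGE_CLAIMS_UNSCORED'},
                 inputs={'REGION', 'BAD'};
RUN;
QUIT;
```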
/* Step 3: Assess bias across regions, scoring on the fly with the ASTORE */
PROC CAS;
   fairAITools.assessBias /
      table={name='LARGE_CLAIMS_UNSCORED'},
      response='BAD',
      sensitiveVariable='REGION',
      modelTable={name='FRAUD_MODEL_ASTORE'},
      modelTableType='ASTORE',
      event='1',
      nBins=50,
      rocStep=0.01,
      scoredTable={name='FRAUD_BIAS_RESULTS', replace=true};
RUN;
QUIT;
The action should complete efficiently, demonstrating that it can handle large data volumes and score on the fly with an ASTORE model. The output 'scoredTable' (FRAUD_BIAS_RESULTS) will contain the scored data plus the bias-assessment columns. The 'LiftInfo' and 'ROCInfo' output tables should reflect the custom settings: 50 bins for the lift calculations and a 0.01 step for the ROC calculations. Bias metrics will be generated for all four regions.
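To verify the run, the first few rows of the scored output can be fetched; a sketch, assuming the assessment created FRAUD_BIAS_RESULTS as above:

```sas
/* Peek at the scored output and its appended assessment columns */
PROC CAS;
   table.fetch / table={name='FRAUD_BIAS_RESULTS'}, to=10;
RUN;
QUIT;
```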