Edge Case: Handling Missing Data and Weights in Patient Readmission Model

Business Context

A healthcare provider wants to assess a model that predicts patient readmission within 30 days. The goal is to check for bias related to the patient's preferred language. The dataset is imperfect and contains missing values for the sensitive variable. It also includes a weighting variable to give more importance to patients with severe conditions. This scenario tests the action's robustness to messy data and its handling of the 'weight' parameter.

About the Action Set: fairAITools

Bias detection and mitigation in AI models.

Discover all actions of fairAITools

Data Preparation

Create a patient dataset with missing values in the 'LANGUAGE' sensitive variable. Include a 'PATIENT_WEIGHT' variable and pre-scored probabilities for the binary outcome 'READMITTED'.


DATA mycas.patient_records_scored;
   /* Fixed seed so the simulated data are reproducible */
   CALL STREAMINIT(789);
   LENGTH LANGUAGE $ 10;
   /* Temporary array: lookup values only, not written to the output table */
   ARRAY LANGS[3] $ 10 _TEMPORARY_ ('English', 'Spanish', 'Mandarin');
   DO i = 1 TO 1500;
      /* About 25% of records will have a missing language */
      IF RAND('UNIFORM') < 0.75 THEN LANGUAGE = LANGS[RAND('INTEGER', 1, 3)];
      ELSE LANGUAGE = '';

      SEVERITY = RAND('INTEGER', 1, 5);
      PATIENT_WEIGHT = SEVERITY; /* Higher severity carries more weight */

      /* True readmission probability rises with severity */
      P_READMIT = 0.05 + (SEVERITY / 5) * 0.3;
      IF LANGUAGE = 'Spanish' THEN P_READMIT = P_READMIT * 1.2; /* Introduce bias */

      /* Draw the outcome from the true probability, then add noise to mimic a model score */
      READMITTED = (RAND('UNIFORM') < P_READMIT);
      P_READMIT = MIN(0.99, MAX(0.01, P_READMIT + (RAND('UNIFORM')-0.5)*0.1));
      P_NOT_READMIT = 1 - P_READMIT;
      OUTPUT;
   END;
   DROP i; /* Loop counter is not needed in the output table */
RUN;
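
Before running the action, it can help to confirm that the generated table really has the intended share of missing 'LANGUAGE' values. The following is a minimal sketch, assuming the 'mycas' libref from the step above is still assigned; it is not part of the original scenario.

```sas
/* Sanity check (sketch): count LANGUAGE levels, including the missing level */
PROC FREQ DATA=mycas.patient_records_scored;
   /* MISSING treats blank LANGUAGE as its own category in the counts */
   TABLES LANGUAGE / MISSING;
   /* Cross-tabulate against the outcome to see raw readmission counts per group */
   TABLES LANGUAGE*READMITTED / MISSING NOCOL NOPERCENT;
RUN;
```

Roughly a quarter of the 1500 rows should fall in the blank 'LANGUAGE' category, with the remainder split across the three languages.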

Execution Steps

Execute the assessBias action on the messy data. Use the 'weight' parameter to apply patient-specific weights. Use 'responseLevels' to explicitly define the order of outcomes, matching the 'predictedVariables' list.


PROC CAS;
   /* The slash separates the action name from its parameters */
   fairAITools.assessBias /
      table={name='patient_records_scored'},
      response={name='READMITTED'},
      sensitiveVariable={name='LANGUAGE'},
      weight='PATIENT_WEIGHT',
      /* responseLevels and predictedVariables are listed in the same order */
      responseLevels={'1', '0'},
      predictedVariables={{name='P_READMIT'}, {name='P_NOT_READMIT'}},
      scoredTable={name='PATIENT_BIAS_RESULTS', replace=true};
RUN;
QUIT;

Expected Result

The action should run without errors. It is expected to automatically exclude records where the sensitive variable 'LANGUAGE' is missing. The 'BiasMetrics' table should only contain results for 'English', 'Spanish', and 'Mandarin'. All calculations, including accuracy, TPR, FPR, etc., should be influenced by the 'PATIENT_WEIGHT' variable. The action should correctly associate 'P_READMIT' with the event '1' and 'P_NOT_READMIT' with '0' as specified in the responseLevels/predictedVariables parameters.
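
One way to check the weighting expectation is to compute weighted rates by hand and compare them with the action's output. The sketch below is an independent cross-check, not the action's internal method: the cutoff value, the `>=` comparison, and the column aliases are all assumptions made for illustration. Note that in the simulated data P_READMIT stays below about 0.47 (0.35 times 1.2, plus at most 0.05 of noise), so a cutoff of 0.5 would classify no patient as readmitted; a lower, hypothetical cutoff is used here.

```sas
/* Hypothetical cutoff, chosen only so that some records are predicted positive */
%LET cutoff = 0.25;

PROC SQL;
   /* Boolean expressions evaluate to 1 or 0 in SAS, so multiplying by the
      weight sums PATIENT_WEIGHT over the records that meet each condition */
   SELECT LANGUAGE,
          SUM(PATIENT_WEIGHT) AS weight_total,
          SUM(PATIENT_WEIGHT * (READMITTED = 1 AND P_READMIT >= &cutoff))
             / SUM(PATIENT_WEIGHT * (READMITTED = 1)) AS weighted_tpr,
          SUM(PATIENT_WEIGHT * (READMITTED = 0 AND P_READMIT >= &cutoff))
             / SUM(PATIENT_WEIGHT * (READMITTED = 0)) AS weighted_fpr
   FROM mycas.patient_records_scored
   WHERE LANGUAGE IS NOT MISSING   /* mirror the expected exclusion of missing levels */
   GROUP BY LANGUAGE;
QUIT;
```

If the action applies 'PATIENT_WEIGHT' as expected, its per-language rates at the same cutoff should be close to these values; replacing PATIENT_WEIGHT with 1 in the query gives the unweighted figures for comparison.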

See the technical documentation for assessBias