percentile.assess

Edge Case: Handling Missing Data and Weights in Readmission Model

Test Scenario & Use Case

Business Context

A healthcare provider is assessing a model that predicts the likelihood that a patient will be readmitted within 30 days. The dataset is incomplete: the readmission status (the target variable) is missing for some patients. The provider also wants high-risk patients, such as those with an ICU stay, to count more in the assessment, which it does through a weight variable.
About the Set: percentile

Precise calculation of percentiles and quantiles.

Data Preparation

Create a patient dataset in which roughly 5% of the target values ('READMITTED_30D') are missing. 'PATIENT_WEIGHT' gives ICU patients a weight of 2.5 and all other patients a weight of 1.0. 'P_READMIT' is the model's predicted probability of readmission.

/* Assumes an active CAS session with the casuser libref assigned, */
/* for example via: cas mysess; caslib _all_ assign;               */
DATA casuser.patient_readmission;
   call streaminit(789);                      /* fixed seed for reproducibility */
   DO PATIENT_ID = 1 to 2000;
      ICU_STAY = rand('bern', 0.2);           /* about 20% of patients had an ICU stay */
      IF ICU_STAY = 1 THEN DO;
         PATIENT_WEIGHT = 2.5;                /* high-risk patients weigh more */
         base_prob = 0.4;
      END;
      ELSE DO;
         PATIENT_WEIGHT = 1.0;
         base_prob = 0.15;
      END;
      P_READMIT = base_prob + rand('uniform')*0.2;       /* model score */
      READMITTED_30D = rand('binomial', P_READMIT, 1);   /* observed 0/1 outcome */
      /* Introduce missing targets for 5% of records */
      IF rand('uniform') < 0.05 THEN call missing(READMITTED_30D);
      OUTPUT;
   END;
RUN;
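Before running the assessment, it can help to confirm that the simulated data look as intended. The following is a minimal sketch, assuming the casuser libref used by the DATA step is still assigned; it uses PROC MEANS (not part of the original scenario) to report, separately for ICU and non-ICU patients, how many target values are present or missing, along with the mean score and weight.

```sas
/* Profile the simulated table by ICU status.                       */
/* Assumes the casuser libref from the DATA step is still assigned. */
proc means data=casuser.patient_readmission n nmiss mean maxdec=3;
   class ICU_STAY;                                /* 0 = non-ICU, 1 = ICU */
   var READMITTED_30D P_READMIT PATIENT_WEIGHT;   /* target, score, weight */
run;
```

With 2,000 rows and a 5% missing rate, the NMiss values for READMITTED_30D should add up to roughly 100 across the two groups, although the exact count depends on the random stream.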

Steps

1. Run assess without handling the missing targets to see the default behavior. A warning or error is expected in the log.
PROC CAS;
   /* No option handles the missing READMITTED_30D values in this run */
   percentile.assess
      TABLE={name='patient_readmission', caslib='casuser'},
      response='READMITTED_30D',
      inputs={{name='P_READMIT'}},
      event='1',
      weight='PATIENT_WEIGHT',
      fitStatOut={name='readmit_fit_fail', caslib='casuser', replace=true};
QUIT;
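The rows with a missing target that trigger this message can also be removed before the action runs rather than inside it. This is a minimal sketch, assuming the same casuser libref; the output table name patient_readmission_nm is hypothetical. A WHERE statement keeps only the rows whose READMITTED_30D value is present:

```sas
/* Hypothetical alternative to noMissingTarget=true: pre-filter the rows. */
/* The table name patient_readmission_nm is illustrative only.            */
data casuser.patient_readmission_nm;
   set casuser.patient_readmission;
   where READMITTED_30D is not missing;   /* drop rows with a missing outcome */
run;
```

Running percentile.assess on this table with the same response, inputs, event, and weight settings should then draw on the same observations that `noMissingTarget=true` selects from the original table.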
2. Now run the assessment correctly: use `noMissingTarget=true` to exclude records with a missing outcome, and apply the `weight` parameter.
PROC CAS;
   percentile.assess
      TABLE={name='patient_readmission', caslib='casuser'},
      response='READMITTED_30D',
      inputs={{name='P_READMIT'}},
      event='1',
      weight='PATIENT_WEIGHT',      /* ICU patients weigh 2.5, others 1.0 */
      noMissingTarget=true,         /* skip rows with a missing outcome */
      includeFitStat=true,          /* request fit statistics */
      fitStatOut={name='readmit_fit_weighted', caslib='casuser', replace=true};
QUIT;
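To inspect the weighted fit statistics produced by step 2, the output table can be printed with the table.fetch action. A minimal sketch, assuming the table was written to the casuser caslib as in the code above:

```sas
proc cas;
   /* Print the fit statistics written by fitStatOut in step 2 */
   table.fetch table={name='readmit_fit_weighted', caslib='casuser'};
quit;
```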

Expected Result


The first step should write a warning or error to the log indicating that the response variable contains missing values. The second step should run successfully. The resulting 'readmit_fit_weighted' table contains fit statistics (AUC, etc.) computed only from observations with a non-missing target. Each observation's contribution is weighted by 'PATIENT_WEIGHT', so the high-risk ICU patients (weight 2.5) influence the results more than other patients (weight 1.0).
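To see concretely how PATIENT_WEIGHT shifts a fit statistic, the weighted average squared error can be recomputed by hand as the sum of w_i * (y_i - p_i)**2 divided by the sum of w_i, taken over rows with a non-missing target. This is a sketch for intuition only, assuming that assess weights each observation's squared error by PATIENT_WEIGHT in this way; the action's exact definitions may differ, so treat it as a cross-check rather than a reproduction of its output. The table work.sq_err and the variable SQ_ERR are hypothetical names.

```sas
/* Hypothetical cross-check: weighted vs. unweighted average squared error */
data work.sq_err;
   set casuser.patient_readmission;
   where READMITTED_30D is not missing;        /* same rows as noMissingTarget=true */
   SQ_ERR = (READMITTED_30D - P_READMIT)**2;   /* squared error per patient */
run;

/* Unweighted: every patient counts once */
proc means data=work.sq_err mean;
   var SQ_ERR;
run;

/* Weighted: mean = sum(w*SQ_ERR) / sum(w), so an ICU patient counts 2.5 times */
proc means data=work.sq_err mean;
   var SQ_ERR;
   weight PATIENT_WEIGHT;
run;
```

The gap between the two means shows how strongly the ICU subgroup pulls the statistic; its size depends on how the model's errors differ between ICU and non-ICU patients.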