bart bartGauss

Edge Case: Modeling Patient Response with Incomplete Biometric Data

Test Scenario & Use Case

Business Context

A clinical research organization is analyzing patient trial data. Some biometric sensor readings, which are important predictors for treatment effectiveness, are missing due to device malfunction. The model must be robust to this missingness and, if possible, use the pattern of missing data as an informative feature.
About the Set: bart

Bayesian Additive Regression Trees models.

Data Preparation

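Creates a dataset 'clinical_trial' of 500 simulated patients in which the predictor 'biomarker_b' is missing for approximately 30% of rows (SAS displays a missing numeric value as '.'). The target is 'treatment_response'. Patients with a missing 'biomarker_b' are given a response 15 points lower, so the missingness itself carries signal.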

DATA clinical_trial;
    call streaminit(99);
    DO patient_id = 1 to 500;
        age = 30 + rand('UNIFORM') * 40;
        drug_dosage = rand('UNIFORM') * 100;
        biomarker_a = 10 + rand('NORMAL', 0, 2);
        biomarker_b = 25 + rand('NORMAL', 0, 5);
        /* Compute the response from the true biomarker_b before masking it: */
        /* arithmetic on a missing value would make the target missing too.  */
        treatment_response = 50 + (biomarker_a - 10)*3 + (biomarker_b - 25)*2 - (age-30)*0.5 + rand('NORMAL', 0, 10);
        /* About 30% of readings are lost, and those patients respond 15 points lower. */
        IF rand('UNIFORM') < 0.3 THEN DO;
            call missing(biomarker_b);
            treatment_response = treatment_response - 15;
        END;
        OUTPUT;
    END;
RUN;
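
A quick sanity check on the simulated data can catch problems before modeling. The sketch below is not part of the original example: it flags rows where 'biomarker_b' is missing and summarizes both groups. The dataset name 'clinical_check' and the variable 'b_missing' are hypothetical. The flagged group should hold roughly 30% of the 500 patients, 'treatment_response' should report NMISS = 0 in both groups (a missing target would indicate that the missing biomarker propagated through the response formula), and the flagged group's mean response should sit about 15 points lower.

```sas
/* Hypothetical helper dataset: adds a 0/1 flag for a missing biomarker_b */
DATA clinical_check;
    SET clinical_trial;
    b_missing = missing(biomarker_b);
RUN;

/* Group sizes, missing counts, and means for each value of the flag */
PROC MEANS DATA=clinical_check N NMISS MEAN;
    CLASS b_missing;
    VAR treatment_response biomarker_b;
RUN;
```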

Implementation Steps

1. Load the clinical data containing missing values into CAS.
/* Requires an active CAS session */
PROC CASUTIL;
    load DATA=clinical_trial casout='clinical_trial' replace;
RUN;
QUIT;
2. Run bartGauss specifying 'SEPARATE' for the missing parameter to treat missingness as an informative level.
PROC CAS;
    LOADACTIONSET 'bart';
    bart.bartGauss /
        TABLE={name='clinical_trial'},
        target='treatment_response',
        inputs={'age', 'drug_dosage', 'biomarker_a', 'biomarker_b'},
        missing='SEPARATE',   /* treat missing values as an informative level */
        nTree=50,             /* number of trees in the ensemble */
        nBI=500,              /* burn-in iterations */
        nMC=2000,             /* MCMC iterations after burn-in */
        seed=789,             /* fixed seed for reproducibility */
        outputTables={names={'VarImp', 'MissingInfo'}};
RUN;
QUIT;

Expected Result


The action completes successfully without errors. The 'MissingInfo' output table should be generated, showing that 'biomarker_b' had missing values and they were handled using the 'SEPARATE' method. The 'VarImp' table should include 'biomarker_b' as a predictor, confirming it was not dropped from the model. This demonstrates the action's ability to build a predictive model on incomplete data.
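
One way to check these expectations is sketched below. It assumes that the outputTables request in step 2 leaves in-memory tables named 'VarImp' and 'MissingInfo' in the active caslib. Those names are taken from the call above. The columns of the two tables are not shown in the source, so the sketch only lists and prints them without relying on any column names.

```sas
PROC CAS;
    /* List the tables in the active caslib; VarImp and MissingInfo should appear */
    table.tableInfo;
    /* Print the rows of each output table */
    table.fetch / table={name='VarImp'};
    table.fetch / table={name='MissingInfo'};
RUN;
QUIT;
```

If either table is absent from the listing, the first thing to check is the SAS log from the bartGauss call for notes about the outputTables request.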