dataPreprocess binning

Patient Grouping Based on Pre-defined Clinical Biomarker Thresholds

Test Scenario & Use Case

Business Context

In a clinical trial analysis, researchers need to categorize patients into risk groups based on specific, medically-defined cut-off points for a 'biomarker_level'. These thresholds are absolute and must be applied precisely. The goal is to create four specific risk groups: 'Low', 'Medium', 'High', and 'Very High'.
About the Set: dataPreprocess

Data cleaning, imputation, and preprocessing.

Data Preparation

Creation of a large patient dataset with a continuous biomarker measurement.

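/* Assumes an active CAS session and a CAS libref named mycas. */
DATA mycas.clinical_trial_data(copies=2);
   call streaminit(789);                        /* fixed seed for reproducibility */
   DO patient_id = 1 to 100000;
      biomarker_level = rand('NORMAL', 20, 8);  /* mean 20, standard deviation 8 */
      IF biomarker_level < 0 THEN biomarker_level = 0;  /* truncate negative draws to 0 */
      OUTPUT;
   END;
RUN;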

Implementation Steps

1. Apply user-defined binning using the 'CUTPTS' method with the specific clinical thresholds. For efficiency, copy only the patient ID and the original biomarker value (which the verification step needs) to the output table, alongside the new binned variable.

PROC CAS;
   dataPreprocess.binning /
      TABLE={name='clinical_trial_data'},
      inputs={{name='biomarker_level'}},
      method='CUTPTS',
      cutPoints={15.0, 22.5, 30.0},
      copyVars={'patient_id', 'biomarker_level'},
      outVarsNamePrefix='risk_group_',
      casOut={name='patient_risk_groups', replace=true};
RUN;
QUIT;
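
The thresholds translate into a plain if/else rule, which can serve as an independent cross-check of the action's output. The following is a minimal sketch and not part of the original example: the table name manual_bins and the variable manual_bin are hypothetical, and the boundary handling (a value equal to a cut point goes to the lower bin) is an assumption taken from the expected result stated for this scenario.

```sas
/* Sketch: recompute the bins by hand from the three clinical cut points. */
/* manual_bins and manual_bin are illustrative names, not from the source. */
data mycas.manual_bins;
   set mycas.clinical_trial_data;
   /* Test for missing first: in SAS a missing value compares below any number. */
   if missing(biomarker_level) then manual_bin = .;
   else if biomarker_level <= 15.0 then manual_bin = 1;
   else if biomarker_level <= 22.5 then manual_bin = 2;
   else if biomarker_level <= 30.0 then manual_bin = 3;
   else manual_bin = 4;
run;
```

If the boundary assumption holds, comparing manual_bin with the action's bin variable for each patient_id should show no mismatches.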
2. Verify the binning results by checking the minimum and maximum biomarker levels within each generated risk group to ensure they align with the specified cut points. This check requires 'biomarker_level' to be present in 'patient_risk_groups', so it must be listed in copyVars in step 1.

PROC CAS;
   SIMPLE.summary /
      TABLE={name='patient_risk_groups', groupBy={'risk_group_biomarker_level'}},
      inputs={{name='biomarker_level'}},
      subSet={'MIN', 'MAX'};
RUN;
QUIT;
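
Because the test data is drawn from a normal distribution with mean 20 and standard deviation 8, the expected share of patients per bin can be worked out from the normal CDF and compared with the observed group sizes. This is a minimal sketch with illustrative variable names. Truncating negative values to 0 does not change the shares, since those values remain below the first cut point.

```sas
/* Sketch: expected proportion of patients per bin under N(mean=20, sd=8). */
/* The p_bin1 to p_bin4 names are illustrative. */
data _null_;
   p_bin1 = cdf('NORMAL', 15.0, 20, 8);
   p_bin2 = cdf('NORMAL', 22.5, 20, 8) - cdf('NORMAL', 15.0, 20, 8);
   p_bin3 = cdf('NORMAL', 30.0, 20, 8) - cdf('NORMAL', 22.5, 20, 8);
   p_bin4 = 1 - cdf('NORMAL', 30.0, 20, 8);
   /* Write the four shares to the SAS log as percentages. */
   put p_bin1= percent8.1 p_bin2= percent8.1 p_bin3= percent8.1 p_bin4= percent8.1;
run;
```

The shares come out at roughly 27%, 36%, 27%, and 11%, so with 100,000 patients each group should hold a correspondingly sized block of rows.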

Expected Result


The action creates the 'patient_risk_groups' table, which contains the copied variables and a new column, 'risk_group_biomarker_level', holding the bin number. The summary results should confirm that the data has been partitioned at the cut points: Bin 1 ('Low') contains patients with biomarker_level <= 15.0; Bin 2 ('Medium') has levels > 15.0 and <= 22.5; Bin 3 ('High') has levels > 22.5 and <= 30.0; and Bin 4 ('Very High') has levels > 30.0.
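
The binning action stores numeric bin identifiers, whereas the business context calls for named risk groups. One way to attach the names is a follow-up DATA step, sketched below under the assumptions that the bins are numbered 1 to 4 in ascending order and that the output column is named as described above. The table patient_risk_labeled and the variable risk_label are hypothetical names.

```sas
/* Sketch: map numeric bin IDs to the four clinical risk labels. */
/* Assumes bin IDs 1 to 4; patient_risk_labeled and risk_label are illustrative. */
data mycas.patient_risk_labeled;
   set mycas.patient_risk_groups;
   length risk_label $9;                 /* 'Very High' is 9 characters */
   select (risk_group_biomarker_level);
      when (1) risk_label = 'Low';
      when (2) risk_label = 'Medium';
      when (3) risk_label = 'High';
      when (4) risk_label = 'Very High';
      otherwise risk_label = 'Unknown';  /* any other bin ID, e.g. for missing values */
   end;
run;
```

Keeping the numeric bin alongside the label preserves the natural ordering of the groups for later sorting or modelling.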