Patient Grouping Based on Pre-defined Clinical Biomarker Thresholds

Business Context

In a clinical trial analysis, researchers need to categorize patients into risk groups based on specific, medically defined cut-off points for a 'biomarker_level' measurement. These thresholds are absolute and must be applied exactly as given, rather than derived from the data. The goal is to create four risk groups: 'Low', 'Medium', 'High', and 'Very High'.

About the Set: dataPreprocess

Data cleaning, imputation, and preprocessing.


Data Preparation

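Create a large patient dataset (100,000 patients) with a continuous biomarker measurement drawn from a normal distribution (mean 20, standard deviation 8), with negative values set to 0.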


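/* Assumes a CAS session is active and the libref mycas is already assigned */
DATA mycas.clinical_trial_data(copies=2);
    call streaminit(789);                                 /* fixed seed for reproducibility */
    DO patient_id = 1 to 100000;
        biomarker_level = rand('NORMAL', 20, 8);          /* mean 20, standard deviation 8 */
        IF biomarker_level < 0 THEN biomarker_level = 0;  /* a biomarker level cannot be negative */
        OUTPUT;
    END;
RUN;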
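Because the simulated biomarker follows a known distribution, the share of patients expected in each risk group can be estimated up front and compared with the bin counts later. The following is a minimal sketch, assuming the cut points 15.0, 22.5, and 30.0 used in the binning step of this example; the variable names (p_low and so on) are illustrative only. Setting negative draws to 0 does not change these shares, because those values stay at or below the first cut point.

```sas
/* Sketch: expected share of patients per risk group under Normal(20, 8). */
/* The p_* variable names are hypothetical and used only for this check.  */
data _null_;
    p_low       = cdf('NORMAL', 15.0, 20, 8);
    p_medium    = cdf('NORMAL', 22.5, 20, 8) - cdf('NORMAL', 15.0, 20, 8);
    p_high      = cdf('NORMAL', 30.0, 20, 8) - cdf('NORMAL', 22.5, 20, 8);
    p_very_high = 1 - cdf('NORMAL', 30.0, 20, 8);
    /* Write the four shares to the SAS log as percentages */
    put p_low= percent8.2 p_medium= percent8.2
        p_high= percent8.2 p_very_high= percent8.2;
run;
```

The four values sum to 1, so any large gap between these shares and the observed bin frequencies would point to a problem in the binning setup.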

Implementation Steps

Apply user-defined binning using the 'CUTPTS' method with the specific clinical thresholds. Only copy the patient ID and the new binned variable to the output table for efficiency.


PROC CAS;
    dataPreprocess.binning /
        TABLE={name='clinical_trial_data'},
        inputs={{name='biomarker_level'}},
        method='CUTPTS',                     /* user-defined cut points */
        cutPoints={15.0, 22.5, 30.0},        /* three thresholds yield four bins */
        copyVars={'patient_id'},             /* carry the patient ID into the output table */
        outVarsNamePrefix='risk_group_',
        casOut={name='patient_risk_groups', replace=true};
RUN;
QUIT;
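The business context asks for named groups, while the binned column identifies each bin by number. The following is a minimal sketch of attaching the four names, assuming the binned column is called 'risk_group_biomarker_level' and holds the integer bin IDs 1 to 4, and that the libref mycas points to the caslib holding the output table. The table name 'patient_risk_labeled' and the column 'risk_label' are hypothetical.

```sas
/* Sketch: map bin IDs to the clinical risk group names.                   */
/* Assumes bin IDs 1-4; patient_risk_labeled and risk_label are made-up    */
/* names introduced for this illustration.                                 */
data mycas.patient_risk_labeled;
    set mycas.patient_risk_groups;
    length risk_label $9;
    select (risk_group_biomarker_level);
        when (1) risk_label = 'Low';
        when (2) risk_label = 'Medium';
        when (3) risk_label = 'High';
        when (4) risk_label = 'Very High';
        otherwise risk_label = ' ';   /* unexpected or missing bin ID */
    end;
run;
```

Keeping the numeric bin ID alongside the label preserves the natural ordering of the groups, which a purely alphabetical sort of the labels would lose.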

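Verify the binning results by checking the minimum and maximum biomarker levels within each generated risk group to ensure they align with the specified cut points. This check needs 'biomarker_level' to be present in 'patient_risk_groups'; if the column is not carried over by the binning step, add it to copyVars before running the summary.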


PROC CAS;
    SIMPLE.summary /
        TABLE={name='patient_risk_groups', groupBy={'risk_group_biomarker_level'}},
        inputs={{name='biomarker_level'}},
        subSet={'MIN', 'MAX'};
RUN;
QUIT;

Expected Result

The action creates the 'patient_risk_groups' table. It contains 'patient_id' and a new column, 'risk_group_biomarker_level'. The summary results should confirm that the data has been correctly partitioned: Bin 1 contains patients with biomarker_level <= 15.0; Bin 2 has levels > 15.0 and <= 22.5; Bin 3 has levels > 22.5 and <= 30.0; and Bin 4 has levels > 30.0. In order, these bins correspond to the 'Low', 'Medium', 'High', and 'Very High' risk groups.
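The partition rule stated above can also be reproduced without the binning action, which gives an independent reference for the expected bin sizes. The following is a minimal sketch using a plain DATA step on the source table; the table 'work.expected_bins' and the column 'expected_bin' are hypothetical names, and the boundary handling simply follows the inequalities listed in the expected result.

```sas
/* Sketch: recompute the bins directly from the clinical thresholds.      */
/* work.expected_bins and expected_bin are made-up names for this check.  */
data work.expected_bins;
    set mycas.clinical_trial_data;
    /* The simulated data has no missing values; a missing biomarker_level */
    /* would compare as smaller than any number and fall into bin 1.       */
    if      biomarker_level <= 15.0 then expected_bin = 1;
    else if biomarker_level <= 22.5 then expected_bin = 2;
    else if biomarker_level <= 30.0 then expected_bin = 3;
    else                                 expected_bin = 4;
run;

/* Count the patients per recomputed bin */
proc freq data=work.expected_bins;
    tables expected_bin;
run;
```

Comparing these counts with the per-bin counts from 'patient_risk_groups' is a quick way to spot a mistyped threshold or an unexpected boundary convention.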
