Test Scenario & Use Cases
Data cleaning, imputation, and preprocessing.
Creation of a large patient dataset (100,000 rows) with a continuous biomarker measurement.
    /* Simulate 100,000 patients; biomarker ~ Normal(mean=20, sd=8), floored at 0. */
    /* mycas is assumed to be a libref that points to the CAS session.             */
    DATA mycas.clinical_trial_data(copies=2);
        call streaminit(789);
        DO patient_id = 1 to 100000;
            biomarker_level = rand('NORMAL', 20, 8);
            IF biomarker_level < 0 THEN biomarker_level = 0;
            OUTPUT;
        END;
    RUN;
    /* Bin biomarker_level at user-supplied cut points (4 bins from 3 cut points). */
    /* biomarker_level is copied to the output so the next step can summarize it.  */
    PROC CAS;
        dataPreprocess.binning /
            TABLE={name='clinical_trial_data'},
            inputs={{name='biomarker_level'}},
            method='CUTPTS',
            cutPts={15.0, 22.5, 30.0},
            copyVars={'patient_id', 'biomarker_level'},
            outVarsNamePrefix='risk_group_',
            casOut={name='patient_risk_groups', replace=true};
    RUN;
    QUIT;
    /* Check the partition: MIN and MAX of biomarker_level within each bin. */
    PROC CAS;
        SIMPLE.summary /
            TABLE={name='patient_risk_groups', groupBy={'risk_group_biomarker_level'}},
            inputs={{name='biomarker_level'}},
            subSet={'MIN', 'MAX'};
    RUN;
    QUIT;
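The action creates the 'patient_risk_groups' table. It contains the variables listed in copyVars and a new column, 'risk_group_biomarker_level', that holds the bin number. Because the summary step reads biomarker_level from this table, that variable must be carried into the output alongside 'patient_id'. The per-bin MIN and MAX should confirm that the data has been correctly partitioned: Bin 1 contains patients with biomarker_level <= 15.0; Bin 2 has levels > 15.0 and <= 22.5; Bin 3 has levels > 22.5 and <= 30.0; and Bin 4 has levels > 30.0.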
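The expected partition can also be checked outside CAS. The following is a minimal sketch in plain Python, not part of the original scenario: the helper names `assign_bin` and `min_max_by_bin` are hypothetical, and each cut point is treated as an inclusive upper bound only because the description above states it that way. Python's random stream differs from SAS's, so the simulated values will not match the CAS table row for row; only the bin boundaries are comparable.

```python
import random
from bisect import bisect_left

# Same cut points as in the binning call above.
CUT_POINTS = [15.0, 22.5, 30.0]


def assign_bin(level, cut_points=CUT_POINTS):
    """Return a 1-based bin number.

    Assumption: each cut point is an inclusive upper bound, as in the
    description above (bin 1: <= 15.0, bin 2: > 15.0 and <= 22.5, ...).
    bisect_left counts the cut points strictly below `level`.
    """
    return bisect_left(cut_points, level) + 1


def min_max_by_bin(levels, cut_points=CUT_POINTS):
    """Mimic a MIN/MAX summary grouped by bin: {bin: (min, max)}."""
    stats = {}
    for level in levels:
        b = assign_bin(level, cut_points)
        lo, hi = stats.get(b, (level, level))
        stats[b] = (min(lo, level), max(hi, level))
    return stats


# Simulate a comparable dataset: Normal(20, 8), negative values floored at 0.
rng = random.Random(789)
levels = [max(0.0, rng.gauss(20, 8)) for _ in range(100_000)]

summary = min_max_by_bin(levels)
for b in sorted(summary):
    print(b, summary[b])
```

If the MIN and MAX reported by the CAS summary fall inside the same intervals, the binning behaved as described. Values that sit exactly on a cut point are the one case worth checking against the action's documentation, since this sketch takes the boundary rule from the text rather than from CAS itself.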