Test Scenario & Use Case
Automated Machine Learning (AutoML) and pipeline generation.
Creation of a patient dataset from a clinical trial. The variables 'visit2_score', 'visit3_score', and 'experimental_marker' contain numerous missing values due to patient dropout or measurement failure.
/* Build the sample clinical trial data in WORK; '.' marks a missing numeric value */
DATA work.clinical_trial_data;
    LENGTH patient_id $ 8 treatment_arm $ 1;
    INPUT patient_id $ treatment_arm $ visit1_score visit2_score visit3_score experimental_marker;
    CARDS;
P001 A 85 82 79 1.2
P002 B 76 70 . .
P003 A 91 . . .
P004 B 65 66 68 .
P005 A 88 85 . .
P006 B 72 . . .
;
RUN;

/* Load the WORK table into the casuser caslib as an in-memory CAS table */
PROC CASUTIL;
    LOAD DATA=work.clinical_trial_data OUTCASLIB='casuser' CASOUT='clinical_trial_data' REPLACE;
RUN;
QUIT;
PROC CAS;
    /* Analyze the missing-value patterns and write the result to a CAS table */
    dataSciencePilot.analyzeMissingPatterns /
        table={name='clinical_trial_data', caslib='casuser'},
        casOut={name='clinical_missing_edge_case', caslib='casuser', replace=true};
RUN;
QUIT;
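To look at what the action wrote, the output table can be fetched back into the SAS log or results window. The following is a minimal sketch using the standard `table.fetch` action. It assumes the `casOut` table name from the call above and an active CAS session. The column layout of the output is not described in this scenario, so the sketch prints the rows without selecting specific columns.

```sas
PROC CAS;
    /* Print up to 20 rows of the table produced by analyzeMissingPatterns */
    table.fetch /
        table={name='clinical_missing_edge_case', caslib='casuser'},
        to=20;
RUN;
QUIT;
```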
The 'MissingPatterns' output table should identify each distinct pattern of missingness. A key pattern to verify is the one where 'visit2_score', 'visit3_score', and 'experimental_marker' are all missing, which represents early dropout (2 occurrences: P003 and P006). Another is the pattern where only 'visit3_score' and 'experimental_marker' are missing (2 occurrences: P002 and P005). The 'MissingCounts' table should report a very high percentage of missing values for 'experimental_marker' (5 of 6 rows, 83.3%), which tests how the action handles sparse variables.
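The expected counts can be cross-checked independently of the action with ordinary Base SAS steps. The sketch below is not part of the dataSciencePilot action set. It assumes an active CAS session and assigns a libref named `casuser` to the casuser caslib; the table `work.pattern_check` and the variable `pattern` are hypothetical names introduced for illustration.

```sas
/* Assumption: a CAS session is active; this libref points at the casuser caslib */
libname casuser cas caslib='casuser';

/* Encode each row's missingness as a 3-character string, one position per
   variable (visit2_score, visit3_score, experimental_marker): 1 = missing */
data work.pattern_check;
    set casuser.clinical_trial_data;
    length pattern $ 3;
    pattern = cats(missing(visit2_score),
                   missing(visit3_score),
                   missing(experimental_marker));
run;

/* Frequency of each distinct missingness pattern */
proc freq data=work.pattern_check;
    tables pattern / nocum;
run;

/* Non-missing and missing counts per numeric variable */
proc means data=work.pattern_check n nmiss;
    var visit1_score visit2_score visit3_score experimental_marker;
run;
```

For the six rows defined above, the frequency table should show pattern `111` twice (P003, P006), `011` twice (P002, P005), `001` once (P004), and `000` once (P001). PROC MEANS should report 5 missing values for 'experimental_marker', which matches 5/6 = 83.3%.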