Test Scenario & Use Case
Data cleaning, imputation, and preprocessing.
Dataset with intentional missing values in a categorical column.
```sas
/* Assumes an active CAS session and a CASUSER libref
   (for example, assigned with CASLIB _ALL_ ASSIGN;) */
DATA casuser.employee_data;
   LENGTH certification $10;
   DO i = 1 to 100;
      r = rand('Uniform');
      IF r < 0.3 THEN certification = 'PMP';        /* about 30% of rows */
      ELSE IF r < 0.6 THEN certification = 'MBA';   /* about 30% of rows */
      ELSE call missing(certification);             /* about 40%: explicit missing value */
      OUTPUT;
   END;
   DROP i r;   /* keep only the categorical column of interest */
RUN;
```
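Before encoding, it can help to confirm that the intended share of missing values (about 40% of rows, given the thresholds above) actually landed in the table. A minimal sketch, assuming the same CASUSER libref is still assigned: PROC FREQ reads the table through the CAS LIBNAME engine, and the MISSING option on the TABLES statement reports blank certifications as a level of their own rather than only as a separate missing count.

```sas
/* Sketch: level distribution of certification, counting missing as a level.
   Assumes the CASUSER libref used by the DATA step above. */
PROC FREQ DATA=casuser.employee_data;
   TABLES certification / MISSING;
RUN;
```

Because the DATA step sets no random seed, the exact counts vary from run to run, but all three levels (PMP, MBA and missing) should appear.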
```sas
PROC CAS;
   dataPreprocess.catTrans /
      TABLE={name='employee_data', caslib='casuser'},
      method='ONEHOT',                    /* one binary column per level */
      inputs={{name='certification'}},
      includeMissingGroup=true,           /* treat missing values as their own level */
      casOut={name='employee_encoded', caslib='casuser', replace=true};
RUN;
QUIT;
```
The output table 'employee_encoded' contains one binary (0/1) column per certification level (for example, 'certification_PMP' and 'certification_MBA'). Crucially, because includeMissingGroup=true is set, missing values form a level of their own with a dedicated column, so the "No Certification" status is captured as an explicit feature for a downstream model such as a neural network.
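To make the encoding concrete, the DATA steps below reproduce the same idea by hand in Base SAS: one 0/1 indicator per certification level plus one for the missing level. This is an illustrative sketch, not how catTrans is implemented; the table and column names (work.cert_sample, work.cert_onehot, is_PMP, is_MBA, is_missing) are hypothetical and do not necessarily match the names the action generates.

```sas
/* Hypothetical sketch: manual one-hot encoding with a missing-level indicator.
   Table and column names are illustrative, not those produced by catTrans. */
DATA work.cert_sample;
   LENGTH certification $10;
   certification = 'PMP'; OUTPUT;
   certification = 'MBA'; OUTPUT;
   call missing(certification); OUTPUT;   /* the "no certification" case */
RUN;

DATA work.cert_onehot;
   SET work.cert_sample;
   /* A comparison evaluates to 1 when true and 0 when false */
   is_PMP     = (certification = 'PMP');
   is_MBA     = (certification = 'MBA');
   /* MISSING() returns 1 for a blank character value */
   is_missing = missing(certification);
RUN;

PROC PRINT DATA=work.cert_onehot;
RUN;
```

Each row ends up with exactly one indicator set to 1: the PMP row as (1, 0, 0), the MBA row as (0, 1, 0) and the blank row as (0, 0, 1). Without is_missing, the blank row would be all zeros, so the "No Certification" status would only be implied by the absence of the other indicators instead of being represented as its own feature.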