Test Scenario & Use Case
Data cleaning, imputation, and preprocessing.
Simulation of 5,000 loan applicants with categorical attributes (job type, housing status) and a binary target (loan default).
```sas
/* Assumes an active CAS session with the casuser libref assigned
   (for example: cas; caslib _all_ assign;) */
DATA casuser.credit_risk;
   call streaminit(12345);                 /* reproducible random stream */
   LENGTH job_type $15 housing_status $10;
   DO i = 1 to 5000;
      /* Simulate Job Type */
      r = rand('Uniform');
      IF r < 0.3 THEN job_type = 'BlueCollar';
      ELSE IF r < 0.6 THEN job_type = 'WhiteCollar';
      ELSE IF r < 0.8 THEN job_type = 'Retired';
      ELSE job_type = 'Unemployed';

      /* Simulate Housing */
      IF rand('Uniform') < 0.5 THEN housing_status = 'Own'; ELSE housing_status = 'Rent';

      /* Simulate Default (Target) - Correlation with Unemployed */
      prob_default = 0.05;
      IF job_type = 'Unemployed' THEN prob_default = 0.30;
      IF housing_status = 'Rent' THEN prob_default = prob_default + 0.10;

      IF rand('Uniform') < prob_default THEN default_flag = 1; ELSE default_flag = 0;
      OUTPUT;
   END;
   DROP i r;                               /* loop counter and draw are not needed */
RUN;
```
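The probabilities hard-coded in the DATA step fix the expected default rates, which gives a quick sanity check on the generated table. Job types are drawn as 30% BlueCollar, 30% WhiteCollar, 20% Retired and 20% Unemployed, and housing is Own or Rent with probability 0.5 each, independently of job type:

$$
\begin{aligned}
P(\text{default} \mid \text{Unemployed}) &= 0.5 \times 0.30 + 0.5 \times 0.40 = 0.35\\
P(\text{default} \mid \text{other job}) &= 0.5 \times 0.05 + 0.5 \times 0.15 = 0.10\\
P(\text{default}) &= 0.2 \times 0.35 + 0.8 \times 0.10 = 0.15
\end{aligned}
$$

Roughly 0.15 × 5,000 = 750 defaults are therefore expected; the exact count depends on the random stream fixed by `streaminit(12345)`.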
```sas
PROC CAS;
   /* WOE encoding of the two categorical inputs against default_flag */
   dataPreprocess.catTrans /
      TABLE={name='credit_risk', caslib='casuser'},
      method='WOE',
      inputs={{name='job_type'}, {name='housing_status'}},
      targets={{name='default_flag'}},
      events={'1'},                    /* event level: default_flag = 1 */
      casOut={name='credit_scored', caslib='casuser', replace=true},
      casOutBinDetails={name='woe_details', caslib='casuser', replace=true},
      code={casOut={name='score_code', caslib='casuser', replace=true}},
      outVarsNamePrefix='woe';
RUN;
QUIT;
```
The action creates the table 'credit_scored' with two new columns, 'woe_job_type' and 'woe_housing_status'. The 'woe_details' table lists the computed WOE and Information Value (IV) for each category level. The action also generates the table 'score_code', which contains the SAS DATA step code needed to apply these WOE mappings to new data, for example when scoring in production.
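To see what the WOE and IV figures in 'woe_details' represent, they can be recomputed for one input with plain PROC SQL. This is a minimal sketch, not the catTrans implementation: it assumes the common convention WOE_i = ln(%non-event_i / %event_i) and IV = Σ_i (%non-event_i − %event_i) × WOE_i, where %event_i is the share of all defaults (default_flag = 1) that fall in level i. catTrans may use the opposite sign or adjust for empty cells, so compare magnitudes rather than exact values. The WORK table names `job_counts` and `job_woe` are hypothetical, and the `casuser` libref is assumed to be assigned as for the simulation DATA step.

```sas
/* Sketch only: recompute WOE and IV for job_type to cross-check woe_details.
   Assumed convention: WOE = ln(%non-event / %event); catTrans may differ.
   work.job_counts and work.job_woe are hypothetical table names. */
proc sql;
   /* Event (default_flag = 1) and non-event counts per job_type level */
   create table work.job_counts as
   select job_type,
          sum(default_flag = 1) as n_event,
          sum(default_flag = 0) as n_nonevent
   from casuser.credit_risk
   group by job_type;

   /* Share of all events / non-events per level, then WOE and IV contribution */
   create table work.job_woe as
   select c.job_type,
          c.n_event / t.tot_event       as pct_event,
          c.n_nonevent / t.tot_nonevent as pct_nonevent,
          log(calculated pct_nonevent / calculated pct_event) as woe,
          (calculated pct_nonevent - calculated pct_event) * calculated woe
             as iv_part
   from work.job_counts as c,
        (select sum(n_event)    as tot_event,
                sum(n_nonevent) as tot_nonevent
           from work.job_counts) as t;

   /* Information Value of job_type = sum of the per-level contributions */
   select sum(iv_part) as iv_job_type
   from work.job_woe;
quit;
```

With about 1,000 simulated rows per job type and default rates of at least 5%, every level should contain both defaulters and non-defaulters, so the logarithm is defined; on sparse data, the counts would need a small adjustment before taking the log. Under this convention, Unemployed applicants, who default far more often, should come out with a clearly negative WOE.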