dataPreprocess catTrans

Credit Risk Modeling: Weight of Evidence Transformation

Test Scenario & Use Case

Business Context

A financial institution wants to build a scorecard for loan applicants. The modeling team needs to transform categorical variables (such as 'Job Type' and 'Housing Status') into numerical Weight of Evidence (WOE) values. This transformation allows them to use linear models like Logistic Regression while capturing non-linear relationships and the 'strength' of specific categories in predicting loan default. They also need to generate SAS scoring code to deploy this logic into production.
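
For reference, WOE for a category is commonly defined as the log ratio of that category's share of non-events to its share of events, and the Information Value sums the WOE-weighted differences between those shares. The notation below is introduced here for illustration only. Some implementations invert the ratio, which flips the sign, and this page does not state which convention catTrans uses.

```latex
\mathrm{WOE}_i = \ln\!\left(
  \frac{n_i^{\text{non-event}} / N^{\text{non-event}}}
       {n_i^{\text{event}} / N^{\text{event}}}
\right),
\qquad
\mathrm{IV} = \sum_i \left(
  \frac{n_i^{\text{non-event}}}{N^{\text{non-event}}}
  - \frac{n_i^{\text{event}}}{N^{\text{event}}}
\right) \mathrm{WOE}_i
```

Here $n_i$ counts observations in category $i$ and $N$ counts them over the whole table. In this scenario an event is a default (default_flag = 1).
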

About the Set: dataPreprocess

Data cleaning, imputation, and preprocessing.

Data Preparation

Simulation of 5,000 loan applicants with categorical attributes and a binary target (default).

DATA casuser.credit_risk;
    call streaminit(12345);
    LENGTH job_type $15 housing_status $10;
    DO i = 1 to 5000;
        /* Simulate Job Type */
        r = rand('Uniform');
        IF r < 0.3 THEN job_type = 'BlueCollar';
        ELSE IF r < 0.6 THEN job_type = 'WhiteCollar';
        ELSE IF r < 0.8 THEN job_type = 'Retired';
        ELSE job_type = 'Unemployed';

        /* Simulate Housing */
        IF rand('Uniform') < 0.5 THEN housing_status = 'Own'; ELSE housing_status = 'Rent';

        /* Simulate Default (Target) - Correlation with Unemployed */
        prob_default = 0.05;
        IF job_type = 'Unemployed' THEN prob_default = 0.30;
        IF housing_status = 'Rent' THEN prob_default = prob_default + 0.10;

        IF rand('Uniform') < prob_default THEN default_flag = 1; ELSE default_flag = 0;
        OUTPUT;
    END;
RUN;
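
Before transforming, it can help to confirm that the simulated default rates match the logic above. From the DATA step, the expected default rate is roughly 35% for 'Unemployed' (0.30, plus 0.10 for the half who rent) and 10% for the other job types, so about 15% overall. A minimal sketch, assuming the casuser libref is assigned to the CAS session:

```sas
/* Cross-tabulate each input against the target.                */
/* NOCOL and NOPERCENT leave only frequencies and row percents, */
/* so the default rate per category reads directly off the row. */
PROC FREQ DATA=casuser.credit_risk;
    TABLES job_type*default_flag housing_status*default_flag / NOCOL NOPERCENT;
RUN;
```

Observed rates in a 5,000-row sample will differ somewhat from these population values.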

Implementation Steps

1. Execute WOE transformation with scoring code generation
PROC CAS;
    dataPreprocess.catTrans /
        TABLE={name='credit_risk', caslib='casuser'},
        method='WOE',
        inputs={{name='job_type'}, {name='housing_status'}},
        targets={{name='default_flag'}},
        events={'1'},
        casOut={name='credit_scored', caslib='casuser', replace=true},
        casOutBinDetails={name='woe_details', caslib='casuser', replace=true},
        code={casOut={name='score_code', caslib='casuser', replace=true}},
        outVarsNamePrefix='woe';
RUN;
QUIT;
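
To check the outputs of the call above, the output tables can be inspected with standard table actions. This is a sketch rather than part of the original example. It assumes the tables were written to casuser under the names used in the call.

```sas
PROC CAS;
    /* List the columns of the transformed table */
    table.columnInfo / table={name='credit_scored', caslib='casuser'};

    /* Show the first rows of the details table */
    table.fetch / table={name='woe_details', caslib='casuser'}, to=20;
RUN;
QUIT;
```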

Expected Result

The action creates the 'credit_scored' table with the new columns 'woe_job_type' and 'woe_housing_status'. The 'woe_details' table lists the calculated WOE and Information Value (IV) for each category. A third table, 'score_code', contains the SAS DATA step logic needed to apply these WOE mappings to new data in a production environment.
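
One possible way to apply the generated logic to new data is the dataStep.runCodeTable action, which runs DATA step code stored in a CAS table. The sketch below is not from the original example. It assumes that 'score_code' has the layout this action expects, and the table names 'new_applicants' and 'new_applicants_scored' are hypothetical.

```sas
PROC CAS;
    /* 'new_applicants' and 'new_applicants_scored' are hypothetical names. */
    /* The input table is assumed to contain job_type and housing_status.   */
    dataStep.runCodeTable /
        table={name='new_applicants', caslib='casuser'},
        codeTable={name='score_code', caslib='casuser'},
        casOut={name='new_applicants_scored', caslib='casuser', replace=true};
RUN;
QUIT;
```

Categories that were not present when the WOE values were computed may need separate handling, so it is worth checking the scored output for missing values.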