Statistical CREATION_INTERNE

Bucket Binning and Weight of Evidence Calculation

The BINNING procedure discretizes variables by grouping their values into intervals called bins. Bucket binning divides the range of a variable into bins of equal width, so the number of observations per bin can vary. Quantile binning, by contrast, aims for roughly equal counts per bin.

For each bin, the procedure can compute the Weight of Evidence (WOE), which measures the strength of the relationship between a predictor and the target variable. When WOE is defined as the natural log of the bin's share of all events divided by its share of all non-events, a positive value indicates that the bin has a higher event rate than the data as a whole, and a negative value a lower one. Some tools invert the ratio, which flips the sign. The Information Value (IV) sums the WOE values over all bins of a variable, weighting each by the difference between the bin's event share and non-event share, and is used to assess how useful a variable is for predicting the target.

Binning is often used to limit the influence of outliers, mitigate multicollinearity, and improve model performance. PROC BINNING runs in CAS, so the input tables must be loaded into CAS memory.
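The definitions above can be summarized as follows. This is a sketch under stated assumptions: k equal-width bins over [x_min, x_max], with the maximum value placed in the last bin; E_i and N_i are the event and non-event counts in bin i, and E and N are their totals; WOE uses the event-over-non-event convention. A bin with E_i = 0 or N_i = 0 makes WOE undefined, which is why implementations apply some adjustment; PROC BINNING's boundary handling and its treatment of empty bins may differ from this sketch.

```latex
w = \frac{x_{\max} - x_{\min}}{k}, \qquad
b(x) = \min\!\left(\left\lfloor \frac{x - x_{\min}}{w} \right\rfloor + 1,\; k\right)

\mathrm{WOE}_i = \ln\!\left(\frac{E_i / E}{N_i / N}\right)

\mathrm{IV} = \sum_{i=1}^{k} \left(\frac{E_i}{E} - \frac{N_i}{N}\right) \mathrm{WOE}_i
```

For `x1` in the first example (values 10 to 16, k = 5), w = 1.2 and the interior edges are 11.2, 12.4, 13.6 and 14.8. Each IV term is non-negative, because (a - b) and ln(a / b) always have the same sign, so IV grows as the bins separate events from non-events more sharply.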
Data Analysis

Type : CREATION_INTERNE


The examples use generated data (datalines) so that each one is self-contained.

1 Code Block
PROC BINNING Data
Explanation :
This example illustrates the simplest use of the BINNING procedure to perform bucket binning on the `x1` variable with 5 bins, and calculate the Weight of Evidence (WOE) with respect to the `y` target variable (where 'y' is the event). Data is created directly with `datalines` and loaded into CAS memory via the `mylib` library.
cas mylib; /* Start a CAS session named 'mylib' */
LIBNAME mylib cas; /* Define the 'mylib' libref for CAS tables */

DATA mylib.data_basique;
 INPUT x0 x1 x2 y $;
 DATALINES;
2 10 7 n
2 12 6 y
3 11 1 n
2 13 7 y
2 10 4 n
3 16 7 n
1 14 4 y
2 15 6 y
1 16 4 n
2 13 2 n
;
RUN;

PROC BINNING DATA=mylib.data_basique numbin=5 woe;
 INPUT x1;
 target y / event="y";
 OUTPUT out=mylib.output_basique;
RUN;
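As a hedged cross-check of this first example, the bucket bins and WOE for `x1` can be reproduced with plain DATA step and PROC SQL code on WORK tables. This is a sketch, not PROC BINNING's algorithm: the event-over-non-event WOE convention, placing the maximum value in the last bin, and the smoothing constant `adj` (0.5, added to every cell so that empty cells do not produce ln(0)) are assumptions, and table names such as `work.bin_counts` are hypothetical.

```sas
/* Hypothetical cross-check, not PROC BINNING itself: equal-width     */
/* (bucket) bins for x1 with k = 5, then WOE and IV per bin.          */
/* Assumptions: WOE = ln(%events / %non-events); the maximum value    */
/* goes into bin k; ADJ is a hypothetical smoothing constant.         */
data work.basic;
   input x0 x1 x2 y $;
   datalines;
2 10 7 n
2 12 6 y
3 11 1 n
2 13 7 y
2 10 4 n
3 16 7 n
1 14 4 y
2 15 6 y
1 16 4 n
2 13 2 n
;
run;

%let k   = 5;    /* number of bins, as in numbin=5 */
%let adj = 0.5;  /* hypothetical smoothing constant for empty cells */

/* Range of x1 */
proc sql noprint;
   select min(x1), max(x1) into :xmin trimmed, :xmax trimmed
   from work.basic;
quit;

/* Bin index 1..k for each observation; the maximum is capped at k */
data work.basic_binned;
   set work.basic;
   width    = (&xmax - &xmin) / &k;
   bin      = min(floor((x1 - &xmin) / width) + 1, &k);
   is_event = (y = 'y');
run;

/* Event and non-event counts per bin, then overall totals */
proc sql noprint;
   create table work.bin_counts as
   select bin,
          count(*)          as n_obs,
          sum(is_event)     as n_event,
          sum(1 - is_event) as n_nonevent
   from work.basic_binned
   group by bin
   order by bin;

   select sum(n_event), sum(n_nonevent)
      into :tot_e trimmed, :tot_ne trimmed
   from work.bin_counts;
quit;

/* WOE per bin; IV accumulated with a sum statement */
data work.bin_woe;
   set work.bin_counts end=last;
   pct_e   = (n_event    + &adj) / &tot_e;
   pct_ne  = (n_nonevent + &adj) / &tot_ne;
   woe     = log(pct_e / pct_ne);
   iv_part = (pct_e - pct_ne) * woe;
   iv + iv_part;                        /* retained running total */
   if last then call symputx('iv', iv); /* last row holds the IV  */
run;

%put NOTE: Information Value of x1 (sketch) = &iv;
```

On this data `x1` spans 10 to 16, so each bin is 1.2 wide and the five bins hold 3, 1, 2, 1 and 3 observations. Comparing these counts with `mylib.output_basique` is a quick way to see how PROC BINNING actually places its bin edges.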
2 Code Block
PROC BINNING Data
Explanation :
This example extends the basic case: it applies bucket binning to two variables (`x0` and `x1`), specifying `binmethod=bucket` explicitly, and computes WOE. Two output tables are generated: `out` holds the binning details and `outwoe` holds the WOE mappings. This is useful for inspecting the transformations and potentially applying these mappings to new data. The `freq` column is read by the DATA step but not used by the procedure in this example.
cas mylib; /* Start a CAS session named 'mylib' */
LIBNAME mylib cas; /* Define the 'mylib' libref for CAS tables */

DATA mylib.data_intermediaire;
 INPUT x0 x1 x2 y $ freq;
 DATALINES;
2 10 7 n 2
2 12 6 y 3
3 11 1 o 0
2 13 7 y 5
2 . 4 n -5
3 16 7 n 3
1 14 4 y 4
2 15 6 y 3
1 16 4 o 1
2 13 2 n 3
;
RUN;

PROC BINNING DATA=mylib.data_intermediaire numbin=4 woe;
 INPUT x0 x1 / binmethod=bucket;
 target y / event="y";
 OUTPUT out=mylib.output_intermediaire_bins outwoe=mylib.output_intermediaire_woe;
RUN;
3 Code Block
PROC BINNING Data
Explanation :
This advanced example shows how to handle missing values with the `missing=special` option, which places them in a separate bin. It also uses the `freq` variable, through the WEIGHT statement, to weight observations in the WOE calculation. In addition, it combines binning methods (`bucket` for `x0` and `x1`, `quantile` for `x2`) to show the procedure's flexibility. This is particularly relevant for real datasets, where missing values and weights are common.
cas mylib; /* Start a CAS session named 'mylib' */
LIBNAME mylib cas; /* Define the 'mylib' libref for CAS tables */

DATA mylib.data_avancee;
 INPUT x0 x1 x2 y $ freq;
 DATALINES;
2 10 7 n 2
2 12 6 y 3
3 0 1 o 0
2 13 7 y 5
2 . 4 n -5
3 16 7 n 3
1 14 4 y 4
2 15 6 y 3
1 16 4 o 1
2 13 2 n 3
;
RUN;

PROC BINNING DATA=mylib.data_avancee numbin=3 woe;
 INPUT x0 x1 / binmethod=bucket missing=special;
 INPUT x2 / binmethod=quantile;
 target y / event="y";
 weight freq;
 OUTPUT out=mylib.output_avancee;
RUN;
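The weighting and the quantile method in this example can be illustrated outside the procedure. The sketch below is hypothetical: it uses PROC RANK with GROUPS=3 as a stand-in for quantile binning of `x2`, lets each row count with its `freq` value instead of 1, treats every level other than 'y' (including 'o') as a non-event, and, as an assumption, drops rows whose weight is zero or negative. PROC BINNING's own quantile rule and weight handling may differ; the WORK table names are illustrative.

```sas
/* Hypothetical sketch: quantile-style groups for x2 plus weighted    */
/* event / non-event totals, showing how a weight enters WOE.         */
/* Assumptions: PROC RANK GROUPS=3 as the quantile rule; rows with    */
/* freq <= 0 are excluded; levels other than 'y' are non-events.      */
data work.adv;
   input x0 x1 x2 y $ freq;
   datalines;
2 10 7 n 2
2 12 6 y 3
3 0 1 o 0
2 13 7 y 5
2 . 4 n -5
3 16 7 n 3
1 14 4 y 4
2 15 6 y 3
1 16 4 o 1
2 13 2 n 3
;
run;

/* GROUPS=3 assigns group values 0, 1, 2 with roughly equal counts; */
/* tied x2 values share a group (default TIES=MEAN)                  */
proc rank data=work.adv out=work.adv_q groups=3;
   var x2;
   ranks x2_qbin;
run;

/* Weighted totals: each row contributes its freq instead of 1 */
proc sql;
   create table work.adv_wcounts as
   select x2_qbin,
          sum(freq * (y = 'y'))  as w_event,
          sum(freq * (y ne 'y')) as w_nonevent
   from work.adv_q
   where freq > 0
   group by x2_qbin
   order by x2_qbin;
quit;
```

With these assumptions the three `x2` groups receive weighted event/non-event totals of 0/3, 10/1 and 5/5, and these weighted totals take the place of raw counts in the WOE and IV formulas.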
4 Code Block
PROC BINNING Data
Explanation :
This example shows how, in SAS Viya and CAS, the BINNING procedure can not only generate bins and WOE values but also save this binning 'model' to a CAS table (`mylib.woe_mapping`) with the `SAVE WOE=` statement. The mapping table can then be used to apply the same WOE transformations to new data (`APPLYWOE=`), so that training and validation or test datasets are transformed consistently, a common practice in predictive modeling.
cas mylib; /* Start a CAS session named 'mylib' */
LIBNAME mylib cas; /* Define the 'mylib' libref for CAS tables */

DATA mylib.data_cas;
 INPUT id x0 x1 x2 y $;
 DATALINES;
1 2 10 7 n
2 2 12 6 y
3 3 11 1 n
4 2 13 7 y
5 2 . 4 n
6 3 16 7 n
7 1 14 4 y
8 2 15 6 y
9 1 16 4 n
10 2 13 2 n
;
RUN;

PROC BINNING DATA=mylib.data_cas numbin=4 woe;
 INPUT x0 x1 x2;
 target y / event="y";
 save woe=mylib.woe_mapping / replace;
 OUTPUT out=mylib.output_cas;
RUN;

/* Apply the WOE mapping to new data (example) */
DATA mylib.new_data;
 INPUT id x0 x1 x2 y $;
 DATALINES;
11 2 11 5 y
12 3 14 6 n
13 1 10 3 y
;
RUN;

PROC BINNING DATA=mylib.new_data applywoe=mylib.woe_mapping;
 INPUT x0 x1 x2;
 target y / event="y";
 OUTPUT out=mylib.applied_woe_data;
RUN;
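The core idea behind reusing a saved mapping is that new data must be binned with the edges learned on the training data, not with edges recomputed on the new data. The sketch below illustrates that idea for `x1` with plain DATA step and PROC SQL code; it is hypothetical and does not read or reproduce the structure of `mylib.woe_mapping`. Equal-width bins with k = 4 (as in `numbin=4`), clamping of out-of-range values to the first or last bin, and the WORK table names are assumptions.

```sas
/* Hypothetical sketch: bin new data with edges learned on training  */
/* data. Assumptions: equal-width bins, k = 4 as in numbin=4; values */
/* outside the training range are clamped to the first/last bin.    */
data work.train;
   input id x1 y $;
   datalines;
1 10 n
2 12 y
3 11 n
4 13 y
5 . n
6 16 n
7 14 y
8 15 y
9 16 n
10 13 n
;
run;

data work.new;
   input id x1 y $;
   datalines;
11 11 y
12 14 n
13 10 y
;
run;

%let k = 4;

/* Edges come from the TRAINING data only; MIN and MAX skip missing values */
proc sql noprint;
   select min(x1), max(x1) into :xmin trimmed, :xmax trimmed
   from work.train;
quit;

data work.new_binned;
   set work.new;
   width = (&xmax - &xmin) / &k;
   /* Missing x1 is left unbinned here; PROC BINNING has its own missing handling */
   if not missing(x1) then
      bin = min(max(floor((x1 - &xmin) / width) + 1, 1), &k);
run;
```

Here the training range 10 to 16 gives bins 1.5 wide, so the new rows land in bins 1, 3 and 1. Recomputing the edges on the three new rows alone (range 10 to 14) would put `x1` = 11 in bin 2 instead, which is the kind of inconsistency a saved mapping avoids.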
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved