The BINNING procedure discretizes variables by grouping their values into 'bins'. Bucket binning divides the range of a variable into intervals of equal width, whereas quantile binning aims for roughly the same number of observations in each bin. For each bin, the Weight of Evidence (WOE) can be calculated; it measures the strength of the relationship between a predictor and the target variable by comparing the bin's share of events with its share of non-events. When the event share is the numerator of that ratio, a positive WOE indicates a higher probability of the target event in that bin and a negative value a lower one; some references use the inverse ratio, which flips the sign. The Information Value (IV) is a weighted sum of the WOE values over all bins of a variable and serves to assess how useful the variable is for predicting the target. Binning helps limit the influence of outliers and can improve model performance. Input tables must be loaded into CAS memory for the procedure to work.
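As a sketch of the two quantities described above, assume the convention in which the event share is the numerator (an assumption; the inverse ratio is also in common use and flips the sign of WOE). The symbols are introduced here for illustration only: $E_i$ and $N_i$ are the numbers of events and non-events in bin $i$, $E$ and $N$ are their totals over all $k$ bins.

```latex
\mathrm{WOE}_i = \ln\!\left(\frac{E_i / E}{N_i / N}\right),
\qquad
\mathrm{IV} = \sum_{i=1}^{k}\left(\frac{E_i}{E} - \frac{N_i}{N}\right)\mathrm{WOE}_i
```

Each term of the IV sum is non-negative, because the difference and the logarithm always share the same sign. A bin with no events or no non-events leaves the logarithm undefined, so such bins need an adjustment before WOE can be computed.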
The examples use generated data (datalines) so that they are self-contained.
1 Code Block
PROC BINNING Data
Explanation : This example illustrates the simplest use of the BINNING procedure to perform bucket binning on the `x1` variable with 5 bins, and calculate the Weight of Evidence (WOE) with respect to the `y` target variable (where 'y' is the event). Data is created directly with `datalines` and loaded into CAS memory via the `mylib` library.
cas mylib; /* Start a CAS session so that the 'mylib' CAS libref can be assigned */
libname mylib cas;
data mylib.data_basique;
input x0 x1 x2 y $;
datalines;
2 10 7 n
2 12 6 y
3 11 1 n
2 13 7 y
2 10 4 n
3 16 7 n
1 14 4 y
2 15 6 y
1 16 4 n
2 13 2 n
;
run;
proc binning data=mylib.data_basique numbin=5 woe;
input x1;
target y / event="y";
output out=mylib.output_basique;
run;
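Bucket binning, as documented for PROC BINNING, splits the observed range of a variable into intervals of equal width. The following is a minimal sketch that reproduces that idea by hand in the WORK library for the ten `x1` values used above; it is not the procedure's own implementation. The table names, the macro variables `xmin` and `xmax`, and the treatment of boundary values (lower bound inclusive, maximum placed in the last bin) are assumptions made for the illustration.

```sas
/* Illustration only: equal-width cut points computed by hand in WORK. */
data work.demo;
    input x1 @@;
    datalines;
10 12 11 13 10 16 14 15 16 13
;
run;

/* Store the observed range in macro variables (hypothetical names) */
proc sql noprint;
    select min(x1), max(x1)
        into :xmin trimmed, :xmax trimmed
        from work.demo;
quit;

data work.demo_binned;
    set work.demo;
    width = (&xmax - &xmin) / 5;   /* five bins, as with numbin=5 */
    /* Assumption: lower bounds are inclusive and the maximum falls in the last bin */
    bin = min(floor((x1 - &xmin) / width) + 1, 5);
run;

proc print data=work.demo_binned;
run;
```

With a range of 10 to 16, each of the five intervals is 1.2 wide, so the number of observations per bin depends on how the values are distributed rather than being fixed in advance.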
2 Code Block
PROC BINNING Data
Explanation : This example extends the basic case by applying bucket binning to multiple variables (`x0` and `x1`), requesting it explicitly with `method=bucket` in the PROC BINNING statement. It also calculates WOE. The `out=` table receives the bin assigned to each observation, while the WOE and Information Value of each variable are reported in the procedure's results. This is useful for inspecting the transformations before applying them to new data.
cas mylib; /* Start a CAS session so that the 'mylib' CAS libref can be assigned */
libname mylib cas;
data mylib.data_intermediaire;
input x0 x1 x2 y $ freq;
datalines;
2 10 7 n 2
2 12 6 y 3
3 11 1 o 0
2 13 7 y 5
2 . 4 n -5
3 16 7 n 3
1 14 4 y 4
2 15 6 y 3
1 16 4 o 1
2 13 2 n 3
;
run;
/* The binning method is an option of the PROC BINNING statement */
proc binning data=mylib.data_intermediaire method=bucket numbin=4 woe;
input x0 x1;
target y / event="y";
output out=mylib.output_intermediaire_bins;
run;
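The WOE and IV figures requested by the `woe` option can be reproduced by hand from per-bin counts. The sketch below is a minimal illustration in the WORK library, not the procedure's own implementation: the counts are hypothetical, the table and column names are made up for the example, and it assumes the convention with the event share as the numerator (the inverse ratio flips the sign of WOE).

```sas
/* Hypothetical per-bin counts, for illustration only */
data work.bin_counts;
    input bin events nonevents;
    datalines;
1 10 40
2 20 30
3 30 10
;
run;

proc sql;
    /* SUM() without GROUP BY is remerged with each row, giving overall totals */
    create table work.woe_by_hand as
    select bin,
           events,
           nonevents,
           events / sum(events)       as p_event,
           nonevents / sum(nonevents) as p_nonevent,
           /* Assumption: event share in the numerator */
           log(calculated p_event / calculated p_nonevent) as woe,
           (calculated p_event - calculated p_nonevent) * calculated woe as iv_part
    from work.bin_counts;

    /* Information Value is the sum of the per-bin contributions */
    select sum(iv_part) as iv
    from work.woe_by_hand;
quit;
```

A bin with zero events or zero non-events would make the logarithm undefined, so real data usually needs an adjustment for such bins before this calculation is applied.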
3 Code Block
PROC BINNING Data
Explanation : This advanced example shows how to handle missing values with the `binmissing` option, which places them in a separate bin. It also uses the `freq` variable in a FREQ statement so that each observation counts as many times as its frequency value; observations whose frequency is missing or less than 1 are not used. Because the binning method is set in the PROC BINNING statement, the example runs the procedure twice to combine methods: `method=bucket` for `x0` and `x1`, and `method=quantile` for `x2`. This is particularly relevant for real datasets where missing values and frequency counts are common.
cas mylib; /* Start a CAS session so that the 'mylib' CAS libref can be assigned */
libname mylib cas;
data mylib.data_avancee;
input x0 x1 x2 y $ freq;
datalines;
2 10 7 n 2
2 12 6 y 3
3 0 1 o 0
2 13 7 y 5
2 . 4 n -5
3 16 7 n 3
1 14 4 y 4
2 15 6 y 3
1 16 4 o 1
2 13 2 n 3
;
run;
/* Bucket binning for x0 and x1; missing values get their own bin */
proc binning data=mylib.data_avancee method=bucket numbin=3 binmissing woe;
input x0 x1;
target y / event="y";
freq freq;
output out=mylib.output_avancee_bucket;
run;
/* Quantile binning for x2 */
proc binning data=mylib.data_avancee method=quantile numbin=3 woe;
input x2;
target y / event="y";
freq freq;
output out=mylib.output_avancee_quantile;
run;
4 Code Block
PROC BINNING Data
Explanation : This example highlights the capabilities of SAS Viya and the CAS environment. It shows how the BINNING procedure can not only generate bins and WOE but also save the fitted binning as an analytic store in a CAS table (`mylib.binning_store`) using the `SAVESTATE RSTORE=` statement. The ASTORE procedure can then apply the same binning to new data, ensuring consistency between training and validation/test datasets, which is a common practice in predictive modeling.
cas mylib; /* Start a CAS session so that the 'mylib' CAS libref can be assigned */
libname mylib cas;
data mylib.data_cas;
input id x0 x1 x2 y $;
datalines;
1 2 10 7 n
2 2 12 6 y
3 3 11 1 n
4 2 13 7 y
5 2 . 4 n
6 3 16 7 n
7 1 14 4 y
8 2 15 6 y
9 1 16 4 n
10 2 13 2 n
;
run;
proc binning data=mylib.data_cas numbin=4 woe;
input x0 x1 x2;
target y / event="y";
/* Save the fitted binning as an analytic store in a CAS table */
savestate rstore=mylib.binning_store;
output out=mylib.output_cas;
run;
/* Apply the saved binning to new data (example) */
data mylib.new_data;
input id x0 x1 x2 y $;
datalines;
11 2 11 5 y
12 3 14 6 n
13 1 10 3 y
;
run;
proc astore;
score data=mylib.new_data rstore=mylib.binning_store out=mylib.applied_woe_data;
run;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.