Published on :
Statistical CREATION_INTERNE

Cutpoint Binning

This code is also available in: Deutsch Español Français
Awaiting validation
This functionality uses the BINNING procedure, which is part of the visual statistical procedures in SAS© Viya. The CUTPTS method allows manually defining thresholds for each category. Each interval is defined by a lower and an upper bound. The process transforms the original numerical values into bin identifiers. The procedure can also calculate descriptive statistics for each bin and generate output tables containing the binned data. Execution takes place on the Cloud Analytic Services (CAS) engine, which allows processing large volumes of data in a distributed and performant manner.
Data Analysis

Type : CREATION_INTERNE


Examples use generated data (datalines) or SASHELP tables to ensure the autonomy of each code block.

1 Code Block
PROC BINNING Data
Explanation :
This example creates a simple 'auto_data' data table in the CASUSER library. Then, the BINNING procedure is used with the CUTPTS method for the 'Horsepower' variable. Cutpoints at 180 and 200 are specified, creating three categories: Horsepower <= 180, 180 < Horsepower <= 200, and Horsepower > 200. The output table 'binned_data' will contain the new binned variable as well as the copied original variables. The results are then displayed for the first observations, and a frequency is performed on the binned variable to check the binning.
Copied!
1/* Création d'une table CAS temporaire */
2options casdatalimit=1000;
3caslib _all_ assign;
4 
5DATA casuser.auto_data;
6 INPUT MSRP Invoice EngineSize Horsepower Cylinders MPG_City MPG_Highway Weight Wheelbase LENGTH;
7 DATALINES;
825945 23722 3.5 210 6 18 26 3470 110 195
926445 24300 3.5 210 6 18 26 3470 110 195
1028495 26033 3.5 210 6 18 26 3470 110 195
1123720 21683 2.0 165 4 23 30 2778 101 179
1224595 22473 2.0 165 4 23 30 2778 101 179
1327270 24806 2.0 190 4 21 29 2833 104 184
1428170 25686 2.0 190 4 21 29 2833 104 184
1520000 18456 2.0 160 4 24 31 2901 106 187
1622000 20387 2.0 170 4 23 31 2946 106 187
1730000 27500 3.0 220 6 17 25 3450 109 188
18;
19RUN;
20 
21/* Application du découpage par points de coupure */
22PROC BINNING DATA=casuser.auto_data numbin=3 method=cutpts;
23 INPUT Horsepower / cutpts(180, 200);
24 OUTPUT out=casuser.binned_data copyvars=(MSRP Horsepower);
25RUN;
26 
27/* Affichage des résultats (facultatif) */
28PROC PRINT DATA=casuser.binned_data (obs=5);
29RUN;
30 
31PROC FREQ DATA=casuser.binned_data;
32 tables Binned_Horsepower;
33RUN;
34 
2 Code Block
PROC BINNING Data
Explanation :
This example illustrates cutpoint binning on two variables, 'SqFt' and 'YearBuilt', from a 'house_prices' data table. The MONITOR=(SalePrice) option is used for the 'SqFt' variable to track the 'SalePrice' variable in the bin statistics, which is useful for evaluating binning performance. The OUTSTAT option generates an additional 'bin_stats' table containing the statistics for each bin, offering a detailed view of the grouping results.
Copied!
1/* Création d'une table CAS temporaire */
2options casdatalimit=1000;
3caslib _all_ assign;
4 
5DATA casuser.house_prices;
6 INPUT SqFt Bedrooms Bathrooms LotSize YearBuilt SalePrice;
7 DATALINES;
81500 3 2 5000 1990 150000
91800 4 2.5 6000 1995 200000
101200 2 1 4000 1980 100000
112000 3 3 7000 2000 250000
121600 3 2 5500 1992 160000
131300 2 1.5 4500 1985 110000
142200 4 3.5 8000 2005 300000
151700 3 2 5800 1998 180000
161900 4 2 6500 2002 220000
171400 3 1 4800 1988 130000
18;
19RUN;
20 
21/* Application du découpage par points de coupure avec options */
22PROC BINNING DATA=casuser.house_prices numbin=4 method=cutpts;
23 INPUT SqFt / cutpts(1500, 1800, 2000) monitor=(SalePrice);
24 INPUT YearBuilt / cutpts(1990, 2000) monitor=(SalePrice);
25 OUTPUT out=casuser.binned_house_data copyvars=(SqFt YearBuilt SalePrice) outstat=casuser.bin_stats;
26RUN;
27 
28/* Affichage des détails des bins */
29PROC PRINT DATA=casuser.bin_stats;
30RUN;
31 
32/* Affichage des données binées */
33PROC PRINT DATA=casuser.binned_house_data (obs=5);
34RUN;
35 
3 Code Block
PROC BINNING Data
Explanation :
This advanced example uses a customer data table with missing values. For the 'Age' variable, specific cutpoints (30, 40, 50) are defined. For 'Income_Monthly', although cutpoints are provided, the HANDLEMISSING=BIN option indicates that missing values should be treated as a distinct bin category. This demonstrates a more robust approach to data grouping, taking into account real-world scenarios of incomplete data.
Copied!
1/* Création d'une table CAS temporaire avec des valeurs manquantes */
2options casdatalimit=1000;
3caslib _all_ assign;
4 
5DATA casuser.customer_data;
6 INPUT Age Income_Monthly Education_Level $ Credit_Score_PreBinning Gender $;
7 DATALINES;
830 3000 Bac NA . M
945 5000 Master NA 750 F
1025 2000 HighSchool NA 600 M
11NA 4000 PhD NA 800 F
1235 NA Bachelor NA 700 M
1350 6000 Master NA 850 F
1428 2500 HighSchool NA 620 M
1540 4500 Bachelor NA 720 F
16NA 3500 PhD NA 780 M
1732 3200 HighSchool NA 680 F
18;
19RUN;
20 
21/* Application du découpage avec gestion des manquantes et différentes numbin */
22PROC BINNING DATA=casuser.customer_data numbin=5 method=cutpts;
23 INPUT Age / cutpts(30, 40, 50);
24 INPUT Income_Monthly / numbin=3 cutpts(3000, 5000) handlemissing=bin;
25 OUTPUT out=casuser.binned_customer_data copyvars=(Age Income_Monthly Credit_Score_PreBinning);
26RUN;
27 
28/* Affichage des résultats pour les données binées */
29PROC PRINT DATA=casuser.binned_customer_data;
30RUN;
31 
32/* Vérification des fréquences pour les variables binées */
33PROC FREQ DATA=casuser.binned_customer_data;
34 tables Binned_Age Binned_Income_Monthly;
35RUN;
36 
4 Code Block
PROC BINNING
Explanation :
This example demonstrates the integration of PROC BINNING into the SAS Viya/CAS environment. It loads the 'iris' table from SASHELP into a CAS library. Then, it applies cutpoint binning to 'PetalLength' and 'SepalWidth' and saves the binning state (the binning rules) into a 'binning_state' table using the SAVE STATE option. This 'state' can be reused to apply exactly the same binning rules (scoring) to new data without having to redefine the cutpoints. The 'scored_iris' table shows the result of binning on the new data.
Copied!
1/* Assurez-vous d'avoir une session CAS active et une caslib assignée */
2options casdatalimit=1000;
3caslib _all_ assign;
4 
5DATA casuser.iris;
6 SET sashelp.iris;
7RUN;
8 
9/* Application du découpage par points de coupure sur une table CAS */
10PROC BINNING DATA=casuser.iris numbin=3 method=cutpts;
11 INPUT PetalLength / cutpts(1.5, 4.5);
12 INPUT SepalWidth / cutpts(3.0, 3.5);
13 OUTPUT out=casuser.binned_iris (replace=true) copyvars=(Species PetalLength SepalWidth);
14 save state out=casuser.binning_state (replace=true);
15RUN;
16 
17/* Appliquer le découpage sur de nouvelles données (scoring) */
18DATA casuser.new_iris_data;
19 INPUT PetalLength SepalWidth Species $;
20 DATALINES;
211.2 3.8 Setosa
225.0 3.0 Virginica
234.0 2.5 Versicolor
24;
25RUN;
26 
27PROC BINNING DATA=casuser.new_iris_data;
28 score state=casuser.binning_state out=casuser.scored_iris;
29RUN;
30 
31PROC PRINT DATA=casuser.scored_iris;
32RUN;
33 
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.