The examples use generated data (DATALINES) or SASHELP tables so that each code block is self-contained.
1 Code Block
PROC BINNING Data
Explanation : This example creates a simple 'auto_data' table in the CASUSER library. The BINNING procedure then applies the CUTPTS method to the 'Horsepower' variable. Cutpoints at 180 and 200 create three categories: Horsepower <= 180, 180 < Horsepower <= 200, and Horsepower > 200. The output table 'binned_data' contains the new binned variable along with the copied original variables. The first observations are then printed, and PROC FREQ is run on the binned variable to check the binning.
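The code for this first example is not included on the page. The sketch below reconstructs it under stated assumptions: SASHELP.CARS stands in for the 'auto_data' table (its rows are copied, not invented), an active CAS session already exists, and the statements mirror the forms used by the other examples on this page. The source does not state the name PROC BINNING gives the binned variable, so PROC CONTENTS is used to reveal it before you run PROC FREQ on that variable.

```sas
/* Sketch: cutpoint binning of Horsepower (assumes an active CAS session) */
caslib _all_ assign;

/* SASHELP.CARS stands in for the 'auto_data' table described above */
data casuser.auto_data;
   set sashelp.cars(keep=Make Model Horsepower);
run;

/* Two cutpoints (180 and 200) yield three bins */
proc binning data=casuser.auto_data method=cutpts;
   input Horsepower / numbin=3 cutpts(180, 200);
   output out=casuser.binned_data copyvars=(Make Model Horsepower);
run;

/* Print the first observations */
proc print data=casuser.binned_data(obs=10);
run;

/* List the output columns to find the name of the binned variable */
proc contents data=casuser.binned_data;
run;
```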
Explanation : This example applies cutpoint binning to two variables, 'SqFt' and 'YearBuilt', from a 'house_prices' table. On the 'SqFt' variable, the MONITOR=(SalePrice) option tracks 'SalePrice' in the bin statistics, which helps evaluate how well the bins separate sale prices. The OUTSTAT option writes an additional 'bin_stats' table with the statistics for each bin, giving a detailed view of the grouping.
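The code for this example is also missing from the page. The sketch below covers only the two-variable cutpoint binning, on a synthetic 'house_prices' table generated with the RAND function. The column names come from the text; the value ranges, the seed, the HouseID key, and the cutpoints (1500/2500 for SqFt, 1980/2000 for YearBuilt) are hypothetical. The MONITOR= and OUTSTAT options are left out because their exact syntax is not shown in the source.

```sas
/* Sketch: cutpoint binning of two variables (assumes an active CAS session) */
caslib _all_ assign;

/* Synthetic house data; ranges, seed and HouseID are hypothetical */
data casuser.house_prices;
   call streaminit(2024);
   do HouseID = 1 to 200;
      SqFt      = round(800 + 3200*rand('uniform'));     /* 800 to 4000 */
      YearBuilt = 1950 + floor(71*rand('uniform'));      /* 1950 to 2020 */
      SalePrice = round(50000 + 120*SqFt + rand('normal', 0, 20000));
      output;
   end;
run;

/* Hypothetical cutpoints: two per variable, hence three bins each */
proc binning data=casuser.house_prices method=cutpts;
   input SqFt      / numbin=3 cutpts(1500, 2500);
   input YearBuilt / numbin=3 cutpts(1980, 2000);
   output out=casuser.binned_houses copyvars=(HouseID SqFt YearBuilt SalePrice);
run;

/* Print the first observations */
proc print data=casuser.binned_houses(obs=10);
run;
```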
Explanation : This more advanced example uses a customer table that contains missing values. For the 'Age' variable, explicit cutpoints (30, 40, 50) are defined. For 'Income_Monthly', cutpoints are also provided, and the HANDLEMISSING=BIN option requests that missing values be placed in a bin of their own. This makes the grouping robust to incomplete data, which is common in real-world tables.
/* Create a temporary CAS table that contains missing values */
options casdatalimit=1000;
caslib _all_ assign;
data casuser.customer_data;
/* A period (.) marks a missing numeric value; :$10. keeps 'HighSchool' from being truncated to 8 characters */
input Age Income_Monthly Education_Level :$10. Credit_Score_PreBinning Gender $;
datalines;
30 3000 Bac . M
45 5000 Master 750 F
25 2000 HighSchool 600 M
. 4000 PhD 800 F
35 . Bachelor 700 M
50 6000 Master 850 F
28 2500 HighSchool 620 M
40 4500 Bachelor 720 F
. 3500 PhD 780 M
32 3200 HighSchool 680 F
;
run;
/* Apply cutpoint binning with missing-value handling and per-variable NUMBIN settings */
proc binning data=casuser.customer_data numbin=5 method=cutpts;
input Age / cutpts(30, 40, 50);
input Income_Monthly / numbin=3 cutpts(3000, 5000) handlemissing=bin;
output out=casuser.binned_customer_data copyvars=(Age Income_Monthly Credit_Score_PreBinning);
run;
/* Print the binned data */
proc print data=casuser.binned_customer_data;
run;
/* Check the frequencies of the binned variables */
proc freq data=casuser.binned_customer_data;
tables Binned_Age Binned_Income_Monthly;
run;
4 Code Block
PROC BINNING
Explanation : This example shows PROC BINNING running in the SAS Viya/CAS environment. It loads the SASHELP.IRIS table into the CASUSER caslib, applies cutpoint binning to 'PetalLength' and 'SepalWidth', and uses the SAVE STATE statement to store the binning rules in a 'binning_state' table. That state can then be reused to apply exactly the same rules to new data (scoring) without redefining the cutpoints. The 'scored_iris' table shows the result of binning the new data.
/* Make sure you have an active CAS session and an assigned caslib */
options casdatalimit=1000;
caslib _all_ assign;
data casuser.iris;
set sashelp.iris;
run;
/* Apply cutpoint binning to a CAS table.
   SASHELP.IRIS stores its measurements in millimeters, so the cutpoints are in mm */
proc binning data=casuser.iris numbin=3 method=cutpts;
input PetalLength / cutpts(15, 45);
input SepalWidth / cutpts(30, 35);
output out=casuser.binned_iris (replace=true) copyvars=(Species PetalLength SepalWidth);
save state out=casuser.binning_state (replace=true);
run;
/* Apply the saved binning rules to new data (scoring); values are in mm, like SASHELP.IRIS */
data casuser.new_iris_data;
/* :$10. keeps 'Versicolor' from being truncated to 8 characters */
input PetalLength SepalWidth Species :$10.;
datalines;
12 38 Setosa
50 30 Virginica
40 25 Versicolor
;
run;
proc binning data=casuser.new_iris_data;
score state=casuser.binning_state out=casuser.scored_iris;
run;
proc print data=casuser.scored_iris;
run;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.