
Examples: Processing BY Groups in the DATA Step

BY-group processing is a fundamental SAS feature that segments and processes data according to the values, or combinations of values, of specified variables. These examples show how the SORT procedure prepares the data and how the BY statement in a subsequent DATA step defines the grouping logic. They also cover the GROUPFORMAT option, which groups observations by formatted values so that the DATA step forms the same groups as the procedures that display the data.

All examples generate their data internally with the DATALINES statement, so each one is self-contained and runs without external files.

1 Code Block
DATA STEP / PROC SORT / PROC PRINT Data
Explanation :
This example demonstrates how to group data using a single BY variable, `zipCode`, in a DATA step. The input dataset, `zip`, contains street names, cities, states, and zip codes. PROC SORT first orders the observations by `zipCode`. The BY statement in the DATA step then treats consecutive observations with the same `zipCode` value as one BY group.
/* Create the zip dataset. zipCode, State, and City use list input; */
/* Street uses column input and must occupy columns 20-29.          */
DATA zip;
  INPUT zipCode State $ City $ Street $20-29;
DATALINES;
85730 AZ Tucson    Domenic Ln
85730 AZ Tucson    Gleeson Pl
33133 FL Miami     Rice St
33133 FL Miami     Thomas Ave
33133 FL Miami     Surrey Dr
33133 FL Miami     Trade Ave
33146 FL Miami     Nervia St
33146 FL Miami     Corsica St
33801 FL Lakeland  French Ave
33809 FL Lakeland  Egret Dr
;

/* Sort by the BY variable before BY-group processing. */
PROC SORT DATA=zip;
  BY zipCode;
RUN;

/* The BY statement treats observations with the same zipCode as one group. */
DATA zip;
  SET zip;
  BY zipCode;
RUN;

PROC PRINT DATA=zip noobs;
  title 'BY-Group Using a Single Variable: ZipCode';
RUN;
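When a DATA step contains a BY statement, SAS also creates the temporary variables `FIRST.variable` and `LAST.variable`, which flag the first and last observation of each BY group. The following is a minimal sketch of how they might be used to count the observations in each group. It assumes the sorted `zip` dataset from the example above still exists in the session; the dataset name `zipCounts` and the variable `nStreets` are hypothetical names chosen for illustration.

```sas
/* Count the observations in each zipCode BY group.               */
/* Assumes WORK.zip exists and is sorted by zipCode (see above).  */
DATA zipCounts;                         /* hypothetical output dataset        */
  SET zip;
  BY zipCode;
  IF FIRST.zipCode THEN nStreets = 0;   /* reset at the start of each group   */
  nStreets + 1;                         /* sum statement: value is retained   */
  IF LAST.zipCode THEN OUTPUT;          /* write one row per BY group         */
  KEEP zipCode nStreets;
RUN;

PROC PRINT DATA=zipCounts noobs;
  title 'Number of Streets per ZipCode';
RUN;
```

With the ten observations above, this should produce one row per distinct zip code, for example a count of 4 for 33133 and 2 for 85730.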
2 Code Block
DATA STEP / PROC SORT / PROC PRINT Data
Explanation :
This example processes the `zip` dataset with two BY variables, `State` and `City`. After sorting, observations from Arizona appear first because `AZ` sorts before `FL`. Observations within each `State` value are arranged in order of `City` value. Each BY group has a unique combination of values for the `State` and `City` variables.
/* State and City use list input; Street uses column input and must */
/* occupy columns 13-22; ZipCode is read with list input after it.  */
DATA zip;
  INPUT State $ City $ Street $13-22 ZipCode;
DATALINES;
FL Miami    Nervia St  33146
FL Miami    Rice St    33133
FL Miami    Corsica St 33146
FL Miami    Thomas Ave 33133
FL Miami    Surrey Dr  33133
FL Miami    Trade Ave  33133
FL Lakeland French Ave 33801
FL Lakeland Egret Dr   33809
AZ Tucson   Domenic Ln 85730
AZ Tucson   Gleeson Pl 85730
;

/* Sort by both BY variables, in the same order as the BY statement below. */
PROC SORT DATA=zip;
  BY State City;
RUN;

/* Each unique State/City combination forms one BY group. */
DATA zip;
  SET zip;
  BY State City;
RUN;

PROC PRINT DATA=zip noobs;
  title 'BY Groups with Multiple BY Variables: State City';
RUN;
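With more than one BY variable, SAS maintains a `FIRST.` and `LAST.` pair for each variable, and a change in an earlier BY variable also starts a new group for every later one. Because these temporary variables are not written to the output dataset, the sketch below copies them into ordinary variables so they can be printed. It assumes the `zip` dataset sorted by `State` and `City` from the example above; the names `zipFlags`, `firstState`, `lastState`, `firstCity`, and `lastCity` are hypothetical.

```sas
/* Expose the BY-group boundary flags for inspection.                 */
/* Assumes WORK.zip exists and is sorted by State City (see above).   */
DATA zipFlags;                   /* hypothetical output dataset */
  SET zip;
  BY State City;
  firstState = FIRST.State;      /* 1 on the first observation of each State          */
  lastState  = LAST.State;       /* 1 on the last observation of each State           */
  firstCity  = FIRST.City;       /* 1 on the first observation of each State/City     */
  lastCity   = LAST.City;        /* 1 on the last observation of each State/City      */
RUN;

PROC PRINT DATA=zipFlags noobs;
  title 'FIRST. and LAST. Flags for State and City';
  var State City Street firstState lastState firstCity lastCity;
RUN;
```

In the printed result, every observation where `firstState` is 1 should also have `firstCity` equal to 1, since a new `State` value always begins a new `State`/`City` group.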
3 Code Block
DATA STEP / PROC FORMAT / PROC PRINT Data
Explanation :
This example uses the `FORMAT` procedure, the `GROUPFORMAT` option, and the `FORMAT` statement to create and print a simple dataset. The input dataset `TEST` is already arranged in ascending order of `Score`, so no sort step is needed. The `NEWTEST` dataset is grouped by the formatted values of the `Score` variable. Processing BY groups in the DATA step with the `GROUPFORMAT` option is identical to processing BY groups with formatted values in SAS procedures, which is useful when custom formats define how grouped data is displayed.
options linesize=80 pagesize=60;

/* Input data, already in ascending order of Score. */
DATA test;
  INPUT name $ Score;
DATALINES;
Jon 1
Anthony 3
Miguel 3
Joseph 4
Ian 5
Jan 6
;

/* Map raw scores to three labeled ranges. */
PROC FORMAT;
  value Range 1-2='Low'
              3-4='Medium'
              5-6='High';
RUN;

/* GROUPFORMAT forms BY groups from the formatted values of Score. */
DATA newtest;
  SET test;
  BY groupformat Score;
  FORMAT Score Range.;
RUN;

/* PROC PRINT groups by the formatted values as well. */
PROC PRINT DATA=newtest;
  title 'Score Categories';
  var Name Score;
  BY Score;
RUN;
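The effect of `GROUPFORMAT` becomes visible when `FIRST.Score` and `LAST.Score` are used, because the group boundaries then follow the formatted ranges rather than the raw values. The sketch below counts how many observations fall into each range. It assumes the `test` dataset and the `Range.` format from the example above are available in the session; the names `rangeCounts` and `nInRange` are hypothetical.

```sas
/* Count observations per formatted Score range.                   */
/* Assumes WORK.test and the Range. format exist (see above).      */
DATA rangeCounts;                     /* hypothetical output dataset             */
  SET test;
  BY groupformat Score;               /* groups follow Low / Medium / High       */
  FORMAT Score Range.;
  IF FIRST.Score THEN nInRange = 0;   /* reset at the start of each range        */
  nInRange + 1;                       /* sum statement: value is retained        */
  IF LAST.Score THEN OUTPUT;          /* write one row per formatted range       */
  KEEP Score nInRange;
RUN;

PROC PRINT DATA=rangeCounts noobs;
  title 'Observations per Score Category';
RUN;
```

With the six observations above, this should give three rows: Low with 1, Medium with 3, and High with 2. Without `GROUPFORMAT`, the same step would instead form one group per distinct raw `Score` value.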
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved


Expert Advice
Michael
Viya Infrastructure Manager
« Efficient data segmentation in SAS relies on two non-negotiable steps: sorting and grouping logic. To ensure your DATA step runs without errors, always precede it with a PROC SORT that lists the same variables in the same order as your BY statement. When your analysis requires grouping based on custom ranges (such as turning numerical scores into "Low," "Medium," or "High" categories), use the GROUPFORMAT option. This ensures the DATA step recognizes groups based on the applied format's labels rather than the raw underlying data, keeping your data processing consistent with your final reports. »