
Examples: Processing BY Groups in the DATA Step

BY-group processing in SAS lets you process data selectively, one group at a time. A BY statement in a DATA step organizes observations into groups based on the values of one or more variables, so that operations such as calculations and aggregations can be carried out separately for each group. This tends to make the code both more efficient and easier to read. Because the DATA step expects the observations to arrive already ordered by the BY variables, PROC SORT is generally run first. The OUT= option of PROC SORT writes the sorted observations to a new dataset; the examples here that sort omit it, so the original dataset is replaced. The document covers three cases: a single BY variable, multiple BY variables, and the GROUPFORMAT option for grouping by the values of a custom format.

Each example creates its own data with a DATALINES statement, so it can be run on its own.

Example 1: DATA step / PROC SORT
Explanation:
This example shows how to group data using a single BY variable, `zipCode`, in a DATA step. The `zip` dataset contains street names, cities, states, and zip codes. PROC SORT first orders the observations by `zipCode`. The BY statement in the DATA step then treats consecutive observations with the same `zipCode` value as one BY group. The printed output shows five BY groups, one for each distinct zip code.
/* Create the zip dataset. Street is read from columns 20-29, */
/* so the data lines are aligned to put Street in column 20.  */
DATA zip;
   INPUT zipCode State $ City $ Street $20-29;
   DATALINES;
85730 AZ Tucson    Domenic Ln
85730 AZ Tucson    Gleeson Pl
33133 FL Miami     Rice St
33133 FL Miami     Thomas Ave
33133 FL Miami     Surrey Dr
33133 FL Miami     Trade Ave
33146 FL Miami     Nervia St
33146 FL Miami     Corsica St
33801 FL Lakeland  French Ave
33809 FL Lakeland  Egret Dr
;

/* Sort by the BY variable; without OUT=, zip is replaced. */
PROC SORT DATA=zip;
   BY zipCode;
RUN;

/* Read the sorted data in BY groups of zipCode. */
DATA zip;
   SET zip;
   BY zipCode;
RUN;

PROC PRINT DATA=zip NOOBS;
   TITLE 'BY-Group Using a Single Variable: ZipCode';
RUN;
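The DATA step above only reads the data in BY groups. As a minimal sketch of the per-group calculations mentioned in the introduction, the following counts the observations in each zip code using the automatic `FIRST.zipCode` and `LAST.zipCode` variables. The dataset names `zipSample` and `zipCounts` and the variable `nObs` are illustrative assumptions, and the data is a shortened subset of the rows above.

```sas
/* Shortened sample data (subset of the rows above, list input only). */
data zipSample;
   input zipCode State $ City $;
   datalines;
85730 AZ Tucson
85730 AZ Tucson
33133 FL Miami
33133 FL Miami
33133 FL Miami
33146 FL Miami
33801 FL Lakeland
;

proc sort data=zipSample;
   by zipCode;
run;

/* zipCounts and nObs are hypothetical names used for illustration. */
data zipCounts;
   set zipSample;
   by zipCode;
   if first.zipCode then nObs = 0;  /* reset at the start of each BY group */
   nObs + 1;                        /* sum statement: nObs is retained     */
   if last.zipCode then output;     /* write one row per BY group          */
   keep zipCode nObs;
run;

proc print data=zipCounts noobs;
   title 'Observations per Zip Code';
run;
```

With these sample rows, `zipCounts` has one row per zip code, with `nObs` equal to 3 for 33133 and 2 for 85730.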
Example 2: DATA step / PROC SORT
Explanation:
This example shows the result of processing the `zip` dataset with two BY variables, `State` and `City`. The printed output shows three BY groups. `State` and `City` are read first, so they are printed on the left for easy reading. The position of the BY variables within an observation does not affect how values are grouped and ordered.

After sorting, the observations for Arizona appear first, and within each `State` value the observations are ordered by `City`. Each BY group has a unique combination of `State` and `City` values. For example, the BY value of the first BY group is `AZ Tucson`, and the BY value of the second BY group is `FL Lakeland`.
/* Create the zip dataset. Street is read from columns 13-22, */
/* so the data lines are aligned to put Street in column 13.  */
DATA zip;
   INPUT State $ City $ Street $13-22 ZipCode;
   DATALINES;
FL Miami    Nervia St  33146
FL Miami    Rice St    33133
FL Miami    Corsica St 33146
FL Miami    Thomas Ave 33133
FL Miami    Surrey Dr  33133
FL Miami    Trade Ave  33133
FL Lakeland French Ave 33801
FL Lakeland Egret Dr   33809
AZ Tucson   Domenic Ln 85730
AZ Tucson   Gleeson Pl 85730
;

/* Sort by State, then by City within State. */
PROC SORT DATA=zip;
   BY State City;
RUN;

/* Read the sorted data in BY groups of State and City. */
DATA zip;
   SET zip;
   BY State City;
RUN;

PROC PRINT DATA=zip NOOBS;
   TITLE 'BY Groups with Multiple BY Variables: State City';
RUN;
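To make the group boundaries visible when there are two BY variables, the following sketch copies the automatic `FIRST.` and `LAST.` variables into ordinary variables so that they can be printed. The dataset names `places` and `flags` and the four flag variable names are illustrative assumptions, and the data is a shortened subset of the rows above.

```sas
/* Shortened sample data (subset of the rows above, list input only). */
data places;
   input State $ City $ ZipCode;
   datalines;
FL Miami 33146
FL Miami 33133
FL Lakeland 33801
AZ Tucson 85730
AZ Tucson 85730
;

proc sort data=places;
   by State City;
run;

/* FIRST. and LAST. variables are temporary and are not written to */
/* the output dataset, so copy them into regular variables.        */
data flags;
   set places;
   by State City;
   firstState = first.State;
   lastState  = last.State;
   firstCity  = first.City;
   lastCity   = last.City;
run;

proc print data=flags noobs;
   title 'FIRST. and LAST. Flags for State and City';
run;
```

Whenever `firstState` is 1, `firstCity` is also 1, because a new `State` value always starts a new `State`/`City` group.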
Example 3: DATA step / PROC FORMAT
Explanation:
This example uses the FORMAT procedure, the GROUPFORMAT option, and the FORMAT statement to create and print a simple dataset. The input dataset TEST is already in increasing order of `Score`, so no PROC SORT step is needed. The NEWTEST dataset is grouped by the formatted values of the `Score` variable (Low, Medium, High).

Key ideas:
- With the GROUPFORMAT option, the DATA step forms BY groups from the formatted values of the BY variables, in the same way that SAS procedures process BY groups with formatted values. This is useful when you define your own formats to display grouped data.
- Using the GROUPFORMAT option in the DATA step ensures that the BY groups you use to create a dataset match the BY groups in the PROC steps that report grouped and formatted data. GROUPFORMAT also determines how FIRST.variable and LAST.variable are assigned: they mark the first and last observation of each formatted-value group.
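OPTIONS LINESIZE=80 PAGESIZE=60;

/* Input data, already in increasing order of Score. */
DATA test;
   INPUT Name $ Score;
   DATALINES;
Jon 1
Anthony 3
Miguel 3
Joseph 4
Ian 5
Jan 6
;

/* Custom format that maps score ranges to categories. */
PROC FORMAT;
   VALUE Range 1-2='Low'
               3-4='Medium'
               5-6='High';
RUN;

/* Group by the formatted values of Score. GROUPFORMAT is placed */
/* after the BY variable, as in the documented BY syntax.        */
DATA newtest;
   SET test;
   BY Score GROUPFORMAT;
   FORMAT Score Range.;
RUN;

/* PROC PRINT groups by the formatted values of Score. */
PROC PRINT DATA=newtest;
   TITLE 'Score Categories';
   VAR Name Score;
   BY Score;
RUN;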
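The effect of GROUPFORMAT on `FIRST.Score` and `LAST.Score` can be sketched by counting the observations in each formatted group. The dataset name `groupCounts` and the variable `n` are illustrative assumptions. The format and data repeat those of the example above so that the block runs on its own.

```sas
proc format;
   value Range 1-2='Low'
               3-4='Medium'
               5-6='High';
run;

data test;
   input Name $ Score;
   datalines;
Jon 1
Anthony 3
Miguel 3
Joseph 4
Ian 5
Jan 6
;

/* groupCounts and n are hypothetical names used for illustration. */
data groupCounts;
   set test;
   by Score groupformat;          /* group on formatted values          */
   format Score Range.;
   if first.Score then n = 0;     /* start of a formatted-value group   */
   n + 1;                         /* sum statement: n is retained       */
   if last.Score then output;     /* one row per formatted-value group  */
   keep Score n;
run;

proc print data=groupCounts noobs;
   title 'Observations per Score Category';
run;
```

With GROUPFORMAT, the result has three rows (Low, Medium, High) with counts 1, 3, and 2. Without GROUPFORMAT, the unformatted values 1, 3, 4, 5, and 6 would form five BY groups instead.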
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved