Examples: Processing BY Groups in the DATA Step

BY-group analysis is a fundamental feature in SAS that segments and processes data based on the values of one or more specified variables. These examples show how the SORT procedure prepares the data and how the DATA step then uses the BY statement for grouping logic. The GROUPFORMAT option is also covered for cases where data is grouped by formatted values, which keeps the DATA step consistent with display procedures.

Data Analysis

Type : CREATION_INTERNE

All examples use internally generated data via the DATALINES statement, ensuring their autonomy.

1 Code Block

DATA STEP / PROC SORT / PROC PRINT Data

Explanation :
This example groups data by a single BY variable, `zipCode`, in a DATA step. The input dataset, `zip`, contains street names, cities, states, and zip codes. PROC SORT first arranges the observations by `zipCode`; specifying `zipCode` in the BY statement then lets the DATA step treat consecutive observations with the same zip code as one BY group.

/* Street uses column input, so each street name must start in column 20 */
DATA zip;
   INPUT zipCode State $ City $ Street $20-29;
   DATALINES;
85730 AZ Tucson    Domenic Ln
85730 AZ Tucson    Gleeson Pl
33133 FL Miami     Rice St
33133 FL Miami     Thomas Ave
33133 FL Miami     Surrey Dr
33133 FL Miami     Trade Ave
33146 FL Miami     Nervia St
33146 FL Miami     Corsica St
33801 FL Lakeland  French Ave
33809 FL Lakeland  Egret Dr
;

/* BY-group processing expects the data to be ordered by the BY variable */
PROC SORT DATA=zip;
   BY zipCode;
RUN;

/* The BY statement defines one BY group per distinct zipCode value */
DATA zip;
   SET zip;
   BY zipCode;
RUN;

PROC PRINT DATA=zip noobs;
   title 'BY-Group Using a Single Variable: ZipCode';
RUN;
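
The DATA step above declares the BY groups but does nothing with them. As a minimal sketch of what the grouping makes possible, the following counts observations per zip code using the automatic `FIRST.zipCode` and `LAST.zipCode` variables that a BY statement creates. The names `zipSample`, `zipCounts`, and `nObs` are hypothetical and used only for illustration; the street column is left out to keep the input simple.

```sas
/* Sketch only: zipSample, zipCounts, and nObs are illustrative names */
data zipSample;
   input zipCode State $ City $;
   datalines;
85730 AZ Tucson
85730 AZ Tucson
33133 FL Miami
33133 FL Miami
33133 FL Miami
33133 FL Miami
33146 FL Miami
33146 FL Miami
33801 FL Lakeland
33809 FL Lakeland
;

proc sort data=zipSample;
   by zipCode;
run;

data zipCounts;
   set zipSample;
   by zipCode;
   if first.zipCode then nObs = 0;   /* reset at the start of each BY group */
   nObs + 1;                         /* sum statement: nObs is retained across rows */
   if last.zipCode then output;      /* write one row per zip code */
   keep zipCode nObs;
run;

proc print data=zipCounts noobs;
   title 'Observations per ZipCode';
run;
```

With this sample data the result has five rows, one per zip code, with counts of 4 for 33133, 2 for 33146, 1 each for 33801 and 33809, and 2 for 85730.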

2 Code Block

DATA STEP / PROC SORT / PROC PRINT Data

Explanation :
This example demonstrates the results of processing the `zip` dataset with two BY variables, `State` and `City`. Observations are arranged so that those from Arizona appear first. Observations within each `State` value are arranged in order of `City` value. Each BY group has a unique combination of values for the `State` and `City` variables.

/* Street uses column input (columns 13-22); ZipCode follows from column 24 */
DATA zip;
   INPUT State $ City $ Street $13-22 ZipCode;
   DATALINES;
FL Miami    Nervia St  33146
FL Miami    Rice St    33133
FL Miami    Corsica St 33146
FL Miami    Thomas Ave 33133
FL Miami    Surrey Dr  33133
FL Miami    Trade Ave  33133
FL Lakeland French Ave 33801
FL Lakeland Egret Dr   33809
AZ Tucson   Domenic Ln 85730
AZ Tucson   Gleeson Pl 85730
;

/* Sort by State first, then by City within each State */
PROC SORT DATA=zip;
   BY State City;
RUN;

/* Each distinct State/City combination forms one BY group */
DATA zip;
   SET zip;
   BY State City;
RUN;

PROC PRINT DATA=zip noobs;
   title 'BY Groups with Multiple BY Variables: State City';
RUN;
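
To make the two-level grouping visible, the sketch below copies the automatic `FIRST.` and `LAST.` variables into ordinary columns so they appear in the printed output. The names `cities`, `cityFlags`, and the four flag variables are hypothetical, and the street and zip code columns are omitted for brevity.

```sas
/* Sketch only: cities, cityFlags, and the flag variables are illustrative names */
data cities;
   input State $ City $;
   datalines;
FL Miami
FL Miami
FL Lakeland
AZ Tucson
FL Miami
FL Lakeland
AZ Tucson
;

proc sort data=cities;
   by State City;
run;

data cityFlags;
   set cities;
   by State City;
   /* FIRST./LAST. variables are temporary, so copy them to keep them */
   firstState = first.State;
   lastState  = last.State;
   firstCity  = first.City;   /* 1 whenever State or City changes */
   lastCity   = last.City;
run;

proc print data=cityFlags noobs;
   title 'FIRST. and LAST. Flags for State and City';
run;
```

In the sorted output, both the first Lakeland row and the first Miami row have `firstCity` set to 1, but only the first Lakeland row also has `firstState` set to 1, because Florida begins there.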

3 Code Block

DATA STEP / PROC FORMAT / PROC PRINT Data

Explanation :
This example uses the `FORMAT` procedure, the `GROUPFORMAT` option, and the `FORMAT` statement to create and print a simple dataset. The input dataset `TEST` is already in ascending order of `Score`, so no sort is needed. The `NEWTEST` dataset is grouped by the formatted values of the `Score` variable (`Low`, `Medium`, `High`) rather than by its raw values. Processing BY groups in the DATA step with the `GROUPFORMAT` option matches the way SAS procedures form BY groups from formatted values, which is useful when custom formats define the groups to display.

options linesize=80 pagesize=60;

DATA test;
   INPUT Name $ Score;
   DATALINES;
Jon 1
Anthony 3
Miguel 3
Joseph 4
Ian 5
Jan 6
;

/* Map raw scores to three labeled ranges */
PROC FORMAT;
   value Range 1-2='Low'
               3-4='Medium'
               5-6='High';
RUN;

/* GROUPFORMAT forms BY groups from the formatted values of Score */
DATA newtest;
   SET test;
   BY groupformat Score;
   FORMAT Score Range.;
RUN;

PROC PRINT DATA=newtest;
   title 'Score Categories';
   var Name Score;
   BY Score;
RUN;
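
One way to confirm that `GROUPFORMAT` changes where BY groups begin and end is to count observations per formatted range inside the DATA step. The sketch below assumes the same six scores; the names `scores`, `rangeCounts`, `nInRange`, and the format `RangeDemo` are hypothetical.

```sas
/* Sketch only: RangeDemo, scores, rangeCounts, and nInRange are illustrative names */
proc format;
   value RangeDemo 1-2='Low'
                   3-4='Medium'
                   5-6='High';
run;

data scores;
   input Name $ Score;
   datalines;
Jon 1
Anthony 3
Miguel 3
Joseph 4
Ian 5
Jan 6
;

data rangeCounts;
   set scores;
   format Score RangeDemo.;
   by groupformat Score;
   /* FIRST.Score and LAST.Score now follow the formatted value, not the raw score */
   if first.Score then nInRange = 0;
   nInRange + 1;
   if last.Score then output;
   keep Score nInRange;
run;

proc print data=rangeCounts noobs;
   title 'Observations per Score Range';
run;
```

With this data the output has three rows: Low with 1 observation, Medium with 3, and High with 2. Without `GROUPFORMAT`, scores 3 and 4 would fall into separate BY groups even though both display as Medium.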

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.

Expert Advice

Michael

Viya Infrastructure Manager.

« Efficient data segmentation in SAS relies on two non-negotiable steps: sorting and grouping logic. To ensure your DATA step runs without errors, always precede it with a PROC SORT using the exact same variable order as your BY statement. When your analysis requires grouping based on custom ranges—such as turning numerical scores into "Low," "Medium," or "High" categories—use the GROUPFORMAT option. This ensures the DATA step recognizes groups based on the applied format's labels rather than the raw underlying data, maintaining perfect consistency between your data processing and your final reports. »