All examples generate their own data with the DATALINES statement, so each one can be run on its own.
1 Code Block
DATA STEP / PROC SORT / PROC PRINT Data
Explanation : This example demonstrates how to group data using a single BY variable, `zipCode`, in a DATA step. The input dataset, `zip`, contains street names, cities, states, and zip codes. PROC SORT first orders the observations by `zipCode`, because a DATA step BY statement expects its input to be sorted (or indexed) by the BY variables. Specifying `zipCode` in the BY statement then makes each run of observations with the same zip code a BY group.
data zip;
    /* Street uses column input, so each street name must start in column 20 */
    input zipCode State $ City $ Street $20-29;
    datalines;
85730 AZ Tucson    Domenic Ln
85730 AZ Tucson    Gleeson Pl
33133 FL Miami     Rice St
33133 FL Miami     Thomas Ave
33133 FL Miami     Surrey Dr
33133 FL Miami     Trade Ave
33146 FL Miami     Nervia St
33146 FL Miami     Corsica St
33801 FL Lakeland  French Ave
33809 FL Lakeland  Egret Dr
;

/* Sort so that observations with the same zip code are adjacent */
proc sort data=zip;
    by zipCode;
run;

/* The BY statement makes each zip code value a BY group */
data zip;
    set zip;
    by zipCode;
run;

proc print data=zip noobs;
    title 'BY-Group Using a Single Variable: ZipCode';
run;
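A common follow-up is to act on the group boundaries. The sketch below is not part of the original example: it assumes a reduced copy of the data (the names `zipDemo`, `zipCounts`, and `nStreets` are illustrative) and uses the automatic `FIRST.zipCode` and `LAST.zipCode` variables that the BY statement creates to count the observations in each BY group.

```sas
/* Illustrative sketch: zipDemo, zipCounts, and nStreets are hypothetical names. */
data zipDemo;
    /* List input only, so no column alignment is needed here */
    input zipCode State $ City $;
    datalines;
85730 AZ Tucson
85730 AZ Tucson
33133 FL Miami
33133 FL Miami
33133 FL Miami
33133 FL Miami
33146 FL Miami
33146 FL Miami
33801 FL Lakeland
33809 FL Lakeland
;

proc sort data=zipDemo;
    by zipCode;
run;

data zipCounts;
    set zipDemo;
    by zipCode;
    if first.zipCode then nStreets = 0;  /* reset at the start of each BY group */
    nStreets + 1;                        /* sum statement: value is retained across observations */
    if last.zipCode then output;         /* write one row per zip code */
    keep zipCode nStreets;
run;

proc print data=zipCounts noobs;
    title 'Observations per ZipCode';
run;
```

With the ten observations above, the result should contain five rows: 33133 with 4, 33146 with 2, 33801 and 33809 with 1 each, and 85730 with 2.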
2 Code Block
DATA STEP / PROC SORT / PROC PRINT Data
Explanation : This example demonstrates BY-group processing of the `zip` dataset with two BY variables, `State` and `City`. After sorting, observations from Arizona appear first, and within each `State` value the observations are ordered by `City`. Each BY group has a unique combination of `State` and `City` values; here there are three: AZ/Tucson, FL/Lakeland, and FL/Miami.
data zip;
    /* Street uses column input, so each street name must occupy columns 13-22 */
    input State $ City $ Street $13-22 ZipCode;
    datalines;
FL Miami    Nervia St  33146
FL Miami    Rice St    33133
FL Miami    Corsica St 33146
FL Miami    Thomas Ave 33133
FL Miami    Surrey Dr  33133
FL Miami    Trade Ave  33133
FL Lakeland French Ave 33801
FL Lakeland Egret Dr   33809
AZ Tucson   Domenic Ln 85730
AZ Tucson   Gleeson Pl 85730
;

/* Sort by State first, then by City within each State */
proc sort data=zip;
    by State City;
run;

/* Each distinct State/City combination forms one BY group */
data zip;
    set zip;
    by State City;
run;

proc print data=zip noobs;
    title 'BY Groups with Multiple BY Variables: State City';
run;
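With two BY variables, SAS creates a `FIRST.` and `LAST.` pair for each of them. These automatic variables are not written to the output dataset, so the sketch below copies them into ordinary variables to make them visible. It is an illustration only, under the assumption of a reduced dataset; `cityDemo`, `cityFlags`, and the four flag variables are hypothetical names.

```sas
/* Illustrative sketch: cityDemo, cityFlags, and the flag variables are hypothetical names. */
data cityDemo;
    input State $ City $;
    datalines;
FL Miami
FL Lakeland
AZ Tucson
FL Miami
AZ Tucson
FL Lakeland
;

proc sort data=cityDemo;
    by State City;
run;

data cityFlags;
    set cityDemo;
    by State City;
    /* FIRST. and LAST. variables are temporary, so copy them to keep them */
    firstState = first.State;
    lastState  = last.State;
    firstCity  = first.City;
    lastCity   = last.City;
run;

proc print data=cityFlags noobs;
    title 'FIRST. and LAST. Flags for State and City';
run;
```

In the printed result, `firstCity` is 1 on the first observation of every State/City combination, including whenever `State` changes, while `firstState` is 1 only on the first observation of each state.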
3 Code Block
DATA STEP / PROC FORMAT / PROC PRINT Data
Explanation : This example uses the `FORMAT` procedure, the `GROUPFORMAT` option of the BY statement, and the `FORMAT` statement to create and print a simple dataset. The input dataset `TEST` is entered in increasing order of `Score`. The `NEWTEST` dataset is grouped by the formatted values of `Score` (`Low`, `Medium`, `High`) rather than by its raw values. With `GROUPFORMAT`, the DATA step forms BY groups the same way SAS procedures do when a BY variable has a format, which is useful when a custom format defines the groups you want to display.
options linesize=80 pagesize=60;

data test;
    input name $ Score;
    datalines;
Jon 1
Anthony 3
Miguel 3
Joseph 4
Ian 5
Jan 6
;

/* Map raw scores to three labeled ranges */
proc format;
    value Range 1-2='Low'
                3-4='Medium'
                5-6='High';
run;

/* GROUPFORMAT makes the DATA step form BY groups from the formatted values */
data newtest;
    set test;
    by groupformat Score;
    format Score Range.;
run;

proc print data=newtest;
    title 'Score Categories';
    var Name Score;
    by Score;
run;
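The effect of `GROUPFORMAT` is easiest to see when the DATA step does something at the group boundaries. The following sketch is an addition, not part of the original example: it reuses the same scores and format under hypothetical names (`scoreDemo`, `rangeCounts`, `nPeople`) and counts the observations in each formatted category.

```sas
/* Illustrative sketch: scoreDemo, rangeCounts, and nPeople are hypothetical names. */
proc format;
    value Range 1-2='Low'
                3-4='Medium'
                5-6='High';
run;

data scoreDemo;
    input Name $ Score;
    datalines;
Jon 1
Anthony 3
Miguel 3
Joseph 4
Ian 5
Jan 6
;

data rangeCounts;
    set scoreDemo;
    by groupformat Score;             /* FIRST./LAST. follow the formatted values */
    format Score Range.;
    if first.Score then nPeople = 0;  /* reset at the start of each category */
    nPeople + 1;
    if last.Score then output;        /* one row per category */
    keep Score nPeople;
run;

proc print data=rangeCounts noobs;
    title 'People per Score Category';
run;
```

This should print three rows: Low with 1, Medium with 3, and High with 2. Without `GROUPFORMAT`, the BY groups would follow the raw values 1, 3, 4, 5, and 6, giving five rows instead.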
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
« Efficient data segmentation in SAS relies on two steps: sorting and grouping logic. Unless the data is already sorted or indexed by the BY variables, precede the DATA step with a PROC SORT that lists the same variables in the same order as the BY statement; otherwise the step stops with an error saying the BY variables are not properly sorted. When your analysis requires grouping based on custom ranges, such as turning numerical scores into "Low," "Medium," or "High" categories, use the GROUPFORMAT option. The DATA step then forms groups from the formatted values rather than the raw underlying data, so the groups match those that procedures such as PROC PRINT display in your reports. »
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.