
Examples: Processing BY Groups with FIRST. and LAST. DATA Step Variables

This page examines how SAS uses the temporary variables FIRST.<variable> and LAST.<variable> to mark the beginning and end of each BY group. SAS creates one pair for every variable named in the BY statement: FIRST.<variable> is 1 on the first observation of a group and 0 otherwise, and LAST.<variable> is 1 on the last observation of a group and 0 otherwise. Because group boundaries are detected by comparing consecutive observations, the values depend on the order of the observations and on the order of the variables in the BY statement. The examples use these variables in the DATA step to identify group boundaries and act on them, for example to initialize a counter or write an aggregated result. The variables are temporary and are not written to the output data set. The examples also print the automatic variable _N_, which counts DATA step iterations.


The examples use generated data (datalines).

1 Code Block
DATA STEP / PROC SORT Data
Explanation :
This example shows how SAS uses FIRST.<variable> and LAST.<variable> to mark the beginning and end of BY groups. The data is sorted by State, City, and ZipCode, and the DATA step writes the value of each temporary variable to the log with PUT. These variables can be referenced in the DATA step but are not written to the output data set. The automatic variable _N_ counts DATA step iterations.
DATA zip;
 /* Street is read from columns 20-29, so the data lines are column-aligned */
 INPUT State $ City $ ZipCode Street $20-29;
 DATALINES;
FL Miami    33133  Rice St
FL Miami    33133  Thomas Ave
FL Miami    33133  Surrey Dr
FL Miami    33133  Trade Ave
FL Miami    33146  Nervia St
FL Miami    33146  Corsica St
FL Lakeland 33801  French Ave
FL Lakeland 33809  Egret Dr
AZ Tucson   85730  Domenic Ln
AZ Tucson   85730  Gleeson Pl
;

PROC SORT DATA=zip;
 BY State City ZipCode;
RUN;

DATA zip2;
 SET zip;
 BY State City ZipCode;
 /* Write the BY values and the boundary flags for each observation to the log */
 put _n_= City State ZipCode
 first.city= last.city=
 first.state= last.state=
 first.ZipCode= last.ZipCode= ;
RUN;
2 Code Block
DATA STEP / PROC SORT Data
Explanation :
This example sorts the same data by City, State, and ZipCode rather than by State, City, and ZipCode. Each BY variable still gets its own pair of temporary variables (FIRST.City, LAST.City, FIRST.State, LAST.State, FIRST.ZipCode, and LAST.ZipCode), but their values now reflect the new grouping order. As before, these variables can be referenced in the DATA step but are not written to the output data set.
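DATA zip;
 /* Street is read from columns 20-29, so the data lines are column-aligned */
 INPUT State $ City $ ZipCode Street $20-29;
 DATALINES;
FL Miami    33133  Rice St
FL Miami    33133  Thomas Ave
FL Miami    33133  Surrey Dr
FL Miami    33133  Trade Ave
FL Miami    33146  Nervia St
FL Miami    33146  Corsica St
FL Lakeland 33801  French Ave
FL Lakeland 33809  Egret Dr
AZ Tucson   85730  Domenic Ln
AZ Tucson   85730  Gleeson Pl
;

PROC SORT DATA=zip;
 BY City State ZipCode;
RUN;

DATA zip2;
 SET zip;
 /* The listing breaks off after SET; the statements below are assumed to follow
    example 1, with the BY order matching the PROC SORT above */
 BY City State ZipCode;
 put _n_= City State ZipCode
 first.city= last.city=
 first.state= last.state=
 first.ZipCode= last.ZipCode= ;
RUN;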
3 Code Block
DATA STEP Data
Explanation :
This example shows that a change in the value of an earlier BY variable sets FIRST.<variable> to 1 for every later BY variable, even when the later variable's own value has not changed. The values of FIRST.<variable> and LAST.<variable> therefore depend on both the order of the observations and the order of the variables in the BY statement. With BY x y z, observation 3 has FIRST.Y=1 because blueberry is a new value for Y, and observation 4 has FIRST.Y=1 and FIRST.Z=1 because X changes to apricot, even though Y and Z keep the values blueberry and citron. With BY y x z, the new value of Y in observation 3 sets FIRST.X to 1 even though X is still apple.
DATA fruit;
 /* y is read from columns 10-18 and z from columns 21-29, so the data lines are column-aligned */
 INPUT x $ y $ 10-18 z $ 21-29;
 DATALINES;
apple    banana     coconut
apple    banana     coconut
apple    blueberry  citron
apricot  blueberry  citron
;

/* First pass: x is the outermost BY variable */
DATA _null_;
 SET fruit;
 BY x y z;
 IF _N_=1 THEN put 'Grouped by X Y Z';
 put _N_= x= first.x= last.x= first.y= last.y= first.z= last.z= ;
RUN;

/* Second pass: y is the outermost BY variable */
DATA _null_;
 SET fruit;
 BY y x z;
 IF _N_=1 THEN put 'Grouped by Y X Z';
 put _N_= first.y= last.y= first.x= last.x= first.z= last.z= ;
RUN;
4 Code Block
DATA STEP / PROC SORT Data
Explanation :
This example calculates the annual payroll for each department. Salaried wage rates are multiplied by 12 and hourly rates by 2000 to obtain the yearly wage. An IF-THEN statement resets Payroll to 0 when FIRST.Department is 1, a sum statement adds each employee's yearly wage to the running total, and the subsetting IF on LAST.Department writes one observation per department after the last observation of that BY group has been processed.
DATA salaries;
 INPUT Department $ Name $ WageCategory $ WageRate;
 DATALINES;
BAD Carol Salaried 20000
BAD Elizabeth Salaried 5000
BAD Linda Salaried 7000
BAD Thomas Salaried 9000
BAD Lynne Hourly 230
DDG Jason Hourly 200
DDG Paul Salaried 4000
PPD Kevin Salaried 5500
PPD Amber Hourly 150
PPD Tina Salaried 13000
STD Helen Hourly 200
STD Jim Salaried 8000
;

PROC SORT DATA=salaries out=temp;
 BY Department;
RUN;

DATA budget (keep=Department Payroll);
 SET temp;
 BY Department;
 /* Convert each wage rate to a yearly amount */
 IF WageCategory='Salaried' THEN YearlyWage=WageRate*12;
 ELSE IF WageCategory='Hourly' THEN YearlyWage=WageRate*2000;

 /* Reset the total at the start of each department */
 IF first.Department THEN Payroll=0;
 /* Sum statement: Payroll is retained across iterations */
 Payroll+YearlyWage;
 /* Subsetting IF: keep only the last observation of each department */
 IF last.Department;
RUN;

PROC PRINT DATA=budget;
 FORMAT Payroll dollar10.;
 title 'Annual Payroll by Department';
RUN;
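As a further illustration of the reset-and-accumulate pattern used above, the following sketch counts observations per group instead of summing a value. The data set names (visits, visits_sorted, clinic_counts) and the variables (Clinic, Patient, NumVisits) are hypothetical and are not part of the original examples.

```sas
/* Hypothetical sample data, for illustration only */
DATA visits;
 INPUT Clinic $ Patient $;
 DATALINES;
North Ann
North Bob
South Cid
North Dee
South Eve
;

/* BY-group processing expects the data to be ordered by the BY variable */
PROC SORT DATA=visits out=visits_sorted;
 BY Clinic;
RUN;

DATA clinic_counts (keep=Clinic NumVisits);
 SET visits_sorted;
 BY Clinic;
 IF first.Clinic THEN NumVisits=0;  /* reset the counter at the start of each group */
 NumVisits+1;                       /* sum statement: value is retained across iterations */
 IF last.Clinic THEN OUTPUT;        /* write one observation per clinic */
RUN;

PROC PRINT DATA=clinic_counts;
RUN;
```

With this sample data, clinic_counts should contain one observation per clinic: North with 3 visits and South with 2.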
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved


Expert Advice
Expert
Michael
Viya Infrastructure Manager.
« In the SAS DATA step, the automatic variables FIRST.variable and LAST.variable are the primary mechanism for high-performance data summarization. These flags allow you to detect boundaries between groups (like Departments or ZipCodes) without needing complex look-ahead logic. By identifying the exact moment a group starts or ends, you can perform conditional calculations—such as resetting counters, calculating running totals, or outputting unique summary records. »
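The "unique summary records" use mentioned in the quote can be sketched with a subsetting IF on FIRST.<variable>, which keeps only the first observation of each BY group. This is a minimal sketch under assumed names: orders, orders_sorted, first_order, Customer, OrderDate, and Amount are all hypothetical.

```sas
/* Hypothetical sample data: several orders per customer */
DATA orders;
 INPUT Customer $ OrderDate :date9. Amount;
 FORMAT OrderDate date9.;
 DATALINES;
Ann 05JAN2024 120
Bob 02JAN2024 80
Ann 03JAN2024 45
Bob 09JAN2024 60
;

/* Sort by customer, then by date, so the earliest order comes first in each group */
PROC SORT DATA=orders out=orders_sorted;
 BY Customer OrderDate;
RUN;

DATA first_order;
 SET orders_sorted;
 BY Customer;
 IF first.Customer;  /* subsetting IF: keep one observation per customer */
RUN;

PROC PRINT DATA=first_order;
RUN;
```

With this sample data, first_order should hold the earliest order for each customer: 03JAN2024 for Ann and 02JAN2024 for Bob. Replacing first.Customer with last.Customer would keep the most recent order instead.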