During data exploration or debugging, it is common to want to isolate only the "end" of a table (dataset). Unlike the OBS=10 option which gives us the beginning, retrieving the last 10 observations from a table containing $N$ rows requires a more thoughtful approach, especially if the table is large.
Note : The Naive Approach (Descending Sort)
The first idea that often comes to mind is to sort the table in reverse chronological order and take the first few rows.
proc sort data=MA_TABLE out=sorted_table;
by descending date_variable;
run;
data last_ten;
set sorted_table (obs=10);
run;
1
PROC SORTDATA=MA_TABLE out=sorted_table;
2
BY descending date_variable;
3
RUN;
4
5
DATA last_ten;
6
SET sorted_table (obs=10);
7
RUN;
Verdict: To be avoided on large tables.
Why? Sorting a table with millions of rows consumes a huge amount of CPU and I/O resources just to read 10 lines. It's inefficient.
Note : The Arithmetic Approach (NOBS and IF)
A much cleaner method is to use the NOBS option in the SET statement. This option stores the total number of observations in a temporary variable at compile time, without having to read the table.
data last_ten;
set MA_TABLE nobs=total_obs;
/* On ne garde que si le numéro de ligne (_N_) est dans les 10 derniers */
if _N_ > (total_obs - 10);
run;
1
DATA last_ten;
2
SET MA_TABLE nobs=total_obs;
3
/* On ne garde que si le numéro de ligne (_N_) est dans les 10 derniers */
Note : The High-Performance Approach (Direct Access with POINT=)
This is the optimal solution recommended by experts. The goal is to use random access (Direct Access) to jump directly to the end of the file without reading the beginning.
We combine NOBS (to know the end) and POINT (to go to a specific line).
data last_ten;
/* Boucle de la ligne N-9 jusqu'à la ligne N */
do i = (total_obs - 9) to total_obs;
set MA_TABLE nobs=total_obs point=i;
output;
end;
stop; /* IMPORTANT : stop évite une boucle infinie avec l'instruction POINT */
run;
1
DATA last_ten;
2
/* Boucle de la ligne N-9 jusqu'à la ligne N */
3
DO i = (total_obs - 9) to total_obs;
4
SET MA_TABLE nobs=total_obs point=i;
5
OUTPUT;
6
END;
7
stop; /* IMPORTANT : stop évite une boucle infinie avec l'instruction POINT */
Technical note: The STOP statement is mandatory because the SET statement with POINT= never encounters an End-of-File marker.
Note : The Macro Approach (Dynamic FIRSTOBS)
An interesting alternative is to calculate the starting point in a previous step, then use the FIRSTOBS option.
/* Étape 1 : Calculer le point de départ */
data _null_;
set MA_TABLE nobs=n;
start_point = n - 9;
call symput('start_obs', start_point);
run;
/* Étape 2 : Lire à partir de ce point */
data last_ten;
set MA_TABLE (firstobs=&start_obs);
run;
1
/* Étape 1 : Calculer le point de départ */
2
DATA _null_;
3
SET MA_TABLE nobs=n;
4
start_point = n - 9;
5
call symput('start_obs', start_point);
6
RUN;
7
8
/* Étape 2 : Lire à partir de ce point */
9
DATA last_ten;
10
SET MA_TABLE (firstobs=&start_obs);
11
RUN;
Verdict: Functional but verbose. Method 3 remains superior because it is done in a single Data step.
Points of Attention and Limitations
To conclude, here are some technical nuances raised during the discussion:
The codes and examples provided on WeAreCAS.eu are for educational purposes. It is imperative not to blindly copy-paste them into your production environments. The best approach is to understand the logic before applying it. We strongly recommend testing these scripts in a test environment (Sandbox/Dev). WeAreCAS accepts no responsibility for any impact or data loss on your systems.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.
This site uses technical and analytical cookies to improve your experience.
Read more.