During data exploration or debugging, it is common to want to isolate only the "end" of a table (dataset). Unlike the OBS=10 option which gives us the beginning, retrieving the last 10 observations from a table containing $N$ rows requires a more thoughtful approach, especially if the table is large.
Note : The Naive Approach (Descending Sort)
The first idea that often comes to mind is to sort the table in reverse chronological order and take the first few rows.
proc sort data=MA_TABLE out=sorted_table;
by descending date_variable;
run;
data last_ten;
set sorted_table (obs=10);
run;
1
PROC SORTDATA=MA_TABLE out=sorted_table;
2
BY descending date_variable;
3
RUN;
4
5
DATA last_ten;
6
SET sorted_table (obs=10);
7
RUN;
Verdict: To be avoided on large tables.
Why? Sorting a table with millions of rows consumes a huge amount of CPU and I/O resources just to read 10 lines. It's inefficient.
Note : The Arithmetic Approach (NOBS and IF)
A much cleaner method is to use the NOBS option in the SET statement. This option stores the total number of observations in a temporary variable at compile time, without having to read the table.
data last_ten;
set MA_TABLE nobs=total_obs;
/* On ne garde que si le numéro de ligne (_N_) est dans les 10 derniers */
if _N_ > (total_obs - 10);
run;
1
DATA last_ten;
2
SET MA_TABLE nobs=total_obs;
3
/* On ne garde que si le numéro de ligne (_N_) est dans les 10 derniers */
Note : The High-Performance Approach (Direct Access with POINT=)
This is the optimal solution recommended by experts. The goal is to use random access (Direct Access) to jump directly to the end of the file without reading the beginning.
We combine NOBS (to know the end) and POINT (to go to a specific line).
data last_ten;
/* Boucle de la ligne N-9 jusqu'à la ligne N */
do i = (total_obs - 9) to total_obs;
set MA_TABLE nobs=total_obs point=i;
output;
end;
stop; /* IMPORTANT : stop évite une boucle infinie avec l'instruction POINT */
run;
1
DATA last_ten;
2
/* Boucle de la ligne N-9 jusqu'à la ligne N */
3
DO i = (total_obs - 9) to total_obs;
4
SET MA_TABLE nobs=total_obs point=i;
5
OUTPUT;
6
END;
7
stop; /* IMPORTANT : stop évite une boucle infinie avec l'instruction POINT */
Technical note: The STOP statement is mandatory because the SET statement with POINT= never encounters an End-of-File marker.
Note : The Macro Approach (Dynamic FIRSTOBS)
An interesting alternative is to calculate the starting point in a previous step, then use the FIRSTOBS option.
/* Étape 1 : Calculer le point de départ */
data _null_;
set MA_TABLE nobs=n;
start_point = n - 9;
call symput('start_obs', start_point);
run;
/* Étape 2 : Lire à partir de ce point */
data last_ten;
set MA_TABLE (firstobs=&start_obs);
run;
1
/* Étape 1 : Calculer le point de départ */
2
DATA _null_;
3
SET MA_TABLE nobs=n;
4
start_point = n - 9;
5
call symput('start_obs', start_point);
6
RUN;
7
8
/* Étape 2 : Lire à partir de ce point */
9
DATA last_ten;
10
SET MA_TABLE (firstobs=&start_obs);
11
RUN;
Verdict: Functional but verbose. Method 3 remains superior because it is done in a single Data step.
Points of Attention and Limitations
To conclude, here are some technical nuances raised during the discussion:
Die auf WeAreCAS.eu bereitgestellten Codes und Beispiele dienen Lehrzwecken. Es ist zwingend erforderlich, sie nicht blind in Ihre Produktionsumgebungen zu kopieren. Der beste Ansatz besteht darin, die Logik zu verstehen, bevor sie angewendet wird. Wir empfehlen dringend, diese Skripte in einer Testumgebung (Sandbox/Dev) zu testen. WeAreCAS übernimmt keine Verantwortung für mögliche Auswirkungen oder Datenverluste auf Ihren Systemen.
SAS und alle anderen Produkt- oder Dienstleistungsnamen von SAS Institute Inc. sind eingetragene Marken oder Marken von SAS Institute Inc. in den USA und anderen Ländern. ® zeigt die Registrierung in den USA an. WeAreCAS ist eine unabhängige Community-Site und nicht mit SAS Institute Inc. verbunden.
Diese Website verwendet technische und analytische Cookies, um Ihre Erfahrung zu verbessern.
Mehr erfahren.