During data exploration or debugging, it is common to want to isolate only the "end" of a table (dataset). Unlike the OBS=10 option which gives us the beginning, retrieving the last 10 observations from a table containing $N$ rows requires a more thoughtful approach, especially if the table is large.
Note : The Naive Approach (Descending Sort)
The first idea that often comes to mind is to sort the table in reverse chronological order and take the first few rows.
proc sort data=MA_TABLE out=sorted_table;
by descending date_variable;
run;
data last_ten;
set sorted_table (obs=10);
run;
1
PROC SORTDATA=MA_TABLE out=sorted_table;
2
BY descending date_variable;
3
RUN;
4
5
DATA last_ten;
6
SET sorted_table (obs=10);
7
RUN;
Verdict: To be avoided on large tables.
Why? Sorting a table with millions of rows consumes a huge amount of CPU and I/O resources just to read 10 lines. It's inefficient.
Note : The Arithmetic Approach (NOBS and IF)
A much cleaner method is to use the NOBS option in the SET statement. This option stores the total number of observations in a temporary variable at compile time, without having to read the table.
data last_ten;
set MA_TABLE nobs=total_obs;
/* On ne garde que si le numéro de ligne (_N_) est dans les 10 derniers */
if _N_ > (total_obs - 10);
run;
1
DATA last_ten;
2
SET MA_TABLE nobs=total_obs;
3
/* On ne garde que si le numéro de ligne (_N_) est dans les 10 derniers */
Note : The High-Performance Approach (Direct Access with POINT=)
This is the optimal solution recommended by experts. The goal is to use random access (Direct Access) to jump directly to the end of the file without reading the beginning.
We combine NOBS (to know the end) and POINT (to go to a specific line).
data last_ten;
/* Boucle de la ligne N-9 jusqu'à la ligne N */
do i = (total_obs - 9) to total_obs;
set MA_TABLE nobs=total_obs point=i;
output;
end;
stop; /* IMPORTANT : stop évite une boucle infinie avec l'instruction POINT */
run;
1
DATA last_ten;
2
/* Boucle de la ligne N-9 jusqu'à la ligne N */
3
DO i = (total_obs - 9) to total_obs;
4
SET MA_TABLE nobs=total_obs point=i;
5
OUTPUT;
6
END;
7
stop; /* IMPORTANT : stop évite une boucle infinie avec l'instruction POINT */
Technical note: The STOP statement is mandatory because the SET statement with POINT= never encounters an End-of-File marker.
Note : The Macro Approach (Dynamic FIRSTOBS)
An interesting alternative is to calculate the starting point in a previous step, then use the FIRSTOBS option.
/* Étape 1 : Calculer le point de départ */
data _null_;
set MA_TABLE nobs=n;
start_point = n - 9;
call symput('start_obs', start_point);
run;
/* Étape 2 : Lire à partir de ce point */
data last_ten;
set MA_TABLE (firstobs=&start_obs);
run;
1
/* Étape 1 : Calculer le point de départ */
2
DATA _null_;
3
SET MA_TABLE nobs=n;
4
start_point = n - 9;
5
call symput('start_obs', start_point);
6
RUN;
7
8
/* Étape 2 : Lire à partir de ce point */
9
DATA last_ten;
10
SET MA_TABLE (firstobs=&start_obs);
11
RUN;
Verdict: Functional but verbose. Method 3 remains superior because it is done in a single Data step.
Points of Attention and Limitations
To conclude, here are some technical nuances raised during the discussion:
Los códigos y ejemplos proporcionados en WeAreCAS.eu son con fines educativos. Es imperativo no copiarlos y pegarlos ciegamente en sus entornos de producción. El mejor enfoque es comprender la lógica antes de aplicarla. Recomendamos encarecidamente probar estos scripts en un entorno de prueba (Sandbox/Dev). WeAreCAS no acepta ninguna responsabilidad por cualquier impacto o pérdida de datos en sus sistemas.
SAS y todos los demás nombres de productos o servicios de SAS Institute Inc. son marcas registradas o marcas comerciales de SAS Institute Inc. en los EE. UU. y otros países. ® indica registro en los EE. UU. WeAreCAS es un sitio comunitario independiente y no está afiliado a SAS Institute Inc.
Este sitio utiliza cookies técnicas y analíticas para mejorar su experiencia.
Saber más.