Fedsql

SAS and SQL: How to Replicate OVER PARTITION BY in SAS

Simon 22 vistas

For data analysts coming from the standard SQL world (Oracle, SQL Server, PostgreSQL), one of the first frustrations in SAS© is the apparent lack of support for Window Functions.

Common syntaxes like ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) or SUM(val) OVER (PARTITION BY group) cause immediate syntax errors in SAS©'s PROC SQL.

This article explains why this limitation exists and, more importantly, how to work around it effectively using the native power of the SAS© language.

The Observation: PROC SQL and the ANSI Standard

The standard PROC SQL procedure in SAS© is based on an old version of the ANSI standard (mainly ANSI 92). Window functions (introduced later in the SQL:2003 standard) are not supported by the native SQL engine of Base SAS©.

There's no point in looking for a hidden option: PROC SQL will not understand the OVER clause. However, SAS© offers several often more performant alternatives to achieve the same result.

Alternative 1: The Native SAS© Approach (DATA Step)

This is the most "SAS©" method. Instead of thinking in terms of set-based queries, we use the sequential processing of the DATA Step combined with groups (BY groups). This is the direct equivalent of PARTITION BY.

The Matching Logic

  • SQL: PARTITION BY variable

  • SAS©: BY variable (requires a preliminary sort)

  • SQL: ORDER BY date

  • SAS©: PROC SORT by date

Example: Replicating a ROW_NUMBER()

Let's imagine we want to number rows for each client, sorted by date.

In standard SQL:

1SELECT customer_id,
2ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY trans_date) as row_num
3FROM transactions;
4 

In SAS© (DATA Step): This method uses the automatic variables FIRST. to detect the change of group.

1/* 1. Le tri est obligatoire pour utiliser BY */
2PROC SORT DATA=transactions;
3 BY customer_id trans_date;
4RUN;
5 
6/* 2. Le Data Step */
7DATA want;
8 SET transactions;
9 BY customer_id;
10
11 /* Si c'est la première ligne du client, on remet le compteur à 1 */
12 IF first.customer_id THEN row_num = 1;
13 /* Sinon, on incrémente */
14 ELSE row_num + 1;
15RUN;

Example: Replicating a Cumulative SUM

In standard SQL: SUM(amount) OVER (PARTITION BY customer_id ORDER BY date)

In SAS© (DATA Step):

1DATA want;
2 SET transactions;
3 BY customer_id;
4
5 /* RETAIN garde la valeur de la ligne précédente */
6 retain running_total;
7
8 IF first.customer_id THEN running_total = amount;
9 ELSE running_total = running_total + amount;
10RUN;

Alternative 2: SQL Pass-Through (Delegation)

If your data already resides in a powerful database (Oracle, SQL Server, Teradata) and you are using SAS©/ACCESS, it is often more efficient to let the database do the work.

This technique, called Explicit Pass-Through, sends your native SQL code directly to the database server. SAS© does not interpret it; it only receives the result.

1PROC SQL;
2 connect to oracle (user=... pass=... path=...);
3
4 create TABLE want as
5 select * from connection to oracle
6 (
7 /* Ici, vous pouvez utiliser toute la syntaxe Oracle,
8 y compris les fonctions de fenêtrage */
9 SELECT
10 customer_id,
11 trans_date,
12 RANK() OVER (PARTITION BY customer_id ORDER BY trans_date DESC) as rang
13 FROM
14 schema.transactions
15 );
16
17 disconnect from oracle;
18QUIT;
Advantage: You use the SQL syntax you are already familiar with and leverage the power of the RDBMS server.

Alternative 3: Specific Procedures

For some common window functions like RANK or deciles (NTILE), SAS© has dedicated, optimized procedures that avoid writing complex code.

To replace RANK() OVER (PARTITION BY ...):

1PROC RANK DATA=transactions out=want descending;
2 BY customer_id; /* Équivalent du PARTITION BY */
3 var amount; /* La variable sur laquelle on classe */
4 ranks my_rank; /* Le nom de la nouvelle colonne */
5RUN;

What About PROC FEDSQL?

SAS© introduced a more modern procedure called PROC FEDSQL, which aims for stricter ANSI SQL compliance (SQL:1999 and beyond). Although FedSQL is much more advanced than PROC SQL, full support for window functions is highly dependent on the environment (SAS© 9 vs. SAS© Viya/CAS) and can sometimes be limited compared to native databases. It is recommended to check the specific documentation for your version of SAS©, but for classic windowing needs, the Data Step often remains the best path in terms of local performance.

SQL NeedRecommended SAS© Solution
Local Data (SAS© datasets)Use the DATA Step with BY, FIRST., LAST., and RETAIN. It's fast and flexible.
Data in a DB (Oracle, etc.)Use SQL Pass-Through (CONNECT TO) to execute native SQL.
Rank / Percentile CalculationUse PROC RANK or PROC UNIVARIATE.
Simple QueriesStick with standard PROC SQL (without window functions).