SAS9

SAS Optimization: LIBNAME vs. SQL Pass-Through for Large Data Volumes

Simon 14 Aufrufe

When you inherit SAS© codes that perform complex queries (numerous joins, tables with 40 to 100 million rows), performance quickly becomes the main challenge. A recurring question then arises: should you use the easy method of the LIBNAME engine or hard-code it via SQL Pass-Through?

This article explores how to diagnose bottlenecks and choose the best strategy for your massive joins.

The Dilemma: Implicit vs. Explicit

In SAS©, there are two main ways to query an external database (Oracle, Teradata, SQL Server, etc.):

  1. LIBNAME (Implicit Pass-Through): You assign a library (libname mydb oracle ...) and use the tables as if they were local. SAS© attempts to translate your SAS© code into native SQL to send to the database.

  2. SQL Pass-Through (Explicit Pass-Through): You use CONNECT TO in a PROC SQL. You write the SQL in the database's language. SAS© simply "passes the plate" and retrieves the final result.

The performance problem often occurs with LIBNAME when SAS© fails to translate the full complexity of your query (especially the 8 joins). In this case, SAS© downloads the entire tables (the 40 million rows) to the local SAS© server to perform the joins itself, which causes processing times to explode and saturates the network.

Step 1: Diagnose with SASTRACE

Before rewriting everything, you need to see what's happening "under the hood." Is SAS© delegating the work to the database or is it bringing the data back?

Enable the trace options to see the SQL generated by the LIBNAME engine:

1options sastrace=',,,d' sastraceloc=SASLOG;
If you see a complex SQL query with your JOINs and WHERE clauses in the log: The LIBNAME engine is working well (Implicit Pass-Through successful). The problem may be missing indexes on the database side.

If you see simple SELECT * FROM table statements: SAS© is bringing back the raw data to process the join locally. This is where you need to take action.

Step 2: Optimization Strategies

There is no single answer, but here are the approaches to test, from simplest to most complex.

Strategy A: Force Implicit Pass-Through (LIBNAME)

This is the preferred method because it keeps your code "clean" and portable (independent of the database type). If SASTRACE shows that the work is not being delegated, try simplifying the SAS© syntax or using functions compatible with your DBMS to help SAS© translate the query.

Strategy B: Switch to Explicit Pass-Through

If the LIBNAME engine fails to optimize, write the query directly in native SQL via CONNECT TO.

  • Advantage: You guarantee that the heavy lifting (joins, sorts) is done by the database server, which is often more powerful than the SAS© server.

  • Disadvantage: You lose portability (the code becomes specific to Oracle/SQL Server/etc.).

1PROC SQL;
2 connect to oracle (user=... password=... path=...);
3 create TABLE want as
4 select * from connection to oracle
5 (
6 SELECT A.id, B.value
7 FROM tableA A
8 INNER JOIN tableB B ON A.id = B.id
9 /* Ce code est exécuté par Oracle, pas SAS */
10 );
11 disconnect from oracle;
12QUIT;

Strategy C: The Hybrid Approach (Native SAS©)

Counter-intuitively, sometimes databases are less performant than SAS© for certain operations, or the network is too slow to return the result of a complex join.

A solid alternative is to:

  1. Extract only the necessary columns from each table into temporary SAS© tables (Subsetting).

  2. Use PROC SORT and DATA STEP MERGE locally.

The SAS© sort engine is extremely powerful. On massive volumes, a well-prepared MERGE can sometimes beat a poorly indexed SQL query on the server side.

Step 3: Measure Performance

Don't trust your watch. Use the built-in benchmarking tools to compare your tests (LIBNAME vs. Pass-Through vs. Data Step):

  1. options FULLSTIMER;: Displays the CPU time, real time, and memory used for each step in the log.

  2. PROC SQL _method;: If you stick to pure SAS©, this option shows you the algorithm used by SAS© (Hash Join, Sort Merge Join, etc.) to execute the query.

There's no magic, only comparative tests (FULLSTIMER) in your specific environment.

To optimize your massive joins:

  1. Audit first with SASTRACE to see if LIBNAME is doing its job.

  2. Test explicit Pass-Through if implicit fails.

  3. Don't rule out the option of bringing back subsets of data to do a local MERGE if the database server is overloaded.