Fedsql

SAS Viya Optimization: Mastering Implicit Pass-Through with FedSQL

Simon 19 vistas

In the classic SAS© ecosystem (v9), the concept of SQL Pass-Through (implicit or explicit) is well-known for delegating computations to the database. With the arrival of SAS© Viya and the in-memory engine CAS (Cloud Analytic Services), this mechanism has evolved.

There are now two ways to perform SQL in Viya:

  1. The traditional approach: Via SAS©/ACCESS engines in the SPRE environment (the "classic" side of Viya).

  2. The modern approach: Via the FedSQL language connected directly to CAS.

This article focuses on this second method: how FedSQL optimizes your queries by intelligently deciding whether to execute them in CAS or directly in the source database.

The Evolution of FedSQL Behavior

To understand the current optimization, one must look at the engine's evolution:

  • In SAS© Viya 3.2 (The past): The default behavior was systematic loading. If you queried an Oracle or PostgreSQL table not loaded in memory, CAS would first load the entire table into memory ("on-the-fly loading"), and then execute the query. This was functional but sometimes resource-intensive for simple filters.

  • Since SAS© Viya 3.3 (The present): Implicit Pass-Through was introduced. If the tables are not loaded, CAS attempts to translate the FedSQL query into native SQL of the database, sends it for execution, and loads only the result into memory.

Conditions for Success

For Implicit Pass-Through to activate, several conditions must be met:

  1. The source tables are not already loaded into memory in CAS.

  2. All queried tables belong to the same CASLIB (i.e., the same database connection).

  3. The query syntax is compatible with the target database (no untranslatable proprietary SAS© functions).

Analysis by Example

Let's use the _method option in PROC FEDSQL to visualize the execution plan and understand what's happening under the hood.

Case 1: Successful Pass-Through (Maximum Optimization)

Let's imagine a join between two tables (film and film_category) located in the same Postgres CASLIB (pg_db1).

1caslib pg_db1 datasource=(srctype="postgres", ...);
2 
3PROC FEDSQL sessref=mySession _method;
4 create TABLE casuser.myresults{options replace=true} as
5 select film.title, film_category.category_id
6 from pg_db1."film" as film,
7 pg_db1."film_category" as film_category
8 where film.film_id=film_category.film_id and
9 film_category.category_id=1;
10QUIT;
Log Analysis: If Pass-Through is successful, you will see an Offloaded SQL statement section showing the native SQL query sent to Postgres, followed by the note:

NOTE: The SQL statement was fully offloaded to the underlying data source via full pass-through

Here, only the filtered result is returned to CAS.

Case 2: Impossible Pass-Through (Multi-Source)

If we perform the same join, but the tables come from two different databases (and therefore two distinct CASLIBs, pg_db1 and pg_db2).

1PROC FEDSQL sessref=mySession _method;
2 create TABLE casuser.myresults as
3 select film.title, film_category.category_id
4 from pg_db1."film" as film,
5 pg_db2."film_category" as film_category
6 /* ... suite de la requête ... */
7QUIT;

Log Analysis: The engine cannot send a single SQL query to two separate servers. The execution plan will show:

SeqScan from PG_DB1.film SeqScan from PG_DB2.film_category

CAS will temporarily load the tables (or scan them sequentially) and perform the join (HashJoin or MergeJoin) itself in memory.

Case 3: Unsupported SAS© Functions

Using SAS©-specific functions (like PUT with a SAS© format) often prevents translation into standard ANSI SQL.

1/* Utilisation de l'option requireFullPassThrough pour forcer le test */
2PROC FEDSQL sessref=mySession _method cntl=(requireFullPassThrough);
3 create TABLE pg_db1.myresults as
4 select * from pg_db1."customer"
5 where put(create_date,ddmmyy10.)='14/02/2006'; /* Fonction SAS */
6QUIT;

Result: Since we enabled the requireFullPassThrough control option, the query will intentionally fail instead of fetching the data:

NOTE: Full pass-through to the underlying data source was not possible. Stopping execution.

Without this option, CAS would have brought back the entire customer table to filter the date locally, which can be disastrous for performance on large volumes.

Advanced Control Options

FedSQL offers two powerful options for the Query Planner to control this behavior:

  1. requireFullPassThrough: Stops execution if the query cannot be fully delegated to the database. This is a vital safeguard to prevent unintentional loading of massive amounts of data.

  2. disablePassThrough: Forces tables to be loaded into CAS for local processing (useful if the database server is overloaded but CAS is powerful).

Limitations to Be Aware Of

  • Full Query Only: Currently, the mechanism only supports "Full-Query Pass-Through". This means that the entire query must be translatable. If only a part (like a WHERE filter) is translatable but not the rest, partial optimization is not yet applied (unlike classic SAS©/ACCESS).

  • Data Sources: This feature is supported for Hadoop, Impala, ODBC, Oracle, PostgreSQL, Teradata, Amazon Redshift, DB2, and SAP HANA. (Note the notable absence of SQL Server in some initial versions; check according to your Viya version).

  • Case Sensitivity: FedSQL uppercases names by default. For case-sensitive databases (like Postgres), use double quotes (e.g., pg_db1."table").