The modern approach: Via the FedSQL language connected directly to CAS.
This article focuses on this second method: how FedSQL optimizes your queries by intelligently deciding whether to execute them in CAS or directly in the source database.
Let's use the _method option in PROC FEDSQL to visualize the execution plan and understand what's happening under the hood.
Case 1: Successful Pass-Through (Maximum Optimization)
Let's imagine a join between two tables (film and film_category) located in the same Postgres CASLIB (pg_db1).
caslib pg_db1 datasource=(srctype="postgres", ...);
proc fedsql sessref=mySession _method;
create table casuser.myresults{options replace=true} as
select film.title, film_category.category_id
from pg_db1."film" as film,
pg_db1."film_category" as film_category
where film.film_id=film_category.film_id and
film_category.category_id=1;
quit;
create TABLE casuser.myresults{options replace=true} as
5
select film.title, film_category.category_id
6
from pg_db1."film" as film,
7
pg_db1."film_category" as film_category
8
where film.film_id=film_category.film_id and
9
film_category.category_id=1;
10
QUIT;
Log Analysis: If Pass-Through is successful, you will see an Offloaded SQL statement section showing the native SQL query sent to Postgres, followed by the note:
NOTE: The SQL statement was fully offloaded to the underlying data source via full pass-through
Here, only the filtered result is returned to CAS.
Case 2: Impossible Pass-Through (Multi-Source)
If we perform the same join, but the tables come from two different databases (and therefore two distinct CASLIBs, pg_db1 and pg_db2).
proc fedsql sessref=mySession _method;
create table casuser.myresults as
select film.title, film_category.category_id
from pg_db1."film" as film,
pg_db2."film_category" as film_category
/* ... suite de la requête ... */
quit;
1
PROC FEDSQL sessref=mySession _method;
2
create TABLE casuser.myresults as
3
select film.title, film_category.category_id
4
from pg_db1."film" as film,
5
pg_db2."film_category" as film_category
6
/* ... suite de la requête ... */
7
QUIT;
Log Analysis:
The engine cannot send a single SQL query to two separate servers. The execution plan will show:
SeqScan from PG_DB1.filmSeqScan from PG_DB2.film_category
CAS will temporarily load the tables (or scan them sequentially) and perform the join (HashJoin or MergeJoin) itself in memory.
/* Utilisation de l'option requireFullPassThrough pour forcer le test */
proc fedsql sessref=mySession _method cntl=(requireFullPassThrough);
create table pg_db1.myresults as
select * from pg_db1."customer"
where put(create_date,ddmmyy10.)='14/02/2006'; /* Fonction SAS */
quit;
1
/* Utilisation de l'option requireFullPassThrough pour forcer le test */
where put(create_date,ddmmyy10.)='14/02/2006'; /* Fonction SAS */
6
QUIT;
Result:
Since we enabled the requireFullPassThrough control option, the query will intentionally fail instead of fetching the data:
NOTE: Full pass-through to the underlying data source was not possible. Stopping execution.
Without this option, CAS would have brought back the entire customer table to filter the date locally, which can be disastrous for performance on large volumes.
Advanced Control Options
FedSQL offers two powerful options for the Query Planner to control this behavior:
requireFullPassThrough: Stops execution if the query cannot be fully delegated to the database. This is a vital safeguard to prevent unintentional loading of massive amounts of data.
disablePassThrough: Forces tables to be loaded into CAS for local processing (useful if the database server is overloaded but CAS is powerful).
Data Sources: This feature is supported for Hadoop, Impala, ODBC, Oracle, PostgreSQL, Teradata, Amazon Redshift, DB2, and SAP HANA. (Note the notable absence of SQL Server in some initial versions; check according to your Viya™ version).
Case Sensitivity: FedSQL uppercases names by default. For case-sensitive databases (like Postgres), use double quotes (e.g., pg_db1."table").
Aviso importante
Los códigos y ejemplos proporcionados en WeAreCAS.eu son con fines educativos. Es imperativo no copiarlos y pegarlos ciegamente en sus entornos de producción. El mejor enfoque es comprender la lógica antes de aplicarla. Recomendamos encarecidamente probar estos scripts en un entorno de prueba (Sandbox/Dev). WeAreCAS no acepta ninguna responsabilidad por cualquier impacto o pérdida de datos en sus sistemas.
SAS y todos los demás nombres de productos o servicios de SAS Institute Inc. son marcas registradas o marcas comerciales de SAS Institute Inc. en los EE. UU. y otros países. ® indica registro en los EE. UU. WeAreCAS es un sitio comunitario independiente y no está afiliado a SAS Institute Inc.
Este sitio utiliza cookies técnicas y analíticas para mejorar su experiencia.
Saber más.