The modern approach: Via the FedSQL language connected directly to CAS.
This article focuses on this second method: how FedSQL optimizes your queries by intelligently deciding whether to execute them in CAS or directly in the source database.
Let's use the _method option in PROC FEDSQL to visualize the execution plan and understand what's happening under the hood.
Case 1: Successful Pass-Through (Maximum Optimization)
Let's imagine a join between two tables (film and film_category) located in the same Postgres CASLIB (pg_db1).
caslib pg_db1 datasource=(srctype="postgres", ...);
proc fedsql sessref=mySession _method;
create table casuser.myresults{options replace=true} as
select film.title, film_category.category_id
from pg_db1."film" as film,
pg_db1."film_category" as film_category
where film.film_id=film_category.film_id and
film_category.category_id=1;
quit;
create TABLE casuser.myresults{options replace=true} as
5
select film.title, film_category.category_id
6
from pg_db1."film" as film,
7
pg_db1."film_category" as film_category
8
where film.film_id=film_category.film_id and
9
film_category.category_id=1;
10
QUIT;
Log Analysis: If Pass-Through is successful, you will see an Offloaded SQL statement section showing the native SQL query sent to Postgres, followed by the note:
NOTE: The SQL statement was fully offloaded to the underlying data source via full pass-through
Here, only the filtered result is returned to CAS.
Case 2: Impossible Pass-Through (Multi-Source)
If we perform the same join, but the tables come from two different databases (and therefore two distinct CASLIBs, pg_db1 and pg_db2).
proc fedsql sessref=mySession _method;
create table casuser.myresults as
select film.title, film_category.category_id
from pg_db1."film" as film,
pg_db2."film_category" as film_category
/* ... suite de la requête ... */
quit;
1
PROC FEDSQL sessref=mySession _method;
2
create TABLE casuser.myresults as
3
select film.title, film_category.category_id
4
from pg_db1."film" as film,
5
pg_db2."film_category" as film_category
6
/* ... suite de la requête ... */
7
QUIT;
Log Analysis:
The engine cannot send a single SQL query to two separate servers. The execution plan will show:
SeqScan from PG_DB1.filmSeqScan from PG_DB2.film_category
CAS will temporarily load the tables (or scan them sequentially) and perform the join (HashJoin or MergeJoin) itself in memory.
/* Utilisation de l'option requireFullPassThrough pour forcer le test */
proc fedsql sessref=mySession _method cntl=(requireFullPassThrough);
create table pg_db1.myresults as
select * from pg_db1."customer"
where put(create_date,ddmmyy10.)='14/02/2006'; /* Fonction SAS */
quit;
1
/* Utilisation de l'option requireFullPassThrough pour forcer le test */
where put(create_date,ddmmyy10.)='14/02/2006'; /* Fonction SAS */
6
QUIT;
Result:
Since we enabled the requireFullPassThrough control option, the query will intentionally fail instead of fetching the data:
NOTE: Full pass-through to the underlying data source was not possible. Stopping execution.
Without this option, CAS would have brought back the entire customer table to filter the date locally, which can be disastrous for performance on large volumes.
Advanced Control Options
FedSQL offers two powerful options for the Query Planner to control this behavior:
requireFullPassThrough: Stops execution if the query cannot be fully delegated to the database. This is a vital safeguard to prevent unintentional loading of massive amounts of data.
disablePassThrough: Forces tables to be loaded into CAS for local processing (useful if the database server is overloaded but CAS is powerful).
Data Sources: This feature is supported for Hadoop, Impala, ODBC, Oracle, PostgreSQL, Teradata, Amazon Redshift, DB2, and SAP HANA. (Note the notable absence of SQL Server in some initial versions; check according to your Viya™ version).
Case Sensitivity: FedSQL uppercases names by default. For case-sensitive databases (like Postgres), use double quotes (e.g., pg_db1."table").
Wichtiger Haftungsausschluss
Die auf WeAreCAS.eu bereitgestellten Codes und Beispiele dienen Lehrzwecken. Es ist zwingend erforderlich, sie nicht blind in Ihre Produktionsumgebungen zu kopieren. Der beste Ansatz besteht darin, die Logik zu verstehen, bevor sie angewendet wird. Wir empfehlen dringend, diese Skripte in einer Testumgebung (Sandbox/Dev) zu testen. WeAreCAS übernimmt keine Verantwortung für mögliche Auswirkungen oder Datenverluste auf Ihren Systemen.
SAS und alle anderen Produkt- oder Dienstleistungsnamen von SAS Institute Inc. sind eingetragene Marken oder Marken von SAS Institute Inc. in den USA und anderen Ländern. ® zeigt die Registrierung in den USA an. WeAreCAS ist eine unabhängige Community-Site und nicht mit SAS Institute Inc. verbunden.
Diese Website verwendet technische und analytische Cookies, um Ihre Erfahrung zu verbessern.
Mehr erfahren.