Fedsql

Advanced Tutorial: Manipulating Native JSON in SAS Viya with FedSQL

Simon 28 Aufrufe

In the SAS© Viya environment, the FedSQL language is the standard for interacting with data. While it excels at translating standard queries (ANSI SQL) to databases (Implicit Pass-Through), it shows its limits when you want to use proprietary features of the database engine, such as geospatial functions or JSON parsing.

This article focuses on a powerful technique: Explicit Pass-Through, illustrated here by extracting JSON data stored in PostgreSQL.

The Challenge: The Limits of Automatic Translation

When you write a classic FedSQL query, SAS© tries to translate it into native SQL to send to the database. However, if your query contains operators that SAS© does not know, the translation fails, or SAS© tries to bring all the data back to process it locally.

Let's take the case of PostgreSQL, which has excellent JSON processing capabilities via specific operators like -> or ->>. If you use these symbols directly in a standard FedSQL query, you will get a syntax error because the SAS© interpreter does not recognize them.

The Solution: Explicit Pass-Through

The solution is to use the CONNECTION TO clause. This syntax creates a direct "tunnel" to the database.

The principle is simple: Everything written inside the connection clause is invisible to the SAS©/FedSQL interpreter. SAS© simply takes the string and passes it as is to the PostgreSQL engine.

General Syntax

1PROC FEDSQL sessref=ma_session;
2 create TABLE cas.ma_table_cible as
3 select * from CONNECTION TO nom_de_la_caslib
4 (
5 /* Code SQL Natif (Postgres, Oracle, etc.) */
6 /* Invisible pour SAS */
7 );
8QUIT;

Concrete Case: JSON Data Analysis

Imagine a PostgreSQL table containing a data_json column storing raw information. We want to extract a specific field from this JSON and calculate an aggregation, without ever loading the raw JSON into CAS (which would be cumbersome and unnecessary).

The Code

1PROC FEDSQL sessref=mysession;
2 create TABLE cas.synthese_ventes as
3 select region, total_ventes
4 from connection to ma_lib_postgres
5 (
6 /* Début du code natif PostgreSQL */
7 SELECT
8 /* Extraction de la clé 'region' dans l'objet JSON */
9 data_json ->> 'region' as region,
10 /* Extraction et conversion de la clé 'montant' */
11 SUM(CAST(data_json ->> 'montant' as INTEGER)) as total_ventes
12 FROM
13 ventes_brutes_json
14 GROUP BY
15 data_json ->> 'region'
16 /* Fin du code natif PostgreSQL */
17 );
18QUIT;

Technical Analysis

Here is why this approach is optimal:

  1. Invisibility to the Interpreter (Black Box): The FedSQL interpreter does not "see" the -> and ->> operators. Therefore, it does not attempt to validate them. If it saw them, it would return a syntax error because these symbols do not exist in the ANSI SQL standard supported by SAS©. It is PostgreSQL that receives the raw string and interprets it.

  2. Use of Native Operators:

    • ->: Retrieves a JSON object field as a JSON object.

    • ->>: Retrieves a JSON object field as text (String). These operators allow you to navigate the hierarchical structure of the data directly at the storage engine level.

  3. Performance and "Data Movement": This is the crucial point.

    • Without Explicit Pass-Through: You would have to load the entire table (with the large JSON) into CAS, then try to parse the text with complex SAS© functions. This consumes a huge amount of memory and network bandwidth.

    • With Explicit Pass-Through: PostgreSQL performs the JSON extraction and the ("heavy") aggregation. The CAS server only receives the final result: a small summary table ("a small, ready-to-use table").

Explicit Pass-Through via CONNECTION TO is essential for advanced SAS© Viya users. It allows you to break the barriers of the standard FedSQL language to exploit 100% of your databases' capabilities (JSON parsing, XML, specific indexing), while drastically minimizing data movement to the CAS server.