Fedsql

SAS Viya Optimization: Emulating PROC SQL with CAS-Enabled PROC FedSQL

Simon 7 views

Adopting SAS© Viya often comes with a performance goal: leveraging the in-memory CAS (Cloud Analytic Services) engine to accelerate data processing. For users accustomed to PROC SQL, this often involves migrating to PROC FedSQL.

However, although PROC SQL complies with the ANSI 92 standard, it also allows for many unique features specific to SAS©. PROC FedSQL, on the other hand, complies with the ANSI 99 standard and is optimized for distributed processing, but it does not handle some implicit SAS© syntax in the same way.

This article explores how to refactor classic PROC SQL conditional logic to work efficiently in a CAS environment with PROC FedSQL.

SAS Viya Optimization: Emulating PROC SQL with CAS-Enabled PROC FedSQL -

The Challenge: Handling Booleans

In classic SAS©, a common trick in PROC SQL is to sum the result of a boolean expression. SAS© evaluates a true condition as being equal to 1 and a false one as equal to 0.

Let's take this example using the sashelp.baseball table. We want to count the total number of items and calculate a sum based on a complex condition (where the absolute difference between hits and runs is less than 10% of at-bats).

Original PROC SQL Code:

1PROC SQL;
2 create TABLE BenchMark as
3 select count(*) as ItemCount
4 , sum( abs( nhits - nruns ) < 0.1*natbat ) as DIFF_10
5 from sashelp.baseball;
6RUN;
In this code, the expression abs( nhits - nruns ) < 0.1*natbat implicitly returns 1 or 0, which allows the SUM function to work directly.

The Solution: Using CASE in PROC FedSQL

When we switch to PROC FedSQL to run this code in a distributed manner on CAS, this implicit boolean-to-integer conversion is not supported in the same way by the stricter ANSI standard. If you try to run the code as-is, it will fail or not produce the expected result.

To emulate this behavior, we must make the logic explicit by using a CASE statement.

Refactored PROC FedSQL Code for CAS:

1PROC FEDSQL sessref=casauto;
2 create TABLE BenchMark as
3 select count(*) as ItemCount
4 , sum(case
5 when (abs (nhits - nruns ) < (0.1*natbat)) is true THEN 1
6 END
7 ) as DIFF_10
8 from baseball;
9QUIT;

Analysis of Changes

  1. CASE Statement: We have encapsulated the condition in a CASE ... WHEN ... THEN ... END structure.

  2. Explicitness: The clause WHEN ... IS TRUE THEN 1 forces the engine to return the numeric value 1 when the condition is met. If the condition is false, the CASE implicitly returns NULL (or one could add an ELSE 0), which is ignored by the SUM function, thus replicating the behavior of the original PROC SQL.

  3. CAS Execution: Thanks to the sessref=casauto option (or the name of your CAS session), this code now runs with the power of distributed computing.

The goal when migrating to SAS© Viya is to use CAS to accelerate existing routines. Refactoring PROC SQL code into PROC FedSQL is an excellent way to achieve this, provided you master the syntactical nuances between the ANSI standards.

For PROC SQL code that proves too complex to refactor or uses SAS©-specific features not supported by FedSQL, it is still possible to run it as-is in the SAS© Programming Runtime Environment (SPRE) of Viya.