
Why (and how) CAS crushes traditional SAS© performance

Simon, 24/03/2021

Inertia is a powerful force. Even among the most fervent SAS© users, adopting CAS (Cloud Analytic Services), the in-memory engine of SAS© Viya, sometimes meets resistance. Why trade a recipe that works (DATA step, PROC SQL) for a new paradigm?

The answer lies in one word: Speed.

To convince the SAS© "true believers" who hesitate to take the plunge, nothing beats hard evidence. We compared the performance of the traditional SAS© engine (Base SAS©) with that of CAS actions on common data manipulation tasks.

Note: The following tests were performed on a dataset of 160 million rows, hosted on a modest virtual architecture (5 nodes).

1. Aggregation (Group By): Low Cardinality
When you need to sum variables based on a few groups (e.g., Installation Type and Product Line), the natural reflex is to use PROC MEANS or PROC SUMMARY. In CAS, the optimized equivalent is the simple.summary action.

The Test:

Data: 160 million rows.

Groups: 8 unique combinations (low cardinality).

Compared Code:

Classic SAS©:
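PROC MEANS DATA=mega_corp NOPRINT;
   VAR revenue expenses;
   CLASS facilityType productline;
   /* Without the NWAY option, the output table also contains subtotal */
   /* rows (_TYPE_ < 3) in addition to the 8 full group combinations.   */
   OUTPUT OUT=summaryMC SUM=;
RUN;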
CAS (simple.summary Action):
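PROC CAS;
   /* Sum revenue and expenses for each facilityType x productline group */
   SIMPLE.summary /
      inputs={"revenue","expenses"},
      subSet={"SUM"},
      TABLE={name="mega_corp", groupBy={"facilityType","productline"}},
      casout={name="summaryMC", replace=True};   /* overwrite the table if it exists */
QUIT;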
SAS results
Gain: CAS is approximately 20 times faster.

2. Aggregation: High Cardinality

A common (and often outdated) criticism holds that in-memory engines struggle when the number of groups explodes. Let's check this by raising the number of groups sharply.

The Test:

  • Data: 160 million rows.

  • Groups: 88,000 unique combinations (Product ID, date, unit).
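The code for this test is not shown in the article. As a sketch, assuming the three keys are stored in columns named productId, date and unit (hypothetical names), and that the output table name summaryHC is likewise illustrative, the two programs differ from the low-cardinality versions only in their grouping variables. NWAY is added here so that PROC MEANS returns only the full key combinations, which is what the CAS groupBy output contains, rather than subtotals as well.

```sas
/* Classic SAS. productId, date and unit are hypothetical column names. */
PROC MEANS DATA=mega_corp NOPRINT NWAY;   /* NWAY: keep only full key combinations */
   VAR revenue expenses;
   CLASS productId date unit;
   OUTPUT OUT=summaryHC SUM=;
RUN;

/* CAS: same action as in the low-cardinality test, with the new keys */
PROC CAS;
   SIMPLE.summary /
      inputs={"revenue","expenses"},
      subSet={"SUM"},
      TABLE={name="mega_corp", groupBy={"productId","date","unit"}},
      casout={name="summaryHC", replace=True};
QUIT;
```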


3. Deduplication (Removing Duplicates)

Removing duplicates is a heavy task that often involves costly sorting. In classic SAS©, this is the domain of PROC SORT with the NODUPKEY option. In CAS, the recommended action for this task has evolved (see technical note below), but the principle remains efficient grouping.

The Test:

  • Task: Keep one row for each combination (Product, Date, Unit).

  • Unique Keys: 88,000.
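For reference, the classic side of this test is a plain PROC SORT with NODUPKEY. A minimal sketch, assuming hypothetical key columns named productId, date and unit and an illustrative output table dedupMC. NODUPKEY keeps the first row it encounters for each combination of BY values and drops the others, which is why the data must be sorted first.

```sas
/* Classic SAS: sort by the key, then keep one row per combination. */
/* productId, date and unit are hypothetical column names.          */
PROC SORT DATA=mega_corp OUT=dedupMC NODUPKEY;
   BY productId date unit;
RUN;
```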


4. Important Technical Note: The evolution towards deduplication.deduplicate

Techniques evolve with Viya versions. Although the simple.groupBy action is very powerful, SAS© Viya has introduced a specialized action: deduplication.deduplicate.

The good news? You don't always need to learn a new syntax. If your source and target tables are both in CAS, SAS© will often translate your good old PROC SORT NODUPKEY automatically into the optimized deduplication.deduplicate action. You keep the simple syntax, and SAS© handles the performance.
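The pattern described above can be sketched as follows. This assumes mega_corp is already loaded into the CASUSER caslib; the session name mySession, the libref mycas, the output table dedupMC and the key columns productId, date and unit are all hypothetical. The point is that both DATA= and OUT= reference a CAS libref, so the data never has to leave CAS.

```sas
/* Start a CAS session (the session name is illustrative) */
CAS mySession;

/* Libref on the CASUSER caslib, where mega_corp is assumed to be loaded */
LIBNAME mycas CAS CASLIB=casuser;

/* Source and target are both CAS tables, so SAS can hand the       */
/* deduplication to CAS instead of sorting rows on the SAS client.   */
/* productId, date and unit are hypothetical column names.           */
PROC SORT DATA=mycas.mega_corp OUT=mycas.dedupMC NODUPKEY;
   BY productId date unit;
RUN;
```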

The leap to CAS requires a learning effort, particularly to master CASL and CAS actions. However, for anyone handling large volumes of data (Big Data), the payoff in processing time is immediate and spectacular.