CAS vs SAS: Understanding Data Step Performance and Behavior Differences

Simon 24/08/2022

When transitioning to SAS© Viya, developers commonly ask why some code runs much faster in CAS while in other cases the gain is less obvious. More importantly, why do some programs whose logic worked perfectly under SAS© 9 produce different results in CAS?

This article explores the underlying mechanisms that distinguish the traditional SAS© engine (SMP) from the distributed CAS engine (MPP), focusing on the DATA step.

1. Speed: In-Memory and Disk I/O

One of the most obvious advantages of CAS is in-memory processing.

  • Classic SAS© (SAS© 9): Processing often involves reading and writing to the hard drive. Even if the processor is fast, the bottleneck is often the writing (I/O) step.

  • CAS: Data resides in Random Access Memory (RAM). Eliminating disk writing steps significantly accelerates overall execution time.

This often explains why code that generates large amounts of data (such as a loop creating millions of rows with calculations) runs faster in CAS: the time saved comes less from CPU computation than from the absence of physical disk writes.

2. The "Single Thread" Trap

However, it is crucial to note that CAS is not magic. Not everything automatically runs in parallel.

Consider the example of a DATA step that generates data without an input table (for example, a DO loop from 1 to 5 million to create simulated data):

/* Example: generating data with no input table */
DATA casuser.simulation;
  DO i = 1 TO 5000000;
    /* complex calculations */
    OUTPUT;
  END;
RUN;

In this specific case, CAS executes the code on a single thread. The log displays an explicit note:

NOTE: The DATA step has no input data set and will run in a single thread.

In a purely single-threaded scenario, the classic SAS© 9 engine can sometimes be more efficient and even faster than CAS, as it has less system management "overhead" than the distributed engine. The power of CAS lies in distribution; without distribution, this gain can be negated.
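A possible way around the single-thread limit for the generation loop above is sketched below. It relies on the SINGLE=NO option of the DATA statement, which asks CAS to run the step on all available threads, and on the automatic variables _THREADID_ and _NTHREADS_ so that each thread generates its own slice of the rows. The sketch assumes an active CAS session, a casuser libref bound to the CAS engine, and thread numbering that starts at 1. The variable x is a placeholder for the real calculations.

```sas
/* Assumption: a CAS session is active and casuser is a CAS libref */
data casuser.simulation / single=no;
  /* Thread t produces i = t, t + n, t + 2n, ... (n = number of threads), */
  /* so the threads together cover 1..5000000 exactly once               */
  do i = _threadid_ to 5000000 by _nthreads_;
    x = i * 2;   /* placeholder for the complex calculations */
    output;
  end;
run;
```

Without an explicit split like this, every thread would run the full loop and the output would contain one copy of the 5 million rows per thread.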

3. The Paradigm Shift: Parallelism and Partitioning

The real architectural difference appears when the DATA step reads a distributed table. CAS divides the data into blocks and distributes them across multiple threads (and potentially multiple nodes/machines).

This introduces major behavioral changes for certain historical SAS© instructions.

The case of RETAIN

In SAS© 9, the RETAIN statement carries a value from one observation to the next across the whole table. In CAS, this retention occurs only within a single thread: a value retained in "Thread 1" is not visible to "Thread 2," so a running counter or cumulative sum restarts on every thread.
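A minimal sketch of this per-thread behavior, assuming casuser.source is an in-memory CAS table: the step keeps a retained row counter and records which thread handled each row. The output table and variable names are illustrative only.

```sas
/* Assumption: casuser.source is loaded in CAS; names are illustrative */
data casuser.retain_demo;
  set casuser.source;
  retain rows_seen 0;          /* retained only within the current thread */
  rows_seen = rows_seen + 1;   /* counts rows seen by this thread         */
  thread = _threadid_;         /* id of the thread that processed the row */
run;
```

In the result, rows_seen starts again at 1 for each distinct value of thread instead of running from 1 to the total number of rows. When the retained logic is per group, a BY statement is the usual remedy, since CAS is documented to process each BY group on one thread; treat that as something to verify against your release.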

The trap of END=EOF

This is undoubtedly the most frequent trap. In classic SAS©, the condition IF eof; is true only once, on the last observation of the table.

In a distributed environment (multi-thread), the logic changes:

  • Each thread processes its portion of data.

  • Each thread has its own end-of-file indicator.

If you execute this code in CAS with multiple threads:

DATA casuser.resultat;
  SET casuser.SOURCE END=eof;
  IF eof;
  /* Cumulative sum or other end-of-data logic */
RUN;

You will not get a single row, but as many rows as there are active threads. If your session uses 36 threads, you will have 36 observations in the output, each representing the last row processed by that specific thread.

To obtain a global total, an additional aggregation step (post-processing) is often necessary.
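The two-step pattern can be sketched as follows. The first step lets every thread emit one row holding its partial sum; the second step uses the SINGLE=YES option of the DATA statement to read those few rows on one thread and produce the grand total. The column amount and the table names partial and final are hypothetical; casuser.source is assumed to be an in-memory CAS table.

```sas
/* Step 1: runs on all threads, one output row per thread */
data casuser.partial;
  set casuser.source end=eof;
  total + amount;            /* sum statement: retained per thread */
  if eof then output;        /* eof is true once on each thread    */
  keep total;
run;

/* Step 2: single thread, combines the per-thread partial sums */
data casuser.final / single=yes;
  set casuser.partial end=eof;
  grand_total + total;
  if eof then output;        /* now true only once */
  keep grand_total;
run;
```

The second step gives up parallelism, but it only reads one row per thread, so its cost is negligible compared with the first step.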

4. Function Compatibility

Finally, when migrating code, it is important to remember that not all SAS© functions are ported to CAS. If a function is not natively supported by the CAS engine, the system may sometimes transfer the data back to the Compute Server to perform the processing, which negates the performance benefits of distributed processing. It is recommended to check the documentation to ensure that the functions used are "CAS-enabled."
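One way to check where a step actually ran is to raise the log detail with the MSGLEVEL=I system option before running it, then read the notes the DATA step writes. This is a sketch under assumptions: the exact wording of the notes depends on the release, and the column name and the output table checked are hypothetical.

```sas
/* Print informational notes in the log */
options msglevel=i;

data casuser.checked;
  set casuser.source;
  /* UPCASE stands in for whichever function you want to verify; */
  /* name is a hypothetical character column                      */
  name_upper = upcase(name);
run;
```

If the log indicates that the step did not run in CAS, the informational notes are the first place to look for the language element responsible.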

In summary, the transition from SAS© 9 to CAS is not just a matter of changing the LIBNAME statement.

  • For performance: CAS excels at massive in-memory, parallel processing, but may be slower on small volumes or purely sequential (single-threaded) tasks.

  • For logic: Developers must adapt their "mental map." The concept of sequential reading from beginning to end of a single file disappears in favor of independent block processing.