
Beyond the DATA Step: Unleashing Multithreaded Power with SAS DS2

Michael

Expert Advice

Michael
Viya Infrastructure Manager.

To keep your DS2 program running in parallel at full speed on the CAS server, avoid the SQLSTMT package and the SQLEXEC function inside your run() method. These features force the program to call back to the client or to a single controller, which effectively kills parallel performance. For high-speed data manipulation on CAS, stick to native DS2 variable assignments and the Hash package, and make sure your data is partitioned correctly so that each thread works on its own independent slice of the data.
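As a minimal sketch of this advice, the program below keeps run() down to a SET statement and a native assignment, so each thread can work through its own slice of the table. Everything specific here is an assumption made for illustration: the session name mysess, a table SALES with numeric columns PRICE and QUANTITY already loaded in the CASUSER caslib, and the thread and output names. The libref is deliberately given the same name as the caslib so that the two-level table names read the same either way.

```sas
/* Hypothetical setup: session name, caslib, and table are assumptions. */
cas mysess;
libname casuser cas caslib=casuser sessref=mysess;

proc ds2 sessref=mysess;
  /* Thread program: intended to run in parallel, one slice of rows per thread. */
  thread revenue_th / overwrite=yes;
    dcl double revenue;            /* global variable, so it is written to the output */
    method run();
      set casuser.sales;
      /* Native DS2 assignment: no SQLSTMT package, no SQLEXEC call. */
      revenue = price * quantity;
    end;
  endthread;

  /* Data program: collects the rows produced by the thread program. */
  data casuser.sales_revenue (overwrite=yes);
    dcl thread revenue_th t;
    method run();
      set from t;
    end;
  enddata;
run;
quit;
```

The row-level calculation needs nothing outside the current row, which is what lets the threads proceed independently.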

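When migrating your DS2 logic to the CAS server using the ds2.runDS2 action, you must distinguish between local operations, which each thread can perform on its own slice of the data, and global operations, which need a view across all threads.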
The DS2 language leverages the multithreaded architecture of SAS® Viya and the CAS server for efficient parallel execution. DS2 programs can be run with the DS2 procedure (PROC DS2) or with the CAS action ds2.runDS2 (callable from PROC CAS or from third-party languages), and they can embed FedSQL statements. DS2 can also be used to publish and execute DATA step and DS2 models on CAS, Hadoop, or Teradata. It supports a range of data sources, including Apache Spark and JDBC-compatible databases, through the SAS® Compute server or the SAS® data connectors for CAS. Note that some features, such as the DS2 SQLSTMT package, the DS2 SQLEXEC function, and certain uses of the DS2 HASH package, are not directly supported on the CAS server. On CAS, DS2 programs can access only the caslibs defined in the CAS session.
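The second execution route mentioned above, the ds2.runDS2 action called from PROC CAS, might look like the following sketch. It is not a definitive recipe. The session name mysess, the SALES table and its PRICE and QUANTITY columns in the CASUSER caslib, and the program and output names are all hypothetical. The two-level caslib.table references inside the DS2 text are also an assumption about how your session resolves table names.

```sas
/* Hypothetical names throughout; adjust to your own session and caslib. */
proc cas;
  session mysess;
  loadactionset "ds2";

  /* Hold the DS2 program text in a CASL variable. */
  source ds2_program;
    thread revenue_th / overwrite=yes;
      dcl double revenue;
      method run();
        set casuser.sales;
        revenue = price * quantity;
      end;
    endthread;

    data casuser.sales_revenue (overwrite=yes);
      dcl thread revenue_th t;
      method run();
        set from t;
      end;
    enddata;
  endsource;

  /* Submit the program text to the CAS server. */
  ds2.runDS2 / program=ds2_program;
run;
quit;
```

Because the program only reads from and writes to a caslib of the active session, it stays within the access rule described above.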