Get Started! A Beginner's Guide to Programming in the SAS® Cloud Analytic Services (CAS) Environment

Simon
Difficulty level: Beginner

Expert Advice

Michael

The secret to mastering SAS CAS is minimizing data movement between the client and the server. Always aim to process your data 'in-place' using CAS-enabled procedures rather than pulling tables back to the Compute Server for every step. This habit is the key to unlocking the true speed of distributed, in-memory analytics.

Introduction

Starting something new, whether a personal or professional project, can be intimidating, yet it is often very rewarding. If you are hesitant to start programming with SAS® Cloud Analytic Services (CAS), know that the benefits are considerable.

CAS is the analytical engine at the heart of SAS Viya. It is a server that provides a cloud-based execution environment for data management and analysis. Its strength lies in processing data in memory, with multithreaded work distributed across multiple machines, which allows very fast analysis of very large tables.

This article aims to guide you through your first steps with CAS, covering the entire process from establishing a session to analyzing the data.

The CAS Architecture in Brief

The CAS server can run on a single machine (SMP) or in a distributed mode on multiple machines (MPP). The distributed architecture includes:

  • A Controller node: It communicates with client applications and directs the work.

  • Worker nodes: They perform calculations and analysis on the data rows stored in memory on their respective node.

The Key Advantages of CAS

Here is a summary of the main advantages of using the CAS environment:

Table 1: Benefits of the CAS Environment

Benefit | Description
Processing Speed | The distributed architecture and in-memory processing allow for extremely fast analyses on large datasets.
Fault Tolerance | If a worker node stops responding, the controller redirects the work to another node using a copy of the data, ensuring continuity.
Scalability | Worker nodes can be added horizontally to distribute the load and improve processing times.
Open Codebase | Although this article focuses on SAS, CAS also supports languages like Python, Java, Lua, and R.

Prerequisite: Using SAS® Studio

For SAS programmers, SAS Studio is the recommended interface for interacting with CAS. It is a web interface accessible from any device, and it offers useful tools such as predefined code snippets for common tasks (e.g., listing CAS session options).

The CAS Programming Workflow

The typical CAS programming process follows these steps:

  1. Establish a CAS session.

  2. Allocate a CAS library (caslib).

  3. Load data into CAS.

  4. Manipulate data (e.g., with the DATA step).

  5. Execute actions with PROC CAS.

  6. Analyze the data.

Step 1: Establish a CAS Session

The first mandatory step is to start a CAS session. This is the communication channel between your client (SAS Studio) and the CAS server. The session handles your authentication and resource allocation, and it isolates your processing from other sessions.

The basic syntax is very simple:

cas casauto;

This statement starts a session named CASAUTO with default properties. If the connection succeeds, the SAS log displays information about the port, the session UUID, the user, the default active caslib, and the number of worker nodes used.
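As a minimal sketch of what else the CAS statement can do, the following sets a few options on the session and writes session properties to the log. The option values (a 30-minute timeout, metrics enabled) are illustrative assumptions, not settings taken from this article.

```sas
/* Set options on the CASAUTO session; the values are illustrative only */
cas casauto sessopts=(caslib=casuser timeout=1800 metrics=true);

/* Write the properties of every session you own to the log */
cas _all_ list;

/* When all work is finished, release server resources with: */
/*    cas casauto terminate;                                 */
```

Terminating the session is left as a comment here so that the remaining steps can still run against CASAUTO.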

Step 2: Allocate a CAS Library (Caslib)

"Caslibs" are the storage spaces for your in-memory data in CAS. They also contain the access controls for the data.

  • Types of caslibs: Personal (accessible only by you), Predefined (global, managed by the administrator), or Manually Added.

  • Scope: "Session" (visible only in your current session) or "Global" (shared between sessions).

To list the properties of your default active caslib (often CASUSER), use:

caslib casuser list;

Important: The Link between Traditional SAS and CAS

The name of a caslib is not a standard SAS "libref". To use traditional SAS procedures or the DATA step with CAS tables, you must associate a libref with the caslib via the LIBNAME statement using the CAS engine:

LIBNAME mycas cas caslib=casuser;
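The caslib-to-libref link can be sketched a little further. The example below lists every caslib visible to the session, assigns librefs in bulk, and then uses the libref from a traditional procedure. It assumes the CASUSER caslib already holds at least one in-memory table. As far as I know, caslibs whose names are not valid librefs are skipped by the bulk assignment.

```sas
/* List every caslib available to the session */
caslib _all_ list;

/* Assign a libref for each caslib whose name is a valid libref */
caslib _all_ assign;

/* Or assign one explicitly, as in the statement above */
libname mycas cas caslib=casuser;

/* The libref can now be used by traditional procedures */
proc contents data=mycas._all_ nods;
run;
```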

Step 3: Load Data into CAS

CAS offers great flexibility for loading data (SAS data sets, CSV, Excel, databases, and more). Here are three common methods:

Table 2: Methods for Loading Data into CAS

Method | Description | Simplified Code Example
DATA Step | Ideal for experienced SAS programmers. Allows loading existing SAS datasets. | data mycas.cars; set sashelp.cars; run;
PROC CASUTIL | Utility procedure dedicated to managing CAS files and tables. Similar to PROC IMPORT for external files. | proc casutil; load file='/path/cars.xls' casout='cars2' importoptions=(filetype='xls'); quit;
PROC CAS (loadTable Action) | Uses the CASL language to execute server actions. A powerful and scriptable method. | proc cas; table.loadtable / path='cars.csv' casout={name='cars3'} ...; quit;
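A slightly fuller sketch of the PROC CASUTIL route: it loads a SAS data set from the client session into CAS and then lists the in-memory tables to confirm the load. The output table name cars and the REPLACE option are choices made for this illustration.

```sas
proc casutil;
   /* Copy a SAS data set from the SAS session into the CASUSER caslib */
   load data=sashelp.cars outcaslib="casuser" casout="cars" replace;

   /* Confirm which tables are now in memory in that caslib */
   list tables incaslib="casuser";
quit;
```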

Step 4: Manipulate Data with the DATA Step

The good news for SAS users is that the trusty DATA step works in CAS.

The Major Difference: Parallel Processing

In CAS, the DATA step runs in a distributed manner. Very large datasets are divided among the available "threads" on the different machines. The DATA step code is copied and executed simultaneously on each thread, processing only the portion of data local to that thread.

Example of Parallel Processing: In the example below, we add transaction fees to a large banking table. Writing the automatic variable _THREADID_ to the log shows that the code runs on several different threads (e.g., 4 threads).

DATA mycas.updated_transaction_history;
   SET mycas.transaction_history;
   /* Logic to add a fee depending on the year */
   IF year(transaction_dt)=2013 THEN fee=1;
   /* ... other years ... */
   new_transaction_amt=transaction_amt+fee;
   put _threadid_=; /* Write the thread number to the log */
RUN;

The log displays the message "NOTE: Running DATA step in Cloud Analytic Services." followed by lines such as _THREADID_=1, _THREADID_=2, and so on.
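To make the per-thread behavior concrete, here is a hedged sketch that counts how many rows each thread processed. It relies on the fact that, in CAS, the END= flag and retained sums are evaluated separately on each thread. The output table names and the variables n_rows, thread_id, and total_rows are hypothetical.

```sas
/* Each thread counts its own rows and writes one row when its input ends */
data mycas.rows_per_thread(keep=thread_id n_rows);
   set mycas.transaction_history end=last_row;
   n_rows + 1;                 /* retained sum, private to each thread */
   if last_row then do;
      thread_id = _threadid_;
      output;
   end;
run;

/* Combine the partial counts on a single thread to get one grand total */
data mycas.row_total(keep=total_rows) / single=yes;
   set mycas.rows_per_thread end=done;
   total_rows + n_rows;
   if done then output;
run;
```

The second step uses SINGLE=YES because a final aggregation needs all partial results in one place. Without it, each thread would emit its own partial total.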

Caution with BY-group Processing: Due to data distribution, the order of BY-group processing is not guaranteed in CAS, unlike in classic SAS. Groups are distributed across nodes, and each node processes its groups independently to speed up the process.
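A minimal sketch of BY-group processing in CAS, assuming the table has an account identifier column (account_id is a hypothetical name, as are the output table and total_amt). No prior sort is needed because CAS groups the rows itself, and FIRST. and LAST. work within each group. The order in which groups appear in the output table is not guaranteed.

```sas
data mycas.account_totals(keep=account_id total_amt);
   set mycas.updated_transaction_history;
   by account_id;                          /* hypothetical grouping column */
   if first.account_id then total_amt = 0;
   total_amt + new_transaction_amt;        /* running total within the group */
   if last.account_id then output;         /* one row per account */
run;
```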

Step 5: Use PROC CAS to Execute Actions

PROC CAS is the interface for executing the CAS Language (CASL). CASL interacts with the server via "actions". Actions are requests for specific tasks (table management, analyses, etc.), grouped into "action sets".
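A quick way to explore action sets is to ask the server itself. The sketch below lists the action sets loaded in the session and then lists the actions in the table action set.

```sas
proc cas;
   session casauto;

   /* List the action sets currently loaded in the session */
   builtins.actionSetInfo;

   /* List the actions that belong to the table action set */
   builtins.help / actionSet="table";
quit;
```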

Here are examples of actions from the table action set for managing data:

  1. Check if a table exists and retrieve info:

PROC CAS;
   SESSION casauto;
   /* Check whether the table exists */
   TABLE.tableexists RESULT=r / caslib='casuser' name='updated_transaction_history';

   IF (r.exists) THEN DO;
      /* Get information about the table */
      TABLE.tableinfo / caslib='casuser' name='updated_transaction_history';
      /* Fetch a sample of rows */
      TABLE.fetch / TABLE={caslib='casuser', name='updated_transaction_history'} from=1 to=20;
   END;
QUIT;
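Two further actions from the same action set are often useful for a first look at a table. This sketch shows column metadata and then fetches the five most recent rows, assuming transaction_dt is a date column as in the DATA step example.

```sas
proc cas;
   session casauto;

   /* Column names, types, lengths, and formats */
   table.columnInfo / table={caslib="casuser", name="updated_transaction_history"};

   /* Fetch five rows, sorted by transaction date in descending order */
   table.fetch /
      table={caslib="casuser", name="updated_transaction_history"}
      sortBy={{name="transaction_dt", order="descending"}}
      to=5;
quit;
```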

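  2. Promote a table (make it global): Newly created tables usually have "session" scope. The promote action gives a table global scope so that other sessions can see it. Note that a table promoted into a personal caslib such as CASUSER remains visible only to its owner; to share it with other users, the target caslib must be one they can access.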

PROC CAS;
   SESSION casauto;
   TABLE.promote / caslib='casuser' name='updated_transaction_history' targetlib='casuser';
QUIT;
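Promotion can be guarded with the value returned by tableExists, which is 0 when the table is not found, 1 for a session-scope table, and 2 for a global-scope table. The following is a sketch of that check, not a required pattern.

```sas
proc cas;
   session casauto;
   table.tableExists result=r / caslib="casuser" name="updated_transaction_history";

   /* Promote only when the table exists with session scope */
   if r.exists = 1 then do;
      table.promote / caslib="casuser" name="updated_transaction_history" targetLib="casuser";
   end;
   else print "Nothing promoted; tableExists returned " r.exists;
quit;
```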

Step 6: Analyze the Data

Analysis can be done via CASL actions or SAS Viya procedures.

Method A: Use a CAS Action (e.g., simple.freq)

The simple action set provides basic analytical functions.

PROC CAS;
   SESSION casauto;
   /* Frequency distribution of transaction status, grouped by year */
   SIMPLE.freq /
      inputs={'transaction_status'}
      TABLE={caslib='casuser', name='updated_transaction_history', groupby={name='year'}};
QUIT;
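The same action set also offers simple.summary for descriptive statistics. The sketch below computes the count, mean, and sum of fee by year and stores the result as a CAS table. The output table name fee_summary is hypothetical, and the year column is assumed to exist as in the example above.

```sas
proc cas;
   session casauto;
   simple.summary /
      table={caslib="casuser", name="updated_transaction_history", groupBy={"year"}}
      inputs={"fee"}
      subSet={"N", "MEAN", "SUM"}
      casOut={caslib="casuser", name="fee_summary", replace=true};
quit;
```

Writing the result to a CAS table keeps it on the server, in line with the advice to minimize data movement.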

Method B: Use a SAS Viya Procedure (e.g., PROC MDSUMMARY)

SAS Viya offers many procedures (statistics, data mining, machine learning) that run in CAS. PROC MDSUMMARY computes descriptive statistics in CAS and writes them to an output table.

/* Data preparation: add a month column */
DATA mycas.updated_transaction_history2;
   SET mycas.updated_transaction_history;
   month=put(transaction_dt,monname8.);
RUN;

/* Compute sums by year and month */
PROC MDSUMMARY DATA=mycas.updated_transaction_history2(where=(fee ne 0));
   var fee;
   groupby year month / out=mycas.summary_transaction_history;
RUN;

/* Display the results (MDSUMMARY produces only an output table) */
PROC PRINT DATA=mycas.summary_transaction_history label;
   title 'Summary of Transaction History';
   var year month _sum_;
   label _sum_='Total collected ($)';
   FORMAT _sum_ dollar8.;
RUN;
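For comparison, a familiar procedure can produce a similar report directly. This sketch assumes a recent SAS Viya release in which PROC MEANS is CAS-enabled, so the summarization runs in CAS when the input is a CAS table. It is an alternative illustration, not the method used in the example above.

```sas
/* Sum of fees by year and month, printed directly by the procedure */
proc means data=mycas.updated_transaction_history2(where=(fee ne 0)) sum;
   class year month;
   var fee;
run;
```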

SAS Cloud Analytic Services (CAS) offers a powerful, distributed, and in-memory environment to accelerate your analyses. By following the steps of connecting, loading data via caslibs, and using familiar tools like the DATA step or new ones like PROC CAS, you can start harnessing the power of SAS Viya today. Don't be afraid to try something new!