SAS VIYA Guide

Understanding and Managing Caslibs in SAS Viya

Simon 23/04/2021

In the SAS Viya architecture, effective data management relies on a fundamental concept: the Caslib. A Caslib is an in-memory space on the CAS (Cloud Analytic Services) server that holds tables, access controls, and information about data sources.

This article explores the different types of Caslibs, their scope, and how to manipulate them using SAS code, with a focus on best practices for loading and sharing data.

What is a Caslib?

A Caslib acts as a unified access point. It connects the CAS server to:

  1. External data sources (files, databases like Oracle or Hadoop).

  2. In-memory tables that have been loaded onto the CAS server.

It also associates access controls that define which user groups or individuals are allowed to interact with the data.

Caslib Types

There are three main categories of Caslibs, defined by how they are created and managed:

1. Personal Caslib

This library is configured during the CAS server installation. Whenever a CAS session starts, the personal Caslib is available with global scope for the current user only. It gives that user access to their CAS tables from any session opened under the same user ID. By default, the personal Caslib is named CASUSER.

2. Predefined Caslib

Managed by CAS administrators, these libraries have a global scope. They are typically used for popular data sources shared by a wide range of users (for example, a Hadoop-Hive or Oracle connection common to the entire team). The administrator manages access permissions.

3. Manually Added Caslib

Authorized users can add Caslibs with a CASLIB statement (for example, in SAS Studio). This is the preferred method for ad hoc data access, when the user does not necessarily want to share the data with the entire server.
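As a minimal sketch of a manually added Caslib, the statements below define a path-based Caslib and then list the Caslibs visible to the session. The session name mySession, the Caslib name mydata, and the directory /data/projects/sales are hypothetical placeholders; the host name is borrowed from the other examples in this article. Because GLOBAL is not specified, the Caslib exists only for the current session.

```sas
/* Start a CAS session (the host name is a placeholder) */
cas mySession host="myServer.com";

/* Add a Caslib that points to a directory visible to the CAS server.
   "mydata" and the path are hypothetical; substitute your own values. */
caslib mydata datasource=(srctype="path") path="/data/projects/sales"
    desc="Ad hoc sales data";

/* Write the definition of every Caslib visible to this session to the log */
caslib _all_ list;
```

The log output of the last statement is a convenient way to check which personal, predefined, and manually added Caslibs the session can currently see.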


Caslib Scope: Session vs. Global

The concept of scope is crucial for understanding data visibility and persistence.

Session-Scope Caslib

If a Caslib is defined without the GLOBAL option, it is limited to the current session.

  • Availability: Tables loaded into this Caslib are only visible to the user's specific CAS session.

  • Persistence: If the user opens a new session, the Caslib and its tables will no longer be accessible.

Code Example (Session-Scope): The code below creates a Hive connection that is local to the session. Note the absence of both the GLOBAL option on the CASLIB statement and the PROMOTE option on the LOAD statement.

CAS mySession host="myServer.com" SESSOPTS=(CASLIB=casuser TIMEOUT=999 LOCALE="en_US");

/* Assign a standard Hive Caslib (session scope) */
caslib hivelib desc="HIVE Caslib"
    datasource=(SRCTYPE="HIVE",SERVER="myServerHadoop.com",
    HADOOPCONFIGDIR="/opt/sas/hadoop/client_conf/",
    HADOOPJARPATH="/opt/sas/hadoop/client_jar/",
    schema="default", dfDebug=sqlinfo);

/* Load the Hive table into memory */
PROC CASUTIL;
    load casdata="stocks" casout="stocks" outcaslib="hivelib" incaslib="hivelib";
QUIT;

If the user tries to access this stocks table from a new session (mySession2), they will receive an error indicating that the Caslib does not exist:

ERROR: The caslib 'hivelib' does not exist in this session.

Global-Scope Caslib

A Caslib defined with the GLOBAL option is accessible to other users or sessions, subject to access controls.

  • Sharing: For a table to be shared, it must be loaded with the PROMOTE option.

  • Persistence: The library definition persists beyond the single session.

Code Example (Global-Scope): Here, the GLOBAL option makes the library persistent, and PROMOTE makes the table accessible to others.

CAS mySession host="myServer.com" SESSOPTS=(CASLIB=casuser TIMEOUT=999 LOCALE="en_US");

/* Assign a global Hive Caslib */
caslib hivelib desc="HIVE Caslib"
    datasource=(SRCTYPE="HIVE",SERVER="myServerHadoop.com",
    HADOOPCONFIGDIR="/opt/sas/hadoop/client_conf/",
    HADOOPJARPATH="/opt/sas/hadoop/client_jar/",
    schema="default", dfDebug=sqlinfo) GLOBAL;

/* Load and promote the table */
PROC CASUTIL;
    load casdata="stocks" casout="stocks" outcaslib="hivelib" incaslib="hivelib" PROMOTE;
QUIT;

Once promoted, the table is visible to any user with access rights. Another user can then see and use this table after running the CASLIB _ALL_ ASSIGN; statement.
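To illustrate the sharing step, here is a sketch of what a second session might run to reach the promoted table. It assumes the global Caslib hivelib and the promoted table stocks from the example above, and that the user has the required access rights; the session name mySession2 is a placeholder.

```sas
/* A second, independent CAS session (the name mySession2 is a placeholder) */
cas mySession2 host="myServer.com";

/* Assign a SAS libref for each Caslib this session is allowed to see */
caslib _all_ assign;

/* List the in-memory tables of the global Caslib */
proc casutil;
    list tables incaslib="hivelib";
quit;

/* Read the promoted table through the libref created above */
proc print data=hivelib.stocks(obs=5);
run;
```

If the table had been loaded without PROMOTE, the Caslib would still appear in the second session, but the stocks table would not be listed.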

Data Loading Mechanisms

It is important to distinguish between two mechanisms when loading data into a Caslib:

  1. Client-side load: The data resides on the client side (for example, in the SAS session behind SAS Studio). The data is transferred from the client to the CAS server.

    • Syntax: Use load data=... in PROC CASUTIL.

  2. Server-side load: The data files are physically accessible to the CAS server (on the CAS controller or an NFS mount). This is often faster for large volumes.

    • Syntax: Use load casdata=... in PROC CASUTIL; the file or table is read through the Caslib's data source (such as PATH or HIVE).

    • Technical note: For path-based Caslibs (PATH), the physical path must be on the CAS controller server (or a shared drive mounted on it).
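The two mechanisms can be sketched side by side as follows. This is an illustration under stated assumptions, not a reference implementation: the Caslib name mydata, the directory /data/shared, and the file sales.sashdat are hypothetical, while SASHELP.CARS is a sample data set shipped with SAS.

```sas
cas mySession host="myServer.com";

/* Hypothetical path-based Caslib; the directory must be readable by the CAS server */
caslib mydata datasource=(srctype="path") path="/data/shared";

proc casutil;
    /* Client-side load: the SAS session sends SASHELP.CARS to the CAS server */
    load data=sashelp.cars casout="cars" outcaslib="casuser" replace;

    /* Server-side load: CAS reads the file through the Caslib's data source.
       "sales.sashdat" is a hypothetical file located in /data/shared. */
    load casdata="sales.sashdat" incaslib="mydata" casout="sales" outcaslib="mydata" replace;
quit;
```

The REPLACE option is included only so that the sketch can be rerun in the same session without failing on an existing table.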

Administration and Deletion

CAS administrators can manage privileges through SAS Environment Manager. However, it is sometimes necessary to manage Caslibs programmatically.

To drop a Caslib (if you have the required permissions), you can use PROC CAS:

PROC CAS;
    /* Replace "NameOfTheCaslib" with the Caslib to drop */
    TABLE.dropCaslib caslib="NameOfTheCaslib" quiet=TRUE;
RUN;
QUIT;
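As an alternative sketch, the same cleanup can be written with PROC CASUTIL and the CASLIB statement rather than a CAS action. It assumes the hivelib Caslib and the stocks table from the earlier examples and sufficient permissions on both.

```sas
proc casutil;
    /* Remove the in-memory table; QUIET suppresses the error if it is not loaded */
    droptable casdata="stocks" incaslib="hivelib" quiet;
quit;

/* Drop the Caslib definition itself */
caslib hivelib drop;
```

Dropping the table first is a design choice made here for clarity: it makes explicit which in-memory data disappears before the Caslib definition is removed.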