
CAS Storage Optimization: Mastering the DVR Format When Loading Data

Simon 24/05/2023

In SAS© Viya environments, RAM management is critical. As data volumes grow, intelligent storage strategies become essential. One of the most effective methods is the memoryFormat="DVR" (Duplicate Value Reduction) option, which significantly reduces the size of in-memory tables by compressing repeated values.

However, applying this format when loading data from a classic SAS© library (like WORK) to CAS can be technically confusing. This article explores the limitations of classic methods and presents the optimal solution.

The Challenge: Loading SAS© Data to CAS with DVR

The standard approach for loading a SAS© table (located on the Compute Server) to the CAS server typically involves using PROC CASUTIL.

/* Standard method: does not allow direct DVR optimization */
PROC CASUTIL;
   load DATA=maTableWork casout="maTableCAS";
QUIT;

The problem is that the LOAD DATA statement of PROC CASUTIL offers no native option for setting memoryFormat. The table is therefore loaded in the default format and consumes more memory than necessary.

The False Good Idea: Two-Step Loading

Faced with this limitation, a commonly attempted workaround consists of:

  1. Loading the table normally (standard format).

  2. Using the table.copyTable action to create a compressed copy in DVR.

  3. Deleting the original table.
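For reference, the two-step workaround looks roughly like the sketch below. The table names (somedata, somedata_tmp) and the casuser caslib are placeholders, and the table.copyTable parameters shown (table= and casOut= carrying memoryFormat) are an assumption based on the usual CAS action conventions; check them against your Viya release. Between step 1 and step 3, both copies occupy memory at the same time.

```sas
/* Two-step workaround (not recommended): sketch only.             */
/* Assumes a WORK table named somedata and the casuser caslib.      */
proc casutil;
   /* Step 1: standard load, default memory format */
   load data=work.somedata outcaslib="casuser" casout="somedata_tmp" replace;
quit;

proc cas;
   /* Step 2: copy into a DVR-formatted table                       */
   /* (assumption: copyTable accepts table= and casOut= like this)  */
   table.copyTable /
      table={caslib="casuser", name="somedata_tmp"}
      casOut={caslib="casuser", name="somedata", memoryFormat="DVR", replace=true};

   /* Step 3: drop the uncompressed original to free its memory */
   table.dropTable / caslib="casuser" name="somedata_tmp" quiet=true;
quit;
```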

While functional, this method is inefficient ("clunky"): it temporarily doubles memory usage, because the standard and DVR copies coexist until the original is dropped, and it adds unnecessary I/O.

The Optimal Solution: The UPLOAD Statement in PROC CAS

To load data directly from the SAS© environment (Compute Server) to CAS while applying the DVR format in a single step, the best practice is to use the UPLOAD statement within PROC CAS.

The UPLOAD statement not only transfers the file but also offers granular control over the output table parameters (casout) and the import options (importoptions).

Advantages of this method

  1. Direct: No temporary table needed.

  2. DVR Compression: Immediate application of duplicate reduction.

  3. VARCHAR Conversion: Fixed-length character columns can be converted to VARCHAR on the fly, further reducing memory use.

Code Example
Here's how to load a table located in the WORK library directly in DVR format:
PROC CAS;
   /* Drop any previous version of the table, if necessary */
   ACTION TABLE.droptable / caslib="casuser" name="somedata" quiet=true;

   /* Optimized load */
   upload /
      /* Dynamically resolve the physical path of the SAS table */
      path="%sysfunc(pathname(work))/somedata.sas7bdat"

      /* CAS output table settings */
      casout={
         caslib="casuser",
         name="somedata",
         promote=true,        /* Make the table global */
         memoryformat="DVR",  /* Enable DVR compression */
         replication=0        /* Adjust replication as needed */
      }

      /* Additional import options */
      importoptions={
         filetype="BASESAS",
         varcharConversion=17 /* Convert CHAR columns longer than 16 bytes to VARCHAR */
      }
   ;
QUIT;

Why does this work better?

The UPLOAD statement in PROC CAS acts as a direct bridge. By specifying the physical path of the .sas7bdat file (via %sysfunc(pathname(work))), you have PROC CAS read the file on the Compute Server side and transfer it to the CAS server, which loads it immediately according to your casout specifications.
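If you load many WORK tables this way, the call can be wrapped in a macro. The sketch below is a hypothetical helper: the name %upload_dvr and its parameters are illustrative, not part of SAS. It reuses the upload call from the code example and assumes that the .sas7bdat file name on disk is the lowercase table name.

```sas
/* Hypothetical helper macro: uploads a WORK table to CAS in DVR format. */
/* Macro name and parameters are illustrative, not a SAS-supplied API.   */
%macro upload_dvr(ds=, caslib=casuser, varcharlen=17);
   %local tbl;
   %let tbl = %lowcase(&ds);   /* .sas7bdat files are stored in lowercase */

   proc cas;
      /* Remove any previous copy of the table */
      table.dropTable / caslib="&caslib" name="&tbl" quiet=true;

      /* Upload the file from the Compute Server, loading it directly as DVR */
      upload /
         path="%sysfunc(pathname(work))/&tbl..sas7bdat"
         casout={caslib="&caslib", name="&tbl", promote=true,
                 memoryformat="DVR", replication=0}
         importoptions={filetype="BASESAS", varcharConversion=&varcharlen};
   quit;
%mend upload_dvr;

/* Usage: upload WORK.SOMEDATA to CASUSER */
%upload_dvr(ds=somedata);
```

The double period in &tbl..sas7bdat is intentional: the first period ends the macro variable reference, and the second is the literal dot before the file extension.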

The varcharConversion option is an ideal complement to DVR. DVR compresses repeated values, while converting long CHAR columns to VARCHAR stores each value at its actual length instead of padding it to the declared column width, thus maximizing storage efficiency.
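To check that both optimizations took effect, you can inspect the loaded table. This is a minimal sketch, assuming the casuser.somedata table from the code example: table.columnInfo lists each column's type, so long CHAR columns should now appear as VARCHAR, and table.tableDetails reports the table's in-memory size, which you can compare with a load done without DVR. The exact columns reported may vary by Viya release.

```sas
proc cas;
   /* List column types: CHAR columns of 17 bytes or more should now be VARCHAR */
   table.columnInfo / table={caslib="casuser", name="somedata"};

   /* Report the in-memory size of the table (compare with a non-DVR load) */
   table.tableDetails / caslib="casuser" name="somedata";
quit;
```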

The DVR format is a powerful lever for optimizing your SAS© Viya environment. To implement it effectively from existing SAS© data, abandon the intermediate steps of PROC CASUTIL in favor of the UPLOAD statement in PROC CAS. You will gain performance, code simplicity, and memory space.