ETL SAS VIYA CAS

SAS Viya Memory Optimization: Saving Space with CAS Table Compression

This code is also available in: Deutsch Español
Difficulty Level
Intermediate
Published on :
Simon

Expert Advice

Simon
Expert SAS et fondateur.

In SAS Viya, compression isn't a "set it and forget it" feature. Because CAS is a massively parallel processing (MPP) engine, the way you store data directly impacts how it is distributed across your nodes.

Compression vs. Partitioning
Your code demonstrates combining PARTITION= and COMPRESS=. This is an expert-level pattern. By partitioning the data, you reduce the amount of data that needs to be decompressed for specific queries, as CAS only "touches" the partitions required for the analysis.
SAS© Cloud Analytic Services supports in-memory table compression. When a table is loaded or created via a DATA step or PROC CAS with the COMPRESS=YES option, the CAS server compresses the rows. Compression applies to all variables (character and numeric) and all rows in the table. Although compression reduces memory usage, it can lead to performance degradation for operations requiring data block decompression. Compression ratios vary depending on the data type, with character variables containing many blanks generally compressing better. Specific considerations apply when using WHERE clauses or SASHDAT tables.
Data Analysis

Type : CREATION_INTERNE


Examples use generated data (datalines) or the SASHELP library.

1 Code Block
DATA Step Data
Explanation :
This example illustrates creating a compressed CAS table from an existing table using a DATA Step. The `compress=yes` option is specified in the output table options, instructing CAS to compress data in memory. A `PROC CASUTIL CONTENTS` is used to verify table properties, including compression status.
Copied!
1/* Établit une connexion CAS (à adapter à votre environnement) */
2LIBNAME mycas cas caslib=casuser;
3 
4/* Crée une table non compressée pour l'exemple */
5DATA mycas.ventes_non_compresse;
6 LENGTH produit $20 quantite prix 8;
7 INPUT produit $ quantite prix;
8 DATALINES;
9Pomme 10 1.50
10Banane 5 0.75
11Orange 12 1.20
12;
13RUN;
14 
15/* Crée une copie compressée de la table 'ventes_non_compresse' */
16DATA mycas.ventes_compresse (compress=yes);
17 SET mycas.ventes_non_compresse;
18RUN;
19 
20/* Affiche les propriétés de la table compressée (y compris la compression) */
21PROC CASUTIL incaslib="casuser" outcaslib="casuser";
22 contents casdata="ventes_compresse";
23RUN;
24QUIT;
2 Code Block
DATA Step Data
Explanation :
This example shows how to add new rows to an already compressed CAS table. The `append=yes` option is used in the DATA Step. New rows, even if not explicitly compressed in the append DATA Step, will be automatically compressed by the CAS server because the `mycas.ventes_compresse` table is already defined as compressed. This is demonstrated by the absence of the `compress=yes` option in the append DATA Step.
Copied!
1/* Établit une connexion CAS (à adapter à votre environnement) */
2LIBNAME mycas cas caslib=casuser;
3 
4/* Assurez-vous que la table compressée de l'exemple 1 existe */
5PROC CASUTIL incaslib="casuser" outcaslib="casuser";
6 load casdata="ventes_non_compresse" outcasdata="ventes_compresse" replace options=(compress='YES');
7RUN;
8 
9/* Ajout de nouvelles lignes à la table compressée existante */
10DATA mycas.ventes_compresse (append=yes);
11 LENGTH produit $20 quantite prix 8;
12 INPUT produit $ quantite prix;
13 DATALINES;
14Poire 7 2.10
15Kiwi 8 1.80
16;
17RUN;
18 
19/* Affiche le contenu de la table compressée mise à jour */
20PROC PRINT DATA=mycas.ventes_compresse;
21RUN;
3 Code Block
DATA Step Data
Explanation :
This example creates a CAS table that is both partitioned and compressed. The `partition=(categorie)` option specifies the partitioning variable, and `compress=yes` enables compression for the in-memory table. New rows added to this table will be automatically partitioned and compressed accordingly. `PROC CASUTIL CONTENTS` with the `details` option allows verification of these attributes.
Copied!
1/* Établit une connexion CAS (à adapter à votre environnement) */
2LIBNAME mycas cas caslib=casuser;
3 
4/* Crée une table SAS en local pour l'exemple */
5DATA temp_produits;
6 LENGTH categorie $10 produit $20 quantite prix 8;
7 INPUT categorie $ produit $ quantite prix;
8 DATALINES;
9Fruits Pomme 10 1.50
10Fruits Banane 5 0.75
11Legumes Carotte 20 0.50
12Legumes Salade 3 2.00
13;
14RUN;
15 
16/* Charge et compresse la table, en la partitionnant par 'categorie' */
17DATA mycas.produits_part_comp (partition=(categorie) compress=yes);
18 SET temp_produits;
19RUN;
20 
21/* Affiche les propriétés de la table (y compris les informations de partition et de compression) */
22PROC CASUTIL incaslib="casuser" outcaslib="casuser";
23 contents casdata="produits_part_comp" details;
24RUN;
25QUIT;
4 Code Block
PROC CAS Data
Explanation :
This example uses `PROC CAS` to load a CSV file into memory and compress it directly. The `table.loadTable` method is used, and the `compress=TRUE` option is passed in `casOut` to ensure the table is compressed upon loading. `table.tableInfo` verifies that the table is indeed compressed by displaying details such as compressed and uncompressed size. Note that for a table already present in CAS (e.g., SASHDAT), the behavior of COMPRESS= may vary.
Copied!
1/* Établit une connexion CAS (à adapter à votre environnement) */
2options casport=5570 cashost="cloud.example.com";
3 
4/* Crée une table SASHELP simple à charger */
5DATA _null_;
6 SET sashelp.class;
7 file "/tmp/class.csv" dlm=',';
8 IF _n_ = 1 THEN put 'Name,Sex,Age,Height,Weight';
9 put name sex age height weight;
10RUN;
11 
12/* Charge le fichier CSV en mémoire CAS en spécifiant la compression */
13PROC CAS;
14 SESSION casauto;
15 TABLE.loadTable /
16 caslib="casuser",
17 path="/tmp/class.csv",
18 casOut={name="class_compresse_cas", compress=TRUE, replace=TRUE},
19 promote=TRUE;
20 RUN;
21 
22 /* Affiche les informations de la table pour vérifier la compression */
23 TABLE.tableInfo / caslib="casuser", name="class_compresse_cas", fullinfo=TRUE;
24 RUN;
25QUIT;
Pro Tip
While the DATA step is convenient for compression, using PROC CAS or PROC CASUTIL to load data (as seen in Example 4) is generally more efficient for large external files. It allows the CAS server to handle the compression during the initial parallel load, rather than forcing the SAS Compute Server to process it row-by-row before sending it to CAS.
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved.


Related Documentation

Aucune documentation spécifique pour cette catégorie.