Detailed Journey of a DATA Step Program

This document describes how the traditional SAS DATA step integrates with the distributed CAS processing engine. It explains how classic SAS statements such as SET, KEEP, and DROP work in CAS, and emphasizes that the DATA step executes where the data is stored. It also covers the default multithreaded processing of DATA steps in CAS, the conditions a DATA step must meet to execute in CAS (the input and output tables must be CAS tables, or the step must write a CAS output table with at least one variable), the role of caslibs as containers for files and in-memory tables, and the recommended use of PROC CASUTIL for loading SAS data sets into CAS. Finally, it mentions the DSACCEL= system option, which controls where the DATA step executes.
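As a minimal sketch of the multithreaded behavior described above, the step below counts how many rows each CAS thread processes. The table names cars_threads and thread_counts are illustrative, and the load targets the session's active caslib, which is assumed to be writable. _THREADID_ and _NTHREADS_ are automatic variables that are available when the DATA step runs in CAS, and END= is set once per thread rather than once per table.

```sas
cas casauto;                 /* start (or reuse) a CAS session */
LIBNAME mycas cas;           /* libref bound to the session's active caslib (assumption) */

options dsaccel=any;         /* allow the DATA step to run in CAS when it qualifies */

/* Load a SAS data set so that the DATA step input is a CAS table. */
PROC CASUTIL;
   load DATA=sashelp.cars casout='cars_threads' replace;
QUIT;

/* Input and output are both CAS tables, so the step can run in CAS. */
DATA mycas.thread_counts;
   SET mycas.cars_threads end=last;
   rows + 1;                       /* running row count, kept separately by each thread */
   IF last THEN DO;                /* true on the last row seen by each thread */
      thread   = _threadid_;       /* id of the thread that processed these rows */
      nthreads = _nthreads_;       /* number of threads running the step */
      output;
   END;
   keep thread nthreads rows;
RUN;

PROC PRINT DATA=mycas.thread_counts;
   title 'Rows processed per CAS thread';
RUN;
```

The log should indicate whether the step ran in CAS. A small table such as SASHELP.CARS may end up on only a few threads, so the number of output rows can be lower than the value of nthreads.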

The examples use generated data (datalines) or the SASHELP library, so each code block is self-contained.

1 Code Block
PROC CASUTIL / PROC PRINT Data
Explanation :
This example starts a CAS session, assigns a libref to it, and loads the 'cars' table from the SASHELP library into a CAS table named 'cars_basic'. It then displays the first 10 observations of this CAS table for verification.
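cas casauto;                              /* start a CAS session named casauto */
LIBNAME mycas cas caslib='casuserhdfs';   /* bind the libref to the caslib used by LOAD below; adjust to your site */
caslib _all_ assign;                      /* assign librefs for the available caslibs */

PROC CASUTIL;
 load DATA=sashelp.cars
 outcaslib='casuserhdfs'
 casout='cars_basic' replace;
RUN; QUIT;

PROC PRINT DATA=mycas.cars_basic(obs=10);
title 'First 10 observations of SASHELP.CARS in CAS';
RUN;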
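PROC CASUTIL LOAD copies the whole data set into CAS. If only the first 10 observations should be loaded, one option is a DATA step with the OBS= data set option, sketched below. The table name cars_first10 is illustrative, and the libref is assumed to point at a writable active caslib. Because the input is a SAS data set rather than a CAS table, this step is expected to run in the SAS session and only write its output to CAS.

```sas
cas casauto;
LIBNAME mycas cas;   /* libref bound to the session's active caslib (assumption) */

/* The input is a SAS data set, so the step does not meet the conditions
   for running in CAS; only the 10 selected rows are written to CAS. */
DATA mycas.cars_first10;
   SET sashelp.cars(obs=10);
RUN;

PROC PRINT DATA=mycas.cars_first10;
   title 'SASHELP.CARS rows loaded into CAS with OBS=10';
RUN;
```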
2 Code Block
DATA STEP Data
Explanation :
After loading the 'cars' table into CAS, this code block creates a new CAS table, 'cars_transformed_mpg', by calculating a new variable 'CombinedMPG' from the MPG_Highway and MPG_City variables. It uses a simple weighting (40% highway, 60% city) and formats the new variable, then displays the first 10 rows.
cas casauto;                              /* start a CAS session named casauto */
LIBNAME mycas cas caslib='casuserhdfs';   /* bind the libref to the caslib used by LOAD below; adjust to your site */
caslib _all_ assign;

PROC CASUTIL;
 load DATA=sashelp.cars
 outcaslib='casuserhdfs'
 casout='cars_transformed' replace;
RUN; QUIT;

/* promote=yes gives the table global scope; the step fails on a rerun
   if a global table with this name already exists. */
DATA mycas.cars_transformed_mpg(promote=yes);
 SET mycas.cars_transformed;
 CombinedMPG = (MPG_Highway * 0.40) + (MPG_City * 0.60);   /* 40% highway, 60% city */
 FORMAT CombinedMPG 5.1;
RUN;

PROC PRINT DATA=mycas.cars_transformed_mpg(obs=10);
 var Make Model CombinedMPG;
 title 'Calculated Combined MPG (first 10 observations)';
RUN;
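The same calculation can be combined with KEEP and DROP, which behave in CAS as they do in a traditional DATA step. The sketch below is an illustration under assumptions: the table names cars_keepdrop and cars_mpg_slim are made up for this example, and the libref is assumed to point at a writable active caslib.

```sas
cas casauto;
LIBNAME mycas cas;   /* libref bound to the session's active caslib (assumption) */

PROC CASUTIL;
   load DATA=sashelp.cars casout='cars_keepdrop' replace;
QUIT;

DATA mycas.cars_mpg_slim;
   /* KEEP= limits the columns read from the input CAS table */
   SET mycas.cars_keepdrop(keep=Make Model MPG_City MPG_Highway);
   CombinedMPG = (MPG_Highway * 0.40) + (MPG_City * 0.60);
   FORMAT CombinedMPG 5.1;
   /* DROP removes the source columns from the output table */
   DROP MPG_City MPG_Highway;
RUN;

PROC PRINT DATA=mycas.cars_mpg_slim(obs=10);
   title 'Combined MPG with KEEP= and DROP';
RUN;
```

Without promote=yes the output table has session scope, so the step can be rerun without dropping the table first.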
3 Code Block
DATA STEP / PROC PRINT Data
Explanation :
This example first loads 'cars' data into CAS. Then, it executes a complex DATA Step in CAS that filters vehicles to include only 'USA' origin SUVs or 'Asia' origin Sedans. A new variable 'AvgMPG' is calculated. The 'BY Origin Type' processing allows grouping of results, and a message is displayed in the log for each new group to illustrate group processing. Finally, it displays the filtered and processed results.
cas casauto;                              /* start a CAS session named casauto */
LIBNAME mycas cas caslib='casuserhdfs';   /* bind the libref to the caslib used by LOAD below; adjust to your site */
caslib _all_ assign;

PROC CASUTIL;
 load DATA=sashelp.cars
 outcaslib='casuserhdfs'
 casout='cars_filtered' replace;
RUN; QUIT;

DATA mycas.cars_filtered_grouped(promote=yes);
 SET mycas.cars_filtered;
 BY Origin Type;
 where (Type = 'SUV' and Origin = 'USA') or (Type = 'Sedan' and Origin = 'Asia');
 AvgMPG = mean(MPG_City, MPG_Highway);
 /* first.Type marks the first row of each Origin/Type group */
 IF first.Type THEN put '---- New Origin and Type ----';
 put Origin= Type= AvgMPG= Make=;
 keep Origin Type Make AvgMPG;
RUN;

PROC PRINT DATA=mycas.cars_filtered_grouped;
 title 'Average MPG of American SUVs and Asian Sedans';
RUN;
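BY-group processing in CAS also supports the usual first./last. accumulation pattern, since the rows of a BY group are processed together. As a hedged sketch, the step below counts the models in each Origin/Type group. The table names cars_bygroups and cars_group_counts and the variable n_models are illustrative, and the libref is assumed to point at a writable active caslib.

```sas
cas casauto;
LIBNAME mycas cas;   /* libref bound to the session's active caslib (assumption) */

PROC CASUTIL;
   load DATA=sashelp.cars casout='cars_bygroups' replace;
QUIT;

DATA mycas.cars_group_counts;
   SET mycas.cars_bygroups;
   BY Origin Type;                      /* no prior PROC SORT is needed in CAS */
   IF first.Type THEN n_models = 0;     /* reset at the start of each Origin/Type group */
   n_models + 1;                        /* sum statement: value is retained across rows */
   IF last.Type THEN output;            /* write one row per group */
   keep Origin Type n_models;
RUN;

PROC PRINT DATA=mycas.cars_group_counts;
   title 'Number of models per Origin and Type';
RUN;
```

The order of rows in the output CAS table is not guaranteed, so add an ordering step when a sorted report is required.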
4 Code Block
PROC CASUTIL Data
Explanation :
This example highlights PROC CASUTIL's table management capabilities. It first creates a small demonstration CAS table (promoted to global scope by promote=yes). Then it uses 'LIST TABLES' to view the available tables, 'CONTENTS' to get detailed information about the table, and 'DROPTABLE' to remove it from CAS memory. A second 'LIST TABLES' statement confirms the deletion. This demonstrates the basic lifecycle of an in-memory CAS table.
cas casauto;                              /* start a CAS session named casauto */
LIBNAME mycas cas caslib='casuserhdfs';   /* bind the libref to the caslib used below; adjust to your site */
caslib _all_ assign;

* Create a small CAS table for the demonstration;
DATA mycas.temp_table(promote=yes);
 x = 1;
 y = 'Test';
RUN;

PROC CASUTIL;
 list tables incaslib='casuserhdfs'; * List all tables in the specified caslib;
 contents casdata='temp_table' incaslib='casuserhdfs'; * Show the details of temp_table;
 droptable casdata='temp_table' incaslib='casuserhdfs'; * Remove temp_table from CAS memory;
 list tables incaslib='casuserhdfs'; * Verify that the table has been removed;
RUN; QUIT;
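Because promoting a table fails when a global table with the same name already exists, a rerunnable variant can drop the table first. The sketch below assumes the libref points at a writable active caslib. The QUIET option on DROPTABLE suppresses the error when the table is not there.

```sas
cas casauto;
LIBNAME mycas cas;   /* libref bound to the session's active caslib (assumption) */

/* Drop any earlier copy; QUIET avoids an error if the table does not exist. */
PROC CASUTIL;
   droptable casdata='temp_table' quiet;
QUIT;

/* Recreate the table and promote it to global scope. */
DATA mycas.temp_table(promote=yes);
   x = 1;
   y = 'Test';
RUN;

/* List the in-memory tables of the active caslib to confirm the result. */
PROC CASUTIL;
   list tables;
QUIT;
```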
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved.