Published on :
ETL SASHELP_CARS

Execute a DATA step in CAS on a CAS table

This code is also available in: Deutsch Español Français
Awaiting validation
Executing a DATA step in CAS means that the SAS© code is processed directly by the CAS server, allowing it to leverage in-memory distributed processing for increased performance on large volumes of data. For a DATA step to execute in CAS, input and output data must be CAS tables, accessible via a CAS engine libref (e.g., 'mycas'). It is crucial that the output table contains at least one variable, as zero-column tables are not supported in CAS. The CASUTIL procedure is recommended for loading existing SAS© datasets into CAS tables. The system automatically detects that a DATA step should run in CAS as soon as a CAS libref is used for input and/or output data. Certain limitations apply to the language elements supported in a DATA step executed in CAS.
Data Analysis

Type : SASHELP_CARS


The examples use the SASHELP.CARS dataset, which is loaded into CAS memory using PROC CASUTIL to ensure the autonomy of the examples.

1 Code Block
DATA STEP Data
Explanation :
This example illustrates the simplest use of a DATA step in CAS. It loads the SASHELP.CARS dataset into CAS memory, then creates a new CAS table named 'cars_with_eff'. A new variable 'Efficiency_Category' is added based on city fuel consumption (MPG_City). The use of 'mycas' librefs for input and output ensures the DATA step executes in CAS. A PROC PRINT is used to display the first observations of the new table, and the temporary table is dropped at the end.
Copied!
1LIBNAME mycas cas; /* Crée un libref CAS */
2 
3PROC CASUTIL outcaslib="casuser" replace;
4 load DATA=sashelp.cars casout="cars_cas";
5RUN;
6 
7DATA mycas.cars_with_eff;
8 SET mycas.cars_cas;
9 IF MPG_City > 25 THEN Efficiency_Category = 'Haute';
10 ELSE Efficiency_Category = 'Standard';
11 RUN;
12 
13PROC PRINT DATA=mycas.cars_with_eff(obs=5);
14 title 'Exemple Basique: Voitures avec catégorie d''efficacité';
15RUN;
16 
17PROC CASUTIL outcaslib="casuser" drop TABLE="cars_with_eff";
18RUN;
19 
2 Code Block
DATA STEP Data
Explanation :
This intermediate example shows how to filter data and select specific variables when executing a DATA step in CAS. It selects only 'Sports' type cars with highway fuel consumption (MPG_Highway) greater than 20, using a WHERE clause. The 'KEEP=' option is used to retain only the 'Make', 'Model', 'Type', 'Origin', and 'MSRP' variables in the output CAS table named 'sporty_cars'. This optimizes the final table by storing only relevant information.
Copied!
1LIBNAME mycas cas;
2 
3PROC CASUTIL outcaslib="casuser" replace;
4 load DATA=sashelp.cars casout="cars_cas";
5RUN;
6 
7DATA mycas.sporty_cars(keep=Make Model Type Origin MSRP);
8 SET mycas.cars_cas;
9 where Type = 'Sports' and MPG_Highway > 20;
10 RUN;
11 
12PROC PRINT DATA=mycas.sporty_cars(obs=5);
13 title 'Exemple Intermédiaire: Voitures de sport économes en carburant';
14RUN;
15 
16PROC CASUTIL outcaslib="casuser" drop TABLE="sporty_cars";
17RUN;
18 
3 Code Block
DATA STEP Data
Explanation :
This advanced example demonstrates the use of automatic variables specific to distributed execution in CAS, such as '_N_', '_THREADID_', and '_HOSTNAME_'. It creates a new 'Row_ID' variable based on '_N_' and a 'Thread_Info' variable to capture the details of the thread and CAS worker node processing the observation. Additionally, it uses CALL SYMPUTX to create macro variables for specific observations, which can be useful for auditing or tracking distributed processing. A macro is included to verify the creation of these macro variables.
Copied!
1LIBNAME mycas cas;
2 
3PROC CASUTIL outcaslib="casuser" replace;
4 load DATA=sashelp.cars casout="cars_cas";
5RUN;
6 
7DATA mycas.cars_processed;
8 SET mycas.cars_cas;
9 Row_ID = _N_;
10 Thread_Info = cats('Thread: ', _THREADID_, ' on node: ', _HOSTNAME_);
11 IF (_N_ between 1 and 10) or (mod(_N_, 100) = 0) THEN call symputx(cats('Obs',_N_), Make);
12 RUN;
13 
14%macro check_macros;
15 %global Obs1 Obs10 Obs100;
16 %IF %symexist(Obs1) %THEN %put Observation 1 (Make): &Obs1.;
17 %IF %symexist(Obs10) %THEN %put Observation 10 (Make): &Obs10.;
18 %IF %symexist(Obs100) %THEN %put Observation 100 (Make): &Obs100.;
19%mend;
20%check_macros;
21 
22PROC PRINT DATA=mycas.cars_processed(obs=10);
23 title 'Exemple Avancé: Traitement et informations de thread';
24RUN;
25 
26PROC CASUTIL outcaslib="casuser" drop TABLE="cars_processed";
27RUN;
28 
4 Code Block
DATA STEP Data
Explanation :
This example focuses on data management and conditional writing in a DATA step in CAS, a common practice in the Viya environment for data quality assurance. It begins by creating a local data table with some simulated missing values, then loads it into CAS. The DATA step reads this table and writes observations to two separate output CAS tables: 'cars_clean' for complete data and 'cars_problematic' for observations containing missing values in the fuel consumption columns. A calculated variable 'Avg_MPG' is created for clean data. The use of the '_ERROR_' keyword also allows error conditions to be reported in the log, which is useful for debugging in a distributed environment. This is a typical use case for preparing data before further analysis in CAS.
Copied!
1LIBNAME mycas cas;
2 
3/* Création d'une table d'entrée avec quelques données manquantes */
4DATA cars_data_local;
5 INPUT Make $ Model $ Type $ MPG_City MPG_Highway;
6 DATALINES;
7Toyota Camry Sedan 28 39
8Honda Civic Sedan 30 40
9Ford F-150 Truck . 22
10Chevrolet Silverado Truck 16 .
11BMW X5 SUV 20 27
12;;
13RUN;
14 
15PROC CASUTIL outcaslib="casuser" replace;
16 load DATA=cars_data_local casout="cars_with_missing";
17RUN;
18 
19DATA mycas.cars_clean mycas.cars_problematic;
20 SET mycas.cars_with_missing;
21 /* Vérifier les valeurs manquantes pour la consommation de carburant */
22 IF missing(MPG_City) or missing(MPG_Highway) THEN DO;
23 OUTPUT mycas.cars_problematic; /* Écrire les lignes avec problèmes dans une table séparée */
24 _ERROR_ = 1; /* Marquer l'observation comme ayant une erreur pour le log */
25 END;
26 ELSE DO;
27 Avg_MPG = (MPG_City + MPG_Highway) / 2;
28 OUTPUT mycas.cars_clean;
29 END;
30 RUN;
31 
32PROC PRINT DATA=mycas.cars_clean(obs=5);
33 title 'Exemple Viya: Voitures sans données manquantes';
34RUN;
35 
36PROC PRINT DATA=mycas.cars_problematic(obs=5);
37 title 'Exemple Viya: Voitures avec données manquantes';
38RUN;
39 
40PROC CASUTIL outcaslib="casuser" drop TABLE="cars_with_missing" casout="cars_clean" casout="cars_problematic";
41RUN;
42 
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved.