Visualize DATA Step Processing Information with Automatic Variables

When a DATA Step runs in SAS Cloud Analytic Services (CAS), it automatically runs in multiple threads, even on a single CAS machine. The DATA Step program is replicated across the CAS cluster, and each node runs it in several threads, each thread working on a portion of the table. The automatic variable _THREADID_ holds the ID of the thread that is processing the current row. FIRST.variable flags the first row of each group defined by a BY statement, which helps show how rows are grouped and ordered in a distributed environment. The SINGLE=YES option of the DATA statement forces single-threaded execution, which is useful for small to medium-sized tables where row order matters and performance is not a constraint.

The examples use SASHELP data (sashelp.cars) or tables created inline (DATALINES or a DO loop) so that they are self-contained and reproducible.
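
The examples refer to a libref named mycas without defining it. A minimal setup sketch follows, assuming a CAS server is already configured for your SAS session. The session name mysess and the choice of the CASUSER caslib are assumptions made for illustration, not part of the original examples. The last step writes one log line per thread, which gives a quick view of the multithreaded behavior described above. It assumes the _NTHREADS_ automatic variable is available in your release.

```sas
/* Start a CAS session; the name mysess is an arbitrary, illustrative choice */
cas mysess;

/* Bind the libref mycas to a caslib in that session (CASUSER is assumed here) */
libname mycas cas sessref=mysess caslib=casuser;

/* Run a DATA step with no input in CAS: every thread executes it once */
data _null_ / sessref=mysess;
  put 'Running on thread ' _threadid_ 'of ' _nthreads_;
run;
```

The number of lines written to the log depends on how many threads your CAS server makes available, so it will differ between environments.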

1 Code Block
DATA STEP Data
Explanation :
This example illustrates the use of the _THREADID_ automatic variable to see which thread processes each observation in a multithreaded DATA Step on CAS. It also uses FIRST.make to identify the start of each 'make' group.
/* Load the source table into CAS */
DATA mycas.cars;
  SET sashelp.cars;
RUN;

/* Runs in CAS: flag the first row of each make and record the thread ID */
DATA mycas.cars_processed;
  SET mycas.cars;
  BY make type;
  IF first.make THEN first_make_flag="BY Group";
  ELSE first_make_flag="";
  threadid = _threadid_;
  keep make type first_make_flag threadid;
RUN;

PROC PRINT DATA=mycas.cars_processed;
  title 'Cars By Make By Type - Multithreaded Processing';
RUN;
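
Scanning the printed rows for thread IDs becomes tedious on larger tables. As a hedged sketch, the thread distribution can be summarized with PROC FREQ, assuming the table mycas.cars_processed from the step above still exists in the session. The title text is illustrative.

```sas
/* Count how many rows each thread processed */
PROC FREQ DATA=mycas.cars_processed;
  tables threadid / nocum nopercent;
  title 'Rows Processed per Thread';
RUN;
```

The counts vary from run to run and between servers, because CAS decides how BY groups are assigned to threads.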
2 Code Block
DATA STEP Data
Explanation :
This case compares default (multithreaded) execution and forced single-threaded execution (via SINGLE=YES) in CAS, using _THREADID_ to show the difference in assigned thread IDs. Only the first 10 observations are displayed for comparison.
/* Load the source table into CAS */
DATA mycas.cars;
  SET sashelp.cars;
RUN;

/* Default multithreaded execution */
DATA mycas.cars_multi;
  SET mycas.cars;
  threadid = _threadid_;
  keep make model threadid;
RUN;

/* Forced single-threaded execution: SINGLE=YES is a DATA statement option, placed after the slash */
DATA mycas.cars_single / single=yes;
  SET mycas.cars;
  threadid = _threadid_;
  keep make model threadid;
RUN;

PROC PRINT DATA=mycas.cars_multi(obs=10);
  title 'Multithreaded Processing (excerpt)';
RUN;
PROC PRINT DATA=mycas.cars_single(obs=10);
  title 'Single-Threaded Processing (excerpt)';
RUN;
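
Multithreaded and single-threaded steps are often combined: each thread computes a partial result in parallel, and a single-threaded step then merges the partial results. The sketch below illustrates that pattern with a simple row count. The table names thread_counts and total_count and the variables rows_seen and total_rows are hypothetical names chosen for this illustration.

```sas
/* Step 1 (multithreaded): each thread counts the rows in its own share */
DATA mycas.thread_counts;
  SET mycas.cars end=last_row;   /* END= is set when this thread reaches its last row */
  rows_seen + 1;                 /* sum statement: retained separately in each thread */
  IF last_row THEN DO;
    threadid = _threadid_;
    OUTPUT;                      /* one row per thread that processed data */
  END;
  keep threadid rows_seen;
RUN;

/* Step 2 (single-threaded): add up the per-thread counts */
DATA mycas.total_count / single=yes;
  SET mycas.thread_counts end=last_row;
  total_rows + rows_seen;
  IF last_row THEN OUTPUT;
  keep total_rows;
RUN;
```

The value of total_rows should match the number of rows in mycas.cars, while the per-thread counts in mycas.thread_counts depend on how CAS distributed the table.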
3 Code Block
DATA STEP Data
Explanation :
This advanced example combines the automatic variables _THREADID_, _N_ (DATA Step iteration number) and _ERROR_ (error indicator) in a DATA Step that runs in CAS. Because each thread runs its own copy of the program, _N_ counts iterations per thread rather than across the whole table. The example also sets _ERROR_ explicitly to flag a specific condition and identifies the start of each BY group.
/* Create the test data in WORK; value holds letters, so it is read as character */
DATA work.test_data;
  INPUT id $ value $;
  DATALINES;
1 A
1 B
2 C
3 D
3 E
4 F
;
RUN;

/* Load the test data into CAS */
DATA mycas.test_data;
  SET work.test_data;
RUN;

DATA mycas.processed_data;
  SET mycas.test_data;
  BY id;
  IF first.id THEN group_start = 1;
  ELSE group_start = 0;
  record_num = _n_;                /* iteration number within the current thread */
  thread_id = _threadid_;
  IF value = 'B' THEN _error_ = 1; /* raise the error flag for this row */
  error_flag = _error_;
  keep id value group_start record_num thread_id error_flag;
RUN;

PROC PRINT DATA=mycas.processed_data;
  title 'Analysis of the _N_ and _ERROR_ Automatic Variables in CAS';
RUN;
4 Code Block
DATA STEP Data
Explanation :
This example is intended for performance analysis and debugging in CAS. It creates a table of 100 observations in 10 groups of 10. The DATA Step then uses _THREADID_ and a PUT statement to write to the log which thread starts processing each new group. This helps show how the workload is distributed across the CAS cluster, for example to spot imbalances between threads or to confirm that processing is well distributed. The printed output is only an overview; the main information is in the log.
/* Build the sample data in WORK so that exactly 100 rows are created:
   a DATA step with no input table that runs in CAS executes once per thread */
DATA work.sample_data;
  DO i = 1 to 100;
    group = ceil(i/10);
    value = ranuni(0) * 100;
    OUTPUT;
  END;
RUN;

/* Load the sample data into a temporary CAS table */
DATA mycas.sample_data;
  SET work.sample_data;
RUN;

/* DATA Step with logging to track the threads */
DATA mycas.debug_output;
  SET mycas.sample_data;
  BY group;
  thread_id = _threadid_;
  IF first.group THEN DO;
    put 'DEBUG: New group ' group ' on thread ' thread_id;
  END;
  OUTPUT;
RUN;

/* Print the first 10 rows of the result for validation */
PROC PRINT DATA=mycas.debug_output(obs=10);
  title 'DATA Step Output with Thread IDs';
RUN;

/* For a complete analysis of the logging, check the SAS Studio log */
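
Log messages are hard to query afterwards. As an alternative sketch, the same group-to-thread mapping can be stored in a table by keeping only the first row of each group. The table name group_threads is a hypothetical name used for this illustration, and the step assumes mycas.sample_data from the example above.

```sas
/* Keep one row per group, recording the thread that processed it */
DATA mycas.group_threads;
  SET mycas.sample_data;
  BY group;
  IF first.group;            /* subsetting IF: only the first row of each group continues */
  thread_id = _threadid_;
  keep group thread_id;
RUN;

PROC PRINT DATA=mycas.group_threads;
  title 'Thread Assigned to Each Group';
RUN;
```

The rows may print in a different order from run to run, because CAS does not guarantee row order for a table written by several threads.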
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved.