
Run a DATA Step in Multiple Threads (CAS)

Running a DATA step in multiple threads in CAS parallelizes data processing across the available threads on all nodes of the CAS server, which can substantially shorten run times on large tables. The CAS engine handles data distribution and thread coordination, so in most cases this is transparent to the user. The `SINGLE=NO` option of the DATA statement is generally not required, because multi-threading is the default behavior for DATA steps that read from and write to CAS tables. The main exception is a DATA step with no input data, which runs in a single thread by default.
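As a minimal sketch of how the `SINGLE=` option is written, the step below places it after a slash on the DATA statement and uses `SINGLE=YES` to force one thread, so that the log can be compared with the default run. It assumes an active CAS session; the libref `mycas` and the table names `class_demo`, `class_multi`, and `class_single` are illustrative.

```sas
/* Assumption: a CAS session is already active */
libname mycas cas;

/* Load a small demonstration table into CAS.
   The input is not a CAS table, so this step runs on the SAS client. */
data mycas.class_demo;
    set sashelp.class;
run;

/* Default: CAS input and CAS output, so the step may run in several threads */
data mycas.class_multi;
    set mycas.class_demo;
    /* _n_ is counted per thread, so this prints once for each thread that receives rows */
    if _n_ = 1 then put 'Default: ' _nthreads_= _threadid_=;
run;

/* SINGLE=YES on the DATA statement forces the step to run in one thread */
data mycas.class_single / single=yes;
    set mycas.class_demo;
    if _n_ = 1 then put 'SINGLE=YES: ' _nthreads_= _threadid_=;
run;
```

Comparing the two log outputs is a simple way to check which mode a given step actually used on a particular server.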
Data Analysis

Type : CREATION_INTERNE


Examples use generated data (datalines) or the SASHELP library.

1 Code Block
DATA STEP Data
Explanation :
This example illustrates the default behavior of the DATA step in CAS, which is to run in multiple threads. First, a SAS demonstration table (sashelp.class) is loaded into CAS memory; because its input is not a CAS table, this first step runs on the SAS client. Then, a second DATA step reads this CAS table and writes a new CAS table, so it runs in CAS. The automatic variable `_NTHREADS_` reports the number of threads the DATA step is running in, which confirms that it is using multiple threads.
LIBNAME mycas cas;

/* Create a CAS table from sashelp.class */
DATA mycas.class_data;
    SET sashelp.class;
RUN;

/* Run a multi-threaded DATA step on the CAS table */
DATA mycas.class_processed;
    SET mycas.class_data;
    /* _NTHREADS_ is an automatic variable that holds the number of threads */
    put 'Number of active threads: ' _nthreads_;
RUN;
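To see how the rows were actually spread across threads, a per-thread summary can be sketched as follows. It assumes the `mycas.class_data` table created above still exists; the output table name `class_threads` and the flag `last_row` are illustrative. In CAS, `END=` and `_N_` are evaluated separately in each thread, so the message appears once for every thread that received rows.

```sas
/* Assumption: mycas.class_data was created by the example above */
data mycas.class_threads;
    set mycas.class_data end=last_row;
    /* last_row becomes true on the last row read by the current thread */
    if last_row then
        put 'Thread ' _threadid_ 'of ' _nthreads_ 'processed ' _n_ 'rows';
run;
```

With a table as small as sashelp.class, it would not be surprising to see all rows handled by only one or a few threads.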
2 Code Block
DATA STEP Data
Explanation :
This example creates a small CAS table directly from 'datalines'. Then, it performs a simple calculation (Total_Valeur = Prix * Quantite) in a DATA step. The `single=no` option is explicitly specified on the DATA statement, after a slash, although it is the default behavior, to highlight the intent of multi-threaded execution. The automatic variable `_THREADID_` identifies the thread that processes each observation. With only four rows, all observations may well be handled by the same thread.
LIBNAME mycas cas;

/* Create a sample table in CAS */
DATA mycas.produits;
    INPUT ID Produit $ Prix Quantite;
    DATALINES;
1 Pomme 1.5 100
2 Poire 2.0 150
3 Banane 0.75 200
4 Orange 1.2 120
;
RUN;

/* Multi-threaded processing that computes the total value */
DATA mycas.produits_valeur / single=no;
    SET mycas.produits;
    Total_Valeur = Prix * Quantite;
    /* _THREADID_ is the ID of the thread processing the current row */
    put 'Processing on thread ' _threadid_ ' for ID ' ID;
RUN;
3 Code Block
DATA STEP (MERGE) Data
Explanation :
This advanced example demonstrates a join operation (MERGE) between two CAS tables ('employes' and 'salaires') using the multi-threaded DATA step. No prior sort by 'EmpID' is needed: when a DATA step with a BY statement runs in CAS, the server groups the rows by the BY variable and distributes whole BY groups to the threads. The `single=no` option is specified on the DATA statement to make the multi-threaded intent explicit. A conditional check on `_NTHREADS_` shows whether the step is running in more than one thread, which should be the case here.
LIBNAME mycas cas;

/* Create two CAS tables for the join */
DATA mycas.employes;
    INPUT EmpID Nom $ Departement $;
    DATALINES;
101 Alice Ventes
102 Bob Marketing
103 Charlie Ventes
104 David IT
;
RUN;

DATA mycas.salaires;
    /* Salaire_Annuel is a single variable; a space in the name would define two */
    INPUT EmpID Salaire_Annuel;
    DATALINES;
101 60000
102 75000
103 62000
104 80000
;
RUN;

/* Join the tables in multiple threads */
DATA mycas.employes_complet / single=no;
    MERGE mycas.employes mycas.salaires;
    BY EmpID;
    IF _nthreads_ > 1 THEN put 'Join running in multiple threads.';
    ELSE put 'Join running in a single thread.';
RUN;
4 Code Block
PROC CAS / DATA STEP Data
Explanation :
This example illustrates a more in-depth integration with the Viya/CAS environment. It starts by ensuring that a CAS session is active, and only then assigns the CAS libref. Next, it creates a CAS table with a large number of observations to better simulate a production scenario where multi-threading matters; because this generating step has no input table, it runs in a single thread by default. A 'proc cas' step then runs a CAS action ('simple.summary') to generate descriptive statistics, demonstrating the interaction between PROC CAS and DATA steps. Finally, a DATA step runs in multiple threads to add a 'categorie' column based on a condition. It writes the thread ID to the log for the first few observations handled by each thread, which confirms parallel execution without flooding the log with 100,000 lines. The `casport` and `cashost` values are placeholders and should be adapted to the user's CAS environment.
/* Start a CAS session if none is active (keeps the example self-contained). */
/* If a session with this name is already active, this step is skipped or reported. */
options casport=5570 cashost='localhost'; /* Adapt to your CAS configuration */
cas mycas;

/* Assign the CAS libref once the session exists */
LIBNAME mycas cas;

/* Create a larger temporary table in CAS */
DATA mycas.donnees_large;
    DO i = 1 to 100000;
        valeur1 = rand('Uniform');
        valeur2 = i * 10;
        OUTPUT;
    END;
RUN;

/* Use a CAS action to compute descriptive statistics */
PROC CAS;
    LOADACTIONSET 'simple';
    /* No caslib is given: the table is looked up in the session's active caslib,
       which is also where the mycas libref points */
    SIMPLE.summary RESULT=summary_res /
        TABLE={name='donnees_large'},
        inputs={'valeur1', 'valeur2'};
    PRINT summary_res;
RUN;
QUIT;

/* Conditional processing in multiple threads */
DATA mycas.resultat_agrege;
    SET mycas.donnees_large;
    IF valeur1 > 0.5 THEN categorie = 'Haute';
    ELSE categorie = 'Basse';
    /* _n_ is counted per thread: log only the first three rows of each thread */
    IF _n_ <= 3 THEN put 'Thread ' _threadid_ ': processing observation ' _n_;
RUN;
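The last step above classifies rows but does not aggregate them. When a multi-threaded DATA step does accumulate totals, each thread only sees its own share of the rows, so a common approach is to combine the partial results in a second, single-threaded step. The following is a minimal sketch of that two-stage pattern, assuming `mycas.donnees_large` from the example above exists; the table names `partial_sums` and `total_sums` and the variable names are illustrative.

```sas
/* Stage 1: each thread accumulates a partial row count and sum,
   then writes one row when it reaches the end of its share of the data */
data mycas.partial_sums (keep=n_rows sum_valeur1);
    set mycas.donnees_large end=last_row;
    n_rows + 1;              /* sum statement: retained, starts at 0 */
    sum_valeur1 + valeur1;
    if last_row then output;
run;

/* Stage 2: combine the per-thread rows in a single thread */
data mycas.total_sums (keep=total_rows total_valeur1) / single=yes;
    set mycas.partial_sums end=last_row;
    total_rows + n_rows;
    total_valeur1 + sum_valeur1;
    if last_row then output;
run;
```

The second step reads only one row per thread, so running it in a single thread costs very little while producing one overall total.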
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved.