Running a DATA step in CAS in multiple threads parallelizes data processing automatically across the worker nodes of the CAS server and the threads available on each node. Each thread runs the same DATA step program on its own portion of the table, which can significantly improve performance on large tables. CAS manages data distribution and thread coordination, so in most cases this is transparent to the user. The `SINGLE=NO` option is generally not required, because multi-threading is the default for DATA steps that read from and write to CAS tables. When it is specified, `SINGLE=` is written on the DATA statement after a slash (for example, `data mycas.out / single=no;`), not as a data set option in parentheses. The main exception to the default is a DATA step with no input data, which runs in a single thread by default.
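The defaults described above can be sketched as follows. This is a minimal illustration, assuming a CAS session is already active; the libref `mycas` and the table names `demo_src`, `demo_multi`, and `demo_single` are hypothetical and chosen only for this sketch.

```sas
libname mycas cas;

/* No input data: this step runs in a single thread by default */
data mycas.demo_src;
  do i = 1 to 1000;
    output;
  end;
run;

/* CAS table in, CAS table out: multi-threaded by default */
data mycas.demo_multi;
  set mycas.demo_src;
  /* _N_ is counted per thread, so each thread that receives rows logs once */
  if _n_ = 1 then put 'Threads in use: ' _nthreads_;
run;

/* SINGLE=YES on the DATA statement forces a single thread */
data mycas.demo_single / single=yes;
  set mycas.demo_src;
  if _n_ = 1 then put 'Threads in use: ' _nthreads_;
run;
```

Comparing the log lines from the last two steps should show the effect of `SINGLE=YES`; the actual thread counts depend on the CAS server configuration.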
Data Analysis
Type : CREATION_INTERNE
Examples use generated data (datalines) or the SASHELP library.
1 Code Block
DATA STEP Data
Explanation : This example illustrates the default behavior of the DATA step in CAS, which is to run in multiple threads. First, a SAS demonstration table (sashelp.class) is loaded into CAS memory; because its input is not a CAS table, this first step runs in the SAS session rather than in CAS. Then, a second DATA step reads this CAS table and writes a new CAS table. The automatic variable `_NTHREADS_`, which holds the number of threads used by the step, is written to the log for each row to confirm that the DATA step is running in multiple threads.
libname mycas cas;
/* Create a CAS table from sashelp.class */
data mycas.class_data;
  set sashelp.class;
run;
/* Run a multi-threaded DATA step on the CAS table */
data mycas.class_processed;
  set mycas.class_data;
  /* _NTHREADS_ is an automatic variable holding the number of threads */
  put 'Number of active threads: ' _nthreads_;
run;
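To see how the rows were spread across threads, a sketch along the following lines counts rows per thread. It assumes the `mycas.class_data` table from the example above; the output table `thread_counts` and the variables `rows`, `thread`, and `last_row` are hypothetical names. It relies on each thread running its own copy of the program, so `END=` is set when a thread reaches the last of its own rows.

```sas
data mycas.thread_counts;
  set mycas.class_data end=last_row;
  rows + 1;                      /* sum statement: retained separately in each thread */
  if last_row then do;
    thread = _threadid_;         /* identifier of the thread that processed these rows */
    output;                      /* one output row per thread that received data */
  end;
  keep thread rows;
run;

proc print data=mycas.thread_counts;
run;
```

With a table as small as sashelp.class, all rows may well land on a single thread; a larger table gives a more informative distribution.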
2 Code Block
DATA STEP Data
Explanation : This example creates a small CAS table from 'datalines'. Then, it performs a simple calculation (Total_Valeur = Prix * Quantite) in a DATA step. The `single=no` option is added explicitly on the DATA statement, after a slash, although it is the default behavior, to highlight the intent of multi-threaded execution. The automatic variable `_THREADID_` shows which thread processed each observation; `_NTHREADS_` would instead give the total number of threads.
libname mycas cas;
/* Create a sample table in CAS */
data mycas.produits;
  input ID Produit $ Prix Quantite;
  datalines;
1 Pomme 1.5 100
2 Poire 2.0 150
3 Banane 0.75 200
4 Orange 1.2 120
;
run;
/* Multi-threaded processing that computes the total value */
/* SINGLE= is a DATA statement argument, so it follows a slash */
data mycas.produits_valeur / single=no;
  set mycas.produits;
  Total_Valeur = Prix * Quantite;
  /* _THREADID_ identifies the thread that processes the current row */
  put 'Processing on thread ' _threadid_ 'for ID ' ID;
run;
3 Code Block
DATA STEP (MERGE) Data
Explanation : This advanced example demonstrates a join operation (MERGE) between two CAS tables ('employes' and 'salaires') using the multi-threaded DATA step. No prior PROC SORT by 'EmpID' is needed: when a DATA step with a BY statement runs in CAS, CAS groups the rows by the BY variable and distributes the BY groups across threads. The `single=no` option is written on the DATA statement to make the multi-threaded intent explicit. A conditional check on `_NTHREADS_` shows whether the step is running in multiple threads, which should be the case here.
libname mycas cas;
/* Create two CAS tables for the join */
data mycas.employes;
  input EmpID Nom $ Departement $;
  datalines;
101 Alice Ventes
102 Bob Marketing
103 Charlie Ventes
104 David IT
;
run;
data mycas.salaires;
  /* One variable name: "Salaire Annuel" would read two variables per line */
  input EmpID Salaire_Annuel;
  datalines;
101 60000
102 75000
103 62000
104 80000
;
run;
/* Join the tables in multiple threads */
data mycas.employes_complet / single=no;
  merge mycas.employes mycas.salaires;
  by EmpID;
  if _nthreads_ > 1 then put 'Merge running in multiple threads.';
  else put 'Merge running in a single thread.';
run;
4 Code Block
PROC CAS / DATA STEP Data
Explanation : This example illustrates a more in-depth integration with the Viya/CAS environment. It starts a CAS session and then assigns the libref, bound to the CASUSER caslib, so that the libref has a session to attach to. It then creates a CAS table with 100,000 observations to better simulate a production scenario where multi-threading matters. A 'proc cas' step runs a CAS action ('simple.summary') to generate descriptive statistics, demonstrating the interaction between PROC CAS and DATA steps; the action refers to the table by its caslib name ('casuser'), not by the SAS libref. Finally, a DATA step runs in multi-threaded mode to add a 'categorie' column based on a condition and writes one log line per thread with `_THREADID_` and `_NTHREADS_`, which makes the parallel execution visible without flooding the log. The `casport` and `cashost` values are placeholders and should be adapted to the user's CAS environment.
/* Start a CAS session so that the example is self-contained */
/* If a session named mycas already exists, the CAS statement reports it. */
options casport=5570 cashost='localhost'; /* Adapt to your CAS configuration */
cas mycas;
/* Assign the libref once the session exists, bound to the CASUSER caslib */
libname mycas cas caslib=casuser;
/* Create a larger temporary table in CAS */
data mycas.donnees_large;
  do i = 1 to 100000;
    valeur1 = rand('Uniform');
    valeur2 = i * 10;
    output;
  end;
run;
/* Use a CAS action to compute statistics before the multi-threaded DATA step */
proc cas;
  loadactionset 'simple';
  simple.summary result=summary_res /
    table={name='donnees_large', caslib='casuser'}
    inputs={'valeur1', 'valeur2'};
  print summary_res;
run;
quit;
/* Conditional processing in multiple threads */
data mycas.resultat_agrege;
  set mycas.donnees_large;
  if valeur1 > 0.5 then categorie = 'Haute';
  else categorie = 'Basse';
  /* _N_ is counted per thread, so this writes one log line per thread */
  if _n_ = 1 then put 'Thread ' _threadid_ 'of ' _nthreads_ 'started processing.';
run;
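As a possible follow-up check, the distribution of the new 'categorie' column can be inspected with another CAS action. This is only a sketch: it assumes the `resultat_agrege` table from the step above sits in the session's active caslib, and `freq_res` is a hypothetical result name.

```sas
proc cas;
  /* Frequency of each value of categorie in the output table */
  simple.freq result=freq_res /
    table={name='resultat_agrege'}
    inputs={'categorie'};
  print freq_res;
run;
quit;
```

Because valeur1 is drawn from a uniform distribution, the two categories should come out at roughly similar frequencies.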
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.