
Keeping Variables in a CAS DATA Step

The `KEEP` statement lists the variables to write to the dataset created by the DATA step; every other variable is excluded. When the DATA step runs in CAS, the `WHERE=` option can be applied only to the input dataset, that is, in the `SET` statement. Using `WHERE=` on the `DATA` statement itself (e.g., `data mycas.table(where=...);`) results in an error, which is an important difference from the DATA step in traditional SAS. Both the `DROP` and `KEEP` statements are supported in a CAS DATA step.
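As a minimal sketch of the last point, the two steps below should produce tables with the same columns, one using `KEEP` and one using `DROP`. It assumes an active CAS session and the `mycas` libref used throughout this page; the table names `class`, `class_keep`, and `class_drop` are hypothetical.

```sas
libname mycas cas;

/* Load a small SASHELP table into CAS (columns: Name, Sex, Age, Height, Weight) */
data mycas.class;
    set sashelp.class;
run;

/* KEEP lists the variables to write to the output table */
data mycas.class_keep;
    set mycas.class;
    keep Name Age;
run;

/* DROP lists the variables to leave out; the remaining columns are the same */
data mycas.class_drop;
    set mycas.class;
    drop Sex Height Weight;
run;
```

`KEEP` is usually the shorter choice when only a few variables are wanted, and `DROP` when only a few need to be removed.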


The examples use data generated with DATALINES or taken from SASHELP datasets, so each script is self-contained.

1 Code Block
DATA STEP
Explanation :
This example shows the basic use of the `KEEP` statement in a `DATA` step running on CAS. It first loads the `sashelp.cars` dataset into the `mycas` library (a CAS libref), then creates a new dataset, `mycas.bigcars`. The `WHERE=(Weight > 6000)` option is applied to the input dataset (`mycas.cars`) to filter observations, and the `KEEP Make Model Type;` statement ensures that only the `Make`, `Model`, and `Type` variables are written to `mycas.bigcars`. The result is then printed and both CAS tables are dropped.
LIBNAME mycas cas;

/* Load the Sashelp.Cars dataset into CAS */
DATA mycas.cars;
    SET sashelp.cars;
RUN;

/* Filter the heavy cars and keep a subset of variables */
DATA mycas.bigcars;
    SET mycas.cars(where=(Weight > 6000));
    keep Make Model Type;
RUN;

/* Print the result */
PROC PRINT DATA=mycas.bigcars;
RUN;

/* Drop the temporary CAS tables */
PROC CAS;
    TABLE.dropTable / caslib='CASUSER' name='cars';
    TABLE.dropTable / caslib='CASUSER' name='bigcars';
RUN;
QUIT;
2 Code Block
DATA STEP Data
Explanation :
This example combines `KEEP` with a small transformation. A `mycas.produits` table is created in the `mycas` CAS library from in-line data (`DATALINES`). A second `DATA` step computes `TotalValeur` for each product (`Prix * Quantite`), and the `KEEP` statement retains only `Produit`, `Prix`, `Quantite`, and the newly computed `TotalValeur`, so `ID` is not written to the output. The result is then printed and both CAS tables are dropped.
LIBNAME mycas cas;

/* Create a simple CAS table with DATALINES */
DATA mycas.produits;
    INPUT ID Produit $ Prix Quantite;
    DATALINES;
1 Ordinateur 1200 5
2 Souris 25 50
3 Clavier 75 30
4 Ecran 300 10
5 Imprimante 150 15
;
RUN;

/* Compute the total value and keep the relevant variables */
DATA mycas.stock_valeur;
    SET mycas.produits;
    TotalValeur = Prix * Quantite;
    keep Produit Prix Quantite TotalValeur;
RUN;

/* Print the result */
PROC PRINT DATA=mycas.stock_valeur;
RUN;

/* Drop the temporary CAS tables */
PROC CAS;
    TABLE.dropTable / caslib='CASUSER' name='produits';
    TABLE.dropTable / caslib='CASUSER' name='stock_valeur';
RUN;
QUIT;
3 Code Block
DATA STEP
Explanation :
This example explores what happens when `KEEP` is combined with conditional logic. It loads `sashelp.class` onto CAS, then attempts to retain different variables based on the student's sex: `Name` and `Age` when `Sex` is 'M', and `Name`, `Height`, and `Weight` otherwise. The intent is to show an attempt at dynamic selection, but `KEEP` is a declarative statement that is processed when the step is compiled, not an executable statement evaluated for each row, so placing it inside an `IF`/`DO` block does not make it conditional. An output table has a single column list, and multiple `KEEP` statements in one step are typically combined, so the result should be expected to contain `Name`, `Age`, `Height`, and `Weight` for every row. For genuinely heterogeneous output, more explicit approaches are preferable, such as `DROP` or `RENAME` with explicit logic, or writing each group to its own table. The result is then printed and both CAS tables are dropped.
LIBNAME mycas cas;

/* Load the Sashelp.Class dataset into CAS */
DATA mycas.etudiants;
    SET sashelp.class;
RUN;

/* Attempt to keep different variables depending on a condition */
DATA mycas.resultat_etudiants;
    SET mycas.etudiants;
    IF Sex = 'M' THEN DO;
        keep Name Age;
    END;
    ELSE DO;
        keep Name Height Weight;
    END;
RUN;

/* Print the result. Note: KEEP is a declarative statement, so the KEEP statements inside the
   conditional blocks above are not applied row by row; the output table has a single column list.
   This example is simplified for illustration. In a real case, an approach based on DROP/RENAME
   with explicit logic would be more robust for heterogeneous outputs. */
PROC PRINT DATA=mycas.resultat_etudiants;
RUN;

/* Drop the temporary CAS tables */
PROC CAS;
    TABLE.dropTable / caslib='CASUSER' name='etudiants';
    TABLE.dropTable / caslib='CASUSER' name='resultat_etudiants';
RUN;
QUIT;
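Because a table has one fixed set of columns, one way to get a different variable list per group is to write each group to its own table. The sketch below is an illustration, not part of the original example. It assumes an active CAS session, that the `KEEP=` data set option and multiple output tables are available in your CAS DATA step environment, and the table names `garcons` and `filles` are hypothetical.

```sas
libname mycas cas;

/* Load Sashelp.Class into CAS, as in the example above */
data mycas.etudiants;
    set sashelp.class;
run;

/* Route each row to one of two output tables; KEEP= fixes each table's columns */
data mycas.garcons(keep=Name Age)
     mycas.filles(keep=Name Height Weight);
    set mycas.etudiants;
    if Sex = 'M' then output mycas.garcons;
    else output mycas.filles;
run;
```

Each output table then carries only the columns that make sense for its group, instead of one wide table with columns that do not apply to every row.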
4 Code Block
DATA STEP Data
Explanation :
This example uses `KEEP` in a `DATA` step whose input and output are both Cloud Analytic Services (CAS) tables. A `mycas.vente_regionale` table is first created in CAS from in-line data (`DATALINES`). The `promote=yes` option promotes the table to global scope, which makes it visible to other CAS sessions; it does not save the table to disk. A second `DATA` step processes this CAS table, applying a filter (`WHERE=(Revenu > 50000)`) to the input dataset and using the `KEEP` statement to select the `Region`, `Produit`, and `Revenu` variables for the output table `mycas.vente_details_filtre`, which is also a CAS table. This demonstrates distributed in-memory processing and variable selection with `KEEP` in a native CAS context. The result is then printed and both CAS tables are dropped.
LIBNAME mycas cas;

/* Create a simple CAS table from DATALINES */
/* so that the example works on a native CAS table. */
DATA mycas.vente_regionale (promote=yes); /* promote=yes gives the table global scope (visible to other CAS sessions) */
    INPUT Region $ Produit $ UnitesVendues Revenu;
    DATALINES;
Nord Ordinateur 100 120000
Sud Souris 250 6250
Est Clavier 150 11250
Ouest Ecran 50 15000
Nord Imprimante 75 11250
Sud Ordinateur 80 96000
;
RUN;

/* Process the CAS table with the CAS DATA step: filter rows and keep variables */
/* The DATA step runs on CAS because the input and output are CAS tables. */
DATA mycas.vente_details_filtre;
    SET mycas.vente_regionale (where=(Revenu > 50000)); /* Filter on the CAS table */
    keep Region Produit Revenu; /* Keep only these variables */
RUN;

/* Print the table processed on CAS */
PROC PRINT DATA=mycas.vente_details_filtre;
RUN;

/* Drop the CAS tables created by this example */
PROC CAS;
    SESSION casauto;
    TABLE.dropTable / caslib='CASUSER' name='vente_regionale';
    TABLE.dropTable / caslib='CASUSER' name='vente_details_filtre';
RUN;
QUIT;
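To confirm which columns a `KEEP` statement actually left in a CAS table, the column list can be inspected on the server instead of printing the rows. The following is a minimal sketch under assumptions: it reuses the pattern of the first example, assumes the active caslib is `CASUSER` as in the cleanup steps on this page, and uses a hypothetical table name, `cars_subset`.

```sas
libname mycas cas;

/* Load the source table into CAS */
data mycas.cars;
    set sashelp.cars;
run;

/* Filter on input and keep three variables */
data mycas.cars_subset;
    set mycas.cars(where=(Weight > 6000));
    keep Make Model Type;
run;

proc cas;
    /* List the columns of the output table; only the kept variables should appear */
    table.columnInfo / table={caslib='CASUSER', name='cars_subset'};
    /* Drop the tables created by this sketch */
    table.dropTable / caslib='CASUSER' name='cars';
    table.dropTable / caslib='CASUSER' name='cars_subset';
quit;
```

Checking the column metadata avoids pulling rows back to the client, which matters more as tables grow.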
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved.


Expert Advice
Expert
Michael
Viya infrastructure manager.
"To verify if your code is truly running in CAS, check your SAS Log. You should look for the message: NOTE: The DATA step has run in the CAS server. If you don't see this, your data is likely being pulled back to the Compute Server, which will drastically slow down processing for large datasets."
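One way to make that check stricter, sketched here under assumptions: the `SESSREF=` option on the `DATA` statement names the CAS session in which the step should run, so a step that cannot run in CAS is expected to stop with an error rather than fall back to the Compute Server. The session name `casauto` is taken from the cleanup code of the fourth example, the table name `cars_check` is hypothetical, and the exact behavior should be confirmed in the documentation for your Viya release.

```sas
/* Bind the libref to the session named casauto (assumed to exist) */
libname mycas cas sessref=casauto;

/* Load the source table into CAS */
data mycas.cars;
    set sashelp.cars;
run;

/* SESSREF= asks for this step to run in the named CAS session */
data mycas.cars_check / sessref=casauto;
    set mycas.cars(where=(Weight > 6000));
    keep Make Model Type;
run;
```

The log for the second step is still worth reading, since it is the log note that states where the step actually ran.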