
KEEP Statement

The KEEP statement specifies which variables a DATA step writes to its output SAS data sets. It applies to every SAS data set created in the same DATA step and can appear anywhere in the step. If the step contains no KEEP or DROP statement, every data set it creates contains all variables.
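Both points can be sketched in a short step. The following is a minimal illustration that assumes the standard SASHELP.CLASS sample table is available; the output names `class_f` and `class_m` are hypothetical. The KEEP statement is deliberately placed last, after the OUTPUT statements, and it still limits both output data sets to `Name` and `Age`.

```sas
/* Hypothetical output names; assumes SASHELP.CLASS exists.            */
/* A single KEEP statement governs every data set created in the step. */
DATA class_f class_m;
   SET sashelp.class;
   IF sex = 'F' THEN OUTPUT class_f;
   ELSE OUTPUT class_m;
   /* Placed after OUTPUT, yet still applies: KEEP is declarative */
   KEEP name age;
RUN;
```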
If the same variable is listed in both DROP and KEEP statements, DROP takes precedence over KEEP, regardless of the order of the statements, and the variable is dropped.
Note: Do not use KEEP and DROP statements in the same DATA step.
Comparisons:
* The KEEP statement cannot be used in SAS PROC steps; the KEEP= data set option can.
* The KEEP statement applies to all output data sets named in the DATA statement. To write different variables to different data sets, use the KEEP= data set option.
* The DROP statement is a parallel statement that specifies variables to omit from the output data sets.
* KEEP and DROP statements select variables to include in or exclude from output data sets; the subsetting IF statement selects observations.
* Do not confuse the KEEP statement with the RETAIN statement. RETAIN causes SAS to keep a variable's value from one DATA step iteration to the next; KEEP does not affect variable values and only specifies which variables are written to the output data sets.
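The first, second, and fourth comparisons can be sketched together. This is a minimal example assuming SASHELP.CLASS; the output names `names_only` and `measures` are hypothetical. KEEP= data set options give each output its own variable list, the subsetting IF filters observations rather than variables, and KEEP= is also accepted in a PROC step, where the KEEP statement is not.

```sas
/* Hypothetical output names; assumes SASHELP.CLASS exists.           */
/* KEEP= data set options: a different variable list for each output. */
DATA names_only(KEEP=name)
     measures(KEEP=name height weight);
   SET sashelp.class;
   /* Subsetting IF: selects observations (rows), not variables */
   IF age >= 13;
RUN;

/* KEEP= in a PROC step, where the KEEP statement cannot be used */
PROC PRINT DATA=sashelp.class(KEEP=name age);
RUN;
```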
Examples use generated data (datalines) or SASHELP.

Code Block 1
Explanation:
This example uses the KEEP statement to list the variables written to the new `employees_subset` data set: `name`, `address`, `city`, `state`, `zip`, and `phone`. Because `employees` contains exactly these six variables, the subset has the same columns here; any other variable in `employees` would be left out.
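DATA employees;
   /* Values contain embedded blanks, so the fields are comma-delimited */
   INFILE DATALINES DLM=',' DSD;
   LENGTH name $20 address $30 city $20 state $2 zip $5 phone $8;
   INPUT name $ address $ city $ state $ zip $ phone $;
   DATALINES;
John Doe,123 Main St,Anytown,CA,90210,555-1234
Jane Smith,456 Oak Ave,Othercity,NY,10001,555-5678
;
RUN;

DATA employees_subset;
   SET employees;
   /* Only these variables are written to EMPLOYEES_SUBSET */
   KEEP name address city state zip phone;
RUN;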
Code Block 2
Explanation:
This example uses the KEEP statement to include only the `name` and `avg` variables in the `average` output data set. Variables `score1` through `score20`, from which `avg` is calculated, are not written to `average`. The KEEP statement appears before the assignment that creates `avg`; this works because KEEP can appear anywhere in the step.
DATA scores;
   INPUT name $ score1-score20;
   DATALINES;
Alice 85 90 78 92 88 76 95 89 80 82 77 91 85 93 86 79 90 84 87 94
Bob 70 65 72 75 68 80 73 78 71 76 69 81 74 79 70 82 75 77 71 80
;
RUN;

DATA average;
   SET scores;
   KEEP name avg;                  /* write only NAME and AVG */
   avg = MEAN(OF score1-score20);  /* scores are read but not kept */
RUN;
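Because DROP is the parallel statement, the same output can be written by naming the variables to omit instead. The sketch below assumes the `scores` data set from the example above; `average_drop` is a hypothetical name. It uses DROP on its own, never together with KEEP, and should yield the same two variables, `name` and `avg`.

```sas
/* Hypothetical output name; assumes SCORES from the example above. */
DATA average_drop;
   SET scores;
   avg = MEAN(OF score1-score20);
   /* Omit the twenty score variables; NAME and AVG remain */
   DROP score1-score20;
RUN;
```

KEEP is usually the shorter choice when few variables survive; DROP is shorter when only a few must go.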
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Expert Advice
Michael, Head of Viya Infrastructure
"Never use KEEP and DROP in the same DATA step: it creates logical redundancy and can cause confusion, since DROP always takes precedence. Also, if you are reading from a very large data set but need only a few columns, use the KEEP= data set option on the SET statement. This prevents unneeded variables from ever entering the program data vector (PDV), which gives the largest performance gain."
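The tip can be sketched by comparing two steps that produce the same result. This minimal example assumes SASHELP.CLASS; `tall_v1` and `tall_v2` are hypothetical names. In the first step all variables are read into the PDV and filtered only on output; in the second, KEEP= on the SET statement means only `Name` and `Height` are read at all.

```sas
/* Hypothetical output names; assumes SASHELP.CLASS exists. */

/* KEEP statement: every variable is read into the PDV,  */
/* and only NAME and HEIGHT are written to the output.   */
DATA tall_v1;
   SET sashelp.class;
   KEEP name height;
   IF height > 60;
RUN;

/* KEEP= on SET: only NAME and HEIGHT enter the PDV at all. */
DATA tall_v2;
   SET sashelp.class(KEEP=name height);
   IF height > 60;
RUN;
```

One caveat with KEEP= on SET: a variable left out of the list is not read, so referring to it later in the step creates a new variable with missing values instead of using the stored one.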