ETL SASHELP_DATA_SET

Examples: WHERE Processing

This code is also available in: Deutsch Français
Awaiting validation
This guide thoroughly explores the use of WHERE expressions in SAS© DATA steps. It illustrates the selection of rows based on simple conditions, compound WHERE expressions with logical operators (AND, OR, NOT), and the impact of evaluation order. Specific examples demonstrate the application of the WHERE= option for datasets, emphasizing the distinction between processing on the SAS© compute server and the CAS server. Performance improvement through dataset indexing is also addressed. Key insights include the ability of WHERE expressions to improve performance by reducing read rows, their applicability in both SAS© and CAS environments, and the importance of using parentheses to control the evaluation order of boolean operators.
Data Analysis

Type : SASHELP_DATA_SET


The examples use the built-in SASHELP.SHOES, SASHELP.CLASS, and SASHELP.BASEBALL datasets.

1 Code Block
DATA STEP Data
Explanation :
This example uses the WHERE statement to conditionally select rows from the sashelp.class dataset and writes the selected rows to the mylib.shoes table. The first DATA step executes in SAS and subsets the data, selecting only rows where sales are greater than or equal to $500,000. The filtered data is loaded into the mylib.shoes output table. The second DATA step executes on the mylib.shoes table to further subset the data. This DATA step selects only rows where the values of the Region variable are "Canada". The selected rows are then written to the mylib.shoes2 output table.
Copied!
1LIBNAME mylib ;
2 
3DATA mylib.shoes;
4 SET sashelp.shoes;
5 where Sales>=500000;
6RUN;
7PROC PRINT DATA=mylib.shoes;
8 var Region Product Sales;
9 RUN;
10 
11DATA mylib.shoes2;
12 SET mylib.shoes;
13 where Region="Canada";
14RUN;
15 
16PROC PRINT DATA=mylib.shoes2;
17 var Region Product Sales;
18RUN;
2 Code Block
DATA STEP Data
Explanation :
This example uses a compound WHERE expression to select a subset of data from the input mylib.shoes table. The CASUTIL procedure first loads the SAS dataset sashelp.shoes into CAS. The DATA step then executes in CAS to conditionally select rows where sales are greater than or equal to $500,000 AND the region is "Canada". The compound WHERE expression consists of two WHERE expressions joined by the AND operator.
Copied!
1 
2DATA mylib.shoes;
3 SET mylib.shoes;
4 where Sales>=500000 and Region="Canada";
5 keep Region Product Sales;
6RUN;
7 
8PROC PRINT DATA=mylib.shoes; RUN;
3 Code Block
DATA STEP Data
Explanation :
In the following example, the AND operator is used in the WHERE statement to find rows based on age and sex conditions.
Copied!
1DATA class;
2 SET sashelp.class;
3 where sex="M" and age >= 15;
4RUN;
5PROC PRINT DATA=class;
6RUN;
4 Code Block
DATA STEP Data
Explanation :
This example uses the OR operator to find rows that satisfy either of the conditions.
Copied!
1DATA class;
2 SET sashelp.class;
3 where sex="M" or age>=15;
4RUN;
5PROC PRINT DATA=class;
6 title 'OR finds all Males and Anyone 15 Years or Older';
7RUN;
5 Code Block
DATA STEP Data
Explanation :
This example uses the less than (<) symbol to find rows that satisfy the age and sex criteria.
Copied!
1DATA class;
2 SET sashelp.class;
3 where age < 15 and sex NE "M";
4RUN;
5PROC PRINT DATA=class;
6 title 'Finds Females
7 Older less than 15 Years';
8RUN;
6 Code Block
DATA STEP Data
Explanation :
The order in which SAS processes expressions joined by boolean operators affects the output. The default order of operations is to process NOT expressions first, then AND expressions, and OR expressions last.
Copied!
1DATA class;
2 SET sashelp.class;
3 where age>15 or height<60 and sex="F";
4RUN;
5PROC PRINT DATA=class;
6 title 'age > 15 OR height < 60 AND sex = F';
7RUN;
7 Code Block
DATA STEP Data
Explanation :
To control the order of evaluation, use parentheses. Here is the same example, except that parentheses are used to specify that the OR expression is evaluated first, then the AND expression.
Copied!
1DATA class;
2 SET sashelp.class;
3 where (age>15 or height<60) and sex="F";
4RUN;
5PROC PRINT DATA=class;
6 title '(age > 15 OR height < 60) AND sex = F';
7RUN;
8 Code Block
DATA STEP Data
Explanation :
This example uses the WHERE= dataset option to conditionally select rows from the sashelp.shoes dataset.
Copied!
1DATA sales;
2 SET sashelp.shoes(where=(Region="Canada" and Sales<2000));
3RUN;
4PROC PRINT DATA=sales; RUN;
9 Code Block
DATA STEP
Explanation :
To execute the DATA step so that processing is performed on the CAS server, you can specify the WHERE= dataset option on the input table in the SET statement. The WHERE= dataset option is supported for processing in CAS only when specified on the input CAS table (in the SET statement). If you specify the WHERE= dataset option on the output CAS table, you will not get an error as shown in the following example.
Copied!
1/* Specify the WHERE= data set option on the output CAS
2 table (unsupported for CAS DATA step processing) */
3 
4DATA mylib.sales(where=(Region="Canada" and Sales<2000));
5 SET mylib.shoes;
6RUN;
7PROC PRINT DATA=mylib.sales; RUN;
10 Code Block
DATA STEP Data
Explanation :
In the following example, the first DATA step creates an output dataset, mybaseball, and the index= option adds a simple index to the team variable. The second DATA step reads the dataset and selects for processing only rows where the team name is Atlanta.
Copied!
1DATA mybaseball(index=(team));
2 SET sashelp.baseball;
3RUN;
4 
5DATA mybaseball;
6 SET sashelp.baseball;
7 where Team="Atlanta";
8 keep Name Team Position;
9RUN;
10PROC PRINT DATA=mybaseball; RUN;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Copyright © SAS Institute Inc. All Rights Reserved