Published on :
Data Manipulation CREATION_INTERNE

Examples: WHERE Statement Processing

This code is also available in: Español Français
Awaiting validation
Details the different ways to apply data filters using the WHERE clause in DATA steps and dataset options, emphasizing the importance of operation order for complex expressions and the efficiency of CAS processing. It also covers the impact of using indexes to improve performance.
Data Analysis

Type : CREATION_INTERNE


Examples use generated data (datalines) or SASHELP.

1 Code Block
DATA STEP / PROC PRINT Data
Explanation :
This first example uses the WHERE statement in a DATA step to filter rows from the `sashelp.shoes` table where sales are greater than or equal to 500,000. The results are saved in `mylib.shoes`. A second DATA step then filters `mylib.shoes` to retain only sales from the 'Canada' region and stores the result in `mylib.shoes2`. Both tables are then displayed via `PROC PRINT`.
Copied!
1LIBNAME mylib ;
2 
3DATA mylib.shoes;
4 SET sashelp.shoes;
5 where Sales>=500000;
6RUN;
7PROC PRINT DATA=mylib.shoes;
8 var Region Product Sales;
9 RUN;
10 
11DATA mylib.shoes2;
12 SET mylib.shoes;
13 where Region="Canada";
14RUN;
15 
16PROC PRINT DATA=mylib.shoes2;
17 var Region Product Sales;
18RUN;
2 Code Block
DATA STEP / PROC PRINT Data
Explanation :
This example uses a compound WHERE expression with the AND operator to select rows in the `mylib.shoes` table (presumed loaded in CAS) where sales are greater than or equal to 500,000 AND the region is 'Canada'. Only the Region, Product, and Sales variables are kept. The result is then displayed via `PROC PRINT`.
Copied!
1 
2DATA mylib.shoes;
3 SET mylib.shoes;
4 where Sales>=500000 and Region="Canada";
5 keep Region Product Sales;
6RUN;
7 
8PROC PRINT DATA=mylib.shoes; RUN;
3 Code Block
DATA STEP / PROC PRINT Data
Explanation :
This block filters individuals from the `sashelp.class` table who are male AND are 15 years old or older.
Copied!
1DATA class;
2 SET sashelp.class;
3 where sex="M" and age >= 15;
4RUN;
5PROC PRINT DATA=class;
6RUN;
4 Code Block
DATA STEP / PROC PRINT Data
Explanation :
This block selects all male individuals OR all individuals who are 15 years old or older from the `sashelp.class` table.
Copied!
1DATA class;
2 SET sashelp.class;
3 where sex="M" or age>=15;
4RUN;
5PROC PRINT DATA=class;
6 title 'OR finds all Males and Anyone 15 Years or Older';
7RUN;
5 Code Block
DATA STEP / PROC PRINT Data
Explanation :
This block identifies individuals from the `sashelp.class` table who are under 15 years old AND are NOT male (i.e., females under 15).
Copied!
1DATA class;
2 SET sashelp.class;
3 where age < 15 and sex NE "M";
4RUN;
5PROC PRINT DATA=class;
6 title 'Finds Females
7 Older less than 15 Years';
8RUN;
6 Code Block
DATA STEP / PROC PRINT Data
Explanation :
Illustrates the default order of operations (NOT, then AND, then OR) for a complex WHERE expression. Selects individuals from `sashelp.class` who are female AND whose height is less than 60, OR those over 15 years old.
Copied!
1DATA class;
2 SET sashelp.class;
3 where age>15 or height<60 and sex="F";
4RUN;
5PROC PRINT DATA=class;
6 title 'age > 15 OR height < 60 AND sex = F';
7RUN;
7 Code Block
DATA STEP / PROC PRINT Data
Explanation :
Demonstrates how parentheses modify the order of evaluation. Selects individuals from `sashelp.class` who are female AND (are over 15 years old OR measure less than 60).
Copied!
1DATA class;
2 SET sashelp.class;
3 where (age>15 or height<60) and sex="F";
4RUN;
5PROC PRINT DATA=class;
6 title '(age > 15 OR height < 60) AND sex = F';
7RUN;
8 Code Block
DATA STEP / PROC PRINT Data
Explanation :
This example uses the WHERE= option on the SET statement to filter rows from `sashelp.shoes` where the region is 'Canada' and sales are less than 2000. This syntax is supported for processing on the CAS server when the input table is a CAS table.
Copied!
1DATA sales;
2 SET sashelp.shoes(where=(Region="Canada" and Sales<2000));
3RUN;
4PROC PRINT DATA=sales; RUN;
9 Code Block
DATA STEP / PROC PRINT Data
Explanation :
This example shows the use of the WHERE= option on the output table (`DATA statement`). For CAS tables, this approach does not allow the DATA step to be executed on the CAS server; processing is offloaded to the SAS Compute Server. A log note confirms this. Although no error is generated, efficiency is reduced for large CAS tables.
Copied!
1/* Specify the WHERE= data set option on the output CAS
2 table (unsupported for CAS DATA step processing) */
3 
4DATA mylib.sales(where=(Region="Canada" and Sales<2000));
5 SET mylib.shoes;
6RUN;
7PROC PRINT DATA=mylib.sales; RUN;
10 Code Block
DATA STEP / PROC PRINT Data
Explanation :
This block illustrates how to improve the performance of WHERE queries by creating an index on a variable. The first DATA step creates the `mybaseball` table from `sashelp.baseball` and adds an index on the `team` variable. The second DATA step then filters this table for 'Atlanta' teams, leveraging the index for faster processing, and displays the results.
Copied!
1DATA mybaseball(index=(team));
2 SET sashelp.baseball;
3RUN;
4 
5DATA mybaseball;
6 SET sashelp.baseball;
7 where Team="Atlanta";
8 keep Name Team Position;
9RUN;
10PROC PRINT DATA=mybaseball; RUN;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Banner
Expert Advice
Expert
Michael
Responsable de l'infrastructure Viya.
« The WHERE statement is the most efficient way to subset data in SAS because it filters observations before they are fully loaded into the Program Data Vector (PDV). Mastering its nuances—from logical operator precedence to CAS-specific behavior—is essential for building high-performance data pipelines. »