Details the different ways to apply data filters using the WHERE clause in DATA steps and dataset options, emphasizing the importance of operation order for complex expressions and the efficiency of CAS processing. It also covers the impact of using indexes to improve performance.
Data Analysis
Type : CREATION_INTERNE
Examples use generated data (datalines) or SASHELP.
1 Code Block
DATA STEP / PROC PRINT Data
Explanation : This first example uses the WHERE statement in a DATA step to filter rows from the `sashelp.shoes` table where sales are greater than or equal to 500,000. The results are saved in `mylib.shoes`. A second DATA step then filters `mylib.shoes` to retain only sales from the 'Canada' region and stores the result in `mylib.shoes2`. Both tables are then displayed via `PROC PRINT`.
Copied!
libname mylib ;
data mylib.shoes;
set sashelp.shoes;
where Sales>=500000;
run;
proc print data=mylib.shoes;
var Region Product Sales;
run;
data mylib.shoes2;
set mylib.shoes;
where Region="Canada";
run;
proc print data=mylib.shoes2;
var Region Product Sales;
run;
1
LIBNAME mylib ;
2
3
DATA mylib.shoes;
4
SET sashelp.shoes;
5
where Sales>=500000;
6
RUN;
7
PROC PRINTDATA=mylib.shoes;
8
var Region Product Sales;
9
RUN;
10
11
DATA mylib.shoes2;
12
SET mylib.shoes;
13
where Region="Canada";
14
RUN;
15
16
PROC PRINTDATA=mylib.shoes2;
17
var Region Product Sales;
18
RUN;
2 Code Block
DATA STEP / PROC PRINT Data
Explanation : This example uses a compound WHERE expression with the AND operator to select rows in the `mylib.shoes` table (presumed loaded in CAS) where sales are greater than or equal to 500,000 AND the region is 'Canada'. Only the Region, Product, and Sales variables are kept. The result is then displayed via `PROC PRINT`.
Copied!
data mylib.shoes;
set mylib.shoes;
where Sales>=500000 and Region="Canada";
keep Region Product Sales;
run;
proc print data=mylib.shoes; run;
1
2
DATA mylib.shoes;
3
SET mylib.shoes;
4
where Sales>=500000 and Region="Canada";
5
keep Region Product Sales;
6
RUN;
7
8
PROC PRINTDATA=mylib.shoes; RUN;
3 Code Block
DATA STEP / PROC PRINT Data
Explanation : This block filters individuals from the `sashelp.class` table who are male AND are 15 years old or older.
Copied!
data class;
set sashelp.class;
where sex="M" and age >= 15;
run;
proc print data=class;
run;
1
DATA class;
2
SET sashelp.class;
3
where sex="M" and age >= 15;
4
RUN;
5
PROC PRINTDATA=class;
6
RUN;
4 Code Block
DATA STEP / PROC PRINT Data
Explanation : This block selects all male individuals OR all individuals who are 15 years old or older from the `sashelp.class` table.
Copied!
data class;
set sashelp.class;
where sex="M" or age>=15;
run;
proc print data=class;
title 'OR finds all Males and Anyone 15 Years or Older';
run;
1
DATA class;
2
SET sashelp.class;
3
where sex="M" or age>=15;
4
RUN;
5
PROC PRINTDATA=class;
6
title 'OR finds all Males and Anyone 15 Years or Older';
7
RUN;
5 Code Block
DATA STEP / PROC PRINT Data
Explanation : This block identifies individuals from the `sashelp.class` table who are under 15 years old AND are NOT male (i.e., females under 15).
Copied!
data class;
set sashelp.class;
where age < 15 and sex NE "M";
run;
proc print data=class;
title 'Finds Females
Older less than 15 Years';
run;
1
DATA class;
2
SET sashelp.class;
3
where age < 15 and sex NE "M";
4
RUN;
5
PROC PRINTDATA=class;
6
title 'Finds Females
7
Older less than 15 Years';
8
RUN;
6 Code Block
DATA STEP / PROC PRINT Data
Explanation : Illustrates the default order of operations (NOT, then AND, then OR) for a complex WHERE expression. Selects individuals from `sashelp.class` who are female AND whose height is less than 60, OR those over 15 years old.
Copied!
data class;
set sashelp.class;
where age>15 or height<60 and sex="F";
run;
proc print data=class;
title 'age > 15 OR height < 60 AND sex = F';
run;
1
DATA class;
2
SET sashelp.class;
3
where age>15 or height<60 and sex="F";
4
RUN;
5
PROC PRINTDATA=class;
6
title 'age > 15 OR height < 60 AND sex = F';
7
RUN;
7 Code Block
DATA STEP / PROC PRINT Data
Explanation : Demonstrates how parentheses modify the order of evaluation. Selects individuals from `sashelp.class` who are female AND (are over 15 years old OR measure less than 60).
Copied!
data class;
set sashelp.class;
where (age>15 or height<60) and sex="F";
run;
proc print data=class;
title '(age > 15 OR height < 60) AND sex = F';
run;
1
DATA class;
2
SET sashelp.class;
3
where (age>15 or height<60) and sex="F";
4
RUN;
5
PROC PRINTDATA=class;
6
title '(age > 15 OR height < 60) AND sex = F';
7
RUN;
8 Code Block
DATA STEP / PROC PRINT Data
Explanation : This example uses the WHERE= option on the SET statement to filter rows from `sashelp.shoes` where the region is 'Canada' and sales are less than 2000. This syntax is supported for processing on the CAS server when the input table is a CAS table.
Copied!
data sales;
set sashelp.shoes(where=(Region="Canada" and Sales<2000));
run;
proc print data=sales; run;
1
DATA sales;
2
SET sashelp.shoes(where=(Region="Canada" and Sales<2000));
3
RUN;
4
PROC PRINTDATA=sales; RUN;
9 Code Block
DATA STEP / PROC PRINT Data
Explanation : This example shows the use of the WHERE= option on the output table (`DATA statement`). For CAS tables, this approach does not allow the DATA step to be executed on the CAS server; processing is offloaded to the SAS Compute Server. A log note confirms this. Although no error is generated, efficiency is reduced for large CAS tables.
Copied!
/* Specify the WHERE= data set option on the output CAS
table (unsupported for CAS DATA step processing) */
data mylib.sales(where=(Region="Canada" and Sales<2000));
set mylib.shoes;
run;
proc print data=mylib.sales; run;
1
/* Specify the WHERE= data set option on the output CAS
2
table (unsupported for CAS DATA step processing) */
3
4
DATA mylib.sales(where=(Region="Canada" and Sales<2000));
5
SET mylib.shoes;
6
RUN;
7
PROC PRINTDATA=mylib.sales; RUN;
10 Code Block
DATA STEP / PROC PRINT Data
Explanation : This block illustrates how to improve the performance of WHERE queries by creating an index on a variable. The first DATA step creates the `mybaseball` table from `sashelp.baseball` and adds an index on the `team` variable. The second DATA step then filters this table for 'Atlanta' teams, leveraging the index for faster processing, and displays the results.
Copied!
data mybaseball(index=(team));
set sashelp.baseball;
run;
data mybaseball;
set sashelp.baseball;
where Team="Atlanta";
keep Name Team Position;
run;
proc print data=mybaseball; run;
1
DATA mybaseball(index=(team));
2
SET sashelp.baseball;
3
RUN;
4
5
DATA mybaseball;
6
SET sashelp.baseball;
7
where Team="Atlanta";
8
keep Name Team Position;
9
RUN;
10
PROC PRINTDATA=mybaseball; RUN;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
« The WHERE statement is the most efficient way to subset data in SAS because it filters observations before they are fully loaded into the Program Data Vector (PDV). Mastering its nuances—from logical operator precedence to CAS-specific behavior—is essential for building high-performance data pipelines. »
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.
This site uses technical and analytical cookies to improve your experience.
Read more.