Proc SQL

How to filter a table from a list of identifiers with PROC SQL

Simon 8 views

It is very common when manipulating data in SAS© to have to extract information from a main table (source dataset) based on a specific list of identifiers contained in another table.

If you have ever tried to write a WHERE clause that directly references another table without success, this article explains the correct syntax to perform this operation efficiently with PROC SQL.

Illustration

The Scenario

Let's imagine we have two tables:

  1. Table A: A restricted list of patient identifiers (pat_id). This is our filter.

  2. Table B: A complete database containing measurements (var1, var2) for many patients.

The objective: To create a new dataset containing only the rows from Table B whose pat_id is present in Table A.

Illustration

The Common Error

Intuitively, one might want to write a query that looks like this: ... where pat_id in tableA;

However, SAS© does not allow referencing a raw table directly in an IN clause. You must use what is called a subquery.

The Solution: The Subquery

For the WHERE clause to work, it must compare the pat_id variable to a list of values returned by a SELECT query.

Here is the correct syntax:

1PROC SQL;
2 create TABLE want as
3 select *
4 from table_b
5 where pat_id in ( select pat_id from table_a );
6QUIT;

How does it work?

  1. The parenthesis (Subquery): The code select pat_id from table_a is executed first. It generates the list of IDs to keep (here: 1, 2, 3).

  2. The main query: SAS© then goes through table_b and keeps only the rows where the pat_id matches one of the IDs found by the subquery.

Final Result

By running the code above, you will get the following want table, which is an exact copy of the data from Table B, filtered by Table A:

pat_idvar1var2
1353.5
2776.7
3155.3

This method is clean, efficient, and standardized for filtering relational data in SAS© via the SQL language.