It is very common when manipulating data in SAS© to have to extract information from a main table (source dataset) based on a specific list of identifiers contained in another table.
If you have ever tried to write a WHERE clause that directly references another table without success, this article explains the correct syntax to perform this operation efficiently with PROC SQL.
The Scenario
Let's imagine we have two tables:
Table A: A restricted list of patient identifiers (pat_id). This is our filter.
Table B: A complete database containing measurements (var1, var2) for many patients.
The objective: To create a new dataset containing only the rows from Table B whose pat_id is present in Table A.
The Common Error
Intuitively, one might want to write a query that looks like this:
... where pat_id in tableA;
However, SAS© does not allow referencing a raw table directly in an IN clause. You must use what is called a subquery.
The Solution: The Subquery
For the WHERE clause to work, it must compare the pat_id variable to a list of values returned by a SELECT query.
Here is the correct syntax:
How does it work?
The parenthesis (Subquery): The code select pat_id from table_a is executed first. It generates the list of IDs to keep (here: 1, 2, 3).
The main query: SAS© then goes through table_b and keeps only the rows where the pat_id matches one of the IDs found by the subquery.
Final Result
By running the code above, you will get the following want table, which is an exact copy of the data from Table B, filtered by Table A:
| pat_id | var1 | var2 |
| 1 | 35 | 3.5 |
| 2 | 77 | 6.7 |
| 3 | 15 | 5.3 |
This method is clean, efficient, and standardized for filtering relational data in SAS© via the SQL language.