How to join two tables without a common key

It often happens in SAS^© programming that you need to join two datasets. Usually, we use a MERGE statement in a DATA step or a LEFT/INNER JOIN in SQL via a common key (for example, a customer ID).

But how do you do it when the two tables have no variables in common and the join depends on complex conditions, like values falling within ranges? This is called a Non-Equi Join (Non-Equi Join).

Let's look at a practical case from a user discussion to understand how to solve this problem elegantly with PROC SQL.

The Misleading Good Idea: The Loop in the DATA Step

Faced with this problem, the first intuition is often to try an iterative approach using a DATA step. The idea would be to read a row from the accounts table, then loop through the entire intervals table to find a match.

Although technically possible, this approach is complex to code in SAS^©:

It requires manipulating two read pointers.
You have to manage resetting the read of the intervals table for each account (which might require the POINT= option or multiple SET statements).
The code quickly becomes cumbersome and difficult to maintain.

As one contributor in the discussion points out: "I wouldn't waste time on a loop methodology."

The Elegant Solution: PROC SQL and the Cartesian Product

The most effective and readable solution lies in using PROC SQL. Unlike the DATA step, SQL is designed to naturally handle complex relationships between sets without worrying about sort order or explicit loops.

The trick is to perform a cartesian join (all rows from A combined with all rows from B) and immediately filter the result.

1	PROC SQL;
2	CREATE TABLE DATA3 AS
3	SELECT
4	t1.grpnm,
5	t2.acct,
6	t2.bal1,
7	t2.bal2
8	FROM
9	intervals t1, /* Table 1 */
10	accounts t2 /* Table 2 */
11	WHERE
12	/* Condition pour le premier solde */
13	t2.bal1 BETWEEN t1.min1 AND t1.max1
14	AND
15	/* Condition pour le second solde */
16	t2.bal2 BETWEEN t1.min2 AND t1.max2;
17	QUIT;

How to join two tables without a common key

Niveau de difficulté

Publicado el : 12/03/2017

The Misleading Good Idea: The Loop in the DATA Step

The Elegant Solution: PROC SQL and the Cartesian Product

Aviso importante