One of the classic challenges in Data Management is to join two tables not on strict equality, but on a relative time window.
The typical scenario is as follows: you have a reference table containing key event dates (e.g., balance sheet date, marketing campaign date) and a second table containing a history of transactions. The objective is to retrieve, for each key event, all transactions that occurred within the preceding 12 months.
Unlike a classic merge (MERGE) which requires identical keys, this problem requires an inequality join.
The Scenario
Let's take two datasets to illustrate the problem:
Reference Table (HAVE_BASE): Contains companies and their pivot dates. Note that company 'A' appears twice with two different dates.
Event Table (HAVE_EVENT): Contains the history of financial results (EPS) by company at various dates.
The objective is to extract the rows from the EVENT table where the date is between [Reference Date - 12 months] and [Reference Date].
Data Preparation
Since dates are often imported as character strings (e.g., "199701"), the first critical step is to convert them into true numeric SAS© dates to allow for arithmetic calculations.
Why is the DATA Step (Merge) Not Suitable?
An intuitive attempt would be to use the DATA step with a MERGE statement. However, MERGE matches rows sequentially based on a strict equality of the BY variables.
If you try to filter with conditions like if date >= intnx('month', date_b, -12), it will often fail in "Many-to-Many" cases (multiple reference dates for the same company) because the SAS© read pointer cannot "go back" to re-test the historical rows for the second reference date.
The Optimal Solution: PROC SQL
The SQL language is perfect for this case because it generates a Cartesian product on the join key (company), and then filters the results according to the time window. This allows a single event row to be associated with multiple reference dates if the windows overlap.
The Code
The key function here is INTNX, which allows you to shift a date by a specific number of intervals (here, -12 months).
Analysis of the WHERE clause
base.company = evt.company: Essential. Without this, you would be comparing the dates of company A with those of company B.
intnx('month', base.date, -12): Calculates the lower bound of the window (the date exactly 12 months before the reference date).
The operator <=: The framing Lower_Bound <= Event_Date <= Upper_Bound is the most efficient and readable method for defining the interval.
Result and Interpretation
If company 'A' has a reference date in January 1997 and another in June 1996:
SQL will first process January 1997: It will search for all of 'A's events between Jan 96 and Jan 97.
It will then process June 1996: It will search for all of 'A's events between June 95 and June 96.
If an event occurs in May 1996, it will appear twice in the final result (once associated with the '97 reference, once with the '96 reference), which is the correct expected behavior for this type of analysis.
To join data over time intervals (rolling windows):
Prefer PROC SQL over the DATA step.
Ensure your dates are in numeric format.
Use INTNX to dynamically calculate the bounds of your time window in the WHERE clause.