SAS9

Analyze the evolution before and after a key event

Simon 14 views

In longitudinal studies, particularly in the medical or marketing fields, it is common to need to analyze the evolution of a variable (such as BMI, revenue, or a risk score) in relation to a pivotal event (a surgery, a marketing campaign, a subscription).

Unlike a simple "date-to-date" comparison, the difficulty here lies in the fact that the reference date (the event) is specific to each individual and is itself buried within the data.

This article explains how to extract and compare data at specific intervals (for example: 6 months before and 6 months after) around a target event.

The Scenario

Let's take the example of patients monitored for their BMI. We have several records per patient. One of these records corresponds to a surgery (surgery = 1). The objective is to identify the measurements taken 6 months before and 6 months after this surgery to evaluate the impact of the operation.

The Raw Data

The data includes an identifier, a visit date, a surgery indicator (0 = no, 1 = yes), and the BMI measurement.

Important note: Always ensure that your dates are imported as numeric variables (date format) and not as text, to allow for arithmetic calculations.

1DATA diff_date;
2 INPUT Date_Visit : mmddyy10. ID surgery BMI;
3 FORMAT Date_Visit mmddyy10.;
4 CARDS;
507/02/2002 1 0 40
611/14/2002 1 0 35
705/15/2003 1 1 35 /* Date de chirurgie pour ID 1 */
811/04/2003 1 0 25
905/04/2004 2 0 50
1011/10/2004 2 0 53
1105/03/2005 2 1 45 /* Date de chirurgie pour ID 2 */
1205/03/2006 2 0 32
1305/03/2007 2 0 25
14;
15RUN;

The Logic Problem

To filter the rows located at 't - 6 months' or 't + 6 months', SAS© needs to know the date 't' (the surgery) when it reads any other row for the same patient. However, during a standard sequential read, SAS© only 'knows' the current row.

The Solution: The Self-Merge

The most effective method is to merge the table with itself.

  1. Table A (Base): Contains the patient's entire history.

  2. Table B (Reference): Is filtered to contain only the row of the event (surgery).

By merging A and B by identifier (ID), we will 'broadcast' the surgery date to all of the patient's rows. Each observation will then have its own date (Date_Visit) and the reference date (_Date_Ref) on the same row, making the comparison trivial.

The SAS© Code

Here is how to implement this logic in a single DATA step:

1DATA want;
2 /* On fusionne la table complète avec une version filtrée d'elle-même */
3 MERGE diff_date
4 diff_date(keep=id date_visit surgery
5 rename=(date_visit=_Date_Ref surgery=_surgery_flag)
6 where=(_surgery_flag=1));
7 BY id;
8 
9 /* Calcul de la fenêtre temporelle */
10 /* On cherche les dates hors de la fenêtre 'Chirurgie +/- 6 mois' ou l'inverse selon le besoin */
11
12 /* Exemple : Identifier les visites proches de -6 mois ou +6 mois */
13 Months_Diff = intck('month', _Date_Ref, Date_Visit);
14
15 /* Optionnel : Filtrer pour ne garder que les visites d'intérêt */
16 /* Ici, on garde ce qui est entre 5 et 7 mois avant ou après */
17 IF (Months_Diff between -7 and -5) or (Months_Diff between 5 and 7) or (_surgery_flag=1);
18RUN;

Detailed Technical Explanation

  1. rename=(date_visit=_Date_Ref ...): In the second mention of the diff_date table, we rename the visit date. This prevents the merge from overwriting the original date. _Date_Ref will become a new column containing the surgery date, repeated on each of the patient's rows.

  2. where=(_surgery_flag=1): This filter is applied to the second table before the merge. For each ID, SAS© only keeps the surgery row. During the MERGE, this single row is joined to all the history rows for that ID (One-to-Many relationship).

  3. intck('month', ...) or intnx(...): Once we have both dates on the same row, we can use:

    • INTCK to count the number of intervals (months) between the two dates.

    • INTNX to calculate a theoretical target date (e.g., _Date_Ref + 6 months) and compare it to the actual date.

Variant: Strict Filtering with INTNX

If your goal is to check if a date falls exactly (or strictly) before or after a 6-month window, you can use the INTNX function directly in an IF condition.

The example below illustrates how to exclude the 'neutral' period and keep only the distant data (more than 6 months apart):

1/* Garder uniquement les données situées au-delà de 6 mois d'écart (avant ou après) */
2IF date_visit < intnx('month', _Date_Ref, -6, 's')
3or date_visit > intnx('month', _Date_Ref, 6, 's');
4 
The 's' (same) argument in INTNX ensures that if the surgery is on the 15th of the month, the calculation will be based on the 15th of the preceding/following months.

The Self-Merge is a powerful technique in SAS© for relational intra-subject analyses. It avoids the complex use of loops or variable retention (RETAIN) and allows for massive processing of relative date comparisons.

Key takeaways:

  • Use MERGE data data(...) to report a unique piece of information (event date) across a subject's entire history.

  • Always rename the variables from the reference table to avoid overwriting.

  • Use INTCK or INTNX to properly manage calendar logic.