In longitudinal studies, particularly in the medical or marketing fields, it is common to need to analyze the evolution of a variable (such as BMI, revenue, or a risk score) in relation to a pivotal event (a surgery, a marketing campaign, a subscription).
Unlike a simple "date-to-date" comparison, the difficulty here lies in the fact that the reference date (the event) is specific to each individual and is itself buried within the data.
This article explains how to extract and compare data at specific intervals (for example: 6 months before and 6 months after) around a target event.
The Scenario
Let's take the example of patients monitored for their BMI. We have several records per patient. One of these records corresponds to a surgery (surgery = 1). The objective is to identify the measurements taken 6 months before and 6 months after this surgery to evaluate the impact of the operation.
The Raw Data
The data includes an identifier, a visit date, a surgery indicator (0 = no, 1 = yes), and the BMI measurement.
Important note: Always ensure that your dates are imported as numeric variables (date format) and not as text, to allow for arithmetic calculations.
data diff_date;
input Date_Visit : mmddyy10. ID surgery BMI;
format Date_Visit mmddyy10.;
cards;
07/02/2002 1 0 40
11/14/2002 1 0 35
05/15/2003 1 1 35 /* Date de chirurgie pour ID 1 */
11/04/2003 1 0 25
05/04/2004 2 0 50
11/10/2004 2 0 53
05/03/2005 2 1 45 /* Date de chirurgie pour ID 2 */
05/03/2006 2 0 32
05/03/2007 2 0 25
;
run;
The most effective method is to merge the table with itself.
Table A (Base): Contains the patient's entire history.
Table B (Reference): Is filtered to contain only the row of the event (surgery).
By merging A and B by identifier (ID), we will 'broadcast' the surgery date to all of the patient's rows. Each observation will then have its own date (Date_Visit) and the reference date (_Date_Ref) on the same row, making the comparison trivial.
Here is how to implement this logic in a single DATA step:
data want;
/* On fusionne la table complète avec une version filtrée d'elle-même */
merge diff_date
diff_date(keep=id date_visit surgery
rename=(date_visit=_Date_Ref surgery=_surgery_flag)
where=(_surgery_flag=1));
by id;
/* Calcul de la fenêtre temporelle */
/* On cherche les dates hors de la fenêtre 'Chirurgie +/- 6 mois' ou l'inverse selon le besoin */
/* Exemple : Identifier les visites proches de -6 mois ou +6 mois */
Months_Diff = intck('month', _Date_Ref, Date_Visit);
/* Optionnel : Filtrer pour ne garder que les visites d'intérêt */
/* Ici, on garde ce qui est entre 5 et 7 mois avant ou après */
if (Months_Diff between -7 and -5) or (Months_Diff between 5 and 7) or (_surgery_flag=1);
run;
1
DATA want;
2
/* On fusionne la table complète avec une version filtrée d'elle-même */
/* Optionnel : Filtrer pour ne garder que les visites d'intérêt */
16
/* Ici, on garde ce qui est entre 5 et 7 mois avant ou après */
17
IF (Months_Diff between -7 and -5) or (Months_Diff between 5 and 7) or (_surgery_flag=1);
18
RUN;
Detailed Technical Explanation
rename=(date_visit=_Date_Ref ...): In the second mention of the diff_date table, we rename the visit date. This prevents the merge from overwriting the original date. _Date_Ref will become a new column containing the surgery date, repeated on each of the patient's rows.
If your goal is to check if a date falls exactly (or strictly) before or after a 6-month window, you can use the INTNX function directly in an IF condition.
The example below illustrates how to exclude the 'neutral' period and keep only the distant data (more than 6 months apart):
/* Garder uniquement les données situées au-delà de 6 mois d'écart (avant ou après) */
if date_visit < intnx('month', _Date_Ref, -6, 's')
or date_visit > intnx('month', _Date_Ref, 6, 's');
1
/* Garder uniquement les données situées au-delà de 6 mois d'écart (avant ou après) */
2
IF date_visit < intnx('month', _Date_Ref, -6, 's')
3
or date_visit > intnx('month', _Date_Ref, 6, 's');
4
The 's' (same) argument in INTNX ensures that if the surgery is on the 15th of the month, the calculation will be based on the 15th of the preceding/following months.
Use MERGE data data(...) to report a unique piece of information (event date) across a subject's entire history.
Always rename the variables from the reference table to avoid overwriting.
Use INTCK or INTNX to properly manage calendar logic.
Important Disclaimer
The codes and examples provided on WeAreCAS.eu are for educational purposes. It is imperative not to blindly copy-paste them into your production environments. The best approach is to understand the logic before applying it. We strongly recommend testing these scripts in a test environment (Sandbox/Dev). WeAreCAS accepts no responsibility for any impact or data loss on your systems.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.
This site uses technical and analytical cookies to improve your experience.
Read more.