causalanalysis caEffect

High-Volume Observational Health Study (IPW)

Test Scenario & Use Case

Business Context

A public health institute is analyzing a dataset of 1 million patients to estimate the effect of an 'Active' versus a 'Sedentary' lifestyle on recovery days. Because of the data volume, they opt for the Inverse Probability Weighting (IPW) method for efficiency.
About the Set: causalanalysis

Causal inference analysis and effect estimation.

Data Preparation

Simulate a large dataset (1 million rows) to test performance and stability.

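DATA mycas.health_study;
   /* Declare the length up front; otherwise the first assignment ('Active') */
   /* fixes it at 6 characters and 'Sedentary' is truncated to 'Sedent'.     */
   LENGTH lifestyle $ 9;
   call streaminit(999);
   DO i = 1 to 1000000;
      bmi = rand('NORMAL', 25, 4);
      age = rand('UNIFORM', 18, 80);
      /* The probability of an active lifestyle rises with age */
      IF rand('UNIFORM') < (1 / (1 + exp(-(-1 + 0.02*age)))) THEN lifestyle = 'Active';
      ELSE lifestyle = 'Sedentary';
      /* 'Active' shortens recovery by 2 days; BMI and Poisson noise add to it */
      recovery_days = 14 - 2*(lifestyle='Active') + 0.1*bmi + rand('POISSON', 2);
      OUTPUT;
   END;
RUN;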
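Written out as a reading aid, the DATA step corresponds to the model below. The symbols are shorthand introduced here, not names from the source: $A$ is 1 for 'Active' and 0 for 'Sedentary', $Y$ is recovery_days, and $Y(a)$ is the potential outcome under lifestyle $a$. The second argument of rand('NORMAL', 25, 4) is a standard deviation, so the variance is $4^2$.

```latex
\[
\begin{aligned}
\text{bmi} &\sim \mathcal{N}(25,\ 4^2), \qquad \text{age} \sim \mathcal{U}(18,\ 80) \\
P(A = 1 \mid \text{age}) &= \frac{1}{1 + e^{-(-1 + 0.02\,\text{age})}} \\
Y &= 14 - 2A + 0.1\,\text{bmi} + \varepsilon, \qquad \varepsilon \sim \mathrm{Poisson}(2) \\
E[Y(1)] &= 14 - 2 + 0.1 \cdot 25 + 2 = 16.5 \\
E[Y(0)] &= 14 + 0.1 \cdot 25 + 2 = 18.5 \\
E[Y(1)] - E[Y(0)] &= -2
\end{aligned}
\]
```

These population values give a reference point when reading the estimates returned later: on 1 million rows, the estimated means and their difference should fall close to them.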

Implementation Steps

1. Pre-compute the propensity scores (treatment probabilities) with logistic regression.
PROC CAS;
   logistic
      TABLE={name='health_study'},
      class={'lifestyle'},
      model={depvar='lifestyle', effects={'age', 'bmi'}},
      OUTPUT={casout={name='health_scored', replace=true}, copyVars='ALL', predProbs=true};
RUN;
QUIT;
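2. Run caEffect with the IPW method on the large dataset.
PROC CAS;
   /* As written, both pom entries read the same column (_PredProbs_).       */
   /* IPW expects, for each treatment level, the probability of that level,  */
   /* so check which level this column describes in health_scored.           */
   causalanalysis.caEffect
      TABLE={name='health_scored'},
      method='IPW',
      treatVar={name='lifestyle'},
      outcomeVar={name='recovery_days', type='CONTINUOUS'},
      pom={{trtLev='Active', trtProb='_PredProbs_'},
           {trtLev='Sedentary', trtProb='_PredProbs_'}},
      difference={{evtLev='Active', refLev='Sedentary'}};
RUN;
QUIT;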

Expected Result


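The action should process the 1 million records without memory errors. It returns the 'POMs' table (potential outcome means), using the pre-calculated probabilities (_PredProbs_) as inverse probability weights to adjust for the covariates in the propensity model (age, BMI). Because the simulation subtracts 2 days from recovery_days for 'Active' patients, the estimated difference between 'Active' and 'Sedentary' should be close to -2 days.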
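For reference, the textbook form of the IPW estimator behind the potential outcome means is sketched below. This is the standard Horvitz-Thompson version, given as an assumption about the general technique; the source does not state the exact weighting or normalization that caEffect applies. The notation is introduced here: $T_i$ is 1 when patient $i$ is 'Active', $Y_i$ is recovery_days, $X_i$ holds the covariates (age, BMI), and $\hat{e}(X_i)$ is the estimated probability of 'Active'.

```latex
\[
\hat{\mu}_{\text{Active}} = \frac{1}{n}\sum_{i=1}^{n} \frac{T_i\,Y_i}{\hat{e}(X_i)},
\qquad
\hat{\mu}_{\text{Sedentary}} = \frac{1}{n}\sum_{i=1}^{n} \frac{(1 - T_i)\,Y_i}{1 - \hat{e}(X_i)},
\qquad
\hat{\tau} = \hat{\mu}_{\text{Active}} - \hat{\mu}_{\text{Sedentary}}
\]
```

Each patient is weighted by the inverse of the probability of the treatment level actually observed, which is why the two levels use different probabilities: $\hat{e}(X_i)$ for 'Active' and $1 - \hat{e}(X_i)$ for 'Sedentary'.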