High-Volume Observational Health Study (IPW)

Business Context

A public health institute is analyzing a dataset of 1 million patients to estimate the effect of an 'Active' lifestyle, compared with a 'Sedentary' one, on the number of recovery days. Because of the data volume, they choose the Inverse Probability Weighting (IPW) method for its efficiency.

About the Set: causalanalysis

Causal inference analysis and effect estimation.

Data Preparation

Simulate a large dataset (1 million rows) to test performance and stability.

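/* Simulate 1,000,000 patients; the fixed seed makes the run reproducible. */
DATA mycas.health_study;
   LENGTH lifestyle $9;   /* otherwise 'Sedentary' is truncated to the length of 'Active' */
   CALL streaminit(999);
   DO i = 1 TO 1000000;
      bmi = rand('NORMAL', 25, 4);
      age = rand('UNIFORM', 18, 80);
      /* Treatment assignment: the probability of 'Active' is a logistic function of age */
      IF rand('UNIFORM') < (1 / (1 + exp(-(-1 + 0.02*age)))) THEN lifestyle = 'Active';
      ELSE lifestyle = 'Sedentary';
      /* Outcome: 'Active' removes 2 days; each BMI unit adds 0.1 day; Poisson noise */
      recovery_days = 14 - 2*(lifestyle='Active') + 0.1*bmi + rand('POISSON', 2);
      OUTPUT;
   END;
RUN;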
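For reference, the DATA step above can be read as the following data-generating model. This is only a transcription of the code; the symbols $A$ (1 for 'Active', 0 for 'Sedentary'), $Y$ (recovery days), and $\varepsilon$ are notation introduced here and do not appear in the source.

```latex
\begin{aligned}
\text{bmi} &\sim \mathcal{N}(25,\ 4^2), \qquad \text{age} \sim \mathrm{Uniform}(18,\ 80) \\
P(A = 1 \mid \text{age}) &= \frac{1}{1 + \exp\bigl(-(-1 + 0.02\,\text{age})\bigr)} \\
Y &= 14 - 2A + 0.1\,\text{bmi} + \varepsilon, \qquad \varepsilon \sim \mathrm{Poisson}(2) \\
E[Y \mid A = 1, \text{bmi}] - E[Y \mid A = 0, \text{bmi}] &= -2
\end{aligned}
```

Under this model the simulated effect of 'Active' relative to 'Sedentary' is -2 recovery days, so an estimate of the difference can be checked against that value.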

Implementation Steps

Pre-calculate the propensity scores (treatment probabilities) with a logistic regression of lifestyle on age and BMI.

PROC CAS;
   /* Fit the propensity model and write the predicted probabilities to health_scored */
   logistic
      table={name='health_study'},
      class={'lifestyle'},
      model={depvar='lifestyle', effects={'age', 'bmi'}},
      output={casout={name='health_scored', replace=true}, copyVars='ALL', predProbs=true};
RUN;

Run caEffect with the IPW method on the large dataset.

PROC CAS;
   /* Estimate the potential outcome means by IPW from the pre-computed probabilities */
   causalanalysis.caEffect
      table={name='health_scored'},
      method='IPW',
      treatVar={name='lifestyle'},
      outcomeVar={name='recovery_days', type='CONTINUOUS'},
      pom={{trtLev='Active', trtProb='_PredProbs_'},
           {trtLev='Sedentary', trtProb='_PredProbs_'}},
      difference={{evtLev='Active', refLev='Sedentary'}};
RUN;

Expected Result

The action should process the 1 million records efficiently and without memory errors. It returns the potential outcome means (POMs) table, using the pre-calculated probabilities (_PredProbs_) as inverse probability weights to adjust for the confounding variables (age, BMI).
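As a rough guide to what the IPW adjustment computes, one common textbook form of the estimator is sketched below. It is an assumption that caEffect follows this form; the action may normalize the weights differently. The symbols $n$, $A_i$, $Y_i$, $X_i$, and $p_t$ are notation introduced here.

```latex
\begin{aligned}
\hat{\mu}_t &= \frac{1}{n} \sum_{i=1}^{n} \frac{\mathbf{1}\{A_i = t\}\, Y_i}{p_t(X_i)},
\qquad p_t(x) = P(A = t \mid X = x) \\
\hat{\Delta} &= \hat{\mu}_{\text{Active}} - \hat{\mu}_{\text{Sedentary}},
\qquad p_{\text{Sedentary}}(x) = 1 - p_{\text{Active}}(x)
\end{aligned}
```

In this form each treatment level is weighted by the inverse of its own probability, so for a binary treatment the probability paired with 'Sedentary' is the complement of the one paired with 'Active'. It is worth confirming which level the predicted probability column refers to before passing it as trtProb for a given level.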

See the technical documentation for caEffect