CAEFFECT Procedure: Estimation by Regression Adjustment

This example uses PROC CAEFFECT to estimate the causal effect of smoking (Smoking) on infant mortality (Death). A gradient boosting model (PROC GRADBOOST) is first fitted to a synthetic dataset (inspired by SASHELP.BIRTHWGT) to predict the outcome Death from smoking and several confounding variables (AgeGroup, Drinking, Married, SomeCollege). PROC CAEFFECT then uses the fitted model to estimate the potential outcome mean for each treatment level ('Yes' and 'No'). The example also shows how to supply predicted counterfactual outcomes directly to PROC CAEFFECT, by scoring the saved GRADBOOST model with PROC ASTORE, and how to use an alternative method, inverse probability weighting (IPW), with PROC LOGSELECT as the propensity model.
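As a rough guide to what regression adjustment computes, the estimator can be written in standard g-computation notation. The symbols are assumptions introduced here, not taken from the PROC CAEFFECT documentation: \hat{m}(t, x) is the fitted outcome model's predicted probability of Death='Yes' for treatment level t and covariates x, and n is the number of observations.

```latex
\[
\hat{\mu}_t = \frac{1}{n} \sum_{i=1}^{n} \hat{m}(t, x_i),
\qquad t \in \{\text{Yes}, \text{No}\}
\]
\[
\widehat{\Delta} = \hat{\mu}_{\text{Yes}} - \hat{\mu}_{\text{No}}
\]
```

Each potential outcome mean averages the model's prediction over all observations with the treatment set to t, and the difference of the two means is the estimated average effect of smoking.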

Data Analysis

Type : CREATION_INTERNE

The examples use synthetic data generated in a DATA step, inspired by the SASHELP.BIRTHWGT dataset, so that they run without external data and are reproducible.

1 Code Block

DATA STEP / PROC CAS Data

Explanation :
This code block initializes a CAS session, creates a 'mylib' library if it doesn't exist, then generates a small synthetic dataset 'birthwgt_synth_local' in the SAS work library. This dataset is then loaded into the CAS session under the name 'mylib.birthwgt_synth' to be used by CAS procedures.

/* --- CAS session and caslib setup --- */
cas casauto;

/* Create the 'mylib' caslib only if it does not already exist */
%if not %sysfunc(clibexist(casauto, mylib)) %then %do;
   caslib mylib datasource=(srctype="path") path="/tmp/" global;
%end;

/* Bind a SAS libref so that tables can be addressed as mylib.<table> */
libname mylib cas caslib=mylib;

/* --- Synthetic demonstration data (local SAS DATA step) --- */
data work.birthwgt_synth_local;
   input Smoking $ AgeGroup $ Married $ Drinking $ SomeCollege $ Death $;
   datalines;
Yes 1 Yes No Yes No
No 1 No No No No
Yes 2 Yes Yes Yes Yes
No 2 Yes No No No
Yes 3 No Yes No No
No 3 No No Yes No
Yes 1 Yes No Yes No
No 1 No No No No
Yes 2 Yes Yes Yes Yes
No 2 Yes No No No
Yes 3 No Yes No No
No 3 No No Yes No
;
run;

/* Load the synthetic data from the local SAS session (WORK) into CAS (mylib) */
proc casutil;
   load data=work.birthwgt_synth_local outcaslib="mylib" casout="birthwgt_synth" promote;
quit;

/* The following examples use 'mylib.birthwgt_synth' */
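Before fitting any model, it can help to look at the raw association between treatment and outcome, as a baseline for the adjusted estimates. The sketch below assumes the local table work.birthwgt_synth_local created in the step above; the output table name work.naive_rates is hypothetical.

```sas
/* Unadjusted death rate by smoking status (no confounder adjustment) */
proc sql;
   create table work.naive_rates as
   select Smoking,
          count(*)            as n,
          mean(Death = 'Yes') as death_rate  /* the comparison evaluates to 1 or 0 */
   from work.birthwgt_synth_local
   group by Smoking;
quit;

proc print data=work.naive_rates noobs;
run;
```

In this synthetic sample, 2 of the 6 smokers and none of the 6 non-smokers have Death='Yes', so the unadjusted rates are about 0.33 and 0. These raw rates ignore the confounders that the later steps adjust for.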

2 Code Block

PROC GRADBOOST / PROC CAEFFECT

Explanation :
This example illustrates the minimal use of PROC CAEFFECT for regression adjustment estimation. A GRADBOOST model is first trained on the 'mylib.birthwgt_synth' dataset and saved in an analytic store (mylib.gbOutMod_ex1). Then, PROC CAEFFECT uses this pre-trained model, specified by the RESTORE= option, to calculate potential outcome means (POM) for different levels of the 'Smoking' treatment variable.

/* --- Step 1: Train a GRADBOOST model and save it --- */
proc gradboost data=mylib.birthwgt_synth ntrees=10 seed=12345;
   target Death / level=nominal;
   input Smoking AgeGroup Married Drinking SomeCollege / level=nominal;
   savestate rstore=mylib.gbOutMod_ex1;
run;

/* --- Step 2: Run PROC CAEFFECT with the restored model --- */
proc caeffect data=mylib.birthwgt_synth;
   treatvar Smoking;
   outcomevar Death(event='Yes') / type=Categorical;
   outcomemodel restore=mylib.gbOutMod_ex1 predname=P_DeathYes;
   pom treatlev='Yes';
   pom treatlev='No';
run;

/* --- Clean up the analytic store table --- */
proc casutil;
   droptable casdata="gbOutMod_ex1" incaslib="mylib";
quit;

3 Code Block

PROC GRADBOOST / PROC CAEFFECT / PROC PRINT

Explanation :
This example extends the use of PROC CAEFFECT by explicitly specifying the covariates used for adjustment via the ADJUST statement. It also shows how to save detailed POM estimation results to a CAS output table (mylib.caeffect_stats_ex2) using the OUTSTAT option, allowing for further analysis of the estimation results.

/* --- Step 1: Train a GRADBOOST model and save it --- */
proc gradboost data=mylib.birthwgt_synth ntrees=20 seed=54321; /* More trees for this example */
   target Death / level=nominal;
   input Smoking AgeGroup Married Drinking SomeCollege / level=nominal;
   savestate rstore=mylib.gbOutMod_ex2;
run;

/* --- Step 2: Run PROC CAEFFECT with additional options --- */
proc caeffect data=mylib.birthwgt_synth;
   treatvar Smoking;
   outcomevar Death(event='Yes') / type=Categorical;
   outcomemodel restore=mylib.gbOutMod_ex2 predname=P_DeathYes;
   pom treatlev='Yes';
   pom treatlev='No';
   /* Explicitly specify the adjustment (confounding) variables */
   adjust AgeGroup Married Drinking SomeCollege;
   /* Save the estimation statistics to a CAS table */
   outstat mylib.caeffect_stats_ex2;
run;

/* --- Display the saved results --- */
proc print data=mylib.caeffect_stats_ex2;
run;

/* --- Cleanup --- */
proc casutil;
   droptable casdata="gbOutMod_ex2" incaslib="mylib";
   droptable casdata="caeffect_stats_ex2" incaslib="mylib";
quit;

4 Code Block

PROC GRADBOOST / PROC ASTORE / PROC CAEFFECT

Explanation :
This advanced example demonstrates an approach where counterfactual outcome predictions for each treatment level are first calculated separately using PROC ASTORE. The pre-trained GRADBOOST model is used to score the dataset twice, once by forcing 'Smoking' to 'Yes' and once by forcing 'Smoking' to 'No'. The resulting prediction columns are then provided directly to PROC CAEFFECT via the PREDOUT= option of the POM statement. This method is flexible for complex outcome models or specific data processing chains.

/* --- Step 1: Train a GRADBOOST model and save it --- */
proc gradboost data=mylib.birthwgt_synth ntrees=15 seed=67890;
   target Death / level=nominal;
   input Smoking AgeGroup Married Drinking SomeCollege / level=nominal;
   savestate rstore=mylib.gbOutMod_ex3;
run;

/* --- Step 2: Compute counterfactual predictions for Smoking='Yes' --- */
data mylib.gbPredData_temp;
   set mylib.birthwgt_synth;
   tempSmoking = Smoking; /* Save the observed value of Smoking */
   Smoking = 'Yes';       /* Force 'Yes' for the counterfactual prediction */
run;

proc astore;
   score data=mylib.gbPredData_temp out=mylib.gbPredData_temp_scored_yes
         rstore=mylib.gbOutMod_ex3
         copyvars=(tempSmoking AgeGroup Married Drinking SomeCollege Death);
run;

/* --- Step 3: Compute counterfactual predictions for Smoking='No' --- */
data mylib.gbPredData_final;
   set mylib.gbPredData_temp_scored_yes;
   rename P_DeathYes = Pred_DeathYes_SmokingYes; /* Rename the first prediction */
   Smoking = 'No'; /* Force 'No' for the second counterfactual prediction */
run;

proc astore;
   score data=mylib.gbPredData_final out=mylib.gbPredData_final_scored
         rstore=mylib.gbOutMod_ex3
         copyvars=(tempSmoking Pred_DeathYes_SmokingYes Death);
run;

/* --- Step 4: Tidy up and prepare the final data for CAEFFECT --- */
data mylib.gbPredData_final_scored;
   set mylib.gbPredData_final_scored;
   rename P_DeathYes = Pred_DeathYes_SmokingNo;
   Smoking = tempSmoking; /* Restore the observed Smoking variable */
   drop tempSmoking;
run;

/* --- Step 5: Run PROC CAEFFECT with the precomputed predictions --- */
proc caeffect data=mylib.gbPredData_final_scored;
   treatvar Smoking;
   outcomevar Death(event='Yes') / type=Categorical;
   pom treatlev='Yes' predOut=Pred_DeathYes_SmokingYes;
   pom treatlev='No' predOut=Pred_DeathYes_SmokingNo;
run;

/* --- Cleanup --- */
proc casutil;
   droptable casdata="gbOutMod_ex3" incaslib="mylib";
   droptable casdata="gbPredData_temp" incaslib="mylib";
   droptable casdata="gbPredData_temp_scored_yes" incaslib="mylib";
   droptable casdata="gbPredData_final" incaslib="mylib";
   droptable casdata="gbPredData_final_scored" incaslib="mylib";
quit;
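With the two counterfactual prediction columns in hand, a regression-adjustment potential outcome mean is, in its simplest form, the average of the corresponding column. The sketch below shows that arithmetic on a small hypothetical table: the values in work.pred_demo are made up for illustration and are not output of the model above, and work.pom_check is a hypothetical name. It is a plain cross-check of the idea, not a reproduction of PROC CAEFFECT's output.

```sas
/* Hypothetical counterfactual predictions, one row per observation */
data work.pred_demo;
   input Pred_DeathYes_SmokingYes Pred_DeathYes_SmokingNo;
   datalines;
0.30 0.10
0.50 0.20
0.10 0.05
0.30 0.05
;
run;

/* Average each prediction column to get the two potential outcome means */
proc means data=work.pred_demo noprint;
   var Pred_DeathYes_SmokingYes Pred_DeathYes_SmokingNo;
   output out=work.pom_check mean=pom_yes pom_no;
run;

/* Difference of the two means: the estimated effect of smoking */
data work.pom_check;
   set work.pom_check;
   diff = pom_yes - pom_no;
run;

proc print data=work.pom_check noobs;
   var pom_yes pom_no diff;
run;
```

For these made-up values the means are 0.30 and 0.10, a difference of 0.20. The same PROC MEANS step could be pointed at the scored CAS table before it is dropped in the cleanup step.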

5 Code Block

PROC LOGSELECT / PROC CAEFFECT

Explanation :
This example illustrates causal effect estimation by inverse probability weighting (IPW), an alternative to regression adjustment. IPW requires propensity scores: the probability of receiving each treatment level given the covariates. Here, a logistic model (PROC LOGSELECT) predicts the probability of Smoking='Yes', and a DATA step derives the complementary probability for 'No'. The scores are passed to PROC CAEFFECT through the TREATPROB= option of each POM statement, with METHOD=IPW specified in the PROC CAEFFECT statement. No outcome model is needed for this method, which shows PROC CAEFFECT's flexibility in supporting different causal estimation methods.

/* --- Step 1: Estimate propensity scores with PROC LOGSELECT --- */
/* For IPW, we model the probability of receiving the treatment (Smoking) */
proc logselect data=mylib.birthwgt_synth;
   class AgeGroup Married Drinking SomeCollege;
   model Smoking(event='Yes') = AgeGroup Married Drinking SomeCollege;
   /* COPYVARS= keeps the treatment, outcome, and covariates in the output table */
   output out=mylib.propensity_scores pred=pYes copyvars=(_all_);
run;

/* The propensity of the 'No' level is the complement of the 'Yes' level */
data mylib.propensity_scores;
   set mylib.propensity_scores;
   pNo = 1 - pYes;
run;

/* --- Step 2: Run PROC CAEFFECT with the IPW method --- */
proc caeffect data=mylib.propensity_scores method=ipw;
   treatvar Smoking;
   outcomevar Death(event='Yes') / type=Categorical;
   pom treatlev='Yes' treatprob=pYes; /* Predicted propensity of Smoking='Yes' */
   pom treatlev='No' treatprob=pNo;   /* Predicted propensity of Smoking='No' */
run;

/* --- Cleanup --- */
proc casutil;
   droptable casdata="propensity_scores" incaslib="mylib";
quit;
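To make the weighting idea concrete, the sketch below computes a basic (Horvitz-Thompson style) IPW estimate by hand on a four-row hypothetical table. The propensity values in pYes are made up, and the table and column names (work.ipw_demo, work.ipw_terms, work.ipw_pom, term_yes, term_no) are illustrative assumptions. PROC CAEFFECT may normalize the weights differently, so treat this as an illustration of the principle rather than a reproduction of its results.

```sas
/* Hypothetical data: observed treatment, outcome, and P(Smoking='Yes' | covariates) */
data work.ipw_demo;
   input Smoking $ Death $ pYes;
   datalines;
Yes Yes 0.5
Yes No  0.8
No  No  0.5
No  Yes 0.2
;
run;

/* Each row contributes its outcome, weighted by the inverse of the  */
/* probability of the treatment level it actually received           */
data work.ipw_terms;
   set work.ipw_demo;
   y        = (Death = 'Yes');                   /* 1 if the event occurred, else 0 */
   term_yes = (Smoking = 'Yes') * y / pYes;
   term_no  = (Smoking = 'No')  * y / (1 - pYes);
run;

/* Averaging the weighted terms over all rows gives the two potential outcome means */
proc means data=work.ipw_terms noprint;
   var term_yes term_no;
   output out=work.ipw_pom mean=pom_yes pom_no;
run;

proc print data=work.ipw_pom noobs;
   var pom_yes pom_no;
run;
```

For these made-up rows, the 'Yes' mean is (1/0.5 + 0)/4 = 0.5 and the 'No' mean is (0 + 1/0.8)/4 = 0.3125. Rows with a small probability of their observed treatment receive large weights, which is why extreme propensity scores make IPW estimates unstable.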

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.

Expert Advice

Michael

Viya infrastructure lead.

« When using the RESTORE= method, ensure the SAVESTATE from your predictive model is stored in a global CAS library. This allows the distributed CAS worker nodes to access the model weights simultaneously, which is critical for maintaining performance on multi-million row datasets. »