BART Procedure: Storage and Scoring

The BART procedure fits Bayesian additive regression tree (BART) models, which can be used to build robust predictive models. Saving a fitted model and reusing it later is particularly useful for scoring new data sets or for analyzing interventions on explanatory variables. The examples below create a BART model, save it to an analytic store, and then load it to score new data and compute predictive margins. The TRAININMEM and MAPINMEM options are used to speed up training in CAS memory. The MARGIN and MARGINDIFF statements are used to analyze the effects of specific changes to the predictors.
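All of the examples read and write tables through a libref named `mylib`, which assumes an active CAS session and a libref bound to a caslib. That setup is not shown on this page. A minimal sketch, assuming a session name of `mysess` (the name is arbitrary) and the session's default active caslib:

```sas
/* Assumption: start a CAS session; "mysess" is an arbitrary session name */
cas mysess;

/* Assumption: bind the libref mylib to the session's active caslib */
libname mylib cas sessref=mysess;
```

When you are finished, `cas mysess terminate;` ends the session and releases its in-memory tables.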

Data Analysis

Type: INTERNAL_CREATION

The examples use synthetic data generated with the SAS RAND function to simulate a data set with a continuous response variable (y) and 40 continuous explanatory variables (x1-x40). Only x1-x5 actually affect the response; x6-x40 are noise inputs.
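Written out, the DATA steps below generate each observation as follows, with all uniform draws independent. The response is Friedman's benchmark test function plus standard normal noise. Because every input shares the same draw u, any two inputs are correlated, with correlation 1/2:

```latex
\begin{aligned}
u,\, w_1, \dots, w_{40} &\overset{\text{iid}}{\sim} \mathrm{Uniform}(0,1), \qquad x_j = \tfrac{1}{2}(w_j + u), \quad j = 1, \dots, 40,\\
y &= 10\sin(\pi x_1 x_2) + 20\,(x_3 - 0.5)^2 + 10\,x_4 + 5\,x_5 + \varepsilon, \qquad \varepsilon \sim N(0, 1),\\
\operatorname{Corr}(x_j, x_k) &= \frac{\operatorname{Var}(u/2)}{\operatorname{Var}(w_j/2) + \operatorname{Var}(u/2)} = \frac{1/48}{1/48 + 1/48} = \frac{1}{2}, \qquad j \neq k.
\end{aligned}
```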

1 Code Block

PROC BART Data

Explanation:
This example generates a synthetic data set of 10,000 observations and trains a BART model with `PROC BART`. The `STORE` statement saves the fitted model to an analytic store named `mylib.modelFit`. The `trainInMem` and `mapInMem` options keep the training data and model elements in CAS memory to improve performance, which is recommended for moderately sized data.

/* Create the simulated training data set */
DATA mylib.inputData / single=yes;
   drop j w1-w40;
   array x{40};
   array w{40};
   call streaminit(6524);
   pi=constant("pi");

   DO i=1 to 10000;
      /* A shared uniform draw u makes the inputs x1-x40 correlated */
      u = rand("Uniform");
      DO j=1 to dim(x);
         w{j} = rand("Uniform");
         x{j} = (w{j} + u)/2;
      END;

      /* Friedman test function: only x1-x5 affect the response */
      f1 = sin(pi * x1 * x2);
      f2 = (x3-0.5)**2;
      f3 = x4;
      f4 = x5;
      fb = 10*f1 + 20*f2 + 10*f3 + 5*f4;

      /* Add standard normal noise */
      y = fb + rand("Normal");
      OUTPUT;
   END;
RUN;

/* Train the BART model and save it to an analytic store */
PROC BART DATA=mylib.inputData seed=9181 trainInMem mapInMem;
   model y = x1-x40;
   store mylib.modelFit;
RUN;
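To confirm what was saved, the analytic store can be inspected before it is reused. A minimal sketch, assuming PROC ASTORE is available in your SAS Viya installation; its DESCRIBE statement prints information about the store, such as the input variables it expects:

```sas
/* Assumption: PROC ASTORE is licensed and runs in the active CAS session */
PROC ASTORE;
   /* Print a description of the analytic store written by PROC BART */
   DESCRIBE RSTORE=mylib.modelFit;
RUN;
```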

2 Code Block

PROC BART / PROC MEANS Data

Explanation:
This second example generates a new data set of 1,000 observations (`mylib.toScoreData`) to be scored, using a different random seed from the training data. It then runs `PROC BART` with the `RESTORE=mylib.modelFit` option to load the saved model and predict the response. The `OUTPUT` statement creates the table `mylib.scoredData`, which contains the predictions (`predResp`) and residuals (`residual`). Finally, a `DATA` step and `PROC MEANS` compute the average squared error (ASE) of the predictions, which measures how well the model generalizes to new data.
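The quantity computed by the `DATA` step and `PROC MEANS` below is the mean of the squared residuals over the n = 1,000 scored observations, where the prediction is `predResp` and the residual is `residual`. Because y contains N(0, 1) noise that no model can predict, the expected ASE on independent new data is at least 1 (the noise variance), so values near 1 indicate that the model captures most of the signal:

```latex
\mathrm{ASE} = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 = \frac{1}{n}\sum_{i=1}^{n} r_i^2, \qquad n = 1000
```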

/* Create the simulated data set to be scored */
DATA mylib.toScoreData / single=yes;
   drop j w1-w40;
   array x{40};
   array w{40};
   call streaminit(1972);   /* different seed from the training data */
   pi=constant("pi");

   DO i=1 to 1000;
      u = rand("Uniform");
      DO j=1 to dim(x);
         w{j} = rand("Uniform");
         x{j} = (w{j} + u)/2;
      END;

      /* Same Friedman test function as the training data */
      f1 = sin(pi * x1 * x2);
      f2 = (x3-0.5)**2;
      f3 = x4;
      f4 = x5;
      fb = 10*f1 + 20*f2 + 10*f3 + 5*f4;

      y = fb + rand("Normal");
      OUTPUT;
   END;
RUN;

/* Score the new observations with the stored model */
PROC BART DATA=mylib.toScoreData restore=mylib.modelFit;
   OUTPUT out=mylib.scoredData pred=predResp resid=residual;
RUN;

/* Compute the squared error of each scored observation */
DATA fitCheck;
   SET mylib.scoredData;
   SquareError = residual * residual;
RUN;

/* The mean of SquareError is the average squared error (ASE) */
PROC MEANS DATA=fitCheck mean;
   var SquareError;
RUN;
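The same ASE can also be computed in one step without the intermediate `fitCheck` table. A minimal sketch using PROC SQL, which also reports the root mean squared error (RMSE) on the response scale; PROC SQL reads the CAS table through the `mylib` libref and runs on the SAS client rather than in CAS, which is acceptable for a 1,000-row table:

```sas
/* Alternative sketch: ASE and RMSE from the scored table in one query */
PROC SQL;
   SELECT mean(residual*residual)       AS ASE,
          sqrt(mean(residual*residual)) AS RMSE
   FROM mylib.scoredData;
QUIT;
```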

3 Code Block

PROC BART

Explanation:
This example uses the `MARGIN` statement to compute predictive margins from the BART model. The saved model (`mylib.modelFit`) is restored and applied to the training data, and each `MARGIN` statement defines an intervention scenario in which one or more explanatory variables (here `x1`, `x2`, `x3`) are set to fixed values. For example, 'Scenario1' sets `x2` to 0.25, while 'Scenario2' sets `x2` to 0.25 and `x3` to 0.5; the scenarios 'x1Ref', 'x1Evt1', and 'x1Evt2' set `x1` to 0.25, 0.5, and 0.75. Comparing the resulting margins shows how these controlled changes affect the model's predictions.
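Conceptually, a predictive margin averages the model's predictions over the data set after the intervention has been applied to every observation. A sketch of that definition for the scenario x1 = a, over the n = 10,000 rows of `mylib.inputData`; the notation f^(m) (the sum-of-trees prediction from posterior draw m) and M (the number of posterior draws) is introduced here for illustration and is not taken from the SAS documentation:

```latex
\begin{aligned}
\mathrm{PM}^{(m)}(x_1 = a) &= \frac{1}{n}\sum_{i=1}^{n} f^{(m)}\!\left(a,\, x_{i2},\, \dots,\, x_{i,40}\right), \qquad m = 1, \dots, M,\\
\widehat{\mathrm{PM}}(x_1 = a) &= \frac{1}{M}\sum_{m=1}^{M} \mathrm{PM}^{(m)}(x_1 = a).
\end{aligned}
```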

/* Compute predictive margins under several interventions */
PROC BART restore=mylib.modelFit DATA=mylib.inputData;
   margin "Scenario1" x2 = 0.25;
   margin "Scenario2" x2 = 0.25 x3 = 0.5;
   margin "x1Ref" x1 = 0.25;
   margin "x1Evt1" x1 = 0.5;
   margin "x1Evt2" x1 = 0.75;
RUN;
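To see what a single `MARGIN` statement computes, the point estimate of the 'x1Evt1' margin can be approximated by hand: set `x1` to 0.5 for every training observation, score the modified table with the stored model, and average the predictions. This is a sketch under assumptions: the table names `mylib.x1Evt1Data` and `mylib.x1Evt1Scored` are hypothetical, and the mean of `predResp` should agree closely with the reported margin only if `PRED=` returns the posterior mean prediction. Unlike the `MARGIN` statement, this manual approach gives no credible interval.

```sas
/* Hypothetical table names; apply the intervention x1 = 0.5 to every row */
DATA mylib.x1Evt1Data;
   SET mylib.inputData;
   x1 = 0.5;
RUN;

/* Score the intervened data with the stored model */
PROC BART DATA=mylib.x1Evt1Data restore=mylib.modelFit;
   OUTPUT out=mylib.x1Evt1Scored pred=predResp;
RUN;

/* Average prediction under the intervention (approximate margin estimate) */
PROC MEANS DATA=mylib.x1Evt1Scored mean;
   var predResp;
RUN;
```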

4 Code Block

PROC BART

Explanation:
This final example builds on the predictive margins to compare scenarios. Each `MARGINDIFF` statement specifies a contrast between two margins defined by `MARGIN` statements: `EVENT=` names the scenario of interest and `REF=` names the reference scenario, and the `LABEL=` option names the contrast in the output. Here, the interventions that set `x1` to 0.5 and to 0.75 are each compared with the reference value `x1` = 0.25. The output reports the estimated mean difference between the predictive margins together with its credible interval, which gives a direct comparison of the scenarios.
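The contrast can be written in terms of posterior draws. In this sketch, PM^(m)(x1 = a) denotes the average prediction over the data set from posterior draw m (of M draws) with x1 set to a; this notation is introduced for illustration and is not taken from the SAS documentation. For the contrast labeled "x1:0.5 - 0.25":

```latex
\begin{aligned}
\Delta^{(m)} &= \mathrm{PM}^{(m)}(x_1 = 0.5) - \mathrm{PM}^{(m)}(x_1 = 0.25), \qquad m = 1, \dots, M,\\
\hat{\Delta} &= \frac{1}{M}\sum_{m=1}^{M} \Delta^{(m)}.
\end{aligned}
```

A credible interval for the difference can then be read from the distribution of the draws, for example from their 2.5th and 97.5th percentiles for a 95% equal-tail interval; the exact interval type that PROC BART reports is not stated in this example.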

/* Compute and compare differences between predictive margins */
PROC BART restore=mylib.modelFit DATA=mylib.inputData;
   margin "x1Ref" x1 = 0.25;
   margin "x1Evt1" x1 = 0.5;
   margin "x1Evt2" x1 = 0.75;
   /* Each contrast is the event margin minus the reference margin */
   margindiff event="x1Evt1" ref="x1Ref" / label="x1:0.5 - 0.25";
   margindiff event="x1Evt2" ref="x1Ref" / label="x1:0.75 - 0.25";
RUN;

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
