The BART procedure builds robust predictive models that can be saved and reused. This is particularly useful for scoring new datasets or for analyzing interventions on explanatory variables. The examples on this page show how to fit a BART model, save it in an analytic store, and then restore it to score new data and compute predictive margins. The TRAININMEM and MAPINMEM options keep data and model elements in CAS memory to improve performance. The MARGIN and MARGINDIFF statements are used to analyze the effect of specific changes to predictors.
Data Analysis
Type : INTERNAL_CREATION
The examples use synthetic data generated with the SAS `RAND` function: a continuous response variable (`y`) and 40 continuous explanatory variables (`x1`-`x40`). Only `x1`-`x5` actually affect the response variable.
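Read directly from the DATA step used in the examples, the simulated model can be written as follows. The index notation ($i$ for observations, $j$ for predictors) is introduced here for illustration and does not appear in the original code.

```latex
\begin{aligned}
x_{ij} &= \tfrac{1}{2}\,(w_{ij} + u_i), \qquad w_{ij},\, u_i \sim \mathrm{Uniform}(0,1), \quad j = 1,\dots,40,\\
f(x_i) &= 10\,\sin(\pi\, x_{i1} x_{i2}) + 20\,(x_{i3} - 0.5)^2 + 10\,x_{i4} + 5\,x_{i5},\\
y_i &= f(x_i) + \varepsilon_i, \qquad \varepsilon_i \sim \mathrm{Normal}(0,1).
\end{aligned}
```

Because every predictor shares the same draw $u_i$ within an observation, the 40 predictors are positively correlated, and `x6`-`x40` act as pure noise variables.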
1 Code Block
PROC BART Data
Explanation : This example generates a synthetic dataset of 10,000 observations and trains a BART model using `PROC BART`. The model is then saved in an analytic store named `mylib.modelFit` using the `STORE` statement. The `trainInMem` and `mapInMem` options improve performance by keeping data and model elements in CAS memory, which is recommended for moderately sized data.
/* Create the simulated training dataset */
data mylib.inputData / single=yes;
   drop j w1-w40;
   array x{40};
   array w{40};
   call streaminit(6524);
   pi = constant("pi");
   do i = 1 to 10000;
      u = rand("Uniform");
      /* Each predictor mixes its own uniform draw with the shared draw u */
      do j = 1 to dim(x);
         w{j} = rand("Uniform");
         x{j} = (w{j} + u)/2;
      end;
      /* Only x1-x5 enter the true response function */
      f1 = sin(pi * x1 * x2);
      f2 = (x3 - 0.5)**2;
      f3 = x4;
      f4 = x5;
      fb = 10*f1 + 20*f2 + 10*f3 + 5*f4;
      y = fb + rand("Normal");
      output;
   end;
run;
/* Train the BART model and save it in an analytic store */
proc bart data=mylib.inputData seed=9181 trainInMem mapInMem;
   model y = x1-x40;
   store mylib.modelFit;
run;
Explanation : This second example generates a new dataset of 1,000 observations (`mylib.toScoreData`) for prediction. It then runs `PROC BART` with the `RESTORE=mylib.modelFit` option to load the previously saved model and predict the response variable. The `OUTPUT` statement creates a `mylib.scoredData` table containing the predictions (`predResp`) and residuals (`residual`). Finally, a `DATA` step and `PROC MEANS` compute the average squared error (ASE) of the predictions, which indicates how well the model generalizes.
/* Create the simulated dataset to score */
data mylib.toScoreData / single=yes;
   drop j w1-w40;
   array x{40};
   array w{40};
   call streaminit(1972);
   pi = constant("pi");
   do i = 1 to 1000;
      u = rand("Uniform");
      do j = 1 to dim(x);
         w{j} = rand("Uniform");
         x{j} = (w{j} + u)/2;
      end;
      f1 = sin(pi * x1 * x2);
      f2 = (x3 - 0.5)**2;
      f3 = x4;
      f4 = x5;
      fb = 10*f1 + 20*f2 + 10*f3 + 5*f4;
      y = fb + rand("Normal");
      output;
   end;
run;
/* Score the new observations using the stored model */
proc bart data=mylib.toScoreData restore=mylib.modelFit;
   output out=mylib.scoredData pred=predResp resid=residual;
run;
/* Compute the average squared error (ASE) for the scored data */
data fitCheck;
   set mylib.scoredData;
   SquareError = residual * residual;
run;
proc means data=fitCheck mean;
   var SquareError;
run;
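For reference, the quantity reported by `PROC MEANS` above can be written as the formula below. The symbols $\hat{y}_i$ (the value of `predResp`) and $n$ (the number of scored rows, 1,000 here) are notation introduced for this note, and the residual is assumed to be the difference between the observed and predicted response.

```latex
\mathrm{ASE} \;=\; \frac{1}{n}\sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2
```

Because the simulated noise term is standard normal, with variance 1, no predictor can be expected to achieve an ASE much below 1 on this data. A value close to 1 therefore suggests that the restored model has recovered most of the underlying signal.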
1
/* Création du jeu de données à scorer simulé */
2
DATA mylib.toScoreData / single =yes;
3
drop j w1-w40;
4
array x{40};
5
array w{40};
6
call streaminit(1972);
7
pi=constant("pi");
8
9
DO i=1 to 1000;
10
u = rand("Uniform");
11
DO j=1 to dim(x);
12
w{j} = rand("Uniform");
13
x{j} = (w{j} + u)/2;
14
END;
15
16
f1 = sin(pi * x1 * x2 );
17
f2 = (x3-0.5)**2;
18
f3 = x4;
19
f4 = x5;
20
fb = 10*f1 +20*f2+10*f3+5*f4;
21
22
y = fb + rand("Normal");
23
OUTPUT;
24
END;
25
RUN;
26
27
/* Scoring des nouvelles observations en utilisant le modèle stocké */
OUTPUT out = mylib.scoredData pred = predResp resid = residual;
30
RUN;
31
32
/* Calcul de l'erreur carrée moyenne (ASE) pour les données scorées */
33
DATA fitCheck;
34
SET mylib.scoredData;
35
SquareError = residual * residual;
36
RUN;
37
38
PROC MEANSDATA=fitCheck mean;
39
var SquareError;
40
RUN;
3 Code Block
PROC BART
Explanation : This example illustrates the use of the `MARGIN` statement to calculate the predictive margins of the BART model. Using the saved model (`mylib.modelFit`) and training data, intervention scenarios are defined where the values of explanatory variables (here `x1`, `x2`, `x3`) are modified. For example, 'Scenario1' sets `x2` to 0.25, while 'Scenario2' sets `x2` to 0.25 and `x3` to 0.5. This allows for analyzing the impact of controlled changes on model predictions.
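The code for this example is not included here. A minimal sketch of what the two scenarios described above might look like follows. The `DATA=` and `RESTORE=` options are taken from the scoring example; the form of the `MARGIN` statement (a quoted name followed by `variable=value` pairs) is an assumption and should be checked against the PROC BART documentation before use.

```sas
/* Sketch only: the MARGIN statement form below is assumed, not verified */
proc bart data=mylib.inputData restore=mylib.modelFit;
   /* Scenario1: set x2 to 0.25 for every observation */
   margin "Scenario1" x2=0.25;
   /* Scenario2: set x2 to 0.25 and x3 to 0.5 together */
   margin "Scenario2" x2=0.25 x3=0.5;
run;
```

Each scenario replaces the named predictors with the given values in every row of the input table, leaves the other predictors as observed, and averages the resulting predictions.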
Explanation : This final example builds on the calculation of predictive margins to perform comparisons. The `MARGINDIFF` statement specifies contrasts between predictive margins defined by `MARGIN` statements. Here, the interventions that set `x1` to 0.5 and to 0.75 are each compared with a reference intervention that sets `x1` to 0.25. This directly provides the mean differences between the predictive margin estimates, together with their credible intervals, for a comparative analysis of the scenarios.
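As a rough sketch of what is being compared, a predictive margin and a margin difference are usually defined as below. The notation is introduced here for illustration: $\hat{f}$ is the fitted model, $x_{i,-1}$ denotes the observed predictors of row $i$ other than `x1`, and $n$ is the number of rows in the input table. Whether PROC BART computes these quantities in exactly this way is an assumption.

```latex
\begin{aligned}
m(a) &= \frac{1}{n}\sum_{i=1}^{n} \hat{f}\bigl(x_{i1} = a,\; x_{i,-1}\bigr),\\
\Delta(a) &= m(a) - m(0.25), \qquad a \in \{0.5,\ 0.75\}.
\end{aligned}
```

In a Bayesian fit, $\Delta(a)$ can be evaluated for each posterior sample of $\hat{f}$, so the reported mean difference and its credible interval summarize the posterior distribution of $\Delta(a)$.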
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.