Published on :
Statistical CREATION_INTERNE

Change Point Model with PROC MCMC

This code is also available in: Deutsch Español Français
Awaiting validation
The script implements a change point model to identify a structural change in the relationship between two variables. It begins by creating an internal dataset, then uses PROC MCMC to estimate the parameters of a piecewise linear model, including the position of the change point 'cp'. The estimation results (posterior means) are then stored in macro variables via a DATA _NULL_ step. Finally, PROC SGPLOT is used twice: first to visualize the raw data, and second to overlay the data, the estimated regression lines on either side of the change point, and the posterior density of the change point's location.
Data Analysis

Type : CREATION_INTERNE


The data is entirely generated within the script using a DATA step with a DATALINES statement. No external data is needed.

1 Code Block
DATA STEP Data
Explanation :
This block creates the 'stagnant' dataset from internal data (datalines). The ' @@' option in the INPUT statement allows reading multiple observations on the same data line.
Copied!
1title 'Change Point Model';
2DATA stagnant;
3 INPUT y x @code_sas_json/8_SAS_Intro_ReadFile_MultiCol_@@.json;
4 ind = _n_;
5 DATALINES;
6 1.12 -1.39 1.12 -1.39 0.99 -1.08 1.03 -1.08
7 0.92 -0.94 0.90 -0.80 0.81 -0.63 0.83 -0.63
8 0.65 -0.25 0.67 -0.25 0.60 -0.12 0.59 -0.12
9 0.51 0.01 0.44 0.11 0.43 0.11 0.43 0.11
10 0.33 0.25 0.30 0.25 0.25 0.34 0.24 0.34
11 0.13 0.44 -0.01 0.59 -0.13 0.70 -0.14 0.70
12-0.30 0.85 -0.33 0.85 -0.46 0.99 -0.43 0.99
13-0.65 1.19
14;
15RUN;
2 Code Block
PROC SGPLOT
Explanation :
Initial visualization of raw data as a scatter plot to inspect the relationship between x and y.
Copied!
1 
2PROC SGPLOT
3DATA=stagnant;
4scatter x=x y=y;
5RUN;
6 
3 Code Block
PROC MCMC
Explanation :
Core of the analysis. PROC MCMC is used to fit a Bayesian model. The code defines the parameters (alpha, cp, beta1, beta2, s2), their prior distributions, and the linear model that states that the slope (beta) changes based on the position of x relative to the change point 'cp'. The procedure generates samples from the posterior distribution of the parameters.
Copied!
1PROC MCMC DATA=stagnant outpost=postout seed=24860 ntu=1000
2 nmc=20000;
3 ods select PostSumInt;
4 ods OUTPUT PostSumInt=ds;
5 
6 array beta[2];
7 parms alpha cp beta1 beta2;
8 parms s2;
9 
10 prior cp ~ unif(-1.3, 1.1);
11 prior s2 ~ uniform(0, 5);
12 prior alpha beta: ~ normal(0, v = 1e6);
13 
14 j = 1 + (x >= cp);
15 mu = alpha + beta[j] * (x - cp);
16 model y ~ normal(mu, var=s2);
17RUN;
4 Code Block
DATA STEP
Explanation :
This _NULL_ DATA step (does not create a table) reads the posterior means calculated by PROC MCMC (stored in the 'ds' table) and assigns them to macro variables (e.g., &cp, &beta1, &beta2) for later use.
Copied!
1DATA _null_;
2 SET ds;
3 call symputx(parameter, mean);
4RUN;
5 Code Block
DATA STEP Data
Explanation :
Creates a dataset 'b' to plot the regression lines of the fitted model. It uses macro variables (&cp, &alpha, &beta1, &beta2) to calculate the predicted values of y1.
Copied!
1DATA b;
2 missing A;
3 INPUT x1 @code_sas_json/8_SAS_Intro_ReadFile_MultiCol_@@.json;
4 IF x1 eq .A THEN x1 = &cp;
5 IF _n_ <= 2 THEN y1 = &alpha + &beta1 * (x1 - &cp);
6 ELSE y1 = &alpha + &beta2 * (x1 - &cp);
7 DATALINES;
8 -1.5 A 1.2
9;
10RUN;
6 Code Block
PROC KDE Data
Explanation :
Uses the KDE (Kernel Density Estimation) procedure on the MCMC output to estimate the posterior probability distribution of the 'cp' parameter (the change point).
Copied!
1 
2PROC KDE
3DATA=postout;
4univar cp / out=m1 (drop=count);
5RUN;
6 
7 Code Block
DATA STEP Data
Explanation :
This block adjusts (scales and shifts) the density value calculated by PROC KDE so that it can be legibly overlaid on the final graph.
Copied!
1DATA m1;
2 SET m1;
3 density = (density / 25) - 0.653;
4RUN;
8 Code Block
DATA STEP Data
Explanation :
Combines the original data ('stagnant'), the lines from the fitted model ('b'), and the change point density data ('m1') into a single dataset for the final graph.
Copied!
1DATA all;
2 SET stagnant b m1;
3RUN;
9 Code Block
PROC SGPLOT
Explanation :
Generates the complete final graph that overlays: the original scatter plot, the two line segments from the change point model, and the density curve of the posterior distribution of the 'cp' parameter.
Copied!
1PROC SGPLOT DATA=all noautolegend;
2 scatter x=x y=y;
3 series x=x1 y=y1 / lineattrs = graphdata2;
4 series x=value y=density / lineattrs = graphdata1;
5RUN;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : S A S S A M P L E L I B R A R Y