Published on :
Statistical CREATION_INTERNE

Example 3 for PROC GLMSELECT - Donoho and Johnstone's Bumps Function

This code is also available in: Deutsch Français
Awaiting validation
The script begins by generating a dataset named 'DoJoBumps'. This dataset contains an 'x' variable, a 'bumps' variable calculated from a complex mathematical function (the 'Bumps' function), and a 'bumpsWithNoise' variable which is the 'bumps' variable with added random noise. Then, several plots are produced with PROC SGPLOT to visualize the original data, the noisy data, and attempts at smoothing with LOESS and PBSPLINE methods. The main step uses PROC GLMSELECT with a spline effect (SPLINE) on the 'x' variable to model the 'bumpsWithNoise' variable. The knot selection method is 'multiscale'. Finally, the result of the GLMSELECT model prediction is plotted and compared to the original 'bumps' curve to evaluate the quality of the fit.
Data Analysis

Type : CREATION_INTERNE


The 'DoJoBumps' dataset is entirely created algorithmically within the DATA step. It does not depend on any external data source or SASHELP. The data is generated to simulate the 'Bumps' function with added noise.

1 Code Block
DATA STEP Data
Explanation :
This DATA STEP block creates the 'DoJoBumps' table. It generates 2048 observations. For each observation, it calculates an 'x' value, then calls a 'compute' subroutine to calculate the 'bumps' value based on a complex formula involving 11 coefficients (Donoho and Johnstone's Bumps function). Gaussian noise, whose seed is fixed by the macro variable 'random', is added to create the 'bumpsWithNoise' variable.
Copied!
1DATA DoJoBumps;
2 keep x bumps bumpsWithNoise;
3 
4 pi = arcos(-1);
5 
6 DO n=1 to 2048;
7 x=(2*n-1)/4096;
8 link compute;
9 bumpsWithNoise=bumps+rannor(&random)*sqrt(5);
10 OUTPUT;
11 END;
12 stop;
13 
14compute:
15 array t(11) _temporary_ (.1 .13 .15 .23 .25 .4 .44 .65 .76 .78 .81);
16 array b(11) _temporary_ ( 4 5 3 4 5 4.2 2.1 4.3 3.1 5.1 4.2);
17 array w(11) _temporary_ (.005 .005 .006 .01 .01 .03 .01 .01 .005 .008 .005);
18 
19 bumps=0;
20 DO i=1 to 11;
21 bumps=bumps+b[i]*(1+abs((x-t[i])/w[i]))**-4;
22 END;
23 bumps=bumps*10.528514619;
24 return;
25RUN;
2 Code Block
PROC SGPLOT
Explanation :
This block uses PROC SGPLOT to overlay two series plots: the noisy data curve ('bumpsWithNoise') in black and the original un-noisy curve ('bumps') in red. This allows visualizing the effect of added noise on the original function.
Copied!
1PROC SGPLOT DATA=DoJoBumps;
2 yaxis display=(nolabel);
3 series x=x y=bumpsWithNoise/lineattrs=(color=black);
4 series x=x y=bumps/lineattrs=(color=red);
5RUN;
3 Code Block
PROC SGPLOT
Explanation :
This block uses PROC SGPLOT to compare the original 'bumps' curve with a LOESS smoothing curve applied to the noisy data 'bumpsWithNoise'. LOESS smoothing is a non-parametric method for estimating local trend and shows a first attempt at denoising.
Copied!
1PROC SGPLOT DATA=DoJoBumps;
2 yaxis display=(nolabel);
3 series x=x y=bumps;
4 loess x=x y=bumpsWithNoise / lineattrs=(color=red) nomarkers;
5RUN;
4 Code Block
PROC SGPLOT
Explanation :
This block uses PROC SGPLOT to compare the original 'bumps' curve with a penalized B-spline smoothing curve applied to the noisy data 'bumpsWithNoise'. This is another smoothing method, often more flexible than LOESS.
Copied!
1PROC SGPLOT DATA=DoJoBumps;
2 yaxis display=(nolabel);
3 series x=x y=bumps;
4 pbspline x=x y=bumpsWithNoise /
5 lineattrs=(color=red) nomarkers;
6RUN;
5 Code Block
PROC GLMSELECT
Explanation :
This block is the core of the analysis. It uses PROC GLMSELECT to model the dependent variable 'bumpsWithNoise'. The 'EFFECT spl = spline(x ...)' statement defines a spline effect on the 'x' variable. The 'multiscale' method is used for spline knot selection, which is effective for functions with multiple variations. The model is then fitted and predictions are saved in an 'out1' table under the 'pBumps' variable.
Copied!
1PROC GLMSELECT DATA=dojoBumps;
2 effect spl = spline(x / knotmethod=multiscale(endscale=8)
3 split details);
4 model bumpsWithNoise=spl;
5 OUTPUT out=out1 p=pBumps;
6RUN;
6 Code Block
PROC SGPLOT
Explanation :
This final block uses PROC SGPLOT to visualize the quality of the GLMSELECT model fit. It overlays the original 'bumps' curve with the curve of predicted values ('pBumps') by the spline model. This shows how the model successfully recovered the underlying data structure despite the noise.
Copied!
1PROC SGPLOT DATA=out1;
2 yaxis display=(nolabel);
3 series x=x y=bumps;
4 series x=x y=pBumps / lineattrs=(color=red);
5RUN;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : S A S S A M P L E L I B R A R Y