The script begins by generating a dataset named 'DoJoBumps'. This dataset contains an 'x' variable, a 'bumps' variable computed from Donoho and Johnstone's 'Bumps' test function, and a 'bumpsWithNoise' variable, which is 'bumps' plus random Gaussian noise. Several plots are then produced with PROC SGPLOT to visualize the original data, the noisy data, and two smoothing attempts using the LOESS and PBSPLINE statements. The main step uses PROC GLMSELECT with a spline effect (SPLINE) on 'x' to model 'bumpsWithNoise', with 'multiscale' as the knot selection method. Finally, the GLMSELECT predictions are plotted against the original 'bumps' curve to evaluate the quality of the fit.
Data Analysis
Type : INTERNAL_CREATION
The 'DoJoBumps' dataset is entirely created algorithmically within the DATA step. It does not depend on any external data source or SASHELP. The data is generated to simulate the 'Bumps' function with added noise.
1 Code Block
DATA STEP Data
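Explanation : This DATA step creates the 'DoJoBumps' table with 2048 observations. For each observation it computes x = (2n-1)/4096, the midpoints of 2048 equal-width intervals on (0, 1), then uses LINK to jump to the 'compute' routine. That routine evaluates Donoho and Johnstone's Bumps function as a sum of 11 terms, each defined by a location (t), a height (b), and a width (w), and rescales the result by a constant. Gaussian noise with standard deviation sqrt(5), generated by RANNOR with the seed held in the macro variable 'random', is added to produce 'bumpsWithNoise'. The macro variable 'random' must be defined before the step runs.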
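/* Seed for RANNOR. The value is an assumption: the listing uses &random but does not define it. */
%let random=12345;

data DoJoBumps;
   keep x bumps bumpsWithNoise;
   pi = arcos(-1);                      /* not used below; dropped by KEEP */
   do n=1 to 2048;
      x=(2*n-1)/4096;                   /* midpoints of 2048 intervals on (0,1) */
      link compute;                     /* evaluate the Bumps function at x */
      bumpsWithNoise=bumps+rannor(&random)*sqrt(5);   /* add N(0,5) noise */
      output;
   end;
   stop;

   compute:
      /* locations, heights, and widths of the 11 bumps */
      array t(11) _temporary_ (.1 .13 .15 .23 .25 .4 .44 .65 .76 .78 .81);
      array b(11) _temporary_ ( 4 5 3 4 5 4.2 2.1 4.3 3.1 5.1 4.2);
      array w(11) _temporary_ (.005 .005 .006 .01 .01 .03 .01 .01 .005 .008 .005);
      bumps=0;
      do i=1 to 11;
         bumps=bumps+b[i]*(1+abs((x-t[i])/w[i]))**-4;
      end;
      bumps=bumps*10.528514619;         /* rescaling constant */
   return;
run;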
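Read directly off the 'compute' routine above, the generated curve and the noisy response can be written as below. The symbols f, c, y_n and ε_n are names introduced here for illustration; the arrays t, b and w are those in the code.

```latex
f(x) = c \sum_{i=1}^{11} b_i \left( 1 + \left| \frac{x - t_i}{w_i} \right| \right)^{-4},
\qquad c = 10.528514619

x_n = \frac{2n - 1}{4096}, \qquad
y_n = f(x_n) + \sqrt{5}\,\varepsilon_n, \qquad
\varepsilon_n \sim N(0, 1), \qquad n = 1, \dots, 2048
```

Each term peaks at x = t_i with height c·b_i and decays on a scale set by w_i, so small widths give sharp, narrow spikes.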
2 Code Block
PROC SGPLOT
Explanation : This block uses PROC SGPLOT to overlay two series plots: the noisy data ('bumpsWithNoise') in black and the original noise-free curve ('bumps') in red. The overlay shows how much the added noise obscures the original function.
proc sgplot data=DoJoBumps;
   yaxis display=(nolabel);
   series x=x y=bumpsWithNoise / lineattrs=(color=black);
   series x=x y=bumps / lineattrs=(color=red);
run;
3 Code Block
PROC SGPLOT
Explanation : This block uses PROC SGPLOT to compare the original 'bumps' curve with a LOESS smooth of the noisy data 'bumpsWithNoise'. LOESS is a nonparametric local regression method for estimating a local trend; it serves here as a first attempt at denoising.
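The code for this block is not shown on the page. A minimal sketch of what such a comparison could look like, assuming the same axis and color conventions as the other plots (the NOMARKERS option and the default LOESS smoothing settings are assumptions):

```sas
/* Sketch only: overlay the noise-free curve with a LOESS fit to the noisy data. */
proc sgplot data=DoJoBumps;
   yaxis display=(nolabel);
   series x=x y=bumps;                          /* original curve */
   loess x=x y=bumpsWithNoise /
         lineattrs=(color=red) nomarkers;       /* fit line only, no scatter */
run;
```

NOMARKERS suppresses the 2048 scatter points so that only the fitted line is drawn over the original curve.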
4 Code Block
PROC SGPLOT
Explanation : This block uses PROC SGPLOT to compare the original 'bumps' curve with a penalized B-spline smooth of the noisy data 'bumpsWithNoise'. It provides a second smoothing method to set against the LOESS result.
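The code for this block is also missing from the page. A minimal sketch using the PBSPLINE statement of PROC SGPLOT, with default smoothing settings and styling options that are assumptions rather than the author's confirmed choices:

```sas
/* Sketch only: overlay the noise-free curve with a penalized B-spline fit. */
proc sgplot data=DoJoBumps;
   yaxis display=(nolabel);
   series x=x y=bumps;                          /* original curve */
   pbspline x=x y=bumpsWithNoise /
            lineattrs=(color=red) nomarkers;    /* fit line only, no scatter */
run;
```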
5 Code Block
PROC GLMSELECT
Explanation : This block is the core of the analysis. It uses PROC GLMSELECT to model the dependent variable 'bumpsWithNoise'. The 'EFFECT spl = spline(x ...)' statement defines a spline effect on the 'x' variable. Knots are chosen with the 'multiscale' method, which builds spline bases at several scales and therefore suits functions whose features differ widely in width. The model is then fitted and the predictions are saved in the 'out1' table as the variable 'pBumps'.
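The page does not show the code for this step. The sketch below follows the description: a spline effect named 'spl' on 'x' with multiscale knots, and predictions written to 'out1' as 'pBumps'. The ENDSCALE=8 value and the SPLIT and DETAILS options are assumptions, not taken from the text.

```sas
/* Sketch only: select spline basis functions for the noisy response. */
proc glmselect data=DoJoBumps;
   /* ENDSCALE=8, SPLIT and DETAILS are assumed settings */
   effect spl = spline(x / knotmethod=multiscale(endscale=8)
                           split details);
   model bumpsWithNoise = spl;          /* default selection method */
   output out=out1 p=pBumps;            /* predictions saved as pBumps */
run;
```

With SPLIT, the individual spline basis columns can enter or leave the model independently rather than as one block, which is what lets the selection keep fine-scale terms only near the sharp peaks.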
6 Code Block
PROC SGPLOT
Explanation : This final block uses PROC SGPLOT to visualize the quality of the GLMSELECT fit. It overlays the original 'bumps' curve with the values predicted by the spline model ('pBumps'), showing how closely the model recovers the underlying structure despite the noise.
proc sgplot data=out1;
   yaxis display=(nolabel);
   series x=x y=bumps;
   series x=x y=pBumps / lineattrs=(color=red);
run;
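The plot gives a visual check only. As a complement, a numeric summary of the fit could be computed along the following lines; the dataset name 'fitError' and the variable 'sqErr' are hypothetical names introduced for this sketch.

```sas
/* Sketch only: squared error between predictions and the noise-free curve. */
data fitError;
   set out1;
   sqErr = (pBumps - bumps)**2;         /* per-observation squared error */
run;

proc means data=fitError mean max;
   var sqErr;                           /* mean of sqErr is the MSE */
run;
```

Comparing the mean squared error with the noise variance of 5 gives a rough sense of how much of the noise the spline model has removed.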
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.