This script illustrates the use of the GLMSELECT procedure for variable selection. It begins by generating a dataset of regressors via the `%makeRegressorData` macro. Then, the `%AddDepVar` macro adds a dependent variable `y` based on a linear model and random error. Finally, two calls to `PROC GLMSELECT` are made: the first with the LASSO selection method, and the second with the Group LASSO method.
Data Analysis
Type: CREATION_INTERNE
The data is entirely generated within the script. The `%makeRegressorData` macro creates independent variables (continuous and classification) with random values. The `%AddDepVar` macro then adds the dependent variable `y` based on a linear formula applied to the previously created variables.
Code Block 1: Macro Data
Explanation: Definition of two macros. `%makeRegressorData` generates the explanatory variables (continuous and categorical) using random number functions. `%AddDepVar` adds a dependent variable `y` to a dataset based on a formula and random error.
/* Generate &nObs observations with &nCont continuous regressors (x1-x&nCont)
   and &nClass classification regressors (c1-c&nClass). */
%macro makeRegressorData(data=,nObs=500,nCont=5,nClass=5,nLev=3);
   data &data;
      drop i j;
      %if &nCont>0  %then %do; array x{&nCont}  x1-x&nCont;  %end;
      %if &nClass>0 %then %do; array c{&nClass} c1-c&nClass; %end;
      do i = 1 to &nObs;
         /* Continuous regressors: standard normal draws (RANNOR) */
         %if &nCont>0 %then %do;
            do j = 1 to &nCont;
               x{j} = rannor(1);
            end;
         %end;
         /* Classification regressors: binomial draws (RANBIN) with
            success probabilities .4, .5, or .6 depending on the index */
         %if &nClass>0 %then %do;
            do j = 1 to &nClass;
               if      mod(j,3) = 0 then c{j} = ranbin(1,&nLev,.6);
               else if mod(j,3) = 1 then c{j} = ranbin(1,&nLev,.5);
               else if mod(j,3) = 2 then c{j} = ranbin(1,&nLev,.4);
            end;
         %end;
         output;
      end;
   run;
%mend;

/* Add the dependent variable y = &modelRHS + &errorStd * standard normal error */
%macro AddDepVar(data=,modelRHS=,errorStd=1);
   data &data;
      set &data;
      y = &modelRHS + &errorStd * rannor(1);
   run;
%mend;
Explanation: Execution of the macros to create the `traindata` work table. The first macro generates the regressors, and the second calculates and adds the dependent variable `y`.
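The macro invocations themselves are not reproduced above, so the following is only a minimal sketch of how they might be called; the `modelRHS` expression and the `errorStd` value are illustrative assumptions, not the formula used by the original script.

/* Hedged sketch: the exact macro calls are not shown in the source.      */
/* The modelRHS expression and errorStd value below are assumptions made  */
/* for illustration only.                                                 */
%makeRegressorData(data=traindata, nObs=500, nCont=5, nClass=5, nLev=3);
%AddDepVar(data=traindata, modelRHS=2*x1 - 1.5*x3 + c2, errorStd=2);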
Explanation: Enables ODS graphics, then runs the `glmselect` procedure to perform model selection with the LASSO method. The `split` option lets the levels of the classification variables `c1-c5` enter or leave the model independently, and a spline effect `s1` (also split) is built from `x1`. Up to 20 selection steps are performed, and the SBC criterion chooses the best model among them.
ods graphics on;

proc glmselect data=traindata plots=coefficients;
   class  c1-c5 / split;
   effect s1 = spline(x1 / split);
   model  y = s1 x2-x5 c: /
          selection=lasso(steps=20 choose=sbc);
run;
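As a possible follow-up (not part of the original script), the parameter estimates of the SBC-chosen model can be captured in a dataset by placing an ODS OUTPUT statement before the same step; this is only a sketch, assuming GLMSELECT's usual `ParameterEstimates` ODS table name, and the dataset name `lasso_estimates` is illustrative.

/* Hedged sketch, not part of the original script: capture the parameter  */
/* estimates of the SBC-chosen model in a dataset. The ODS table name     */
/* ParameterEstimates and the dataset name lasso_estimates are assumed.   */
ods output ParameterEstimates=lasso_estimates;

proc glmselect data=traindata plots=coefficients;
   class  c1-c5 / split;
   effect s1 = spline(x1 / split);
   model  y = s1 x2-x5 c: /
          selection=lasso(steps=20 choose=sbc);
run;

proc print data=lasso_estimates;
run;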
Code Block 4: PROC GLMSELECT
Explanation: Runs `proc glmselect` a second time, this time with the group LASSO selection method. Variables `x2`, `x3`, and `x4` are grouped into a collection effect `s2`, so they are selected or excluded together as a block. The `rho=` suboption adjusts the penalty.
proc glmselect data=traindata plots=coefficients;
   class  c1-c5;
   effect s1 = spline(x1);
   effect s2 = collection(x2 x3 x4);
   model  y = s1 s2 x5 c: /
          selection=grouplasso(steps=20 choose=sbc rho=0.8);
run;
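If predicted values from the selected group LASSO model are needed downstream, one option is to add an OUTPUT statement to the same step. This is a sketch only; the output dataset name `trainpred` and variable name `pred_y` are assumptions for illustration.

/* Hedged sketch, not part of the original script: write predictions of  */
/* the selected group LASSO model to a dataset. The output dataset and   */
/* variable names are illustrative assumptions.                          */
proc glmselect data=traindata plots=coefficients;
   class  c1-c5;
   effect s1 = spline(x1);
   effect s2 = collection(x2 x3 x4);
   model  y = s1 s2 x5 c: /
          selection=grouplasso(steps=20 choose=sbc rho=0.8);
   output out=trainpred p=pred_y;
run;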