The Ultimate SAS Model Tuning Guide: How to Automate Regressor Selection with PROC GLMSELECT

Difficulty Level: Beginner
Author: Michael

Expert Advice

Michael
Head of Viya infrastructure.

One of the most important differences highlighted in this code is the SPLIT option in the CLASS statement. In the standard LASSO call, SPLIT allows individual levels of a categorical variable to enter or leave the model independently. In contrast, the Group LASSO call (without SPLIT) treats the entire classification effect as a single group, ensuring that you don't end up with "fragmented" variables where only one or two dummy levels are significant.
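In schematic form, the two CLASS statements used in the procedure calls below differ only in that one SPLIT option:

```sas
/* LASSO call: SPLIT lets each dummy level of c1-c5
   enter or leave the model on its own */
class c1-c5 / split;

/* Group LASSO call: no SPLIT, so each classification effect
   is selected or dropped as a whole group */
class c1-c5;
```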

This script illustrates the use of the GLMSELECT procedure for variable selection. It begins by generating a dataset of regressors via the `%makeRegressorData` macro. Then, the `%AddDepVar` macro adds a dependent variable `y` based on a linear model and random error. Finally, two calls to `PROC GLMSELECT` are made: the first with the LASSO selection method, and the second with the Group LASSO method.
Data Analysis

Type: CREATION_INTERNE


The data is entirely generated within the script. The `%makeRegressorData` macro creates independent variables (continuous and classification) with random values. The `%AddDepVar` macro then adds the dependent variable `y` based on a linear formula applied to the previously created variables.

Code Block 1: Macro Data

Explanation:
Definition of two macros. `%makeRegressorData` generates the explanatory variables (continuous and categorical) using random number functions. `%AddDepVar` adds a dependent variable `y` to a dataset based on a formula and random error.
%macro makeRegressorData(DATA=,nObs=500,nCont=5,nClass=5,nLev=3);
   data &DATA;
      drop i j;
      %if &nCont>0 %then %do; array x{&nCont} x1-x&nCont; %end;
      %if &nClass>0 %then %do; array c{&nClass} c1-c&nClass; %end;
      do i = 1 to &nObs;
         %if &nCont>0 %then %do;
            do j = 1 to &nCont;
               x{j} = rannor(1);
            end;
         %end;
         %if &nClass>0 %then %do;
            do j = 1 to &nClass;
               if mod(j,3) = 0 then c{j} = ranbin(1,&nLev,.6);
               else if mod(j,3) = 1 then c{j} = ranbin(1,&nLev,.5);
               else if mod(j,3) = 2 then c{j} = ranbin(1,&nLev,.4);
            end;
         %end;
         output;
      end;
   run;
%mend;

%macro AddDepVar(DATA=,modelRHS=,errorStd=1);
   data &DATA;
      set &DATA;
      y = &modelRHS + &errorStd * rannor(1);
   run;
%mend;
Code Block 2: DATA Step

Explanation:
Execution of the macros to create the `traindata` work table. The first macro generates the regressors, and the second calculates and adds the dependent variable `y`.
%makeRegressorData(DATA=traindata,nObs=500,nCont=5,nClass=5,nLev=3);

%AddDepVar(DATA=traindata,
           modelRHS= x1 + 0.1*x2 - 0.1*x3 - 0.01*x4 - c1,
           errorStd=1);
Code Block 3: PROC GLMSELECT (LASSO)

Explanation:
Enables ODS graphics, then runs PROC GLMSELECT to perform model selection with the LASSO method. The classification variables `c1-c5` can be split into individual levels, and a spline effect `s1` (also with SPLIT) is created for `x1`. The SBC criterion is used to choose the best model among up to 20 selection steps.
ods graphics on;

proc glmselect data=traindata plots=coefficients;
   class c1-c5 / split;
   effect s1 = spline(x1 / split);
   model y = s1 x2-x5 c: /
         selection=lasso(steps=20 choose=sbc);
run;
Code Block 4: PROC GLMSELECT (Group LASSO)

Explanation:
Runs PROC GLMSELECT a second time, using the Group LASSO selection method. Variables `x2`, `x3`, and `x4` are grouped into a collection effect `s2`, forcing them to enter or leave the model as a block. The RHO= option controls the rate at which the regularization parameter decreases from one step to the next.
proc glmselect data=traindata plots=coefficients;
   class c1-c5;
   effect s1 = spline(x1);
   effect s2 = collection(x2 x3 x4);
   model y = s1 s2 x5 c: /
         selection=grouplasso(steps=20 choose=sbc rho=0.8);
run;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info: SAS Sample Library


Related Documentation

No specific documentation for this category.