
Example 8 for PROC GLMSELECT

This script illustrates the use of the GLMSELECT procedure for variable selection. It begins by generating a dataset of regressors via the `%makeRegressorData` macro. Then, the `%AddDepVar` macro adds a dependent variable `y` based on a linear model and random error. Finally, two calls to `PROC GLMSELECT` are made: the first with the LASSO selection method, and the second with the Group LASSO method.
Data Analysis

Type: CREATION_INTERNE


The data is entirely generated within the script. The `%makeRegressorData` macro creates independent variables (continuous and classification) with random values. The `%AddDepVar` macro then adds the dependent variable `y` based on a linear formula applied to the previously created variables.
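For intuition, the same generative scheme can be sketched outside SAS. The Python below is an illustrative analogue only (the function names `make_row` and `add_dep_var` are invented here); `rannor` and `ranbin` correspond to standard-normal and binomial draws:

```python
import random

random.seed(1)

def make_row(n_cont=5, n_class=5, n_lev=3):
    """One observation: standard-normal x1..x5 plus binomial class levels
    whose success probability cycles over .5/.4/.6 as j runs 1..n_class,
    mirroring the mod(j,3) branches in %makeRegressorData."""
    x = [random.gauss(0.0, 1.0) for _ in range(n_cont)]
    probs = {0: 0.6, 1: 0.5, 2: 0.4}
    c = [sum(random.random() < probs[j % 3] for _ in range(n_lev))
         for j in range(1, n_class + 1)]
    return x, c

def add_dep_var(x, c, error_std=1.0):
    """y = x1 + 0.1*x2 - 0.1*x3 - 0.01*x4 - c1 + noise, as in %AddDepVar."""
    return (x[0] + 0.1 * x[1] - 0.1 * x[2] - 0.01 * x[3] - c[0]
            + error_std * random.gauss(0.0, 1.0))

# Build a 500-observation analogue of the traindata table.
traindata = []
for _ in range(500):
    x, c = make_row()
    traindata.append({"x": x, "c": c, "y": add_dep_var(x, c)})
print(len(traindata))  # 500
```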

Code Block 1
Macro Data
Explanation:
Definition of two macros. `%makeRegressorData` generates the explanatory variables (continuous and categorical) using random number functions. `%AddDepVar` adds a dependent variable `y` to a dataset based on a formula and random error.
%macro makeRegressorData(data=, nObs=500, nCont=5, nClass=5, nLev=3);
   data &data;
      drop i j;
      %if &nCont > 0 %then %do;
         array x{&nCont} x1-x&nCont;
      %end;
      %if &nClass > 0 %then %do;
         array c{&nClass} c1-c&nClass;
      %end;
      do i = 1 to &nObs;
         %if &nCont > 0 %then %do;
            do j = 1 to &nCont;
               x{j} = rannor(1);
            end;
         %end;
         %if &nClass > 0 %then %do;
            do j = 1 to &nClass;
               if mod(j,3) = 0 then c{j} = ranbin(1,&nLev,.6);
               else if mod(j,3) = 1 then c{j} = ranbin(1,&nLev,.5);
               else if mod(j,3) = 2 then c{j} = ranbin(1,&nLev,.4);
            end;
         %end;
         output;
      end;
   run;
%mend;

%macro AddDepVar(data=, modelRHS=, errorStd=1);
   data &data;
      set &data;
      y = &modelRHS + &errorStd * rannor(1);
   run;
%mend;
Code Block 2
DATA STEP Data
Explanation:
Execution of macros to create the `traindata` work table. The first macro generates regressors, and the second calculates and adds the dependent variable `y`.
%makeRegressorData(data=traindata, nObs=500, nCont=5, nClass=5, nLev=3);

%AddDepVar(data=traindata,
           modelRHS=x1 + 0.1*x2 - 0.1*x3 - 0.01*x4 - c1,
           errorStd=1);
Code Block 3
PROC GLMSELECT
Explanation:
Enables ODS Graphics, then runs `PROC GLMSELECT` to perform model selection with the LASSO method. The SPLIT option lets the design columns of the classification variables `c1-c5` enter or leave the model independently, and a spline effect `s1` with split columns is built for `x1`. The SBC criterion chooses the best model among the 20 selection steps.
ods graphics on;

proc glmselect data=traindata plots=coefficients;
   class c1-c5 / split;
   effect s1 = spline(x1 / split);
   model y = s1 x2-x5 c: /
         selection=lasso(steps=20 choose=sbc);
run;
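For intuition on what the LASSO does to each coefficient, the selection can be viewed through the soft-thresholding operator it applies. The Python sketch below is an outside-SAS illustration, not part of the sample:

```python
def soft_threshold(beta, lam):
    """LASSO proximal step: shrink a coefficient toward zero by lam,
    and set it exactly to zero when its magnitude is at most lam."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0

# Small coefficients are eliminated outright; large ones are shrunk,
# which is why weak regressors such as x4 tend to drop out early.
print(soft_threshold(1.0, 0.3))   # shrunk toward zero (about 0.7)
print(soft_threshold(-0.2, 0.3))  # 0.0 -- removed from the model
```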
Code Block 4
PROC GLMSELECT
Explanation:
Runs `PROC GLMSELECT` a second time, using the group LASSO selection method. Variables `x2`, `x3`, and `x4` are gathered into a collection effect `s2`, so they are selected or excluded as a block. The RHO= option controls how quickly the regularization parameter decreases from step to step.
proc glmselect data=traindata plots=coefficients;
   class c1-c5;
   effect s1 = spline(x1);
   effect s2 = collection(x2 x3 x4);
   model y = s1 s2 x5 c: /
         selection=grouplasso(steps=20 choose=sbc rho=0.8);
run;
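The group LASSO analogue of that shrinkage acts on a whole block of coefficients at once, which is why the members of a collection effect such as `s2` survive or vanish together. A hedged Python sketch of the group-wise operator (illustrative only, not the SAS implementation):

```python
import math

def group_soft_threshold(betas, lam):
    """Group-LASSO proximal step: scale the whole coefficient block by its
    Euclidean norm, zeroing the entire group when that norm is <= lam."""
    norm = math.sqrt(sum(b * b for b in betas))
    if norm <= lam:
        return [0.0] * len(betas)        # the whole group leaves the model
    scale = 1.0 - lam / norm
    return [scale * b for b in betas]    # the whole group is shrunk together

print(group_soft_threshold([3.0, 4.0], 1.0))  # norm 5, scaled by 0.8
print(group_soft_threshold([0.1, 0.1], 1.0))  # norm below 1 -> all zeros
```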
This material is provided "as is" by SAS Institute Inc. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. SAS Institute Inc. is not responsible for errors in this material as it now exists or will exist, nor does SAS Institute Inc. provide technical support for it.
Copyright Info: SAS Sample Library