The `glm` action fits linear regression models by the method of least squares. You can specify model effects, selection methods, and output options. The action supports confidence intervals and diagnostic statistics, and it can save model output to CAS tables. Classification variables can be configured globally or individually, and polynomial and spline effects are supported. Data can be partitioned into training, validation, and testing roles.

| Parameter | Description |
|---|---|
| alpha | Specifies the significance level to use for the construction of all confidence intervals. Default: 0.05. Range: (0, 1). |
| attributes | Changes the attributes of variables used in this action. Subparameters: format, formattedLength, label, *name, nfd, nfl. |
| byLimit | Specifies that the analysis not be performed if the number of BY groups exceeds the specified value. Minimum value: 1. |
| class | Names the classification variables to be used as explanatory variables in the analysis. Subparameters: countMissing, descending, ignoreMissing, levelizeRaw, maxLev, order, param, ref, split, *vars. |
| classGlobalOpts | Lists options that apply to all classification variables. Subparameters: countMissing, descending, ignoreMissing, levelizeRaw, maxLev, order, param, ref, split. |
| classLevelsPrint | When set to False, suppresses the display of class levels. Default: TRUE. |
| clb | When set to True, displays upper and lower confidence limits for the parameter estimates. Default: FALSE. |
| code | Writes SAS DATA step code for computing predicted values of the fitted model. Subparameters include casOut (for output table settings), comment, fmtWdth, indentSize, intoCutPt, iProb, labelId, lineSize, noTrim, pCatAll, tabForm. |
| collection | Defines a set of variables that are treated as a single effect that has multiple degrees of freedom. Subparameters: details, *name, *vars. |
| display | Specifies a list of results tables to send to the client for display. Subparameters: caseSensitive, exclude, excludeAll, keyIsPath, names, pathType, traceNames. |
| freq | Names the numeric variable that contains the frequency of occurrence of each observation. |
| inputs | Specifies variables to use for analysis. Subparameters: format, formattedLength, label, *name, nfd, nfl. |
| maxParameters | Specifies that models not be fit if the number of parameters exceeds the specified value. Minimum value: 0. |
| model | Names the dependent variable, explanatory effects, and model options. Subparameters: addlaststopstep, clb, depVars, effects, entry, include, informative, noint, ridge, ss3, start, stb, tol, vif, xpx, xpxScaled, xpxUnscaled. |
| model.depVars | Subparameter of `model`. Specifies one or more variables to use as response variables in the model. Subparameter: name. |
| model.effects | Subparameter of `model`. Specifies a list of effects that define the model. Subparameters: interaction, maxInteract, nest, *vars. |
| model.include | Subparameter of `model`. Specifies effects to include at the start of the selection process. Can be an integer or a list of effects. |
| model.informative | Subparameter of `model`. When set to True, models missing values using extra model effects. Default: FALSE. |
| model.noint | Subparameter of `model`. When set to True, does not include the intercept term in the model. Default: FALSE. |
| model.ridge | Subparameter of `model`. Specifies the ridge constant values for ridge regression. |
| model.ss3 | Subparameter of `model`. When set to True, performs a model analysis of variance based on type III sums of squares. Default: FALSE. |
| model.start | Subparameter of `model`. Specifies effects to use to begin the selection process in FORWARD, FORWARDSWAP, and STEPWISE methods. Can be an integer or a list of effects. |
| model.stb | Subparameter of `model`. When set to True, produces standardized regression coefficients. Default: FALSE. |
| model.tol | Subparameter of `model`. When set to True, produces tolerance values for the estimates. Default: FALSE. |
| model.vif | Subparameter of `model`. When set to True, produces variance inflation factors with the parameter estimates. Default: FALSE. |
| model.xpx | Subparameter of `model`. When set to True, produces the crossproducts matrix. Default: FALSE. |
| model.xpxScaled | Subparameter of `model`. When set to True, produces the scaled crossproducts matrix. Default: FALSE. |
| model.xpxUnscaled | Subparameter of `model`. When set to True, produces the unscaled crossproducts matrix. Default: FALSE. |
| multimember | Uses one or more classification variables specified in the vars parameter such that each observation can be associated with one or more levels. Subparameters: details, *name, noEffect, stdize, *vars, weight. |
| nClassLevelsPrint | Limits the display of class levels. The value 0 suppresses all levels. Minimum value: 0. |
| nominals | Specifies nominal variables to use for analysis. Subparameters: format, formattedLength, label, *name, nfd, nfl. |
| output | Creates a table on the server that contains observationwise statistics, computed after fitting the model. Subparameters: *casOut (for output table settings), cooksD, copyVars, covRatio, dffits, h, lcl, lclm, likeDist, pred, press, resid, role, rStudent, stdi, stdp, stdr, student, ucl, uclm. |
| outputTables | Lists the names of results tables to save as CAS tables on the server. Subparameters: groupByVarsRaw, includeAll, names, repeated, replace. |
| parmEstLevDetails | Specifies whether to add raw and formatted values of classification variables in the ParameterEstimates table. Options: NONE, RAW, RAW_AND_FORMATTED. Default: RAW. |
| partByFrac | Specifies the fractions of the data to be used for validation and testing. Subparameters: seed, test, validate. |
| partByVar | Names the variable and its values used to partition the data into training, validation, and testing roles. Subparameters: *name, test, train, validate. |
| polynomial | Specifies a polynomial effect. All specified variables must be numeric. Subparameters: degree, details, labelStyle, mDegree, *name, noSeparate, standardize, *vars. |
| selection | Specifies the method and options for performing model selection. Subparameters: adaptive, bestSubsetOptions, candidates, choose, competitive, details, elasticNetOptions, enscale, ensteps, fcpSelectionOptions, gamma, hierarchy, kappa, L2, L2HIGH, L2LOW, lsCoeffs, maxEffects, maxSteps, method, minEffects, orderSelect, plots, relaxed, select, slEntry, slStay, stop, stopHorizon. |
| selection.bestSubsetOptions | Subparameter of `selection`. Specifies options to perform best-subset selection. Subparameters: best, computeBeta, displayAIC, displayBIC, displayGMSEP, displayJP, displayMSE, displayPC, displayRMSE, displaySBC, displaySP, displaySSE, sigma. |
| selection.elasticNetOptions | Subparameter of `selection`. Specifies options to use in performing elastic net selection methods. Subparameters: absFConv, fConv, gConv, lambda, mixing, numLambda, rho, solver. |
| selection.fcpSelectionOptions | Subparameter of `selection`. Specifies options to use in performing the folded concave penalized (FCP) selection methods. Subparameters: alpha, bigM, coefTol, intTol, lambda, lambdaGrid, maxAlpha, maxIterAlpha, maxIterLambda, maxLambda, maxTime, minAlpha, minLambda, scale, solver. |
| spline | Expands variables into spline bases whose form depends on the specified parameters. Subparameters: basis, dataBoundary, degree, details, knotMax, knotMethod, knotMin, *name, naturalCubic, separate, split, *vars. |
| ss3 | When set to True, performs a model analysis of variance based on type III sums of squares. Default: FALSE. |
| store | Stores regression models to a binary large object (BLOB). Subparameters: caslib, compress, indexVars, label, lifetime, maxMemSize, memoryFormat, name, promote, replace, replication, tableRedistUpPolicy, threadBlockSize, timeStamp, where. |
| table | Specifies the input data table. Subparameters: caslib, computedOnDemand, computedVars, computedVarsProgram, dataSourceOptions, groupBy, groupByMode, importOptions, *name, orderBy, singlePass, vars, where, whereTable. |
| target | Specifies the target variable to use for analysis. |
| weight | Names the numeric variable to use to perform a weighted analysis of the data. |

This example creates a simple CAS table for use with the `glm` action.

```sas
/* Assumes casuser is a libref assigned to a CAS session */
DATA casuser.mydata;
   INPUT x y z @@;
   CARDS;
1 10 100 2 12 110 3 15 120 4 18 130 5 20 140
6 22 150 7 25 160 8 28 170 9 30 180 10 33 190
;
RUN;
```

This example performs a simple linear regression using `x` as the independent variable and `y` as the dependent variable.

```sas
PROC CAS;
   regression.glm /
      table={name='mydata'},
      model={depVars={{name='y'}}, effects={{vars={'x'}}}};
RUN;
QUIT;
```
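
As a variation on the simple regression, the `alpha` and `clb` parameters from the table above can be combined to request confidence limits for the parameter estimates. The following is a minimal sketch, assuming the `mydata` table from the first example is available in the active caslib. The value 0.1 is only an illustrative choice.

```sas
PROC CAS;
   /* Assumes mydata was created as in the first example */
   regression.glm /
      table={name='mydata'},
      alpha=0.1,   /* significance level 0.1, that is, 90% confidence limits */
      clb=true,    /* display lower and upper limits for the parameter estimates */
      model={depVars={{name='y'}}, effects={{vars={'x'}}}};
RUN;
QUIT;
```

Omitting `alpha` leaves the documented default of 0.05, which corresponds to 95% limits.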

This example fits a linear regression model with multiple predictors, including a classification variable and its interaction with a continuous predictor, and generates an output table that contains predicted values and residuals.

```sas
/* Load the data. The DATA step runs before PROC CAS because a DATA
   statement inside PROC CAS would end the procedure step. */
DATA casuser.cars;
   SET sashelp.cars;
   LENGTH type_cat $8;  /* long enough to hold 'Japanese' without truncation */
   IF make='Audi' THEN type_cat='German';
   ELSE IF make='BMW' THEN type_cat='German';
   ELSE IF make='Toyota' THEN type_cat='Japanese';
   ELSE IF make='Honda' THEN type_cat='Japanese';
   ELSE type_cat='Other';
RUN;

/* Run the glm action */
PROC CAS;
   regression.glm /
      table={name='cars'},
      class={{vars={'type_cat'}}},
      model={depVars={{name='MSRP'}},
             effects={{vars={'Horsepower'}},
                      {vars={'type_cat'}},
                      {vars={'Horsepower', 'type_cat'}, interaction='CROSS'}}},
      output={casOut={name='predicted_cars', replace=true},
              pred='PredictedMSRP', resid='Residuals'};
RUN;
QUIT;
```
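
The `partByFrac` and `selection` parameters described in the table can be combined to hold out part of the data while a selection method chooses effects. The following is a minimal sketch, assuming the `cars` table from the previous example is in the active caslib. The fractions, the seed, and the extra predictors `Weight` and `EngineSize` (columns of `sashelp.cars`) are illustrative choices, not recommendations.

```sas
PROC CAS;
   /* Assumes the cars table with type_cat was created as in the previous example */
   regression.glm /
      table={name='cars'},
      class={{vars={'type_cat'}}},
      model={depVars={{name='MSRP'}},
             effects={{vars={'Horsepower'}},
                      {vars={'Weight'}},
                      {vars={'EngineSize'}},
                      {vars={'type_cat'}}}},
      /* Reserve 30% of the rows for validation and 20% for testing;
         the seed is an arbitrary value that makes the split repeatable */
      partByFrac={validate=0.3, test=0.2, seed=12345},
      /* Stepwise selection; all other selection options keep their defaults */
      selection={method='STEPWISE'};
RUN;
QUIT;
```

If the data already contain a column that marks each row's role, `partByVar` is the documented alternative to `partByFrac`.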