The logistic action fits logistic regression models for binary, binomial, and multinomial response data in SAS Viya. It supports several link functions (logit, probit, and complementary log-log) and accepts both classification and continuous explanatory variables. Model selection methods include forward, backward, and stepwise selection, as well as LASSO and elastic net for high-dimensional data. The action can produce parameter estimates, odds ratios, fit statistics, and scoring code for model deployment.

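
The link functions differ only in how a linear predictor is mapped to an event probability. As a rough illustration outside SAS, the following Python sketch evaluates the inverse of each link. The function names are made up for this example and are not part of the action's API.

```python
import math


def inv_logit(eta):
    """Inverse logit link: logistic CDF of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Inverse probit link: standard normal CDF, written with erf."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inv_cloglog(eta):
    """Inverse complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))


# Compare the three mappings on a few hypothetical linear predictor values.
for eta in (-2.0, 0.0, 2.0):
    print(eta, inv_logit(eta), inv_probit(eta), inv_cloglog(eta))
```

At a linear predictor of 0, the logit and probit links both give a probability of 0.5, whereas the complementary log-log link gives about 0.632, which reflects its asymmetry.
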
| Parameter | Description |
|---|---|
| alpha | Specifies the significance level for constructing all confidence intervals. |
| class | Specifies the classification variables to be used as explanatory variables in the analysis. |
| model | Defines the model to be fit, including the dependent variable(s) and explanatory effects. |
| selection | Specifies the method for model selection, such as FORWARD, BACKWARD, STEPWISE, or LASSO. |
| output | Creates an output CAS table containing observation-wise statistics like predicted values and residuals. |
| store | Saves the fitted model to a CAS table as a binary object for later scoring or analysis. |
| ctable | Generates a classification table to evaluate model performance, including statistics like accuracy, sensitivity, and specificity. |
| oddsratio | Computes and displays odds ratios for specified variables, which is useful for interpreting the effect of predictors. |
| lackfit | Performs the Hosmer and Lemeshow goodness-of-fit test to assess how well the model fits the data. |
| repeated | Specifies options for analyzing repeated measures data, defining subject and correlation structures. |
| weight | Specifies a variable to use for weighting the observations in the analysis. |
| freq | Specifies a variable that contains the frequency of occurrence for each observation. |
| partByFrac | Partitions the input data by specifying fractions for training, validation, and testing sets. |
| partByVar | Partitions the data based on the values of a specified variable. |
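
To make the ctable statistics concrete, here is a minimal Python sketch of how accuracy, sensitivity, and specificity follow from predicted event probabilities and a cutoff. The function name, the 0.5 cutoff, and the sample values are assumptions for illustration; the action's own table may use other cutoffs and report additional statistics.

```python
def classification_table(y_true, p_event, cutoff=0.5):
    """Summarize observed events (1/0) against predictions at a cutoff."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_event):
        predicted_event = p >= cutoff
        if y == 1 and predicted_event:
            tp += 1  # event correctly predicted
        elif y == 1:
            fn += 1  # event missed
        elif predicted_event:
            fp += 1  # nonevent predicted as event
        else:
            tn += 1  # nonevent correctly predicted

    def ratio(numerator, denominator):
        # Avoid dividing by zero when a class is absent from the data.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical observed outcomes and predicted event probabilities.
observed = [1, 1, 0, 0, 1, 0]
predicted = [0.9, 0.4, 0.2, 0.6, 0.7, 0.1]
stats = classification_table(observed, predicted)
print(stats)
```

Raising the cutoff generally trades sensitivity for specificity, which is why classification tables are often inspected at more than one cutoff.
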

This SAS code creates a sample CAS table named 'getstarted'. Each row describes one patient: survival status, sex, age, cholesterol level, and a smoking measure. The examples that follow use this table to show how to fit a logistic regression model for a binary outcome. With only six observations, the table is meant to illustrate syntax: 'Status' is perfectly separated by both 'Smoking' and 'Cholesterol', so estimates obtained from it should not be interpreted.

    /* Assumes an active CAS session and a CASUSER libref bound to it,
       for example via: caslib _all_ assign; */
    DATA casuser.getstarted;
       INPUT Status $ Sex $ Age Cholesterol Smoking;
       DATALINES;
    Dead Male 55 220 20
    Alive Female 55 180 10
    Dead Male 65 240 30
    Alive Female 45 170 5
    Dead Female 70 260 15
    Alive Male 48 210 0
    ;
    RUN;


This example demonstrates a basic logistic regression analysis. It uses the 'getstarted' table and models the binary 'Status' variable, with 'Dead' as the event of interest. The model includes 'Sex' as a classification variable and 'Age' and 'Smoking' as continuous explanatory variables.

    PROC CAS;
       regression.logistic TABLE={name='getstarted'},
          class={'Sex'},
          model={depvars={{name='Status', options={event='Dead', order='FORMATTED'}}},
                 effects={'Sex', 'Age', 'Smoking'}};
    RUN;

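
For readers who want to see what fitting a binary logistic model involves, the sketch below maximizes the likelihood for an intercept and one continuous predictor with Newton-Raphson in plain Python. It is an illustration only: the data are hypothetical (and deliberately not separated), the function name is invented, and the action's actual optimizer and defaults are not described here.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve the 2x2 system (information) * delta = gradient.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: the outcomes overlap in x, so a finite estimate exists.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print("intercept:", b0, "slope:", b1)
```

If the outcome were perfectly separated by the predictor, the slope would grow without bound across iterations instead of settling, which is why the sketch uses overlapping data.
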

This example performs a logistic regression with stepwise model selection to identify the most significant predictors. It also writes predicted probabilities to an output table and requests odds ratios for 'Sex', 'Age', and 'Smoking'.

    PROC CAS;
       regression.logistic TABLE='getstarted',
          class={'Sex'},
          model={depvar={{name='Status', options={event='Dead'}}},
                 effects={'Sex', 'Age', 'Cholesterol', 'Smoking'}},
          selection={method='STEPWISE', details='ALL'},
          OUTPUT={casOut={name='logistic_output', replace=true}, pred='predProb', role='role'},
          oddsratio={vars={'Sex', 'Age', 'Smoking'}};
    RUN;

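
Under the logit link, the odds ratio for a continuous predictor is the exponential of its coefficient, scaled by the size of the change being compared. The Python sketch below computes that value with a Wald-type confidence interval at significance level alpha. The coefficient and standard error are hypothetical, the function name is invented, and whether the action reports Wald or another type of limit is an assumption not confirmed here.

```python
import math
from statistics import NormalDist


def odds_ratio(beta, std_err, alpha=0.05, units=1.0):
    """Odds ratio for a `units` change, with a Wald (1 - alpha) interval."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    point = math.exp(units * beta)
    bound_a = math.exp(units * (beta - z * std_err))
    bound_b = math.exp(units * (beta + z * std_err))
    # Sort the bounds so a negative `units` still yields lower <= upper.
    lower, upper = sorted((bound_a, bound_b))
    return point, lower, upper


# Hypothetical estimate for Age: coefficient 0.08 with standard error 0.03.
print(odds_ratio(0.08, 0.03))            # per one-year increase
print(odds_ratio(0.08, 0.03, units=10))  # per ten-year increase
```

An interval that excludes 1 corresponds to a coefficient whose Wald interval excludes 0 at the same alpha.
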

This example fits a generalized logit model for a multinomial response variable. It uses the 'getstarted' table and models 'Smoking' as a function of 'Age' and 'Sex', treating each distinct value of 'Smoking' as a response category. This approach is useful when the response variable has more than two unordered categories. In the six-row sample table every 'Smoking' value is distinct, so the example illustrates syntax only.

    PROC CAS;
       regression.logistic TABLE='getstarted',
          class={'Sex'},
          model={depvar={{name='Smoking'}},
                 effects={'Sex', 'Age'},
                 link='GLOGIT'};
    RUN;

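
In a generalized logit model, each non-reference category has its own linear predictor, and the category probabilities are obtained by normalizing their exponentials against a reference category whose predictor is fixed at zero. The Python sketch below shows that mapping. The function name and input values are hypothetical, and which category the action treats as the reference is not covered here.

```python
import math


def glogit_probabilities(linear_predictors):
    """Map non-reference linear predictors to category probabilities.

    The returned list has one extra entry at the end for the reference
    category, whose linear predictor is fixed at 0.
    """
    exps = [math.exp(eta) for eta in linear_predictors]
    denom = 1.0 + sum(exps)          # the 1.0 is exp(0) for the reference
    probs = [e / denom for e in exps]
    probs.append(1.0 / denom)        # reference category
    return probs


# Hypothetical linear predictors for two non-reference categories.
probs = glogit_probabilities([0.4, -1.2])
print(probs, sum(probs))
```

With a single non-reference category, this reduces to the binary logit model.
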