The script consists of two independent analyses. The first part creates a 'drinking' table to analyze the link between alcohol consumption and cirrhosis rates by country. It generates a scatter plot, executes several regression models with PROC REG, including a model excluding a specific country (France), and identifies influential observations. The second part creates a 'universe' table containing data on galaxy distance and velocity. It visualizes this data and fits a linear regression model without an intercept to illustrate Hubble's Law, also identifying high-leverage points.
Data Analysis
Type : CREATION_INTERNE
Both datasets, 'drinking' and 'universe', are generated within the script using a DATA STEP and the 'cards' or 'datalines' statement, making them self-contained.
1 Code Block
DATA STEP Data
Explanation : Creates the 'drinking' work table from manually entered data using the 'cards' statement. The table contains three variables: country name, alcohol consumption, and cirrhosis rate.
/* Country name occupies columns 1-12; the two numeric values follow as list input. */
data drinking;
  input country $ 1-12 alcohol cirrhosis;
  cards;
France       24.7  46.1
Italy        15.2  23.6
W.Germany    12.3  23.7
Austria      10.9   7.0
Belgium      10.8  12.3
USA           9.9  14.2
Canada        8.3   7.4
E&W           7.2   3.0
Sweden        6.6   7.2
Japan         5.8  10.6
Netherlands   5.7   3.7
Ireland       5.6   3.4
Norway        4.2   4.3
Finland       3.9   3.6
Israel        3.1   5.4
;
run;
2 Code Block
PROC SGPLOT
Explanation : Generates a scatter plot to visualize the relationship between alcohol consumption ('alcohol') and cirrhosis ('cirrhosis'). Each point is labeled with the country name. The commented-out code block shows an older method to obtain a similar result with PROC GPLOT.
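The code for this block is not reproduced on this page. A minimal sketch of the plot described above might look like the following, assuming the country labels are attached with the `datalabel=` option of the SCATTER statement (the original may set additional options):

```sas
/* Sketch only: scatter plot of cirrhosis against alcohol consumption. */
/* datalabel=country is an assumption about how the points were labeled. */
proc sgplot data=drinking;
  scatter x=alcohol y=cirrhosis / datalabel=country;
run;
```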
3 Code Block
PROC REG
Explanation : Performs a simple linear regression to model the cirrhosis rate as a function of alcohol consumption. `ODS GRAPHICS ON` automatically generates regression diagnostic plots. The commented-out code presents an alternative for superimposing a regression line on a scatter plot with PROC SGPLOT.
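The code for this step is not reproduced on this page. Based on the description, it probably resembles the sketch below, which reuses the MODEL statement from the later blocks. The closing `ods graphics off;` is an addition for tidiness and may not be in the original script.

```sas
/* Sketch only: simple linear regression of cirrhosis on alcohol. */
ods graphics on;            /* request the diagnostic plots */
proc reg data=drinking;
  model cirrhosis=alcohol;
run; quit;
ods graphics off;           /* assumption: not necessarily in the original */
```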
4 Code Block
PROC REG
Explanation : Executes a new linear regression model excluding the observation for France, which was identified as a potentially influential point in the previous graph.
proc reg data=drinking;
model cirrhosis=alcohol;
where country ne 'France';
run; quit;
5 Code Block
PROC REG Data
Explanation : Re-executes the regression on the entire dataset and saves the diagnostic statistics (predicted values, studentized residuals, and leverage) into a new 'regout' table. PROC PRINT is then used to display observations that are flagged as outliers (absolute studentized residual > 2) or as high-leverage points (leverage > 0.3).
proc reg data=drinking;
model cirrhosis=alcohol;
output out=regout predicted=pred student=zres h=leverage;
run; quit;
proc print data=regout;
where abs(zres)>2 or leverage>.3;
run;
Explanation : Generates a scatter plot to visualize the relationship between a galaxy's distance and its recession velocity, adding labels to the axes. The commented-out code shows the equivalent with the deprecated PROC GPLOT procedure.
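The code for this step is not reproduced on this page. A minimal sketch follows, assuming the 'universe' table holds variables named `distance` and `velocity`. Both the variable names and the axis label text are assumptions, since the table definition is not shown here.

```sas
/* Sketch only: variable names and label text are hypothetical. */
proc sgplot data=universe;
  scatter x=distance y=velocity;
  xaxis label='Distance';
  yaxis label='Recession velocity';
run;
```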
Explanation : Fits a linear regression model for velocity as a function of distance. The `NOINT` option forces the regression line to pass through the origin, which is consistent with Hubble's Law (Velocity = H0 * Distance).
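A no-intercept fit of this kind can be sketched as follows, again assuming the variables are named `distance` and `velocity` (the names are not shown on this page). With `NOINT`, the single estimated coefficient on `distance` plays the role of H0 in Velocity = H0 * Distance.

```sas
/* Sketch only: regression through the origin; variable names are assumed. */
proc reg data=universe;
  model velocity=distance / noint;
run; quit;
```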
Explanation : Re-executes the regression without an intercept and saves the diagnostic statistics to the 'regout' table (overwriting the previous one). PROC PRINT then displays observations with leverage greater than 0.08, flagging the high-leverage points, which have the greatest potential to influence the model estimates.
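This step mirrors the 'regout' block used for the 'drinking' table, so it can be sketched along the same lines. The variable names `distance` and `velocity` are assumptions, and the original may save more statistics than leverage alone.

```sas
/* Sketch only: no-intercept fit with leverage saved to regout. */
proc reg data=universe;
  model velocity=distance / noint;
  output out=regout h=leverage;   /* h= stores the leverage of each observation */
run; quit;

/* List the observations whose leverage exceeds the 0.08 cutoff. */
proc print data=regout;
  where leverage>.08;
run;
```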
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.