The script begins by creating a `haseman_soares` dataset from inline data (`datalines`). It then transforms this dataset so that each observation represents a single counted occurrence. Two macros, `%GOF_BB` and `%GOF_RCB`, are then called to perform goodness-of-fit tests. The residuals produced by these macros are combined, sorted, and converted to normal scores with `PROC RANK` in preparation for Q-Q plots. Finally, `PROC SGPANEL` draws these Q-Q plots so that the distribution of the residuals can be compared with a normal distribution.
Data Analysis
Type : INTERNAL_CREATION
The initial `haseman_soares` dataset is created directly within the script via a `datalines` statement, then transformed to expand observations based on the `freq` column.
1 Code Block
DATA STEP Data
Explanation : This DATA STEP block creates the `haseman_soares` dataset from raw data supplied in `datalines`, reading the variables `m` and `t1` to `t10`. A `DO OVER` loop over the array `tt` (made up of `t1` to `t10`) then reshapes each wide record into long form: each output row holds `t`, the column index, and `freq`, the value stored in that column.
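The reshaping described above can be sketched in Python. This is an illustrative translation, not the script's SAS code. The function name `wide_to_long` is hypothetical. The 1-based column index is an assumption that mirrors the `DO OVER` counter. The sample record is a placeholder, not a row from the Haseman and Soares data.

```python
from typing import Optional


def wide_to_long(m: int, t_values: list[Optional[int]]) -> list[dict]:
    """Turn one wide record (m, t1..t10) into one row per column.

    Hypothetical helper mirroring a DO OVER loop over the array tt:
    the column index (assumed 1-based, like the DO OVER counter)
    becomes t and the cell value becomes freq. Missing cells (None,
    the analogue of a SAS '.') are kept; the next DATA step drops them.
    """
    return [
        {"m": m, "t": t, "freq": freq}
        for t, freq in enumerate(t_values, start=1)
    ]


# Placeholder wide record: m = 3 with values for only three columns.
long_rows = wide_to_long(3, [2, None, 5])
for row in long_rows:
    print(row)
```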
2 Code Block
DATA STEP Data
Explanation : This DATA STEP block post-processes the `haseman_soares` dataset. It deletes observations where `freq` is missing. For each remaining observation, it outputs as many copies of the row as the value of `freq`. This expands the frequency table so that each row represents a single occurrence. The loop counter `i` and the now-redundant `freq` are then dropped.
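data haseman_soares;
  set haseman_soares;
  /* Remove cells with no recorded frequency */
  if freq = . then delete;
  /* Write one copy of the observation per counted occurrence */
  do i=1 to freq;
    output;
  end;
  drop i freq;
run;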
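The delete-and-repeat logic of this DATA step can be sketched in Python. The function name `expand_frequencies` and the input rows are hypothetical placeholders, and `None` stands in for a SAS missing value.

```python
from typing import Optional


def expand_frequencies(rows: list[dict]) -> list[dict]:
    """Repeat each row freq times, as the DATA step's DO loop does.

    Rows with a missing freq (None, like SAS '.') are deleted, and the
    freq column is left out of the output, mirroring DROP i freq.
    """
    expanded = []
    for row in rows:
        freq: Optional[int] = row["freq"]
        if freq is None:  # IF freq = . THEN DELETE;
            continue
        kept = {k: v for k, v in row.items() if k != "freq"}
        # DO i=1 TO freq; OUTPUT; END;  (freq = 0 writes no rows)
        for _ in range(freq):
            expanded.append(dict(kept))
    return expanded


# Hypothetical long-form input.
rows = [
    {"m": 3, "t": 1, "freq": 2},
    {"m": 3, "t": 2, "freq": None},
    {"m": 3, "t": 3, "freq": 1},
]
print(expand_frequencies(rows))
```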
3 Code Block
ODS
Explanation : These ODS (Output Delivery System) statements open the HTML destination and turn on ODS Graphics, so that the tables and graphs produced by subsequent procedures are written as HTML output.
ods html;
ods graphics on;
4 Code Block
MACRO
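Explanation : This block calls two SAS macros, `%GOF_BB` and `%GOF_RCB`, which are assumed to perform goodness-of-fit tests. Both take the `haseman_soares` dataset as input (`inds=`) and use the variables `t` and `m` in their calculations. The `title2=` parameter supplies a subtitle for the output each macro produces. The macros are not defined in this script, so they must be compiled before these calls (for example, with `%INCLUDE`). The next step reads the `Resid_BB` and `Resid_RCB` datasets, so the macros are expected to create them.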
%GOF_BB (inds=haseman_soares,t=t,m=m,title2=DataSet III -- Haseman and Soares (1976));
%GOF_RCB(inds=haseman_soares,t=t,m=m,title2=DataSet III -- Haseman and Soares (1976));
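The macro definitions are not part of this page, so how `%GOF_BB` and `%GOF_RCB` compute their residuals cannot be confirmed here. Purely as an assumption, the Python sketch below uses Pearson residuals, (O - E)/sqrt(E). This is one common way to build residuals from observed and expected frequencies, which is how the plot title later in the script describes them. The squared Pearson residuals sum to the Pearson chi-square statistic. The function names and the counts are hypothetical.

```python
import math


def pearson_residuals(observed: list[float], expected: list[float]) -> list[float]:
    """Pearson residuals (O - E) / sqrt(E) for each frequency cell.

    Assumption: this is one common residual definition; %GOF_BB and
    %GOF_RCB may define their residuals differently.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]


def pearson_chi_square(observed: list[float], expected: list[float]) -> float:
    """Sum of squared Pearson residuals (the Pearson chi-square statistic)."""
    return sum(r * r for r in pearson_residuals(observed, expected))


# Hypothetical observed and expected counts for three cells.
obs = [6, 4, 14]
exp = [4.0, 4.0, 16.0]
print(pearson_residuals(obs, exp))
print(pearson_chi_square(obs, exp))
```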
5 Code Block
DATA STEP Data
Explanation : This DATA STEP concatenates the `Resid_BB` and `Resid_RCB` datasets, which hold the residuals from the two goodness-of-fit analyses, into a single new dataset named `Resid_BB_RCB`. Stacking them lets both sets of residuals be ranked and plotted in the same steps.
*--- Construct QQ Plots;
data Resid_BB_RCB;
set Resid_BB Resid_RCB;
run;
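Concatenation with `SET Resid_BB Resid_RCB;` reads every observation of the first dataset and then every observation of the second. A variable that exists in only one of them is set to missing for the observations from the other. The Python sketch below illustrates this behaviour. The function name and the rows are hypothetical. The `Distribution` values are assumed labels (the script only shows that a `Distribution` variable exists), and `Extra` is an invented column that shows the missing-value fill.

```python
def concatenate(*datasets: list[dict]) -> list[dict]:
    """Stack datasets the way SET a b; does in a DATA step.

    All rows of the first dataset come first, then all rows of the
    next. A column present in only some datasets is filled with None
    (the analogue of a SAS missing value) elsewhere.
    """
    columns: list[str] = []
    for ds in datasets:
        for row in ds:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return [{col: row.get(col) for col in columns} for ds in datasets for row in ds]


# Placeholder residual rows; labels and the Extra column are hypothetical.
resid_bb = [{"Distribution": "BB", "Resid": 0.4}]
resid_rcb = [{"Distribution": "RCB", "Resid": -0.2, "Extra": 1}]
combined = concatenate(resid_bb, resid_rcb)
print(combined)
```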
6 Code Block
PROC SORT
Explanation : `PROC SORT` sorts `Resid_BB_RCB` by the `Distribution` variable. The sort is required because the following `PROC RANK` step uses `BY Distribution`, and BY-group processing expects the input to be sorted by the BY variable. (`PROC SGPANEL` does not need sorted input for its `PANELBY` statement.)
proc sort data=Resid_BB_RCB;
by Distribution;
run;
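The need for the sort can be illustrated with Python's `itertools.groupby`. Like SAS BY-group processing, `groupby` treats each run of equal consecutive values as a group. On unsorted input, SAS reports an error, whereas `groupby` silently splits one value into several groups. Python's `sorted()` is stable, which corresponds to PROC SORT's default EQUALS behaviour of keeping the input order within each BY value. The helper names and the rows are hypothetical, and the `Distribution` labels are placeholders.

```python
from itertools import groupby


def sort_by(rows: list[dict], key: str) -> list[dict]:
    """Sort rows by one variable, keeping input order within equal keys."""
    return sorted(rows, key=lambda row: row[key])


def by_groups(rows: list[dict], key: str) -> list[tuple[object, list[dict]]]:
    """Split rows into consecutive runs that share the same key value.

    Like BY-group processing, this gives one group per value only when
    the rows are already sorted by key.
    """
    return [(value, list(group)) for value, group in groupby(rows, key=lambda row: row[key])]


# Placeholder residual rows; the Distribution labels are hypothetical.
rows = [
    {"Distribution": "RCB", "Resid": 1.0},
    {"Distribution": "BB", "Resid": 2.0},
    {"Distribution": "RCB", "Resid": 3.0},
    {"Distribution": "BB", "Resid": 4.0},
]
print([value for value, _ in by_groups(rows, "Distribution")])
print([value for value, _ in by_groups(sort_by(rows, "Distribution"), "Distribution")])
```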
7 Code Block
PROC RANK Data
Explanation : `PROC RANK` computes normal scores for the residuals. It reads `Resid_BB_RCB` and writes a new dataset, `new_qqplots`. The `normal=blom` option converts each rank r into a Blom normal score, the standard normal quantile of (r - 3/8)/(n + 1/4), where n is the number of nonmissing values. The `ties=mean` option gives tied values the mean of their ranks (or scores). The `VAR Resid` and `RANKS NQuant` statements store the score for `Resid` in the new variable `NQuant`. `BY Distribution` computes the scores separately within each `Distribution` group.
proc rank data=Resid_BB_RCB out=new_qqplots normal=blom ties=mean;
by Distribution;
var Resid;
ranks NQuant;
run;
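The Blom scoring can be sketched in Python for a single BY group. This is an illustration only, since the SAS step is what actually produces `NQuant`. The function name `blom_scores` is hypothetical. Ties are handled by averaging the ranks before applying the Blom formula. Whether PROC RANK averages the ranks or the resulting scores for tied values is an assumption worth checking against the PROC RANK documentation; either way, tied values receive identical scores. Missing values, which PROC RANK excludes, are not handled here.

```python
from statistics import NormalDist


def blom_scores(values: list[float]) -> list[float]:
    """Blom normal scores for one group of nonmissing values.

    Each value gets a 1-based rank r; the score is the standard normal
    quantile of (r - 3/8) / (n + 1/4). Assumption: tied values share
    the mean of their ranks before scoring (mimicking TIES=MEAN).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        # Find the run of tied values starting at sorted position pos.
        end = pos
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # mean of 1-based ranks pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]


# Placeholder residuals for one Distribution group.
print(blom_scores([0.3, -1.2, 0.8]))
```

PROC RANK applies this calculation separately within each BY group, so with the sketch you would call `blom_scores` once per `Distribution` value.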
8 Code Block
PROC SGPANEL
Explanation : This block uses `PROC SGPANEL` to draw Q-Q (quantile-quantile) plots of the residuals. The `PANELBY Distribution` statement creates one panel per value of `Distribution`, so each model's residuals get their own Q-Q plot. The `TITLE1` and `TITLE2` statements set the titles, and the `LABEL` statement sets the axis labels. The `REG` statement plots `NQuant` against `Resid` and overlays a fitted regression line. If the residuals are approximately normally distributed, the points lie close to that line.
proc sgpanel data=new_qqplots noautolegend;
panelby Distribution;
title1 "DataSet III -- Haseman and Soares (1976)";
title2 "QQ-Plots of Residuals based on Observed and Expected Frequencies";
label Resid="Residuals" NQuant="Normal Quantiles";
reg x=Resid y=NQuant;
run;
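By default, the line that the `REG` statement overlays is a straight-line least-squares fit of the Y variable on the X variable, here `NQuant` on `Resid`. A Python sketch of that fit follows; the function name and the sample points are hypothetical. This script puts the residuals on the horizontal axis and the normal quantiles on the vertical axis, the reverse of the more common Q-Q layout. In either orientation, normality shows up as points falling along a straight line.

```python
def least_squares_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the ordinary least-squares fit of y on x.

    With x = Resid and y = NQuant, this is the kind of straight line a
    REG statement draws through the Q-Q points (a degree-1 fit).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Placeholder residuals and normal scores for one panel.
resid = [-1.0, 0.0, 1.0]
nquant = [-0.9, 0.0, 0.9]
print(least_squares_line(resid, nquant))
```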
9 Code Block
ODS
Explanation : These statements turn off ODS Graphics and close the HTML destination, so no further output is written to the HTML file.
ods graphics off;
ods html close;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Source: Haseman and Soares (1976)
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. WeAreCAS is an independent community site and is not affiliated with SAS Institute Inc.