The script begins by defining a `kupper_haseman` dataset via `datalines`. It then calculates the ratio `t/m` using `PROC SURVEYMEANS` and stores this value in a macro-variable `Pi`. This is followed by `DATA` steps to calculate intermediate variables necessary for the GOF test formula, including individual contributions to the statistic. `PROC MEANS` is used to sum these contributions. Finally, another `DATA` step calculates the final Chi-square statistic (`X2`) and its p-value (`Pval`) using the `probchi` function. The results are presented in an HTML output via `PROC PRINT`.
Data Analysis
Type : INTERNAL_CREATION
The data used, `kupper_haseman`, is created internally directly within the script via a `datalines` statement. It comes from Kupper and Haseman (1978).
1 Code Block
DATA STEP Data
Explanation : This block creates the `kupper_haseman` dataset which contains the `t` (number of successes) and `m` (number of trials) observations used for the Goodness-of-Fit test. The data is integrated directly into the script via the `datalines` statement.
data kupper_haseman;
input t m;
datalines;
0 5
2 5
1 7
0 8
2 8
3 8
0 9
4 9
1 10
6 10
;
2 Code Block
PROC SURVEYMEANS
Explanation : This block uses `PROC SURVEYMEANS` to calculate the ratio of `t` to `m` from the `kupper_haseman` dataset. The estimated ratio result is stored in a temporary dataset named `Ratio`. The `ods select none` and `ods select all` statements are used to suppress the display of standard procedure output.
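The code for this block is not reproduced on this page. A minimal sketch consistent with the explanation above might look like the following. The `ods output Ratio=Ratio` line and the overall statement layout are assumptions inferred from the description and from the later `set Ratio` step; they are not the author's verbatim code.

```sas
/* Sketch only: reconstructed from the explanation, not the original block */
ods select none;                       /* suppress the displayed procedure output */
proc surveymeans data=kupper_haseman;
   ratio t / m;                        /* ratio estimate: sum(t) / sum(m) */
   ods output Ratio=Ratio;             /* assumed: capture the Ratio ODS table as a dataset */
run;
ods select all;                        /* restore displayed output for later steps */
```

With no design or weight statements, the ratio estimate reduces to the plain quotient of the column totals, which for the data above is 19/79.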
3 Code Block
DATA STEP
Explanation : This `DATA` step reads the `Ratio` dataset (containing the calculated ratio) and uses the `call symput` routine to assign the value of the `Ratio` variable to a macro variable named `Pi`. `trim(left(Ratio))` strips leading and trailing blanks from the value before it is stored in the macro variable.
data Ratio;
set Ratio;
call symput('Pi',trim(left(Ratio)));
run;
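One way to confirm that the macro variable was populated is to write it to the log. The sketch below also uses `call symputx`, which removes leading and trailing blanks on its own. Using `symputx` and `data _null_` here is a suggested alternative, not what the original script does.

```sas
/* Alternative sketch: same effect as the step above, without rewriting Ratio */
data _null_;
   set Ratio;
   call symputx('Pi', Ratio);   /* symputx strips blanks, so trim(left()) is not needed */
run;

%put NOTE: Pi=&Pi;              /* for this data the value should be close to 19/79, about 0.2405 */
```

`data _null_` avoids creating a second copy of `Ratio` when the only goal is to set the macro variable.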
4 Code Block
DATA STEP
Explanation : This `DATA` block creates the `out1` dataset starting from `kupper_haseman`. It calculates several intermediate variables (`pi`, `pic`, `pipic`, `mpi`, `t_mpi`, `pit_mpi`, `tpic`, `mm_1`, `aux`) essential for Tarone's GOF test formula. The macro-variable `&Pi` is used for the estimated probability. Only `aux` and `mm_1` are kept for subsequent steps.
data out1;
set kupper_haseman;
pi = &Pi;
pic = 1 - pi;
pipic = pi * pic;
mpi = m * pi;
t_mpi = t - mpi;
pit_mpi = pi * t_mpi;
tpic = t * pic;
mm_1 = m * (m-1);
aux = ( t_mpi*t_mpi + pit_mpi - tpic ) / pipic;
keep aux mm_1;
run;
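To make the arithmetic concrete, here is the first observation (t = 0, m = 5) worked by hand. This assumes `&Pi` resolves to exactly 19/79, the quotient of the column totals of the data above. In practice the macro variable holds a rounded decimal, so the SAS value agrees only to the stored precision.

```latex
\begin{aligned}
\hat\pi &= \tfrac{19}{79}, \qquad 1-\hat\pi = \tfrac{60}{79}, \qquad
\hat\pi(1-\hat\pi) = \tfrac{1140}{6241} \\
t - m\hat\pi &= 0 - \tfrac{95}{79} = -\tfrac{95}{79} \\
\text{aux} &= \frac{\tfrac{9025}{6241} - \tfrac{1805}{6241} - 0}{\tfrac{1140}{6241}}
            = \frac{7220}{1140} = \tfrac{19}{3} \approx 6.33 \\
\text{mm\_1} &= 5 \cdot 4 = 20
\end{aligned}
```

The three numerator terms correspond to `t_mpi*t_mpi`, `pit_mpi`, and `tpic` in the code, in that order.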
5 Code Block
PROC MEANS
Explanation : This block uses `PROC MEANS` to calculate the sums of the `aux` and `mm_1` variables from the `out1` dataset. The aggregated results (the sums) are stored in a new dataset `out2`. The `noprint` option suppresses the default display of `PROC MEANS` statistics.
proc means data=out1 sum noprint;
var aux mm_1;
output out=out2 sum=aux mm_1;
run;
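The same two sums can also be obtained with `PROC SQL`. This is an alternative sketch rather than the original approach, and it assumes `out1` already exists with the `aux` and `mm_1` columns created above.

```sas
/* Alternative sketch: one-row dataset holding the two column sums */
proc sql;
   create table out2 as
   select sum(aux)  as aux,
          sum(mm_1) as mm_1
   from out1;
quit;
```

Unlike the `PROC MEANS` output dataset, this version does not carry the automatic `_TYPE_` and `_FREQ_` variables, which the later steps do not use.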
6 Code Block
DATA STEP
Explanation : This `DATA` block finalizes the GOF test calculations. It reads the `out2` dataset (containing the sums of `aux` and `mm_1`) and calculates the Chi-square statistic (`x2`) and its p-value (`pval`) using the `probchi` function. Descriptive labels and display formats are applied to the `X2` and `PVal` variables.
data out2;
set out2;
label X2 = "GOF Test";
label PVal = "P-Value";
format X2 8.2 Pval pvalue6.;
x2 = aux / sqrt( 2*mm_1 );
x2 = x2 * x2;
pval = 1 - probchi(x2,1,0);
run;
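Read together, the DATA and PROC steps above amount to the following statistic. The notation (a hat on pi, a_i for the per-row `aux` value, n for the number of rows) is introduced here for illustration. The formulas are transcribed from the code, not from Tarone's paper.

```latex
\hat\pi = \frac{\sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} m_i}, \qquad
a_i = \frac{(t_i - m_i\hat\pi)^2 + \hat\pi\,(t_i - m_i\hat\pi) - t_i\,(1-\hat\pi)}
           {\hat\pi\,(1-\hat\pi)}

X^2 = \frac{\left(\sum_{i=1}^{n} a_i\right)^2}{2\sum_{i=1}^{n} m_i\,(m_i - 1)}, \qquad
p = 1 - F_{\chi^2_1}\!\left(X^2\right)
```

Here F denotes the chi-square distribution function with 1 degree of freedom, which is what `probchi(x2,1,0)` evaluates. Because the estimate satisfies pi-hat times the sum of m_i equals the sum of t_i, the sum of the a_i simplifies to the sum of (t_i - m_i pi-hat)^2 / (pi-hat (1 - pi-hat)) minus the sum of m_i.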
7 Code Block
PROC PRINT
Explanation : This block generates the final output in HTML format. It sets a title for the report and uses `PROC PRINT` to display the `x2` (the GOF test statistic) and `pval` (the p-value) variables from the `out2` dataset. The `noobs` option suppresses the observation column, and `label` uses the variable labels for display.
ods html;
title "Tarone (1979) GOF Test";
proc print data=out2 noobs label;
var x2 pval;
run;
ods html close;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Tarone (1979) Two-sided GOF test
H0:Binomial Distribution Versus
H1:Generalized Binomial Distribution with Additive Interaction (Altham, 1978)
Altham's model was simultaneously proposed by Kupper and Haseman (1978)
and termed 'Correlated Binomial Model'
Data below were taken from Kupper and Haseman (1978, page 75)