
The script implements Tarone's (1979) goodness-of-fit (GOF) test of the binomial distribution. It begins by defining the `kupper_haseman` dataset via `datalines`. It then estimates the ratio `t/m` with `PROC SURVEYMEANS` and stores the value in the macro variable `Pi`. A `DATA` step then computes the intermediate variables required by the test formula, including each observation's contribution to the statistic, and `PROC MEANS` sums these contributions. A final `DATA` step computes the chi-square statistic (`X2`) and its p-value (`Pval`) using the `probchi` function. The results are written to HTML output via `PROC PRINT`.

Data Analysis

Type: created internally

The data used, `kupper_haseman`, is created internally directly within the script via a `datalines` statement. It comes from Kupper and Haseman (1978).

1 Code Block

DATA STEP Data

Explanation :
This block creates the `kupper_haseman` dataset which contains the `t` (number of successes) and `m` (number of trials) observations used for the Goodness-of-Fit test. The data is integrated directly into the script via the `datalines` statement.

DATA kupper_haseman;
  INPUT t m;
  DATALINES;
0 5
2 5
1 7
0 8
2 8
3 8
0 9
4 9
1 10
6 10
;

2 Code Block

PROC SURVEYMEANS

Explanation :
This block uses `PROC SURVEYMEANS` to estimate the ratio of `t` to `m` from the `kupper_haseman` dataset. The `ods output Ratio=Ratio` statement captures the procedure's `Ratio` table in a temporary dataset also named `Ratio`. `ods select none` suppresses the procedure's displayed output, and `ods select all` restores normal output afterwards.

ods select none;
ods OUTPUT Ratio=Ratio;
PROC SURVEYMEANS DATA=kupper_haseman;
  ratio t/m;
RUN;
ods select all;
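As a cross-check, and assuming no weights, strata, or clusters are specified (as in the call above), the ratio estimate reduces to the pooled proportion sum(t)/sum(m). The sketch below recomputes it with `PROC SQL`. The column aliases `sum_t`, `sum_m`, and `pooled_ratio` are illustrative names, not part of the original script.

```sas
/* Hedged cross-check: pooled proportion computed directly from the data. */
/* The aliases sum_t, sum_m, pooled_ratio are hypothetical names.         */
proc sql;
  select sum(t) as sum_t,
         sum(m) as sum_m,
         sum(t) / sum(m) as pooled_ratio format=10.6
  from kupper_haseman;
quit;
```

For the ten observations listed above, the two sums are 19 and 79, so the pooled ratio is 19/79, roughly 0.2405.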

3 Code Block

DATA STEP

Explanation :
This `DATA` step reads the `Ratio` dataset (containing the estimated ratio) and uses the `call symput` routine to assign the value of the `Ratio` variable to a macro variable named `Pi`. `trim(left(Ratio))` strips leading and trailing blanks from the value before it is stored.

DATA Ratio;
  SET Ratio;
  call symput('Pi',trim(left(Ratio)));
RUN;
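In the step above, `left(Ratio)` implicitly converts the numeric value to text using the BEST12. format. One possible variant, sketched here as an alternative and not as the author's method, uses `CALL SYMPUTX`, which removes leading and trailing blanks itself and converts numeric values using up to 32 characters. `DATA _NULL_` is used so that the `Ratio` dataset is not rewritten.

```sas
/* Alternative sketch: store the ratio in the macro variable Pi. */
data _null_;
  set Ratio;
  call symputx('Pi', Ratio);  /* no trim(left()) needed */
run;

%put NOTE: Pi=&Pi;  /* echo the stored value in the log */
```

The `%put` line is only a convenience for checking the stored value in the log before it is used in later steps.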

4 Code Block

DATA STEP

Explanation :
This `DATA` block creates the `out1` dataset starting from `kupper_haseman`. It calculates several intermediate variables (`pi`, `pic`, `pipic`, `mpi`, `t_mpi`, `pit_mpi`, `tpic`, `mm_1`, `aux`) essential for Tarone's GOF test formula. The macro-variable `&Pi` is used for the estimated probability. Only `aux` and `mm_1` are kept for subsequent steps.

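DATA out1;
  SET kupper_haseman;
  pi = &Pi;                /* estimated probability from the macro variable */
  pic = 1 - pi;            /* complement of pi */
  pipic = pi * pic;
  mpi = m * pi;            /* expected successes under the binomial model */
  t_mpi = t - mpi;         /* observed minus expected */
  pit_mpi = pi * t_mpi;
  tpic = t * pic;
  mm_1 = m * (m-1);
  aux = ( t_mpi*t_mpi + pit_mpi - tpic ) / pipic;
  keep aux mm_1;
RUN;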
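Written out, with $\hat\pi$ standing for `&Pi` and $(t_i, m_i)$ for one observation, the `aux` expression corresponds to the following. This is a reading of the code above, not a formula quoted from the source.

```latex
\mathrm{aux}_i
  = \frac{(t_i - m_i\hat\pi)^2 + \hat\pi\,(t_i - m_i\hat\pi) - t_i\,(1-\hat\pi)}
         {\hat\pi\,(1-\hat\pi)},
\qquad
\hat\pi = \frac{\sum_i t_i}{\sum_i m_i}

% Since \sum_i (t_i - m_i\hat\pi) = 0 for this \hat\pi, and
% \sum_i t_i / \hat\pi = \sum_i m_i, the total simplifies to
\sum_i \mathrm{aux}_i
  = \sum_i \frac{(t_i - m_i\hat\pi)^2}{\hat\pi\,(1-\hat\pi)} \;-\; \sum_i m_i
```

The simplification holds only when $\hat\pi$ is the pooled ratio; with any other value the middle term would not cancel.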

5 Code Block

PROC MEANS

Explanation :
This block uses `PROC MEANS` to calculate the sums of the `aux` and `mm_1` variables from the `out1` dataset. The aggregated results (the sums) are stored in a new dataset `out2`. The `noprint` option suppresses the default display of `PROC MEANS` statistics.

PROC MEANS DATA=out1 sum noprint;
  var aux mm_1;
  OUTPUT out=out2 sum=aux mm_1;
RUN;

6 Code Block

DATA STEP

Explanation :
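This `DATA` step finalizes the GOF test calculations. It reads the `out2` dataset (containing the sums of `aux` and `mm_1`), divides the summed `aux` by the square root of twice the summed `mm_1`, and squares the result to obtain the chi-square statistic (`x2`). The p-value (`pval`) is the upper-tail probability of a chi-square distribution with 1 degree of freedom, computed with the `probchi` function. Descriptive labels and display formats are applied to the `X2` and `PVal` variables.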

DATA out2;
  SET out2;
  label X2 = "GOF Test";
  label PVal = "P-Value";
  FORMAT X2 8.2 Pval pvalue6.;
  x2 = aux / sqrt( 2*mm_1 );
  x2 = x2 * x2;
  pval = 1 - probchi(x2,1,0);
RUN;
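In formula form, the step above can be read as follows, where $\mathrm{aux}_i$ and $m_i(m_i-1)$ are the per-observation values kept in `out1`. The numeric check is a hand calculation on the ten observations from the `datalines` block, so small differences from the SAS output are possible because `&Pi` is stored with limited precision.

```latex
X^2 = \left( \frac{\sum_i \mathrm{aux}_i}{\sqrt{2 \sum_i m_i (m_i - 1)}} \right)^{2},
\qquad
p = 1 - F_{\chi^2_1}\!\left(X^2\right)

% Hand check with \hat\pi = 19/79:
%   \sum_i m_i (m_i - 1) = 574
%   \sum_i \mathrm{aux}_i = 189518/1140 - 79 \approx 87.24
%   X^2 \approx 87.24^2 / 1148 \approx 6.63, \quad p \approx 0.01
```

Here $F_{\chi^2_1}$ denotes the chi-square distribution function with 1 degree of freedom, which is what `probchi(x2,1,0)` evaluates.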

7 Code Block

PROC PRINT

Explanation :
This block writes the final output in HTML format. It opens the HTML destination with `ods html`, sets a title for the report, and uses `PROC PRINT` to display the `x2` (the GOF test statistic) and `pval` (the p-value) variables from the `out2` dataset. The `noobs` option suppresses the observation-number column, and `label` displays the variable labels in place of the variable names. `ods html close` then closes the destination.

ods html;
title "Tarone (1979) GOF Test";
PROC PRINT DATA=out2 noobs label;
  var x2 pval;
RUN;
ods html close;

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.

Copyright Info: Tarone (1979) two-sided GOF test. H0: binomial distribution versus H1: generalized binomial distribution with additive interaction (Altham, 1978). Altham's model was simultaneously proposed by Kupper and Haseman (1978) and termed the 'Correlated Binomial Model'. The data were taken from Kupper and Haseman (1978, page 75).
