Score Test Statistic, Dean (1992)

The script first creates an internal dataset 'toxoplasmosis' with a DATA step and DATALINES. Each record holds the number of successes (t), the number of trials (m), and a covariate 'rain', which is copied to 'z'. PROC STDIZE then standardizes 'z'. PROC GLIMMIX fits a binomial model with a logit link, modeling 't/m' as a cubic polynomial in 'z' (terms 'z', 'z*z' and 'z*z*z'); because the call has no RANDOM statement, the fit contains fixed effects only. The predicted probabilities are written to an output dataset, and a second DATA step computes from them the per-observation terms of Dean's (1992) score test for overdispersion. PROC MEANS sums these terms, and a final DATA step computes the Z statistic of the score test and its upper-tail p-value. Finally, PROC PRINT displays the result through ODS HTML.
Data Analysis

Data is created directly within the script via a DATA STEP and 'datalines'.

1 Code Block
DATA STEP Data
Explanation :
This DATA STEP block creates the 'toxoplasmosis' dataset by reading raw data (t, m, rain) directly from 'datalines'. A new variable 'z' is created as a copy of the 'rain' variable.
DATA toxoplasmosis;
   /* t = successes, m = trials, rain = covariate */
   INPUT t m rain;
   z = rain;   /* working copy of rain; standardized in the next step */
   DATALINES;
2 4 1735
3 10 1936
1 5 2000
3 10 1973
2 2 1750
3 5 1800
2 8 1750
7 19 2077
3 6 1920
8 10 1800
7 24 2050
0 1 1830
15 30 1650
4 22 2200
0 1 2000
6 11 1770
0 1 1920
33 54 1770
4 9 2240
5 18 1620
2 12 1756
0 1 1650
8 11 2250
41 77 1796
24 51 1890
7 16 1871
46 82 2063
9 13 2100
23 43 1918
53 75 1834
8 13 1780
3 10 1900
1 6 1976
23 37 2292
;
2 Code Block
PROC STDIZE
Explanation :
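This procedure standardizes the 'z' variable in the 'toxoplasmosis' dataset. No METHOD= option is given, so PROC STDIZE uses its default, METHOD=STD, which subtracts the mean and divides by the standard deviation. Because OUT= names the input dataset, the output overwrites the original, so all subsequent steps use the standardized 'z'.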
PROC STDIZE DATA=toxoplasmosis OUT=toxoplasmosis;
   VAR z;
RUN;
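For reference, the standardization applied to 'z' can be written out as below. This is a sketch that assumes the PROC STDIZE defaults (METHOD=STD with the n-1 variance divisor), since the call sets neither option. The symbols r_i (the raw 'rain' value), \bar{r}, s_r and n are notation introduced here, not names from the script.

```latex
\[
z_i = \frac{r_i - \bar{r}}{s_r},
\qquad
\bar{r} = \frac{1}{n}\sum_{i=1}^{n} r_i,
\qquad
s_r = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(r_i - \bar{r}\right)^2}
\]
```

Here n = 34, the number of data lines read in the first step. After this transformation 'z' has mean 0 and standard deviation 1, which keeps the 'z*z' and 'z*z*z' terms on a manageable numeric scale.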
3 Code Block
PROC GLIMMIX
Explanation :
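PROC GLIMMIX fits the model. The MODEL statement uses events/trials syntax (t/m) for a binomial response with a logit link, and includes 'z', 'z*z', and 'z*z*z' as predictors. There is no RANDOM statement, so the fit is a logistic regression with fixed effects only. The 's' option (short for SOLUTION) requests the fixed-effects parameter estimates, not summary statistics. The OUTPUT statement creates a new dataset 'pdata' containing the predicted probabilities ('pi'), computed without random-effect predictors (noblup) and on the response scale (ilink). The surrounding 'ods select none' and 'ods select all' statements suppress the procedure's displayed output and then restore it.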
ods select none;
PROC GLIMMIX DATA=toxoplasmosis;
   model t/m = z z*z z*z*z / link=logit dist=bin s;
   OUTPUT out=pdata pred(noblup ilink) = pi;
RUN;
ods select all;
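The fitted model can be stated compactly as follows. This is a sketch read off the MODEL statement above; the coefficient names \beta_0 to \beta_3 are notation introduced here, and the intercept \beta_0 is assumed present because the statement does not specify NOINT.

```latex
\[
t_i \sim \mathrm{Binomial}(m_i, \pi_i),
\qquad
\log\frac{\pi_i}{1 - \pi_i} = \beta_0 + \beta_1 z_i + \beta_2 z_i^2 + \beta_3 z_i^3
\]
```

The variable 'pi' written to 'pdata' is the estimate of \pi_i, obtained by applying the inverse logit to the fitted linear predictor.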
4 Code Block
DATA STEP
Explanation :
This DATA STEP reads the 'pdata' dataset (created by GLIMMIX) and calculates several intermediate variables ('pic', 'pipic', 'mpi', 't_mpi', 'pit_mpi', 'tpic', 'mm_1', 'aux') necessary for Dean's score test statistic formula. Only 'aux' and 'mm_1' are kept for subsequent steps.
DATA pdata;
   SET pdata;
   pic = 1 - pi;               /* complement of the fitted probability */
   pipic = pi * pic;           /* pi * (1 - pi) */
   mpi = m * pi;               /* fitted count */
   t_mpi = t - mpi;            /* raw residual */
   pit_mpi = pi * t_mpi;
   tpic = t * pic;
   mm_1 = m * (m-1);           /* denominator term */
   aux = ( t_mpi*t_mpi + pit_mpi - tpic ) / pipic;   /* numerator term */
   keep aux mm_1;
RUN;
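The quantity stored in 'aux' can be written as a single formula. The following is a sketch transcribed from the assignments above, not quoted from Dean's paper; the symbols a_i (for 'aux') and \hat{\pi}_i (for 'pi') are notation introduced here.

```latex
\[
a_i = \frac{(t_i - m_i\hat{\pi}_i)^2 + \hat{\pi}_i\,(t_i - m_i\hat{\pi}_i) - t_i\,(1 - \hat{\pi}_i)}
           {\hat{\pi}_i\,(1 - \hat{\pi}_i)}
\]
% Equivalent form, obtained by substituting
% t_i (1 - \hat{\pi}_i) = (t_i - m_i\hat{\pi}_i)(1 - \hat{\pi}_i) + m_i\hat{\pi}_i(1 - \hat{\pi}_i):
\[
a_i = \frac{(t_i - m_i\hat{\pi}_i)^2 - (1 - 2\hat{\pi}_i)(t_i - m_i\hat{\pi}_i) - m_i\hat{\pi}_i(1 - \hat{\pi}_i)}
           {\hat{\pi}_i\,(1 - \hat{\pi}_i)}
\]
```

The second form shows what the term measures: the squared residual is compared with the binomial variance m_i \pi_i (1 - \pi_i). Under a binomial model evaluated at the true probabilities, the residual has mean zero and its square has mean equal to that variance, so a_i has expectation zero; extra-binomial variation pushes it upward.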
5 Code Block
PROC MEANS
Explanation :
PROC MEANS is used here to calculate the sum of 'aux' and 'mm_1' variables across the entire 'pdata' dataset. The result is stored in a new dataset called 'new', and the 'noprint' option suppresses the display of PROC MEANS' default output.
PROC MEANS DATA=pdata sum noprint;
   var aux mm_1;
   OUTPUT out=new sum=aux mm_1;
RUN;
6 Code Block
DATA STEP
Explanation :
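This DATA step reads the 'new' dataset (containing the sums of 'aux' and 'mm_1') and computes the Z statistic of the score test as the summed 'aux' divided by the square root of twice the summed 'mm_1'. The p-value ('pval') is the upper-tail probability 1 - PROBNORM(z), where PROBNORM is the standard normal cumulative distribution function, so the test is one-sided: only large positive values of Z count as evidence against the fitted binomial model. Formats and labels are applied to both variables for presentation.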
DATA new;
   SET new;
   label Z = "GOF Test";
   label PVal = "P-Value";
   FORMAT Z 8.2 Pval pvalue6.;
   z = aux / sqrt( 2*mm_1 );     /* score statistic */
   pval = 1 - probnorm( z );     /* upper-tail normal p-value */
RUN;
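Written as a formula, the two assignments above amount to the following. This is a sketch transcribed from the code; a_i denotes the per-observation value of 'aux', and \Phi denotes the standard normal CDF (PROBNORM in SAS). Both symbols are notation introduced here.

```latex
\[
Z = \frac{\sum_{i=1}^{n} a_i}{\sqrt{2\sum_{i=1}^{n} m_i\,(m_i - 1)}},
\qquad
p = 1 - \Phi(Z)
\]
```

Observations with m_i = 1 contribute nothing to the denominator, since m_i (m_i - 1) = 0 for them.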
7 Code Block
PROC PRINT
Explanation :
This block generates the final output. ODS HTML is enabled to direct output to an HTML file (or the SAS Studio environment). A title is set. PROC PRINT is used to display the 'z' (test statistic) and 'pval' (p-value) variables from the 'new' dataset. The 'noobs' option suppresses the observation number, and 'label' uses the defined labels for column headers. ODS HTML is then closed.
ods html;
title "Score Test Statistic, Dean (1992)";
PROC PRINT DATA=new noobs label;
   var z pval;
RUN;
ods html close;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Copyright Info : Score Test Statistic, Dean (1992). Example from Efron (1978, 1986).