
Homogeneity Test for Aggregated Trinomial Results

The script begins with a DATA step that generates a dataset named `test_of_homogeneity`. It simulates three-category responses (low, medium, high), stored in `y` as codes 1 to 3, for two products, with 175 panelists per product (350 `subjid` clusters in total) and 8 repeated measurements per panelist. The simulation uses a 'random-clumped' multinomial distribution to introduce intra-cluster correlation. `PROC SURVEYLOGISTIC` then models the response `y` as a function of `product`, accounting for the cluster structure through the `CLUSTER subjid` statement; `LSMEANS` and `ESTIMATE` provide probability estimates and product comparisons. Finally, `PROC SURVEYFREQ` computes a Rao-Scott chi-square test of association between `product` and `y`, also adjusted for clustering.
Data Analysis

Type: CREATION_INTERNE (data created internally by the script)


All data are generated within the first DATA step. The script simulates trinomial results for two products from predefined parameters (number of clusters, cluster size, underlying category probabilities, intra-cluster correlation), drawing random numbers with the `uniform()` function.

1 Code Block
DATA STEP Data
Explanation:
This DATA step generates the `test_of_homogeneity` table. It simulates `n` subjects (clusters) per product and `m` observations per subject, for two products. Correlated trinomial responses come from a 'random-clumped multinomial' scheme: each panelist first receives a latent category `yy`, drawn from the product's category probabilities (`pi11`, `pi12`, etc.). Each of the `m` observations then copies `yy` with probability `rho = sqrt(rho2)` and is otherwise drawn independently from the same probabilities, which yields an intra-cluster correlation of `rho2`. The seed (`seed`) is fixed for reproducibility.
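The reason the code clumps with probability `sqrt(rho2)` rather than `rho2` can be checked with a short derivation. The notation is introduced here for illustration and is not in the script: $Z$ is the panelist's latent category (`yy`), $Y_i$ is the $i$-th observation, $\pi_k$ is the probability of category $k$ for the product, and $\rho$ is `rho`.

```latex
\begin{align*}
P(Y_i = k) &= \rho\,\pi_k + (1-\rho)\,\pi_k = \pi_k, \\
P(Y_i = k,\ Y_j = k) &= \rho^2\,\pi_k + (1-\rho^2)\,\pi_k^2 \qquad (i \neq j), \\
\operatorname{Cov}\bigl(\mathbf{1}\{Y_i = k\},\ \mathbf{1}\{Y_j = k\}\bigr)
  &= \rho^2\,\pi_k\,(1-\pi_k), \\
\operatorname{Corr}\bigl(\mathbf{1}\{Y_i = k\},\ \mathbf{1}\{Y_j = k\}\bigr)
  &= \rho^2 .
\end{align*}
```

Two observations from the same panelist share a category through $Z$ only when both copy it, which happens with probability $\rho^2$. The marginal probabilities are therefore unchanged, and the intra-cluster correlation equals $\rho^2$, which is `rho2 = 0.15` in the script.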
DATA test_of_homogeneity;
  n = 175;       *--- Number of Panelists (Clusters) per Test Product;
  m = 8;         *--- Number of Repeated Measurements per Panelist;
  rho2 = 0.15;   *--- Intra Cluster Correlation;
  pi11 = 0.880;  *--- Probability Category 1, Product 1;
  pi21 = 0.900;  *--- Probability Category 1, Product 2;
  pi12 = 0.110;  *--- Probability Category 2, Product 1;
  pi22 = 0.075;  *--- Probability Category 2, Product 2;
  seed = 1974;   *--- Initial Seed;
  rho = sqrt(rho2);      *--- Clumping probability;
  cpi12 = pi11 + pi12;   *--- Cumulative probability of categories 1 and 2, Product 1;
  cpi22 = pi21 + pi22;   *--- Cumulative probability of categories 1 and 2, Product 2;
  DO j = 1 to n;
    *--- Product 1;
    Product = 1;
    Subjid = j;
    *--- Latent category yy shared by this panelist;
    yy = 3;
    u = uniform( seed );
    IF u < cpi12 THEN yy = 2;
    IF u < pi11 THEN yy = 1;
    DO i = 1 to m;
      Y = 3;
      u = uniform( seed );
      *--- Copy the latent category with probability rho, else draw independently;
      IF u < rho THEN y = yy;
      ELSE DO;
        uu = uniform( seed );
        IF uu < cpi12 THEN y = 2;
        IF uu < pi11 THEN y = 1;
      END;
      OUTPUT;
    END;
    *--- Product 2 (subject IDs offset by n so clusters stay distinct);
    Product = 2;
    Subjid = j + n;
    yy = 3;
    u = uniform( seed );
    IF u < cpi22 THEN yy = 2;
    IF u < pi21 THEN yy = 1;
    DO i = 1 to m;
      Y = 3;
      u = uniform( seed );
      IF u < rho THEN y = yy;
      ELSE DO;
        uu = uniform( seed );
        IF uu < cpi22 THEN y = 2;
        IF uu < pi21 THEN y = 1;
      END;
      OUTPUT;
    END;
  END;
  keep subjid product y;
RUN;
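A quick sanity check on the simulated table can be sketched as follows. This step is not part of the original script; it only tabulates the observed category proportions within each product so they can be compared with the simulation parameters.

```sas
/* Sanity check (illustrative addition, not in the original script):
   row percentages of y within each product. */
proc freq data=test_of_homogeneity;
  tables product * y / nocol nopercent;
run;
```

The row percentages should fall in the neighborhood of 88%, 11% and 1% for product 1, and 90%, 7.5% and 2.5% for product 2. The third values come from subtracting the first two probabilities from 1. Because observations within a panelist are correlated, the observed proportions can deviate more than they would under independent sampling, especially for the rare third category.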
2 Code Block
PROC SURVEYLOGISTIC
Explanation:
This procedure fits a logistic regression model for survey data. It models the nominal response variable `y` as a function of `product` with a generalized logit link (`link=glogit`), taking the first response level as the reference (`ref=First`). The `CLUSTER subjid` statement is essential: it treats each panelist as a cluster, so the standard errors account for intra-subject correlation. The `varadj=morel` option requests Morel's adjustment to the variance estimator. The `LSMEANS` and `ESTIMATE` statements give adjusted probabilities by category (through `ilink`) and compare the two products.
ods html;
PROC SURVEYLOGISTIC DATA=test_of_homogeneity;
  class product subjid / param=glm;
  model y (ref=First) = product / link=glogit varadj=morel;
  cluster subjid;
  lsmeans product / ilink;
  estimate 'P12' int 1 product 1 0 / category='1' ilink;
  estimate 'P22' int 1 product 0 1 / category='1' ilink;
  estimate 'P13' int 1 product 1 0 / category='2' ilink;
  estimate 'P23' int 1 product 0 1 / category='2' ilink;
  estimate 'P12 Vs P22' product 1 -1 / category='1' exp;
  estimate 'P13 Vs P23' product 1 -1 / category='2' exp;
RUN;
ods html close;
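To see what the `CLUSTER` statement changes, one option is to fit the same generalized logit model with `PROC LOGISTIC`, which treats all observations as independent. The step below is an illustrative comparison and is not part of the original script.

```sas
/* Illustrative comparison (not in the original script):
   same model, but the clustering by subjid is ignored. */
proc logistic data=test_of_homogeneity;
  class product / param=glm;
  model y (ref=First) = product / link=glogit;
run;
```

Because no sampling weights are involved, the parameter estimates should agree with those from `PROC SURVEYLOGISTIC`. The naive standard errors would be expected to come out smaller, since positive intra-cluster correlation means the 8 measurements per panelist carry less information than 8 independent observations.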
3 Code Block
PROC SURVEYFREQ
Explanation:
This procedure tabulates frequencies and, through the `chisq` option, performs a Rao-Scott chi-square test of association between `product` and the response `y`. Like `PROC SURVEYLOGISTIC`, it uses the `CLUSTER subjid` statement to account for the clustered design, so the test statistic remains valid when observations within a panelist are correlated.
ods html;
PROC SURVEYFREQ DATA=test_of_homogeneity;
  cluster subjid;
  tables product * y / chisq;
RUN;
ods html close;
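The size of the clustering adjustment can be gauged with a design effect. As a rough benchmark, the common approximation Deff ≈ 1 + (m − 1) × ICC for equal cluster sizes gives 1 + 7 × 0.15 = 2.05 for the simulated design (`m = 8`, `rho2 = 0.15`). The sketch below is an illustrative addition, not part of the original script. It restricts the data to one product and requests design effects for the category proportions with the `deff` option of the `TABLES` statement.

```sas
/* Illustrative addition (not in the original script):
   design effects for the proportions of y within product 1. */
proc surveyfreq data=test_of_homogeneity;
  where product = 1;
  cluster subjid;
  tables y / deff;
run;
```

The estimated design effects are sample quantities, so they will not match the benchmark exactly. Values well above 1 indicate that clustering noticeably inflates the variance relative to simple random sampling.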
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
Expert Advice
Stéphanie
Machine Learning and AI Specialist.
« While PROC SURVEYLOGISTIC provides robust p-values for the product effect, always check the Design Effect (Deff) in your output. A high Deff indicates that the clustering is significantly impacting your precision, justifying the use of these complex "Survey" procedures over standard logistic regression. »