Homogeneity Test for Aggregated Trinomial Results

The script begins with a DATA step that generates a dataset named `test_of_homogeneity`. It simulates categorical responses (low, medium, high) for two products, with 175 panelists (`subjid`) per product and 8 repeated measurements per panelist. Each product has its own panelists (product 2 uses `subjid` values 176 to 350), so the dataset contains 350 clusters and 2,800 rows. The simulation uses a 'random-clumped' multinomial distribution to induce intra-cluster correlation. `PROC SURVEYLOGISTIC` then models the response `y` as a function of `product`, accounting for the cluster structure through the `CLUSTER subjid` statement; the `LSMEANS` and `ESTIMATE` statements provide probability estimates and product comparisons. Finally, `PROC SURVEYFREQ` computes a design-adjusted (Rao-Scott) chi-square test of association between `product` and `y`, again accounting for clustering.

Data Analysis

Type: CREATION_INTERNE (data created internally)

The data is generated entirely within the first DATA step. The script simulates trinomial results for two products from predefined parameters (number of clusters, cluster size, category probabilities, intra-cluster correlation), using the `uniform()` function for random number generation.

1 Code Block

DATA STEP Data

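Explanation:
This DATA step generates the `test_of_homogeneity` table. For each product it simulates `n` = 175 subjects (clusters) with `m` = 8 observations each; product 2 subjects are numbered `j + n`, so they are distinct from the product 1 subjects. It uses a 'random-clumped multinomial' method to create correlated trinomial responses: a cluster-level category `yy` is drawn once per subject, and each observation either repeats `yy` (with probability `rho = sqrt(rho2)`) or receives an independent draw. Category probabilities come from `pi11`, `pi12` (product 1) and `pi21`, `pi22` (product 2), with category 3 taking the remainder, and `rho2` is the target intra-cluster correlation. The seed (`seed`) is fixed for reproducibility.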
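The choice `rho = sqrt(rho2)` in the DATA step below can be justified with a short derivation (a sketch in my own notation, not part of the original script). Let π_k be a product's probability for category k, and let Y_i and Y_j be two measurements on the same panelist. Each measurement independently repeats the cluster-level draw `yy` with probability ρ and otherwise receives a fresh independent draw. Both measurements repeat `yy` with probability ρ², and in every other case at least one of them is an independent draw:

```latex
\begin{aligned}
P(Y_i = k) &= \rho\,\pi_k + (1-\rho)\,\pi_k = \pi_k \\
P(Y_i = k,\; Y_j = k) &= \rho^2\,\pi_k + (1-\rho^2)\,\pi_k^2 \\
\operatorname{Cov}\!\left(\mathbf{1}\{Y_i = k\},\, \mathbf{1}\{Y_j = k\}\right) &= \rho^2\,\pi_k\,(1-\pi_k) \\
\operatorname{Corr}\!\left(\mathbf{1}\{Y_i = k\},\, \mathbf{1}\{Y_j = k\}\right) &= \rho^2 = \texttt{rho2} = 0.15
\end{aligned}
```

The clumping therefore leaves the marginal probabilities unchanged and produces the same within-panelist correlation, `rho2`, for every category. Category 3 receives the remaining probability: 1 − 0.880 − 0.110 = 0.010 for product 1 and 1 − 0.900 − 0.075 = 0.025 for product 2.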


DATA test_of_homogeneity;
   n    = 175;    *--- Number of Panelists (Clusters) per Test Product;
   m    = 8;      *--- Number of Repeated Measurements per Panelist;
   rho2 = 0.15;   *--- Intra Cluster Correlation;
   pi11 = 0.880;  *--- Probability Category 1, Product 1;
   pi21 = 0.900;  *--- Probability Category 1, Product 2;
   pi12 = 0.110;  *--- Probability Category 2, Product 1;
   pi22 = 0.075;  *--- Probability Category 2, Product 2;
   seed = 1974;   *--- Initial Seed;
   rho   = sqrt(rho2);   *--- Probability of repeating the cluster-level category;
   cpi12 = pi11 + pi12;  *--- Cumulative P(category <= 2), Product 1;
   cpi22 = pi21 + pi22;  *--- Cumulative P(category <= 2), Product 2;
   DO j = 1 TO n;
      *--- Product 1: panelist j;
      product = 1;
      subjid  = j;
      *--- Cluster-level category yy for this panelist;
      yy = 3;
      u = uniform(seed);
      IF u < cpi12 THEN yy = 2;
      IF u < pi11  THEN yy = 1;
      DO i = 1 TO m;
         *--- With probability rho repeat yy, otherwise draw independently;
         y = 3;
         u = uniform(seed);
         IF u < rho THEN y = yy;
         ELSE DO;
            uu = uniform(seed);
            IF uu < cpi12 THEN y = 2;
            IF uu < pi11  THEN y = 1;
         END;
         OUTPUT;
      END;
      *--- Product 2: a separate panelist, numbered j + n;
      product = 2;
      subjid  = j + n;
      yy = 3;
      u = uniform(seed);
      IF u < cpi22 THEN yy = 2;
      IF u < pi21  THEN yy = 1;
      DO i = 1 TO m;
         y = 3;
         u = uniform(seed);
         IF u < rho THEN y = yy;
         ELSE DO;
            uu = uniform(seed);
            IF uu < cpi22 THEN y = 2;
            IF uu < pi21  THEN y = 1;
         END;
         OUTPUT;
      END;
   END;
   KEEP subjid product y;
RUN;
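A quick sanity check on the simulated table can help before modeling. This is a sketch that assumes the DATA step above has just run in the same session. By construction each product should have 175 distinct `subjid` values and 175 × 8 = 1,400 rows, and the row percentages should land near the specified probabilities (88.0/11.0/1.0 % for product 1, 90.0/7.5/2.5 % for product 2), although with an intra-cluster correlation of 0.15 the sampling noise is larger than for independent draws.

```sas
/* Sketch: check cluster counts and category frequencies of the    */
/* simulated data. Assumes WORK.TEST_OF_HOMOGENEITY already exists. */
PROC SQL;
   SELECT product,
          COUNT(DISTINCT subjid) AS n_clusters,  /* panelists per product */
          COUNT(*)               AS n_obs        /* measurements per product */
   FROM test_of_homogeneity
   GROUP BY product;
QUIT;

/* Counts and row percentages of y within each product */
PROC FREQ DATA=test_of_homogeneity;
   TABLES product * y / NOCOL NOPERCENT;
RUN;
```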

2 Code Block

PROC SURVEYLOGISTIC

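Explanation:
This procedure fits a logistic regression model for survey data. It models the nominal response `y` as a function of `product` with a generalized logit link (`link=glogit`), using the first response category as the reference (`ref=First`). The `CLUSTER subjid` statement is essential: it treats each panelist as a cluster, so the standard errors account for the correlation between repeated measurements on the same panelist, and `varadj=morel` requests Morel's small-sample adjustment of that variance estimate. The `LSMEANS` and `ESTIMATE` statements return adjusted category probabilities (through `ilink`) and compare the two products (through `exp`, on the odds-ratio scale).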


ods html;
PROC SURVEYLOGISTIC DATA=test_of_homogeneity;
   class product subjid / param=glm;
   * Generalized logit model with the first response category as reference;
   model y (ref=First) = product / link=glogit varadj=morel;
   * Each panelist is a cluster for variance estimation;
   cluster subjid;
   lsmeans product / ilink;
   * ILINK adds the estimates mapped back through the inverse link;
   estimate 'P12' int 1 product 1 0 / category='1' ilink;
   estimate 'P22' int 1 product 0 1 / category='1' ilink;
   estimate 'P13' int 1 product 1 0 / category='2' ilink;
   estimate 'P23' int 1 product 0 1 / category='2' ilink;
   * Product 1 vs product 2; EXP reports the exponentiated contrast (odds ratio);
   estimate 'P12 Vs P22' product 1 -1 / category='1' exp;
   estimate 'P13 Vs P23' product 1 -1 / category='2' exp;
RUN;
ods html close;
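For reference, the model fitted by this step can be written as follows (my notation, a sketch rather than SAS's exact parameterization). With `ref=First`, category 1 is the baseline and there is one generalized logit η_k per non-reference category k = 2, 3, for product p = 1, 2:

```latex
\begin{aligned}
\eta_k(p) &= \log\frac{P(Y = k \mid p)}{P(Y = 1 \mid p)} = \alpha_k + \beta_{k,p}, \qquad k = 2, 3 \\
P(Y = k \mid p) &= \frac{\exp\{\eta_k(p)\}}{1 + \exp\{\eta_2(p)\} + \exp\{\eta_3(p)\}}, \qquad
P(Y = 1 \mid p) = \frac{1}{1 + \exp\{\eta_2(p)\} + \exp\{\eta_3(p)\}} \\
\mathrm{OR}_k &= \exp\{\eta_k(1) - \eta_k(2)\} = \exp\{\beta_{k,1} - \beta_{k,2}\}
\end{aligned}
```

The `int 1 product 1 0` rows correspond to η_k(1) = α_k + β_{k,1} (and `int 1 product 0 1` to η_k(2)), while the `product 1 -1` rows with `exp` target OR_k. Plugging in the simulation settings gives the population values these estimates should approach: OR_2 = (0.110/0.880)/(0.075/0.900) = 1.5 and OR_3 = (0.010/0.880)/(0.025/0.900) = 9/22 ≈ 0.41.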

3 Code Block

PROC SURVEYFREQ

Explanation:
This procedure computes frequencies, and its `chisq` option requests the Rao-Scott chi-square test, a design-adjusted version of the Pearson test of association between `product` and `y`. Like `PROC SURVEYLOGISTIC`, it uses the `CLUSTER subjid` statement to account for the clustered design, so the test statistic remains valid despite the correlation between measurements on the same panelist.


ods html;
PROC SURVEYFREQ DATA=test_of_homogeneity;
   cluster subjid;
   tables product * y / chisq;
RUN;
ods html close;
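The design adjustment behind the `chisq` option can be outlined as follows. This is the first-order Rao-Scott form in my own notation; the exact estimator of the design correction and the available variants (such as the F-based statistic) are described in the SURVEYFREQ documentation. Q_P is the Pearson chi-square computed from the estimated cell proportions, and the δ̂_j are the estimated generalized design effects:

```latex
Q_{RS} = \frac{Q_P}{\hat{D}}, \qquad
\hat{D} = \frac{1}{(R-1)(C-1)} \sum_{j=1}^{(R-1)(C-1)} \hat{\delta}_j, \qquad
Q_{RS} \;\overset{\text{approx.}}{\sim}\; \chi^2_{(R-1)(C-1)}
```

For the 2 × 3 table `product * y`, the reference distribution has (2 − 1)(3 − 1) = 2 degrees of freedom. Because D̂ is typically greater than 1 under positive intra-cluster correlation, Q_RS is then smaller than the naive Pearson statistic.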

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.


Expert Advice

Stéphanie

Machine Learning and AI specialist.

"While PROC SURVEYLOGISTIC provides robust p-values for the product effect, always check the design effect (Deff) in your output. A high Deff indicates that clustering substantially reduces your precision, which justifies using these complex 'Survey' procedures rather than standard logistic regression."
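For a rough idea of what to expect, the textbook approximation for equal cluster sizes, Deff ≈ 1 + (m − 1) × rho2 = 1 + 7 × 0.15 = 2.05, suggests that the 1,400 measurements per product carry roughly the information of 1,400 / 2.05 ≈ 683 independent ones. This approximation is an added illustration, not part of the original script. One way to display design effects is the `DEFF` option of the SURVEYFREQ `TABLES` statement, sketched below; `ROW` is added only to print row percentages.

```sas
/* Sketch: display design effects for the table percentages.         */
/* Assumes WORK.TEST_OF_HOMOGENEITY exists (see the DATA step above). */
PROC SURVEYFREQ DATA=test_of_homogeneity;
   CLUSTER subjid;                        /* panelists are the clusters   */
   TABLES product * y / ROW DEFF CHISQ;   /* DEFF: design effects         */
RUN;
```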