Calculating the Kappa Coefficient for Movie Ratings

The script analyzes movie ratings from Siskel and Ebert. It first creates and formats the data, then runs an agreement analysis with PROC FREQ to obtain Cohen's Kappa. Next, PROC IML reshapes the data for PROC NLMIXED, which estimates Kappa from a user-defined likelihood. Finally, the two NLMIXED results (one from the ESTIMATE statement, one from transforming the parameter estimate by hand) are printed and compared.

Data Analysis

Type: CREATION_INTERNE (created internally)

The main dataset 'movie_ratings' is created directly in the code using a 'datalines' statement. Subsequent datasets are derived from this initial table.

1 Code Block

DATA STEP Data

Explanation :
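This block creates the 'movie_ratings' dataset. Each input line is one row of the 3x3 cross-tabulation: the row number (_n_) is Siskel's rating, and the three counts x1-x3 correspond to Ebert's ratings 1 to 3. The three OUTPUT statements convert the table to long format, with one row per Siskel/Ebert rating combination and the cell count stored as the weight 'w'.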

DATA movie_ratings;
  INPUT x1-x3;                 /* one table row: counts for Ebert = 1, 2, 3 */
  Siskel = _n_;                /* input row number = Siskel's rating */
  Ebert = 1; w = x1; OUTPUT;   /* one output row per table cell */
  Ebert = 2; w = x2; OUTPUT;
  Ebert = 3; w = x3; OUTPUT;
  keep Siskel Ebert w;
  DATALINES;
24 8 13
8 13 11
10 9 64
;
RUN;
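As a cross-check outside SAS, the wide-to-long reshaping described above can be sketched in Python. This is only an illustration of the same logic; the names `wide` and `long_rows` are assumptions made for this example and do not appear in the original script.

```python
# Wide 3x3 table as typed in the DATALINES block:
# rows = Siskel's rating, columns = Ebert's rating.
wide = [
    [24, 8, 13],
    [8, 13, 11],
    [10, 9, 64],
]

# One record per (Siskel, Ebert) cell, mirroring the three OUTPUT statements.
long_rows = [
    {"Siskel": i, "Ebert": j, "w": w}
    for i, row in enumerate(wide, start=1)   # i plays the role of _n_
    for j, w in enumerate(row, start=1)      # j is the column position
]

for rec in long_rows:
    print(rec)
```

The nine records carry the same information as the 3x3 table; the weights sum to the 160 rated movies.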

2 Code Block

PROC FORMAT

Explanation :
Defines a custom format 'abc' to transform numeric rating values (1, 2, 3) into textual labels ('Con', 'Mixed', 'Pro').

PROC FORMAT;
  value abc 1 = 'Con'
            2 = 'Mixed'
            3 = 'Pro';
RUN;

3 Code Block

PROC FREQ

Explanation :
Uses PROC FREQ to generate a contingency table between Siskel's and Ebert's ratings. The '/agree' option requests the calculation of concordance statistics, including the Kappa coefficient. The variable 'w' is used as a weight.

ods html;
title1 "Siskel's and Ebert's Movie Ratings -- Agresti and Winner (1997)";
title2 "Kappa Results using PROC FREQ";
PROC FREQ DATA=movie_ratings;
  tables Siskel * Ebert / agree;
  weight w;
  FORMAT Siskel Ebert abc.;
RUN;
ods graphics off;
ods html close;
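For readers who want to verify the simple (unweighted) Kappa reported by PROC FREQ, the standard formula kappa = (p_obs - p_exp) / (1 - p_exp) can be evaluated by hand from the same 3x3 table. The Python sketch below is an independent check, not part of the original script, and its variable names are chosen for this example only.

```python
# Cell counts from the DATALINES block: rows = Siskel, columns = Ebert.
table = [
    [24, 8, 13],
    [8, 13, 11],
    [10, 9, 64],
]

n = sum(sum(row) for row in table)             # total number of movies
row_tot = [sum(row) for row in table]          # Siskel's marginal counts
col_tot = [sum(col) for col in zip(*table)]    # Ebert's marginal counts

# Observed agreement: share of movies on the diagonal.
p_obs = sum(table[k][k] for k in range(3)) / n

# Agreement expected by chance if the two ratings were independent.
p_exp = sum(row_tot[k] * col_tot[k] for k in range(3)) / n ** 2

kappa = (p_obs - p_exp) / (1 - p_exp)
print(f"n={n}, p_obs={p_obs:.4f}, p_exp={p_exp:.4f}, kappa={kappa:.3f}")
```

With these counts the two critics agree on 101 of 160 movies, and the resulting Kappa is about 0.389.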

4 Code Block

PROC IML Data

Explanation :
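This PROC IML block reads 'movie_ratings' into the matrix x and expands the weighted table into one row per movie for NLMIXED. Inside IML, t1 and t2 are indicator matrices marking Siskel's and Ebert's rating for each movie; their sum t counts how many of the two critics chose each of the three categories, so every row sums to 2. The result is stored in the 'new' table, whose three columns are named t1, t2 and t3 (these column names are unrelated to the IML matrices t1 and t2).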

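ods html;
ods graphics on;
PROC IML;
  use movie_ratings;
  read all into x;            /* columns: Siskel, Ebert, w */

  n0 = nrow(x);               /* number of table cells (9) */
  n1 = sum(x[,3]);            /* total number of movies */
  t1 = j(n1,3,0);             /* indicator of Siskel's rating per movie */
  t2 = j(n1,3,0);             /* indicator of Ebert's rating per movie */

  row = 0;
  DO j=1 to n0;
    DO t=1 to x[j,3];         /* repeat each cell w times */
      row = row + 1;
      t1[row,x[j,1]] = 1;
      t2[row,x[j,2]] = 1;
    END;
  END;

  t = t1 + t2;                /* per-movie category counts (rows sum to 2) */
  create new var{t1 t2 t3};
  append from t;
QUIT;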
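The expansion performed by the IML loop can be mirrored in a few lines of Python to see what the 'new' table contains. This is a sketch for illustration; `long_rows` and `counts` are names assumed here, not objects from the SAS program.

```python
# (Siskel, Ebert, w) triples, as stored in movie_ratings.
long_rows = [
    (1, 1, 24), (1, 2, 8), (1, 3, 13),
    (2, 1, 8), (2, 2, 13), (2, 3, 11),
    (3, 1, 10), (3, 2, 9), (3, 3, 64),
]

counts = []  # one [t1, t2, t3] row per movie, like the 'new' table
for siskel, ebert, w in long_rows:
    for _ in range(w):            # repeat the cell once per movie
        row = [0, 0, 0]
        row[siskel - 1] += 1      # Siskel's vote
        row[ebert - 1] += 1       # Ebert's vote
        counts.append(row)

print(len(counts), counts[0], counts[-1])
```

A row such as [2, 0, 0] means both critics rated the movie 'Con', while [1, 0, 1] means one 'Con' and one 'Pro'. The layout no longer records which critic gave which rating, only how many ratings fell in each category.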

5 Code Block

PROC NLMIXED Data

Explanation :
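Fits a model with a user-defined log-likelihood ('loglik') through the general() distribution. The likelihood has the Dirichlet-multinomial form: p1-p3 are the category probabilities (parameterized by b01 and b02), and rho = 1/(1+exp(-a0)) is a correlation parameter constrained to lie between 0 and 1. Kappa is estimated as rho squared, which the 'estimate' statement computes directly from a0 together with a confidence interval. The ODS OUTPUT statement saves the results in the 'Parms_Estimates' and 'Estimates' tables.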

ods OUTPUT ParameterEstimates=Parms_Estimates
           AdditionalEstimates=Estimates;
PROC NLMIXED DATA=new;
  parms a0=0, b01=0, b02=0;
  rho = 1 / (1 + exp(-a0));          /* inverse logit keeps rho in (0,1) */
  eta1 = exp(b01);
  eta2 = exp(b02);
  p1 = eta1/(1+eta1+eta2);           /* category probabilities, sum to 1 */
  p2 = eta2/(1+eta1+eta2);
  p3 = 1-p1-p2;
  m = t1+t2+t3;                      /* ratings per movie */
  c = (1-rho**2)/(rho**2);
  const = lgamma(m+1)-lgamma(t1+1)-lgamma(t2+1)-lgamma(t3+1);
  loglik = lgamma(c)-lgamma(m+c)+lgamma(t1+c*p1)+lgamma(t2+c*p2)
           +lgamma(t3+c*p3)-lgamma(c*p1)-lgamma(c*p2)-lgamma(c*p3)
           +const;
  model t1 ~ general(loglik);
  estimate 'Kappa' 1 / (1 + exp(-a0)) / (1 + exp(-a0));   /* = rho**2 */
RUN;
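Written out, the per-movie log-likelihood coded in 'loglik' appears to be the Dirichlet-multinomial log-probability shown below. This reading assumes that `rho2` in the listing stands for rho squared and that `cp1`, `cp2`, `cp3` stand for the products c·p1, c·p2, c·p3 (the multiplication signs seem to have been lost when the page was rendered). The index i for a movie is notation added here.

```latex
\log L_i = \log\Gamma(c) - \log\Gamma(m_i + c)
         + \sum_{k=1}^{3} \bigl[ \log\Gamma(t_{ik} + c\,p_k) - \log\Gamma(c\,p_k) \bigr]
         + \log\frac{m_i!}{t_{i1}!\,t_{i2}!\,t_{i3}!},
\qquad m_i = t_{i1} + t_{i2} + t_{i3},

c = \frac{1-\rho^{2}}{\rho^{2}}, \qquad
\rho = \frac{1}{1+e^{-a_0}}, \qquad
p_k = \frac{e^{b_{0k}}}{1 + e^{b_{01}} + e^{b_{02}}} \ (k = 1, 2), \qquad
p_3 = 1 - p_1 - p_2, \qquad
\kappa = \rho^{2}.
```

The last term is the multinomial coefficient stored in 'const'; it does not depend on the parameters and only shifts the reported log-likelihood.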

6 Code Block

DATA STEP Data

Explanation :
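These two data steps post-process the PROC NLMIXED output tables. The first renames the 'Estimate' column of 'Estimates' to 'Kappa' and keeps it together with its confidence bounds. The second keeps only the 'a0' row of 'Parms_Estimates' and maps the estimate and its confidence bounds from the a0 scale to the Kappa scale, using the same expression as the ESTIMATE statement: the inverse logit of a0, squared. Because this mapping is increasing in a0, the transformed lower and upper bounds stay in order.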

DATA Estimates;
  SET Estimates;
  rename Estimate = Kappa;
  keep Estimate Lower Upper;    /* KEEP refers to the name before the rename */
RUN;

DATA Parms_Estimates;
  SET Parms_Estimates;
  IF Parameter = 'a0';          /* keep only the a0 row */
  /* map a0 and its bounds to the Kappa scale: (1/(1+exp(-a0)))**2 */
  Estimate = 1 / (1 + exp(-Estimate)) / (1 + exp(-Estimate));
  Lower = 1 / (1 + exp(-Lower)) / (1 + exp(-Lower));
  Upper = 1 / (1 + exp(-Upper)) / (1 + exp(-Upper));
  rename Estimate = Kappa;
  keep Estimate Lower Upper;
RUN;
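The mapping from a0 to Kappa used in the second data step can be checked numerically. The Python sketch below is illustrative only: the helper name `kappa_from_a0` and the grid of a0 values are arbitrary choices for this example, not output from the SAS run.

```python
import math

def kappa_from_a0(a0):
    """Map a0 to the Kappa scale: inverse logit, then square."""
    rho = 1.0 / (1.0 + math.exp(-a0))
    return rho ** 2

# Arbitrary illustrative values of a0 (NOT estimates from the model).
grid = [-2.0, 0.0, 2.0]
for a0 in grid:
    # Same expression as in the data step, written with two divisions.
    sas_form = 1 / (1 + math.exp(-a0)) / (1 + math.exp(-a0))
    print(a0, kappa_from_a0(a0), sas_form)
```

At the starting value a0 = 0 the mapping gives rho = 0.5 and Kappa = 0.25. The function is strictly increasing, which is why applying it to the lower and upper bounds of a0 yields valid bounds for Kappa.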

7 Code Block

PROC PRINT

Explanation :
Displays the two final tables containing Kappa estimates obtained by the two calculation methods from PROC NLMIXED.

title2 "Kappa Results using the Estimate Statement in NLMIXED";
PROC PRINT DATA=Estimates noobs;
RUN;

title2 "Kappa Results using the Inverse Method in NLMIXED";
PROC PRINT DATA=Parms_Estimates noobs;
RUN;
ods html close;

This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.
