Calculating the Kappa Coefficient for Movie Ratings

The script analyzes Siskel's and Ebert's movie ratings. It first creates and formats the data, then runs an agreement analysis with PROC FREQ to obtain Cohen's Kappa. Next, PROC IML reshapes the data for PROC NLMIXED, which estimates Kappa from a user-defined log-likelihood. Finally, the two NLMIXED results (one from the ESTIMATE statement, one from manually back-transforming the parameter) are displayed for comparison.
Data Analysis

Type: internal creation (CREATION_INTERNE)


The main dataset 'movie_ratings' is created directly in the code using a 'datalines' statement. Subsequent datasets are derived from this initial table.

1 Code Block
DATA STEP Data
Explanation :
This block creates the 'movie_ratings' dataset. It reads the 3x3 table of counts as three columns (x1-x3), one data line per Siskel rating, and reshapes it into long format: each of the 9 output rows is one Siskel/Ebert rating combination, with 'w' holding the number of movies in that cell.
DATA movie_ratings;
   INPUT x1-x3;
   Siskel = _n_;                  /* data line number = Siskel's rating (1-3) */
   Ebert = 1; w=x1; OUTPUT;       /* one output row per table cell */
   Ebert = 2; w=x2; OUTPUT;
   Ebert = 3; w=x3; OUTPUT;
   keep Siskel Ebert w;
   DATALINES;
24 8 13
8 13 11
10 9 64
;
RUN;
2 Code Block
PROC FORMAT
Explanation :
Defines a custom format 'abc' to transform numeric rating values (1, 2, 3) into textual labels ('Con', 'Mixed', 'Pro').
PROC FORMAT;
   value abc 1 = 'Con'
             2 = 'Mixed'
             3 = 'Pro';
RUN;
3 Code Block
PROC FREQ
Explanation :
Uses PROC FREQ to build the contingency table of Siskel's ratings against Ebert's. The AGREE option on the TABLES statement requests agreement statistics, including the Kappa coefficient. The WEIGHT statement tells PROC FREQ that 'w' is the cell count, so each row stands for 'w' movies.
ods html;
title1 "Siskel's and Ebert's Movie Ratings -- Agresti and Winner (1997)";
title2 "Kappa Results using PROC FREQ";
PROC FREQ DATA=movie_ratings;
   tables Siskel * Ebert / agree;   /* AGREE requests the Kappa statistics */
   weight w;                        /* w = number of movies in each cell */
   FORMAT Siskel Ebert abc.;
RUN;
ods graphics off;
ods html close;
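As a cross-check on the PROC FREQ result, the simple (unweighted) Kappa can be recomputed by hand from the same 3x3 table. The following is a minimal sketch, not part of the original script: it retypes the counts from the DATALINES block, and the names `n`, `p`, `po`, `pe`, and `kappa` are illustrative choices.

```sas
/* Hand check of Cohen's Kappa (illustrative sketch, not from the original script) */
proc iml;
   /* rows = Siskel, columns = Ebert; counts copied from the DATALINES block */
   n = {24  8 13,
         8 13 11,
        10  9 64};
   p  = n / sum(n);                 /* cell proportions */
   po = trace(p);                   /* observed agreement: sum of the diagonal */
   pe = sum(p[,+] # t(p[+,]));      /* chance agreement: row margin times column margin */
   kappa = (po - pe) / (1 - pe);    /* Cohen's Kappa */
   print po pe kappa;
quit;
```

With these counts the observed agreement is 101/160 and the chance agreement is 10154/25600, so the printed Kappa should be close to 0.389, which can be compared with the simple Kappa that PROC FREQ reports.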
4 Code Block
PROC IML Data
Explanation :
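This PROC IML block reads 'movie_ratings' into a matrix and expands the weighted table into one row per movie. The indicator matrices 't1' and 't2' mark the category chosen by Siskel and by Ebert, and their sum 't' counts, for each movie, how many of the two critics chose each category. The result is stored in the 'new' table as variables t1-t3 for use in PROC NLMIXED.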
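ods html;
ods graphics on;
PROC IML;
   use movie_ratings;
   read all into x;            /* columns: Siskel, Ebert, w */

   n0 = nrow(x);               /* number of table cells */
   n1 = sum(x[,3]);            /* total number of movies */
   t1 = j(n1,3,0);             /* indicators for Siskel's rating */
   t2 = j(n1,3,0);             /* indicators for Ebert's rating */

   row = 0;
   DO j=1 to n0;
      DO t=1 to x[j,3];        /* one row per movie in cell j */
         row = row + 1;
         t1[row,x[j,1]] = 1;
         t2[row,x[j,2]] = 1;
      END;
   END;

   t = t1 + t2;                /* per movie: critics choosing each category */
   create new var{t1 t2 t3};
   append from t;
QUIT;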
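A quick way to confirm that the reshaping above produced what the NLMIXED step expects is to summarize the 'new' table. This check is a hedged addition, not part of the original script; it only assumes that 'new' contains the variables t1-t3 written by the PROC IML block.

```sas
/* Sanity check on the reshaped data (illustrative; not from the original script) */
proc means data=new n sum min max;
   var t1 t2 t3;
run;
```

If the reshaping worked as intended, there should be 160 rows (one per movie), every value should lie between 0 and 2, and the column sums should equal the combined marginal counts of the two critics: 45 + 42 = 87 for 'Con', 32 + 30 = 62 for 'Mixed', and 83 + 88 = 171 for 'Pro'.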
5 Code Block
PROC NLMIXED Data
Explanation :
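Fits a nonlinear model with PROC NLMIXED, using a user-defined log-likelihood (loglik) passed to the GENERAL distribution. The parameter 'a0' is the logit of 'rho', and Kappa is computed as rho squared; 'b01' and 'b02' determine the category probabilities p1-p3. The ESTIMATE statement calculates Kappa and its confidence interval directly. The ODS OUTPUT statement saves the results in the 'Parms_Estimates' and 'Estimates' tables.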
ods OUTPUT ParameterEstimates=Parms_Estimates
           AdditionalEstimates=Estimates;
PROC NLMIXED DATA=new;
   parms a0=0, b01=0, b02=0;
   rho = 1 / (1 + exp(-a0));            /* inverse logit keeps rho in (0,1) */
   eta1 = exp(b01);
   eta2 = exp(b02);
   p1 = eta1/(1+eta1+eta2);             /* category probabilities */
   p2 = eta2/(1+eta1+eta2);
   p3 = 1-p1-p2;
   m = t1+t2+t3;                        /* ratings per movie */
   c = (1-rho**2)/(rho**2);
   /* log of the multinomial coefficient; does not depend on the parameters */
   const = lgamma(m+1)-lgamma(t1+1)-lgamma(t2+1)-lgamma(t3+1);
   loglik = lgamma(c)-lgamma(m+c)+lgamma(t1+c*p1)+lgamma(t2+c*p2)
            +lgamma(t3+c*p3)-lgamma(c*p1)-lgamma(c*p2)-lgamma(c*p3)
            +const;
   model t1 ~ general(loglik);
   estimate 'Kappa' (1 / (1 + exp(-a0)))**2;   /* Kappa = rho**2 */
RUN;
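For reference, the 'loglik' expression in the block above has the form of a Dirichlet-multinomial log-likelihood. The source does not name the distribution, so this reading is an interpretation of the code rather than the author's stated model. Written out for one movie with counts t1, t2, t3 (the symbols mirror the variable names in the code):

```latex
\begin{aligned}
\rho &= \frac{1}{1 + e^{-a_0}}, \qquad
c = \frac{1 - \rho^{2}}{\rho^{2}}, \qquad
m = t_1 + t_2 + t_3, \\
p_1 &= \frac{e^{b_{01}}}{1 + e^{b_{01}} + e^{b_{02}}}, \qquad
p_2 = \frac{e^{b_{02}}}{1 + e^{b_{01}} + e^{b_{02}}}, \qquad
p_3 = 1 - p_1 - p_2, \\
\log L &= \log\Gamma(m+1) - \sum_{k=1}^{3} \log\Gamma(t_k + 1)
  + \log\Gamma(c) - \log\Gamma(m + c)
  + \sum_{k=1}^{3} \bigl[ \log\Gamma(t_k + c\,p_k) - \log\Gamma(c\,p_k) \bigr], \\
\kappa &= \rho^{2} = \frac{1}{1 + c}.
\end{aligned}
```

The last line follows from solving the definition of c for rho squared; it is the quantity the ESTIMATE statement evaluates.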
6 Code Block
DATA STEP Data
Explanation :
These two DATA steps tidy the PROC NLMIXED output tables. The first keeps the estimate and its confidence bounds and renames 'Estimate' to 'Kappa'. The second keeps only the 'a0' row and back-transforms the estimate and its confidence bounds (squared inverse logit) to calculate Kappa manually.
DATA Estimates;
   SET Estimates;
   rename Estimate = Kappa;
   keep Estimate Lower Upper;     /* KEEP refers to the name before RENAME */
RUN;

DATA Parms_Estimates;
   SET Parms_Estimates;
   IF Parameter = 'a0';           /* subsetting IF: keep only the a0 row */
   /* back-transform: Kappa = (inverse logit of a0) squared */
   Estimate = (1 / (1 + exp(-Estimate)))**2;
   Lower = (1 / (1 + exp(-Lower)))**2;
   Upper = (1 / (1 + exp(-Upper)))**2;
   rename Estimate = Kappa;
   keep Estimate Lower Upper;
RUN;
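The two tables should agree on the point estimate but will generally show different confidence bounds. A sketch of why, under the assumption that the ESTIMATE statement obtains its standard error by the delta method (as the PROC NLMIXED documentation describes); the function name g is introduced here only for illustration:

```latex
\begin{aligned}
\hat\kappa &= g(\hat a_0), \qquad
g(a) = \left( \frac{1}{1 + e^{-a}} \right)^{2}, \\
\text{back-transformed bounds:}\quad &
\bigl[\, g(a_{0,\mathrm{lower}}),\; g(a_{0,\mathrm{upper}}) \,\bigr], \\
\text{delta method:}\quad &
\operatorname{SE}(\hat\kappa) \approx g'(\hat a_0)\,\operatorname{SE}(\hat a_0),
\qquad g'(a) = 2\,\rho^{2}\,(1 - \rho), \quad \rho = \frac{1}{1 + e^{-a}}.
\end{aligned}
```

Because g is strictly increasing, transforming the bounds of a0 gives a valid interval for Kappa that always stays inside (0, 1) but is not symmetric around the estimate. The delta-method interval is symmetric on the Kappa scale.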
7 Code Block
PROC PRINT
Explanation :
Displays the two final tables containing Kappa estimates obtained by the two calculation methods from PROC NLMIXED.
title2 "Kappa Results using the Estimate Statement in NLMIXED";
PROC PRINT DATA=Estimates noobs;
RUN;

title2 "Kappa Results using the Inverse Method in NLMIXED";
PROC PRINT DATA=Parms_Estimates noobs;
RUN;
ods html close;
This material is provided "as is" by We Are Cas. There are no warranties, expressed or implied, as to merchantability or fitness for a particular purpose regarding the materials or code contained herein. We Are Cas is not responsible for errors in this material as it now exists or will exist, nor does We Are Cas provide technical support for it.