
Intraclass Kappa for Test-Retest Reliability Measurement
Learn about the importance of establishing reliability in structured interview and questionnaire instruments for valid measurement. Discover how the Intraclass Kappa statistic can address rater bias in test-retest reliability assessments. Find out how to calculate and interpret the Intraclass Kappa using a bootstrap procedure.
Presentation Transcript
Measuring Test-Retest Reliability: The Intraclass Kappa
Dennis G. Fisher, Grace L. Reynolds, California State University, Long Beach
Eric Neri, Art Noda, Helena Chmura Kraemer, Stanford University
Background
Anyone using structured interview or questionnaire instruments must establish the psychometric properties of the instrument (i.e., reliability and validity). Reliability must be established first, because a measure cannot be valid unless it has sufficient reliability. When the data are dichotomous, the most common measure is Cohen's kappa (Cohen, 1960). There is an assumption of independence: "The units are independent" (p. 38). Cohen's kappa is appropriate for interrater reliability because those data meet the assumption of independence.
Background (continued)
Cohen's kappa forgives rater bias, which is not desirable for a measure used in test-retest reliability. What is the solution?
Solution
The correct statistic to report in this situation is the intraclass kappa (Kraemer, Periyakoil, & Noda, 2002):

kappa = (p0 - pC) / (1 - pC)

where p0 is the probability of agreement and pC = P^2 + P'^2 is the agreement expected by chance. This chance-corrected agreement is the percentage agreement corrected for chance (PACC), which has been shown to equal Var(p) / (P P') (Kraemer, 1979). The intraclass kappa counts rater bias as error, which is more appropriate for a reliability measure. For Cohen's kappa, the chance agreement is pC = PQ + P'Q', where P is the proportion positive at time 1, Q is the proportion positive at time 2, P' = 1 - P, and Q' = 1 - Q. With P = Mean(p(i)) and Var(p) = Variance(p(i)), the intraclass kappa corresponds to the classical definition of reliability: the ratio of the true variance, Var(p), to the observed variance, P P'.
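To see concretely how the two statistics treat rater bias, consider a hypothetical example (the numbers are illustrative only, not from the study). Suppose the proportion positive is P = .30 at time 1 and Q = .20 at time 2, with observed agreement p0 = .85. Cohen's kappa uses pC = PQ + P'Q' = (.30)(.20) + (.70)(.80) = .62, giving kappa = (.85 - .62) / (1 - .62) = .605. The intraclass kappa averages the two margins, (P + Q)/2 = .25, so pC = .25^2 + .75^2 = .625 and kappa = (.85 - .625) / (1 - .625) = .600. The intraclass value is lower because the difference between the margins (rater bias) is counted as error rather than forgiven.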
How to obtain the intraclass kappa?
We present a bootstrap procedure to obtain both the point value of reliability, which is the median of the bootstrap replications, and the lower (2.5th percentile) and upper (97.5th percentile) confidence limits.
SAS data
The original data come from Dowling-Guyer et al. (1994). 219 injection drug users from Anchorage, Denver, Detroit, Houston, Long Beach, Miami, New York, Philadelphia, Portland, San Francisco, and Tucson (20 per site) were administered the Risk Behavior Assessment at two time points 48 hours apart; 48 hours was used because that is how long urine tests for illicit drugs remain valid.

Data _Null_ ;
  Set race1 ;
  File 'H:\data\ct.dat' ;
  Put presid anyct anyct2 ;
Run ;
Read in data
Data Kraemer ;
  Input presid anyct anyct2 ;
  Label presid = 'Identification number'
        anyct  = 'Time 1 Ever diagnosed with Chlamydia trachomatis'
        anyct2 = 'Time 2 Ever diagnosed with Chlamydia trachomatis' ;
  Cards ;

There is an assumption that the data are missing at random (MAR).
Remove cases with missing data
Data Kraemer ;
  Set Kraemer ;
  If anyct = . Or anyct2 = . Then delete ;
Run ;
SAS macro to calculate the intraclass kappa and 95% bootstrap confidence intervals
%macro Intraclass_kappa_K(INDSN, RVALUE1, RVALUE2, NBOOTS, BOOTSEED, R1, R2);
PROC SURVEYSELECT data = &INDSN out = _bootsample seed = &BOOTSEED
    method = urs samprate = 1 outhits rep = &NBOOTS noprint ;
Run;

URS = unrestricted random sampling, i.e., sampling with replacement, which is what the bootstrap requires. The macro will not run in SAS 9.4 (TS1M2); it will run in 9.4 (TS1M5) and 9.4 (TS1M6).
Calculate intraclass kappa
PO   = (p11 + p22) / 100 ;
PACC = ((p1dot + pdot1) / 2) / 100 ;
PE   = PACC**2 + (1 - PACC)**2 ;
IKC  = (PO - PE) / (1 - PE) ;

PO = probability of agreement; PACC = mean of the two marginal proportions positive; PE = agreement expected by chance; IKC = intraclass kappa. The intraclass kappa equals Sensitivity + Specificity - 1, where sensitivity and specificity refer to those of the second rating compared with the first rating.
Obtain median, 2.5th and 97.5th percentiles
PROC UNIVARIATE DATA=_BKS NOPRINT ;
  VAR VALUE ;
  OUTPUT OUT=_BOOTKAP MEDIAN=KAPMED PCTLPTS=2.5 97.5 PCTLPRE=KAPMED ;
RUN ;

The full code is in the paper and is available from DENNIS.FISHER@CSULB.EDU.
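For readers who want to see the pieces end to end, the following is a minimal, self-contained sketch of the same percentile-bootstrap logic. This is not the authors' macro: the data set and variable names (KRAEMER, ANYCT, ANYCT2) follow the earlier slides, the seed and replicate count are arbitrary, and the ratings are assumed to be coded 0/1.

/* Draw 1000 with-replacement resamples of the original data */
proc surveyselect data=kraemer out=_bootsample seed=20100
     method=urs samprate=1 outhits rep=1000 noprint;
run;

/* Flag agreement between the two ratings */
data _flags;
  set _bootsample;
  agree = (anyct = anyct2);
run;

/* Per replicate: p0 = proportion agreeing; p, q = marginal proportions positive */
proc means data=_flags noprint;
  by replicate;
  var agree anyct anyct2;
  output out=_stats mean=p0 p q;
run;

/* Intraclass kappa for each replicate */
data _ikc;
  set _stats;
  pbar = (p + q) / 2;             /* averaged margin */
  pc   = pbar**2 + (1 - pbar)**2; /* chance agreement */
  ikc  = (p0 - pc) / (1 - pc);
run;

/* Median as the point value; 2.5th and 97.5th percentiles as the 95% limits */
proc univariate data=_ikc noprint;
  var ikc;
  output out=_bootkap median=kapmed pctlpts=2.5 97.5 pctlpre=kap;
run;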
TABLE 1. Test-retest reliability for sexually transmitted infections: "Have you been told by a doctor or a nurse that you had ___?"

Infection         Cohen   LCL     UCL     Intraclass  LCL     UCL
Hepatitis B       .8524   .7715   .9333   .852        .765    .920
Gonorrhea         .8581   .7863   .9299   .858        .779    .922
Syphilis          .7771   .6370   .9172   .777        .616    .906
Chlamydia         .8288   .5960   1.000   .829        .495    1.000
Genital Warts     .9450   .8376   1.000   .945        .793    1.000
Genital Herpes    1.000   1.000   1.000   1.000       1.000   1.000
Trichomonas       .9397   .8813   .9982   .940        .875    .987
Yeast Infection   .8944   .7923   .9965   .894        .769    .978
HIV               .8041   .6177   .9906   .804        .572    .959
Get result?       .2250   .1185   .3315   .106        -.064   .267

LCL = 95% lower confidence limit. UCL = 95% upper confidence limit.
Herpes * Herpes2 crosstabulation

              Herpes2
Herpes        No     Yes    Total
No            214      0     214
Yes             0      5       5
Total         214      5     219

Kappa should not be computed unless there are at least 20 observations for each marginal. Here there are only 5 in the row 2 (n2.) and column 2 (n.2) marginals.
References
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46. doi:10.1177/001316446002000104
Dowling-Guyer, S., Johnson, M. E., Fisher, D. G., & Needle, R. (1994). Reliability of drug users' self-reported HIV risk behaviors and validity of self-reported recent drug use. Assessment, 1(4), 383-392.
Kraemer, H. C. (1979). Ramifications of a population model for kappa as a coefficient of reliability. Psychometrika, 44(4), 461-472. doi:10.1007/BF02296208
Kraemer, H. C., Periyakoil, V. S., & Noda, A. (2002). Kappa coefficients in medical research. Statistics in Medicine, 21, 2109-2129.
Monday Tips
1. Use Cohen's kappa for interrater reliability and validity measures. This is available in PROC FREQ with the AGREE option; see the example after this list.
2. Use the intraclass kappa for test-retest reliability or intrarater reliability measures. This is available using the macro presented in the corresponding paper, and from Dennis.Fisher@csulb.edu.
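For tip 1, the AGREE option is requested on the TABLES statement. A minimal call, reusing the variable names from the earlier slides, looks like this:

proc freq data=kraemer;
  tables anyct*anyct2 / agree;  /* prints agreement statistics, including Cohen's kappa */
run;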
Contact Information
Name: Dennis G. Fisher, Ph.D.
Company: California State University, Long Beach
City/State: Long Beach/CA
Phone: 562-961-9185
Email: Dennis.Fisher@csulb.edu