Understanding the common interrater reliability measures
Health and rehabilitation professionals use a range of outcome instruments to evaluate the effectiveness of their interventions. To be evidence-based practitioners, we need to understand the psychometric properties of these instruments and to be able to interpret the statistics used to test them. This paper focuses on inter-rater reliability. The different statistical methods for computing inter-rater reliability can be classified into one of three categories: consensus estimates, consistency estimates, and measurement estimates. Common statistical methods such as Kappa, intraclass correlation and the Many-Facets Rasch Model are described, along with the advantages and disadvantages of each approach. For each category of estimates, one paper has been chosen from the therapy and rehabilitation literature to illustrate the use of a number of commonly utilised inter-rater reliability measures. It is hoped that this overview will provide practitioners, students and/or new researchers with a ready reference to key measures used for determining inter-rater reliability.
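As a concrete illustration of the consensus-estimate category mentioned above, the sketch below computes Cohen's kappa for two raters assigning nominal categories to the same set of items. This is a minimal, self-contained implementation for illustration only; it is not taken from the paper, and the function and variable names are the author's own. Kappa corrects the observed proportion of agreement for the agreement expected by chance from each rater's marginal category frequencies.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' nominal ratings of the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is chance agreement implied by each rater's marginal frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: proportion of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's category frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two clinicians rate 10 patients as "yes"/"no".
a = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "no"]
b = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes", "yes", "no"]
print(round(cohens_kappa(a, b), 3))  # observed agreement 0.80, kappa ~0.583
```

Note that kappa is lower than the raw 80% agreement because part of that agreement would be expected by chance alone, which is the central advantage of consensus estimates over simple percent agreement.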