Analysis of repeated categorical ratings: going beyond inter-rater agreement

You are welcome to attend the following Statistics and Stochastic colloquium (part of the Colloquium Series of the Department of Mathematics and Statistics) at La Trobe University.

Thursday 18 June 2020, 12:00 pm until 1:00 pm
Andriy Olenko
Presented by: Dr Damjan Vukcevic, University of Melbourne

A common task in health and medicine is the classification of patient information into one of several categories by a trained expert. Examples include assessing the presence and type of a tumour from a medical image, or providing a disease diagnosis from a series of medical tests. Such judgements are often hard to make and error prone: two experts may rate the same scenario differently, or the same expert may give different ratings when assessing the same scenario on separate occasions.

Analysing the performance of such expert ‘raters’, and the accuracy of their ‘ratings’ across a series of ‘items’, is a common theme in much of the health and medical literature, especially in the setting where the true underlying category is unknown. Existing approaches, such as Cohen’s kappa, focus only on assessing inter-rater agreement. They have known problems, which stem from the lack of any notion of underlying truth and from the difficulty of coping with repeated ratings by the same rater.
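For readers unfamiliar with Cohen’s kappa, the following is a minimal sketch of the standard two-rater statistic: observed agreement corrected for the agreement expected by chance. The function name and the ratings are made up for illustration and are not taken from the talk. Note that the calculation needs exactly one rating per rater per item and makes no reference to a true category.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each rated the same items once."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same category by both raters.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over categories of the product of the raters'
    # marginal category frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        # Undefined when both raters only ever use one identical category.
        return float("nan")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical ratings, for illustration only.
rater_1 = ["benign", "malignant", "benign", "benign", "malignant", "benign"]
rater_2 = ["benign", "malignant", "malignant", "benign", "malignant", "benign"]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

The statistic summarises only how often the two raters coincide; it cannot say which of them is more likely to be correct.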

Here we present and implement methods that explicitly model an underlying true category for each item and can cope naturally with any number of ratings for each item, including repeated ratings by the same rater. We implement Bayesian versions of these models using the probabilistic programming language Stan, and create an R package to fit and interrogate the output of these models.
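To make the idea of a latent true category concrete, here is a minimal sketch of one well-known model of this kind, a Dawid-Skene-style latent class model, in which each rater has a confusion matrix giving the probability of each rating conditional on the true category. Whether this matches the models presented in the talk is an assumption. The sketch uses maximum-likelihood EM in plain Python rather than the Bayesian Stan implementation described above, and the function name, smoothing constant, and data are all hypothetical.

```python
import math


def fit_latent_class(ratings, n_items, n_raters, n_cats, n_iter=50, smooth=0.01):
    """EM for a Dawid-Skene-style model.

    ratings: list of (item, rater, category) integer triples. An item may
    appear any number of times, including repeats by the same rater.
    """
    # Initialise each item's class probabilities from smoothed vote shares.
    post = [[smooth] * n_cats for _ in range(n_items)]
    for i, _, k in ratings:
        post[i][k] += 1.0
    post = [[v / sum(row) for v in row] for row in post]

    for _ in range(n_iter):
        # M-step: smoothed class prevalence ...
        prev = [
            (smooth + sum(post[i][c] for i in range(n_items)))
            / (n_items + n_cats * smooth)
            for c in range(n_cats)
        ]
        # ... and per-rater confusion matrices theta[rater][true][observed].
        theta = [[[smooth] * n_cats for _ in range(n_cats)] for _ in range(n_raters)]
        for i, j, k in ratings:
            for c in range(n_cats):
                theta[j][c][k] += post[i][c]
        theta = [[[v / sum(row) for v in row] for row in mat] for mat in theta]

        # E-step: posterior over each item's true class, computed in log space.
        logp = [[math.log(prev[c]) for c in range(n_cats)] for _ in range(n_items)]
        for i, j, k in ratings:
            for c in range(n_cats):
                logp[i][c] += math.log(theta[j][c][k])
        post = []
        for row in logp:
            m = max(row)
            w = [math.exp(v - m) for v in row]
            post.append([v / sum(w) for v in w])
    return post, prev, theta


# Hypothetical data: 4 items, 3 raters, 2 categories.
# Rater 0 rates item 0 twice; rater 2 disagrees with the others on item 1.
ratings = [
    (0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 2, 0),
    (1, 0, 1), (1, 1, 1), (1, 2, 0),
    (2, 0, 0), (2, 1, 0), (2, 2, 0),
    (3, 0, 1), (3, 1, 1), (3, 2, 1),
]
post, prev, theta = fit_latent_class(ratings, n_items=4, n_raters=3, n_cats=2)
estimated = [max(range(2), key=row.__getitem__) for row in post]
print(estimated)
```

Because the data are just a list of (item, rater, category) triples, repeated ratings by the same rater need no special handling: each one simply contributes another term to the likelihood. A Bayesian version would place priors on the prevalence and confusion matrices instead of using the ad hoc smoothing constant.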

Using real and simulated datasets, designed to mimic a wide range of medical scenarios, we test the performance of these models in estimating the true class of each item. We also explore situations in which some raters have much poorer accuracy than others, and compare the models with other (non-model-based) approaches.

