How do Ties Affect the Uncertainty in Rank-Biased Overlap?
M. Corsi, and J. Urbano
International ACM SIGIR Conference on Information Retrieval in the Asia Pacific, 125-134, 2024.
Abstract
Rank-Biased Overlap (RBO) is a popular measure of the similarity between two rankings. A key characteristic of RBO is that it can be computed even when the rankings are not fully seen and only a prefix is known, but this introduces uncertainty in the computation. In such cases, one would normally compute the point estimate RBOEXT, as well as bounds representing the best and worst cases; their difference is thus a residual quantifying the amount of uncertainty. Another source of uncertainty is the presence of tied items, because their actual relative order is unknown. Current approaches to this issue similarly provide a point estimate by considering the average RBO score over all the permutations of the ties, such as RBOa. However, there is currently no approach to quantify and bound the uncertainty due to ties, just as there is for the uncertainty due to unseen items. In this paper we fill this gap and provide algorithmic solutions to the problem of finding the arrangements of tied items that yield the lowest and highest possible RBO scores, naturally leading to total bounds and residuals. We also show that the current RBOa estimate only equals the average RBO over permutations when the rankings have the same length, so we also generalize it to rankings of different lengths. In summary, this work provides a full account for the uncertainty in RBO, allowing practitioners to make more sensible decisions on the grounds of rank similarity. The main realization is that residuals can actually be much larger once we account for both sources of uncertainty. To illustrate this, we present empirical results using both synthetic and TREC data, demonstrating that a realistic picture for the residual of RBO can only be provided by considering both sources of uncertainty.