Crowdsourcing Preference Judgments for Evaluation of Music Similarity Tasks

J. Urbano, J. Morato, M. Marrero and D. Martín
ACM SIGIR Workshop on Crowdsourcing for Search Evaluation, pp. 9-16, 2010.

Most innovative paper award, supported by Microsoft Bing.


Music similarity tasks, in which musical pieces similar to a query should be retrieved, are troublesome to evaluate. Ground truths based on partially ordered lists were developed to cope with problems regarding relevance judgment, but they require so much manpower to generate that the official MIREX evaluations had to turn to more affordable alternatives. However, in-house evaluations keep using these partially ordered lists because they are still more suitable for similarity tasks. In this paper we propose a cheaper way to generate these lists, using crowdsourcing to gather music preference judgments. We show that our method produces lists very similar to the original ones while also addressing some defects of the original methodology. With this study, we show that crowdsourcing is a viable alternative for evaluating music systems without the need for experts.