Harvesting Microblogs for Contextual Music Similarity Estimation: A Co-occurrence-based Framework

M. Schedl, D. Hauger and J. Urbano
Journal of Multimedia Systems, vol. 20, no. 6, pp. 693-705, 2014.


Microtexts are a valuable, albeit noisy, source from which to infer collaborative information. As music plays an important role in many people's lives, microblogs about music-related activities are available in abundance. This paper investigates different strategies to estimate music similarity from these data sources. In particular, we first present a framework to extract co-occurrence scores between music artists from microblogs and then investigate 12 similarity estimation functions to derive resemblance scores from them. We evaluate the approaches on a collection of microblogs crawled from Twitter over a period of 10 months and compare them to standard tf-idf approaches. As evaluation criteria, we use precision and recall in an artist retrieval task as well as rank proximity. We show that collaborative chatter on music can be effectively used to develop music artist similarity measures, which are a core component of music retrieval and recommendation systems. Furthermore, we analyze the effects of the "long tail" on retrieval results and investigate whether the results are consistent over time, using a second dataset.
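The pipeline summarized above (count artist co-occurrences in microblogs, turn the counts into similarity scores, then rank artists and measure precision and recall) can be illustrated with a minimal sketch. It assumes naive case-insensitive substring matching of artist names in tweet text, and it uses three generic co-occurrence normalizations (Jaccard, conditional probability, cosine) chosen purely for illustration; they are not presented as the paper's 12 similarity functions. The helper names, the toy tweets, and the `relevant` set are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_counts(tweets, artists):
    """Count tweets mentioning each artist and each unordered artist pair.

    Assumption: an artist is "mentioned" if its name occurs as a
    case-insensitive substring of the tweet (naive; no tokenization).
    """
    single, pair = Counter(), Counter()
    for text in tweets:
        lowered = text.lower()
        found = sorted({a for a in artists if a.lower() in lowered})
        single.update(found)
        pair.update(combinations(found, 2))  # keys are sorted name tuples
    return single, pair


def similarity(a, b, single, pair, measure="jaccard"):
    """Illustrative normalizations of the raw co-occurrence count."""
    both = pair[tuple(sorted((a, b)))]
    if both == 0:
        return 0.0
    if measure == "jaccard":      # |A and B| / |A or B|
        return both / (single[a] + single[b] - both)
    if measure == "conditional":  # P(b | a), asymmetric
        return both / single[a]
    if measure == "cosine":       # |A and B| / sqrt(|A| * |B|)
        return both / sqrt(single[a] * single[b])
    raise ValueError(f"unknown measure: {measure}")


def most_similar(query, artists, single, pair, measure="jaccard", k=10):
    """Rank all other artists by similarity to the query artist."""
    scored = [(o, similarity(query, o, single, pair, measure))
              for o in artists if o != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scored[:k]]


def precision_recall(retrieved, relevant):
    """Set-based precision and recall of a retrieved artist list."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
tweets = [
    "Listening to Radiohead and Muse tonight",
    "Radiohead live was amazing",
    "Muse and Radiohead back to back #nowplaying",
    "Coldplay new single out",
]
artists = ["Radiohead", "Muse", "Coldplay"]

single, pair = cooccurrence_counts(tweets, artists)
print(most_similar("Radiohead", artists, single, pair))  # ['Muse', 'Coldplay']
```

Note that the conditional-probability variant is asymmetric (here P(Radiohead | Muse) = 1.0 but P(Muse | Radiohead) = 2/3), which is one reason why different normalizations of the same co-occurrence counts can yield different artist rankings.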