Compressing tags to find interesting media groups
Matthijs van Leeuwen, Francesco Bonchi, Börkur Sigurbjörnsson, and Arno Siebes.
In: Proceeding of the 18th ACM conference on Information and knowledge management (CIKM 2009).
Link: doi
Abstract
On photo sharing websites like Flickr and Zooomr, users are offered the possibility to assign tags to their uploaded pictures. Using these tags to find interesting groups of semantically related pictures in the result set of a given query is a problem with obvious applications. We analyse this problem from a Minimum Description Length (MDL) perspective and develop an algorithm that finds the most interesting groups. The method is based on Krimp, which finds small sets of patterns that characterise the data using compression. These patterns are sets of tags, often assigned together to photos. The better a database compresses, the more structure it contains and thus the more homogeneous it is. Following this observation we devise a compression-based measure. Our experiments on Flickr data show that the most interesting and homogeneous groups are found. We show extensive examples and compare to clusterings on the Flickr website.
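The link between compression and homogeneity described in the abstract can be illustrated with a minimal sketch. This is not Krimp and not the paper's measure: it is a simplified stand-in that encodes single tags with Shannon-optimal code lengths, whereas Krimp encodes with sets of tags. The function names and the toy photo groups are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def encoded_bits(photos):
    """Total bits needed to encode every tag occurrence in a group.

    Assumption: each tag gets a Shannon-optimal code whose length is
    -log2 of the tag's relative frequency within the group. This is a
    simplification; Krimp assigns codes to tag sets, not single tags.
    """
    counts = Counter(tag for tags in photos for tag in tags)
    total = sum(counts.values())
    return sum(n * -math.log2(n / total) for n in counts.values())


def bits_per_tag(photos):
    """Average encoded size per tag occurrence; lower means more structure."""
    total = sum(len(tags) for tags in photos)
    return encoded_bits(photos) / total if total else 0.0


# Hypothetical toy groups: four photos sharing one tag set, and four
# photos with no tags in common.
homogeneous = [{"beach", "sea", "sun"} for _ in range(4)]
mixed = [
    {"beach", "sea", "sun"},
    {"cat", "sofa", "pet"},
    {"city", "night", "neon"},
    {"snow", "ski", "alps"},
]

print(bits_per_tag(homogeneous), bits_per_tag(mixed))
```

Under these assumptions the group with repeated tags needs fewer bits per tag than the mixed group, which is the intuition behind using compressed size as a homogeneity score.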
Bibtex
@inproceedings{1646099,
  author = {van Leeuwen, Matthijs and Bonchi, Francesco and Sigurbj\"{o}rnsson, B\"{o}rkur and Siebes, Arno},
  title = {Compressing tags to find interesting media groups},
  booktitle = {CIKM '09: Proceeding of the 18th ACM conference on Information and knowledge management},
  year = {2009},
  isbn = {978-1-60558-512-3},
  pages = {1147--1156},
  location = {Hong Kong, China},
  doi = {http://doi.acm.org/10.1145/1645953.1646099},
  publisher = {ACM},
  address = {New York, NY, USA},
}