Classifying Tags using Open Content Resources
Simon Overell, Börkur Sigurbjörnsson, and Roelof van Zwol.
In: Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM 2009).
Abstract
Tagging has emerged as a popular means to annotate on-line objects such as bookmarks, photos and videos. Tags vary in semantic meaning and can describe different aspects of a media object. Tags describe the content of the media as well as locations, dates, people and other associated meta-data. Being able to automatically classify tags into semantic categories allows us to understand better the way users annotate media objects and to build tools for viewing and browsing the media objects. In this paper we present a generic method for classifying tags using third party open content resources, such as Wikipedia and the Open Directory. Our method uses structural patterns that can be extracted from resource meta-data. We describe the implementation of our method on Wikipedia using WordNet categories as our classification schema and ground truth. Two structural patterns found in Wikipedia are used for training and classification: categories and templates. We apply our system to classifying Flickr tags. Compared to a WordNet baseline our method increases the coverage of the Flickr vocabulary by 115%. We can classify many important entities that are not covered by WordNet, such as, London Eye, Big Island, Ronaldinho, geocaching and wii.
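The abstract describes training on structural patterns (Wikipedia categories and templates) with WordNet categories as labels, then classifying tags that WordNet does not cover. A minimal sketch of that idea follows. It rests on several assumptions: the toy feature table, the label names, and the helper names `train` and `classify` are hypothetical, and the simple vote-counting classifier stands in for whatever learning method the paper actually uses.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: article title -> structural features.
# "cat:" marks a Wikipedia category, "tpl:" marks a template.
ARTICLE_FEATURES = {
    "london": {"cat:Capitals in Europe", "tpl:Infobox settlement"},
    "paris": {"cat:Capitals in Europe", "tpl:Infobox settlement"},
    "pele": {"cat:Brazilian footballers", "tpl:Infobox football biography"},
    "ronaldinho": {"cat:Brazilian footballers", "tpl:Infobox football biography"},
    "big island": {"cat:Islands of Hawaii", "tpl:Infobox settlement"},
}

# Hypothetical ground truth: WordNet-style labels for tags WordNet covers.
GROUND_TRUTH = {"london": "location", "paris": "location", "pele": "person"}


def train(features, labels):
    """Count how often each structural feature co-occurs with each label."""
    counts = defaultdict(Counter)
    for title, label in labels.items():
        for feat in features.get(title, ()):
            counts[feat][label] += 1
    return counts


def classify(tag, features, model):
    """Return the label with the most feature votes, or None if no evidence."""
    votes = Counter()
    for feat in features.get(tag.lower(), ()):
        votes.update(model.get(feat, {}))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


model = train(ARTICLE_FEATURES, GROUND_TRUTH)
for tag in ("Ronaldinho", "Big Island", "wii"):
    print(tag, "->", classify(tag, ARTICLE_FEATURES, model))
```

In this toy setup, "Ronaldinho" and "Big Island" have no ground-truth label but share categories or templates with labeled articles, so they still receive a class; "wii" has no article in the table and stays unclassified.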
Bibtex
@inproceedings{overell2008classifying,
  author = {Simon Overell and B\"{o}rkur Sigurbj\"{o}rnsson and Roelof van Zwol},
  title = {Classifying Tags using Open Content Resources},
  booktitle = {Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM 2009)},
  year = {2009},
  pages = {T.b.a.},
  location = {Barcelona, Spain},
  publisher = {ACM},
}