Filtering and Clustering XML Retrieval Results

19th August 2007 Börkur Sigurbjörnsson

Jaap Kamps, Marijn Koolen, and Börkur Sigurbjörnsson.

In: Comparative Evaluation of XML Information Retrieval Systems. Lecture Notes in Computer Science, volume 4518, pages 121-136. Springer-Verlag, 2007.

Abstract

As part of the INEX 2006 Adhoc Track, we conducted a range of experiments with filtering and clustering XML element retrieval results. Our basic retrieval engine retrieves arbitrary elements from the collection (corresponding to the Thorough Task). These runs are filtered to remove textual overlap between elements (corresponding to the Focused Task). The resulting runs can be clustered per article (corresponding to the All in Context Task). Finally, we select the “best” element for each article (corresponding to the Best in Context Task). Our main findings are the following. First, a complete element index outperforms a restricted index based on section-structure, albeit the differences are small. Second, grouping non-overlapping elements per article does not lead to performance degradation, but may improve scores. Third, all restrictions of the “pure” element runs (by removing overlap, by grouping elements per article, or by selecting a single element per article) lead to some but only moderate loss of precision.
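The chain of steps in the abstract (element retrieval, overlap filtering, grouping per article, best element per article) can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the `Hit` record, the path-prefix overlap test, and the greedy highest-score-first filter are all assumptions chosen for illustration, and the scores in the usage example are made up.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical retrieval result: one XML element with a score."""
    article: str  # article identifier
    path: str     # element path, e.g. "/article[1]/sec[2]/p[1]"
    score: float


def overlaps(a: Hit, b: Hit) -> bool:
    """Assumed overlap test: same article, and one path is a prefix of the other."""
    if a.article != b.article:
        return False
    pa = a.path.rstrip("/") + "/"
    pb = b.path.rstrip("/") + "/"
    return pa.startswith(pb) or pb.startswith(pa)


def remove_overlap(hits: list[Hit]) -> list[Hit]:
    """Greedy filter (an assumption): keep the highest-scoring elements
    and drop any element that overlaps one already kept."""
    kept: list[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return kept


def cluster_per_article(hits: list[Hit]) -> list[list[Hit]]:
    """Group elements per article; order articles by their best element."""
    groups: dict[str, list[Hit]] = defaultdict(list)
    for hit in hits:
        groups[hit.article].append(hit)
    clusters = [sorted(g, key=lambda h: h.score, reverse=True)
                for g in groups.values()]
    return sorted(clusters, key=lambda g: g[0].score, reverse=True)


def best_per_article(clusters: list[list[Hit]]) -> list[Hit]:
    """Select the single top-scoring element of each article."""
    return [cluster[0] for cluster in clusters]


# Usage with made-up scores.
thorough = [
    Hit("a1", "/article[1]", 0.5),
    Hit("a1", "/article[1]/sec[1]", 0.9),
    Hit("a1", "/article[1]/sec[1]/p[2]", 0.7),
    Hit("a1", "/article[1]/sec[2]", 0.4),
    Hit("a2", "/article[1]/sec[3]", 0.8),
]
focused = remove_overlap(thorough)
in_context = cluster_per_article(focused)
best = best_per_article(in_context)
```

In this sketch each later run is derived by restricting the previous one, which mirrors the way the abstract describes the Focused, All in Context, and Best in Context runs as filtered versions of the Thorough run.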

BibTeX

@inproceedings{kamps2007filtering,
  author    = {Jaap Kamps and Marijn Koolen and B{\"o}rkur Sigurbj{\"o}rnsson},
  title     = {Filtering and Clustering XML Retrieval Results},
  booktitle = {Comparative Evaluation of XML Information Retrieval Systems},
  series    = {Lecture Notes in Computer Science},
  volume    = {4518},
  pages     = {121--136},
  publisher = {Springer-Verlag},
  year      = {2007},
}
