Börkur Sigurbjörnsson.

In: SIKS Dissertation Series.

Search engines play an important role in our daily lives. Most of us use search engines every day in search of information or services. State-of-the-art Internet search engines use a simple yet powerful interface. The user describes her information need with a few keywords, and the search engine returns a list of documents that are likely to answer it. Each document is presented using its title and a short summary of the document’s content, known as a snippet. By clicking on a document title, the user is routed to the corresponding document.

When a user is presented with a ranked list of relevant documents, her search task is usually not over. The next step for her is to dive into the documents themselves in search of the precise piece of information she is looking for. When searching long documents, this can be a tedious and time-consuming task. Thus we ask ourselves: can we give the user a more focused type of access to the relevant information in this scenario?

In this thesis we study this question in the context of semi-structured document collections, more precisely XML documents. Using the XML language, various kinds of document structure can be encoded, such as sections, sub-sections, paragraphs, section titles, author names, and italicized text. The XML markup divides the document text into a hierarchy of text objects, known as elements. In the thesis we ask ourselves whether we can exploit this hierarchical structure to give users more focused access to the relevant information.
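To make the notion of an element hierarchy concrete, the following is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample document, the function name list_elements, and the path notation are illustrative assumptions, not material from the thesis. The sketch only shows that every element, from the whole article down to a single paragraph, is a candidate unit of text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not taken from the thesis or from INEX).
SAMPLE = """<article>
  <title>XML retrieval</title>
  <section>
    <title>Introduction</title>
    <p>Search engines return whole documents.</p>
    <p>Elements allow more focused access.</p>
  </section>
</article>"""


def list_elements(xml_text):
    """Return (path, text) pairs for every element in the document.

    The path uses a positional index among same-tag siblings so that
    each element can be addressed unambiguously.
    """
    root = ET.fromstring(xml_text)
    result = []

    def walk(node, path):
        # An element's text is all text contained in it, at any depth,
        # with runs of whitespace collapsed to single spaces.
        text = " ".join("".join(node.itertext()).split())
        result.append((path, text))
        counts = {}
        for child in node:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    walk(root, f"/{root.tag}[1]")
    return result


if __name__ == "__main__":
    for path, text in list_elements(SAMPLE):
        print(path, "->", text)
```

Note that the elements overlap: the text of a paragraph is also part of its enclosing section and of the article as a whole.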

Our approach is twofold. First, we build a search engine that returns relevant XML elements in response to the user’s query. Second, we build an interface that uses the list of relevant elements to give focused access to the relevant documents. We evaluate the XML search engine using the INEX test collection and evaluate the interface in an interactive experiment.
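As a rough illustration of what returning elements rather than whole documents can look like, here is a toy sketch in Python. The scoring rule (the fraction of an element's tokens that match a query term) is an assumption chosen for brevity and is not the retrieval model evaluated in the thesis; the names tokenize, rank_elements, and DOC are likewise hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document (not taken from the thesis or from INEX).
DOC = """<article>
  <section><title>History</title><p>Early search engines indexed whole pages.</p></section>
  <section><title>Focused access</title><p>Element retrieval returns paragraphs.</p><p>Users read less text.</p></section>
</article>"""


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_elements(xml_text, query, top_k=3):
    """Rank every element by the share of its tokens that match the query.

    Toy scoring only: it illustrates element-level ranking, not the
    models studied in the thesis.
    """
    root = ET.fromstring(xml_text)
    query_terms = set(tokenize(query))
    scored = []
    for element in root.iter():
        tokens = tokenize("".join(element.itertext()))
        if not tokens:
            continue
        matches = sum(1 for token in tokens if token in query_terms)
        if matches:
            scored.append((matches / len(tokens), element.tag, " ".join(tokens)))
    # Highest score first; list.sort is stable, so ties keep document order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for score, tag, text in rank_elements(DOC, "element retrieval"):
        print(f"{score:.2f} <{tag}> {text}")
```

With this scoring, the single matching paragraph ranks above its enclosing section, which in turn ranks above the whole article. A real system would need a more careful model, since plain length normalization favors very short elements such as titles.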

The main contribution of this thesis is an extensive evaluation of various methods for retrieving XML elements. We also show how the search engine can be put into action via a simple interface.


