Malik, Saadia:
Interactive information retrieval with structured documents
Duisburg, Essen, 2009
Dissertation / Subject: Computer Science
Faculty of Engineering » Computer Science and Applied Cognitive Science
Malik, Saadia
Supervisor:
Fuhr, Norbert
Reviewer:
Lalmas, Mounia
Duisburg, Essen
IX, 157 pp.
DuEPublico ID:
University library call number:
Duisburg-Essen, Univ., Diss. 2009


In recent years there has been a growing realisation in the IR community that the interaction of searchers with information is an indispensable component of the IR process. As a result, issues relating to interactive IR have been extensively investigated in the last decade. This research has been performed in the context of unstructured documents or in the context of the loosely defined structure encountered in web pages. XML documents, on the other hand, define a different context by offering the possibility of navigating within the structure of a single document, or of following links to other documents. Relatively little work has been carried out to study user interaction with IR systems that make use of the additional features offered by XML documents. As part of the INEX initiative for the evaluation of XML retrieval, the INEX interactive track has focused on interactive XML retrieval since 2004. Here, a user-friendly presentation of various features of XML documents is provided, and some new features are designed and implemented to give searchers efficient access to their desired information. In this study, interaction entails three levels: query formulation, inspecting the result list, and examining the detail.
For query formulation, suggesting related terms is a conventional method to assist searchers. Here we investigate related terms derived from two different co-occurrence units: elements and documents. In addition, a contextual aspect is added to help searchers select appropriate terms. Results showed the usefulness of suggesting related terms and a degree of acceptance of the contextual related-terms tool.
For inspecting the result list, classic document retrieval systems such as web search engines retrieve whole documents and leave it to the searchers to collect their required information from a possibly lengthy text. In contrast, element retrieval aims at a focused view of information by pointing to the optimal access points of the document. A number of strategies have been investigated for presenting result lists. For examining the detail of a document, traditionally the complete document is presented to a searcher, and here again the searcher has to put in effort to reach the required information. We investigated the use of additional support, such as a table of contents alongside the document detail. In addition, we investigated graphical representations of documents that depict their structure and the granularity of retrieved elements along with their estimated relevance. Here the table of contents was found to be a very useful feature for examining details.
In order to analyse searchers' interaction, a visualisation technique based on Tree Map was developed. It depicts the search interaction with an element retrieval system. A number of browsing strategies have been identified with the help of this tool. The value of element retrieval for searchers was also evaluated, along with a comparison between two focused approaches, element retrieval and passage retrieval. The study suggests that searchers find elements useful for their tasks and that they locate much of the relevant information in specific elements rather than in full documents. Sections, in particular, appear to be helpful.
In order to provide user-specific support, the system needs feedback from searchers, who in turn are very reluctant to give this information explicitly. Therefore, we investigated to what extent the different features can be used as relevance predictors. Of the five features considered, it is mainly reading time that is a useful relevance predictor. Overall, relevance predictors for structured documents seem to be much weaker than for atomic documents.