Korpus-adaptive Eigennamenerkennung

Rössler, Marc

Duisburg, Essen, 2006

2006 Dissertation

General, Miscellaneous

Faculty of Engineering » Computer Science and Applied Cognitive Science » Computer Science » Knowledge-Based and Natural Language Systems

Title in German:

Korpus-adaptive Eigennamenerkennung

Author:

Rössler, Marc

Academic supervisor:

Hoeppner, Wolfgang

Place of publication:

Duisburg, Essen

Year of publication:

2006

Extent:

195 pp. : ill.

DuEPublico 1 ID

14746

URN

urn:nbn:de:hbz:465-20070131-144948-6

University library call number:

ZZX655261

Note:

Duisburg, Essen, Univ., Diss., 2008

Language of the text:

German

Abstract in English:

Named Entity Recognition (NER) is an important step towards the automatic analysis of natural language and is needed for a range of natural language applications. The task requires recognizing and classifying proper names and other unique identifiers according to a predefined category system, e.g. the “traditional” categories PERSON, ORGANIZATION (companies, associations) and LOCATION. While most previous work deals with recognizing these traditional categories in English newspaper texts, the approach presented in this thesis goes beyond that scope. It is motivated in particular by NER tasks that are more challenging than the classical one, such as NER for German or the identification of biomedical entities in scientific texts. In addition, the approach addresses the ease of development and maintainability of NER services by emphasizing the need for “corpus-adaptive” systems, where “corpus-adaptivity” describes how easily a system can be adapted to new tasks and new text corpora. To implement such a corpus-adaptive system, three design guidelines are proposed: (i) the consistent use of machine-learning techniques instead of manually created linguistic rules; (ii) a strictly data-oriented modelling of the phenomena instead of a generalization based on intellectual categories; (iii) the use of automatically extracted knowledge about Named Entities, gained by analysing large amounts of raw text. A prototype was implemented according to these guidelines, and its evaluation shows the feasibility of the approach. The system, originally developed for a German newspaper corpus, could easily be adapted and applied to the extraction of biomedical entities from scientific abstracts written in English, which demonstrates the corpus-adaptivity of the approach. Despite its limited resources in comparison with other state-of-the-art systems, the prototype achieved competitive results for some of the categories.

University Bibliography

Publication index of the Universität Duisburg-Essen
