Cross-Domain and Cross-Language Porting of Shallow Parsing

  • Additional Information
    • Contributors:
      Stepanov, Evgeny
    • Publication Information:
      TRENTO
      Università degli studi di Trento
    • Publication Year:
      2014
    • Collection:
      Università degli Studi di Trento: CINECA IRIS
    • Abstract:
      English was the main focus of attention of the Natural Language Processing (NLP) community for years. As a result, there are significantly more annotated linguistic resources in English than in any other language. Consequently, data-driven tools for automatic text or speech processing are developed mainly for English. Developing similar corpora and tools for other languages is an important issue; however, it requires a significant amount of effort. Recently, Statistical Machine Translation (SMT) techniques and parallel corpora have been used to transfer annotations from resource-rich languages to resource-poor languages for a variety of NLP tasks, including Part-of-Speech tagging, Noun Phrase chunking, dependency parsing, textual entailment, etc. This cross-language NLP paradigm relies on the solution of the following sub-problems:
      - Data-driven NLP techniques are very sensitive to differences between training and testing conditions. Different domains, such as financial news-wire and biomedical publications, have different distributions of NLP task-specific properties; thus, domain adaptation of the source-language tools, whether by developing models with good cross-domain performance or models tuned to the target domain, is critical.
      - Another difference between training and testing conditions arises in cross-genre applications, such as written text (monologues) versus spontaneous dialog data. Properties of written text, such as punctuation and the notion of a sentence, are not present in transcriptions of spoken conversation. Thus, style-adaptation techniques that cover a wider range of genres are critical as well.
      - The basis of cross-language porting is parallel corpora. Unfortunately, parallel corpora are scarce. Thus, generation or retrieval of parallel corpora between the languages of interest is important. Additionally, these parallel corpora are most often not in the domains of interest; consequently, the cross-language porting should be augmented with SMT domain adaptation ...
    • Relation:
      firstpage:1; lastpage:115; numberofpages:115; https://hdl.handle.net/11572/368990
    • Online Access:
      https://hdl.handle.net/11572/368990
    • Rights:
      info:eu-repo/semantics/openAccess ; license:Tutti i diritti riservati (All rights reserved)
    • Accession Number:
      edsbas.35C1B38