WeBiText: building large heterogeneous translation memories from parallel Web content

  • Additional Information
    • Publication Information:
      ASLIB
    • Subject:
      2008
    • Collection:
      National Research Council Canada: NRC Publications Archive
    • Abstract:
      This paper investigates the extent to which a useful general purpose Translation Memory (TM) can be built based on very large amounts of heterogeneous parallel texts mined from the Web. In particular, we evaluate whether such a TM could add value over TMs built from other large, publicly available parallel corpora, such as the Canadian Hansard. In the case of Canadian translators working with English and French, we show that the answer to both questions is a resounding yes. Using field data collected through contextualized observation and interviews with translators at their workplace, we show how this concept is well grounded in the existing work practices of translators, especially Canadian ones. We also show that a TM based on 10 million pairs of pages from Government of Canada Web sites is able to cover 90% of the translation problems observed in our interview subjects. This turns out to be significantly better than the coverage of a general purpose TM built from a smaller corpus, namely, the Canadian Hansard. The difference is most notable for the harder problems, such as specialized terminology. We also evaluate the approach on Web parallel corpora for other languages (European Commission Web sites, and 5000 Inuktitut-English pages harvested from the Nunavut domain), and find that the approach is not as advantageous there. We conclude that, while the concept of building TMs from Web corpora holds great promise, more research may be needed to make it work for language pairs other than English-French. ; Peer reviewed: Yes ; NRC publication: Yes
    • File Description:
      text
    • Relation:
      Proceedings of Translating and the Computer 30, Translating and the Computer 30: Conference and Exhibition, November 27-28, 2008, London, United Kingdom, ISBN: 0851424864, Publication date: 2008-11
    • Online Access:
      https://nrc-publications.canada.ca/eng/view/ft/?id=a05f4b93-c0e8-4383-97d2-728e08e458e5
      https://nrc-publications.canada.ca/eng/view/object/?id=a05f4b93-c0e8-4383-97d2-728e08e458e5
      https://nrc-publications.canada.ca/fra/voir/objet/?id=a05f4b93-c0e8-4383-97d2-728e08e458e5
    • Rights:
      Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License (CC BY-NC-SA 3.0) (https://creativecommons.org/licenses/by-nc-sa/3.0/) ; Attribution - Pas d’Utilisation Commerciale - Partage dans les Mêmes Conditions 3.0 non transposé (CC BY-NC-SA 3.0) (https://creativecommons.org/licenses/by-nc-sa/3.0/deed.fr)
    • Accession Number:
      edsbas.72D0367A