Between History and Natural Language Processing: Study, Enrichment and Online Publication of French Parliamentary Debates of the Early Third Republic (1881–1899)

Authors: Puren, Marie; Pellet, Aurélien; Bourgeois, Nicolas; Vernus, Pierre; Lebreton, Fanny
Source:
Proceedings of The Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference ; https://hal.science/hal-03762935 ; European Language Resources Association (ELRA). Proceedings of The Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference, pp. 16-24, 2022, 979-10-95546-85-6 ; http://www.lrec-conf.org/proceedings/lrec2022/workshops/ParlaCLARINIII/2022.parlaclariniii-1.0.pdf
Subject terms:
OCR; parliamentary debates; XML-TEI; Natural language processing; France; Third Republic; topic modelling; word embedding; [SHS.HIST]Humanities and Social Sciences/History; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-CY]Computer Science [cs]/Computers and Society [cs.CY]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; [SHS.INFO]Humanities and Social Sciences/Library and information sciences
Document type:
book part
Language:
English

Additional information
- Contributors:
  EPITECH; Méthodes Numériques pour les Sciences de l'Humain et de la Société (MNSHS); Epitech Technology; Centre Jean Mabillon (CJM); École nationale des chartes (ENC); Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL); Université Lumière - Lyon 2 (UL2); LAboratoire de Recherche Historique Rhône-Alpes - UMR5190 (LARHRA); École normale supérieure de Lyon (ENS de Lyon)-Université Lumière - Lyon 2 (UL2)-Université Jean Moulin - Lyon 3 (UJML); Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA); Axe de Recherche en Histoire Numérique (LARHRA ARHN); Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-École normale supérieure de Lyon (ENS de Lyon)-Université Lumière - Lyon 2 (UL2)-Université Jean Moulin - Lyon 3 (UJML); DataLab BnF; European Language Resources Association (ELRA)
- Publication information:
  HAL CCSD
- Publication year:
  2022
- Collection:
  Portail HAL de l'Université Lumière Lyon 2
- Abstract:
  We present the AGODA (Analyse sémantique et Graphes relationnels pour l’Ouverture des Débats à l’Assemblée nationale) project, which aims to create a platform for consulting and exploring digitised French parliamentary debates (1881-1940) available in the digital library of the National Library of France. This project brings together historians and NLP specialists: parliamentary debates are an essential source for contemporary French history, but also for linguistics. The project therefore aims to produce a corpus of texts that can be easily exploited with computational methods and that respects the TEI standard. Historical parliamentary debates are also an excellent case study for the development and application of tools for publishing and exploring large historical corpora. In this paper, we present the steps necessary to produce such a corpus. We detail the processing and publication chain for these documents, in particular the problems linked to extracting text from digitised images. We also introduce the first analyses that we have carried out on this corpus with “bag-of-words” techniques that are not too sensitive to OCR quality (namely topic modelling and word embedding).
- ISBN:
  979-1-09-554685-6
- Relation:
  hal-03762935; https://hal.science/hal-03762935; https://hal.science/hal-03762935/document; https://hal.science/hal-03762935/file/2022_purenetal_parlaclarin3.pdf
- Online access:
  https://hal.science/hal-03762935
  https://hal.science/hal-03762935/document
  https://hal.science/hal-03762935/file/2022_purenetal_parlaclarin3.pdf
- Rights:
  info:eu-repo/semantics/OpenAccess
- Accession number:
  edsbas.2FEA9F2E
