Contributors: Cognition, Langues, Langage, Ergonomie (CLLE-ERSS); École Pratique des Hautes Études (EPHE); Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-Université Toulouse - Jean Jaurès (UT2J); Université de Toulouse (UT)-Université de Toulouse (UT)-Université Bordeaux Montaigne (UBM)-Centre National de la Recherche Scientifique (CNRS); University of Turku; BCL, équipe Logométrie : corpus, traitements, modèles; Bases, Corpus, Langage (UMR 7320 - UCA / CNRS) (BCL); Université Nice Sophia Antipolis (1965 - 2019) (UNS)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UniCA)-Université Nice Sophia Antipolis (1965 - 2019) (UNS)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UniCA)
نبذة مختصرة : International audience ; Wikipedia is a popular and extremely useful resource for studies in both linguistics and natural language processing (Yano and Kang, 2008; Ferschke et al., 2013). This paper introduces a new language resource based on the French Wikipedia online discussion pages, the WikiTalk corpus. The publicly available corpus includes 160M words and 3M posts structured into 1M thematic sections and has been syntactically parsed with the Talismane toolkit (Urieli, 2013). In this paper, we present the first results of experiments aiming at classifying and profiling the talk pages and threads in order to determine criteria for selecting discussions with conflicts.
No Comments.