Summarization assessment methodology for multiple corpora using queries and classification for functional evaluation

Authors: Sam Wolyn; Steven J. Simske
Source:
Integrated Computer-Aided Engineering. 29:227-239
Subject:
Computational Theory and Mathematics; Artificial Intelligence; Software; Computer Science Applications; Theoretical Computer Science
Online access:
https://explore.openaire.eu/search/publication?articleId=doi_________::04ce88dcbb1d5b7f2ef501b2ab7504e2
https://doi.org/10.3233/ica-220680

Additional information
- Publication information:
  IOS Press, 2022.
- Subject:
  2022
- Abstract:
  Extractive summarization is an important natural language processing approach used for document compression, improved reading comprehension, key phrase extraction, indexing, query set generation, and other analytics approaches. Extractive summarization has specific advantages over abstractive summarization in that it preserves style, specific text elements, and compound phrases that might be more directly associated with the text. In this article, the relative effectiveness of extractive summarization is considered on two widely different corpora: (1) a set of works of fiction (100 total, mainly novels) available from Project Gutenberg, and (2) a large set of news articles (3000) for which a ground-truthed summarization (gold standard) is provided by the authors of the news articles. Both sets were evaluated using five different Python Sumy algorithms and compared quantitatively to randomly generated summarizations. Two functional approaches to assessing the efficacy of summarization are introduced: running a query set against both the original documents and their summaries, and using document classification on a 12-class set to compare the different summarization approaches. The results, unsurprisingly, show considerable differences consistent with the different nature of these two data sets. The LSA and Luhn summarization approaches were most effective on the database of fiction, while all five summarization approaches were similarly effective on the database of articles. Overall, the Luhn approach was deemed the most generally relevant among those tested.
- ISSN:
  1875-8835
  1069-2509
- Identifier:
  10.3233/ica-220680
- Identifier:
  edsair.doi...........04ce88dcbb1d5b7f2ef501b2ab7504e2
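
The query-based functional evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use Sumy: `frequency_summary` is a simple word-frequency scorer standing in for an extractive summarizer, `random_summary` is the random baseline, and `query_agreement`, the toy documents, and the queries are all hypothetical names and data chosen for illustration. The idea it shows is only this: a summary is functionally useful if a query retrieves the same document from the summaries as it does from the originals.

```python
import random
import re
from collections import Counter


def sentences(text):
    # Naive splitter on ., ! and ?; a real pipeline would use a proper tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(text):
    return re.findall(r"[a-z']+", text.lower())


def frequency_summary(text, n):
    """Keep the n sentences with the highest mean document-level word frequency.

    Assumption: a stand-in scorer, not Sumy's Luhn or LSA implementation.
    """
    sents = sentences(text)
    freq = Counter(words(text))

    def score(i):
        toks = words(sents[i])
        return sum(freq[w] for w in toks) / max(len(toks), 1)

    ranked = sorted(range(len(sents)), key=score, reverse=True)
    keep = sorted(ranked[:n])  # restore original sentence order
    return " ".join(sents[i] for i in keep)


def random_summary(text, n, rng):
    """Baseline: keep n sentences chosen uniformly at random."""
    sents = sentences(text)
    keep = sorted(rng.sample(range(len(sents)), min(n, len(sents))))
    return " ".join(sents[i] for i in keep)


def best_match(query, docs):
    """Index of the document with the most occurrences of query terms."""
    terms = set(words(query))
    scores = [sum(1 for w in words(d) if w in terms) for d in docs]
    return max(range(len(docs)), key=scores.__getitem__)


def query_agreement(queries, originals, summaries):
    """Fraction of queries whose top hit on the summaries matches the originals."""
    same = sum(best_match(q, originals) == best_match(q, summaries) for q in queries)
    return same / len(queries)


# Hypothetical toy corpus and query set.
originals = [
    "The whale surfaced near the ship. The crew watched the whale in silence. Rain fell at dusk.",
    "The council passed the budget. Taxes fund the budget for schools. A vote is due Monday.",
]
queries = ["whale ship", "budget taxes"]

rng = random.Random(0)  # fixed seed so the baseline is repeatable
freq_summaries = [frequency_summary(d, 2) for d in originals]
rand_summaries = [random_summary(d, 2, rng) for d in originals]

print("frequency-based:", query_agreement(queries, originals, freq_summaries))
print("random baseline:", query_agreement(queries, originals, rand_summaries))
```

A two-document corpus is far too small to separate a summarizer from the random baseline; the sketch only shows the shape of the comparison, which the article carries out on 100 works of fiction and 3000 news articles.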
