
CS4984/CS5984: Big Data Text Summarization Team 17 ETDs

  • Additional information
    • Publication information:
      Virginia Tech
    • Subject:
      2018
    • Collection:
      VTechWorks (VirginiaTech)
    • Abstract:
      Given the current explosion of information across media such as electronic and physical texts, concise and relevant data has become key to understanding. Summarization, the process of reducing a text so that it conveys only its salient aspects, has emerged as a challenging task in Natural Language Processing. Academia generates voluminous amounts of data in the form of theses and dissertations. Obtaining a chapter-wise summary of an electronic thesis or dissertation can be computationally expensive, particularly because of its length and the subject to which it pertains. In this course, various summarization techniques, primarily extractive and abstractive, were researched and analyzed. Deep learning has produced various approaches that generate coherent and meaningful summaries of news articles. In this project, tools were also developed to generate coherent and concise summaries of long electronic theses and dissertations (ETDs). The initial concern was extracting the text from the PDF file of an ETD. GROBID and Scienceparse were used as pre-processing tools for this task; they present the text of a PDF in a structured format such as an XML or JSON file. The outputs of the two tools were compared qualitatively as well as quantitatively. A transfer learning approach was then adopted, in which a pre-trained model was adapted to the task of summarizing each ETD. Making the model learn the nuances of an ETD proved to be a challenge. An iterative approach was used to explore various networks, each trying to address the shortcomings of the previous one in its own way. Existing deep learning models, including Sequence-2-Sequence, Pointer Generator Networks, and A Hybrid Extractive-Abstractive Reinforce-Selecting Sentence Rewriting Network, were used to generate and test ...
    • File Description:
      application/x-zip-compressed; application/pdf; application/vnd.openxmlformats-officedocument.presentationml.presentation
    • Relation:
      http://hdl.handle.net/10919/86420
    • Online access:
      http://hdl.handle.net/10919/86420
    • Rights:
      In Copyright ; http://rightsstatements.org/vocab/InC/1.0/
    • Accession number:
      edsbas.B1A62737