The Amphan Dataset: Humanitarian Classification of Bengali Disaster-Related Tweets

  • Additional Information
    • Publication Details:
      Zenodo
    • Subject:
      2025
    • Collection:
      Zenodo
    • Abstract:
      The Amphan Dataset (Bengali Disaster-Related Tweets). This dataset is a contribution of the paper "Humanitarian Classification of Crisis-related Microblogs in Bengali: A Comparison of Multilingual Pre-trained Language Models", which has been accepted for publication in the International Journal of Disaster Risk Reduction (IJDRR), Elsevier. For more details about the dataset, please refer to our paper.

      Overview: The Amphan dataset is a collection of 2,400 Bengali-language tweets related to Cyclone Amphan, which struck the Bay of Bengal region in May 2020. The dataset aims to support the classification of disaster-related social media content in low-resource languages such as Bengali, with a focus on humanitarian aid and crisis response. To our knowledge, this is the first publicly available dataset of disaster-related social media posts in Bengali. Although it was developed in the context of Cyclone Amphan, the dataset has broader relevance because cyclonic storms frequently affect the region comprising Bangladesh and eastern India.

      Dataset structure: The Amphan dataset in this repository is provided in Excel (.xlsx) format and contains 2,400 tweets (microblogs). It includes the following columns:
        • ID: unique identifier for each tweet
        • Tweet_in_Bengali: original tweet text written in Bengali
        • Tweet_in_EN_IndicTrans: English translation generated by the neural machine translation model IndicTrans
        • Label: one or more humanitarian class labels assigned to the tweet

      Annotation and class labels: The tweets were annotated with a set of humanitarian class labels based on the type of information conveyed. The dataset is multi-label, meaning a tweet can belong to more than one class. The labels are:
        • affected_individual
        • caution_advice_updates
        • displaced_and_evacuations
        • donations_and_volunteering
        • infrastructure_and_utilities_damage
        • injured_or_dead_people
        • missing_and_found_people
        • requests_or_needs
        • response_efforts
        • sympathy_and_support
        • not_humanitarian

      License: The Amphan dataset is ...
    • Relation:
      https://zenodo.org/records/15865419; oai:zenodo.org:15865419; https://doi.org/10.5281/zenodo.15865419
    • Identifier:
      10.5281/zenodo.15865419
    • Online Access:
      https://doi.org/10.5281/zenodo.15865419
      https://zenodo.org/records/15865419
    • Rights:
      Creative Commons Attribution Non Commercial Share Alike 4.0 International ; cc-by-nc-sa-4.0 ; https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode ; © 2025 Copyright held by the owner / author(s).
    • Identifier:
      edsbas.34E70635
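The multi-label annotation described in the abstract can be illustrated with a short sketch. The Python code below is a minimal, hypothetical example and is not part of the dataset release. It assumes that a Label cell holding several classes separates the label names with commas, semicolons, or whitespace; this record does not state the actual delimiter used in the .xlsx file. The helper names parse_labels, to_multi_hot, and label_counts are illustrative. Rows are plain dicts keyed by the column names listed in the abstract. In practice, they could be read from the spreadsheet with pandas, for example pandas.read_excel(path).to_dict("records"), which requires the openpyxl engine for .xlsx files.

```python
import re

# The 11 humanitarian class labels listed in the dataset description.
LABELS = [
    "affected_individual",
    "caution_advice_updates",
    "displaced_and_evacuations",
    "donations_and_volunteering",
    "infrastructure_and_utilities_damage",
    "injured_or_dead_people",
    "missing_and_found_people",
    "requests_or_needs",
    "response_efforts",
    "sympathy_and_support",
    "not_humanitarian",
]
LABEL_INDEX = {name: i for i, name in enumerate(LABELS)}


def parse_labels(cell):
    """Split one Label cell into a list of label names.

    ASSUMPTION (hypothetical): multiple labels are separated by commas,
    semicolons, or whitespace. The real delimiter is not documented here.
    """
    parts = re.split(r"[,;\s]+", str(cell).strip())
    return [p for p in parts if p]


def to_multi_hot(cell):
    """Encode one Label cell as a 0/1 vector over LABELS (multi-label)."""
    vector = [0] * len(LABELS)
    for name in parse_labels(cell):
        if name not in LABEL_INDEX:
            raise ValueError(f"unknown label: {name!r}")
        vector[LABEL_INDEX[name]] = 1
    return vector


def label_counts(rows):
    """Count how many tweets carry each label (each label counted once per tweet)."""
    counts = {name: 0 for name in LABELS}
    for row in rows:
        for i, bit in enumerate(to_multi_hot(row["Label"])):
            counts[LABELS[i]] += bit
    return counts
```

Because a tweet may carry several labels, the encoding uses one binary indicator per class rather than a single class index. The per-class counts can therefore sum to more than the number of tweets.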