
NCHLT Setswana RoBERTa language model

  • Author: Roald Eiselen
  • Source:
    Web; Government Documents
  • Record Type:
    other/unknown material
  • Language:
    Tswana
  • Additional Information
    • Contributors:
      Rico Koen; Albertus Kruger; Jacques van Heerden
    • Publisher:
      North-West University; Centre for Text Technology (CTexT)
    • Publication Year:
      2023
    • Abstract:
      Contextual masked language model based on the RoBERTa architecture (Liu et al., 2019). The model is trained as a masked language model and is not fine-tuned for any downstream task. It can be used either as a masked LM or as an embedding model that provides real-valued vectorised representations of words or string sequences for Setswana text. A short usage sketch follows this record.
    • File Description:
      Training data: 515,961 paragraphs; 14,518,437 tokens; vocabulary size: 30,000; embedding dimensions: 768; 235.79 MB (zipped); application/octet-stream
    • Relation:
      https://hdl.handle.net/20.500.12185/641
    • Electronic Access:
      https://doi.org/20.500.12185/641
      https://hdl.handle.net/20.500.12185/641
    • Rights:
      Creative Commons Attribution 4.0 International (CC-BY 4.0)
    • Identifier:
      edsbas.CA61AAF0
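
To make the two usage modes in the abstract concrete, here is a minimal sketch using the Hugging Face transformers library. It assumes the downloaded archive has been unzipped into a local directory; the directory name, the example sentence, and the choice of transformers as the loading toolkit are assumptions for illustration and are not part of this record.

```python
# Minimal usage sketch for a RoBERTa-style Setswana masked LM, loaded with the
# Hugging Face transformers library. The path "nchlt-setswana-roberta" is a
# placeholder for wherever the downloaded archive is unzipped; it is an
# assumption, not part of the record above.
import torch
from transformers import AutoTokenizer, AutoModel, AutoModelForMaskedLM

model_dir = "nchlt-setswana-roberta"  # hypothetical local path to the unzipped model
tokenizer = AutoTokenizer.from_pretrained(model_dir)

# 1) Masked-LM use: predict candidates for the token hidden behind <mask>.
mlm = AutoModelForMaskedLM.from_pretrained(model_dir)
text = f"Dumela, o tsogile {tokenizer.mask_token}?"  # illustrative Setswana sentence (assumed)
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = mlm(**inputs).logits
mask_positions = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_positions[0]].topk(5).indices
print("Top fillers:", tokenizer.convert_ids_to_tokens(top_ids.tolist()))

# 2) Embedding use: mean-pool the final hidden states to obtain a
#    768-dimensional vector (per the file description) for a word or sentence.
encoder = AutoModel.from_pretrained(model_dir)
enc_inputs = tokenizer("Setswana", return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**enc_inputs).last_hidden_state  # shape: (1, seq_len, 768)
sentence_vector = hidden.mean(dim=1).squeeze(0)
print("Embedding shape:", tuple(sentence_vector.shape))  # -> (768,)
```

Both modes load the same pretrained weights; the only difference is whether the language-modelling head on top of the encoder is used or the raw hidden states are taken as embeddings.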