NCHLT Setswana RoBERTa language model
- Authors: Roald Eiselen
- Source:
Web ; Government Documents
- Record type:
other/unknown material
- Language:
Tswana
- Additional information:
- Contributors:
Rico Koen; Albertus Kruger; Jacques van Heerden
- Publication details:
North-West University; Centre for Text Technology (CTexT)
- Subject:
2023
- Abstract:
Contextual masked language model based on the RoBERTa architecture (Liu et al., 2019). The model is trained as a masked language model and is not fine-tuned for any downstream task. It can be used both as a masked LM and as an embedding model that provides real-valued vector representations of words or string sequences for Setswana text (see the usage sketch after this record).
- File Description:
Training data: Paragraphs: 515,961; Token count: 14,518,437; Vocab size: 30,000; Embedding dimensions: 768; 235.79MB (Zipped); application/octet-stream
- Relation:
https://hdl.handle.net/20.500.12185/641
- Electronic access:
https://doi.org/20.500.12185/641
https://hdl.handle.net/20.500.12185/641
- Rights:
Creative Commons Attribution 4.0 International (CC-BY 4.0)
- Identifier:
edsbas.CA61AAF0
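
The abstract describes two uses of the model: masked-token prediction and sentence/word embeddings. The sketch below illustrates both, assuming the downloaded archive unzips to a Hugging Face `transformers`-compatible RoBERTa checkpoint; the local directory name, the example Setswana sentences, and the mean-pooling step are illustrative assumptions, not documented in this record.

```python
# Sketch only: assumes the unzipped archive is a Hugging Face-format RoBERTa
# checkpoint (config, tokenizer files, weights). The directory name is hypothetical.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

MODEL_DIR = "./nchlt-setswana-roberta"  # hypothetical path to the unzipped model

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForMaskedLM.from_pretrained(MODEL_DIR)

# 1) Masked-LM use: fill in a masked token in a Setswana sentence.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask(f"Dumela, o tsogile {tokenizer.mask_token}?"))

# 2) Embedding use: mean-pool the last hidden layer to obtain a 768-dimensional
#    sentence vector (matching the embedding dimensions stated in the record).
inputs = tokenizer("Dumela lefatshe", return_tensors="pt")
with torch.no_grad():
    last_hidden = model(**inputs, output_hidden_states=True).hidden_states[-1]
sentence_vector = last_hidden.mean(dim=1)  # shape: (1, 768)
print(sentence_vector.shape)
```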