VoxVietnam: a Large-Scale Multi-Genre Dataset for Vietnamese Speaker Recognition

Authors: Vu, Hoang Long; Dat, Phuong Tuan; Nhi, Pham Thao; Hao, Nguyen Song; Trang, Nguyen Thi Thu
Subject:
Computer Science - Sound; Computer Science - Computation and Language; Electrical Engineering and Systems Science - Audio and Speech Processing
Record type:
Working Paper
Online access:
http://arxiv.org/abs/2501.00328

Additional information
- Subject:
  2024
- Collection:
  Computer Science
- Abstract:
  Recent research in speaker recognition aims to address vulnerabilities caused by variations between enrolment and test utterances, particularly the multi-genre phenomenon, in which the utterances belong to different speech genres. Previous resources for Vietnamese speaker recognition are either limited in size or do not focus on genre diversity, leaving multi-genre effects unexplored. This paper introduces VoxVietnam, the first multi-genre dataset for Vietnamese speaker recognition, with over 187,000 utterances from 1,406 speakers, together with an automated pipeline for constructing a large-scale dataset from public sources. Our experiments show the challenges that the multi-genre phenomenon poses to models trained on a single-genre dataset, and demonstrate a significant increase in performance when VoxVietnam is incorporated into multi-genre training.
  Comment: Accepted to 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025)
- Identifier:
  edsarx.2501.00328
