Leveraging Self-supervised Audio Representations for Data-Efficient Acoustic Scene Classification

Authors: Cai, Yiqiang; Li, Shengchen; Shao, Xi
Subject:
Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
Record type:
Working Paper
Online access:
http://arxiv.org/abs/2408.14862

Additional information
- Publication year:
  2024
- Collection:
  Computer Science
- Abstract:
  Acoustic scene classification (ASC) predominantly relies on supervised approaches. However, acquiring labeled data for training ASC models is often costly and time-consuming. Recently, self-supervised learning (SSL) has emerged as a powerful method for extracting features from unlabeled audio data, benefiting many downstream audio tasks. This paper proposes a data-efficient and low-complexity ASC system by leveraging self-supervised audio representations extracted from general-purpose audio datasets. We introduce BEATs, an audio SSL pre-trained model, to extract the general representations from AudioSet. Through extensive experiments, it has been demonstrated that the self-supervised audio representations can help to achieve high ASC accuracy with limited labeled fine-tuning data. Furthermore, we find that ensembling the SSL models fine-tuned with different strategies contributes to a further performance improvement. To meet low-complexity requirements, we use knowledge distillation to transfer the self-supervised knowledge from large teacher models to an efficient student model. The experimental results suggest that the self-supervised teachers effectively improve the classification accuracy of the student model. Our best-performing system obtains an average accuracy of 56.7%.
  Comment: Accepted by DCASE Workshop 2024
- Accession number:
  edsarx.2408.14862
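
The abstract describes distilling knowledge from an ensemble of fine-tuned SSL teachers into a small student model, but this record does not give the loss function. The sketch below illustrates one common formulation (temperature-softened teacher targets combined with a hard-label cross-entropy term), not the authors' implementation. The function names, the averaging of teacher probabilities, and the `temperature` and `alpha` values are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of class logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_teacher_probs(teacher_logits, temperature=1.0):
    """Average the softened predictions of several teachers (assumed scheme)."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    n_teachers = len(probs)
    n_classes = len(probs[0])
    return [sum(p[i] for p in probs) / n_teachers for i in range(n_classes)]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * cross-entropy(label) + (1 - alpha) * T^2 * KL(teacher || student).

    temperature and alpha are hypothetical defaults, not values from the paper.
    """
    # Hard-label term: standard cross-entropy at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL divergence between softened distributions.
    p_teacher = ensemble_teacher_probs(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0)
    return alpha * hard + (1 - alpha) * temperature ** 2 * kl
```

Here `teacher_logits` is a list with one logit vector per teacher for a single audio clip. When the student's logits match a single teacher exactly, the KL term vanishes and only the weighted hard-label term remains.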
