Parliamentary spoken corpus of Czech ParlaSpeech-CZ 1.0

  • Additional information
    • Publisher:
      Jožef Stefan Institute
    • Subject:
      2024
    • Collection:
      Linguistic Data and NLP Tools (CLARIN - Common Language Resources and Technology Infrastructure, Slovenia)
    • Abstract:
      The ParlaSpeech-CZ dataset is built from the transcripts of parliamentary proceedings in the Czech part of the ParlaMint corpus and from the parliamentary recordings in the AudioPSP dataset (http://hdl.handle.net/11234/1-5404). The corpus consists of audio segments that correspond to specific sentences in the transcripts. Each transcript contains word-level alignments to the recordings, which makes it simple to split long sentences further into shorter segments for ASR and other memory-sensitive applications. Each segment references the ParlaMint 4.0 corpus (http://hdl.handle.net/11356/1859) via utterance IDs and character offsets. All speaker information from the ParlaMint corpus is available via the "speaker_info" key. Unlike other ParlaSpeech datasets, each instance in this dataset has an additional "sentence_id" key referring to the ParlaMint sentence ID, and each word description has an additional "id" key referring to the ParlaMint word ID. These keys exist because this dataset keeps the original ParlaMint sentence and word segmentation, owing to a different, centralised processing approach. Additionally, an "audio_source" key points to the original audio recording in the AudioPSP dataset.
    • File Description:
      text/plain; charset=utf-8; application/gzip; application/octet-stream; downloadable_files_count: 5
    • Relation:
      https://aclanthology.org/2022.parlaclarin-1.16; https://link.springer.com/chapter/10.1007/978-3-030-83527-9_25; http://hdl.handle.net/11356/1337; http://hdl.handle.net/11356/1785
    • Rights:
      Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) ; https://creativecommons.org/licenses/by-sa/4.0/ ; PUB
    • Identifier:
      edsbas.FFD54059
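The word-level alignments mentioned in the abstract can be used to cut long sentences into shorter, duration-bounded chunks. The following is a minimal sketch under stated assumptions, not the dataset's own tooling. Only the "sentence_id", "audio_source", "speaker_info" and per-word "id" keys come from the description above. The "words", "text", "time_s" and "time_e" field names, as well as all example values, are hypothetical, so check the real schema before adapting it.

```python
from typing import Any

# Hypothetical instance. Only "sentence_id", "audio_source", "speaker_info" and the
# per-word "id" key are named in the dataset description; "words", "text",
# "time_s", "time_e" (seconds) and all values are illustrative assumptions.
example = {
    "sentence_id": "hypothetical-sentence-id",
    "audio_source": "hypothetical-recording",
    "speaker_info": {},
    "words": [
        {"id": "w1", "text": "Vážený", "time_s": 0.00, "time_e": 0.45},
        {"id": "w2", "text": "pane", "time_s": 0.50, "time_e": 0.80},
        {"id": "w3", "text": "předsedo", "time_s": 0.85, "time_e": 1.40},
        {"id": "w4", "text": "dámy", "time_s": 1.60, "time_e": 1.95},
        {"id": "w5", "text": "a", "time_s": 2.00, "time_e": 2.05},
        {"id": "w6", "text": "pánové", "time_s": 2.10, "time_e": 2.70},
    ],
}


def _make_chunk(words: list[dict[str, Any]]) -> dict[str, Any]:
    """Summarise a run of consecutive aligned words as one chunk."""
    return {
        "start": words[0]["time_s"],
        "end": words[-1]["time_e"],
        "text": " ".join(w["text"] for w in words),
        "word_ids": [w["id"] for w in words],
    }


def split_words(words: list[dict[str, Any]], max_seconds: float) -> list[dict[str, Any]]:
    """Greedily group consecutive words into chunks spanning at most max_seconds.

    A single word longer than max_seconds still forms its own chunk.
    """
    chunks: list[dict[str, Any]] = []
    current: list[dict[str, Any]] = []
    for word in words:
        # Close the current chunk if adding this word would exceed the limit.
        if current and word["time_e"] - current[0]["time_s"] > max_seconds:
            chunks.append(_make_chunk(current))
            current = []
        current.append(word)
    if current:
        chunks.append(_make_chunk(current))
    return chunks


if __name__ == "__main__":
    for chunk in split_words(example["words"], max_seconds=1.5):
        print(chunk["start"], chunk["end"], chunk["text"], chunk["word_ids"])
```

Keeping the per-word IDs in each chunk preserves the link back to ParlaMint word IDs after re-segmentation. The greedy strategy is only one simple choice; splitting at the longest pause between words would be an alternative.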