German Speech Corpus aligned with CTC segmentation

Alignments on Librivox and Spoken Wikipedia Corpus (SWC) with CTC segmentation:

Dataset	Length	Speakers	Utterances
SWC	210h	363	78214
Librivox	804h	251	368532

The pre-processed text and alignments are available at https://github.com/lumaku/german-corpus-aligned

Source of the audio files:

SWC: German Spoken Wikipedia Corpus
Librivox: the audio files are identified by the IDs in the metadata file books-German.json. They can be retrieved automatically by ID through the LibriVox API, e.g. https://librivox.org/api/feed/audiobooks/?id=82&format=json , by querying the API and then downloading from the URL it returns. See the Downloads section for the collected mp3 files. Convert to wav with: ffmpeg -i path/to/audio.mp3 -ac 1 -ar 16000 path/to/audio.wav
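
The retrieval and conversion steps for Librivox can be sketched in Python. This is a minimal sketch, not the tooling used to build the corpus: the function names are hypothetical, only the API URL pattern and the ffmpeg arguments are taken from the text above, and ffmpeg is assumed to be installed and on the PATH.

```python
import json
import subprocess
import urllib.parse
import urllib.request

# Endpoint taken from the example URL above.
API_ENDPOINT = "https://librivox.org/api/feed/audiobooks/"


def api_url(book_id):
    """Build the LibriVox API query URL for one audiobook ID."""
    query = urllib.parse.urlencode({"id": book_id, "format": "json"})
    return f"{API_ENDPOINT}?{query}"


def fetch_metadata(book_id, timeout=30):
    """Query the API for one audiobook and return the decoded JSON response."""
    with urllib.request.urlopen(api_url(book_id), timeout=timeout) as response:
        return json.load(response)


def ffmpeg_command(mp3_path, wav_path):
    """Return the ffmpeg call from the text: mono (-ac 1), 16 kHz (-ar 16000)."""
    return ["ffmpeg", "-i", str(mp3_path), "-ac", "1", "-ar", "16000", str(wav_path)]


def convert_to_wav(mp3_path, wav_path):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_command(mp3_path, wav_path), check=True)
```

Calling fetch_metadata(82) requires network access. The names of the fields that hold the download URL are not stated here, so inspect the returned JSON before relying on a particular key.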

See the Downloads section for a pre-trained model.

Downloads

Librivox + SWC alignments
Librivox MP3 bundle in librivox-de-mp3.tar.zst. The full file has sha1sum fff13810471def3c342ee500dc7d2d6b78c9b64a
Pre-trained German ESPnet 1 Transformer model german.transformer.v1.tar.gz; sha1sum a2f2f9a25ca27e4b9b968f175d9d87c305f4b155
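
The checksums listed above can be checked after downloading. The following is a minimal sketch, assuming the files were saved under the names given in the list; the helper names sha1_of_file and verify are hypothetical. The file is read in chunks because the archives are large.

```python
import hashlib

# SHA-1 checksums as listed in the Downloads section.
EXPECTED_SHA1 = {
    "librivox-de-mp3.tar.zst": "fff13810471def3c342ee500dc7d2d6b78c9b64a",
    "german.transformer.v1.tar.gz": "a2f2f9a25ca27e4b9b968f175d9d87c305f4b155",
}


def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA-1 digest matches the expected hex string."""
    return sha1_of_file(path) == expected_hex.lower()
```

For example, verify("librivox-de-mp3.tar.zst", EXPECTED_SHA1["librivox-de-mp3.tar.zst"]) should return True for a complete, uncorrupted download.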

Reference

The full paper is available as a preprint at https://arxiv.org/abs/2007.09127 and in its published version at https://doi.org/10.1007/978-3-030-60276-5_27.

To cite this work:

@InProceedings{ctcsegmentation,
  author="K{\"u}rzinger, Ludwig and Winkelbauer, Dominik and Li, Lujun and Watzel, Tobias and Rigoll, Gerhard",
  editor="Karpov, Alexey and Potapova, Rodmonga",
  title="CTC-Segmentation of Large Corpora for German End-to-End Speech Recognition",
  booktitle="Speech and Computer",
  year="2020",
  publisher="Springer International Publishing",
  address="Cham",
  pages="267--278",
  abstract="Recent end-to-end Automatic Speech Recognition (ASR) systems demonstrated the ability to outperform conventional hybrid DNN/HMM ASR. Aside from architectural improvements in those systems, those models grew in terms of depth, parameters and model capacity. However, these models also require more training data to achieve comparable performance.",
  isbn="978-3-030-60276-5"
}
