German Speech Corpus aligned with CTC segmentation

Alignments on Librivox and Spoken Wikipedia Corpus (SWC) with CTC segmentation:

Dataset    Length   Speakers   Utterances
SWC        210h     363        78214
Librivox   804h     251        368532
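
To give a feel for the granularity of the alignments, the figures in the table can be turned into rough per-utterance and per-speaker averages. The following is a minimal sketch, not part of the corpus tooling: the `CorpusStats` class and `summarize` function are hypothetical names introduced here for illustration, and because the lengths are rounded to whole hours, the results are approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusStats:
    """Figures copied from the table above (hypothetical helper class)."""
    name: str
    hours: float       # total audio length, rounded to whole hours
    speakers: int
    utterances: int

    @property
    def avg_utterance_seconds(self) -> float:
        # Mean utterance duration in seconds.
        return self.hours * 3600 / self.utterances

    @property
    def utterances_per_speaker(self) -> float:
        # Mean number of aligned utterances per speaker.
        return self.utterances / self.speakers


CORPORA = [
    CorpusStats("SWC", 210, 363, 78214),
    CorpusStats("Librivox", 804, 251, 368532),
]


def summarize(corpora):
    """Map each corpus name to (avg seconds per utterance, utterances per speaker)."""
    return {
        c.name: (round(c.avg_utterance_seconds, 1), round(c.utterances_per_speaker))
        for c in corpora
    }


if __name__ == "__main__":
    for name, (seconds, per_speaker) in summarize(CORPORA).items():
        print(f"{name}: ~{seconds} s per utterance, ~{per_speaker} utterances per speaker")
```

Under these assumptions, the average utterance is roughly 9.7 seconds in SWC and roughly 7.9 seconds in Librivox.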


The pre-processed text and alignments can be found on 

Source of the audio files:

See the Downloads section for a pre-trained model.



The full paper is available as a preprint or in its published version at

