

Open tools and data for cloudless automatic speech recognition



Automatic Speech Recognition has made big leaps forward in recent years thanks to advances in Deep Learning. On specific tasks it is reaching human parity. The algorithms making this possible are published by researchers, and the big tech companies' Deep Learning frameworks are open source.

Our mission is to make it as easy as possible to take advantage of these developments, tailored to your business needs.

Keep Ownership of your Data

There are numerous benefits to having a custom Automatic Speech Recognition system that is tailored to your needs.

No Cloud

For one, no (audio) data ever has to leave your company or device - data which can potentially carry sensitive personal or corporate information.

Domain Specific Models

Furthermore, an Automatic Speech Recognition system trained for your use case will typically perform better than a general-purpose system. Sometimes it is enough to train a domain-specific language model, which can be built in an unsupervised manner from existing texts. Sometimes it is also necessary to train an acoustic model; this requires a speech corpus consisting of audio recordings together with manually created transcriptions.
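
As a rough illustration of the language-model part, here is a minimal sketch that estimates a domain-specific 3-gram model from in-house text. It assumes KenLM's lmplz tool is installed; the file names are placeholders, and this is not one of the project's own scripts.

    # Sketch: build a domain-specific 3-gram language model from existing text.
    # Assumes KenLM's `lmplz` is on PATH; file names are placeholders.
    import subprocess

    def build_ngram_lm(corpus_txt, arpa_out, order=3):
        """Estimate an n-gram language model in ARPA format from a plain-text corpus."""
        with open(corpus_txt, "rb") as src, open(arpa_out, "wb") as dst:
            subprocess.run(["lmplz", "-o", str(order)], stdin=src, stdout=dst, check=True)

    if __name__ == "__main__":
        build_ngram_lm("corpus.txt", "domain.arpa")

The resulting ARPA file can then be used by the decoder in place of a general-purpose language model.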

Stay in Control of your Model

You have complete control over what data goes into the training of your model, and how it should behave.

Fine-tuned Models for your Environment

Not only can the acoustic and language models be tailored to your needs - the underlying model architecture can also be adapted to constraints on available computing power. Clearly, a full-fledged server has more computing resources than a small single-board computer such as a Raspberry Pi.

Cheaper than Cloud Services

Most cloud-based Speech To Text systems charge per time unit of transcribed audio. Training a custom Speech To Text system, on the other hand, requires an upfront investment. But in the long run a domain-specific system can turn out to be cheaper.

Use Cases

The use cases for Speech To Text technology can be broadly divided into command and control tasks and large vocabulary transcription tasks.

Command & Control

Automotive

Interactive Voice Response (IVR)

Assistive Tech for the Elderly or Persons with Disabilities

Large Vocabulary Transcription

Generation of Subtitles

Transcription of Audio Archives

Education

Transcription of Telephone Calls

Medical Documentation

Transcription of Meetings

What we have to offer

Customized Models

Choose among different speech and text corpora to create models tailored to your use case, e.g. for phone calls, distant microphone recordings, or television news.

Pronunciation Lexica

We have created and are expanding a German pronunciation lexicon containing more than 370k words and their phonetic representations. Furthermore, an English lexicon based on CMU Dict is also available.
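
As a small illustration, the sketch below loads a lexicon in the common plain-text format (one entry per line: a word followed by its whitespace-separated phonemes). The file name is a placeholder for whichever lexicon you download.

    # Sketch: load a pronunciation lexicon (one entry per line: WORD PH1 PH2 ...).
    # The file name is a placeholder; comment lines starting with ";;;" are skipped.
    from collections import defaultdict

    def load_lexicon(path):
        """Map each word to a list of its pronunciation variants."""
        lexicon = defaultdict(list)
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.split()
                if not parts or parts[0].startswith(";;;"):
                    continue
                lexicon[parts[0]].append(parts[1:])
        return lexicon

    lex = load_lexicon("german-lexicon.txt")
    print(lex.get("hallo"))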

Pre-trained ASR Models

Directly use pre-trained Kaldi or CMU Sphinx models for your own ASR projects - available here for English and German.
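
As a rough sketch of how such a model could be used, the snippet below decodes a raw 16 kHz mono audio file with the classic pocketsphinx Python bindings; all paths are placeholders, and the exact API may differ between pocketsphinx versions.

    # Sketch: decode raw 16 kHz mono audio with a pre-trained CMU Sphinx model.
    # Assumes the classic pocketsphinx Python bindings; all paths are placeholders.
    from pocketsphinx import Decoder

    config = Decoder.default_config()
    config.set_string("-hmm", "model/acoustic")   # acoustic model directory
    config.set_string("-lm", "model/lm.bin")      # language model
    config.set_string("-dict", "model/dict.dic")  # pronunciation dictionary
    decoder = Decoder(config)

    decoder.start_utt()
    with open("utterance.raw", "rb") as audio:
        while True:
            buf = audio.read(1024)
            if not buf:
                break
            decoder.process_raw(buf, False, False)
    decoder.end_utt()

    hyp = decoder.hyp()
    print(hyp.hypstr if hyp else "<no result>")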

Grapheme to Phoneme Models

Use our scripts and pronunciation lexica to train grapheme-to-phoneme models or download and use a pre-trained model.
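
As a rough sketch of what such a workflow can look like, the snippet below trains and applies a grapheme-to-phoneme model with Sequitur G2P - one possible toolkit, not necessarily the one these scripts use. File and model names are placeholders.

    # Sketch: train and apply a grapheme-to-phoneme model with Sequitur G2P.
    # Assumes `g2p.py` from Sequitur is installed; file names are placeholders.
    import subprocess

    # Train a first-order model from a pronunciation lexicon (word + phonemes per line).
    subprocess.run(["g2p.py", "--train", "lexicon.txt", "--devel", "5%",
                    "--write-model", "g2p-model-1"], check=True)

    # Predict pronunciations for words that are missing from the lexicon.
    subprocess.run(["g2p.py", "--model", "g2p-model-1", "--apply", "missing-words.txt"],
                   check=True)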

Supported Speech Corpora

The scripts form a processing pipeline that supports several speech corpora out of the box: VoxForge (English and German), German Speechdata Package V2, LibriSpeech, Forschergeist (German), and Zamia (German).

Grow Open Speech Corpora

Prepare LibriVox data for training acoustic models. Help grow open speech corpora by efficiently correcting existing transcripts of spontaneous speech, for example from podcasts.

Original Speech Corpora

This project encompasses two original German speech corpora: Forschergeist and Zamia. The former contains spontaneous speech, the latter audio recordings of interactions between humans and smart speakers.

Embedded Speech Recognition

Use our pre-trained models for CMU Sphinx and Kaldi to do speech recognition on resource-constrained systems, e.g. a Raspberry Pi.

Remix Speech Corpora

Create noisy versions of training data or re-encode audio data to simulate different environments, e.g. a noisy background or different telephone codecs.
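
As a minimal sketch of this kind of remixing, assuming SoX is installed, the snippet below derives a telephone-style 8 kHz copy and a noisy copy of a recording. The file names and the mixing gain are placeholders, and the noise file is assumed to match the speech file's sample rate and channel count.

    # Sketch: derive remixed training audio with SoX (assumed to be installed).
    import subprocess

    def to_telephone(src, dst):
        """Downmix to mono and resample to 8 kHz to approximate telephone audio."""
        subprocess.run(["sox", src, "-r", "8000", "-c", "1", dst], check=True)

    def add_noise(speech, noise, dst, noise_gain="0.3"):
        """Mix a scaled-down noise recording into the speech signal."""
        subprocess.run(["sox", "-m", speech, "-v", noise_gain, noise, dst], check=True)

    to_telephone("clean.wav", "telephone.wav")
    add_noise("clean.wav", "cafe-noise.wav", "noisy.wav")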


