Automatic Speech Recognition has made great strides in recent years thanks to advances in Deep Learning, and on specific tasks it is reaching human parity. The algorithms making this possible are published by researchers, and the Deep Learning frameworks of big tech companies are open source.
Our mission is to make it as easy as possible to take advantage of these developments tailored to your business needs.
There are numerous benefits of having a custom Automatic Speech Recognition System that is tailored to your needs.
For one, no audio data ever has to leave your company or device - data which may contain sensitive personal or corporate information.
Furthermore, an Automatic Speech Recognition System trained for your use case will typically perform better than a general-purpose system. Sometimes it's enough to train a domain-specific language model, which can be trained in an unsupervised manner from existing texts. Sometimes it's necessary to train an acoustic model as well; for this, a speech corpus consisting of audio recordings together with their manually created transcriptions is required.
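To make the language-model step concrete, the following minimal sketch estimates an add-one smoothed bigram model from plain domain text, with no transcribed audio involved. The file name is a placeholder, and a production setup would use a dedicated toolkit such as KenLM or SRILM to build a properly smoothed ARPA model.

```python
# Minimal sketch: estimate an add-one smoothed bigram language model
# from raw domain text (no transcribed audio needed).
# "domain_corpus.txt" is a placeholder for your own in-domain texts.
from collections import Counter
from math import log

unigrams, bigrams = Counter(), Counter()
with open("domain_corpus.txt", encoding="utf-8") as f:
    for line in f:
        tokens = ["<s>"] + line.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

vocab_size = len(unigrams)

def bigram_logprob(prev, word):
    """Add-one smoothed log probability P(word | prev)."""
    return log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))

# Example: score a short in-domain phrase.
phrase = ["<s>", "please", "open", "the", "valve", "</s>"]
score = sum(bigram_logprob(p, w) for p, w in zip(phrase, phrase[1:]))
print(f"log probability: {score:.2f}")
```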
You have complete control over what data goes into the training of your model, and how it should behave.
But not only the acoustic and language models can be tailored to your needs - the model architecture can also be adapted to constraints on available computing power. Clearly, a full-fledged server has more computing resources than a small single-board computer such as a Raspberry Pi.
Most cloud-based Speech To Text systems charge per time unit of transcribed audio. Training a custom Speech To Text system, on the other hand, requires an upfront investment, but in the long run a domain-specific system can turn out to be cheaper.
The use cases for Speech To Text technology can be broadly divided into command-and-control tasks and large-vocabulary transcription tasks.
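For the command-and-control case, the following sketch spots a single key phrase with the pocketsphinx Python bindings. The phrase and threshold are illustrative, and the API shown follows the older pocketsphinx-python package, so it may differ in newer releases.

```python
# Sketch: keyword spotting for a command-and-control use case with the
# pocketsphinx Python bindings. The key phrase and threshold are examples;
# the API shown here follows the older pocketsphinx-python package and may
# differ in newer releases.
from pocketsphinx import LiveSpeech

speech = LiveSpeech(
    lm=False,                        # no full language model needed
    keyphrase='turn on the light',   # the single command to spot
    kws_threshold=1e-20,             # detection threshold, tune per use case
)
for phrase in speech:
    print("command detected:", phrase)
```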
Choose among different speech and text corpora to create models tailored to your use case, e.g. for phone calls, distant microphone recordings, or television news.
We have created and are expanding a German pronunciation lexicon containing more than 370k words and their phonetic representations. An English lexicon based on CMU Dict is also available.
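Such lexica are typically plain text with one entry per line, a word followed by its phoneme sequence, as in CMU Dict. The sketch below assumes that format and a placeholder file name and loads the lexicon into a Python dictionary.

```python
# Sketch: load a pronunciation lexicon stored in the common
# "WORD PH1 PH2 ..." plain-text format (one entry per line, as in CMU Dict).
# The file name and the exact format are assumptions about the lexicon.
def load_lexicon(path):
    lexicon = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith(";;;"):  # skip comment lines
                continue
            parts = line.split()
            if len(parts) < 2:          # skip empty or malformed lines
                continue
            word, phonemes = parts[0], parts[1:]
            lexicon.setdefault(word, []).append(phonemes)  # allow variants
    return lexicon

lexicon = load_lexicon("german_lexicon.txt")
print(lexicon.get("hallo"))
```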
Directly use pre-trained Kaldi or CMU Sphinx models for your own ASR projects - available here for English and German.
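As an example of using a pre-trained CMU Sphinx model, the sketch below decodes a raw audio file with the pocketsphinx Python bindings. The model and audio paths are placeholders, and the configuration API shown follows the classic bindings, so it may differ in newer pocketsphinx releases.

```python
# Sketch: offline decoding with a pre-trained CMU Sphinx model using the
# pocketsphinx Python bindings. Model and audio paths are placeholders.
from pocketsphinx import Decoder

config = Decoder.default_config()
config.set_string('-hmm', 'models/de/acoustic')         # acoustic model dir
config.set_string('-lm', 'models/de/language.lm.bin')   # language model
config.set_string('-dict', 'models/de/lexicon.dict')    # pronunciation lexicon
decoder = Decoder(config)

# Decode 16 kHz, 16-bit mono PCM audio in small chunks.
decoder.start_utt()
with open('recording.raw', 'rb') as audio:
    while True:
        buf = audio.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
decoder.end_utt()
print('transcript:', decoder.hyp().hypstr)
```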
Use our scripts and pronunciation lexica to train grapheme-to-phoneme models, or download and use a pre-trained model.
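One common way to do this is with the Sequitur G2P command-line tool; the sketch below, in which the toolkit choice and file names are assumptions, trains a first model on a pronunciation lexicon and applies it to words missing from the lexicon.

```python
# Sketch: train and apply a grapheme-to-phoneme model by calling the
# Sequitur G2P command-line tool (g2p.py). The tool choice, flags, and file
# names are assumptions; consult the toolkit's documentation for details.
import subprocess

# Train a first-order G2P model on a pronunciation lexicon,
# holding out 5% of the entries for validation.
subprocess.run(
    ["g2p.py", "--train", "lexicon.txt", "--devel", "5%",
     "--write-model", "g2p-model-1"],
    check=True,
)

# Predict pronunciations for words that are not in the lexicon.
subprocess.run(
    ["g2p.py", "--model", "g2p-model-1", "--apply", "unknown_words.txt"],
    check=True,
)
```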
The scripts form a processing pipeline that supports several speech corpora out of the box: VoxForge (English and German), German Speechdata Package V2, LibriSpeech, Forschergeist (German), and Zamia (German).
Prepare LibriVox data for training acoustic models. Help grow open speech corpora by efficiently correcting existing transcripts of spontaneous speech, for example from podcasts.
This project encompasses two original German speech corpora: Forschergeist and Zamia. The former contains spontaneous speech, the latter audio recordings of interactions between humans and smart speakers.
Use our pre-trained models for CMU Sphinx and Kaldi to do speech recognition on resource-constrained systems, e.g. a Raspberry Pi.
Create noisy versions of training data or re-encode audio data to simulate different environments, e.g. a noisy background or different telephone codecs.
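A minimal sketch of the idea, assuming the soundfile and scipy packages, a mono recording, and placeholder file names: mix white noise into a clean recording at a target signal-to-noise ratio and downsample to 8 kHz to roughly approximate a telephone channel. Real pipelines typically use tools such as sox or ffmpeg and recorded background noise rather than synthetic white noise.

```python
# Sketch: create a noisy, telephone-bandwidth version of a clean recording.
# Assumes the soundfile and scipy packages and a mono input file; the file
# names and the 10 dB SNR target are placeholders, and white noise stands
# in for real background noise.
import numpy as np
import soundfile as sf
from scipy.signal import resample_poly

clean, rate = sf.read("clean.wav")   # clean training utterance (mono)
snr_db = 10.0                        # target signal-to-noise ratio

# Scale white noise so that the mixture has the desired SNR.
noise = np.random.randn(len(clean))
signal_power = np.mean(clean ** 2)
noise_power = signal_power / (10 ** (snr_db / 10))
noisy = clean + noise * np.sqrt(noise_power / np.mean(noise ** 2))

# Downsample to 8 kHz to roughly simulate telephone bandwidth.
telephone = resample_poly(noisy, 8000, rate)
sf.write("noisy_telephone.wav", telephone, 8000)
```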