Overview
Speech technology is increasingly used to build highly interactive, voice-activated applications. From voice control and smart assistants to speech transcription and translation, closed captioning, and language learning, improvements in the accuracy and processing speed of this technology are raising the quality of applications and delivering better user experiences.
In this course, we use the open-source Sphinx toolkit (aka CMU Sphinx) to demonstrate and model various types of speech-enabled applications. By the end of the course, participants should have a solid grasp of the tools and techniques needed to apply speech technology to their own applications. Sphinx 4 will be the basis for this training; however, coverage of Sphinx 3 can also be arranged.
Audience
- Software developers and programmers
Format of the course
- Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding
Requirements
- An understanding of the fundamentals of speech technology
- Programming experience, especially Java for Sphinx 4 and C for PocketSphinx
Course Outline
Introduction to speech and speech technology
Downloading and building Sphinx
Preparing packages and models
Eclipse IDE setup
Overview of the CMU Sphinx toolkit
Building an application with Sphinx 4
Building the dictionary
Building the language model
Adapting an existing acoustic model
Building an acoustic model
Building an Android application with PocketSphinx
Performance tuning
Summary and conclusion