Part III – Speech Recognition

Speech recognition, the ability of a computer to understand human speech, started about 30 years ago and could initially recognize only the digits 0 through 9 and “yes” or “no.” Because this preceded the widespread use of touch-tone phones, the early technology allowed callers with rotary dial phones to access database information through IVRs by speaking rather than by using touch-tone input.

As computer processors and memory became faster and cheaper, and as speech technology advanced, computers gained the ability to recognize larger and larger vocabularies with increasing accuracy. Three companies emerged, each using technology originally developed at Stanford, MIT, or other universities. Today the technology can understand complete thoughts, such as ordering airline tickets by saying “I want to fly from Boston to San Francisco on November 22nd in the afternoon.”

Another enabling technology is text to speech (TTS), which allows computers to read text and respond with spoken speech. In early implementations of TTS, known as speech synthesis, the output sounded very much like a computer. Again, thanks to increases in computer processing power, today's output is called concatenated speech, which is built by combining phonemes recorded by a human speaker to form words. Phonemes are the smallest units of sound in human speech. English has about 40 phonemes.
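For readers who like to see an idea in code, here is a minimal Python sketch of concatenation: look up the phonemes for a word, then join the recording for each phoneme in order. Everything in it is an assumption made for illustration. The two-word dictionary, the phoneme symbols, and the number lists standing in for recorded audio are all made up, and a real TTS engine does far more work to make the joins sound natural.

```python
# Hypothetical pronunciation dictionary: word -> ordered phoneme symbols.
# The symbols are illustrative, not taken from any particular TTS product.
LEXICON = {
    "yes": ["Y", "EH", "S"],
    "no": ["N", "OW"],
}

# Stand-in for human recordings: each phoneme maps to a few made-up samples.
# A real system would store actual audio captured from a speaker.
RECORDINGS = {
    "Y": [1, 2],
    "EH": [3, 4, 5],
    "S": [6],
    "N": [7, 8],
    "OW": [9, 10, 11],
}


def synthesize(word):
    """Return the samples for a word by joining its phoneme recordings in order."""
    samples = []
    for phoneme in LEXICON[word.lower()]:
        samples.extend(RECORDINGS[phoneme])
    return samples


print(synthesize("yes"))  # [1, 2, 3, 4, 5, 6]
```

The point of the sketch is only the lookup-and-join step: a small inventory of recorded sounds can be reused to build any word that appears in the dictionary.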

In my next blog I’ll discuss Computer Telephony Integration (CTI) and skills-based routing, which allow a caller’s information to travel with the call and the call to be routed to the best available agent.

