Page 7 - Topbox Proposal for Marriott - Final
A commonly cited disadvantage is that the words in a search term need to be in the dictionary in order to be found by the transcription search engine. This is true, and it is a hindrance in use cases where there is no way to predict what will be said on a call or, more to the point, which parts of the call you will care about. For business analytics, however, users typically know what they are looking for: there are a finite number of reasons a consumer will call a supplier, and most CX and Customer Care professionals can detail what those call drivers are.
Another benefit of LVCSR is that you can use word substitution to assist when call recordings are low quality or background noise obscures pronunciation, for example by combining words (e.g. “see Alice” for “cialis”) or by using “sounds-like” approaches to word matching (e.g. “Horizon” and “Verizon”).
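As a rough illustration of the two techniques just described, the sketch below expands a search term with substitute phrases and flags near matches. It is not how any particular speech analytics product works: the `SUBSTITUTIONS` table, the function names, and the 0.7 threshold are assumptions made for this example, and the spelling-similarity ratio from Python's `difflib` is only a crude stand-in for a true phonetic comparison.

```python
from difflib import SequenceMatcher

# Hypothetical substitution table: search term -> word sequences a
# recognizer might emit instead when the audio is noisy.
SUBSTITUTIONS = {"cialis": ["see alice"]}


def sounds_like(a, b, threshold=0.7):
    """Spelling-based approximation of a "sounds-like" check (assumed threshold)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def find_term(transcript, term):
    """Return the strings in the transcript that match the term, one of its
    substitute phrases, or a similar-looking single word."""
    text = transcript.lower()
    term = term.lower()
    variants = [term] + SUBSTITUTIONS.get(term, [])
    hits = [v for v in variants if v in text]
    near = [w for w in text.split() if w not in hits and sounds_like(w, term)]
    return hits + near


print(find_term("the customer asked about see alice pricing", "cialis"))
print(sounds_like("Horizon", "Verizon"))
```

Here the substitution table catches “see alice” for a “cialis” query, while the similarity check treats “Horizon” and “Verizon” as close enough to review together.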
The initial processing of the audio takes somewhat longer than with a phonetic approach because of the large vocabulary that has to be analyzed, but searching afterwards is much faster and more accurate. If you are responsible for Customer Experience and are trying to uncover the root causes of a sudden spike in call traffic, or if there is an issue with one of your self-service channels (or perhaps a product issue), it’s imperative that you find that out as soon as possible. Faster search speeds mean you can comb through more customer calls in less time and get to the heart of the issue quickly, at lower staffing cost.
With an LVCSR approach, the larger context in which the sounds occur is taken into account as well. This compensates for the fact that some sounds are very ambiguous and tend to merge with neighboring sounds (e.g. “dish soap”), and that the same sequence of sounds can correspond to different word sequences: “let us pray” vs. “lettuce spray”. The LVCSR approach algorithmically determines which alternatives are more likely, letting the computer do most of the work.
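One common way to decide between such alternatives is to score each candidate word sequence with a statistical language model. The sketch below is a toy version of that idea, not the method of any specific engine: the counts are made up for illustration, the function names are hypothetical, and a real recognizer would combine this kind of score with acoustic evidence.

```python
import math

# Made-up counts standing in for a language model trained on real text.
UNIGRAM_COUNTS = {"let": 100, "us": 80, "pray": 25, "lettuce": 10, "spray": 5}
BIGRAM_COUNTS = {("let", "us"): 50, ("us", "pray"): 20, ("lettuce", "spray"): 1}
TOTAL_WORDS = sum(UNIGRAM_COUNTS.values())
VOCAB_SIZE = len(UNIGRAM_COUNTS)


def log_score(words):
    """Log-probability of a word sequence under an add-one smoothed bigram model."""
    first = words[0]
    total = math.log((UNIGRAM_COUNTS.get(first, 0) + 1) / (TOTAL_WORDS + VOCAB_SIZE))
    for prev, cur in zip(words, words[1:]):
        numerator = BIGRAM_COUNTS.get((prev, cur), 0) + 1
        denominator = UNIGRAM_COUNTS.get(prev, 0) + VOCAB_SIZE
        total += math.log(numerator / denominator)
    return total


def pick_best(candidates):
    """Return the candidate transcription with the highest language-model score."""
    return max(candidates, key=lambda c: log_score(c.split()))


print(pick_best(["lettuce spray", "let us pray"]))
```

With these illustrative counts, “let us pray” scores higher than “lettuce spray” because its words are individually and jointly more frequent.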
LVCSR typically has much higher precision, since its results are more likely to contain the words that were actually said, but lower recall due to unusual words or recognition errors. To compensate for this, an LVCSR approach provides a transcript of the words around the key term, allowing users to visually skim the “snippet” and determine whether it is relevant. In addition, the fact that there is an actual transcript of the conversation allows for automated analysis of the frequency of various words and phrases. This can reveal trends and metrics that the system hasn’t been specifically told to look for.
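Both ideas, the skimmable snippet and the frequency analysis, are simple to express once a text transcript exists. The following is a minimal sketch under assumed inputs: the function names, the window size, and the small stopword list are all choices made for this example, not features of a specific product.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real deployment would use a fuller one.
STOPWORDS = frozenset({"the", "a", "my", "is", "i", "to", "and", "was"})


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def snippets(transcript, term, window=3):
    """Return each occurrence of the term with `window` words of context on either side."""
    words = tokenize(transcript)
    term = term.lower()
    return [
        " ".join(words[max(0, i - window): i + window + 1])
        for i, word in enumerate(words)
        if word == term
    ]


def top_terms(transcripts, n=3):
    """Count non-stopword frequencies across transcripts and return the n most common."""
    counts = Counter(
        word for t in transcripts for word in tokenize(t) if word not in STOPWORDS
    )
    return counts.most_common(n)


print(snippets("I want to cancel my order because the delivery was late", "cancel", window=2))
print(top_terms(["my bill is wrong", "the bill is late", "cancel my bill"], n=1))
```

The frequency count needs no search term at all, which is how a recurring word such as “bill” can surface as a trend without anyone asking for it.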
So Which Speech Recognition System is the Best?
The final decision comes down to what the “best fit” is for your business problem in terms of cost, value, and manual effort. In general, phonetics is an appropriate fit for search-seldom applications. For larger enterprise applications, the LVCSR approach to speech analytics is better suited. In the grand scheme of things, using phonetics is a bit like using cassette tapes in an era of MP3s: it still works and has niche benefits, but for the most part the technology is outdated and there are more powerful options available to contact centers.
CONFIDENTIAL 7