Page 6 - Topbox Proposal for Marriott - Final

Speech analytics is one arrow in the quiver when it comes to CX analytics, but it is a valuable one because it can be used to diagnose and address customer service issues, identify efficiency gains, uncover agent training gaps, and more. There are two approaches to speech analytics: phonetics and LVCSR. So which is better for gathering insights from your customer interactions?
Phonetics
Phonetic speech analytics pre-processes the audio into a sequence of phonemes (the individual sounds of speech) and maps that sequence to the possible words it could represent.
Pros and Cons of Phonetics Systems
The primary advantage of a phonetic approach is that words do not have to be in a predefined vocabulary to be found, provided the phonemes are recognizable. For instance, when searching for the name of the drug “cialis”, the term may still be found in the recording if that sequence of phonemes exists: “S IY AH L IH S”. The disadvantage is that, because other words and phrases contain the same phoneme sequence, the term may be found in many places where it was never actually said, for instance if the actual words were “I don’t know, I’d have to see a list” or “See Alice, I told you so”.
Phonetics also offers a faster upfront processing rate (turning the phone calls into data), mostly because the “vocabulary” being used is very small: phonetics relies only on the sounds of the language, and most languages have relatively few unique phonemes. However, the search process, that is, actually using a phonetics-based application, is much slower, because phonemes cannot be indexed as efficiently as whole words.
Phonetics also requires a larger storage footprint, since each word is stored as several phonemes (an average of 4 per word), which may be an issue for large-scale applications. In addition, phonetic systems do not take into account any higher-level knowledge of the language, meaning the “most likely” phrases cannot be filtered correctly, which requires more manual work later.
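The false-positive behavior described above can be illustrated with a minimal sketch. The tiny pronunciation lexicon, its simplified phoneme spellings, and the helper names `to_phonemes` and `phonetic_search` are all assumptions made for illustration; a real phonetic engine works on audio and scores approximate matches rather than comparing exact symbols.

```python
# Toy pronunciation lexicon (assumed, simplified phoneme spellings).
TOY_LEXICON = {
    "cialis": ["S", "IY", "AH", "L", "IH", "S"],
    "i": ["AY"],
    "would": ["W", "UH", "D"],
    "have": ["HH", "AE", "V"],
    "to": ["T", "UW"],
    "see": ["S", "IY"],
    "a": ["AH"],
    "list": ["L", "IH", "S", "T"],
    "take": ["T", "EY", "K"],
    "daily": ["D", "EY", "L", "IY"],
}


def to_phonemes(sentence):
    """Flatten a sentence into one phoneme stream, losing word boundaries."""
    phonemes = []
    for word in sentence.lower().split():
        phonemes.extend(TOY_LEXICON[word])
    return phonemes


def phonetic_search(term, sentence):
    """Return True if the term's phoneme sequence occurs anywhere in the stream."""
    needle = TOY_LEXICON[term]
    stream = to_phonemes(sentence)
    n = len(needle)
    return any(stream[i:i + n] == needle for i in range(len(stream) - n + 1))


# The term was really spoken here.
said = phonetic_search("cialis", "i take cialis daily")
# The term was never spoken, but "see a list" contains the same phonemes.
false_positive = phonetic_search("cialis", "i would have to see a list")
```

In this sketch both searches report a hit, because the phoneme stream no longer records where one word ends and the next begins.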
LVCSR Speech Analytics
Large Vocabulary Continuous Speech Recognition (LVCSR) begins by recognizing phonemes much like a phonetic system, but then applies a dictionary or language model of potentially 100,000 words and phrases to produce a full transcript. In LVCSR every word is recognized and nothing is thrown away or skipped. While recognizing the full transcript, and not just the individual phonemes, initially requires more processing power than phonetic-only recognition, the resulting transcript makes it much easier and faster for contact centers to search and use the gathered information.
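One reason a full transcript is faster to search is that whole words can be placed in an inverted index, so a lookup does not have to scan every call. The following is a minimal sketch of that idea only; the call IDs, the sample transcripts, and the `build_index` helper are hypothetical and do not describe any particular product.

```python
from collections import defaultdict


def build_index(transcripts):
    """Map each word to the set of call IDs whose transcript contains it."""
    index = defaultdict(set)
    for call_id, text in transcripts.items():
        for word in text.lower().split():
            index[word].add(call_id)
    return index


# Hypothetical transcripts, invented for illustration.
transcripts = {
    "call-001": "i would like to cancel my reservation",
    "call-002": "can i change my reservation to friday",
    "call-003": "the room was not clean",
}

index = build_index(transcripts)

# A search is now a single dictionary lookup instead of a scan of every call.
reservation_calls = sorted(index.get("reservation", set()))
```

The index is built once, up front, which mirrors the trade-off described above: more work during processing in exchange for faster searches afterward.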
Pros and Cons of LVCSR Speech Analytics
LVCSR allows you to search for multi-word phrases (like “the side effects of cialis include” or “cialis pills”), so the contextual accuracy is much higher than with the single-word lookup of a phonetic approach: if the phrase is found, it is far more likely that it was really spoken. A good example is the word “bomb”. When searching for “bomb” using phonetics, you will see false positives where the word spoken was “Obama” or “bombardier”, because the phoneme sequence for “bomb” was detected. With LVCSR you will not have these false positives because the entire word is transcribed into text.
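Whole-word and phrase matching against a transcript can be sketched as follows. This is an illustrative example under stated assumptions: the sample transcripts are invented, `find_phrase` is a hypothetical helper, and a real LVCSR product would add normalization and confidence scores on top of this.

```python
def find_phrase(tokens, phrase):
    """Return the start positions where the whole-word phrase occurs in tokens."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


# Hypothetical transcripts, already split into whole words.
news_call = "president obama met the bombardier crew".split()
pharmacy_call = "what are the side effects of cialis pills".split()

# "bomb" is compared against complete words, so "obama" and "bombardier"
# do not count as matches.
bomb_hits = find_phrase(news_call, ["bomb"])

# A multi-word phrase must match word for word, in order.
phrase_hits = find_phrase(pharmacy_call, ["cialis", "pills"])
```

Because the comparison is made on complete words, a search term only matches where that exact word appears in the transcript.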
    CONFIDENTIAL 6
 





















































































