Page 62 - Expanded Media & the MediaPlex
Joseph Faber: Euphonia, the talking automaton or ‘mechanical speech synthesiser’, 1846.
Mechanically simulating the human voice has been the quest of natural scientists, mystics, showmen and magicians since antiquity. Famous (and almost unknown) historical figures such as Albertus Magnus, Roger Bacon, Wolfgang von Kempelen, Charles Wheatstone (co-inventor of the electric telegraph and inventor of the stereoscope), and Alexander Graham Bell (inventor of the telephone) were caught up in this quest, with varying degrees of success. Voice synthesis was finally achieved, electrically and digitally, in the 20th and 21st centuries, most notably with inventions like Homer Dudley’s Voder (demonstrated in 1939) and Franklin Cooper’s Pattern Playback machine (which could render spectrographic patterns back into sound), and more recently in software, including Apple’s MacinTalk voice synthesis (1984) and PlainTalk speech recognition and synthesis (1993). Personally, I find the latest generation of MacinTalk ‘voices’ (there are several dozen of them) useful for testing the efficacy of voice-over dialogue composed as text. Apple Notes, a simple text editor, allows you to export text directly to iTunes as synthesised speech in any of the many voices available through System Preferences > Accessibility. The AI pioneer Ray Kurzweil also played a key role in developing voice synthesis (1975), as well as developing reading/synthesis tools for the blind.
Faber’s Euphonia was ‘played’ with a piano-style keyboard, much like a music synthesiser such as the Moog or the Fairlight CMI, whereas most modern speech synthesisers are driven by text, and in a very interesting way. A text-to-speech engine is built around a front end that first normalises the text, converting raw text (which often contains heteronyms, numbers and abbreviations) into fully written-out words; it then assigns phonetic transcriptions, or phonemes, to each word (using grapheme-to-phoneme conversion); finally, this string of information is grouped into prosodic units: the phrases, clauses and sentences necessary to make sense of the speech. These phonemes and this prosodic data are then converted into the appropriate sounds by the back end, using a variety of mathematical techniques and/or digital dictionaries (containing all the words in a language together with their correct pronunciations).
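The three front-end stages described above (normalisation, grapheme-to-phoneme conversion, and prosodic chunking) can be sketched in a few lines of Python. Everything here, the abbreviation table, the tiny lexicon and the phoneme symbols, is an illustrative stand-in; a real engine uses a full pronunciation dictionary and far more sophisticated normalisation rules.

```python
import re

# Toy text-to-speech front end, mirroring the three stages described above.

ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street"}
NUMBER_WORDS = {str(i): w for i, w in
                enumerate("zero one two three four five six seven eight nine".split())}

def normalise(text):
    """Stage 1: expand abbreviations and digits into written-out words."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    # Read digits out one by one; a real normaliser also handles whole
    # numbers, ordinals, dates, currencies and heteronyms.
    text = re.sub(r"\d", lambda m: " " + NUMBER_WORDS[m.group(0)] + " ", text)
    text = re.sub(r"\s+", " ", text)          # collapse runs of spaces
    return re.sub(r" ([.,;:!?])", r"\1", text).strip()

LEXICON = {                                    # word -> phoneme sequence
    "doctor": ["D", "AA", "K", "T", "ER"],
    "who":    ["HH", "UW"],
    "take":   ["T", "EY", "K"],
    "two":    ["T", "UW"],
}

def to_phonemes(word):
    """Stage 2: grapheme-to-phoneme conversion by dictionary lookup,
    with a crude spell-it-out fallback for unknown words."""
    word = word.lower().strip(".,;:!?")
    return LEXICON.get(word, list(word.upper()))

def prosodic_units(text):
    """Stage 3: split the text into phrase-sized chunks at punctuation."""
    return [p.strip() for p in re.split(r"[.,;:!?]+", text) if p.strip()]

def front_end(text):
    """Run all three stages: returns phrases as lists of phoneme lists."""
    return [[to_phonemes(w) for w in phrase.split()]
            for phrase in prosodic_units(normalise(text))]
```

Calling `front_end("Dr. Who, take 2!")` normalises the text to “Doctor Who, take two!”, splits it into the two phrases “Doctor Who” and “take two”, and returns each word’s phoneme sequence; the back end would then render this stream into audio.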





























































































