Page 307 - AWSAR 2.0

Man-machine interaction is still a challenging research problem, and one of the major challenges is the language barrier. Language serves as the medium for the dissemination and exchange of information, yet most existing recognition systems support only a limited set of languages and dialects. There is therefore a pressing need to develop similar systems for unaddressed languages and dialects. The present research fills this gap by developing speech-based recognition systems that work across multiple languages.
It is well known that machines (computers) use a single language (binary), but this is not the case with humans. Statistical reports state that roughly 6,500+ languages are spoken across the world. India, being a multilingual nation, has more than 1,500 spoken languages, of which 22 are officially recognized. Machines equipped with multilingual speech recognition systems allow speakers from different language backgrounds to interact with them efficiently. The language-barrier problem can thus be addressed by deploying multiple speech-based recognition models. During an interaction, however, the machine must switch to the appropriate language model to carry out the recognition of interest; the model should be selected according to the language the speaker uses for communication. The machine therefore first needs to recognize the speaker's language before it can interpret the speech and allow further interaction. Especially in countries like India and other parts of Asia, the language recognition task itself poses potential problems because of the similarities in their scripts and phonemes.
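The model-selection step described above can be sketched in a few lines. This is only an illustration of the routing logic, not a real system: identify_language, RECOGNIZERS, and the per-language recognizers are hypothetical placeholders standing in for trained models.

```python
# Hypothetical sketch: a language identifier chooses which per-language
# recognizer handles an utterance. All names here are placeholders.

def identify_language(audio):
    # Placeholder: a real system would run a spoken-language-recognition
    # model over the audio and return a language code such as "hi" or "ta".
    return "hi"

# One recognition model per supported language (placeholders).
RECOGNIZERS = {
    "hi": lambda audio: "<Hindi transcript>",
    "ta": lambda audio: "<Tamil transcript>",
}

def transcribe(audio):
    # First recognize the speaker's language, then hand the audio
    # to the matching language model.
    lang = identify_language(audio)
    recognizer = RECOGNIZERS.get(lang)
    if recognizer is None:
        raise ValueError(f"no recognizer for language {lang!r}")
    return recognizer(audio)
```

In this sketch, transcribe(b"raw-audio") routes the utterance to the Hindi recognizer, since the placeholder identifier always answers "hi"; the point is that language identification must run before speech recognition can begin.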
This motivated our research group to work in this area and develop Spoken Language Recognition (SLR) systems that achieve robust recognition performance. We have developed two SLR systems, one for Indian languages and the other for Oriental languages. The spoken languages in India broadly fall into four major linguistic families, namely Indo-Aryan, Dravidian, Austroasiatic, and Tibeto-Burman. We developed SLR systems to recognize 15 official Indian languages, chosen from three of these families: 9 languages (Bengali, Chhattisgarhi, Gujarati, Hindi, Kashmiri, Punjabi, Rajasthani, Sanskrit, and Sindhi) from the Indo-Aryan family, 3 languages (Konkani, Tamil, and Telugu) from the Dravidian family, and 3 languages (Manipuri, Mizo, and Nagamese) from the Tibeto-Burman family. Similarly, we have developed SLR systems to recognize 10 Oriental languages, chosen from five Asian linguistic families, namely Altaic, Austroasiatic, Austronesian, Indo-European, and Sino-Tibetan. These include 4 languages (Japanese, Kazakh, Korean, and Uyghur) from the Altaic family, 3 languages (Cantonese, Mandarin, and Tibetan) from the Sino-Tibetan family, and finally Indonesian, Russian, and Vietnamese from the Austronesian, Indo-European, and Austroasiatic families, respectively.
Mr. Nettimi Satya Sai Srinivas || 283
Any speech-based recognition system (SLR in our case) mainly comprises two sub-systems: a front-end (feature extraction) unit and a back-end (decision making/classification) unit. The front-end sub-system implements numerous operations from the field of digital speech signal processing, while the back-end sub-system implements operations from the fields of machine learning and data science. The role of the front-end unit is to extract salient features from the speech signal and pass them to the back-end classifier. The role of the back-end unit is to make a decision (recognize the spoken language, in our case) using the supplied features and its prior knowledge. The classifier acquires this prior knowledge during the training process, before it is put into actual use. This process is
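The two-unit pipeline above can be illustrated with a deliberately simplified sketch: the "front end" here reduces a raw sample sequence to two crude summary features, and the "back end" is a nearest-centroid classifier whose prior knowledge comes from labelled training signals. Real SLR front ends use far richer features (e.g. cepstral coefficients) and real back ends use far stronger classifiers; the feature choices and the nearest-centroid rule below are assumptions made for illustration only.

```python
# Toy front-end/back-end pipeline, for illustration only.
import math

def extract_features(samples):
    # Front end: map the signal to a small feature vector
    # (mean absolute amplitude and a crude zero-crossing rate).
    n = len(samples)
    energy = sum(abs(s) for s in samples) / n
    zero_crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    ) / n
    return (energy, zero_crossings)

def train(labelled_signals):
    # Back end, training phase: the "prior knowledge" is one mean
    # feature vector (centroid) per language.
    centroids = {}
    for lang, signals in labelled_signals.items():
        feats = [extract_features(s) for s in signals]
        centroids[lang] = tuple(
            sum(f[i] for f in feats) / len(feats) for i in range(2)
        )
    return centroids

def classify(samples, centroids):
    # Back end, decision phase: report the language whose centroid
    # lies nearest to the extracted feature vector.
    f = extract_features(samples)
    return min(centroids, key=lambda lang: math.dist(f, centroids[lang]))
```

For example, after training on one rapidly oscillating signal labelled "lang_a" and one constant-sign signal labelled "lang_b", classify([0.4, 0.5, 0.6, 0.5] * 10, model) returns "lang_b", since its features (moderate energy, no zero crossings) sit nearest that centroid. The split mirrors the text: all signal processing stays in extract_features, all learning and decision making in train and classify.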