
284 || AWSAR Awarded Popular Science Stories - 2019
analogous to a teacher imparting knowledge to students in a class. This knowledge-imparting process is carried out using large sets of raw data available in the form of pre-recorded audio. In our work, we used two audio databases: the Indian Institute of Technology Kharagpur Multi-Lingual Indian Language Speech Corpus (IITKGP-MLILSC) and the Oriental Language Recognition (OLR) speech corpus.
Initially, the raw audio data from the databases are pre-processed using signal conditioning techniques to make them suitable for further processing. The processed signal is then subjected to feature extraction techniques that extract features capable of discriminating between languages. In our work, we used Mel-Frequency Cepstral Coefficients (MFCC) as features. These are regarded as state-of-the-art features and are widely used in numerous speech-based recognition tasks. In addition to the state-of-the-art MFCC features, we have proposed a new set of features for SLR, referred to as Fourier Parameters (FP). To the best of our knowledge, we are the first to introduce FP features for SLR. We used these proposed features along with MFCC features to develop different SLR systems. Any speech-based feature extraction technique first divides the speech signal into a number of segments of shorter duration (typically a few milliseconds) and processes each segment separately to extract features. In this way, a large set of features is extracted from the entire database for the different languages.
Not all the extracted features will be helpful in generating a perfect classifier model. The extracted high-dimensional features may include correlated, irrelevant, redundant, and noisy features, which can sometimes degrade the training process. In data science parlance, this is referred to as "the curse of dimensionality". To overcome this problem, we reduced the number of features using a dimensionality reduction algorithm named Relief-F feature selection. This algorithm identifies a subset of the available features as the best features. The best features are then fed to the training algorithms to train different classifier models.
In our work, we developed SLR systems using three different types of machine learning classification models, namely Support Vector Machines (SVM), Artificial Neural Networks (ANN), and Deep Neural Networks, with MFCC and FP features. We tested the performance of the trained models using features extracted from independent data sets that were not used in the training process. Our experimental results show that the proposed FP features outperformed the state-of-the-art MFCC features. We also investigated the net effect of developing models using the combination of MFCC and FP features. The combined features further improved the recognition performance of the systems compared with either the MFCC or the proposed FP features alone. We obtained recognition accuracies of 89.40% and 70.80% with ANN-based SLR systems for Indian and Oriental languages, respectively. The proposed FP features performed well on both Indian and Oriental languages.
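The segmentation and feature selection steps described above can be illustrated with a small sketch. It is not the authors' implementation: the function names (frame_signal, relief_scores, select_top_k) are hypothetical, the 25 ms frame length and 10 ms hop are assumed values (the text says only that segments are typically milliseconds long), and the scoring routine is the basic Relief idea with a single nearest hit and nearest miss, which is simpler than the Relief-F variant used in the work.

```python
import random


def frame_signal(samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a signal into short overlapping frames (durations in milliseconds).

    The 25 ms / 10 ms defaults are assumptions, not values from the article.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop must span at least one sample")
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop_len)]


def relief_scores(X, y, n_iter=100, seed=0):
    """Simplified Relief: one nearest hit and one nearest miss per sampled row.

    Relief-F extends this idea to k neighbours and several classes.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    # Per-feature value range, used to normalise differences to [0, 1].
    spans = []
    for j in range(n_features):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def distance(a, b):
        return sum(diff(a, b, j) for j in range(n_features))

    weights = [0.0] * n_features
    for _ in range(n_iter):
        i = rng.randrange(len(X))
        hits = [k for k in range(len(X)) if k != i and y[k] == y[i]]
        misses = [k for k in range(len(X)) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: distance(X[i], X[k]))
        miss = min(misses, key=lambda k: distance(X[i], X[k]))
        for j in range(n_features):
            # Reward features that differ across classes and
            # penalise features that differ within the same class.
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n_iter
    return weights


def select_top_k(weights, k):
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)[:k]
```

In a pipeline like the one described, each frame would be turned into a feature vector (for example MFCC or FP values), the scores would be computed on the training data only, and the selected feature indices would then be applied to both the training and the independent test sets before training a classifier.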