Mohamad Khairul Najmi Zailan, Yusnita Mohd Ali, Emilia Noorsal, Mohd Hanapiah Abdullah, Zuraidi Saad, Adni Mat Leh
Centre for Electrical Engineering Studies, Universiti Teknologi MARA, Cawangan Pulau Pinang, 13500 Permatang Pauh, Pulau Pinang, Malaysia

Abstract
Speech is the primary communication medium for human beings and conveys rich and valuable information such as accent, gender, emotion and unique identity. Automatic speaker recognition can therefore be developed based on the unique characteristics of one’s speech and used in applications such as voice dialing, online banking, and telephone shopping to verify the identity of users. However, extracting salient features that are capable of identifying speakers is a challenging problem in speech recognition systems, since speech contains abundant information. In this study, a text-independent speaker recognition approach is proposed using a total of 438 audio samples elicited from three Malay male speakers. The performance of two widely used feature extraction techniques, namely linear prediction coefficients (LPC) and Mel-frequency cepstral coefficients (MFCC), was compared using a discriminant analysis model. Although both feature sets yielded impressive outcomes, the MFCC features outperformed the LPC features, achieving an accuracy of 99.09%, which was 4.34% higher than that of LPC.
Keywords: speaker recognition; biometric; linear prediction coefficients; mel-frequency cepstral coefficients; discriminant analysis
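
As an illustration of one of the two feature types compared above, the following is a minimal sketch of LPC estimation for a single frame, using the autocorrelation method with the Levinson-Durbin recursion. The abstract does not state the LPC order, frame length, or toolchain used in the study, so the function names `autocorrelation` and `lpc`, the order of 2, and the synthetic test signal are all assumptions made for illustration only.

```python
def autocorrelation(frame, max_lag):
    """Return autocorrelation values R[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Estimate LPC predictor coefficients a[1..order] (Levinson-Durbin).

    The predictor is x_hat[n] = sum_k a[k] * x[n - k].
    Returns (coefficients, final prediction error energy).
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)      # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:           # silent or perfectly predictable frame
            break
        # Reflection coefficient for this recursion step.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


# Synthetic check (not speech data): the impulse response of an all-pole
# filter with known coefficients 1.3 and -0.4 (assumed values).
h = [1.0, 1.3]
for _ in range(198):
    h.append(1.3 * h[-1] - 0.4 * h[-2])

coeffs, residual = lpc(h, 2)
print([round(c, 4) for c in coeffs])
```

In a speaker recognition pipeline, coefficients like these would typically be computed per windowed frame and passed to a classifier such as discriminant analysis; the synthetic signal here only checks that the recursion recovers known filter coefficients.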