Emotional Speech Recognition using CNN model
DOI:
DOI: https://doi.org/10.59461/ijitra.v4i1.164

Keywords: Speech Emotion Recognition, Mel-Spectrogram, MFCCs, Convolutional Neural Networks, Deep Learning, Data Augmentation, Affective Computing

Abstract
Speech Emotion Recognition (SER) is a growing area of artificial intelligence concerned with recognizing human emotions from speech signals. Emotions are a central aspect of communication, shaping social interactions and decision-making. This paper introduces a complete SER system that applies deep learning methods to recognize emotions such as Happy, Sad, Angry, Neutral, Surprise, Calm, Fear, and Disgust. The proposed model uses Mel-Spectrograms, MFCCs, and Chroma features for efficient feature extraction. Convolutional layers capture complex patterns in the audio data, while dropout layers reduce overfitting and promote generalization. Data augmentation strategies, including pitch shifting, noise injection, and time stretching, are adopted to increase model robustness. Despite these improvements, challenges such as distinguishing closely correlated emotions, handling noisy environments, and achieving real-time performance remain directions for future work. This paper advances research in affective computing by improving emotion recognition performance and widening the scope of SER applications in healthcare, virtual assistants, and customer service systems.
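The data augmentation strategies mentioned above (noise injection, time stretching, and pitch shifting) can be sketched with plain NumPy. This is an illustrative simplification, not the paper's implementation: the time-stretch and pitch-shift below use naive linear resampling, whereas production SER pipelines typically use phase-vocoder-based routines from an audio library.

```python
import numpy as np

def add_noise(y, noise_factor=0.005):
    # Noise injection: mix low-amplitude Gaussian noise into the waveform.
    return y + noise_factor * np.random.randn(len(y))

def time_stretch(y, rate=0.9):
    # Naive time-stretch via linear resampling. rate < 1 lengthens the
    # signal; note this also shifts pitch, unlike a phase vocoder.
    idx = np.arange(0, len(y), rate)
    return np.interp(idx, np.arange(len(y)), y)

def pitch_shift(y, n_steps=2):
    # Crude pitch shift: resample by a semitone ratio, then pad/truncate
    # back to the original length so the duration is preserved.
    rate = 2.0 ** (n_steps / 12.0)
    shifted = np.interp(np.arange(0, len(y), rate),
                        np.arange(len(y)), y)
    out = np.zeros(len(y))
    n = min(len(y), len(shifted))
    out[:n] = shifted[:n]
    return out

# Demo on a synthetic 440 Hz tone (1 second at 16 kHz).
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
noisy = add_noise(tone)
stretched = time_stretch(tone, rate=0.5)   # twice as long
shifted = pitch_shift(tone, n_steps=2)     # up two semitones
print(len(tone), len(noisy), len(stretched), len(shifted))
```

Each augmented waveform would then be passed through the same feature-extraction stage (Mel-Spectrogram, MFCC, Chroma) as the clean audio, multiplying the effective training-set size.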
License
Copyright (c) 2025 Samyuktha S, Sarwath Unnisa

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.