Emotional Speech Recognition using CNN model

Authors

  • Samyuktha S, student, Department of Computer Science, Mount Carmel College, Autonomous, Bangalore, Karnataka, India
  • Sarwath Unnisa Department of Computer Science, Mount Carmel College, Autonomous, Bangalore, Karnataka, India

DOI:

https://doi.org/10.59461/ijitra.v4i1.164

Keywords:

Speech Emotion Recognition, Mel-Spectrogram, MFCCs, Convolutional Neural Networks, Deep Learning, Data Augmentation, Affective Computing

Abstract

Speech Emotion Recognition (SER) is an emerging area of artificial intelligence concerned with recognizing human emotions from speech signals. Emotions are an important aspect of communication, affecting social interactions and decision-making. This paper presents a complete SER system that uses state-of-the-art deep learning methods to recognize emotions such as Happy, Sad, Angry, Neutral, Surprise, Calm, Fear, and Disgust. The proposed model extracts Mel-Spectrogram, MFCC, and Chroma features from the audio. Convolutional layers capture complex patterns in the audio data, while dropout layers reduce overfitting and improve generalization. Data augmentation strategies, such as pitch shifting, noise injection, and time-stretching, are applied to increase model robustness. Despite progress in SER, distinguishing closely related emotions, handling noisy environments, and achieving real-time performance remain open problems for future work. This paper contributes to affective computing by improving emotion recognition performance and widening the scope of SER applications in healthcare, virtual assistants, and customer service systems.
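As an illustration of the noise-injection augmentation mentioned in the abstract, the following is a minimal sketch in plain Python. The abstract does not state the noise type or level used, so white Gaussian noise, the target signal-to-noise ratio, and the names `add_noise` and `snr_db` are assumptions made for this example, not the authors' implementation.

```python
import math
import random


def add_noise(signal, snr_db, seed=None):
    """Return a copy of `signal` with white Gaussian noise added at `snr_db` dB.

    Hypothetical helper for illustration; `signal` is a non-empty list of floats.
    """
    rng = random.Random(seed)
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR(dB) = 10 * log10(P_signal / P_noise), solved here for P_noise.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


# A one-second 440 Hz tone sampled at 16 kHz stands in for a speech clip.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
noisy = add_noise(clean, snr_db=20, seed=0)
```

Lower `snr_db` values yield noisier training copies. In a pipeline of the kind the abstract describes, each augmented clip would then go through the same feature extraction as the original recording.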


Author Biographies

Samyuktha S, student, Department of Computer Science, Mount Carmel College, Autonomous, Bangalore, Karnataka, India

Samyuktha completed her BSc at Vijaya College, Jayanagar, and is now pursuing an MSc in Computer Science with a specialization in Data Science at Mount Carmel College. She continues to show a strong eagerness for learning and a steadfast commitment to academic excellence. With her dedication, curiosity, and passion for making a difference, she aims to have a lasting, positive impact on both the academic world and broader society. She can be contacted at email: samyuktha6665@gmail.com

Sarwath Unnisa, Department of Computer Science, Mount Carmel College, Autonomous, Bangalore, Karnataka, India

Sarwath Unnisa, affiliated with Mount Carmel College, has contributed to various research areas, including cloud computing, AI-driven healthcare, IoT security, and geospatial data ethics. Her work spans multiple publications, covering topics such as AI applications in medical diagnostics, industry-integrated IoT, and deep learning for decision-making. Her research focuses on emerging technologies and explores the intersection of artificial intelligence, machine learning, and ethical considerations in modern computing environments. She can be contacted via email: sarwath@mccblr.edu.in


Published

2025-03-29

How to Cite

Samyuktha S, & Sarwath Unnisa. (2025). Emotional Speech Recognition using CNN model. International Journal of Information Technology, Research and Applications, 4(1), 30–38. https://doi.org/10.59461/ijitra.v4i1.164


Section

Regular Issue