Determining Emotion Intensities from Audio Data Using Ensemble Models: A Late Fusion Approach
Authors: Simon Kipyatich Kiptoo, Kennedy Ogada, Tobias Mwalili
Journal: International Journal of Intelligent Systems and Applications (IJISA)
Issue: Vol. 17, No. 6, 2025.
Free access
This paper presents an ensemble model for determining the intensity of emotions manifested in audio data. An emotion denotes a mental state of the human mind and/or its thought processes that forms a recognizable pattern, such as emotional arousal corresponding closely to its manifestation in vocal, facial and/or bodily signals. In this paper, we propose a stacking, late fusion approach in which the best experimental outcomes from two base models, built from Random Forests and Extreme Gradient Boosting (XGBoost), are combined using simple majority voting. The RAVDESS audio dataset, a public, gender-balanced dataset created by Ryerson University (Canada) for the study of emotion, was used; 80% of the dataset was used for training and 20% for testing. Two features, MFCC and Chroma, were supplied to the base models in a series of experimental setups, and the outcomes were evaluated using a confusion matrix, precision, recall and F1-score. The results were then compared against two state-of-the-art works on the KBES and RAVDESS datasets. The proposed approach yielded an overall classification accuracy of 93%.
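As an illustration of the workflow summarized above, the sketch below shows how such a late fusion ensemble could be assembled with librosa, scikit-learn and XGBoost: MFCC and Chroma features per clip, Random Forest and XGBoost base classifiers, and simple majority (hard) voting. The file layout, feature dimensions and model parameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the late fusion pipeline described in the abstract.
# Paths, parameter values and helper names are assumptions for illustration.
import glob
import os
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from xgboost import XGBClassifier

def extract_features(path, n_mfcc=40):
    """Mean-pooled MFCC and Chroma vectors for a single audio clip."""
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
    chroma = librosa.feature.chroma_stft(y=signal, sr=sr).mean(axis=1)
    return np.concatenate([mfcc, chroma])

# RAVDESS file names encode the emotion code in the third dash-separated
# field (e.g. 03-01-05-01-02-01-12.wav -> emotion "05").
files = glob.glob("ravdess/Actor_*/*.wav")
X = np.array([extract_features(f) for f in files])
labels = np.array([int(os.path.basename(f).split("-")[2]) for f in files]) - 1  # 0-indexed classes

# 80/20 train/test split, as reported in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.20, stratify=labels, random_state=42)

# Hard voting = simple majority vote over the two base models' predictions.
ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=300, random_state=42)),
                ("xgb", XGBClassifier(eval_metric="mlogloss"))],
    voting="hard")
ensemble.fit(X_train, y_train)

pred = ensemble.predict(X_test)
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))  # per-class precision, recall, F1
```

With only two voters, scikit-learn's hard voting resolves ties by class order, so in practice a tie-breaking rule (e.g. preferring the stronger base model) would need to be chosen explicitly.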
Emotion, Emotion Intensity, Multi-modal, Late Fusion, MFCC, Chroma
Short address: https://sciup.org/15020101
IDR: 15020101 | DOI: 10.5815/ijisa.2025.06.04