Determining Emotion Intensities from Audio Data Using Ensemble Models: A Late Fusion Approach
Authors: Simon Kipyatich Kiptoo, Kennedy Ogada, Tobias Mwalili
Journal: International Journal of Intelligent Systems and Applications (IJISA)
Issue: Vol. 17, No. 6, 2025.
Free access
This paper presents an ensemble model for determining the intensity of emotions manifested in audio data. An emotion denotes a mental state of the human mind and/or its thought processes that forms a recognizable pattern, such as emotional arousal corresponding closely to its manifestation in vocal, facial and/or bodily signals. In this paper, we propose a stacking, late fusion approach in which the best experimental outcomes from two base models, built from Random Forests and Extreme Gradient Boosting (XGBoost), are combined using simple majority voting. The RAVDESS audio dataset, a public, gender-balanced dataset created by Ryerson University (Canada) for the study of emotion, was used; 80% of the dataset was used for training and 20% for testing. Two features, MFCC and Chroma, were supplied to the base models in a series of experimental setups, and the outcomes were evaluated using a confusion matrix, precision, recall and F1-score. The results were then compared against two state-of-the-art works on the KBES and RAVDESS datasets. The proposed approach yielded an overall classification accuracy of 93%.
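As an illustration of the workflow summarized above, the sketch below shows how such a late fusion ensemble could be assembled with librosa, scikit-learn and XGBoost: MFCC and Chroma features per clip, Random Forest and XGBoost base classifiers, and simple majority (hard) voting. The file layout, feature dimensions and model parameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the late fusion pipeline described in the abstract.
# Paths, parameter values and helper names are assumptions for illustration.
import glob
import os
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from xgboost import XGBClassifier

def extract_features(path, n_mfcc=40):
    """Mean-pooled MFCC and Chroma vectors for a single audio clip."""
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
    chroma = librosa.feature.chroma_stft(y=signal, sr=sr).mean(axis=1)
    return np.concatenate([mfcc, chroma])

# RAVDESS file names encode the emotion code in the third dash-separated
# field (e.g. 03-01-05-01-02-01-12.wav -> emotion "05").
files = glob.glob("ravdess/Actor_*/*.wav")
X = np.array([extract_features(f) for f in files])
labels = np.array([int(os.path.basename(f).split("-")[2]) for f in files]) - 1  # 0-indexed classes

# 80/20 train/test split, as reported in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.20, stratify=labels, random_state=42)

# Hard voting = simple majority vote over the two base models' predictions.
ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=300, random_state=42)),
                ("xgb", XGBClassifier(eval_metric="mlogloss"))],
    voting="hard")
ensemble.fit(X_train, y_train)

pred = ensemble.predict(X_test)
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))  # per-class precision, recall, F1
```

With only two voters, scikit-learn's hard voting resolves ties by class order, so in practice a tie-breaking rule (e.g. preferring the stronger base model) would need to be chosen explicitly.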
Emotion, Emotion Intensity, Multi-modal, Late Fusion, MFCC, Chroma
Short address: https://sciup.org/15020101
IDR: 15020101 | DOI: 10.5815/ijisa.2025.06.04