Comprehensive method for multimodal data analysis based on optimization approach

Authors: Ivanov I. A., Brester C. Yu., Sopov E. A.

Journal: Siberian Aerospace Journal (@vestnik-sibsau)

Section: Mathematics, mechanics, computer science

Issue: Vol. 18, No. 4, 2017.


In this work we propose a comprehensive method for solving multimodal data analysis problems. The method involves multimodal data fusion techniques, a multi-objective approach to feature selection and neural network ensemble optimization, and convolutional neural networks trained with a hybrid learning algorithm that consecutively applies a genetic optimization algorithm and the back-propagation algorithm. The method is aimed at using the different available channels of information and fusing them at the data level and the decision level to achieve better classification accuracy on the target problem. We tested the proposed method on the emotion recognition problem. The SAVEE (Surrey Audio-Visual Expressed Emotions) database was used as the raw input data; it provides a visual markers dataset, an audio features dataset and the combined audio-visual dataset. During the experiments the following parameters were varied: the multi-objective optimization algorithm (SPEA, the Strength Pareto Evolutionary Algorithm; NSGA-II, the Non-dominated Sorting Genetic Algorithm; VEGA, the Vector Evaluated Genetic Algorithm; SelfCOMOGA, the Self-configuring Co-evolutionary Multi-Objective Genetic Algorithm), the classifier ensemble output fusion scheme (voting, averaging class probabilities, meta-classification), and the resolution of the images used as input for the convolutional neural network. The highest emotion recognition accuracy achieved with the proposed method is 65.8 % on the visual markers data, 52.3 % on the audio features data, and 71 % on the audio-visual data. Overall, the SelfCOMOGA algorithm and the meta-classification fusion scheme proved to be the most effective components of the proposed comprehensive method. Using the combined audio-visual data improved the emotion recognition rate compared with using visual or audio data alone.
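As a rough illustration of the decision-level fusion schemes compared above (voting, averaging class probabilities, and meta-classification), the Python sketch below combines the class-probability outputs of a classifier ensemble. It is only a minimal sketch under stated assumptions: the array shapes, the function names and the choice of logistic regression as the meta-classifier are illustrative and do not reproduce the authors' implementation.

```python
# Minimal sketch (assumed, not the authors' code): three decision-level
# fusion schemes for combining class-probability outputs of an ensemble.
import numpy as np
from sklearn.linear_model import LogisticRegression


def fuse_voting(probas):
    """Majority voting: each ensemble member votes for its top class."""
    # probas: list of (n_samples, n_classes) arrays, one per ensemble member.
    votes = np.stack([p.argmax(axis=1) for p in probas], axis=1)
    n_classes = probas[0].shape[1]
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
    return counts.argmax(axis=1)  # ties resolved towards the lower class index


def fuse_averaging(probas):
    """Average the class-probability vectors, then pick the most probable class."""
    return np.mean(probas, axis=0).argmax(axis=1)


def fuse_meta(probas_train, y_train, probas_test):
    """Meta-classification: a second-level model learns from ensemble outputs.

    Logistic regression is an illustrative choice of meta-classifier.
    """
    X_train = np.hstack(probas_train)  # (n_samples, n_members * n_classes)
    X_test = np.hstack(probas_test)
    meta = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return meta.predict(X_test)
```

For the emotion recognition task described above, y_train would hold the emotion labels of the training samples and each array in probas would come from one neural network of the ensemble.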


Multimodal data analysis, multi-objective optimization, feature selection, neural network ensemble, convolutional neural network, evolutionary optimization algorithms

Short address: https://sciup.org/148177755

IDR: 148177755

Research article