Multi-objective based feature selection and neural networks ensemble method for solving emotion recognition problem


In this paper, we apply a multi-objective optimization approach to find a Pareto-optimal ensemble of neural network classifiers for solving the emotion recognition problem. The Pareto set of neural networks is found by optimizing two conflicting criteria: maximizing the emotion classification rate and minimizing the number of neurons in the network. We implemented several ensemble fusion schemes: voting, averaging class probabilities, and adding an auxiliary meta-classification layer. Since the number of audio and video features extracted from raw video sequences is quite large, we also applied the multi-objective approach to find an optimal subset of features; the criteria in this case are maximizing the classification rate and minimizing the number of features. The multi-objective approach to neural network parameter optimization and to feature selection was compared with the classic single-objective approach on several datasets. According to the experimental results, the multi-objective approach to neural network optimization provided, on average, a 7.1 % higher emotion classification rate than single-objective optimization. Applying the multi-objective approach to feature selection improved the classification rate by 2.8 % compared with the single-objective approach, by 5.4 % compared with principal component analysis, and by 13.9 % compared with using no dimensionality reduction at all. Given these results, we suggest using the multi-objective approach to machine learning algorithm optimization and feature selection in further research on emotion recognition and other complex classification tasks.
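The abstract describes selecting solutions by Pareto dominance over two conflicting criteria (classification rate to maximize, model or feature count to minimize). The paper does not give implementation details, but the core filtering step can be sketched as follows; the candidate values below are hypothetical, not results from the paper.

```python
def dominates(a, b):
    """True if solution a Pareto-dominates b.

    Each solution is a pair (classification_rate, size): the rate is
    maximized, the size (neurons or features) is minimized."""
    rate_a, size_a = a
    rate_b, size_b = b
    # a must be no worse on both criteria and strictly better on at least one
    return (rate_a >= rate_b and size_a <= size_b) and \
           (rate_a > rate_b or size_a < size_b)

def pareto_front(solutions):
    """Keep only the non-dominated (rate, size) pairs."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# Hypothetical candidates: (classification rate, number of features).
candidates = [(0.71, 120), (0.69, 40), (0.74, 300), (0.74, 150), (0.60, 10)]
front = pareto_front(candidates)  # (0.74, 300) is dominated by (0.74, 150)
```

Algorithms such as NSGA-II (cited in the reference list) build on this dominance test with non-dominated sorting and crowding-distance ranking inside a genetic algorithm.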
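Two of the fusion schemes mentioned above, voting and averaging class probabilities, admit a compact sketch. The probability vectors below are hypothetical placeholders for the per-class outputs of three trained networks; the paper's meta-classification layer is not reproduced here.

```python
import numpy as np

def fuse_average(prob_list):
    """Average the class-probability vectors across networks, then argmax."""
    return int(np.mean(prob_list, axis=0).argmax())

def fuse_vote(prob_list):
    """Majority vote over each network's predicted class
    (ties resolve to the lowest class index via bincount)."""
    votes = [int(p.argmax()) for p in prob_list]
    return int(np.bincount(votes).argmax())

# Hypothetical per-class probabilities from three networks, three emotion classes.
probs = [np.array([0.6, 0.3, 0.1]),
         np.array([0.2, 0.5, 0.3]),
         np.array([0.1, 0.8, 0.1])]
```

Averaging keeps calibration information that hard voting discards, which is why the two schemes can disagree when individual networks are confident but wrong.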


Ensemble, neural network, multi-objective optimization, emotion recognition

Short address: https://sciup.org/148177549

IDR: 148177549

References for Multi-objective based feature selection and neural networks ensemble method for solving emotion recognition problem

  • Rashid M., Abu-Bakar S. A. R., Mokji M. Human emotion recognition from videos using spatio-temporal and audio features. The Visual Computer, 2012, P. 1269-1275.
  • Kahou S. E., Pal C., Bouthillier X., Froumenty P., Gulcehre C., Memisevic R., Vincent P., Courville A., Bengio Y. Combining modality specific deep neural networks for emotion recognition in video. In Proceedings of the 15th ACM on International Conference on Multimodal Interaction, Sydney, Australia, December 9-13, 2013, P. 543-550.
  • Cruz A., Bhanu B., Thakoor N. Facial emotion recognition in continuous video. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan, November 11-15, 2012, P. 1880-1883.
  • Soleymani M., Pantic M., Pun T. Multimodal emotion recognition in response to videos. IEEE Transactions on affective computing, 2012, Vol. 3, No. 2, P. 211-223.
  • Busso C., Deng Z., Yildirim S., Bulut M., Lee C. M., Kazemzadeh A., Lee S., Neumann U., Narayanan S. Analysis of Emotion Recognition using Facial Expressions, Speech and Multimodal Information. In Proceedings of the 6th international conference on Multimodal interfaces, Los Angeles, 2004, P. 205-211.
  • Haq S., Jackson P. J. B. Speaker-dependent audio-visual emotion recognition. In Proceedings Int. Conf. on Auditory-Visual Speech Processing (AVSP'09), Norwich, UK, September 2009, P. 53-58.
  • Eyben F., Wöllmer M., Schuller B. openSMILE: the Munich versatile and fast open-source audio feature extractor. In Proceedings ACM Multimedia (MM), Florence, Italy, 2010, P. 1459-1462.
  • Sariyanidi E., Gunes H., Gokmen M., Cavallaro A. Local Zernike moment representation for facial affect recognition. Proc. of British Machine Vision Conference, 2013, P. 1-13.
  • Ojala T., Pietikäinen M., Harwood D. A comparative study of texture measures with classification based on feature distributions. Pattern Recognition 29, 1996, P. 51-59.
  • Zhao G., Pietikäinen M. Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Trans. Pattern Analysis and Machine Intelligence 29(6), 2007, P. 915-928.
  • Sidorov M., Brester C., Semenkin E., Minker W. Speaker state recognition with neural network-based classification and self-adaptive heuristic feature selection. In Proceedings International Conference on Informatics in Control, Automation and Robotics (ICINCO), 2014, P. 699-703.
  • Zitzler E., Thiele L. An evolutionary algorithm for multiobjective optimization: the strength Pareto approach. Swiss Federal Institute of Technology, Zurich, Switzerland, TIK-Report No. 43, May 1998, P. 1-40.
  • Deb K., Pratap A., Agarwal S., Meyarivan T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. on Evolutionary Computation, 2002, Vol. 6, No. 2, P. 182-197.
  • Schaffer J. D. Multiple objective optimization with vector evaluated genetic algorithms. Proc. of the 1st International Conference on Genetic Algorithms, 1985, P. 93-100.
  • Ivanov I. A., Sopov E. A. [Self-configuring genetic algorithm for solving multicriteria choice support problems]. Vestnik SibGAU, 2013, No. 1 (47), P. 30-35 (In Russ.).
Research article