Action Detection in Video Using Recurrent Neural Networks

Authors: Buyko Aleksandr Yuryevich, Vinogradov Andrey Nikolaevich

Journal: Program Systems: Theory and Applications

Section: Artificial intelligence, intelligent systems, neural networks

Published in: issue 4 (35), vol. 8, 2017.

Open access

This paper examines the use of computer vision methods and recurrent neural networks for detecting and classifying actions in video. It describes the approach the authors applied to the analysis of video files. Recurrent neural networks serve as the classifier. The classifier takes bags of words as input, which are histograms of low-level actions. The histograms are built from sets of descriptors extracted from the video frames. The SIFT, ORB, BRISK, and AKAZE algorithms are used to find descriptors in the images.
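
The histogram step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the codebook is built or how descriptors are matched to it, so the tiny hand-written `codebook`, the squared Euclidean distance, and the function names below are all assumptions made for illustration. The toy 2-D tuples stand in for real SIFT, ORB, BRISK, or AKAZE descriptors, and the recurrent classifier that would consume the resulting sequence is left out.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to the descriptor.

    Squared Euclidean distance is an assumption; binary descriptors
    such as ORB or BRISK are often compared with Hamming distance instead.
    """
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def frame_histogram(descriptors: Sequence[Vector],
                    codebook: Sequence[Vector]) -> List[float]:
    """Normalized bag-of-words histogram for the descriptors of one frame."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        # A frame with no descriptors yields an all-zero histogram.
        return [0.0] * len(codebook)
    return [count / total for count in counts]


def video_to_sequence(frames: Sequence[Sequence[Vector]],
                      codebook: Sequence[Vector]) -> List[List[float]]:
    """One histogram per frame: the sequence a recurrent classifier would read."""
    return [frame_histogram(frame, codebook) for frame in frames]


# Hypothetical toy data: a 3-word codebook and three frames of 2-D descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
frames = [
    [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9), (0.0, 0.1)],
    [(0.1, 0.9), (0.0, 1.2)],
    [],
]
sequence = video_to_sequence(frames, codebook)
```

In a fuller pipeline the codebook would typically be learned from training descriptors (for example by clustering), and each histogram in `sequence` would be one time step of the input to the recurrent network.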

ID: 143164283 Short URL: https://sciup.org/143164283

