Interpreting an animal's actions from its image in near real time

Authors: Alexey Dmitrievich Egorov, Maxim Semenovich Reznik

Journal: Computer Optics

Section: Image processing, pattern recognition

Issue: vol. 47, no. 2, 2023.

Determining the actions of an object is a complex and topical computer vision task. It can be solved using information about the positions of the object's keypoints. Training models that locate keypoints requires a large amount of data annotated with keypoint positions. To address the shortage of training data, the paper presents a method for obtaining additional data, together with an algorithm that achieves high accuracy in recognizing animal actions from a small amount of data. The keypoint localization accuracy achieved on the test set is 92.3 %. The object's action is then determined from the keypoint positions, and several approaches to classifying actions from keypoints are compared. The accuracy of determining the object's action in an image reaches 73.5 %.
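The abstract describes a two-stage pipeline: keypoints are located first, and the action is then classified from their positions. The sketch below illustrates only the second stage, plus one simple way to obtain extra keypoint-annotated samples. Everything in it is a hypothetical stand-in chosen for brevity, not the authors' data, augmentation method or classifiers. This includes the three-keypoint skeleton, the "standing"/"grazing" labels, the horizontal-flip augmentation and the nearest-centroid classifier. The paper compares several classifiers that are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def normalize(keypoints: Sequence[Point]) -> List[float]:
    """Flatten keypoints into a feature vector that ignores the animal's
    position and size: subtract the centroid, divide by the RMS radius."""
    n = len(keypoints)
    cx = sum(x for x, _ in keypoints) / n
    cy = sum(y for _, y in keypoints) / n
    centered = [(x - cx, y - cy) for x, y in keypoints]
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n) or 1.0
    return [c / scale for point in centered for c in point]


def hflip(keypoints: Sequence[Point], width: float,
          swap_pairs: Sequence[Tuple[int, int]] = ()) -> List[Point]:
    """Mirror keypoints across the vertical axis of an image of the given width.
    Paired left/right keypoints (e.g. ears) swap indices so labels stay correct."""
    flipped = [(width - x, y) for x, y in keypoints]
    for i, j in swap_pairs:
        flipped[i], flipped[j] = flipped[j], flipped[i]
    return flipped


class NearestCentroid:
    """Assigns the action whose mean normalized skeleton is closest (Euclidean)."""

    def fit(self, samples: Sequence[Sequence[Point]],
            labels: Sequence[str]) -> "NearestCentroid":
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for keypoints, label in zip(samples, labels):
            feat = normalize(keypoints)
            acc = sums.setdefault(label, [0.0] * len(feat))
            for k, value in enumerate(feat):
                acc[k] += value
            counts[label] = counts.get(label, 0) + 1
        # Per-label mean feature vector.
        self.centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
        return self

    def predict(self, keypoints: Sequence[Point]) -> str:
        feat = normalize(keypoints)
        return min(self.centroids, key=lambda lab: math.dist(feat, self.centroids[lab]))


# Hypothetical 3-keypoint skeleton: [head, withers, tail base]; image y grows downward.
standing = [[(0, 0), (1, 1), (3, 1)], [(10, 5), (12, 7), (16, 7)]]
grazing = [[(0, 2), (1, 1), (3, 1)], [(20, 24), (22, 22), (26, 22)]]
samples = standing + grazing
labels = ["standing"] * 2 + ["grazing"] * 2
# Augmentation: add mirrored copies (all three keypoints lie on the midline, nothing to swap).
samples = samples + [hflip(kp, width=100) for kp in samples]
labels = labels * 2

model = NearestCentroid().fit(samples, labels)
print(model.predict([(100, 100), (101, 101), (103, 101)]))  # → standing
print(model.predict([(5, 7), (6, 6), (8, 6)]))              # → grazing
```

Normalizing by the centroid and RMS radius makes the features independent of where the animal is in the frame and how large it appears. This is one way a classifier can get by with fewer examples. The mirrored copies act as additional training data. Any other classifier could replace `NearestCentroid` while reusing the same features.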

Keywords: computer vision, animal detection, action classification, neural network, machine learning, backbone models, skeleton classification, data augmentation

Short link: https://sciup.org/140297692

IDR: 140297692 | DOI: 10.18287/2412-6179-CO-1138

Research article