<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article>
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" article-type="research-article" dtd-version="1.2" xml:lang="en"><front><journal-meta><journal-id journal-id-type="publisher-id">ARTIFICIAL INTELLIGENCE AND DECISION MAKING</journal-id><journal-title-group><journal-title xml:lang="en">ARTIFICIAL INTELLIGENCE AND DECISION MAKING</journal-title><trans-title-group xml:lang="ru"><trans-title>Искусственный интеллект и принятие решений</trans-title></trans-title-group></journal-title-group><issn publication-format="print">2071-8594</issn></journal-meta><article-meta><article-id pub-id-type="publisher-id">270353</article-id><article-id pub-id-type="doi">10.14357/20718594230310</article-id><article-categories><subj-group subj-group-type="toc-heading" xml:lang="en"><subject>Analysis of Signals, Audio and Video Information</subject></subj-group><subj-group subj-group-type="toc-heading" xml:lang="ru"><subject>Анализ сигналов, аудио и видео информации</subject></subj-group><subj-group subj-group-type="article-type"><subject>Research Article</subject></subj-group></article-categories><title-group><article-title xml:lang="en">Method for Processing Photo and Video Data from Camera Traps Using a Two-Stage Neural Network Approach</article-title><trans-title-group xml:lang="ru"><trans-title>Метод обработки фото- и видеоданных с фотоловушек с использованием двухстадийного нейросетевого подхода</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="en"><surname>Efremov</surname><given-names>Vladislav A.</given-names></name><name xml:lang="ru"><surname>Ефремов</surname><given-names>Владислав Александрович</given-names></name></name-alternatives><address><country country="RU">Russian Federation</country></address><bio xml:lang="en"><p>Postgraduate student, programmer of the Laboratory of Digital Systems for Special Purposes</p></bio><bio xml:lang="ru"><p>Аспирант. Программист лаборатории цифровых систем специального назначения</p></bio><email>efremov.va@phystech.edu</email><xref ref-type="aff" rid="aff1"/></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="en"><surname>Leus</surname><given-names>Andrey V.</given-names></name><name xml:lang="ru"><surname>Леус</surname><given-names>Андрей Владимирович</given-names></name></name-alternatives><address><country country="RU">Russian Federation</country></address><bio xml:lang="en"><p>Candidate of Technical Sciences, Leading Programmer of the Laboratory of Digital Systems for Special Purposes</p></bio><bio xml:lang="ru"><p>Кандидат технических наук. Ведущий программист лаборатории цифровых систем специального назначения</p></bio><email>leus.av@mipt.ru</email><xref ref-type="aff" rid="aff1"/></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="en"><surname>Gavrilov</surname><given-names>Dmitry A.</given-names></name><name xml:lang="ru"><surname>Гаврилов</surname><given-names>Дмитрий Александрович</given-names></name></name-alternatives><address><country country="RU">Russian Federation</country></address><bio xml:lang="en"><p>Doctor of Technical Sciences, Director of the Phystech School of Radio Engineering and Computer Technology (FRCT)</p></bio><bio xml:lang="ru"><p>Доктор технических наук. 
Директор Физтех-школы ФРКТ</p></bio><email>gavrilov.da@mipt.ru</email><xref ref-type="aff" rid="aff1"/></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="en"><surname>Mangazeev</surname><given-names>Daniil I.</given-names></name><name xml:lang="ru"><surname>Мангазеев</surname><given-names>Даниил Игоревич</given-names></name></name-alternatives><address><country country="RU">Russian Federation</country></address><bio xml:lang="en"><p>Master, programmer of the Laboratory of Digital Systems for Special Purposes</p></bio><bio xml:lang="ru"><p>Магистр. Программист лаборатории цифровых систем специального назначения</p></bio><email>mangazeev.di@phystech.edu</email><xref ref-type="aff" rid="aff1"/></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="en"><surname>Kholodnyak</surname><given-names>Ivan V.</given-names></name><name xml:lang="ru"><surname>Холодняк</surname><given-names>Иван Витальевич</given-names></name></name-alternatives><address><country country="RU">Russian Federation</country></address><bio xml:lang="en"><p>Master</p></bio><bio xml:lang="ru"><p>Магистр</p></bio><email>kholodnyak.iv@phystech.edu</email><xref ref-type="aff" rid="aff1"/></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="en"><surname>Radysh</surname><given-names>Alexandra S.</given-names></name><name xml:lang="ru"><surname>Радыш</surname><given-names>Александра Сергеевна</given-names></name></name-alternatives><bio xml:lang="en"><p>Master</p></bio><bio xml:lang="ru"><p>Магистр</p></bio><email>radysh.as@phystech.edu</email><xref ref-type="aff" rid="aff1"/></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="en"><surname>Zuev</surname><given-names>Viktor A.</given-names></name><name xml:lang="ru"><surname>Зуев</surname><given-names>Виктор Александрович</given-names></name></name-alternatives><address><country country="RU">Russian Federation</country></address><bio xml:lang="en"><p>Master</p></bio><bio xml:lang="ru"><p>Магистр</p></bio><email>zuev.va@phystech.edu</email><xref ref-type="aff" rid="aff1"/></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="en"><surname>Vodichev</surname><given-names>Nikita A.</given-names></name><name xml:lang="ru"><surname>Водичев</surname><given-names>Никита Алексеевич</given-names></name></name-alternatives><address><country country="RU">Russian Federation</country></address><bio xml:lang="en"><p>Master</p></bio><bio xml:lang="ru"><p>Магистр</p></bio><email>vodichev.na@phystech.edu</email><xref ref-type="aff" rid="aff1"/></contrib></contrib-group><aff-alternatives id="aff1"><aff><institution xml:lang="en">Moscow Institute of Physics and Technology (National Research University)</institution></aff><aff><institution xml:lang="ru">Московский физико-технический институт (национальный исследовательский университет)</institution></aff></aff-alternatives><pub-date date-type="pub" iso-8601-date="2023-08-15" publication-format="electronic"><day>15</day><month>08</month><year>2023</year></pub-date><issue>3</issue><issue-title xml:lang="en"/><issue-title xml:lang="ru"/><fpage>98</fpage><lpage>108</lpage><history><date date-type="received" iso-8601-date="2024-11-15"><day>15</day><month>11</month><year>2024</year></date><date date-type="accepted" iso-8601-date="2024-11-15"><day>15</day><month>11</month><year>2024</year></date></history><permissions><copyright-statement xml:lang="en">Copyright © 2023, ФИЦ ИУ РАН</copyright-statement><copyright-statement xml:lang="ru">Copyright © 
2023, ФИЦ ИУ РАН</copyright-statement><copyright-year>2023</copyright-year><copyright-holder xml:lang="en">ФИЦ ИУ РАН</copyright-holder></permissions><self-uri xlink:href="https://journals.rcsi.science/2071-8594/article/view/270353">https://journals.rcsi.science/2071-8594/article/view/270353</self-uri><abstract xml:lang="en"><p>The paper proposes a technology for analyzing camera trap data using two-stage neural network processing. The first stage separates empty images from non-empty ones. To solve this problem, a comparative analysis of the YOLOv5, YOLOR, and YOLOX architectures was carried out and the optimal detector model was identified. The second stage classifies the objects found by the detector; the EfficientNetV2, SeResNet, ResNeSt, ReXNet, and ResNet models were compared. To train the detector and the classifier, a data preparation approach was developed that removes duplicate images from the sample. The method was extended with agglomerative clustering to split the sample into training, validation, and test subsets. In object detection, the YOLOv5-L6 model performed best, with a detection accuracy of 98.5% on the dataset. In classification of the detected objects, the ResNeSt-101 architecture performed best, with a recognition quality of 98.339% on the test data.</p></abstract><trans-abstract xml:lang="ru"><p>В работе предложена технология анализа данных с фотоловушек с помощью двухстадийной нейросетевой обработки. Задача первого этапа состоит в отделении пустых изображений от непустых. Для решения задачи проведен сравнительный анализ архитектур YOLOv5, YOLOR, YOLOX и выявлена наиболее оптимальная модель детектора. Задача второго этапа заключается в классификации объектов, найденных детектором. Сравнивались модели EfficientNetV2, SeResNet, ResNeSt, ReXNet, ResNet. Для обучения модели детектора и классификатора разработан подход подготовки данных, заключающийся в удалении изображений-дубликатов из выборки. Метод был модифицирован с помощью агломеративной кластеризации для разделения выборки на обучение, валидацию и тест. В задаче обнаружения объектов лучшим на наборе данных оказался алгоритм YOLOv5-L6 с точностью нахождения 98,5%. В задаче классификации найденных объектов лучше всех себя показала архитектура ResNeSt-101 с качеством распознавания 98,339% на тестовых данных.</p></trans-abstract><kwd-group xml:lang="en"><kwd>camera trap images</kwd><kwd>agglomerative clustering</kwd><kwd>deep convolutional neural networks</kwd><kwd>detection</kwd><kwd>classification</kwd><kwd>two-stage approach</kwd></kwd-group><kwd-group xml:lang="ru"><kwd>изображения с фотоловушек</kwd><kwd>агломеративная кластеризация</kwd><kwd>глубокие сверточные нейронные сети</kwd><kwd>детекция</kwd><kwd>классификация</kwd><kwd>двухстадийный подход</kwd></kwd-group><funding-group/></article-meta></front><body></body><back><ref-list><ref id="B1"><label>1.</label><citation-alternatives><mixed-citation xml:lang="en">O’Connell A. F., Nichols J. D., Karanth K. U. Camera traps in animal ecology: Methods and analyses. – Berlin, Germany: Springer Science &amp; Business Media. 2011. 279 p.</mixed-citation><mixed-citation xml:lang="ru">O’Connell A. F., Nichols J. D., Karanth K. U. Camera traps in animal ecology: Methods and analyses. – Berlin, Germany: Springer Science &amp; Business Media. 2011. 279 p.</mixed-citation></citation-alternatives></ref><ref id="B2"><label>2.</label><citation-alternatives><mixed-citation xml:lang="en">He K., Zhang X., Ren S., Sun J. 
Deep residual learning for image recognition // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. P. 770–778.</mixed-citation><mixed-citation xml:lang="ru">He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. P. 770–778.</mixed-citation></citation-alternatives></ref><ref id="B3"><label>3.</label><citation-alternatives><mixed-citation xml:lang="en">Gavrilov D. A., Lovtsov D. A. Automated processing of visual information using artificial intelligence technologies // Artificial Intelligence and Decision Making. 2020. No. 4. P. 33–46.</mixed-citation><mixed-citation xml:lang="ru">Гаврилов Д. А., Ловцов Д. А. Автоматизированная переработка визуальной информации с помощью технологий искусственного интеллекта // Искусственный интеллект и принятие решений. 2020. № 4. С. 33–46.</mixed-citation></citation-alternatives></ref><ref id="B4"><label>4.</label><citation-alternatives><mixed-citation xml:lang="en">Lovtsov D. A., Gavrilov D. A. Automated special purpose optical electronic system’s functional diagnosis // Proc. Int. Semin. Electron Devices Des. Prod. SED-2019 (23–24 April 2019). – Prague, Czech Repub. IEEE, 2019. P. 70–73.</mixed-citation><mixed-citation xml:lang="ru">Lovtsov D. A., Gavrilov D. A. Automated special purpose optical electronic system’s functional diagnosis // Proc. Int. Semin. Electron Devices Des. Prod. SED-2019 (23–24 April 2019). Prague, Czech Repub. IEEE. 2019. P. 70–73.</mixed-citation></citation-alternatives></ref><ref id="B5"><label>5.</label><citation-alternatives><mixed-citation xml:lang="en">Yu X., Wang J., Kays R., Jansen P. A., Wang T., Huang T. Automated identification of animal species in camera trap images // EURASIP Journal on Image and Video Processing. 2013. No. 1. P. 52.</mixed-citation><mixed-citation xml:lang="ru">Yu X., Wang J., Kays R., Jansen P. A., Wang T., Huang T. Automated identification of animal species in camera trap images // EURASIP Journal on Image and Video Processing. 2013. No. 1. P. 52.</mixed-citation></citation-alternatives></ref><ref id="B6"><label>6.</label><mixed-citation>Chen G., Han T. X., He Z., Kays R., Forrester T. Deep convolutional neural network based species recognition for wild animal monitoring // IEEE International Conference on Image Processing (ICIP). 2014. P. 858–862.</mixed-citation></ref><ref id="B7"><label>7.</label><citation-alternatives><mixed-citation xml:lang="en">Gomez-Villa A., Salazar A., Vargas F. Towards automatic wild animal monitoring: Identification of animal species in camera-trap images using very deep convolutional neural networks // Ecological Informatics. 2017. No. 41. P. 24–32.</mixed-citation><mixed-citation xml:lang="ru">Gomez-Villa A., Salazar A., Vargas F. Towards automatic wild animal monitoring: Identification of animal species in camera-trap images using very deep convolutional neural networks // Ecological Informatics. 2017. No. 41. P. 24–32.</mixed-citation></citation-alternatives></ref><ref id="B8"><label>8.</label><mixed-citation>Nguyen H., Maclagan S. J., Nguyen T. D., Nguyen T., Flemons P., Andrews K., Ritchie E. G., Phung D. Animal recognition and identification with deep convolutional neural networks for automated wildlife monitoring // International Conference on Data Science and Advanced Analytics (DSAA 2017). Tokyo, Japan, 19–21 October 2017. P. 
40–49.</mixed-citation></ref><ref id="B9"><label>9.</label><mixed-citation>Beery S., Van Horn G., Perona P. Recognition in Terra Incognita // Computer Vision. ECCV 2018. Lecture Notes in Computer Science. Vol. 11220.</mixed-citation></ref><ref id="B10"><label>10.</label><mixed-citation>Norouzzadeh M. S., Morris D., Beery S., Joshi N., Jojic N., Clune J. A deep active learning system for species identification and counting in camera trap images // Methods in Ecology and Evolution. 2021. Vol. 12 (1). P. 150–161.</mixed-citation></ref><ref id="B11"><label>11.</label><citation-alternatives><mixed-citation xml:lang="en">Whytock R. C., Świeżewski J., Zwerts J. A. Robust ecological analysis of camera trap data labelled by a machine learning model // Methods in Ecology and Evolution. 2021. No. 12 (6). P. 1080–1092.</mixed-citation><mixed-citation xml:lang="ru">Whytock R. C., Świeżewski J., Zwerts J. A. Robust ecological analysis of camera trap data labelled by a machine learning model // Methods in Ecology and Evolution. 2021. No. 12 (6). P. 1080–1092.</mixed-citation></citation-alternatives></ref><ref id="B12"><label>12.</label><mixed-citation>Leus A. V., Efremov V. A. Computer vision methods application for camera traps image analysis within the software for the reserves environmental state monitoring // Proceedings of the Mordovia State Nature Reserve. 2021. Vol. 28. P. 121–129.</mixed-citation></ref><ref id="B13"><label>13.</label><citation-alternatives><mixed-citation xml:lang="en">Tabak M. A., Norouzzadeh M. S., Wolfson D. W., Sweeney S. J., VerCauteren K. C., Snow N. P., Halseth J. M., Di Salvo P. A., Lewis J. S., White M. D., Teton B., Beasley J. C., Schlichting P. E., Boughton R. K., Wight B., Newkirk E. S., Ivan R.S. Machine learning to classify animal species in camera trap images: Applications in ecology // Methods in Ecology and Evolution. 2018. No. 10 (4). P. 585–590.</mixed-citation><mixed-citation xml:lang="ru">Tabak M. A., Norouzzadeh M. S., Wolfson D. W., Sweeney S. J., VerCauteren K. C., Snow N. P., Halseth J. M., Di Salvo P. A., Lewis J. S., White M. D., Teton B., Beasley J. C., Schlichting P. E., Boughton R. K., Wight B., Newkirk E. S., Ivan R.S. Machine learning to classify animal species in camera trap images: Applications in ecology // Methods in Ecology and Evolution. 2018. No. 10 (4). P. 585–590.</mixed-citation></citation-alternatives></ref><ref id="B14"><label>14.</label><mixed-citation>Jocher G. YOLOv5 release v6.1. 2021. https://github.com/ultralytics/yolov5/releases/tag/v6.1.</mixed-citation></ref><ref id="B15"><label>15.</label><mixed-citation>Wang C., Yeh I., Liao H. M. You Only Learn One Representation: Unified Network for Multiple Tasks. 2021.</mixed-citation></ref><ref id="B16"><label>16.</label><citation-alternatives><mixed-citation xml:lang="en">Ge Z., Liu S., Wang F., Li Z., Sun J. YOLOX: Exceeding YOLO Series in 2021. 2021.</mixed-citation><mixed-citation xml:lang="ru">Ge Z., Liu S., Wang F., Li Z., Sun J. YOLOX: Exceeding YOLO Series in 2021. 2021.</mixed-citation></citation-alternatives></ref><ref id="B17"><label>17.</label><mixed-citation>Lin T. Y., Maire M., Belongie S., Hays J., Perona P., Ramanan D., Zitnick C. L. Microsoft COCO: Common objects in context // European Conference on Computer Vision. 2014. P. 740–755.</mixed-citation></ref><ref id="B18"><label>18.</label><citation-alternatives><mixed-citation xml:lang="en">Hu J., Shen L., Albanie S., Sun G., Wu E. Squeeze-and-Excitation Networks // IEEE Transactions on Pattern Analysis and Machine Intelligence. 2020. Vol. 
42. No. 8. P. 2011–2023.</mixed-citation><mixed-citation xml:lang="ru">Hu J., Shen L., Albanie S., Sun G., Wu E. Squeeze-and-Excitation Networks // IEEE Transactions on Pattern Analysis and Machine Intelligence. 2020. Vol. 42. No. 8. P. 2011–2023.</mixed-citation></citation-alternatives></ref><ref id="B19"><label>19.</label><mixed-citation>Zhang H., Wu C., Zhang Z., Zhu Y., Zhang Z., Lin H., Sun Y., He T., Mueller J., Manmatha R., Li M., Smola A. ResNeSt: Split-Attention Networks. 2020.</mixed-citation></ref><ref id="B20"><label>20.</label><mixed-citation>Han D., Yun S., Heo B., Yoo Y. J. ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network. 2020.</mixed-citation></ref><ref id="B21"><label>21.</label><mixed-citation>Tan M., Le Q. V. EfficientNetV2: Smaller Models and Faster Training. 2021.</mixed-citation></ref><ref id="B22"><label>22.</label><mixed-citation>Tan M., Le Q. V. EfficientNet: Rethinking model scaling for convolutional neural networks. 2020.</mixed-citation></ref><ref id="B23"><label>23.</label><citation-alternatives><mixed-citation xml:lang="en">Sibson R. SLINK: An Optimally Efficient Algorithm for the Single-Link Cluster Method // Comput. J. 1973. No. 16. P. 30–34.</mixed-citation><mixed-citation xml:lang="ru">Sibson R. SLINK: An Optimally Efficient Algorithm for the Single-Link Cluster Method // Comput. J. 1973. No. 16. P. 30–34.</mixed-citation></citation-alternatives></ref></ref-list></back></article>
