Object tracking with a novel visual-thermal sensor fusion method in template matching

Authors: Satbir Singh, Arun Khosla, Rajiv Kapoor

Journal: International Journal of Image, Graphics and Signal Processing (IJIGSP)

Issue: vol. 11, no. 7, 2019.


Recently there has been an increase in the use of the thermal-visible conjunction technique in surveillance applications, owing to the complementary advantages of the two modalities. Amalgamating them for tracking requires a sound scientific procedure that can make decisions efficiently, with good accuracy and excellent precision. The proposed research presents a unique idea for obtaining a robust track estimate with thermo-visual fusion in the context of fundamental template matching. The method first introduces a haphazard transporting control mechanism for individual-modality tracking that avoids unexpected estimates. It then brings together an efficient computation procedure that provides a weighted output using minimal information from the individual trackers. Experiments performed on publicly available datasets demonstrate the usefulness of the proposed idea in terms of accuracy, precision, and processing time in comparison with state-of-the-art methods.


Sensor Fusion, Object Tracking, Template Matching, Thermal Imaging

Short address: https://sciup.org/15016066

IDR: 15016066   |   DOI: 10.5815/ijigsp.2019.07.03

Article text: Object tracking with a novel visual-thermal sensor fusion method in template matching

Published Online July 2019 in MECS.

  • I.    Introduction

The combined use of thermal and visible sensors in the tracking problem is motivated by the fact that each imagery has its own limitations and advantages. Visible imagery provides a foremost hint of the object's colour, but thermal imagery has the advantage at night or under improper vision conditions, under illumination variations such as shadow, and when the object is camouflaged.

Many earlier efforts have been formulated to utilize the advantages of these imaging modalities in different surveillance applications; these techniques are discussed in detail in Section II. The proposed method advances the field by improving tracking-related issues such as time constraints, accuracy, and reliability.

Our method carves a statistical-correlation-based template matching framework into an effective fusion approach. Initially, an individual track is obtained in each imagery by template matching, and in each frame it is modified by the haphazard transporting controlling (HTC) process, which is based on the tracked object's dimensions. The outputs retrieved from the individual imagery trackers are then fed into the proposed merger algorithm. This algorithm first adapts the weight factors assigned to each imagery and then adjusts the positional coordinates of the final track estimate to remove the offset caused by the lesser-weighted modality. The organization of the rest of the paper is detailed in the following sections.
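The exact dimension-based HTC rule is the paper's own; one plausible reading of its purpose — rejecting a per-frame estimate whose displacement exceeds the tracked object's own dimensions — can be sketched as follows. The function name and threshold rule here are assumptions for illustration, not the paper's formulation.

```python
def htc_gate(prev_pos, new_pos, obj_w, obj_h):
    """Hypothetical haphazard-transporting-control gate.

    Discards a new track estimate that jumps farther than the object's
    own width/height in a single frame, keeping the previous estimate
    instead. prev_pos and new_pos are (x, y) tuples.
    """
    dx = abs(new_pos[0] - prev_pos[0])
    dy = abs(new_pos[1] - prev_pos[1])
    if dx > obj_w or dy > obj_h:
        return prev_pos   # implausible jump: keep the previous estimate
    return new_pos        # plausible motion: accept the new estimate
```

A gate of this kind costs only two comparisons per frame, which is consistent with the paper's emphasis on low processing time.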

  • II.    Related Work

Owing to these advantages, various uses of visual-thermal fusion are found in the literature.

Face recognition employing thermal and visible sensor fusion with background subtraction was proposed earlier in the literature [1-4]. The fusion has also been actively utilized in robotics-based vision applications [5-7]. Besides this, the combined use of visible and thermal imagery has been hosted in spacecraft proximity operations [8] to overcome the effect of varying lighting conditions in the orbital environment. [9, 10] have advanced the field of image fusion of multisensory data for improved vision applications.

Notably, the use of this bi-modal information for object tracking under challenging vision conditions has increased in recent times. The significance of the field has even led researchers to incorporate new intelligent techniques such as deep learning and convolutional neural networks [11] into thermo-visual tracking. A two-stream convolutional network was formed independently for each domain, and fusion was achieved by forming a fusion net. Despite being a novel approach, it depends on a rigorous training process and requires complex hardware processing systems.

It is interesting to note that several methods, such as [15, 16, 17], use a particle filter [18] based approach to aggregate the information from each sensing domain and formulate a sound track decision. In general, however, particle filter based approaches suffer from particle degeneracy and the associated unpredictability in the output estimate, and they generally fail to track in severe visual conditions.

To address this issue, [15] proposed a basic fusion involving multiplication of individual specialists that used cues from thermal as well as visible imagery to form their opinions. However, simple multiplication only serves video sequences with no veiling conditions: false tracking in one imagery mode may lead to an overall false result, since there is no procedure to verify it.

Further, the techniques employing fusion to overcome the demerits of the particle filter [16, 17] perform the fusion step, per single frame of the track, as many times as the number of particles chosen. This mechanism incurs considerable computational effort, since background calculations are involved most of the time in performing a single fusion step. In addition, the computational cost rises whenever the number of particles is increased to improve accuracy.

More insight into advances in this field can be found in [12, 13, 14]. Table 1 describes some methods that incorporated the visible and thermal sensors conjunctively, covering the fusion strategy adopted and the application aim of each.

  • III.    Proposed Methodology

The algorithm is formulated as a three-step approach: first, the original track estimate using correlation-based template matching; second, the proposed HTC procedure; and third, the thermo-visual fusion process. Fig. 1 illustrates the workflow of the complete process. The section is divided into the following subsections:
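The paper's own weight-adaptation and offset-correction rules are presented later in the section; as a rough sketch of the third step, assuming each per-modality tracker returns a position estimate and a match score, a weighted merger could look like the following. The normalized-score weighting used here is a plain illustration, not the paper's exact adaptation procedure.

```python
import numpy as np

def fuse_tracks(pos_visible, score_visible, pos_thermal, score_thermal):
    """Hypothetical score-weighted fusion of two track estimates.

    pos_* are (x, y) coordinates from each modality's template matcher;
    score_* are their correlation coefficients. Weights are the scores
    normalized to sum to one, so the more confident modality dominates.
    """
    total = score_visible + score_thermal
    if total <= 0:            # neither modality is confident
        return pos_visible    # fall back to a single modality
    w_v = score_visible / total
    w_t = score_thermal / total
    fused = (w_v * np.asarray(pos_visible, dtype=float)
             + w_t * np.asarray(pos_thermal, dtype=float))
    return tuple(fused)
```

Because only two positions and two scalar scores cross the fusion boundary, the merger runs once per frame — in contrast to the per-particle fusion cost noted for [16, 17] above.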

  • A.    Template matching based Tracking

For tracking in a particular domain, a fast template matching process is incorporated. The template matching process uses the value of the correlation coefficient between the template region and regions of the current frame. The formula for this is given by:

$$
c(x,y)=\frac{\sum_{m}\sum_{n}\left[I(x+m,\,y+n)-\bar{I}\right]\left[T(m,n)-\bar{T}\right]}{\sqrt{\sum_{m}\sum_{n}\left[I(x+m,\,y+n)-\bar{I}\right]^{2}\,\sum_{m}\sum_{n}\left[T(m,n)-\bar{T}\right]^{2}}}\tag{1}
$$

In (1), I stands for the image and T for the template, with (m, n) indexing pixel positions within the template region and the corresponding image region. The values Ī and T̄ represent the mean pixel values of the corresponding image region (of size equal to the template) and of the template region, respectively. The output c(x, y) is the template matching score for the rectangular region of the image whose starting pixel is (x, y).
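Formula (1) can be sketched directly in code. The following minimal NumPy implementation (the function name and array conventions are my own, not the paper's) slides the template over every valid placement in the image and computes the correlation coefficient at each one; the best match is the placement with the highest score.

```python
import numpy as np

def ncc_template_match(image, template):
    """Normalized cross-correlation map c(x, y) per formula (1).

    image and template are 2-D grayscale arrays. The returned map has
    one entry per valid top-left placement of the template; out[r, c]
    corresponds to the image patch starting at row r, column c.
    """
    H, W = image.shape
    h, w = template.shape
    t = template - template.mean()          # T(m, n) - T-bar
    t_norm = np.sqrt((t ** 2).sum())
    out = np.zeros((H - h + 1, W - w + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + h, c:c + w]
            p = patch - patch.mean()        # I(x+m, y+n) - I-bar
            denom = np.sqrt((p ** 2).sum()) * t_norm
            out[r, c] = (p * t).sum() / denom if denom > 0 else 0.0
    return out
```

Production code would typically use an FFT-based or integral-image formulation for speed; the double loop above is only meant to mirror the summations in (1) term by term.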

Table 1. Visible-thermal fusion methods, their fusion strategies, and application aims

References

  • U. Ali and M. Hanif, “Optimized Visual and Thermal Image Fusion for Efficient Face Recognition,” in IEEE International Conference on Information Fusion, 2006.
  • G. Bebis, A. Gyaourova and I. Pavlidis, “Face Recognition by Fusing Thermal Infrared and Visible Imagery,” Image and Vision Computing, vol. 24, no. 7, pp. 727–742, 2006.
  • J. Heo, S. G. Kong, B. R. Abidi, and M. A. Abidi, “Fusion of visual and thermal signatures with eyeglass removal for robust face recognition,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2004, pp. 122–122.
  • T. Wilhelm, H. J. Böhme, and H. M. Gross, “A multi-modal system for tracking and analyzing faces on a mobile robot,” Robotics and Autonomous Systems, vol. 48, no. 1, pp. 31–40, 2004.
  • D. R. Perrott, J. Cisneros, R. L. McKinley, and W. R. D’Angelo, “Aurally aided visual search under virtual and free-field listening conditions.” Human Factors, vol. 38, no.4, pp. 702-715, 1996.
  • G. Cielniak, and T. Duckett, “Active People Recognition using Thermal and Grey Images on a Mobile Security Robot,” IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2005, pp. 3610–3615.
  • G. Cielniak, T. Duckett, and A. J. Lilienthal, “Improved data association and occlusion handling for vision-based people tracking by mobile robots,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007, pp. 3436–3441.
  • G. B. Palmerini and S. Università, “Combining Thermal and Visual Imaging in Spacecraft Proximity Operations,” in International Conference on Control Automation Robotics Vision, 2014, pp. 383–388.
  • Y. Tong, L. Liu, M. Zhao, J. Chen, and H. Li, “Adaptive fusion algorithm of heterogeneous sensor networks under different illumination conditions,” Signal Processing, vol. 126, pp. 149–158, 2016.
  • Z. Zhou, B. Wang, S. Li, and M. Dong, “Perceptual fusion of infrared and visible images through a hybrid multi-scale decomposition with Gaussian and bilateral filters,” Information Fusion, vol. 30, pp. 15–26, 2016.
  • C. Li, X. Wu, N. Zhao, X. Cao, and J. Tang, “Fusing two stream convolutional neural networks for RGB-T object tracking,” Neurocomputing, vol. 281, pp. 78–85, 2018.
  • G. S. Walia and R. Kapoor, “Recent advances on multicue object tracking: a survey,” Artificial Intelligence Review, vol. 46, no. 1, pp. 821–847, 2016.
  • S. Singh, R. Kapoor, and A. Khosla, Cross-Domain Usage in Real Time Video-Based Tracking. U.S.A: IGI Global, 2017, pp. 105–129.
  • J. Ma, Y. Ma, and C. Li, “Infrared and visible image fusion methods and applications: A survey,” Information Fusion, vol. 45, pp. 153–178, 2018.
  • C. O. Conaire, N. E. O. Connor, and A. Smeaton, “Thermo-visual feature fusion for object tracking using multiple spatiogram trackers,” Machine Vision and Applications, vol. 19, no. 5-6, pp. 483–494, 2008.
  • M. Talha and R. Stolkin, “Particle filter tracking of camouflaged targets by adaptive fusion of thermal and visible spectra camera data,” IEEE Sensors Journal, vol. 14, no. 1, pp. 159–166, 2014.
  • J. Xiao, R. Stolkin, M. Oussalah, and A. Leonardis, “Continuously Adaptive Data Fusion and Model Relearning for Particle Filter Tracking With Multiple Features,” IEEE Sensors Journal, vol. 16, no. 8, pp. 2639– 2649, 2016.
  • K. Nummiaro, E. Koller-Meier, and L. Van Gool, “An adaptive color-based particle filter,” Image and Vision Computing, vol. 21, no. 1, pp. 99–110, 2003.
  • G. Xiao, X. Yun, and J. Wu, “A new tracking approach for visible and infrared sequences based on tracking-before-fusion,” International Journal of Dynamics and Control, vol. 4, no. 1, pp. 40-51, 2016.
  • R. Stolkin, D. Rees, M. Talha, and I. Florescu, “Bayesian fusion of thermal and visible spectra camera data for region based tracking with rapid background adaptation,” in IEEE Int. Conf. Multisens. Fusion Integr. Intell. Syst., 2012, pp. 192–199.
  • E. Fendri, R. R. Boukhriss, M. Hammami, “Fusion of thermal infrared and visible spectra for robust moving object detection,” Pattern Anal. Appl., vol. 20, no. 4, pp. 907–926, 2017.
  • S. R. Schnelle, and A.L. Chan, “Enhanced target tracking through infrared-visible image fusion,” in 14th Int. Conf. Inf. Fusion, 2011, pp. 1–8.
  • Y. Niu, S. Xu, L. Wu, and W. Hu, “Airborne infrared and visible image fusion for target perception based on target region segmentation and discrete wavelet transform,” Math. Probl. Eng., 2012.
  • Y. Wu, E. Blasch, G. Chen, L. Bai, L. Ling, “Multiple source data fusion via sparse representation for robust visual tracking,” in 14th International Conference on Information Fusion, 2011, pp. 1–8.
  • C. Li, H. Cheng, S. Hu, X. Liu, J. Tang, and L. Lin, “Learning Collaborative Sparse Representation for Grayscale-Thermal Tracking,” IEEE Trans. Image Process., vol. 25, no. 12, pp. 5743–5756. 2016.
  • C. Li, S. Hu, S. Gao, and J. Tang, “Real-time grayscale-thermal tracking via laplacian sparse representation,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2016, pp. 54–65.
  • H. P. Liu and F.C. Sun, “Fusion tracking in color and infrared images using joint sparse representation,” Sci. China-Information Sci., vol. 55, no. 3, pp. 590–599, 2012.
  • J. Davis and V. Sharma, “Background-subtraction using contour based fusion of thermal and visible imagery,” Computer Vision and Image Understanding, vol. 106, no. 2-3, pp. 162–182, 2007.
  • Bristol Eden Project Multi-Sensor Data Set, http://www.cis.rit.edu/pelz/scanpaths/data/bristol-eden.htm/.
  • Video Analytics Dataset, https://www.ino.ca/en/video-analytics-dataset/.