Zhuang et al. Intell Robot 2024;4(3):276-92 | http://dx.doi.org/10.20517/ir.2024.18 Page 278

YOLO series algorithms provide efficient and accurate solutions for real-time target detection; IR target
recognition relies on IR images, which can prove useful in night vision and adverse weather conditions.

2.1. Deep learning networks
Currently, deep learning methods for target detection are primarily categorized into two types: two-stage and
one-stage detection algorithms. One-stage detection algorithms such as YOLO and Single Shot Multi-Box
Detector (SSD) typically use a Fully Convolutional Network (FCN) to directly predict from the original image.
While they offer fast processing speed, their accuracy in detecting small objects is relatively low. Two-stage
detection algorithms, such as R-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN, capture target details
more effectively but operate at slower detection speeds.
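
To make the phrase "directly predict" concrete, the following is a minimal sketch of how a one-stage detector's grid output can be decoded into boxes in a single pass, with no separate region-proposal stage. It assumes a simplified YOLOv1-style layout in which each cell of an S x S grid predicts one box as (x_off, y_off, w, h, conf); the function name `decode_grid`, the single-box-per-cell layout, and the threshold value are illustrative assumptions, not the configuration of any model cited here.

```python
from typing import List, Tuple

# (cx, cy, w, h, confidence), all expressed relative to the full image
Box = Tuple[float, float, float, float, float]


def decode_grid(pred: List[List[Box]], conf_threshold: float = 0.5) -> List[Box]:
    """Decode an S x S grid of per-cell predictions into image-relative boxes.

    Assumed (hypothetical) layout: each cell holds (x_off, y_off, w, h, conf),
    where the offsets are relative to the cell and w, h are relative to the
    whole image.
    """
    s = len(pred)
    boxes: List[Box] = []
    for row in range(s):
        for col in range(s):
            x_off, y_off, w, h, conf = pred[row][col]
            if conf < conf_threshold:
                continue  # discard low-confidence cells
            # Shift the cell-relative centre into image-relative coordinates.
            cx = (col + x_off) / s
            cy = (row + y_off) / s
            boxes.append((cx, cy, w, h, conf))
    return boxes


# A 2 x 2 grid in which only the bottom-left cell is confident.
example = [
    [(0.0, 0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0, 0.0)],
    [(0.5, 0.5, 0.2, 0.4, 0.9), (0.5, 0.5, 0.1, 0.1, 0.1)],
]
print(decode_grid(example))  # [(0.25, 0.75, 0.2, 0.4, 0.9)]
```

A practical detector would additionally predict class scores and apply non-maximum suppression to the decoded boxes; both are omitted here for brevity.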

YOLO is a fast and efficient target detection algorithm introduced by Redmon et al. in 2016 [10] . Compared
to traditional two-stage object detection algorithms such as R-CNN, YOLO is a single-stage detection algo-
rithm capable of achieving real-time detection without compromising accuracy. In a pedestrian detection
experiment [13] , a Scale-Aware Fast (SAF) R-CNN model was introduced, using multiple subnetworks to de-
tect pedestrians at different scales, then adaptively combining the outputs to generate the final result. Fan et al.
proposed a data fusion CNN architecture called RoadSeg, which can extract and fuse features from RGB images
and infer surface normal information for accurate free space detection [14] . In another study, a DS-Net was
proposed to address the problem that current neural networks primarily focus on single-task vision
scenarios [15] . The DS-Net was a multitask convolutional neural network designed for AR-HUD environment
perception. Li et al. proposed a vision-based framework for target detection and recognition in autonomous
driving, utilizing an improved YOLOv4 model that reduced the total model parameters by 74% [16] . In another
work, a U-type generative adversarial network (GAN) was first developed to fuse visible and IR images; YOLOv3
combined with transfer learning was then trained on the fused images using an aerial dataset [17] .

2.2. IR target detection
The studies mentioned above concentrate on obtaining information from visible images. In recent years,
research on IR technology has advanced, and vehicle and pedestrian target detection based on IR images is
gradually becoming an attractive approach.

A novel detection method for IR point targets based on eigentargets has been proposed [18] . Han et al.
introduced the subblock-level ratio-difference joint local contrast measure (SRDLCM), which enhances real small
targets while suppressing complex backgrounds [19] . A pixel-level classifier was presented for fine-grained
detection of pedestrians in night-time CCTV IR images [20] . The method maintained an F1 score above 90% in
testing. Nevertheless, the dataset used in this study lacked generality because it was acquired at a specific
time and location. Cao et al. proposed a one-stage detector named ThermalDet based on
a deep neural network [21] . A channel-wise enhancement module was used to assign weights to different
channels. In addition, a dual-pass fusion block was added, which combined features from all other levels. This
method reached a mean Average Precision (mAP) of 74.60% on the FLIR dataset. Another study [22] proposed
an anchor-free infrared pedestrian detection algorithm, which introduced a cross-scale feature fusion module
and a hierarchical attention mapping module to enhance pedestrian features and suppress background noise.
This algorithm integrated the anchor-free concept, which simplified the network and improved model
generalization. A CFRM_3 method [23] was proposed in another work to repeatedly refine the mono-spectral
features with the fused multispectral features in the network. The experimental results showed that CFRM_3
led to substantial accuracy improvements. Du et al. proposed a weak and occluded vehicle detection method
in complex IR environments [24] . A hard negative example mining block was added to the YOLOv4 model
to suppress the interference caused by complex backgrounds, which increased accuracy. Narayanan et
al. presented a method for IR pedestrian detection using the HOG and the YOLOv3 [25] . This work was com-
