Page 287 Zhuang et al. Intell Robot 2024;4(3):276-92 I http://dx.doi.org/10.20517/ir.2024.18
Figure 8. Examples of the detection results on the FLIR dataset. The first column shows the original images, and the second column shows the
results of MobileNetV3-YOLOv4.
The detector proposed in this experiment outperforms the previously mentioned detection models in both detection
accuracy and computational resource efficiency. We also compared our model with YOLOv5, YOLOv8s, and
YOLOv3-MobileNetv3. In the IR target detection task, our model significantly outperforms these earlier models
and better meets the requirements of our real-time monitoring task. YOLO-IR has demonstrated
outstanding performance on the FLIR dataset: it achieved higher accuracy with fewer parameters,
improving accuracy by 5%, despite some degradation in other respects. Additionally, we compared Source
Model Guidance based on YOLOv3 (SMG-Y) and PMBW (a Paced MultiStage BlockWise approach to object
detection in thermal images), both based on vision transformers. Our method holds a clear
advantage in both detection speed and accuracy. Meanwhile, our model size is only 110 MB, smaller
than that of many competing methods. This balanced improvement across the three evaluation criteria makes the proposed method
suitable for deployment on resource-constrained edge devices. Examples of the detection results are displayed
in Figure 8.
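The frames-per-second figures cited in these comparisons are typically obtained by timing repeated forward passes over a set of images after a few warm-up runs. A minimal measurement sketch, where `detect` is a hypothetical placeholder for any detector's inference call (not the actual YOLO-IR API):

```python
import time

def measure_fps(detect, frames, warmup=5):
    """Average frames per second of `detect` over `frames`, after warm-up runs."""
    # Warm-up passes let caches, JIT compilation, and GPU kernels settle.
    for f in frames[:warmup]:
        detect(f)
    start = time.perf_counter()
    for f in frames:
        detect(f)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```

In practice, the reported FPS also depends on batch size, input resolution, and whether pre/post-processing is included in the timed region, so comparisons across papers should be read with those caveats in mind.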
To further demonstrate the performance of this model, it was compared with other models not only
on the FLIR dataset but also on the KAIST dataset, where competitive results were likewise achieved. Table 2 presents
the comparison between our model and other models on the KAIST dataset, including
recent strong single-stage detectors and several lightweight detectors. The results indicate that our model
is the smallest, and its mAP surpasses that of the other detection models.
Compared to YOLOv3, YOLOv4, and other benchmark models, our model achieves a higher mAP.
Relative to YOLOv4, mAP improves from 81.0% to 86.8%, and
processing speed increases from 42 to 64.2 frames per second. Compared with the pixel-wise con-
textual attention network (PiCA-Net), Multimodal Feature Embedding (MuFEm) + Spatio-Contextual Fea-
ture Aggregation (ScoFA), and the multispectral fusion and double-stream detectors with Yolo-based information
(MFDs-YOLO), our model demonstrates notable gains in detection accuracy. Although
our model shows a slight decrease in mAP compared to YOLO-ACN, it delivers significant improvements
in processing speed and model size. Overall, our model achieves substantial advances in accuracy, speed, and size,
making it more practical and competitive.
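The mAP values underlying these comparisons are averages of per-class average precision (AP). As a simplified illustration of how AP is computed, greedy IoU matching of score-ranked detections followed by the area under the raw precision-recall curve (without the interpolation used in official benchmark protocols), the following sketch assumes a single class on a single image:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def average_precision(dets, gts, iou_thr=0.5):
    """dets: list of (score, box); gts: non-empty list of ground-truth boxes."""
    dets = sorted(dets, key=lambda d: d[0], reverse=True)
    matched = set()
    tp, fp = [], []
    for _score, box in dets:
        # Greedily match each detection to the best unmatched ground truth.
        best, best_iou = None, iou_thr
        for i, g in enumerate(gts):
            if i in matched:
                continue
            v = iou(box, g)
            if v >= best_iou:
                best, best_iou = i, v
        if best is not None:
            matched.add(best)
            tp.append(1); fp.append(0)
        else:
            tp.append(0); fp.append(1)
    # Accumulate precision/recall and integrate the raw PR step curve.
    ap, cum_tp, cum_fp, prev_recall = 0.0, 0, 0, 0.0
    for t, f in zip(tp, fp):
        cum_tp += t; cum_fp += f
        recall = cum_tp / len(gts)
        precision = cum_tp / (cum_tp + cum_fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

Benchmark toolkits such as the KAIST and FLIR evaluation scripts additionally apply a monotone precision envelope and average over classes (and, for some protocols, over multiple IoU thresholds) to produce the reported mAP.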

