Zhuang et al. Intell Robot 2024;4(3):276-92 | http://dx.doi.org/10.20517/ir.2024.18 | Page 288

Table 1. Performance comparison with the state-of-the-art methods on the FLIR dataset

Method                    mAP (%)   FPS (frame/s)   Model size (MB)
Faster R-CNN              84.6      6.1             577.0
VGG16                     84.2      5.5             526.7
ResNet50                  83.8      7.6             446.2
YOLOv3                    58.2      38.5            246.4
YOLOv4                    81.2      27.0            256.3
YOLOv5                    73.6      39.6            191.2
YOLOv8s                   74.2      158.3           /
RefineDet [33]            72.9      /               /
ThermalDet [34]           74.6      /               /
SMG-C [35]                75.6      107.0           /
SMG-Y [35]                77.0      40.0            /
YOLO-IR [36]              78.6      151.1           /
PMBW [37]                 77.3      /               36.0
YOLOv3-MobileNetV3 [38]   60.59     14.40           139.60
DS-Net [15]               71.9      32.8            25.6
Ours                      82.7      55.9            110.0

mAP: mean average precision; FPS: frames per second; R-CNN: Regions with CNN features;
YOLO: You Only Look Once; SMG-Y: Source Model Guidance based on YOLOv3; PMBW: Paced
Multi-Stage BlockWise.
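Table 1 mixes three metrics that pull in different directions: higher mAP, higher FPS and smaller model size. One way to read it, which is our own illustration and not an analysis the paper performs, is a Pareto-dominance check: a method is dominated if some other method is no worse on all three metrics and strictly better on at least one. The sketch below applies that check to the nine rows that report all three values (rows containing "/" are left out). The helper names `Row`, `dominates`, `pareto_front` and `dominated_by` are hypothetical, and the check assumes the FPS figures were measured under comparable conditions, which the table does not state.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One fully reported row of Table 1 (FLIR); hypothetical helper type."""
    name: str
    map_pct: float  # mAP (%), higher is better
    fps: float      # frames per second, higher is better
    size_mb: float  # model size (MB), lower is better


# Only rows that report all three metrics; rows with "/" are omitted.
TABLE1 = [
    Row("Faster R-CNN", 84.6, 6.1, 577.0),
    Row("VGG16", 84.2, 5.5, 526.7),
    Row("ResNet50", 83.8, 7.6, 446.2),
    Row("YOLOv3", 58.2, 38.5, 246.4),
    Row("YOLOv4", 81.2, 27.0, 256.3),
    Row("YOLOv5", 73.6, 39.6, 191.2),
    Row("YOLOv3-MobileNetV3", 60.59, 14.40, 139.60),
    Row("DS-Net", 71.9, 32.8, 25.6),
    Row("ours", 82.7, 55.9, 110.0),
]


def dominates(a: Row, b: Row) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    no_worse = a.map_pct >= b.map_pct and a.fps >= b.fps and a.size_mb <= b.size_mb
    better = a.map_pct > b.map_pct or a.fps > b.fps or a.size_mb < b.size_mb
    return no_worse and better


def pareto_front(rows: list[Row]) -> list[str]:
    """Names of rows that no other row dominates, in table order."""
    return [r.name for r in rows if not any(dominates(o, r) for o in rows if o is not r)]


def dominated_by(rows: list[Row], name: str) -> list[str]:
    """Names of rows that the named row dominates, in table order."""
    target = next(r for r in rows if r.name == name)
    return [r.name for r in rows if dominates(target, r)]


if __name__ == "__main__":
    # ['Faster R-CNN', 'VGG16', 'ResNet50', 'DS-Net', 'ours']
    print(pareto_front(TABLE1))
    # ['YOLOv3', 'YOLOv4', 'YOLOv5', 'YOLOv3-MobileNetV3']
    print(dominated_by(TABLE1, "ours"))
```

Under these assumptions the proposed model is on the Pareto front and dominates the four other YOLO-family rows with complete data, while the two-stage detectors stay on the front only through their higher mAP and DS-Net only through its smaller size.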


Table 2. Performance comparison with the state-of-the-art methods on the KAIST dataset

Method                mAP (%)   FPS (frame/s)   Model size (MB)
YOLOv3                79.6      36              246.4
YOLOv4                81.0      42              256.3
PiCA-Net [39]         65.8      /               /
MuFEm + ScoFA [40]    78.0      /               /
MFDs-YOLO [41]        80.3      /               /
YOLO-ACN [42]         82.3      /               177.6
Ours                  86.8      64.2            110.0

mAP: mean average precision; FPS: frames per second; YOLO: You Only Look Once.


Table 3. Ablation study of detection precision on the FLIR dataset

Method   IE-CGAN   CSPDarknet53   MobileNetV3   mAP (%)
1                  ✓                            80.2
2                                 ✓             80.7
3        ✓         ✓                            81.2
4        ✓                        ✓             82.7

IE-CGAN: Image Enhancement Conditional Generative Adversarial Network; mAP: mean average precision.
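Table 3 is a 2 × 2 design (IE-CGAN on or off; CSPDarknet53 or MobileNetV3 backbone), so the effect of each component can be read as the percentage-point difference between two rows that differ only in that component. The short sketch below does that arithmetic. The dictionary layout and the helper names `gain`, `ie_cgan_gain` and `backbone_gain` are our own illustration; the mAP values are exactly those in Table 3.

```python
# mAP (%) on FLIR from Table 3, keyed by (uses_ie_cgan, backbone).
ABLATION = {
    (False, "CSPDarknet53"): 80.2,  # method 1
    (False, "MobileNetV3"): 80.7,   # method 2
    (True, "CSPDarknet53"): 81.2,   # method 3
    (True, "MobileNetV3"): 82.7,    # method 4
}


def gain(before: float, after: float) -> float:
    """Change in percentage points, rounded to the table's 0.1 precision."""
    return round(after - before, 1)


def ie_cgan_gain(backbone: str) -> float:
    """Effect of adding IE-CGAN while the backbone is held fixed."""
    return gain(ABLATION[(False, backbone)], ABLATION[(True, backbone)])


def backbone_gain(ie_cgan: bool) -> float:
    """Effect of swapping CSPDarknet53 for MobileNetV3 while IE-CGAN is held fixed."""
    return gain(ABLATION[(ie_cgan, "CSPDarknet53")], ABLATION[(ie_cgan, "MobileNetV3")])


if __name__ == "__main__":
    print("IE-CGAN with CSPDarknet53:", ie_cgan_gain("CSPDarknet53"))  # 1.0
    print("IE-CGAN with MobileNetV3:", ie_cgan_gain("MobileNetV3"))    # 2.0
    print("MobileNetV3 without IE-CGAN:", backbone_gain(False))        # 0.5
    print("MobileNetV3 with IE-CGAN:", backbone_gain(True))            # 1.5
    total = gain(ABLATION[(False, "CSPDarknet53")], ABLATION[(True, "MobileNetV3")])
    print("Method 1 -> method 4:", total)                              # 2.5
```

The two effects are not simply additive: the backbone swap is worth 0.5 points without IE-CGAN and 1.5 points with it, and the combined change from method 1 to method 4 is 2.5 points. Rounding to one decimal avoids spurious digits from floating-point subtraction.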


To show how each component affects network performance, we conducted ablation experiments on the FLIR dataset
using the YOLOv4 network. Apart from the components under test, we kept the YOLOv4 structure unchanged. First, we
replaced the original backbone with MobileNetV3 and enhanced it further; then, we applied a new data processing
method, IE-CGAN. We trained and tested the resulting networks to assess the influence of these changes on network
performance. As shown in Table 3, applying IE-CGAN to the baseline model increases mAP by 1.0 percentage point
(from 80.2% to 81.2%). Replacing the CSPDarknet53 backbone of YOLOv4 with MobileNetV3 raises mAP by a further 1.5
points when IE-CGAN is applied (from 81.2% to 82.7%), or by 0.5 points without it (from 80.2% to 80.7%), while
substantially reducing the model size (256.3 MB for YOLOv4 versus 110.0 MB for our model in Table 1). We also
compared our method with several commonly used detectors; the results are shown in Tables 1 and 2. Our model
stands out in detection speed and model size, running at 55.9 frames per second on FLIR with a model size of only
110.0 MB. Its mAP on FLIR (82.7%) is 1.9 points below that of Faster R-CNN (84.6%), at roughly nine times the
frame rate and less than one fifth of the model size, and on KAIST it achieves the highest mAP (86.8%) among the
compared methods. These results indicate that the model can detect onboard equipment in real time and is suited
to lightweight deployment.
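To make the real-time claim concrete, a throughput figure can be converted into an average per-frame time (1000 / FPS milliseconds) and compared with the frame interval of a camera stream. The sketch below does this for the FLIR and KAIST figures and repeats the FLIR comparison against Faster R-CNN from Table 1. The 30 frame/s stream rate is an assumption chosen only for illustration; the paper does not state a target rate, and an average derived from FPS says nothing about worst-case latency. The names `frame_latency_ms`, `keeps_up` and `ASSUMED_STREAM_FPS` are hypothetical.

```python
# Throughput of the proposed model as reported in Tables 1 and 2.
FPS = {"FLIR": 55.9, "KAIST": 64.2}

# Assumption for illustration only: a 30 frame/s camera stream.
# The paper does not state a target frame rate.
ASSUMED_STREAM_FPS = 30.0


def frame_latency_ms(fps: float) -> float:
    """Average time per frame (ms) implied by a throughput figure."""
    return 1000.0 / fps


def keeps_up(fps: float, stream_fps: float = ASSUMED_STREAM_FPS) -> bool:
    """True if frames are processed on average at least as fast as they arrive."""
    return fps >= stream_fps


if __name__ == "__main__":
    budget = frame_latency_ms(ASSUMED_STREAM_FPS)
    print(f"Per-frame budget at {ASSUMED_STREAM_FPS:.0f} frame/s: {budget:.1f} ms")  # 33.3 ms
    for dataset, fps in FPS.items():
        # FLIR: 17.9 ms/frame; KAIST: 15.6 ms/frame
        print(f"{dataset}: {frame_latency_ms(fps):.1f} ms/frame, keeps up: {keeps_up(fps)}")
    # Ratios against Faster R-CNN on FLIR (Table 1: 84.6 %, 6.1 frame/s, 577.0 MB).
    speedup = 55.9 / 6.1       # about 9.2x
    size_ratio = 110.0 / 577.0  # about 0.19
    map_gap = 84.6 - 82.7       # about 1.9 points
    print(f"Speed-up: {speedup:.1f}x, size ratio: {size_ratio:.2f}, mAP gap: {map_gap:.1f} points")
```

Under the assumed 30 frame/s stream, both reported throughputs leave roughly half of each frame interval unused, whereas Faster R-CNN at 6.1 frame/s would fall behind.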