Page 520                          Liu et al. Intell Robot 2024;4(4):503-23 | http://dx.doi.org/10.20517/ir.2024.29

Table 5. Ablation study on different backbone networks

                                               DUTS-TE                     DUT-OMRON                   ECSSD
Backbone           #Param (M)  FLOPs (G)  FPS  maxF↑  avgF↑  MAE↓   S↑     maxF↑  avgF↑  MAE↓   S↑     maxF↑  avgF↑  MAE↓   S↑
MobileNet [18]     4.27        2.2        36   0.804  0.712  0.067  0.825  0.753  0.678  0.073  0.805  0.906  0.869  0.064  0.884
MobileNetV2 [19]   2.37        0.8        47   0.798  0.708  0.066  0.823  0.758  0.675  0.075  0.806  0.905  0.865  0.066  0.885
ShuffleNetV2 [21]  1.60        0.6        33   0.743  0.698  0.071  0.816  0.720  0.666  0.076  0.797  0.870  0.861  0.069  0.878
EfficientNet [32]  8.64        2.6        44   0.723  0.687  0.112  0.748  0.696  0.656  0.105  0.778  0.848  0.826  0.104  0.783
Ours               2.29        1.5        62   0.845  0.773  0.054  0.866  0.804  0.742  0.061  0.833  0.934  0.907  0.047  0.913

                                               PASCAL-S                    HKU-IS                      SOD
Backbone           #Param (M)  FLOPs (G)  FPS  maxF↑  avgF↑  MAE↓   S↑     maxF↑  avgF↑  MAE↓   S↑     maxF↑  avgF↑  MAE↓   S↑
MobileNet [18]     4.27        2.2        36   0.821  0.751  0.099  0.801  0.895  0.855  0.052  0.884  0.809  0.744  0.135  0.737
MobileNetV2 [19]   2.37        0.8        47   0.806  0.747  0.102  0.798  0.890  0.854  0.056  0.879  0.801  0.746  0.138  0.742
ShuffleNetV2 [21]  1.60        0.6        33   0.781  0.742  0.107  0.794  0.853  0.848  0.059  0.871  0.779  0.734  0.147  0.715
EfficientNet [32]  8.64        2.6        44   0.755  0.736  0.132  0.754  0.844  0.807  0.114  0.762  0.722  0.706  0.168  0.689
Ours               2.29        1.5        62   0.847  0.801  0.084  0.833  0.919  0.889  0.044  0.901  0.845  0.796  0.117  0.767

The best methods are in bold. FLOPs: Floating-point operations; FPS: frames per second; MAE: mean absolute error.
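
The MAE and F-measure columns of Table 5 can be illustrated with a minimal sketch. It assumes that maxF and avgF are the maximum and the mean F-measure over a sweep of binarization thresholds, and that β² = 0.3, a value commonly used in salient object detection; neither detail is stated on this page. The function names are hypothetical, and saliency maps are flattened to plain lists for simplicity.

```python
def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground truth (values in [0, 1])."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def f_measure(pred, gt, threshold, beta_sq=0.3):
    """F-measure of the map binarized at `threshold`.

    beta_sq=0.3 is an assumed, conventional weighting, not a value taken from this page.
    """
    binary = [1 if p >= threshold else 0 for p in pred]
    true_pos = sum(b * g for b, g in zip(binary, gt))
    if true_pos == 0:
        # Covers empty predictions and empty ground truth without dividing by zero.
        return 0.0
    precision = true_pos / sum(binary)
    recall = true_pos / sum(gt)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def max_and_avg_f(pred, gt, steps=255):
    """Maximum and mean F-measure over evenly spaced thresholds in [0, 1] (assumed definition)."""
    scores = [f_measure(pred, gt, i / steps) for i in range(steps + 1)]
    return max(scores), sum(scores) / len(scores)


if __name__ == "__main__":
    prediction = [0.9, 0.8, 0.2, 0.1]   # toy 4-pixel saliency map
    ground_truth = [1, 1, 0, 0]         # binary mask
    print("MAE:", mae(prediction, ground_truth))
    print("maxF, avgF:", max_and_avg_f(prediction, ground_truth))
```

Lower MAE and higher F-measure indicate better agreement with the ground truth, which matches the arrows in the table header.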


Table 6. Ablation study on the SAFE module configuration

                                    DUTS-TE              DUT-OMRON            ECSSD
Stage                  #B  #D       maxF↑  avgF↑  MAE↓   maxF↑  avgF↑  MAE↓   maxF↑  avgF↑  MAE↓
Default configuration               0.834  0.765  0.059  0.794  0.734  0.064  0.924  0.896  0.055
E1-E4                  2   1,2      0.830  0.762  0.061  0.788  0.730  0.068  0.920  0.893  0.057
E1-E4                  4   1,2,3,4  0.833  0.765  0.060  0.792  0.733  0.065  0.921  0.894  0.057
E1-E4                  4   1,2,4,8  0.836  0.766  0.056  0.792  0.735  0.063  0.923  0.895  0.056
E5                     3   1,2,4    0.831  0.762  0.062  0.787  0.728  0.069  0.918  0.892  0.060

                                    PASCAL-S             HKU-IS               SOD
Stage                  #B  #D       maxF↑  avgF↑  MAE↓   maxF↑  avgF↑  MAE↓   maxF↑  avgF↑  MAE↓
Default configuration               0.836  0.793  0.088  0.913  0.881  0.048  0.842  0.786  0.121
E1-E4                  2   1,2      0.831  0.787  0.091  0.908  0.874  0.052  0.837  0.756  0.130
E1-E4                  4   1,2,3,4  0.835  0.792  0.089  0.911  0.880  0.050  0.842  0.787  0.120
E1-E4                  4   1,2,4,8  0.837  0.795  0.086  0.914  0.883  0.045  0.843  0.786  0.121
E5                     3   1,2,4    0.832  0.789  0.088  0.907  0.872  0.055  0.834  0.753  0.132

“Default configuration” refers to the parameter settings in Table 1. “#B” represents the number of branches. “#D” represents the
dilation rates. The number of branches and dilation rates in the unmentioned stages are set according to the default
configuration. The best methods are in bold. SAFE: Scale-adaptive feature extraction; SOD: salient object detection; MAE: mean
absolute error.
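
To make the "#D" column of Table 6 more concrete, the following sketch computes how far a single dilated convolution reaches for each listed dilation rate. It uses the standard relation k_eff = d(k - 1) + 1 and assumes 3×3 kernels, which is an assumption for illustration; the kernel size used in the SAFE module is not stated on this page. The function name and the configuration labels are hypothetical.

```python
def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by one dilated convolution: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


# Dilation-rate settings copied from the "#D" column of Table 6.
configurations = {
    "E1-E4, 2 branches": (1, 2),
    "E1-E4, 4 branches (linear)": (1, 2, 3, 4),
    "E1-E4, 4 branches (exponential)": (1, 2, 4, 8),
    "E5, 3 branches": (1, 2, 4),
}

if __name__ == "__main__":
    for name, rates in configurations.items():
        # kernel_size=3 is an assumed value, not taken from the paper.
        extents = [effective_kernel_size(3, d) for d in rates]
        print(f"{name}: dilation rates {rates} -> extents {extents}")
```

Under this assumption, a dilation rate of 8 stretches a 3×3 kernel to a 17×17 extent, so the 1,2,4,8 setting spans a wider range of scales than 1,2,3,4 with the same number of branches.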

               work. The module consists of two parts: multi-scale feature interaction and dynamic selection. Multi-scale
               feature interaction performs cross-scale feature embedding and improves the representation ability of the
               network within each layer. Because features of different scales differ in how well they represent salient
               targets, we apply dynamic selection after multi-scale feature interaction to extract useful information by
               assigning different weights to features of different scales. We build the backbone network with the SAFE
               module as its basic unit and combine it with a decoder based on the MFA module to form the final SANet. We
               use four quantitative metrics (maxF, avgF, MAE, and S) to evaluate the effectiveness of the model on six
               commonly used SOD datasets and a traffic dataset, and use the number of parameters (#Param), FLOPs, and FPS
               to evaluate its efficiency. In addition, SANet is qualitatively compared with state-of-the-art heavyweight
               and lightweight methods. The final results show that SANet achieves 62 FPS on an NVIDIA RTX 3090 GPU with
               only 2.29 M parameters, significantly outperforming other models in efficiency. In terms of detection
               performance, it matches general heavyweight methods and surpasses three other state-of-the-art lightweight
               methods.
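
The idea of dynamic selection described above, weighting features of different scales before fusing them, can be sketched as follows. This is a simplified illustration, not the SAFE module itself: it scores each branch by its global average and converts the scores to weights with a softmax, whereas a trained network would normally produce the scores with learned layers. The function names and the toy feature vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_selection(branch_features):
    """Fuse same-length feature vectors from several scales using softmax weights.

    Assumption: each branch is scored by its global average; a learned scoring
    layer would replace this step in a real model.
    """
    scores = [sum(feat) / len(feat) for feat in branch_features]
    weights = softmax(scores)
    length = len(branch_features[0])
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, branch_features))
        for i in range(length)
    ]
    return weights, fused


if __name__ == "__main__":
    # Three toy branches standing in for features extracted at different dilation rates.
    branches = [[0.1, 0.2, 0.1], [0.5, 0.4, 0.6], [0.9, 1.0, 0.8]]
    weights, fused = dynamic_selection(branches)
    print("branch weights:", weights)
    print("fused feature:", fused)
```

The weights sum to one, so the fused feature is a convex combination of the branch features, and branches with stronger responses contribute more.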


               In this paper, we conducted extensive ablation experiments to validate the parameter selection of the SAFE
               module. However, its theoretical foundation requires further research, which we will pursue in future work.
               We also aim to improve the detection performance of the proposed model and extend its applicability to more
               scenarios.




               DECLARATIONS
               Authors’ contributions
               Made substantial contributions to conception and design of the study, performed data analysis and
               interpretation, and wrote the manuscript: Liu Z