Page 520 Liu et al. Intell Robot 2024;4(4):503-23 | http://dx.doi.org/10.20517/ir.2024.29

Table 5. Ablation study on different backbone networks
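| Backbone | #Param (M) | FLOPs (G) | FPS | DUTS-TE maxF↑ | DUTS-TE avgF↑ | DUTS-TE MAE↓ | DUTS-TE S↑ | DUT-OMRON maxF↑ | DUT-OMRON avgF↑ | DUT-OMRON MAE↓ | DUT-OMRON S↑ | ECSSD maxF↑ | ECSSD avgF↑ | ECSSD MAE↓ | ECSSD S↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MobileNet [18] | 4.27 | 2.2 | 36 | 0.804 | 0.712 | 0.067 | 0.825 | 0.753 | 0.678 | 0.073 | 0.805 | 0.906 | 0.869 | 0.064 | 0.884 |
| MobileNetV2 [19] | 2.37 | 0.8 | 47 | 0.798 | 0.708 | 0.066 | 0.823 | 0.758 | 0.675 | 0.075 | 0.806 | 0.905 | 0.865 | 0.066 | 0.885 |
| ShuffleNetV2 [21] | 1.60 | 0.6 | 33 | 0.743 | 0.698 | 0.071 | 0.816 | 0.720 | 0.666 | 0.076 | 0.797 | 0.870 | 0.861 | 0.069 | 0.878 |
| EfficientNet [32] | 8.64 | 2.6 | 44 | 0.723 | 0.687 | 0.112 | 0.748 | 0.696 | 0.656 | 0.105 | 0.778 | 0.848 | 0.826 | 0.104 | 0.783 |
| Ours | 2.29 | 1.5 | 62 | 0.845 | 0.773 | 0.054 | 0.866 | 0.804 | 0.742 | 0.061 | 0.833 | 0.934 | 0.907 | 0.047 | 0.913 |

| Backbone | #Param (M) | FLOPs (G) | FPS | PASCAL-S maxF↑ | PASCAL-S avgF↑ | PASCAL-S MAE↓ | PASCAL-S S↑ | HKU-IS maxF↑ | HKU-IS avgF↑ | HKU-IS MAE↓ | HKU-IS S↑ | SOD maxF↑ | SOD avgF↑ | SOD MAE↓ | SOD S↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MobileNet [18] | 4.27 | 2.2 | 36 | 0.821 | 0.751 | 0.099 | 0.801 | 0.895 | 0.855 | 0.052 | 0.884 | 0.809 | 0.744 | 0.135 | 0.737 |
| MobileNetV2 [19] | 2.37 | 0.8 | 47 | 0.806 | 0.747 | 0.102 | 0.798 | 0.890 | 0.854 | 0.056 | 0.879 | 0.801 | 0.746 | 0.138 | 0.742 |
| ShuffleNetV2 [21] | 1.60 | 0.6 | 33 | 0.781 | 0.742 | 0.107 | 0.794 | 0.853 | 0.848 | 0.059 | 0.871 | 0.779 | 0.734 | 0.147 | 0.715 |
| EfficientNet [32] | 8.64 | 2.6 | 44 | 0.755 | 0.736 | 0.132 | 0.754 | 0.844 | 0.807 | 0.114 | 0.762 | 0.722 | 0.706 | 0.168 | 0.689 |
| Ours | 2.29 | 1.5 | 62 | 0.847 | 0.801 | 0.084 | 0.833 | 0.919 | 0.889 | 0.044 | 0.901 | 0.845 | 0.796 | 0.117 | 0.767 |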
The best methods are in bold. FLOPs: Floating-point operations; FPS: frames per second; MAE: mean absolute error.
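
As an illustration of how the pixel-level metrics in these tables are typically computed, the following is a minimal sketch and not the evaluation code used in the paper. It assumes a saliency map and a binary ground-truth mask flattened to lists of values in [0, 1], the conventional weighting beta^2 = 0.3 for the F-measure, and avgF taken as the mean F-measure over uniformly spaced thresholds; none of these choices is stated in this section. The S-measure is omitted. The function names are hypothetical.

```python
def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground-truth mask."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def f_measure(pred, gt, threshold, beta2=0.3):
    """F-measure of the map binarized at `threshold`.

    beta2 = 0.3 is an assumed (conventional) weighting, not taken from the text.
    """
    binary = [1 if p >= threshold else 0 for p in pred]
    tp = sum(b * g for b, g in zip(binary, gt))  # true-positive pixel count
    if tp == 0:
        return 0.0
    precision = tp / sum(binary)
    recall = tp / sum(gt)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)


def max_and_avg_f(pred, gt, steps=255):
    """Maximum and mean F-measure over `steps` uniformly spaced thresholds."""
    scores = [f_measure(pred, gt, t / steps) for t in range(1, steps + 1)]
    return max(scores), sum(scores) / len(scores)


# Toy 4-pixel example: two salient pixels, two background pixels.
pred = [0.9, 0.8, 0.3, 0.1]
gt = [1, 1, 0, 0]
print(round(mae(pred, gt), 3))
print(max_and_avg_f(pred, gt))
```

In practice these quantities are averaged over all images of a dataset; the per-image form above is only meant to make the definitions concrete.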

Table 6. Ablation study on the SAFE module configuration
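| Stage | #B | #D | DUTS-TE maxF↑ | DUTS-TE avgF↑ | DUTS-TE MAE↓ | DUT-OMRON maxF↑ | DUT-OMRON avgF↑ | DUT-OMRON MAE↓ | ECSSD maxF↑ | ECSSD avgF↑ | ECSSD MAE↓ | PASCAL-S maxF↑ | PASCAL-S avgF↑ | PASCAL-S MAE↓ | HKU-IS maxF↑ | HKU-IS avgF↑ | HKU-IS MAE↓ | SOD maxF↑ | SOD avgF↑ | SOD MAE↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Default configuration | | | 0.834 | 0.765 | 0.059 | 0.794 | 0.734 | 0.064 | 0.924 | 0.896 | 0.055 | 0.836 | 0.793 | 0.088 | 0.913 | 0.881 | 0.048 | 0.842 | 0.786 | 0.121 |
| E1-E4 | 2 | 1,2 | 0.830 | 0.762 | 0.061 | 0.788 | 0.730 | 0.068 | 0.920 | 0.893 | 0.057 | 0.831 | 0.787 | 0.091 | 0.908 | 0.874 | 0.052 | 0.837 | 0.756 | 0.130 |
| E1-E4 | 4 | 1,2,3,4 | 0.833 | 0.765 | 0.060 | 0.792 | 0.733 | 0.065 | 0.921 | 0.894 | 0.057 | 0.835 | 0.792 | 0.089 | 0.911 | 0.880 | 0.050 | 0.842 | 0.787 | 0.120 |
| E1-E4 | 4 | 1,2,4,8 | 0.836 | 0.766 | 0.056 | 0.792 | 0.735 | 0.063 | 0.923 | 0.895 | 0.056 | 0.837 | 0.795 | 0.086 | 0.914 | 0.883 | 0.045 | 0.843 | 0.786 | 0.121 |
| E5 | 3 | 1,2,4 | 0.831 | 0.762 | 0.062 | 0.787 | 0.728 | 0.069 | 0.918 | 0.892 | 0.060 | 0.832 | 0.789 | 0.088 | 0.907 | 0.872 | 0.055 | 0.834 | 0.753 | 0.132 |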
“Default configuration” refers to the parameter settings in Table 1. “#B” represents the number of branches. “#D” represents the dilation rates. The number
of branches and dilation rates in the unmentioned stages are set according to the default configuration. The best methods are in bold. SAFE: Scale-adaptive
feature extraction; SOD: salient object detection; MAE: mean absolute error.
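
To give some intuition for what the #D settings change, the sketch below computes the effective spatial extent of a dilated convolution for each configuration in Table 6, using the standard relation k_eff = d(k - 1) + 1. The 3×3 kernel size is an assumption (it is not stated in this section), and the helper names are hypothetical; this illustrates how dilation widens a branch's field of view and is not a description of the SAFE module's internals.

```python
def effective_kernel_size(dilation, kernel_size=3):
    """Spatial extent covered by one dilated convolution kernel.

    kernel_size=3 is an assumption; the section does not state it.
    """
    return dilation * (kernel_size - 1) + 1


def branch_extents(dilations, kernel_size=3):
    """Effective kernel extent of every branch for a list of dilation rates."""
    return [effective_kernel_size(d, kernel_size) for d in dilations]


# Stage / #B / #D settings taken from Table 6.
CONFIGS = {
    "E1-E4, #B=2": (1, 2),
    "E1-E4, #B=4, #D=1,2,3,4": (1, 2, 3, 4),
    "E1-E4, #B=4, #D=1,2,4,8": (1, 2, 4, 8),
    "E5, #B=3": (1, 2, 4),
}

for name, dilations in CONFIGS.items():
    print(name, branch_extents(dilations))
```

Under this assumption, the exponential setting 1,2,4,8 spans extents from 3 to 17 pixels, whereas the linear setting 1,2,3,4 only reaches 9, so the two four-branch rows differ in the range of scales covered rather than in the number of branches.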

work. The module is divided into two parts: multi-scale feature interaction and dynamic selection. Multi-scale
feature interaction realizes cross-scale feature embedding and improves the representation ability of the
network within each layer. Because features of different scales differ in their ability to represent salient
targets, we deploy dynamic selection after multi-scale feature interaction to extract useful information by
assigning different weights to features of different scales. We build the backbone network with the SAFE module
as the basic unit and combine it with a decoder based on the MFA module to obtain the final SANet. We use four
quantitative metrics, maxF, avgF, MAE, and S, to evaluate the effectiveness of the model on six commonly used
SOD datasets and a traffic dataset, and use the number of parameters (#Param), FLOPs, and FPS to evaluate its
efficiency. In addition, SANet is qualitatively compared with state-of-the-art heavyweight and lightweight
methods. The final results show that SANet achieves 62 FPS on an NVIDIA RTX 3090 GPU with only 2.29 M
parameters, significantly outperforming the other compared models in speed. In terms of accuracy, it matches
general heavyweight methods and surpasses three other state-of-the-art lightweight methods.
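
The idea of assigning different weights to features of different scales can be sketched as follows. This is only an illustrative stand-in under stated assumptions, not the SAFE module itself: each branch is represented as a flat list of feature values, the score of a branch is its global average (a placeholder for whatever learned gating the module uses), and the weights are obtained with a softmax across branches. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_select(branches):
    """Fuse same-length branch features with softmax weights.

    The per-branch score is the global average of the branch, used here as an
    assumed placeholder for a learned gating function.
    """
    scores = [sum(feat) / len(feat) for feat in branches]
    weights = softmax(scores)
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, branches))
        for i in range(len(branches[0]))
    ]
    return weights, fused


# Two toy branches standing in for features extracted at two scales.
weights, fused = dynamic_select([[0.2, 0.4, 0.6], [1.0, 1.2, 1.4]])
print(weights)
print(fused)
```

Because the weights sum to one, the fused feature is a convex combination of the branch features, so a branch with a higher score contributes more without changing the overall feature magnitude.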

In this paper, we conducted extensive ablation experiments to validate the parameter selection of the SAFE
module; however, its theoretical foundation still requires further research, which we will pursue in future
work. Additionally, we aim to improve the detection performance of the proposed model and extend its
applicability to more scenarios.

DECLARATIONS
Authors’ contributions
Made substantial contributions to conception and design of the study, performed data analysis and
interpretation, and wrote the manuscript: Liu Z
