
Liu et al. Intell Robot 2024;4(4):503-23 | http://dx.doi.org/10.20517/ir.2024.29 Page 519

Table 4. Ablation study on the proposed SANet components

                            DUTS-TE        DUT-OMRON      ECSSD          PASCAL-S       HKU-IS         SOD
Ver.  Methods               maxF↑  MAE↓    maxF↑  MAE↓    maxF↑  MAE↓    maxF↑  MAE↓    maxF↑  MAE↓    maxF↑  MAE↓
0     Basic                 0.819  0.068   0.779  0.076   0.911  0.069   0.825  0.112   0.898  0.056   0.821  0.133
1     Basic+MI              0.828  0.063   0.790  0.067   0.920  0.060   0.832  0.094   0.910  0.048   0.833  0.125
2     Basic+MI+DS           0.830  0.060   0.792  0.065   0.922  0.058   0.834  0.093   0.912  0.047   0.836  0.124
3     Basic+MI+DS+MF        0.834  0.059   0.794  0.064   0.924  0.055   0.836  0.088   0.913  0.048   0.842  0.121
4     Basic+MI+DS+MF+PR     0.845  0.054   0.804  0.061   0.934  0.047   0.847  0.084   0.919  0.044   0.845  0.117

We use the vanilla single-branch module as the base model (Ver.0). Here, "MI", "DS", "MF", and "PR" refer to
multi-scale feature interaction, dynamic selection, the MFA module, and ImageNet pre-training, respectively.

learning of lightweight networks.

4.4. Ablation study
In this section, we conduct ablation studies on the proposed module components, the backbone network, and the
configuration of the SAFE module to verify the effectiveness of the proposed model. The experimental settings are
the same as those described in Section 4.1.
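For readers less familiar with the two metrics reported in the ablation tables, the sketch below shows how maxF and MAE are commonly computed in the SOD literature. It is an illustrative sketch, not the authors' evaluation code, and rests on stated assumptions: saliency maps normalized to [0, 1], binary ground-truth masks, both flattened into Python lists, β² = 0.3 (the value conventionally used in SOD benchmarks), and 256 evenly spaced binarization thresholds. The function names `mae` and `max_f_measure` are hypothetical.

```python
from typing import Sequence


def mae(pred: Sequence[float], gt: Sequence[int]) -> float:
    """Mean absolute error between a saliency map in [0, 1] and a binary mask."""
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def max_f_measure(pred: Sequence[float], gt: Sequence[int],
                  beta2: float = 0.3, steps: int = 255) -> float:
    """Largest F-measure over steps + 1 evenly spaced thresholds in [0, 1].

    beta2 = 0.3 is an assumption (the usual SOD convention), not taken from the paper.
    """
    positives = sum(gt)
    best = 0.0
    for i in range(steps + 1):
        threshold = i / steps
        binary = [1 if p >= threshold else 0 for p in pred]
        tp = sum(b * g for b, g in zip(binary, gt))  # true positives
        if tp == 0:
            continue  # precision and recall are zero (or undefined) at this threshold
        precision = tp / sum(binary)
        recall = tp / positives
        f = (1 + beta2) * precision * recall / (beta2 * precision + recall)
        best = max(best, f)
    return best


if __name__ == "__main__":
    pred = [0.9, 0.8, 0.2, 0.1]
    gt = [1, 1, 0, 0]
    print(mae(pred, gt))            # about 0.15
    print(max_f_measure(pred, gt))  # 1.0: some threshold separates the two classes exactly
```

Taking the maximum over thresholds makes maxF insensitive to the choice of a single binarization threshold, whereas MAE scores the continuous map directly; this is why the two metrics are usually reported side by side.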

4.5. Proposed module components
Table 4 reports the ablation results for the model components. As components are added, performance improves
steadily; the only exception is a slight rise in MAE on HKU-IS from Ver.2 (0.047) to Ver.3 (0.048). Compared with
Ver.0, Ver.3 raises the average maxF over the six datasets by 0.015 and lowers the average MAE by 0.013. Since
none of Ver.0 to Ver.3 uses ImageNet pre-training, this gain comes from the proposed components themselves,
which shows that they are effective.
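The six-dataset averages quoted above can be recomputed directly from Table 4. The Python sketch below transcribes the table; the dictionary layout and the helper names `mean` and `average_change` are illustrative choices, not part of the original work.

```python
# Per-dataset scores transcribed from Table 4.
# Dataset order: DUTS-TE, DUT-OMRON, ECSSD, PASCAL-S, HKU-IS, SOD.
TABLE4 = {
    0: {"maxF": [0.819, 0.779, 0.911, 0.825, 0.898, 0.821],
        "MAE": [0.068, 0.076, 0.069, 0.112, 0.056, 0.133]},
    1: {"maxF": [0.828, 0.790, 0.920, 0.832, 0.910, 0.833],
        "MAE": [0.063, 0.067, 0.060, 0.094, 0.048, 0.125]},
    2: {"maxF": [0.830, 0.792, 0.922, 0.834, 0.912, 0.836],
        "MAE": [0.060, 0.065, 0.058, 0.093, 0.047, 0.124]},
    3: {"maxF": [0.834, 0.794, 0.924, 0.836, 0.913, 0.842],
        "MAE": [0.059, 0.064, 0.055, 0.088, 0.048, 0.121]},
    4: {"maxF": [0.845, 0.804, 0.934, 0.847, 0.919, 0.845],
        "MAE": [0.054, 0.061, 0.047, 0.084, 0.044, 0.117]},
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def average_change(base: int, target: int, metric: str) -> float:
    """Change in the six-dataset mean of `metric` from version `base` to `target`."""
    return mean(TABLE4[target][metric]) - mean(TABLE4[base][metric])


if __name__ == "__main__":
    for ver in sorted(TABLE4):
        print(f"Ver.{ver}: mean maxF = {mean(TABLE4[ver]['maxF']):.4f}, "
              f"mean MAE = {mean(TABLE4[ver]['MAE']):.4f}")
    print(f"Ver.0 -> Ver.3 maxF change: {average_change(0, 3, 'maxF'):+.3f}")
    print(f"Ver.0 -> Ver.3 MAE change:  {average_change(0, 3, 'MAE'):+.3f}")
```

With these values, Ver.3 raises the mean maxF by about 0.015 and lowers the mean MAE by about 0.013 relative to Ver.0, and Ver.4, which adds ImageNet pre-training, gives the best means on both metrics.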

4.6. The effectiveness of the backbone network
In addition to existing SOD methods, we also compare SANet with several widely used lightweight backbone networks,
including MobileNet, MobileNetV2, ShuffleNetV2, and EfficientNet. To adapt these backbones to the SOD task, we
attach the same decoder as SANet to each of them.

Table 5 shows that directly applying existing lightweight backbone networks to SOD does not yield satisfactory
accuracy. Taking EfficientNet as an example and averaging maxF, avgF, and MAE over the six datasets, SANet
achieves a 13.20% improvement in maxF, an 11.14% improvement in avgF, and a 44.72% reduction in MAE compared
with EfficientNet. This further supports our redesign of the backbone network structure for SOD.
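The section does not state how these percentages are computed. Assuming they are relative changes of the six-dataset averages with respect to the EfficientNet baseline (a common convention, but an assumption here), with an overline denoting a metric averaged over the six datasets and avgF treated like maxF, they read:

```latex
\Delta_{\mathrm{maxF}} = \frac{\overline{\mathrm{maxF}}_{\mathrm{SANet}} - \overline{\mathrm{maxF}}_{\mathrm{Eff}}}{\overline{\mathrm{maxF}}_{\mathrm{Eff}}} \times 100\%,
\qquad
\Delta_{\mathrm{MAE}} = \frac{\overline{\mathrm{MAE}}_{\mathrm{Eff}} - \overline{\mathrm{MAE}}_{\mathrm{SANet}}}{\overline{\mathrm{MAE}}_{\mathrm{Eff}}} \times 100\%
```

Under this reading, a 44.72% MAE reduction means that SANet's mean MAE is about 0.553 times that of EfficientNet (1 - 0.4472 = 0.5528).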

4.7. Configuration of the SAFE module
Table 6 presents the ablation results of the SAFE module with different numbers of branches and dilation rates.
Adding branches in stages E1-E4 improves some metrics but substantially increases computational complexity, which
conflicts with our goal of keeping the model lightweight. The default configuration of the SAFE module was
therefore chosen to balance accuracy against complexity.
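The accuracy/complexity trade-off can be illustrated with a back-of-the-envelope cost model. The sketch below is not the SAFE module: it assumes, purely for illustration, a hypothetical block whose parallel branches are each a 3×3 depthwise convolution with its own dilation rate followed by a 1×1 pointwise convolution, and it ignores biases, normalization, and any fusion layer. The function names and the 56×56×64 feature-map size are hypothetical.

```python
from typing import List, Tuple


def effective_kernel(k: int, dilation: int) -> int:
    """Effective spatial extent of a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def branch_macs(h: int, w: int, c: int, k: int = 3) -> int:
    """Multiply-accumulates of one hypothetical branch: k x k depthwise + 1 x 1 pointwise.

    Assumes stride 1, 'same' padding, and c input and output channels.
    Dilation does not change the count, since the kernel still has k * k taps.
    """
    depthwise = h * w * c * k * k
    pointwise = h * w * c * c
    return depthwise + pointwise


def block_profile(h: int, w: int, c: int, dilations: List[int],
                  k: int = 3) -> Tuple[List[int], int]:
    """Per-branch receptive-field sizes and total MACs of the hypothetical block."""
    fields = [effective_kernel(k, d) for d in dilations]
    total = len(dilations) * branch_macs(h, w, c, k)
    return fields, total


if __name__ == "__main__":
    # Hypothetical feature-map size and channel count, chosen only for illustration.
    for rates in ([1, 2], [1, 2, 4], [1, 2, 4, 8]):
        fields, macs = block_profile(56, 56, 64, rates)
        print(f"dilations={rates}: receptive fields={fields}, MACs={macs:,}")
```

Under these assumptions, raising a dilation rate enlarges a branch's receptive field at no extra cost, while every added branch adds a full branch's worth of computation, so cost grows linearly with the number of branches.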

5. CONCLUSION
This paper reviews existing research on SOD and analyzes the challenges of current approaches. Heavyweight SOD
models are difficult to deploy in scenarios with limited computing power and strict real-time requirements because
of their large model size and slow inference. Lightweight SOD models, in contrast, have weaker detection
performance and struggle with complex scenes. To address these problems, we propose SANet, a scale-adaptive
lightweight SOD model that balances lightweight design and detection effectiveness.
and detection effectiveness. We first implement the SAFE module, a component unit of the backbone net-
