Liu et al. Intell Robot 2024;4(4):503-23 I http://dx.doi.org/10.20517/ir.2024.29 Page 505
Figure 1. Application scenarios of SOD. (A) Road surface defect detection; (B) Assisted driving; (C) Strip steel surface defect detection.
SOD: Salient object detection.
aspects remains a challenge in SOD.
We design and implement an efficient and lightweight SOD model based on the above analysis. It adopts a
novel scale-adaptive feature extraction (SAFE) module for multi-scale learning. Meanwhile, it can adaptively
adjust the weight of each scale of information according to its importance to achieve dynamic perception of the
features of salient objects. The SAFE module mainly consists of multi-scale feature interaction and dynamic
selection. The multi-scale feature interaction is mainly used for feature extraction. It first uses depthwise
separable convolutions with different dilation rates to extract information of various receptive fields, and divides
the input features into distinct branches. Then, feature interaction is achieved by fusing the features of different
branches to improve their representation capabilities. The dynamic selection mainly combines channel attention
with a multi-layer perceptron (MLP) to assign different weights to features of multiple scales to extract key
feature information. We also design a decoder based on the multi-scale feature aggregation (MFA) module
to alleviate the information loss caused by excessive upsampling. Based on the SAFE and MFA modules,
we implement an encoder-decoder network that is better suited to SOD tasks, namely the scale-adaptive
network (SANet). With only 2.29 M parameters, it achieves an inference speed of 62 frames per second (FPS)
on an NVIDIA RTX 3090 GPU, far exceeding other models and guaranteeing real-time performance.
Its detection performance is comparable to that of general heavyweight methods and exceeds that of many
leading lightweight methods.
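To make the role of dilated depthwise separable convolutions concrete, the following minimal sketch computes two standard quantities: the spatial extent covered by a dilated kernel, and the weight count of a depthwise separable convolution compared with a standard one. The 3 x 3 kernel, the dilation rates (1, 2, 3) and the 64-channel width are assumptions chosen for illustration; they are not the configuration of the SAFE module, and the function names are hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent (in pixels) covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def standard_conv_params(kernel_size: int, c_in: int, c_out: int) -> int:
    """Weight count of a standard 2D convolution (bias omitted)."""
    return kernel_size * kernel_size * c_in * c_out


def depthwise_separable_params(kernel_size: int, c_in: int, c_out: int) -> int:
    """One k x k depthwise filter per input channel, followed by a
    1 x 1 pointwise convolution that mixes channels (bias omitted)."""
    depthwise = kernel_size * kernel_size * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Assumed values for illustration only; the text does not state them.
KERNEL_SIZE = 3
CHANNELS = 64
DILATION_RATES = (1, 2, 3)

for rate in DILATION_RATES:
    extent = effective_kernel_size(KERNEL_SIZE, rate)
    print(f"dilation={rate}: kernel covers {extent} x {extent} pixels")

standard = standard_conv_params(KERNEL_SIZE, CHANNELS, CHANNELS)
separable = depthwise_separable_params(KERNEL_SIZE, CHANNELS, CHANNELS)
print(f"standard conv weights:            {standard}")
print(f"depthwise separable conv weights: {separable}")
print(f"reduction factor:                 {standard / separable:.2f}")
```

Under these assumptions, each branch sees a different receptive field (3, 5 and 7 pixels wide) while the weight count per branch stays the same, because dilation spreads the kernel taps without adding parameters. The separable form needs 4,672 weights instead of 36,864, which illustrates why such layers are attractive for a lightweight model.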
In summary, our contributions mainly include the following three points:
(1) We propose a novel SAFE module, which consists of two parts: multi-scale feature interaction, which is used
to extract features of different scales and enhance the representation of salient objects through the interaction
of cross-scale features; dynamic selection, which is data-driven and can adaptively perceive and measure the
importance of features of different scales according to the changes in the input images.
(2) We implement the SANet network, which consists of an encoder based on the SAFE module and a decoder
based on the MFA module. This is an encoder-decoder network that considers both lightweight and detection

