Liu et al. Intell Robot 2024;4(4):503-23  I http://dx.doi.org/10.20517/ir.2024.29   Page 505

               Figure 1. Application scenarios of SOD. (A) Road surface defect detection; (B) Assisted driving; (C) Strip steel surface defect detection.
               SOD: Salient object detection.


               aspects remains a challenge in SOD.

               We design and implement an efficient, lightweight SOD model based on the above analysis. It adopts a
               novel scale-adaptive feature extraction (SAFE) module for multi-scale learning, which adaptively adjusts
               the weight of each scale of information according to its importance, achieving dynamic perception of the
               features of salient objects. The SAFE module consists of two parts: multi-scale feature interaction and dynamic
               selection. Multi-scale feature interaction is responsible for feature extraction. It first uses depthwise separable
               convolutions with different dilation rates to extract information from receptive fields of various sizes and
               divides the input features into distinct branches. Feature interaction is then achieved by fusing the features of
               different branches to improve their representation capability. Dynamic selection combines channel attention
               with a multi-layer perceptron (MLP) to assign different weights to the features of each scale, thereby
               extracting key feature information. We also design a decoder based on a multi-scale feature aggregation (MFA)
               module to alleviate the information loss caused by excessive upsampling. Based on the SAFE and MFA modules,
               we implement an encoder-decoder network better suited to SOD tasks, namely the scale-adaptive network
               (SANet). With only 2.29 M parameters, it achieves an inference speed of 62 frames per second (FPS) on an
               NVIDIA RTX 3090 GPU, far exceeding the other models compared and guaranteeing real-time performance.
               Its detection performance matches that of general heavyweight methods and exceeds that of many first-class
               lightweight methods.
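
To make the efficiency argument concrete, the following back-of-the-envelope sketch counts the weights of a standard convolution versus a depthwise separable one, shows how the dilation rate enlarges the area a 3x3 kernel covers, and converts the reported 62 FPS and 2.29 M parameters into per-frame time and weight storage. The kernel size, channel width, and dilation rates (1, 2, 3) are illustrative assumptions rather than SANet's published configuration, the helper names are hypothetical, and the storage figure assumes 4-byte float32 weights.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k standard convolution (bias terms omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """k x k depthwise conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def dilated_extent(k: int, dilation: int) -> int:
    """Side length of the region covered by one k x k kernel at the given dilation rate."""
    return dilation * (k - 1) + 1


def latency_ms(fps: float) -> float:
    """Average time per frame implied by a throughput in frames per second."""
    return 1000.0 / fps


def fp32_size_mb(num_params: float) -> float:
    """Weight storage in megabytes, assuming 4-byte float32 parameters."""
    return num_params * 4 / 1e6


if __name__ == "__main__":
    k, c = 3, 64  # illustrative kernel size and channel width (assumption)
    std = standard_conv_params(k, c, c)
    sep = depthwise_separable_params(k, c, c)
    print(f"standard: {std}, depthwise separable: {sep}, ratio: {std / sep:.1f}x")
    for d in (1, 2, 3):  # hypothetical dilation rates for three branches
        e = dilated_extent(k, d)
        print(f"dilation {d}: one kernel covers {e}x{e} pixels")
    print(f"62 FPS -> {latency_ms(62):.1f} ms per frame")
    print(f"2.29 M params -> {fp32_size_mb(2.29e6):.2f} MB as float32")
```

With a 3x3 kernel and 64 input and output channels, the separable version needs 4,672 weights instead of 36,864, roughly 7.9x fewer; at this width the 1x1 pointwise term (4,096 weights) dominates and the depthwise part adds only 576. Raising the dilation from 1 to 3 grows the covered region from 3x3 to 7x7 without adding any weights.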
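
The dynamic selection step can be pictured with the selective-kernel pattern: sum the scale branches, squeeze them into a per-channel descriptor with global average pooling, pass that descriptor through a small MLP, and apply a softmax across scales so that each channel receives its own mix of receptive fields. The source does not give SANet's exact formulation, so this pattern, the function names (`dynamic_select`, `global_avg_pool`, `matvec`), the weight shapes, and the toy values are assumptions for illustration only. Plain Python lists stand in for tensors so the sketch runs without a deep learning framework.

```python
import math
from typing import List, Tuple

Feature = List[List[float]]  # [channels][flattened spatial positions]


def global_avg_pool(x: Feature) -> List[float]:
    """Channel descriptor: mean activation of each channel."""
    return [sum(ch) / len(ch) for ch in x]


def matvec(w: List[List[float]], v: List[float]) -> List[float]:
    """Dense matrix-vector product (rows of w dotted with v)."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def dynamic_select(branches: List[Feature],
                   w1: List[List[float]],
                   w2: List[List[List[float]]]) -> Tuple[Feature, List[List[float]]]:
    """Hypothetical scale-weighting step.

    branches: S feature maps, each shaped [C][N]
    w1: hidden x C weights of a shared MLP layer (assumed)
    w2: S matrices shaped C x hidden, one per scale (assumed)
    Returns the re-weighted output [C][N] and the weights [S][C].
    """
    num_scales, channels, positions = len(branches), len(branches[0]), len(branches[0][0])
    # 1) Element-wise sum so the descriptor sees every scale.
    fused = [[sum(b[c][n] for b in branches) for n in range(positions)]
             for c in range(channels)]
    # 2) Global average pooling: one value per channel.
    z = global_avg_pool(fused)
    # 3) Shared MLP layer with ReLU.
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    # 4) One logit per (scale, channel).
    logits = [matvec(w2[s], hidden) for s in range(num_scales)]
    # 5) Softmax across scales, separately for each channel (max-shifted for stability).
    weights = [[0.0] * channels for _ in range(num_scales)]
    for c in range(channels):
        m = max(logits[s][c] for s in range(num_scales))
        exps = [math.exp(logits[s][c] - m) for s in range(num_scales)]
        total = sum(exps)
        for s in range(num_scales):
            weights[s][c] = exps[s] / total
    # 6) Re-weight each branch channel-wise and sum over scales.
    out = [[sum(weights[s][c] * branches[s][c][n] for s in range(num_scales))
            for n in range(positions)]
           for c in range(channels)]
    return out, weights


if __name__ == "__main__":
    small = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]  # e.g. a small-dilation branch (toy values)
    large = [[3.0, 2.0, 1.0], [2.0, 2.0, 2.0]]  # e.g. a large-dilation branch (toy values)
    w1 = [[0.5, 0.5]]                            # C=2 -> hidden=1
    w2 = [[[1.0], [0.0]], [[0.0], [1.0]]]        # per-scale C x hidden
    out, weights = dynamic_select([small, large], w1, w2)
    print("scale weights per channel:", weights)
    print("output:", out)
```

With these toy weights, channel 0 leans toward the first branch and channel 1 toward the second, which illustrates how the softmax lets different channels favour different receptive fields for the same input.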

               In summary, our contributions mainly include the following three points:


               (1) We propose a novel SAFE module, which consists of two parts: multi-scale feature interaction, which is used
               to extract features of different scales and enhance the representation of salient objects through the interaction
               of cross-scale features; and dynamic selection, which is data-driven and can adaptively perceive and measure the
               importance of features of different scales according to changes in the input image.

               (2) We implement SANet, which consists of an encoder based on the SAFE module and a decoder
               based on the MFA module. This is an encoder-decoder network that considers both lightweight and detection