Page 508 Liu et al. Intell Robot 2024;4(4):503-23 I http://dx.doi.org/10.20517/ir.2024.29

Figure 3. Overall encoder-decoder architecture of the proposed SANet. E and D denote the encoder and the decoder at each stage,
shown together with their per-stage output feature maps. The figure also marks the predicted saliency maps, of which the one
indexed 1 is the final prediction result, and the ground-truth saliency map. PPM: pyramid pooling module; MFA:
multi-scale feature aggregation.

CNN is that it lacks rotation invariance and therefore requires a large amount of training data [36]. To address this,
Hinton of Google Brain proposed the capsule network (CapsNet), which can capture structural information [37].
CapsNet uses a dynamic routing algorithm to capture the part-whole relationships in an image and thereby
enhance the equivariance of the network. Building on this advantage, a series of studies and applications based on
the CapsNet structure have followed. Saqur et al. proposed a new algorithm, CapsGAN,
after showing the weakness of the CNN-based generative adversarial network (GAN) architecture in generating
3D images [38]. Cheng et al. proposed the complex-valued dense CapsNet (Cv-CapsNet) and the complex-valued
diverse CapsNet (Cv-CapsNet++) for image classification [39]. Sun et al. proposed a deep tensor capsule network
that uses a new tensor capsule-based routing algorithm and the corresponding convolution operation [40].
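
To make the dynamic routing idea concrete, the following is a minimal pure-Python sketch of routing-by-agreement as it is commonly described for CapsNet [37]. It is an illustration only, not the implementation used in SANet or in any of the cited works. The function names (`squash`, `softmax`, `dynamic_routing`), the default of three iterations, and the toy input are assumptions made for this example.

```python
import math

def squash(s, eps=1e-9):
    """Rescale vector s so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * x for x in s]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_routing(u_hat, num_iters=3):
    """Route lower-level ("part") capsules to upper-level ("whole") capsules.

    u_hat[i][j] is the prediction vector that part capsule i makes for
    whole capsule j. Returns the whole-capsule outputs v and the coupling
    coefficients c that were used to compute them.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    for _ in range(num_iters):
        # Each part distributes its vote over the wholes.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            # Weighted sum of predictions for whole j, then squash.
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit where a part's prediction agrees with the output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(a * w for a, w in zip(u_hat[i][j], v[j]))
    return v, c

# Toy input (hypothetical): three parts agree on whole 0, disagree on whole 1.
u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
v, c = dynamic_routing(u_hat)
```

In this toy case, all three part capsules make the same prediction for the first whole capsule, so its coupling coefficients grow over the iterations, while the conflicting predictions for the second whole capsule largely cancel out.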

Owing to these advantages, CapsNet has also been successfully applied to the task of SOD. For example,
Liu et al. improved deep unsupervised SOD by exploiting the part-whole relationship modeling of
CapsNet [41]. Zhang et al. used an attention mechanism to let CNN and CapsNet features interact and thereby better
detect salient objects [42]. Liu et al. combined the advantages of CNN and CapsNet, extracting different semantic
information with each and letting the two interact to generate better saliency prediction maps [43]. The
design of the SAFE module in this paper also draws on some ideas from CapsNet.

3. THE PROPOSED METHOD
In this section, the proposed SOD framework is presented. Section 3.1 describes the overall network structure.
Section 3.2 introduces the SAFE module, which can adaptively extract and filter features according to the scale
differences of salient objects. Section 3.3 explains the decoder design based on the MFA module.

3.1. Overall network architecture
As shown in Figure 3, the overall network structure of SANet comprises a bottom-up encoder, a top-down
decoder, and a lateral connection between them. The encoder is built with SAFE modules as units and is
divided into five stages. In these five stages, we downsample the input using dilated DSConv3×3 with a stride
