Wang et al. Intell Robot 2022;2(4):391-406 https://dx.doi.org/10.20517/ir.2022.25 Page 397

Figure 2. Schematic of the self-attention mechanism.

where $b_i$ contains the relevant information among the three input features. In this way, the
self-attention mechanism assigns weight coefficients according to the degree of similarity between
two feature vectors and quickly extracts relevant information among multimodal parameters.
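
As a minimal sketch of this similarity-based weighting, the Python snippet below computes attention weights and outputs b_i for three input features. It assumes scaled dot-product similarity and uses the input features themselves as queries, keys and values (identity projections). These choices, the function names and the sample vectors are illustrative assumptions and not necessarily the formulation shown in Figure 2.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def self_attention(features):
    """Return (outputs, weights) for a list of equal-length feature vectors.

    Assumption: queries, keys and values are the raw input features;
    a trained network would learn separate projections for each.
    """
    dim = len(features[0])
    outputs, weights = [], []
    for query in features:
        # Similarity of this feature to every feature, scaled by sqrt(dim).
        scores = [dot(query, key) / math.sqrt(dim) for key in features]
        alpha = softmax(scores)  # weight coefficients, summing to 1
        # b_i: weighted sum of all features under those coefficients.
        b_i = [sum(w * v[j] for w, v in zip(alpha, features)) for j in range(dim)]
        outputs.append(b_i)
        weights.append(alpha)
    return outputs, weights


# Three hypothetical input features (not taken from the paper).
a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
b, alpha = self_attention(a)
```

Here the first feature receives a larger weight from the second feature than from the third, because the first two vectors are more similar to each other.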

Pan et al. built a spatial-channel hierarchical network on a Visual Geometry Group 19 (VGG19) base net to
automatically detect bridge cracks at the pixel level and applied the self-attention mechanism not only for
mining the semantic dependence features of the spatial and channel dimensions but also for adaptively
integrating local features into their global dependence features [49]. The segmentation performance of the
proposed approach was validated on public datasets containing 11,000 cracked and uncracked images,
achieving a mean intersection over union of 85.31%. Zhao et al.
proposed a modified U-net for minute crack segmentation of 200 raw images of real-world steel-box-girder
bridges and applied a self-attention module with softmax and gate operations to obtain the attention vector,
which enables the neuron to focus on the most significant receptive fields when processing large-scale
feature maps [50]. The self-adaptation module, which consists of a multilayer perceptron subnet, was selected
for deeper feature extraction inside a single neuron. The self-attention mechanism mimics the internal
process of biological observation behavior and can quickly extract important features of data; it is
especially good at capturing the internal correlation of data or features.
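
For reference, the mean intersection over union used to evaluate such segmentation results can be sketched as follows. This is a minimal illustration that assumes two classes (0 for background, 1 for crack) and a plain average of per-class IoU. The averaging convention of the cited study is not stated here, and the function names and sample masks are hypothetical.

```python
import math


def iou(pred, truth, label):
    """IoU of one class over two flat sequences of pixel labels."""
    inter = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    union = sum(1 for p, t in zip(pred, truth) if p == label or t == label)
    # A class absent from both masks has no defined IoU.
    return inter / union if union else float("nan")


def mean_iou(pred, truth, labels=(0, 1)):
    """Average the per-class IoU, skipping classes absent from both masks."""
    scores = [iou(pred, truth, c) for c in labels]
    valid = [s for s in scores if not math.isnan(s)]
    return sum(valid) / len(valid) if valid else float("nan")


# Hypothetical flattened masks: 1 marks a crack pixel, 0 marks background.
pred_mask = [0, 0, 1, 1, 1, 0]
true_mask = [0, 0, 1, 1, 0, 0]
score = mean_iou(pred_mask, true_mask)
```

In this toy case the crack class has an IoU of 2/3 and the background class 3/4, so the mean is 17/24.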

For feature extraction from multimodal bridge crack data, traditional methods of extracting
representative information have certain limitations, whereas the self-attention mechanism can allocate
weights across the time, spatial and channel domains to extract the features most relevant to
the target. The methods of feature extraction utilized in the above studies are summarized in Table 1.

2.2. Research status of the multisource heterogeneous data fusion representation of bridge cracks
The multisource heterogeneous parameters of bridge cracks, such as the operating environment, load and
structural mechanical state indices, are strongly correlated and have low density, which makes it difficult
to accurately and comprehensively reflect the evolution state of cracks. By analyzing and synthesizing the
