

neural network for concrete structural crack damage classification, where a multilevel cascade classifier was utilized to extract the characteristics of concrete cracks, achieving an accuracy of 97.9% [15]. Zheng et al. adopted a CNN-based model to amplify and extract features from 5000 concrete crack images and analyzed the morphological and geometric indices of cracks by training on building surface data such as roads, bridges, houses, and dams, achieving a highest crack detection accuracy of 98% and an average detection accuracy of 87% [46]. Xu et al. proposed an end-to-end CNN-based crack detection model for 2068 bridge crack images that uses only images and image labels as input; it extracts multiscale crack feature information through dilated (atrous) convolution and pooling, thus reducing the computational complexity and achieving a high recognition rate (96.37%) [47]. Teng et al. applied 11 well-known CNN models as the feature extractor of YOLOv2 for crack detection on 990 RGB bridge crack images, providing a basis for rapid and accurate crack detection in concrete structures, and achieved a high precision of 0.89 with a fast computing speed [48].
The self-attention mechanism is a deep learning technique that has been widely applied to natural language processing tasks in recent years. The idea of attention is to filter out the small amount of important information from a large amount of information and to focus on it while disregarding the rest. Each feature is assigned a weight that represents its importance: the larger the weight, the more the corresponding feature is attended to. The self-attention mechanism reduces the dependence on external information and is better at capturing the internal correlations of data or features.

A schematic diagram of the self-attention mechanism is shown in Figure 2. In the figure, three multimodal features, such as the depth of the crack, the load, and the environmental humidity, are used as input, and the output is their correlation information. The feature vectors $a_1$, $a_2$, and $a_3$, which have certain meanings, are obtained from the preprocessing of the three multimodal features. They are respectively multiplied by three weight matrices ($W_q$, $W_k$, and $W_v$) to obtain three corresponding vectors ($q_i$, $k_i$, and $v_i$, where the subscript $i$ indexes the features).
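To make this projection step concrete, the following is a minimal NumPy sketch. The embedding dimension, the toy input values, and the random initialization of $W_q$, $W_k$, and $W_v$ are illustrative assumptions, not values from the paper; in practice these matrices are learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three preprocessed multimodal feature vectors a_1, a_2, a_3
# (e.g., crack depth, load, environmental humidity), each embedded
# into a common d-dimensional space. Values are toy assumptions.
d = 4
A = rng.normal(size=(3, d))   # rows are a_1, a_2, a_3

# Projection matrices (learned in practice, random here for illustration).
W_q = rng.normal(size=(d, d))
W_k = rng.normal(size=(d, d))
W_v = rng.normal(size=(d, d))

# Each feature vector is multiplied by the three weight matrices
# to obtain its query, key, and value vectors q_i, k_i, v_i.
Q = A @ W_q   # Q[i] = q_i
K = A @ W_k   # K[i] = k_i
V = A @ W_v   # V[i] = v_i
```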
The following process can be divided into three steps. First, the similarity between the query $q_i$ and each key $k_j$ is calculated to obtain the weight of each feature. The calculation formula is as follows

$$\alpha_{i,j} = q_i \cdot k_j \qquad (1)$$

where $\alpha_{i,j}$ denotes the (unnormalized) weight.
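As a small self-contained illustration of Equation (1), the dot product of one toy query and one toy key (the vector values are made up for the example):

```python
import numpy as np

# Dot-product similarity between one query and one key, as in Eq. (1).
q_i = np.array([0.2, 0.5, 0.1, 0.7])   # toy query vector (assumed values)
k_j = np.array([0.4, 0.3, 0.9, 0.1])   # toy key vector (assumed values)

alpha_ij = q_i @ k_j   # unnormalized attention weight alpha_{i,j}
print(alpha_ij)        # ~0.39
```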
Then, the softmax function is used to normalize the weight of each feature. The normalized weight can be calculated by

$$\hat{\alpha}_{i,j} = \frac{\exp(\alpha_{i,j})}{\sum_{j'} \exp(\alpha_{i,j'})} \qquad (2)$$

where $\hat{\alpha}_{i,j}$ denotes the normalized weight.
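Equation (2) can be illustrated with a short self-contained snippet; the unnormalized weights below are invented for the example:

```python
import numpy as np

# Unnormalized weights alpha_{i,j} of feature i against features j = 1..3.
alpha_i = np.array([0.39, 1.20, -0.55])   # toy values, for illustration only

# Softmax normalization as in Eq. (2): exponentiate, then divide by the sum,
# so the normalized weights are positive and sum to 1.
alpha_hat_i = np.exp(alpha_i) / np.exp(alpha_i).sum()
print(alpha_hat_i, alpha_hat_i.sum())     # three weights summing to 1.0
```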



Finally, a weighted sum operation is performed on the normalized weights and the corresponding value vectors ($v_j$) to obtain the outputs. The calculation method is as follows

$$b_i = \sum_{j} \hat{\alpha}_{i,j} \, v_j \qquad (3)$$

where $b_i$ denotes the output corresponding to the $i$-th feature.
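Putting the three steps together, a minimal end-to-end sketch of the mechanism could look as follows. The function name, dimensions, and random weights are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def self_attention(A, W_q, W_k, W_v):
    """Single-head self-attention over the rows of A, following Eqs. (1)-(3)."""
    Q, K, V = A @ W_q, A @ W_k, A @ W_v                # q_i, k_i, v_i per feature
    alpha = Q @ K.T                                    # Eq. (1): alpha_{i,j} = q_i . k_j
    alpha -= alpha.max(axis=1, keepdims=True)          # shift for numerical stability
    alpha_hat = np.exp(alpha)
    alpha_hat /= alpha_hat.sum(axis=1, keepdims=True)  # Eq. (2): softmax over j
    return alpha_hat @ V                               # Eq. (3): b_i = sum_j a_hat_{i,j} v_j

rng = np.random.default_rng(0)
d = 4
A = rng.normal(size=(3, d))                            # a_1, a_2, a_3 (toy features)
B = self_attention(A,
                   rng.normal(size=(d, d)),            # W_q
                   rng.normal(size=(d, d)),            # W_k
                   rng.normal(size=(d, d)))            # W_v
print(B.shape)                                         # (3, 4): one output b_i per feature
```

Stacking the per-feature vectors into matrices lets all pairwise weights of Equation (1) be computed in a single matrix product, which is why self-attention is typically written in this batched form.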