
Page 283                       Zhuang et al. Intell Robot 2024;4(3):276-92 | http://dx.doi.org/10.20517/ir.2024.18

                                         Figure 4. The principle of SPP. SPP: Spatial Pyramid Pooling.

                                        Figure 5. The structure of the MobileNetV3-YOLOv4 model.


               lighten its model size to enable deployment on resource-constrained edge devices. Replacing CSPDarknet53
               with a lightweight network as the feature extraction network of YOLOv4 is a viable option.

               MobileNetV3 is a lightweight convolutional neural network presented by the Google team [31] and widely
               deployed on cell phones and smart bracelets. Compared with traditional large-scale convolutional neural
               networks such as AlexNet [32] and ResNet, MobileNetV3 dramatically reduces the parameter count and
               increases speed while sacrificing only a small amount of accuracy. In the tail structure of MobileNetV3,
               average pooling is applied first to shrink the 7 × 7 feature map to 1 × 1, and a 1 × 1 convolution then
               increases the dimensionality of the feature map. Because the convolution now operates on a single spatial
               position instead of 7 × 7 = 49 positions, the computational cost of this step falls by a factor of 49.
               Because some 3 × 3 and 1 × 1 convolutions in the head structure have little impact on accuracy,
               MobileNetV3 removes them directly to improve speed further. Additionally, MobileNetV3 cuts the number of
               convolution kernel channels from 32 to 16, which also makes the network faster. To avoid a substantial
               decrease in accuracy, the Squeeze-and-Excitation Block (SE Block) is added to the core architecture of
               MobileNetV3. The SE Block determines the importance of each feature channel from the dependencies among
               channels. The network can then selectively enhance the useful features while suppressing the less useful ones