Page 283 Zhuang et al. Intell Robot 2024;4(3):276-92 | http://dx.doi.org/10.20517/ir.2024.18
Figure 4. The principle of SPP. SPP: Spatial Pyramid Pooling.
Figure 5. The structure of the MobileNetV3-YOLOv4 model.
lighten its model size to enable deployment on resource-constrained edge devices. Replacing CSPDarknet53
with a lightweight network as the feature extraction network of YOLOv4 is a viable option.
MobileNetV3 is a lightweight convolutional neural network proposed by the Google team [31] and widely
deployed on mobile phones and smart bracelets. Compared with traditional large-scale convolutional neural
networks such as AlexNet [32] and ResNet, MobileNetV3 dramatically reduces the number of parameters and
increases speed while sacrificing only a small amount of accuracy. In the tail structure of MobileNetV3,
average pooling first reduces the 7 × 7 feature map to 1 × 1, and a 1 × 1 convolution then increases the
dimensionality of the feature map. Because this convolution now operates on a single spatial position
instead of 49, its computational cost drops by a factor of forty-nine. Because some 3 × 3 and 1 × 1
convolutions in the head structure have little impact on accuracy, MobileNetV3 removes them directly to
improve speed further. Additionally, MobileNetV3 cuts the number of convolution kernels in its initial
convolutional layer from 32 to 16, which also makes the network faster. To avoid a substantial decrease
in accuracy, the Squeeze-and-Excitation Block (SE Block) is added to the core architecture of MobileNetV3.
The SE Block determines the importance of each feature channel from the dependencies among channels.
The network can selectively enhance the useful features while suppressing the less useful ones
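The factor of forty-nine quoted for the tail structure can be checked with a short calculation. The sketch below counts the multiply-accumulate operations of a 1 × 1 convolution applied to a 7 × 7 map and to a 1 × 1 map. The helper name `pointwise_conv_macs` and the channel widths 960 and 1280 are illustrative assumptions, not values given in the text, and the small cost of the pooling step itself is ignored.

```python
def pointwise_conv_macs(height, width, c_in, c_out):
    """Multiply-accumulate count of a 1x1 convolution on a height x width map."""
    return height * width * c_in * c_out


# Illustrative channel widths (an assumption, not stated in the text).
c_in, c_out = 960, 1280

# 1x1 convolution applied to the full 7x7 map, i.e. before pooling.
macs_before_pooling = pointwise_conv_macs(7, 7, c_in, c_out)

# Same convolution applied after average pooling has reduced the map to 1x1.
macs_after_pooling = pointwise_conv_macs(1, 1, c_in, c_out)

print(macs_before_pooling // macs_after_pooling)  # 49
```

The ratio depends only on the spatial size, so it is the same for any choice of channel widths.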

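The channel reweighting performed by an SE Block can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `se_block` and `hard_sigmoid`, the random weights, the reduction ratio of 4, and the omission of biases are all choices made for this example. The gate uses a hard sigmoid, the piecewise-linear activation that MobileNetV3 adopts in place of the standard sigmoid.

```python
import random


def hard_sigmoid(x):
    """Piecewise-linear approximation of the sigmoid: ReLU6(x + 3) / 6."""
    return min(max(x + 3.0, 0.0), 6.0) / 6.0


def se_block(feature_map, w_reduce, w_expand):
    """Apply squeeze-and-excitation to a feature map.

    feature_map: list of C channels, each an H x W list of lists.
    w_reduce:    (C // r) x C weights of the first fully connected layer.
    w_expand:    C x (C // r) weights of the second fully connected layer.
    Biases are omitted for brevity (an assumption of this sketch).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: bottleneck layer with ReLU, then a gate in [0, 1] per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    gates = [hard_sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_expand]
    # Scale: multiply every value of a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch] for ch, g in zip(feature_map, gates)]
    return scaled, gates


random.seed(0)
C, H, W, r = 8, 7, 7, 4  # hypothetical sizes chosen for the example
x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]

y, gates = se_block(x, w1, w2)
print([round(g, 3) for g in gates])  # one importance weight per channel
```

Channels whose gate is close to 1 pass through almost unchanged, while channels whose gate is close to 0 are suppressed. In a trained network the two weight matrices are learned, so the gates reflect the dependencies among channels rather than random values.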
