
Page 429                              Chen et al. Intell Robot 2023;3:420-35  https://dx.doi.org/10.20517/ir.2023.24


               L = Σ_(x,y) l(f(x, W), y) + λ Σ_(s∈Γ) g(s)                                                        (2)

               where (x, y) denotes the training input and target, and W denotes the trainable weights. The function l(·)
               calculates the difference between the prediction and the correct answer, which forms the first part of the
               loss. g(·) is a sparsity-induced penalty on the scaling factors, and λ balances the two terms. In the sparse
               training, we let g(s) = |s|.
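
               Under these definitions, the combined objective can be sketched in plain Python (the function name and the
               λ value below are illustrative, not taken from the paper):

```python
def slimming_loss(task_loss, gammas, lam=1e-4):
    # Sparsity-induced penalty g(s) = |s|, summed over all
    # batch-normalization scaling factors and weighted by lambda.
    penalty = sum(abs(g) for g in gammas)
    return task_loss + lam * penalty

# Example: task loss 1.0, two scaling factors, lambda = 0.1
print(slimming_loss(1.0, [0.5, -0.5], lam=0.1))  # → 1.1
```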

               By using Equation (2) as the loss function, γ asymptotically approaches smaller values. Since the output of
               a channel is scaled by γ, a channel with γ near 0 produces an output that is also nearly 0; because such a
               channel contributes little to the computation, it can be eliminated. In this way, the parameters of the entire
               model can be reduced. Figure 13 shows the accuracy and loss results after sparse learning. The distribution
               of γ values before and after learning is shown in Figure 14.
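
               The elimination argument can be seen in a one-line sketch of the batch-normalization affine step (the
               function name is illustrative; β is assumed zero or absorbed into the following layer):

```python
def bn_channel_output(x_hat, gamma, beta=0.0):
    # BN affine transform y = gamma * x_hat + beta. With gamma
    # driven toward 0 by the sparsity penalty (and beta zero or
    # absorbed), the channel's output vanishes, so the channel
    # can be pruned without affecting the computation.
    return gamma * x_hat + beta

print(bn_channel_output(3.0, 0.0))  # → 0.0 (prunable channel)
print(bn_channel_output(3.0, 0.5))  # → 1.5 (channel still active)
```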


               From the training results, it can be seen that the accuracy dropped sharply at the start of sparse learning.
               However, it eventually returned to 80%. The loss increased sharply because of the added regularization
               term but then dropped to 1.6.


               From the distribution of γ before and after training, we can see that, because of the sparsity-induced
               penalty on the scaling factors in Equation (2), the γ values have been pushed to nearly 0. As a result, the
               redundant channels and layers can be removed.

               3.1.3. Reduction of layers and channels
               Through sparse learning, the channels and layers that could be deleted were identified. In this experiment,
               the threshold was set to 0.85, and the reduction was performed. A comparison of the number of parameters,
               accuracy, and inference time before and after deletion is summarized in Table 2.
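
               One common way to apply such a threshold (whether 0.85 is a global pruning ratio or a direct cutoff on γ is
               not stated here, so a global ratio, as in the network-slimming literature, is assumed) can be sketched as:

```python
def channel_keep_mask(gammas, prune_ratio=0.85):
    # Assumed interpretation: treat the threshold as a global
    # pruning ratio, i.e., discard the channels whose |gamma|
    # lies in the lowest 85% of all scaling factors network-wide.
    scores = sorted(abs(g) for g in gammas)
    cutoff = scores[int(prune_ratio * len(scores))]
    return [abs(g) >= cutoff for g in gammas]

# Toy example: 10 channels, prune the lowest 80% of |gamma|
gammas = [0.01, 0.02, 0.9, 0.05, 1.2, 0.03, 0.04, 0.06, 0.8, 0.07]
mask = channel_keep_mask(gammas, prune_ratio=0.8)
print(sum(mask))  # number of surviving channels → 2
```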

               The results show that although the accuracy was reduced by only 8%, the number of parameters was
               reduced to about 1.5% of those in the original network, and the inference time was reduced to 40% of the
               original. With the use of the NCS2, YOLO V3 was successfully run on a Raspberry Pi.


               3.1.4. Fine-tuning
               After the channel and layer reductions have been completed, another 300 epochs of fine-tuning can be
               performed to recover and increase the accuracy. The resulting accuracy and loss are shown in Figure 15.


               The light blue lines in Figure 15 present the results of fine-tuning, compared with those from the initial
               and sparse learning. The results show that fine-tuning increased the average accuracy to 85% and reduced
               the loss to 1.4.


               3.2. Experiments
               3.2.1. Verification of distance measurement
               Since distance measurement is an essential feature of this research, we evaluated its accuracy
               experimentally. For the experimental conditions, we set six distances from the camera (0.5 m, 1 m, 1.5 m,
               2 m, 3 m, and 4 m), used a person as the obstacle, and measured the difference between the true distance
               and the measured distance. The results are summarized in Table 3.
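
               A minimal way to quantify this evaluation can be sketched as follows (the actual measured values are in
               Table 3 and are not reproduced here; the measured values below are hypothetical, for illustration only):

```python
def distance_errors(true_m, measured_m):
    # Absolute and relative error per test distance (metres).
    return [(m - t, (m - t) / t) for t, m in zip(true_m, measured_m)]

# The six ground-truth distances used in the experiment:
true_m = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
# Hypothetical measurements, for illustration only:
measured = [0.52, 1.03, 1.55, 2.1, 3.2, 4.3]
for t, (abs_e, rel_e) in zip(true_m, distance_errors(true_m, measured)):
    print(f"{t} m: error {abs_e:+.2f} m ({rel_e:+.1%})")
```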