                                                    Figure 6. A residual block.
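
               The structure in Figure 6 can be stated compactly in code. The following is a minimal PyTorch-style sketch of a basic residual block, not code from the paper; the choice of two 3 x 3 convolutions with batch normalization and a single channel-count parameter is an illustrative assumption.

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Basic residual block: output = F(x) + x via an identity shortcut."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # add the identity shortcut before the final activation
```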

               • ResNeXt is a variant of ResNet that resembles the Inception network in that both follow the split-
               transform-merge paradigm, but in ResNeXt the outputs of the different paths are merged by addition
               rather than depth-concatenated as in the Inception network [37]. The paths of the ResNeXt architecture
               share the same topology, and the number of independent paths is introduced as a new hyper-parameter,
               cardinality, providing a new way of adjusting the model capacity (see the sketch below).
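
               To make cardinality concrete, the hypothetical PyTorch sketch below realizes the parallel paths as a grouped convolution, the standard equivalent formulation; the channel sizes (256, 128) and cardinality 32 echo the common "32x4d" setting but are assumptions here, not values from this survey.

```python
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    """Split-transform-merge with `cardinality` parallel paths of identical topology."""

    def __init__(self, in_ch=256, bottleneck=128, cardinality=32):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Conv2d(in_ch, bottleneck, kernel_size=1, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            # groups=cardinality splits the 3x3 convolution into 32 independent paths
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1,
                      groups=cardinality, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, in_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(in_ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.transform(x))  # path outputs merged by addition
```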


               • DenseNet goes further and connects all layers directly with each other to maximize the benefit of
               shortcut connections [38]. The feature maps of all earlier layers are aggregated by depth-concatenation
               and passed on to subsequent layers. DenseNet is highly parameter-efficient owing to this feature reuse
               (see the sketch below).
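
               A minimal dense block, sketched below in the same hypothetical PyTorch style, shows the depth-concatenation explicitly; the growth rate and layer count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer consumes the depth-concatenation of all earlier feature maps."""

    def __init__(self, in_ch, growth_rate=12, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            ch = in_ch + i * growth_rate  # all earlier outputs are reused as input
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth_rate, kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)  # pass every map to subsequent layers
```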


               • A network with stochastic depth randomly drops layers during training but uses the full network at
               test time [39]. Applied to ResNet, this shortens training and makes the network more practical for
               real-world applications. During training, each layer is dropped at random according to a survival
               probability; at test time, all layers are active and their outputs are recalibrated by those survival
               probabilities. When a residual block is kept during training, its input flows through both the identity
               shortcut and the weight layers; otherwise, it flows through the identity shortcut only (see the sketch
               below).
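
               The train/test asymmetry can be written down directly. The hypothetical PyTorch sketch below wraps an arbitrary residual transform; the survival probability of 0.8 is an illustrative assumption.

```python
import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    """Residual block whose transform branch is randomly dropped during training."""

    def __init__(self, block, survival_prob=0.8):
        super().__init__()
        self.block = block                  # any residual transform F(x)
        self.survival_prob = survival_prob

    def forward(self, x):
        if self.training:
            if torch.rand(1).item() < self.survival_prob:
                return x + self.block(x)    # block kept: shortcut plus weight layers
            return x                        # block dropped: identity shortcut only
        # Test time: every block is active, recalibrated by its survival probability
        return x + self.survival_prob * self.block(x)
```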

               • VGGNet is a standard deep CNN architecture with 16 and 19 weight layers in VGG-16 and VGG-19,
               respectively [40]. The VGG architecture performs well on many image recognition tasks and datasets
               beyond ImageNet (its layer configuration is sketched below).
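
               For reference, the VGG-16 configuration can be written as the channel list used by common implementations; this sketch is illustrative rather than the original code. Thirteen 3 x 3 convolutional layers plus three fully connected layers give the 16 weight layers.

```python
import torch.nn as nn

# Output channels of successive 3x3 convolutions; 'M' marks 2x2 max pooling.
VGG16_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']

def make_vgg_features(cfg=VGG16_CFG, in_ch=3):
    layers = []
    for v in cfg:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = v
    return nn.Sequential(*layers)
```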


               • SPPNet is a type of CNN that employs spatial pyramid pooling to remove the fixed-size input
               constraint of the network [41]. An SPP layer is added on top of the last convolutional layer to pool the
               features and generate fixed-length outputs, which are then fed into the fully connected layers or other
               classifiers, avoiding the need for cropping or warping the input at the beginning (see the sketch below).
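
               The fixed-length property is easy to verify in code. The hypothetical sketch below uses adaptive max pooling per pyramid level; the level set (1, 2, 4) is an assumption, as the original paper explores several configurations.

```python
import torch
import torch.nn.functional as F

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Pool an N x C x H x W map of any H, W into a fixed-length vector."""
    n, c = feature_map.shape[:2]
    pooled = []
    for level in levels:
        p = F.adaptive_max_pool2d(feature_map, output_size=level)  # n x c x level x level
        pooled.append(p.reshape(n, c * level * level))
    return torch.cat(pooled, dim=1)  # length c * (1 + 4 + 16), independent of H, W

# Inputs of different spatial sizes yield the same output dimensionality:
# spatial_pyramid_pool(torch.randn(1, 256, 13, 13)).shape  -> (1, 256 * 21)
# spatial_pyramid_pool(torch.randn(1, 256, 7, 9)).shape    -> (1, 256 * 21)
```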


               • PReLU-Net is a kind of CNN that uses parametric ReLUs (PReLU) as its activation function, together
               with a robust Kaiming initialization scheme that accounts for the non-linear activation [42] (see the
               sketch below).
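
               Both ingredients are available directly in PyTorch, as the hypothetical sketch below shows; the channel count and the initial slope of 0.25 are illustrative assumptions. PReLU computes f(x) = x for x > 0 and f(x) = a * x otherwise, with the slope a learned per channel, and the matching Kaiming initialization uses std = sqrt(2 / ((1 + a^2) * fan_in)).

```python
import torch.nn as nn

# PReLU: f(x) = max(0, x) + a * min(0, x), with one learnable slope per channel.
prelu = nn.PReLU(num_parameters=64, init=0.25)

# Kaiming (He) initialization sized for that non-linearity: with slope a, the
# gain is sqrt(2 / (1 + a^2)), which PyTorch exposes via nonlinearity='leaky_relu'.
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
nn.init.kaiming_normal_(conv.weight, a=0.25, nonlinearity='leaky_relu')
```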