Figure 5. Generative adversarial networks.

               For image classification tasks, the following deep learning methods could be adopted:


               • LeNet is the earliest pre-trained model used for recognizing handwritten and machine-printed characters
               and has a simple and straightforward architecture.

• AlexNet consists of eight layers, five convolutional and three fully connected. It features ReLU activations and overlapping pooling and supports training on multiple GPUs. The dropout technique is used to prevent overfitting, at the cost of longer training time: at every training step, a percentage of the interconnecting neurons is randomly dropped (a minimal sketch of dropout follows this list). ZFNet is a classic CNN that was motivated by visualizing the intermediate feature layers and the operation of the classifier[31]. It uses smaller filters and convolution strides than AlexNet.
• The Inception network differs from conventional CNN classifiers in that it applies filters of multiple sizes at the same level; the concatenated outputs are sent to the next inception module, which makes the neural network wider[32] (a sketch of such an inception module follows this list).

• GoogLeNet is a 27-layer architecture including nine inception modules that reduce the input images while retaining important spatial information to achieve efficiency. Users can utilize a GoogLeNet network trained on ImageNet with transfer learning instead of implementing or training the network from scratch (see the transfer-learning sketch after this list).


• ResNet introduces, as shown in Figure 6, an identity shortcut connection that skips one or more layers. The identity mapping layers simply pass their input through, so the added layers avoid producing higher training error[33] (a sketch of such a residual block follows this list). Pre-activation ResNet makes the optimization easier and reduces overfitting. ResNet in ResNet (RiR) generalizes the ResNet block by splitting the input into a residual stream and a transient stream for better accuracy[34]. Residual networks of residual networks (RoR) adds shortcut connections across a group of residual blocks[35]; on top of this, another level of shortcut connection can exist across a group of "groups of residual blocks". Wide residual network (WRN) reduces training time but has more parameters as the network widens, and it examines many design parameters of the ResNet block, including the depth and the widening factor[36].
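To make the dropout technique mentioned for AlexNet concrete, the sketch below randomly zeroes a fraction of activations at every training step. It assumes PyTorch as the framework, which the text does not prescribe; the layer sizes and drop probability are illustrative only, loosely following AlexNet's fully connected head.

```python
import torch
import torch.nn as nn

# Minimal classifier head in the spirit of AlexNet's fully connected layers.
# At every training step, nn.Dropout randomly zeroes a fraction (here 50%)
# of the interconnecting neurons, which helps prevent overfitting.
classifier = nn.Sequential(
    nn.Linear(256 * 6 * 6, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),            # drop 50% of activations during training
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 1000),        # e.g., 1000 ImageNet classes
)

classifier.train()                # dropout is active only in training mode
features = torch.randn(8, 256 * 6 * 6)   # dummy batch of flattened features
logits = classifier(features)
print(logits.shape)               # torch.Size([8, 1000])
```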
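The inception-module idea, filters of several sizes applied to the same input and their outputs concatenated along the channel dimension, can be sketched as follows. PyTorch is again assumed, and the channel counts are illustrative rather than the exact GoogLeNet configuration (the real modules also use 1x1 reductions before the larger filters).

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Simplified inception module: parallel 1x1, 3x3, and 5x5 convolutions
    plus a pooling branch, concatenated along the channel dimension."""
    def __init__(self, in_channels):
        super().__init__()
        self.branch1 = nn.Conv2d(in_channels, 64, kernel_size=1)
        self.branch3 = nn.Conv2d(in_channels, 128, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_channels, 32, kernel_size=5, padding=2)
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, 32, kernel_size=1),
        )

    def forward(self, x):
        # All branches preserve the spatial size, so their outputs can be
        # concatenated and sent to the next inception module.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1,
        )

module = InceptionModule(in_channels=192)
out = module(torch.randn(1, 192, 28, 28))
print(out.shape)   # torch.Size([1, 256, 28, 28]); 64 + 128 + 32 + 32 = 256
```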
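Transfer learning with a GoogLeNet network pre-trained on ImageNet, as suggested above, might look like the following sketch. It assumes torchvision supplies the pre-trained weights; the number of target classes is a placeholder for whatever the downstream task requires.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load GoogLeNet with weights pre-trained on ImageNet instead of
# training the network from scratch.
model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)

# Freeze the pre-trained feature extractor.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classifier with one matching the new task
# (10 classes is just a placeholder).
model.fc = nn.Linear(model.fc.in_features, 10)

# Only the new layer's parameters are updated during fine-tuning.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
```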
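Finally, the identity shortcut connection of ResNet, in which the input skips one or more layers and is added back to their output, can be sketched as a basic residual block (PyTorch assumed; the two-convolution layout and channel count are illustrative).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicResidualBlock(nn.Module):
    """Basic residual block: two 3x3 convolutions plus an identity
    shortcut connection that skips them."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        identity = x                  # the shortcut carries the input unchanged
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity          # identity mapping added back to the output
        return F.relu(out)

block = BasicResidualBlock(channels=64)
y = block(torch.randn(1, 64, 56, 56))
print(y.shape)   # torch.Size([1, 64, 56, 56])
```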