
Ao et al. Intell Robot 2023;3(4):495-513  I http://dx.doi.org/10.20517/ir.2023.28  Page 9 of 19


                                     Table 1. Number of sEMG color images for the twelve gestures
                                     Gesture            a      b      c      d      e      f
                                     Number of images   15268  15016  17451  13726  13756  14599
                                     Gesture            g      h      i      j      k      l
                                     Number of images   14818  15568  13894  13189  12468  14576

                       Figure 4. CNN network model constructs and associated hyper-parameters. CNN: Convolutional neural network.


               3.2. CNN model framework and hyper-parameter settings
               The model we build contains four convolutional layers and three fully connected layers. The convolutional
               kernels are uniformly 3x3 with a stride of 1 and a padding of 1, and the four layers use 32, 64, 128, and 128
               kernels, respectively. Each convolutional layer is followed by a normalization layer and a ReLU activation
               function, and a 2x2 pooling layer sits between successive convolutional layers. The three fully connected
               layers have output sizes of 1024, 512, and 12, with the last layer producing the classification result. We
               deliberately built a CNN with only four convolutional layers to demonstrate the stability of the Shapley value
               for muscle analysis and to highlight its usefulness within an ordinary network. The loss function is
               cross-entropy, and the optimizer is stochastic gradient descent. Figure 4 shows the architecture of the CNN
               model. We use these image data as input to the CNN to perform the multi-class recognition task and obtain
               gesture recognition accuracy. At the end of the network, we introduce the Grad-CAM method and perform
               gradient-weighted class activation mapping on the input images to generate heat maps that show the
               network's importance attribution. Based on the information obtained from Grad-CAM, which indicates the
               input data that the network considers important, we analyze muscle synergy by removing information
               redundant to the gesture action.
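The architecture described above can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' released code: the 64x64 input resolution and the learning rate are assumptions not stated in this excerpt, and the flattened feature size (128 x 4 x 4) follows from that assumed input after four 2x2 poolings.

```python
import torch
import torch.nn as nn

class GestureCNN(nn.Module):
    """Sketch of the four-conv-layer CNN in Section 3.2: 3x3 kernels,
    stride 1, padding 1, channel widths 32/64/128/128, each block
    followed by BatchNorm + ReLU and a 2x2 pooling layer."""
    def __init__(self, num_classes=12, in_channels=3):
        super().__init__()
        blocks = []
        channels = [in_channels, 32, 64, 128, 128]
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            blocks += [
                nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),  # 2x2 pooling between conv layers
            ]
        self.features = nn.Sequential(*blocks)
        # Assumption: a 64x64 input, halved four times, leaves 4x4 maps.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, num_classes),  # 12-way gesture output
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = GestureCNN()
criterion = nn.CrossEntropyLoss()  # cross-entropy loss, as in the paper
# SGD optimizer as stated; the learning rate here is an assumption.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

out = model(torch.randn(2, 3, 64, 64))
print(out.shape)  # torch.Size([2, 12])
```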


               3.3. Experimental results
               In this section, we present the experimental results. First, we give an overview of the network training
               results, including the training loss, testing loss, training accuracy, and testing accuracy. Then, we show the
               CAM heat intensity maps obtained by applying the Grad-CAM method. From the high-importance feature
               regions of these heat maps, we identify the target objects for SVMS analysis.
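The Grad-CAM computation used to produce these heat maps can be sketched as below. This is a generic implementation of the standard Grad-CAM recipe (gradient-averaged channel weights applied to the target layer's feature maps), not the authors' code; the stand-in network and layer choice in the usage are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx):
    """Grad-CAM: weight the target layer's feature maps by the
    spatially averaged gradient of the class score, then ReLU."""
    store = {}

    def fwd_hook(module, inp, out):
        store["act"] = out            # feature maps [1, C, H, W]

    def bwd_hook(module, grad_in, grad_out):
        store["grad"] = grad_out[0]   # gradients w.r.t. those maps

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)
    try:
        score = model(image.unsqueeze(0))[0, class_idx]
        model.zero_grad()
        score.backward()
    finally:
        h1.remove()
        h2.remove()

    act, grad = store["act"], store["grad"]
    weights = grad.mean(dim=(2, 3), keepdim=True)  # per-channel weights
    cam = F.relu((weights * act).sum(dim=1))       # weighted sum + ReLU
    cam = cam / (cam.max() + 1e-8)                 # normalize to [0, 1]
    return cam.squeeze(0).detach()

# Hypothetical usage with a small stand-in network (not the paper's model):
net = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 12))
heat = grad_cam(net, net[0], torch.randn(3, 16, 16), class_idx=0)
print(heat.shape)  # torch.Size([16, 16])
```

In practice the low-resolution map is upsampled to the input image size and overlaid as a heat map; high-intensity regions mark the input channels the network relied on.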


               3.3.1 Gesture recognition results
               We trained the network for 50 epochs, using 70% of the sEMG color images as the training set and the
               remaining 30% as the testing set. A fixed random seed was used for the split, ensuring that the test-set data
               were never seen by the network during training. The network training then yielded satisfactory results, with
               a prediction accuracy of 94.26%. Figure 5 illustrates the iterative process of training accuracy and testing
               accuracy. Figure 6 shows the comparison of the recognition