

deep learning framework used in this research is PyTorch [34]. The images from the WaDaBa dataset are input to the pre-trained models after under-sampling the dataset. A batch size of 4 is chosen so that the GPU does not run out of memory during processing. The learning rate is 0.001 and is decayed by a factor of 0.1 every seven epochs. Decaying the learning rate aids the network's convergence to a local minimum and also enhances the learning of complicated patterns [35]. Cross-entropy loss is used for training, with a momentum of 0.9, which is widely used in the machine learning and neural network communities [36]. The Stochastic Gradient Descent (SGD) optimizer [37], a gradient descent technique that is extensively employed in training deep learning models, is used. The training is done using a five-fold cross-validation technique, and the results are generated, along with graphs of the number of epochs vs. accuracy and the number of epochs vs. loss. On the WaDaBa dataset, each model was trained for twenty epochs.
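A minimal PyTorch sketch of the training configuration described above (batch size 4, learning rate 0.001 decayed by 0.1 every seven epochs, cross-entropy loss, SGD with momentum 0.9, twenty epochs). The choice of ResNeXt, the number of output classes, and the `train_loader` are illustrative assumptions, not details confirmed by the text:

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
import torchvision.models as models

# One of the benchmarked pre-trained models, e.g. ResNeXt (assumed variant).
model = models.resnext50_32x4d(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 7)  # 7 plastic-type classes (assumed)

criterion = nn.CrossEntropyLoss()
# SGD with momentum 0.9 and learning rate 0.001, as stated in the text.
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# Decay the learning rate by a factor of 0.1 every seven epochs.
scheduler = StepLR(optimizer, step_size=7, gamma=0.1)

for epoch in range(20):  # twenty epochs per model
    model.train()
    for images, labels in train_loader:  # train_loader assumed defined elsewhere
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```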

Before being passed to training, the data was normalized. In addition, random horizontal flipping and centre cropping were applied to the data.


The size of the input image is 224 × 224 pixels [Figure 9].
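A plausible torchvision preprocessing pipeline matching the description above. The resize step and the ImageNet normalization statistics are assumptions commonly paired with pre-trained torchvision models, not values stated in the text:

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize(256),             # assumed resize before cropping
    transforms.CenterCrop(224),         # 224 x 224 input size [Figure 9]
    transforms.RandomHorizontalFlip(),  # augmentation mentioned in the text
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # standard ImageNet stats
                         std=[0.229, 0.224, 0.225]),   # (an assumption)
])
```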

               2.3.1. Imbalance in the dataset
The number of images per class in the dataset is uneven. The first class (PETE) contains 2200 images, while the last class (Others) contains only 40. Due to the size and cost of certain forms of plastic, obtaining balanced datasets is quite difficult. Because of the class imbalance, an under-sampling strategy was used. Images were split into training and validation sets, with eighty percent used for training and twenty percent for testing.
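One simple way to realise the under-sampling step is to keep only as many images per class as the smallest class contains. This is a sketch under that assumption; `all_samples` and the (path, label) representation are hypothetical names, not the authors' code:

```python
import random
from collections import defaultdict

def undersample(samples, seed=0):
    """Randomly keep the same number of images per class as the smallest
    class (here ~40 in 'Others'). `samples` is a list of (path, label) pairs."""
    random.seed(seed)
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))
    n_min = min(len(items) for items in by_class.values())
    balanced = []
    for items in by_class.values():
        balanced.extend(random.sample(items, n_min))
    random.shuffle(balanced)
    return balanced

# 80/20 split into training and validation sets, as described in the text.
balanced = undersample(all_samples)  # all_samples assumed defined elsewhere
split = int(0.8 * len(balanced))
train_samples, val_samples = balanced[:split], balanced[split:]
```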

               2.3.2. K-fold cross-validation
Five-fold cross-validation was used in all the tests to validate the benchmark models [38]. The data was tested on the six models, and the training loss and accuracy, validation loss and accuracy, and training time were recorded for 20 epochs with identical model parameters. The resulting average data was tabulated, and the corresponding graphs were plotted for visual representation. The flow chart of the experimental process is displayed in Figure 8.
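A sketch of the five-fold loop using scikit-learn's `KFold`, averaging the per-fold metrics as described. The `train_one_fold` helper and the metric keys are hypothetical placeholders for the per-fold training routine:

```python
import numpy as np
from sklearn.model_selection import KFold

# Five folds over the balanced samples; each fold trains for 20 epochs
# with identical hyperparameters, and the per-fold metrics are averaged.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
indices = np.arange(len(balanced))  # `balanced` from the under-sampling step

fold_results = []
for fold, (train_idx, val_idx) in enumerate(kf.split(indices)):
    # Rebuild loaders and reset model weights here (omitted for brevity).
    metrics = train_one_fold(train_idx, val_idx)  # hypothetical helper
    fold_results.append(metrics)  # e.g. {'val_acc': ..., 'val_loss': ...}

averaged = {key: np.mean([m[key] for m in fold_results])
            for key in fold_results[0]}
```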

               3. RESULTS
               3.1 Accuracy, loss, area under curve and receiver operating characteristic curve
The metrics used to benchmark the models on the WaDaBa dataset are accuracy and loss. The accuracy corresponds to the correctness of the prediction [39]: it measures how close the predicted value is to the actual value. Loss is a measure of how erroneous the predictions of a neural network are, and it is calculated with the help of a loss function [40]. The area under the curve (AUC) measures the classifier's ability to differentiate between classes and summarizes the receiver operating characteristic (ROC) curve. The ROC curve depicts the performance of a classification model across all classification thresholds, plotting the True Positive Rate against the False Positive Rate.

Table 2 clearly shows that the ResNeXt architecture achieves the highest accuracy of 87.44 percent in an average time of thirteen minutes and eleven seconds. When implemented in smaller and portable devices, smaller networks such as MobileNet_v2, SqueezeNet, and DenseNet offer equivalent accuracy. AlexNet trains in the shortest period but with the lowest accuracy. In comparison to the other models, DenseNet takes the longest to train. With a classification accuracy of 97.6 percent, ResNeXt comes out as the top model for reliably classifying PE-HD. When compared to other models, MobileNet_v2 classifies PS with higher accuracy. Also, from Table 2, we can see that PP has the least classification accuracy across all the models.