Page 8 Chazhoor et al. Intell Robot 2022;2:1-19 https://dx.doi.org/10.20517/ir.2021.15
deep learning framework used in this research is PyTorch [34]. The images from the WaDaBa dataset are
input to the pre-trained models after under-sampling the dataset. A batch size of 4 is chosen so that
the GPU does not run out of memory during processing. The learning rate is 0.001 and is decayed by a
factor of 0.1 every seven epochs. Decaying the learning rate aids the network’s convergence to a local
minimum and also enhances the learning of complicated patterns [35]. Cross-entropy loss is used for
training, together with a momentum of 0.9, a value widely used in the machine learning and neural
network communities [36]. The Stochastic Gradient Descent (SGD) optimizer [37], a gradient descent
technique that is extensively employed in training deep learning models, is used. Training uses
five-fold cross-validation, and the results are reported along with graphs showing accuracy and loss
against the number of epochs. Each model was trained for twenty epochs on the WaDaBa dataset.
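The learning rate schedule and momentum update described above can be sketched in plain Python. This is a minimal illustration rather than the authors' training code: the function names are hypothetical, epochs are assumed to be zero-indexed, and the momentum update follows the convention used by PyTorch's SGD optimizer (velocity accumulates the gradient, then the weight moves by the learning rate times the velocity).

```python
def step_decay_lr(epoch, base_lr=0.001, gamma=0.1, step_size=7):
    """Learning rate in effect during a zero-indexed epoch under step decay."""
    # The rate is multiplied by gamma once for every completed block of step_size epochs.
    return base_lr * gamma ** (epoch // step_size)


def sgd_momentum_step(weight, grad, velocity, lr, momentum=0.9):
    """One SGD update with momentum for a single scalar weight."""
    velocity = momentum * velocity + grad  # running accumulation of past gradients
    weight = weight - lr * velocity        # step against the accumulated direction
    return weight, velocity


# Learning rate for each of the twenty training epochs (hypothetical helper usage).
schedule = [step_decay_lr(epoch) for epoch in range(20)]
```

Under these assumptions the rate stays at 0.001 for epochs 0 to 6, drops by a factor of 0.1 for epochs 7 to 13, and drops by the same factor again for epochs 14 to 19. PyTorch's `torch.optim.lr_scheduler.StepLR` with `step_size=7` and `gamma=0.1` implements the same rule, although the text does not state which scheduler class was used.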
Before training, the data was normalized. The transformations applied to the data included random
horizontal flipping and centre cropping.
The size of the input image is 224 × 224 pixels [Figure 9].
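The preprocessing steps named above can be illustrated on a small image stored as a nested list of pixel values. This is only a sketch: the function names are hypothetical, and the normalization statistics and the order in which the steps are applied are assumptions, since the text does not give them.

```python
import random


def centre_crop(image, size):
    """Crop a size x size window from the middle of a 2-D list of pixels."""
    height, width = len(image), len(image[0])
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def random_horizontal_flip(image, rng, p=0.5):
    """Mirror the image left to right with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image


def normalize(image, mean, std):
    """Shift and scale every pixel; mean and std are assumed to be known in advance."""
    return [[(pixel - mean) / std for pixel in row] for row in image]
```

For the experiments described here the crop size would be 224, giving the 224 × 224 input stated in the text.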
2.3.1. Imbalance in the dataset
The number of images for each class in the dataset is uneven. The first class (PETE) contains 2200 photos,
while the last class (Others) contains only 40. Due to the size and cost of certain forms of plastic, obtaining
datasets is difficult. Because of the class imbalance, the under-sampling strategy was used. Images were
split into training and validation sets, with eighty percent used for training and twenty percent for
validation.
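Random under-sampling followed by an 80/20 split can be sketched as follows. The class sizes come from the text, while the file names, the per-class target of 40 images, and the function names are assumptions made only for illustration; the text does not state the target size used.

```python
import random


def undersample(samples_by_class, per_class, rng):
    """Keep at most per_class randomly chosen samples from every class."""
    return {
        label: rng.sample(items, min(per_class, len(items)))
        for label, items in samples_by_class.items()
    }


def train_val_split(items, train_fraction, rng):
    """Shuffle a copy of items and cut it into training and validation parts."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
images_by_class = {
    "PETE": [f"pete_{i}.jpg" for i in range(2200)],    # hypothetical file names
    "Others": [f"others_{i}.jpg" for i in range(40)],  # hypothetical file names
}
balanced = undersample(images_by_class, per_class=40, rng=rng)
train, val = train_val_split(balanced["PETE"], train_fraction=0.8, rng=rng)
```

Under-sampling discards data from the larger classes, so the balance is bought at the cost of a smaller training set.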
2.3.2. K-fold cross-validation
Five-fold cross-validation was used for all the tests to validate the benchmark models [38]. The six
models were evaluated on the data, and the training loss and accuracy, validation loss and accuracy,
and training time were recorded for 20 epochs with identical model parameters. The resulting averages
were tabulated, and the corresponding graphs were plotted for visual representation. The flow chart of
the experimental process is displayed in Figure 8.
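The fold construction behind five-fold cross-validation can be sketched with index lists. The helper names are hypothetical, and the sketch assumes the samples were shuffled beforehand, so each fold is simply a contiguous block of indices.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, val_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def average_over_folds(per_fold_values):
    """Mean of one metric (for example validation accuracy) across the folds."""
    return sum(per_fold_values) / len(per_fold_values)
```

Each sample is used for validation exactly once and for training in the remaining four folds, which is why the per-fold metrics can be averaged into a single figure per model.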
3. RESULTS
3.1 Accuracy, loss, area under curve and receiver operating characteristic curve
The metrics used to benchmark the models on the WaDaBa dataset are accuracy and loss. Accuracy
corresponds to the correctness of the predicted value [39]: it measures how closely the predicted value
matches the actual value. Loss is a measure of how erroneous the predictions of a neural network are,
and it is calculated with a loss function [40]. The area under the curve (AUC) measures the classifier’s
ability to differentiate between classes and summarizes the receiver operating characteristic (ROC)
curve. The ROC curve shows the performance of a classification model at different classification
thresholds by plotting the True Positive Rate against the False Positive Rate.
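The metrics defined above can be written out in a few lines of Python. This is a minimal sketch with hypothetical function names. It treats the binary case only (a multi-class problem such as this one would typically be handled one class against the rest, which the text does not detail), and it assumes both classes are present in the labels.

```python
import math


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def cross_entropy(probabilities, actual):
    """Mean negative log-probability assigned to the true class of each sample."""
    return -sum(math.log(p[a]) for p, a in zip(probabilities, actual)) / len(actual)


def roc_curve(scores, labels):
    """(FPR, TPR) points obtained by lowering the decision threshold over the scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

An AUC of 1.0 means every positive sample is scored above every negative one, while 0.5 corresponds to a ranking no better than chance.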
Table 2 shows that the ResNeXt architecture achieves the highest accuracy, 87.44 percent, in an
average time of thirteen minutes and eleven seconds. Smaller networks such as MobileNet_v2,
SqueezeNet, and DenseNet offer equivalent accuracy when implemented in smaller and portable devices.
AlexNet trains in the shortest time but with the lowest accuracy. Of all the models, DenseNet takes
the longest to train. With a classification accuracy of 97.6 percent, ResNeXt is the top model for
reliably classifying PE-HD. MobileNet_v2 classifies PS more accurately than the other models.
Table 2 also shows that PP has the lowest classification accuracy for all the