2. METHODS
2.1. Dataset
The WaDaBa dataset is a sophisticated collection that contains images of common plastics used in society.
The dataset includes seven distinct varieties of plastic. Images show several forms of plastic on a platform under two lighting conditions, an LED bulb and a fluorescent lamp, as displayed in Figure 2. Table 1
shows the distribution of the 4000 images in the dataset according to their classes. As there are no images in
the PVC and PE-LD classes, both classes have been excluded from the deep learning models. In the current work, the models are trained on the five classes that contain images, i.e., PETE, PE-HD, PP, PS, and Other, and each model output corresponds to one of these five classes. When images for PVC and PE-LD are released, these classes can be added to the models.
The dataset’s classes are imbalanced, with the last class holding just 40 images and the PETE class consisting of 2000 images. The dataset is freely accessible to the public[15].
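As a minimal illustrative sketch, assuming a PyTorch/torchvision pipeline and a hypothetical wadaba/ folder with one sub-directory per retained class, the dataset could be loaded as follows; the weighted sampler is only one possible way to handle the class imbalance and is not prescribed by this work.

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import datasets, transforms

# Resize to the 224 x 224 input expected by ImageNet pre-trained backbones
# and normalise with the ImageNet channel statistics.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical layout: one sub-directory per class
# (PETE, PE-HD, PP, PS, Other) under "wadaba/".
dataset = datasets.ImageFolder("wadaba/", transform=preprocess)

# The classes are imbalanced (40 vs. 2000 images), so a weighted sampler
# that draws rare classes more often is one possible mitigation.
class_counts = torch.bincount(torch.tensor(dataset.targets))
sample_weights = (1.0 / class_counts.float())[dataset.targets]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(dataset))

loader = DataLoader(dataset, batch_size=32, sampler=sampler)
```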
2.2. Transfer learning
A large amount of data is needed to achieve optimum accuracy with a neural network, and models typically need to be trained for hours on a powerful graphics processing unit (GPU) to obtain results.
With the advent of transfer learning[26], there has been a significant change in the learning processes of deep neural networks. A model that has already been trained on a large dataset such as ImageNet[27], known as a pre-trained model, enhances the transfer learning process[28]. Transfer learning works by freezing the initial hidden layers of the pre-trained model and fine-tuning its final layers. A frozen layer is not trained, so its weights remain unchanged. As the dataset used in this research is relatively small, with a limited number of images in each class, transfer learning is well suited to this work. The pre-trained models used are further described in the following subsections.
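A minimal sketch of this freeze-and-fine-tune procedure in PyTorch, using torchvision's ImageNet pre-trained ResNet-50 as an example backbone (the same pattern applies to the other models); freezing every pre-trained layer and training only a new output head is one common variant, not necessarily the exact configuration used in this work.

```python
import torch.nn as nn
from torchvision import models

# Load a backbone pre-trained on ImageNet (ResNet-50 shown as an example).
model = models.resnet50(pretrained=True)

# Freeze the pre-trained layers: their weights remain unchanged during training.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a five-way output
# (PETE, PE-HD, PP, PS, Other); only this new layer is fine-tuned here.
model.fc = nn.Linear(model.fc.in_features, 5)
```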
2.2.1. AlexNet
AlexNet is a neural network with five convolutional layers and three fully connected layers, introduced in 2012 by Alex Krizhevsky. AlexNet increases learning capacity by increasing network depth and using multi-parameter tuning techniques. AlexNet uses ReLU to add non-linearity and dropout to reduce overfitting. CNN-based applications gained popularity following AlexNet's excellent performance on the ImageNet dataset in 2012[23]. The architecture of AlexNet is shown in Figure 3.
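As a rough sketch, torchvision's AlexNet implementation (close to, but not identical with, the original 2012 network) can be adapted to this task by swapping its last classifier layer for a five-class output; the dropout and ReLU layers mentioned above sit inside the classifier module.

```python
import torch.nn as nn
from torchvision import models

# Pre-trained AlexNet; the classifier interleaves Dropout and ReLU layers,
# and its final Linear layer is replaced with a 5-way output.
alexnet = models.alexnet(pretrained=True)
alexnet.classifier[6] = nn.Linear(alexnet.classifier[6].in_features, 5)
```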
2.2.2. Resnet-50
Residual networks (Resnet-50) are convolutional neural networks with skip connections, an extremely deep convolutional stack, and approximately 25 million parameters. A skip connection after each block addresses the vanishing gradient problem by allowing the input to bypass some layers in the network. With batch normalization and ReLU activation, each block stacks convolutions (1 × 1, 3 × 3, and 1 × 1 in Resnet-50's bottleneck design) to achieve the desired result[21]. The architecture of Resnet-50 is displayed in Figure 4.
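To make the skip connection concrete, below is a simplified, hypothetical bottleneck block in PyTorch; it is not the exact torchvision implementation, but it shows how the block input is added back to the transformed output so that gradients can bypass the stacked layers.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Simplified Resnet-50-style block: 1x1, 3x3, 1x1 convolutions with
    batch normalisation and ReLU, plus a skip connection."""

    def __init__(self, channels, mid_channels):
        super().__init__()
        self.reduce = nn.Conv2d(channels, mid_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_channels)
        self.conv = nn.Conv2d(mid_channels, mid_channels, kernel_size=3,
                              padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_channels)
        self.expand = nn.Conv2d(mid_channels, channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                          # the skip connection keeps the input
        out = self.relu(self.bn1(self.reduce(x)))
        out = self.relu(self.bn2(self.conv(out)))
        out = self.bn3(self.expand(out))
        return self.relu(out + identity)      # addition lets gradients bypass the block

block = Bottleneck(channels=256, mid_channels=64)
print(block(torch.randn(1, 256, 56, 56)).shape)   # torch.Size([1, 256, 56, 56])
```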
2.2.3. ResNeXt
Proposed by Facebook and ranked second in ILSVRC 2016, ResNeXt uses the repeating-layer strategy of Resnet-50 and appends a split-transform-merge method[22]. The size of the set of transformations is known as the cardinality. Cardinality provides a novel approach to modifying model capacity by increasing the number of separate routes. With width and depth as critical characteristics, ResNeXt adds cardinality as a new dimension. Increasing cardinality is a practical approach to enhancing the accuracy of the model[22]. The architecture of ResNeXt is shown in Figure 5.
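As an illustrative sketch of cardinality, the split-transform-merge of a ResNeXt block is realised in practice as a grouped convolution, where the number of groups equals the number of parallel paths; the channel sizes below are only example values, not those of any specific ResNeXt stage.

```python
import torch
import torch.nn as nn

# Cardinality = number of parallel transformation paths. With groups=32 the
# 3x3 convolution is split into 32 independent paths whose outputs are
# merged along the channel dimension.
cardinality = 32
grouped_conv = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3,
                         padding=1, groups=cardinality, bias=False)

x = torch.randn(1, 128, 56, 56)          # example feature map
print(grouped_conv(x).shape)             # torch.Size([1, 128, 56, 56])
```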