2.1.4. Batch-normalization
Batch Normalization (BN) is a regularization technique [29] that speeds up and stabilizes the training of Deep
Neural Networks (DNNs). BN avoids the problem of massive gradient updates, which cause divergent losses and
uncontrollable activations as network depth increases. To do so, it normalizes the activation vectors of the hidden
layers using the mean and variance of the current batch [30]. In this research, we placed the BN layer after
the activation in the simple Convolutional Blocks and before the activation in the Residual Blocks, see Figure 1.
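As an illustration, the sketch below shows this BN placement for a simple Convolutional Block. It assumes a TensorFlow/Keras implementation, which the paper does not specify, and the filter count and input size are illustrative only.

# Minimal sketch (TensorFlow/Keras assumed): BN placed after the activation
# in a simple Convolutional Block, as described above.
import tensorflow as tf
from tensorflow.keras import layers

def simple_conv_block(x, filters):
    x = layers.Conv2D(filters, kernel_size=3, padding="same")(x)
    x = layers.Activation("relu")(x)
    x = layers.BatchNormalization()(x)  # normalizes with the current batch's mean and variance
    return x

inputs = tf.keras.Input(shape=(48, 48, 1))       # illustrative input size
outputs = simple_conv_block(inputs, filters=32)  # illustrative filter count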
2.1.5. Max pooling
Pooling is performed to reduce the dimensionality of the convolved image [24]. By applying a pooling operation,
we reduce the number of parameters and fight against overfitting. Max pooling takes the maximum pixel value
within each window of a given size [31]. No learning takes place during this process. In our work, we
used a 2 × 2 window size and a stride of 2 for all max-pooling layers. The output size is also given by
Equation (1), with a padding of 0. Using these parameters, we divide the height and width of each feature
map by 2.
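The following short check (TensorFlow/Keras assumed; the 48 × 48 × 32 tensor is only an example shape) demonstrates that a 2 × 2 window with stride 2 and no padding halves the height and width of each feature map.

# Illustrative check: 2x2 max pooling with stride 2 and no padding halves the spatial size.
import tensorflow as tf

x = tf.random.uniform((1, 48, 48, 32))           # one sample, 48x48 feature maps, 32 channels
pool = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding="valid")
y = pool(x)
print(y.shape)                                   # (1, 24, 24, 32): height and width divided by 2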
2.1.6. Dropout
Dropout [32] is by far the most widely used Deep Neural Network regularization approach. It boosts the accuracy of
the model and avoids overfitting. The idea of dropout is to randomly prevent some neurons from firing at a given step
with probability p (the dropout rate) [33], while the remaining neurons are scaled up by 1/(1 − p) so that the sum inside
the neuron remains unchanged. The same neuron can be active again at the next step, and so on. p is the
hyper-parameter of the dropout layer; in our study we found that the best value of p is 0.3 for the early
layers of the feature extraction part and 0.4 for the last Convolutional Block.
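The sketch below (TensorFlow/Keras assumed) uses the dropout rates reported above and shows the 1/(1 − p) scaling of the surviving units during training.

# Minimal sketch: dropout with the rates used in this study (0.3 early, 0.4 in the last block).
import tensorflow as tf

drop_early = tf.keras.layers.Dropout(rate=0.3)   # early feature-extraction layers
drop_last = tf.keras.layers.Dropout(rate=0.4)    # last Convolutional Block

x = tf.ones((1, 10))
y = drop_early(x, training=True)                 # kept units are scaled by 1/(1 - 0.3)
print(y.numpy())                                 # entries are either 0.0 or ~1.4286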
2.1.7. Residual block
The Residual Block, also known as an identity shortcut connection, was used in our study. It is described by the function
H(x) = F(x) + x    (8)
where H(x) represents the output mapping, x is the input, and F(x) is the residual function [14]. The advantage of
this network in our study is that it considerably reduced the loss during training and increased the accuracy
on the test set. The residual block is used to solve the problem of vanishing gradients. By skipping some
connections, we allow back-propagation to flow through the entire network and thus obtain better performance.
In our implementation, we discovered that using a 1 × 1 convolution on the shortcut branch is not suitable, as it
does not help to reduce overfitting, see Figure 1.
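A minimal sketch of such an identity-shortcut block is given below (TensorFlow/Keras assumed; kernel sizes and filter counts are illustrative). It follows the ordering described above, with BN before the activation inside the residual branch and no 1 × 1 convolution on the shortcut, so the input must already have the same number of channels as the residual branch output.

# Minimal sketch of an identity-shortcut residual block: H(x) = F(x) + x.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                                   # identity shortcut, no 1x1 convolution
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)             # BN before activation in residual blocks
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                # F(x) + x
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(12, 12, 64))        # input channels must equal `filters`
outputs = residual_block(inputs, filters=64)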
2.1.8. Global average pooling
Most CNN research uses a flatten layer [34] to wrap the features extracted by the previous convolutional layers
into a 1D vector and forward them to the fully connected layers. Global Average Pooling is a
pooling technique used to substitute for the fully connected layers of traditional CNNs [22]. In this study, using the
global average pooling layer, the resulting vector, i.e., the average of each feature map, is fed directly into the softmax layer
instead of constructing fully connected layers on top of the feature maps.
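The sketch below shows this classifier head (TensorFlow/Keras assumed): each feature map is reduced to its mean and the resulting vector goes straight into the softmax layer. The 7-class output is an assumption matching the usual FER-2013 emotion categories, not a value stated on this page.

# Minimal sketch: global average pooling feeding directly into a softmax layer.
import tensorflow as tf
from tensorflow.keras import layers

def classifier_head(feature_maps, num_classes=7):
    x = layers.GlobalAveragePooling2D()(feature_maps)     # one average per feature map
    return layers.Dense(num_classes, activation="softmax")(x)

feature_maps = tf.keras.Input(shape=(6, 6, 128))          # illustrative feature-map shape
predictions = classifier_head(feature_maps)               # shape (None, 7)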
2.2. Data description
In this study, we mainly used the FERGIT dataset, which is a combination of the FER-2013 and muxspace
datasets. The FER-2013 database was collected from the internet, and most pictures were captured in the wild
using search engine queries. Human accuracy on this dataset is estimated at only about 65% [35].
The FERGIT dataset comprises 49,300 detected faces in grayscale at 48 × 48 pixels. The images shown in
Figure 2 are sample emotions from the FER-2013 dataset.
The FER-2013 dataset has many problems of its own, making it very difficult for deep learning architectures to achieve