emotions. FER has used a variety of methodologies to extract visual features from image frames, such as the
weighted random forest (WRF) [15] . Hasani and Mahoor [16] utilized a novel network called ResNet-LSTM
to capture spatio-temporal data, connecting lower-level features directly to the LSTMs. Deep learning
networks have become the most widely used strategy in FER due to their powerful feature extraction capacity.
Using the histogram of oriented gradients (HOG) in the wavelet domain, Nigam et al. [11] proposed a four-step
process for efficient FER (face processing, domain transformation, feature extraction and expression recog-
nition). In the expression recognition stage, the authors used a tree-based multi-class SVM to classify the
HOG features retrieved in the discrete wavelet transform (DWT) domain. The system was trained and tested
on the CK+, JAFFE and Yale datasets, with test-set accuracies of 90%, 71.43% and 75%, respectively.
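As a rough illustration of such a pipeline, the sketch below chains a discrete wavelet transform, HOG extraction and a multi-class SVM. The library choices (PyWavelets, scikit-image, scikit-learn), the Haar wavelet, the 48x48 input size and the names train_faces/train_labels are assumptions for illustration; they do not reproduce the authors' implementation.

```python
# Hedged sketch of a HOG-in-wavelet-domain + SVM pipeline, loosely following
# the four steps described by Nigam et al.; libraries and parameter values
# are assumptions, not the authors' implementation.
import numpy as np
import pywt
from skimage.feature import hog
from sklearn.svm import SVC

def extract_features(face_48x48: np.ndarray) -> np.ndarray:
    """Domain transformation (DWT) followed by HOG feature extraction."""
    # Step 2: single-level 2D discrete wavelet transform; keep the
    # approximation sub-band (LL), which retains most facial structure.
    ll, (lh, hl, hh) = pywt.dwt2(face_48x48, "haar")
    # Step 3: HOG descriptor computed in the wavelet domain.
    return hog(ll, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# Step 4: multi-class SVM over the retrieved HOG features.
# scikit-learn's SVC is multi-class out of the box; the tree-based
# multi-class scheme used in the paper would replace this classifier.
X = np.stack([extract_features(img) for img in train_faces])  # train_faces: assumed array of 48x48 faces
clf = SVC(kernel="rbf").fit(X, train_labels)                   # train_labels: assumed label vector
```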
Upon deeply analyzing the facial expression recognition problem, Minaee et al. proposed the use of an atten-
tional convolutional neural network [17] instead of adding layers/neurons. They also suggested a visualization
technique that can find the important parts of the face that are necessary for detecting different emotions,
based on the classifier’s output. Their architecture includes a feature extraction part and a spatial transformer
network that takes the input and uses an affine transformation to warp it to the output. They achieved a
validation accuracy of 70.02% for the categorization of the 7 classes of the FER2013 dataset.
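The spatial-transformer idea can be sketched as follows in PyTorch; the layer sizes (chosen here for 48x48 FER2013-style inputs) are illustrative assumptions rather than the exact architecture of Minaee et al.

```python
# Minimal spatial transformer sketch: a localization network regresses an
# affine transform, which is then used to warp the input face.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        # Localization network: regresses the 6 affine parameters (theta).
        self.loc = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
        )
        self.fc_loc = nn.Sequential(
            nn.Linear(10 * 8 * 8, 32), nn.ReLU(), nn.Linear(32, 6)
        )
        # Initialize to the identity transform so training starts stably.
        self.fc_loc[2].weight.data.zero_()
        self.fc_loc[2].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):                      # x: (N, 1, 48, 48)
        theta = self.fc_loc(self.loc(x).flatten(1)).view(-1, 2, 3)
        # Affine grid + sampling warps the input toward the salient region.
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```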
With the help of the Residual Masking Network [18] , the authors focused on a deep architecture with an attention
mechanism. They used a segmentation network to refine the feature maps, enabling the network to focus on
the information relevant to making the correct decision. Their work is divided into two parts: the residual masking
block, which contains a residual layer, and an ensemble method combining 7 different CNNs.
In the end, they obtained an overall accuracy of 74.14% on the test set of the FER2013 dataset.
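A loose sketch of the masking idea is given below: a residual block whose output is re-weighted by a soft mask produced by a small auxiliary branch. It only illustrates the principle described above and is not the authors' exact Residual Masking Network.

```python
# Illustrative residual block with a sigmoid attention mask; all layer
# choices are assumptions made for the sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualMaskingBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Plain residual layer.
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        # Tiny "segmentation-like" branch producing a soft attention mask.
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        out = F.relu(x + self.residual(x))   # residual connection
        m = self.mask(out)                   # values in (0, 1)
        return out * (1 + m)                 # emphasize the masked (relevant) regions
```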
Pu and Zhu [19] developed a FER framework based on the combination of a feature extraction network and a
pre-trained model. The feature extraction consists of supervised optical flow learning based on residual blocks,
and the classifier is the Inception architecture. Experimenting with their method on the CK+ and FER2013 datasets,
they achieved average accuracies of 95.74% and 73.11%, respectively. To address the fact that CNNs
require considerable computational resources to train and process emotion recognition, Chowanda [20] proposed a
separable CNN. In the experiments, four networks were compared: networks with and without
separable modules, using either flatten and fully connected layers or global average pooling. The proposed
architecture was faster, had fewer parameters, and achieved an accuracy of 99.4% on the CK+ dataset.
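The parameter saving comes from replacing standard convolutions with depthwise separable ones and replacing the flatten/fully connected head with global average pooling, as the sketch below illustrates; the channel sizes and 48x48 input resolution are assumptions, not Chowanda's exact configuration.

```python
# Hedged sketch of a small separable CNN with a global-average-pooling head.
import torch
import torch.nn as nn

def separable_conv(in_ch: int, out_ch: int) -> nn.Sequential:
    """Depthwise conv (per-channel) followed by a 1x1 pointwise conv.
    Parameters: in_ch*9 + in_ch*out_ch, versus in_ch*out_ch*9 for a
    standard 3x3 convolution."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),  # depthwise
        nn.Conv2d(in_ch, out_ch, 1),                          # pointwise
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class SeparableFER(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            separable_conv(1, 32), nn.MaxPool2d(2),
            separable_conv(32, 64), nn.MaxPool2d(2),
        )
        # Global average pooling removes the large flatten + dense layers.
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(64, num_classes))

    def forward(self, x):                     # x: (N, 1, 48, 48)
        return self.head(self.features(x))
```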
Deep learning methods have recently sparked a lot of interest, and there is much ongoing research on using
them to recognize emotions from facial expressions. This study proposes the accurate identification of facial
emotions using a deep residual-based neural network architecture. ResNet was chosen as the study’s foundation
because residual-based network models have proven effective in a variety of image recognition applications
and have also overcome the problem of overfitting. In our work, we used emotional expressions such as
happiness, surprise, anger, sadness, disgust, neutral, and fear to pick up emotional changes on individual faces.
The main contributions of this work are:
1. Propose a lighter CNN using Residual Blocks, with far fewer trainable parameters than the over 23 million of
the original ResNet network.
2. Locate the best position for the Residual Blocks to avoid overfitting and achieve satisfactory performance.
3. Show the importance of using Residual Blocks compared to the same architecture without them.
4. Weight the cross-entropy loss function in order to deal with the class imbalance problem that affects the FERGIT