Table 4. ResNet based model classification performance test results on CK+

            Precision   Recall   F1-score   Support
Anger       1.00        1.00     1.00       14
Contempt    1.00        0.60     0.75       5
Disgust     1.00        0.94     0.97       18
Fear        0.88        1.00     0.93       7
Happy       0.91        1.00     0.95       21
Sadness     1.00        1.00     1.00       9
Surprise    1.00        1.00     1.00       25
Accuracy                         0.97       99
Run time    29 min
Table 5. Comparison with other CNN based models results on CK+

Methodology              Accuracy (%)
Diff ResNet [19]         95.74
Improved VGG-19 [44]     96
Ours                     97
Figure 9. Framework testing on an individual pose
much in terms of accuracy. The goal is to build a robust and accurate model. Looking at Table 2 and Table 3, we observe that for both models the precision of every label is above 50%, meaning that for each emotion the model gives a correct prediction at least 50% of the time. The harmonic mean of precision and recall, hereafter referred to as the F1-score, is used to determine how well the model performs at facial emotion detection. The F1-score in both cases is 65%, which is acceptable given how complex the dataset is. Finally, since the residual-based model leads by about one percentage point, accuracy was the metric that allowed us to identify the optimal model.
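For reference, the F1-score is the harmonic mean of precision and recall:

F1 = 2 * (precision * recall) / (precision + recall)

As a quick sanity check against Table 4, the Contempt row (precision 1.00, recall 0.60) gives 2 * (1.00 * 0.60) / (1.00 + 0.60) = 0.75, matching the reported value.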
We experimented on posed images of an individual to test how well our system recognizes the facial expressions it was trained on. The first step is to apply some pre-processing, such as face detection and face cropping, to these images. The image is then reshaped to (48, 48, 1) to match the input shape the model was trained on, and the model is used for the prediction, as shown in Figure 9.
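The sketch below illustrates one way this testing pipeline can be put together, using OpenCV for face detection and cropping and a Keras model for the prediction. The model file name, Haar cascade choice, emotion label order, and pixel scaling are assumptions for illustration only, not the exact code used in our experiments.

import cv2
import numpy as np
from tensorflow.keras.models import load_model

# Hypothetical file name and label order -- these must match the actual trained model.
EMOTIONS = ["anger", "contempt", "disgust", "fear", "happy", "sadness", "surprise", "neutral"]
model = load_model("fer_resnet.h5")
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def predict_emotion(image_path):
    # Read the posed image and convert it to grayscale for detection.
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Detect and crop the face region (first detected face only).
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    face = gray[y:y + h, x:x + w]

    # Resize to 48x48 and reshape to (1, 48, 48, 1), the input shape the model expects;
    # scaling to [0, 1] assumes the same normalization was applied during training.
    face = cv2.resize(face, (48, 48)).astype("float32") / 255.0
    face = face.reshape(1, 48, 48, 1)

    # The softmax output gives one confidence score per emotion.
    probs = model.predict(face)[0]
    idx = int(np.argmax(probs))
    return EMOTIONS[idx], float(probs[idx])

# Example: prints something like ("happy", 0.86)
print(predict_emotion("posed_image.jpg"))

The returned confidence score is what Figure 9 reports alongside the predicted label.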
On this prediction, the model did very well, with a high confidence score (86%), which shows how effective our framework is. However, the model gave wrong predictions on some labels, although with low confidence (40%); this is due to the resemblance between emotions, see Figure 10. For example, here the individual was asked to pose disgust, but the model predicted neutral. As already mentioned, the FERGIT dataset contains many mislabeled emotions.