Page 79 Bah et al. Intell Robot 2022;2(1):72-88 | http://dx.doi.org/10.20517/ir.2021.16
Figure 2. The seven expressions included in the FER-2013 dataset (anger, fear, happiness, sadness, surprise, disgust, and neutral).
better results with its data. Some major issues are imbalanced data, intra-class variation, and occlusion. The
FERGIT database is highly imbalanced: in the training data, the number of samples differs greatly between
classes. The happy emotion has more than 13 thousand samples, whereas disgust has only six hundred
(see Figure 3).
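To make the scale of this imbalance concrete, the following minimal sketch computes inverse-frequency class weights, one common way to compensate for imbalanced classes. It is not necessarily the approach used in this work. The function name `inverse_frequency_weights` is hypothetical, and the counts 13,000 and 600 are rounded placeholders for the two classes mentioned above, not the exact dataset figures.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes receive larger weights, frequent classes smaller ones.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Placeholder counts (assumption): rounded values for two of the seven classes.
example_labels = ["happy"] * 13000 + ["disgust"] * 600
weights = inverse_frequency_weights(example_labels)

# Ratio between the largest and smallest class in this toy example.
imbalance_ratio = 13000 / 600
```

With these placeholder counts, the disgust class receives a much larger weight than the happy class, which is the intended effect when such weights are passed to a weighted loss function.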
Intra-class variation is the variance within the same class. Minimizing intra-class variation while
maximizing inter-class variation has a significant effect on classification. Variations, uncontrolled illumination, and
occlusions are problems that face recognition systems encounter in real-life applications [36]. These problems
degrade accuracy relative to the performance measured on dataset test sets. A facial occlusion posture is one
of several possible poses in which something, such as the person's hand, blocks (occludes) a portion of the
face. Occlusion may be caused by one or both hands being directly on or in front of the face. Likewise,
hair, caps, and sunglasses are common items that obstruct the view of the face. Although occlusions pose a
challenge to face recognition, they can also carry valuable information, because people use their
hands when communicating through gestures.
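The notions of intra-class and inter-class variation can be illustrated with a small sketch. It assumes each image has already been reduced to a numeric feature vector. The function `class_variation` and the toy data are hypothetical and serve only to show how the two quantities are commonly measured.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_variation(features_by_class):
    """Return (intra, inter) variation averaged over all samples.

    intra: mean squared distance of each sample to its own class mean.
    inter: mean squared distance of each sample's class mean to the global mean.
    """
    all_vectors = [v for vs in features_by_class.values() for v in vs]
    global_mean = mean_vector(all_vectors)
    total = len(all_vectors)
    intra = 0.0
    inter = 0.0
    for vs in features_by_class.values():
        mu = mean_vector(vs)
        intra += sum(sq_dist(v, mu) for v in vs)
        inter += len(vs) * sq_dist(mu, global_mean)
    return intra / total, inter / total


# Toy two-dimensional features (assumption, not taken from the dataset).
toy = {
    "happy": [[0.0, 0.0], [0.0, 2.0]],
    "sad": [[4.0, 0.0], [4.0, 2.0]],
}
intra, inter = class_variation(toy)
```

In this toy example the classes are well separated, so the inter-class value is larger than the intra-class value. A feature representation that raises this ratio generally makes classification easier.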
2.3. Data preprocessing
First, we randomly partitioned the training data into three parts: 44,370 faces (90% of the dataset)
were used for training our model, 2,465 faces (5% of the dataset) for validation, and 2,465 faces (5% of the dataset)
for testing, as detailed in Figure 4. The dataset is relatively small; therefore, we need to
augment the training set to create new samples that the model has not seen before.
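A 90/5/5 partition of this kind can be sketched as follows. The total of 49,300 samples is simply the sum of the three counts given above. The function names, the fixed seed, and the horizontal flip are assumptions chosen for illustration, and the flip stands in for whatever augmentation is actually applied.

```python
import random


def split_indices(n_total, n_val, n_test, seed=0):
    """Shuffle sample indices and split them into train/val/test lists."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


def horizontal_flip(image):
    """Mirror a 2D image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


# 44,370 + 2,465 + 2,465 = 49,300 samples in total.
train_idx, val_idx, test_idx = split_indices(49300, 2465, 2465)

# Example augmentation on a tiny 2x3 "image" (toy data, assumption).
flipped = horizontal_flip([[1, 2, 3], [4, 5, 6]])
```

Augmentation would be applied only to the training indices, so that the validation and test samples remain unmodified.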
Since the FERGIT dataset contains only facial images [35], no face detection, localization, cropping, or face
alignment was performed during data preparation for training. We only performed those steps when