Bah et al. Intell Robot 2022;2(1):72-88 | http://dx.doi.org/10.20517/ir.2021.16
1. INTRODUCTION
Detecting a person's emotions has become increasingly important in recent years, attracting interest across a variety of areas including, but not limited to, human-computer interaction [1], education, and medicine. Interpersonal communication is impossible without emotions coming into play, and emotions play a significant role in daily human communication. Human emotional states can be gleaned from spoken (verbal) and nonverbal information collected by a variety of sensors. According to the 7-38-55 rule [2], verbal communication accounts for only 7% of all communication, whereas nonverbal components of our daily conversation, such as voice tonality and body language, account for 38% and 55%, respectively. Human emotions are revealed through changes in the face, voice intonation, and body language. Studies have shown that emotions expressed visually, displayed on individual faces, are the most prominent. They can be shown in a variety of ways, some visible to the human eye and others not.
Emotion is a multidisciplinary area that spans psychology, computer science, and other disciplines. In psychological terms, it can be described as a mental state associated with thoughts, feelings, behavioral reactions, and a degree of pleasure or dissatisfaction [3]. In computer science, it may be recognized in the form of image, audio, video, and text documents, and emotion analysis from any of these document types is not easy. People communicate mostly through their emotional reactions, which can be positive, negative, or neutral. It is generally accepted that positive emotions are conveyed by adjectives such as cheerful, happy, joyful, and excited, while negative emotions include hate, anger, fear, depression, sadness, and so on. People spend much of their time posting and expressing their feelings on social media sites such as Facebook and Instagram [4], which allow them to express their emotions in many different ways.
In our daily lives, we face situations that affect our emotions. Emotion has a significant impact on human cognitive functions such as perception, attention, learning, memory, reasoning, and problem-solving [5]. Among these, attention is the most affected, both in terms of altering attention's selectivity and in terms of driving actions and behaviors. Poorly managed emotions can also harm health: they weaken the immune system, making it more susceptible to colds and other illnesses [6].
Deep learning's growth has greatly improved the accuracy of facial expression recognition tasks. Various Convolutional Neural Network (CNN) models have recently been built to overcome problems with emotion recognition from facial expressions, and CNNs are among the leading networks in this field. A CNN architecture is composed of convolution, activation, and pooling layers. With the advancement of Artificial Intelligence technologies such as pattern recognition and computer vision, computing terminal devices can now interpret changes in human expressions to a degree, allowing for greater diversity in human-computer communication [7]. In Facial Expression Recognition (FER), the major aim is to map distinct facial expressions to their corresponding emotional states. This consists of extracting features from the facial image and recognizing the emotion presented. Before facial images are fed to a CNN or another machine learning classifier, some image processing needs to be done. Existing methods include the discrete wavelet transform [8], linear discriminant analysis [9], histogram equalization [10], histogram of gradients [11], the Viola-Jones algorithm [12], etc.
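As an illustration, histogram equalization [10], one of the preprocessing techniques listed above, can be sketched in a few lines of NumPy. The function name and the synthetic low-contrast image below are our own illustrative choices, not the paper's:

```python
import numpy as np

def equalize_histogram(img):
    """Histogram equalization for an 8-bit grayscale image.

    Spreads the gray levels so the cumulative histogram becomes
    approximately uniform, boosting contrast in dim face images.
    """
    hist = np.bincount(img.ravel(), minlength=256)   # per-level pixel counts
    cdf = hist.cumsum()                              # cumulative distribution
    cdf_min = cdf[cdf > 0][0]                        # first nonzero entry
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]                                  # remap every pixel

# A synthetic low-contrast image: gray levels confined to 64..127.
img = np.tile(np.arange(64, 128, dtype=np.uint8), (32, 1))
out = equalize_histogram(img)
print(img.min(), img.max(), "->", out.min(), out.max())  # 64 127 -> 0 255
```

In practice, libraries such as OpenCV provide equivalent routines; the point is that the transform stretches a narrow band of gray levels over the full 0-255 range before the image reaches the classifier.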
Manual feature extraction has good identification capacity in specific special situations or laboratory environments, but it struggles under real conditions such as occlusion and varying light. Feature extraction approaches based on deep convolutional neural networks have attracted a lot of attention recently [13], and this has helped to improve facial emotion detection performance. The Deep Residual Network (ResNet) [14], which is easier to train and optimize, has played a major role in the field of image recognition, introducing a novel approach to Deep Neural Network optimization.
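The convolution, activation, and pooling stages that make up a CNN can be sketched in plain NumPy. This is a minimal single-channel, single-filter forward pass for illustration only; the random kernel stands in for a filter that a real network would learn, and the 48x48 input mimics a typical grayscale face crop:

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D cross-correlation of image x with kernel k (a convolution layer)."""
    kh, kw = k.shape
    h, w = x.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def relu(x):
    """Activation layer: zero out negative responses."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Pooling layer: keep the strongest response in each size x size window."""
    h, w = x.shape
    h, w = h - h % size, w - w % size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# One convolution -> activation -> pooling stage.
img = np.random.rand(48, 48)     # e.g. a 48x48 grayscale face crop
kernel = np.random.randn(3, 3)   # a trained CNN would learn many such filters
feat = max_pool(relu(conv2d(img, kernel)))
print(feat.shape)  # (23, 23)
```

Stacking several such stages, each with many learned filters, followed by fully connected layers is what maps a raw face image to an emotion class; frameworks such as PyTorch or TensorFlow implement these layers efficiently.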
Previous work on emotion recognition depended on a two-stage classical learning strategy. The first stage consists of extracting features using image processing techniques. The second stage, on the other hand, relied on the employment of a traditional machine learning classifier such as Support Vector Machine (SVM) to detect