Figure 9. Visualization results. The CAMs of MSA are compared with those of other attention methods. The images and labels are from FER2013 and RAF-DB. CAM: class activation mapping; MSA: multi-scale attention.
results compared with existing methods and the ablation experiments show that the proposed method achieves better performance and high robustness on real-world facial expression datasets. In future work, we will focus on designing datasets to quantify expressions and on establishing evaluation metrics. We will explore how to integrate cognition and deep learning with minimal discrepancies to maximize information extraction. This research will extend to diverse populations, where varied emotional expressions may be present.
DECLARATIONS
Authors’ contributions
Made substantial contributions to conception and design of the study and performed data analysis and inter-
pretation: He, H.; Liao, R.; Li, Y.
Performed data acquisition and provided administrative, technical, and material support: He, H.
Availability of data and materials
The datasets used in this study are sourced from publicly available datasets, including RAF-DB, FER2013, and FERPlus. These datasets can be accessed at: RAF-DB: http://www.whdeng.cn/RAF/model1.html; FER2013: https://www.kaggle.com/datasets/msambare/fer2013; FERPlus: https://www.kaggle.com/datasets/debanga/facial-expression-recognition-ferplus. For proprietary or additional datasets used in this study, access requests can be made by contacting the corresponding author at hehuifang@gdep.edu.cn. The code used in this research is available at https://github.com/SCNU-RISLAB/MSAFNet or can be obtained upon request.
Financial support and sponsorship
This work was supported in part by the Guangdong Association of Higher Education under the "14th Five-Year" Plan for Higher Education Research Projects, grant number 24GYB148.

