
He et al. Intell. Robot. 2025, 5(2), 313-32 | http://dx.doi.org/10.20517/ir.2025.16 | Page 323
Figure 5. The confusion matrices of MSAFNet on the RAF-DB dataset.


Table 2. Comparison with other methods on the FER2013 dataset

Method                 Year   Accuracy (%)
SHCNN [57]             2019   69.10
Pre-trained CNN [11]   2019   71.14
AWHFL [58]             2019   72.67
FreNet [59]            2020   64.41
LBAN-IL [60]           2021   73.11
MSAFNet (ours)         2025   73.25

Bold indicates the best (highest) accuracy. SHCNN: shallow convolutional neural network; CNN: convolutional neural network; AWHFL: adaptive weighting of handcrafted feature losses; LBAN-IL: local binary attention network with instance loss; MSAFNet: multi-scale attention and convolution-transformer fusion network.
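The margins between MSAFNet and each baseline in Table 2 can be read off with simple subtraction. The short sketch below copies the accuracies from the table and ranks the gaps in percentage points; the names `TABLE2_FER2013` and `margins_over_baselines` are hypothetical helpers introduced only for this illustration.

```python
# Accuracy (%) on FER2013, copied from Table 2.
TABLE2_FER2013 = {
    "SHCNN": 69.10,
    "Pre-trained CNN": 71.14,
    "AWHFL": 72.67,
    "FreNet": 64.41,
    "LBAN-IL": 73.11,
    "MSAFNet": 73.25,
}


def margins_over_baselines(results, ours="MSAFNet"):
    """Return {method: ours - method} in percentage points, largest gap first.

    Hypothetical helper for illustration; values are rounded to two decimals
    to match the precision reported in Table 2.
    """
    ours_acc = results[ours]
    gaps = {
        name: round(ours_acc - acc, 2)
        for name, acc in results.items()
        if name != ours
    }
    return dict(sorted(gaps.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    for name, gap in margins_over_baselines(TABLE2_FER2013).items():
        print(f"MSAFNet vs {name}: +{gap:.2f} points")
```

The smallest margin is over LBAN-IL (0.14 points) and the largest over FreNet (8.84 points).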



and ViT. Compared with PACVT, our method achieves about 1.85% higher accuracy. In the confusion matrix shown in Figure 5, happiness has the highest recognition accuracy, whereas fear and disgust are recognized poorly. This disparity arises primarily from three factors: data scarcity, inter-class similarity, and feature subtlety.
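Per-class recognition rates such as those discussed for Figure 5 are typically obtained by row-normalizing a confusion matrix: each diagonal entry (correct predictions for a true class) is divided by that row's total. The sketch below assumes rows are true classes and columns are predictions; the counts are HYPOTHETICAL, chosen only to mimic the pattern described in the text, and are not the values from Figure 5. The helper names are likewise hypothetical.

```python
# Rows = true class, columns = predicted class (same order as CLASSES).
# HYPOTHETICAL counts for illustration only; not taken from Figure 5.
CLASSES = ["surprise", "fear", "disgust", "happiness", "sadness", "anger", "neutral"]
COUNTS = [
    [280, 5, 3, 10, 4, 8, 19],      # surprise
    [12, 40, 4, 6, 10, 3, 1],       # fear
    [6, 3, 95, 10, 15, 12, 19],     # disgust
    [5, 1, 6, 1140, 8, 3, 22],      # happiness
    [3, 4, 10, 12, 420, 2, 27],     # sadness
    [6, 2, 12, 6, 5, 125, 6],       # anger
    [10, 1, 12, 20, 25, 3, 609],    # neutral
]


def per_class_recall(counts):
    """Row-normalize: diagonal entry divided by the row total for each true class."""
    rates = []
    for i, row in enumerate(counts):
        total = sum(row)
        rates.append(row[i] / total if total else 0.0)
    return rates


def rank_classes(classes, counts):
    """Return (class, rate) pairs sorted from best to worst recognized."""
    rates = per_class_recall(counts)
    return sorted(zip(classes, rates), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, rate in rank_classes(CLASSES, COUNTS):
        print(f"{name:>10}: {rate:.1%}")
```

With these toy counts, happiness ranks first and fear and disgust rank last, because their rows contain few samples and many are scattered into other columns.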


               4.3.2. Results on FER2013
Table 2 compares our method with state-of-the-art methods on the FER2013 dataset. MSAFNet achieves an accuracy of 73.25%, the highest among the compared methods and 0.14% above the next best, LBAN-IL (73.11%). The confusion matrix in Figure 6 shows that fear is the hardest expression to recognize, while happiness has the highest recognition rate.
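The overall accuracy reported in Table 2 counts every test sample equally, so it can remain high even when a class such as fear is recognized poorly. The sketch below contrasts overall accuracy (diagonal sum divided by total samples) with the unweighted mean of per-class recall, a common complementary metric. The three-class counts are HYPOTHETICAL and do not come from Figure 6; the function names are illustrative.

```python
# HYPOTHETICAL 3-class confusion matrix (rows = true, columns = predicted).
# Not taken from Figure 6; chosen so one frequent class dominates.
CLASSES = ["happiness", "fear", "neutral"]
COUNTS = [
    [90, 5, 5],     # happiness: frequent and easy
    [10, 30, 10],   # fear: harder, often confused with other classes
    [4, 6, 40],     # neutral
]


def overall_accuracy(counts):
    """Fraction of all samples on the diagonal (correctly classified)."""
    correct = sum(counts[i][i] for i in range(len(counts)))
    total = sum(sum(row) for row in counts)
    return correct / total


def mean_per_class_accuracy(counts):
    """Unweighted mean of per-class recall; every class counts equally."""
    rates = [row[i] / sum(row) for i, row in enumerate(counts)]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    print(f"overall accuracy:        {overall_accuracy(COUNTS):.2%}")
    print(f"mean per-class accuracy: {mean_per_class_accuracy(COUNTS):.2%}")
```

Here the overall accuracy is 80.00% while the mean per-class accuracy is about 76.67%, because the frequent, easy class contributes more samples to the overall figure than the harder fear class.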