
Table 1. Comparison with other methods on the RAF-DB dataset

Methods           Year   Accuracy (%)
gACNN [49]        2018   85.07
RAN [45]          2020   86.90
SCN [48]          2020   87.01
OADN [50]         2020   87.16
DACL [51]         2021   87.78
MA-Net [52]       2021   88.40
FDRL [53]         2021   89.47
VTFF [54]         2021   88.14
AMP-Net [15]      2022   89.25
ADDL [55]         2022   89.34
PACVT [41]        2023   88.21
GSDNet [32]       2024   90.91
DBFN [56]         2024   87.65
MSAFNet (ours)    2025   90.06

Bold indicates the best (highest) accuracy. gACNN: region attention mechanism; RAN: region attention networks; SCN: self-cure networks; OADN: occlusion-adaptive deep network; DACL: deep attentive center loss; MA-Net: multi-scale and local attention network; FDRL: feature decomposition and reconstruction learning; VTFF: visual transformers with feature fusion; AMP-Net: adaptive multilayer perceptual attention network; ADDL: adaptive deep disturbance-disentangled learning; PACVT: patch attention convolutional vision transformer; DBFN: dual-branch fusion network; MSAFNet: multi-scale attention and convolution-transformer fusion network.


4.3. Comparison with state-of-the-art methods
               This section compares the proposed approach MSAFNet with several state-of-the-art methods on RAF-DB,
               FERPlus, FER2013, Occlusion-RAF-DB, Pose-RAF-DB, Occlusion-FERPlus, and Pose-FERPlus. MSAFNet
               consistently achieves high accuracy and demonstrates stable performance across these benchmarks. Notably,
               it exhibits strong generalization capabilities, particularly in complex scenarios involving diverse facial expres-
               sions and emotion categories.


               4.3.1 Results on RAF-DB
Comparison results with other recent state-of-the-art methods on RAF-DB, with seven emotion categories, are shown in Table 1. The multi-scale and local attention network (MA-Net) [52] utilized global and local features to address both occlusion and pose variation, achieving an accuracy of 88.40%. The adaptive multilayer perceptual attention network (AMP-Net) [15] extracts global, local, and salient features at different levels of granularity, obtaining a recognition accuracy of 89.25% on the RAF-DB dataset. As shown in Table 1, our proposed MSAFNet obtains a recognition accuracy of 90.06% on RAF-DB, an improvement of 1.66% and 0.81% over MA-Net [52] and AMP-Net [15], respectively.
Compared to the visual transformers with feature fusion (VTFF) [54], which used transformers and attentional selective fusion, our method achieves a 1.92% improvement. PACVT [41] can also extract local and global features with attention weights