Page 324 He et al. Intell. Robot. 2025, 5(2), 313-32 | http://dx.doi.org/10.20517/ir.2025.16
Figure 6. The confusion matrices of MSAFNet on the FER2013 dataset.
4.3.3 Results on FERPlus
Table 3 compares MSAFNet with other state-of-the-art methods on FERPlus. Recognition accuracy on
FERPlus is considerably higher than on FER2013 because FERPlus has been relabeled and non-face
images have been removed. As shown in Table 3, our MSAFNet obtains a recognition accuracy of
89.82%. Compared to VTFF [54] and PACVT [41], which also use a transformer architecture, our method
achieves improvements of 1.01% and 1.1%, respectively. The confusion matrix on FERPlus is shown in
Figure 7. It shows that happiness, neutral, and surprise are recognized more accurately than the
other expressions, while contempt, disgust, and fear are recognized poorly. A likely reason is that
contempt, disgust, and fear have less data than the other expressions.
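The per-expression comparison above is read off the confusion matrix. As a minimal sketch of that reading, the snippet below computes per-class recall (diagonal count divided by row total) and overall accuracy. The three class names are taken from the text, but the counts are invented for illustration and are not results from the paper; the function names are hypothetical.

```python
# Hypothetical confusion matrix: rows = true class, columns = predicted class.
# The counts are invented for illustration and are NOT taken from Figure 7.

def per_class_recall(matrix):
    """Return the recall of each class: diagonal count / row total."""
    recalls = []
    for i, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls


def overall_accuracy(matrix):
    """Return the fraction of all samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


labels = ["happiness", "neutral", "contempt"]
counts = [
    [90, 8, 2],   # true happiness
    [10, 85, 5],  # true neutral
    [1, 5, 4],    # true contempt (deliberately under-represented)
]

recalls = per_class_recall(counts)
for name, recall in zip(labels, recalls):
    print(f"{name}: {recall:.2%}")
print(f"overall: {overall_accuracy(counts):.2%}")
```

In this toy matrix the under-represented class has the lowest recall yet barely moves the overall accuracy, which is why a per-class view is useful alongside the single accuracy figure.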
4.3.4 Results on occlusion and pose variant datasets
To verify the robustness of our method under occlusion and pose variation in real-world scenarios,
we conduct experiments on occlusion and pose variant datasets, including Occlusion-RAF-DB,
Pose-RAF-DB, Occlusion-FERPlus, and Pose-FERPlus, and compare our best results with those of other
methods. Tables 4 and 5 show the results compared to the state-of-the-art methods on the RAF-DB and
FERPlus datasets for facial occlusion and pose variants. Our MSAFNet obtains competitive performance
compared to other methods. For facial occlusion datasets, it achieves superior recognition
performance (86.38% and 85.62%) on the RAF-DB and FERPlus datasets. Specifically, our method
outperforms the AMP-Net [15] method by 1.1% and 0.18%, respectively, which demonstrates the
robustness of our method under facial occlusion. For pose variant datasets, our MSAFNet is
significantly superior to VTFF [54], MA-Net [52], and AMP-Net [15] on the RAF-DB dataset with pose
larger than 30 degrees and 45 degrees. On the FERPlus dataset with pose larger than 30 degrees and
pose larger than 45 degrees, our method also achieves higher accuracy than the other methods. The
results on occlusion and pose variant datasets demonstrate the effectiveness of our method.
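The pose-variant evaluation reports accuracy separately for faces with pose larger than 30 degrees and larger than 45 degrees. The sketch below shows one way such subset accuracies could be computed. It assumes each test sample carries a single signed pose angle in degrees; the `angle` field, the `subset_accuracy` helper, and all sample values are hypothetical and do not describe how the Pose-RAF-DB or Pose-FERPlus subsets were actually built.

```python
# Hypothetical test samples: a signed pose angle (degrees), the true label,
# and the model prediction. All values are invented for illustration.
samples = [
    {"angle": 10, "label": "happy", "pred": "happy"},
    {"angle": 35, "label": "sad", "pred": "sad"},
    {"angle": -40, "label": "fear", "pred": "surprise"},
    {"angle": 50, "label": "neutral", "pred": "neutral"},
    {"angle": -60, "label": "disgust", "pred": "disgust"},
]


def subset_accuracy(items, min_abs_angle):
    """Accuracy over samples whose absolute pose angle exceeds the threshold.

    Returns None when no sample passes the threshold.
    """
    subset = [s for s in items if abs(s["angle"]) > min_abs_angle]
    if not subset:
        return None
    correct = sum(1 for s in subset if s["pred"] == s["label"])
    return correct / len(subset)


for threshold in (30, 45):
    acc = subset_accuracy(samples, threshold)
    print(f"pose > {threshold} degrees: {acc:.2%}")
```

Because the larger-than-45-degree subset is contained in the larger-than-30-degree subset, the two figures are not independent; reporting both shows how accuracy changes as the pose becomes more extreme.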
4.4. Ablation analysis
To evaluate the effectiveness of our method, we perform a series of ablation studies on the RAF-DB
dataset. In the experiments, we evaluate the impact of the proposed components, of different fusion
methods, and of different attention methods.

