
Page 8 of 14                            Li et al. J Mater Inf 2024;4:4 | http://dx.doi.org/10.20517/jmi.2023.41


               Table 1. Performance comparison of machine learning models using different descriptors [Figure 1]

               Model                                            Algorithm  MDAE  MAE   RMSE  MARPD
               Using conventional VT-based descriptors          GB         0.20  0.28  0.42  108%
                                                                KNN        0.20  0.29  0.44  105%
                                                                RF         0.19  0.27  0.41  103%
               Using improved VT-based fingerprint descriptors  GB         0.16  0.20  0.27  89%
                                                                KNN        0.11  0.17  0.25  72%
                                                                RF         0.08  0.14  0.22  61%

               MDAE: Median Absolute Error; MAE: Mean Absolute Error; RMSE: Root Mean Squared Error; MARPD:
               Mean Absolute Relative Percent Difference; VT: Voronoi tessellation; GB: Gradient Boosting; KNN: K-Nearest
               Neighbors; RF: Random Forest.
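
               The four error metrics reported in Table 1 can be sketched as follows. This is a minimal illustration
               rather than the evaluation code used in this work. The MARPD formula (absolute error divided by the mean
               magnitude of the predicted and reference values) is an assumed, commonly used definition, and the function
               name `regression_metrics` and the sample values are hypothetical.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Return MDAE, MAE, RMSE and MARPD for paired reference/predicted values."""
    abs_err = [abs(p - t) for t, p in zip(y_true, y_pred)]
    mdae = statistics.median(abs_err)                        # median absolute error
    mae = statistics.fmean(abs_err)                          # mean absolute error
    rmse = math.sqrt(statistics.fmean(e * e for e in abs_err))
    # ASSUMED definition of MARPD: |error| relative to the mean magnitude of
    # prediction and reference, in percent. It is undefined when both are zero.
    marpd = 100.0 * statistics.fmean(
        abs(p - t) / ((abs(p) + abs(t)) / 2.0) for t, p in zip(y_true, y_pred)
    )
    return {"MDAE": mdae, "MAE": mae, "RMSE": rmse, "MARPD": marpd}


# Hypothetical adsorption energies in eV, for illustration only.
reference = [-0.45, -0.10, 0.22, -0.80]
predicted = [-0.40, -0.25, 0.30, -0.78]
print(regression_metrics(reference, predicted))
```

               Because MARPD divides by the magnitude of the energies, it can exceed 100% when many reference values lie
               close to zero, even if the absolute errors are small.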



               RESULTS AND DISCUSSION
               In this work, we propose a comprehensive and universal framework for predicting molecular adsorption energy
               on surfaces, which involves three key steps [Figure 1]. First, we transform the original material structure into
               a graph-based 2D Voronoi diagram and extract improved fingerprint information. Next, we further optimize
               the descriptors using ML techniques. Finally, we use DL to construct and train a model and to predict
               adsorption energies. This approach enables us to capture important descriptors of the adsorption process and
               achieve high prediction accuracy using either traditional ML or advanced DL algorithms.

               Testing local environment approach with traditional machine learning models
               Traditional ML algorithms have high interpretability and can output weights for descriptors, so they are often
               used to improve descriptors [52]. In this section, to test the performance of our local environment
               interaction-based approach, we applied three widely used algorithms, namely Gradient Boosting (GB) [53],
               K-Nearest Neighbors (KNN) [54], and Random Forest (RF) [55]. In addition, for systems with specific
               requirements, we provide a simple path that uses the RF algorithm to determine the feature importance during
               training and thereby fine-tune the model; refer to [Supplementary S3] for specific results and further analysis.
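
               The comparison of the three algorithms, together with the RF feature-importance step, can be sketched with
               scikit-learn as below. This is a minimal illustration on synthetic data and not the training setup used in
               this work: the 66 input features, the synthetic target, the train/test split, and all hyperparameters are
               assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the descriptors (ASSUMPTION: 66 features per sample).
X = rng.random((300, 66))
# Synthetic target that depends mainly on features 0 and 5, plus small noise.
y = 2.0 * X[:, 0] - X[:, 5] + 0.05 * rng.standard_normal(300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameters are illustrative defaults, not tuned values.
models = {
    "GB": GradientBoostingRegressor(random_state=0),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
}

mae = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae[name] = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {mae[name]:.3f}")

# Impurity-based importances from the fitted random forest (they sum to 1).
importances = models["RF"].feature_importances_
top = np.argsort(importances)[::-1][:5]
print("Most important feature indices:", top.tolist())
```

               Features with negligible importance are candidates for removal before retraining, which is one simple way
               to fine-tune the descriptor set for a specific system.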


               Table 1 summarizes the training results. The random forest model incorporating the descriptors of the
               LEI-framework produces the lowest MAE value of 0.13 eV. The RF algorithm outperforms the other two ML
               methods thanks to its ability to combine many decision trees and capture complex inter-atomic relationships.
               In addition, the ML models using the improved VT-based fingerprint descriptors [Figure 1, module 2] outperform
               those using conventional VT-based descriptors [Figure 1, module 1]. This is because the conventional VT method
               cannot incorporate layered chemical information about adsorbed hydrogen atoms into the model. As a result,
               the descriptors of the LEI-framework show reliable performance in predicting the H adsorption energy of
               catalytic processes, with significantly lower error rates.


               Testing local environment approach with advanced deep learning models
               To check whether our approach can also be applied to DL models and whether advanced DL models perform
               better than traditional ML models, we introduce a ResNet [56] into our model. The ResNet has excellent
               tunability and fast training speed, allowing for greater versatility in applying our model to a wider range of
               adsorbates and scenarios. Additionally, its strong generalization performance facilitates easy portability of our
               model to other fields. Considering that the local environment descriptor input is a list type [Figure 1] that was
               previously represented by a 6 × 11 matrix, we can now input a matrix of this size in graph form as the local
               environment input into ResNet (LERN). We utilize a convolutional neural network with a 3 × 3 convolutional
               kernel to further process this graph, mapping the matrix to the predicted parameter, specifically the adsorption
               energy.
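
               The idea of passing a 6 × 11 descriptor matrix through 3 × 3 convolutions with a skip connection and
               mapping it to a single energy value can be sketched in plain Python as follows. This is a minimal,
               untrained illustration and not the LERN architecture itself: the single residual block, the global average
               pooling, the linear output head, the random weights, and all function names are assumptions made for the
               example.

```python
import random

ROWS, COLS = 6, 11  # matrix size stated in the text


def conv3x3_same(x, kernel):
    """3 x 3 cross-correlation with zero padding, so the output keeps x's shape."""
    rows, cols = len(x), len(x[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    r, c = i + di, j + dj
                    if 0 <= r < rows and 0 <= c < cols:
                        acc += kernel[di + 1][dj + 1] * x[r][c]
            out[i][j] = acc
    return out


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def residual_block(x, k1, k2):
    """y = relu(x + conv(relu(conv(x)))); the skip connection adds the input back."""
    h = conv3x3_same(relu(conv3x3_same(x, k1)), k2)
    return relu([[a + b for a, b in zip(rx, rh)] for rx, rh in zip(x, h)])


def predict_energy(x, k1, k2, weight=1.0, bias=0.0):
    """ASSUMED head: global average pooling followed by a linear map to one scalar."""
    y = residual_block(x, k1, k2)
    pooled = sum(sum(row) for row in y) / (len(y) * len(y[0]))
    return weight * pooled + bias


def random_kernel(rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(3)] for _ in range(3)]


rng = random.Random(0)
# Placeholder descriptor matrix; real inputs would be the fingerprint descriptors.
descriptor = [[rng.random() for _ in range(COLS)] for _ in range(ROWS)]
k1, k2 = random_kernel(rng), random_kernel(rng)

out = residual_block(descriptor, k1, k2)
energy = predict_energy(descriptor, k1, k2)
print(len(out), len(out[0]), energy)
```

               The zero padding keeps the 6 × 11 shape through each convolution, which is what allows the input to be added
               back element-wise in the skip connection.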


               The detailed structure is shown in Table 2.

               The prediction results of the LERN model are shown in Figure 3. The Residual plot is beneficial for analyzing