Page 8 of 14 Li et al. J Mater Inf 2024;4:4 I http://dx.doi.org/10.20517/jmi.2023.41
Table 1. Performance comparison of machine learning models using different descriptors [Figure 1]
Model                                             Algorithm   MDAE   MAE    RMSE   MARPD
Using conventional VT-based descriptors           GB          0.20   0.28   0.42   108%
Using conventional VT-based descriptors           KNN         0.20   0.29   0.44   105%
Using conventional VT-based descriptors           RF          0.19   0.27   0.41   103%
Using improved VT-based fingerprint descriptors   GB          0.16   0.20   0.27   89%
Using improved VT-based fingerprint descriptors   KNN         0.11   0.17   0.25   72%
Using improved VT-based fingerprint descriptors   RF          0.08   0.14   0.22   61%
MDAE: Median Absolute Error; MAE: Mean Absolute Error; RMSE: Root Mean Squared Error; MARPD:
Mean Absolute Relative Percent Difference; VT: Voronoi tessellation; GB: Gradient Boosting; KNN: K-Nearest
Neighbors; RF: Random Forest.
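The four error measures reported in Table 1 could be computed as in the following minimal sketch. The MDAE, MAE, and RMSE definitions are standard; the MARPD formula used here (absolute error divided by the mean magnitude of the predicted and reference values, in percent) is an assumption, since the text does not spell it out. The function name and the sample values are made up for illustration.

```python
import math
from statistics import mean, median


def regression_errors(y_true, y_pred):
    """Return MDAE, MAE, RMSE (same units as the inputs) and MARPD (percent)."""
    abs_err = [abs(p - t) for t, p in zip(y_true, y_pred)]
    # Assumed MARPD form: 2|p - t| / (|p| + |t|), averaged, in percent.
    # Pairs where both values are exactly zero are skipped to avoid 0/0.
    rel = [
        200.0 * abs(p - t) / (abs(p) + abs(t))
        for t, p in zip(y_true, y_pred)
        if abs(p) + abs(t) > 0
    ]
    return {
        "MDAE": median(abs_err),
        "MAE": mean(abs_err),
        "RMSE": math.sqrt(mean([e * e for e in abs_err])),
        "MARPD": mean(rel),
    }


# Illustrative adsorption energies in eV (not data from the paper).
y_true = [-0.50, 0.10, 0.30, -0.20]
y_pred = [-0.40, 0.20, 0.10, -0.20]
errors = regression_errors(y_true, y_pred)
```

Under this assumed definition MARPD is bounded by 200% and grows quickly when the reference energy is close to zero, which is one way percentages near or above 100% can coexist with small absolute errors.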
RESULTS AND DISCUSSION
In this work, we propose a comprehensive and universal framework for predicting molecular adsorption energy
on surfaces, which involves three key steps [Figure 1]. First, we transform the original material structure into
a graph-based 2D Voronoi diagram and extract improved fingerprint information. Next, we further optimize
the descriptors using ML techniques. Finally, we use DL to construct and train a model and to predict
adsorption energies. This approach enables us to capture important descriptors of the adsorption process and
achieve high prediction accuracy using either traditional ML or advanced DL algorithms.
Testing local environment approach with traditional machine learning models
Traditional ML algorithms are highly interpretable and can output weights for descriptors, so they are often
used to improve descriptors [52]. In this ML section, to test the performance of our local environment
interaction-based approach, we applied three widely used algorithms: Gradient Boosting (GB) [53],
K-Nearest Neighbors (KNN) [54], and Random Forest (RF) [55]. In addition, for systems with specific
requirements, we provide a simple path that uses the RF algorithm to determine the feature importance during
training and thereby fine-tune the model; refer to [Supplementary S3] for specific results and further analysis.
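The comparison of the three algorithms and the RF-based feature ranking can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the data are synthetic placeholders (66 columns, matching a flattened 6 × 11 descriptor matrix), and the hyperparameters are not those used in this work.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic placeholder data (assumption): 300 samples, 66 descriptor values each.
# Only columns 0 and 5 carry signal, so the ranking below has a known answer.
X = rng.normal(size=(300, 66))
y = 0.8 * X[:, 0] - 0.5 * X[:, 5] + 0.05 * rng.normal(size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameters are illustrative, not taken from the paper.
models = {
    "GB": GradientBoostingRegressor(random_state=0),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
}

mae = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae[name] = mean_absolute_error(y_test, model.predict(X_test))

# Impurity-based importances from the fitted forest; they sum to 1.
importances = models["RF"].feature_importances_
top_features = np.argsort(importances)[::-1][:5]
```

Descriptors with low importance are candidates for removal or redesign before retraining, which is one way such a ranking could feed back into descriptor refinement.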
Table 1 summarizes the training results. The random forest model incorporating the descriptors of the
LEI-framework produces the lowest MAE value of 0.13 eV. The RF algorithm outperforms the other two ML
methods thanks to its ability to integrate decision trees and capture complex inter-atomic relationships. In
addition, the ML models using the improved VT-based fingerprint descriptors [Figure 1, module 2] outperform
those using conventional VT-based descriptors [Figure 1, module 1]. This is because the conventional VT method
cannot incorporate layered chemical information about adsorbed hydrogen atoms into the model. As a result,
the descriptors of the LEI-framework predict the H adsorption energy of catalytic processes reliably, with
significantly lower errors on all four metrics in Table 1.
Testing local environment approach with advanced deep learning models
To check whether our approach also applies to DL models, and whether advanced DL models perform better
than traditional ML models, we introduce a ResNet [56] into our model. The ResNet has excellent
tunability and fast training speed, allowing for greater versatility in applying our model to a wider range of
adsorbates and scenarios. Additionally, its strong generalization performance makes our model easy to port to
other fields. The local environment descriptor, a list-type input [Figure 1] previously represented as a
6 × 11 matrix, can now be fed in graph form as the local environment input into ResNet (LERN). We use a
convolutional neural network with a 3 × 3 convolutional kernel to further process this graph, mapping the
matrix to the predicted quantity, namely the adsorption energy.
The detailed structure is shown in Table 2.
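To make the idea of passing a 6 × 11 descriptor matrix through 3 × 3 convolutions with a residual connection concrete, the following dependency-free sketch implements a single residual block and a scalar readout. It is not the architecture of Table 2: the number of blocks, the pooling step, the readout, and all function names and random weights are assumptions chosen for illustration.

```python
import random

ROWS, COLS = 6, 11  # descriptor matrix size stated in the text


def conv3x3_same(x, kernel):
    """3 x 3 convolution (cross-correlation, as in DL frameworks) with zero
    padding, so the output keeps the input shape."""
    rows, cols = len(x), len(x[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    r, c = i + di, j + dj
                    if 0 <= r < rows and 0 <= c < cols:
                        acc += kernel[di + 1][dj + 1] * x[r][c]
            out[i][j] = acc
    return out


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def residual_block(x, k1, k2):
    """y = ReLU(x + conv(ReLU(conv(x)))); the skip connection adds x back."""
    h = conv3x3_same(relu(conv3x3_same(x, k1)), k2)
    return relu([[a + b for a, b in zip(rx, rh)] for rx, rh in zip(x, h)])


def predict_energy(x, k1, k2, weight=1.0, bias=0.0):
    """Assumed readout: global average pooling, then a linear map to a scalar."""
    feat = residual_block(x, k1, k2)
    pooled = sum(sum(row) for row in feat) / (len(feat) * len(feat[0]))
    return weight * pooled + bias


random.seed(0)
# Random placeholder descriptor and untrained kernels (illustration only).
x = [[random.uniform(0.0, 1.0) for _ in range(COLS)] for _ in range(ROWS)]
k1 = [[random.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]
k2 = [[random.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]
energy = predict_energy(x, k1, k2)
```

With all kernel weights at zero the block reduces to the identity on non-negative inputs, which is the property that lets residual networks add depth without losing the original descriptor signal.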
The prediction results of the LERN model are shown in Figure 3. The residual plot is beneficial for analyzing

