Page 2 of 14 Li et al. J Mater Inf 2024;4:4 | http://dx.doi.org/10.20517/jmi.2023.41
INTRODUCTION
Machine learning (ML) has found widespread applications in the field of materials science and engineering [1,2].
Researchers such as Behler and Parrinello have utilized atomic neural networks to learn total energy, which has
been instrumental in developing interatomic potentials [3]. Moreover, ML techniques, such as SchNet [4], Crystal
Graph Convolutional Neural Network (CGCNN) [5], and Atomistic Line Graph Neural Network (ALIGNN) [6], have been
employed to establish relationships between atomic structures and their properties. These methods have been used
to predict up to 50 different characteristics of crystals and molecular materials, including formation energy and
electronic band gaps. Additionally, deep learning (DL) techniques
have been leveraged in various applications to identify chemically feasible spaces. For instance, Bayesian
optimization methods have been combined with MEGNet as the energy evaluator for direct structure relaxation [7].
To further enhance performance, BOWSR combines symmetry relaxation with Bayesian optimization [8].
Accurate characterization of localized physicochemical properties is of paramount
importance in numerous scientific and engineering disciplines. In fields ranging from electrochemical catalysis,
sensors, carbon capture, and energy storage and conversion to drug delivery, exploring and exploiting the
intricacies of local environments is at the heart of many frontier investigations. For instance, adsorption is
a pervasive surface phenomenon in areas such as electrocatalysis, with its understanding rooted in foundational
theories such as bonding and adsorption thermodynamics [9]. Crucially, the influence of neighboring atoms on
adsorption sites must be fully accounted for, as factors including atomic electronic structures, spatial
constraints, surface stoichiometry, and surface defects can all impinge on the behavior of adsorbates on
surfaces [10].
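As a minimal illustration of what accounting for neighboring atoms can mean in practice, the sketch below collects the atoms within a cutoff radius of an adsorption site and summarizes them with two simple quantities. The coordinates, the cutoff, the choice of descriptor, and the function names are assumptions made for illustration only; they are not taken from the methods cited above.

```python
import math

# Hypothetical toy surface: (element, x, y, z) in angstroms. Illustrative only.
ATOMS = [
    ("Pt", 0.0, 0.0, 0.0),
    ("Pt", 2.8, 0.0, 0.0),
    ("Ni", 1.4, 2.4, 0.0),
    ("Pt", 8.0, 8.0, 0.0),  # far from the site; should be excluded
]

# Pauling electronegativities for the two elements used above.
ELECTRONEGATIVITY = {"Pt": 2.28, "Ni": 1.91}


def local_environment(atoms, site, cutoff):
    """Return (element, distance) pairs for atoms within `cutoff` of `site`,
    sorted from nearest to farthest."""
    neighbors = []
    for element, x, y, z in atoms:
        distance = math.dist(site, (x, y, z))
        if distance <= cutoff:
            neighbors.append((element, distance))
    return sorted(neighbors, key=lambda pair: pair[1])


def site_descriptor(neighbors):
    """Summarize a neighbor list as (coordination number, mean electronegativity).
    This two-number descriptor is an assumption chosen for brevity."""
    if not neighbors:
        return (0, 0.0)
    total = sum(ELECTRONEGATIVITY[element] for element, _ in neighbors)
    return (len(neighbors), total / len(neighbors))


# Hypothetical adsorption site located above the three nearby surface atoms.
SITE = (1.4, 0.8, 1.5)
neighbors = local_environment(ATOMS, SITE, cutoff=3.0)
descriptor = site_descriptor(neighbors)
print(neighbors)
print(descriptor)
```

Restricting the description to atoms inside the cutoff keeps its size independent of the overall system size, at the cost of ignoring longer-range effects.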
Recently, DL models have found applications in the catalysis realm, as demonstrated by various end-to-end
graph neural network models developed in the Open Catalyst Project (OCP) challenge, encompassing equivariant
Euclidean Neural Networks (e3nn), Spherical Channel Network (SCN), and Equivariant Spherical Channel
Network (eSCN), among others [11–17]. While these models showcase stellar performance on the data-rich
OC20 database [18], their computational complexity often leads to overfitting on smaller datasets [19–22].
Specifically, unlike single-metal materials, multi-metal alloy catalysts exhibit excellent physical and chemical
properties in the field of nanoparticles [23]. However, for complex adsorption systems of this kind, such as those
involving large organic molecules and certain transition metal oxides, accurate Density Functional Theory (DFT)
computations are challenging, resulting in a paucity of reliable data [24,25]. Currently, graph neural networks based on
global information are designed to capture the topology of the data, making them well-suited for processing
small- to medium-scale molecular and crystalline materials, where local connectivity does not increase
significantly with system size. However, for large systems, the number of edges and nodes in the graph network
increases dramatically, resulting in significant performance degradation. Furthermore, it is difficult for the
graph network to distinguish which information is critical. Therefore, extracting local information is critical
to predicting the adsorption energy. Large nanoparticles and two-dimensional (2D) materials exhibit greater
structural diversity, and it is difficult for graph networks to handle the complexity caused by these structural differences. In
addition, implementing effective boundary condition processing in graph networks is also a challenge. In the
Adsorbate Chemical Environment-based Graph Convolution Neural Network (ACE-GCN), each adsorbent surface
configuration is initially split into subgraphs to explicitly account for the local chemical and structural
environment of the adsorbent [26]. However, this approach suffers from weak interpretability and has difficulty
capturing interactions in complex molecular systems. The segmentation of subgraphs has limitations and
uncertainties, making it difficult to generalize to more diverse systems such as 2D materials. In contrast,
feature-based ML approaches offer advantages such as heightened interpretability, reduced data dependency,
flexibility in designing and selecting features tailored to specific problems, and capabilities in prediction
interpretation and error analysis [27–32]. Earlier studies proposed numerous feature engineering descriptors to
enhance the prediction of ML models for adsorption energy, encompassing atomic number, ionization energy,
electronegativity, ionic radius, and inter-atomic interactions [33–35]. The reported descriptors are equally pertinent to the field
of electrocatalysis [36] . Yet, adequately representing metal compound adsorption sites remains a major chal-

