Wu et al. J. Mater. Inf. 2025, 5, 15 https://dx.doi.org/10.20517/jmi.2024.67 Page 5 of 15
DL
Apart from the aforementioned ML models, DL has attracted increasing interest for catalyst development
due to its capability for more complicated material systems and properties. DL includes convolutional
neural networks (CNNs) and graph neural networks (GNNs). CNNs can deal with data in Euclidean space,
such as images, videos, etc., and have shown promising performance for the explorations of electrocatalyst
materials. For example, Yang et al. employed a CNN model to predict the adsorption energies of various
adsorbates on 2D SACs based on their electronic density of states (DOS). They achieved a low mean
absolute error (MAE) of 0.06 eV across various adsorbates such as CO₂, COOH, CO, and CHO. Combining
the CNN model with the volcano plot in the analysis of catalytic performance, the framework is useful for
designing SACs as electrocatalysts for CO₂RR[56]. On the other hand, GNNs are designed to solve
problems of non-Euclidean data including social networks, knowledge graphs, and molecules/materials.
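Because the DOS is sampled on a regular energy grid, a CNN can treat it as 1D Euclidean data. A minimal sketch of one convolution-plus-pooling stage in NumPy, with a random filter and an arbitrary grid size, illustrates the idea (this is not the cited model's architecture):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical DOS sampled on a regular energy grid (Euclidean data).
dos = rng.random(200)          # density of states vs. energy
kernel = rng.normal(size=9)    # one "learnable" 1D filter (random here)

# Convolution layer: slide the filter along the energy axis.
feature_map = np.convolve(dos, kernel, mode="valid")
feature_map = np.maximum(feature_map, 0.0)   # ReLU nonlinearity

# Max pooling: downsample by taking the max over windows of 4 points.
n = len(feature_map) // 4
pooled = feature_map[: n * 4].reshape(n, 4).max(axis=1)
print(pooled.shape)   # (48,) - compact features fed to dense layers
```

In a trained model, many such filters are stacked, and the pooled features pass through fully connected layers to regress the adsorption energy.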
Given the node, edge and global attributes, the information is transformed using “message passing”
algorithms, which can be written as:
$$m_v^{t+1} = \sum_{w \in \mathcal{N}(v)} M_t\left(h_v^t, h_w^t, e_{vw}\right) \tag{2}$$

$$h_v^{t+1} = U_t\left(h_v^t, m_v^{t+1}\right) \tag{3}$$

where m is the message, h and e are the embeddings of nodes and edges, respectively, M_t denotes the
message update function, and U_t indicates the vertex update function. Afterward, the readout layer
calculates node, edge and global embeddings. Finally, the results can be predicted by adding fully connected
layers to the embeddings corresponding to the task[57]. GNNs usually perform better than conventional ML
models without requiring dedicated feature design. Specifically, simple and accessible features
such as electronegativity, covalent radius, and group number can be directly used as inputs to GNNs.
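The message-passing scheme described above can be sketched in a few lines of NumPy, with simple linear layers standing in for the message and vertex update functions M_t and U_t (illustrative choices with random weights, not any specific published architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: 4 nodes, directed edge list (w -> v), node/edge embeddings.
num_nodes, node_dim, edge_dim = 4, 8, 4
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
h = rng.normal(size=(num_nodes, node_dim))    # node embeddings h_v
e = rng.normal(size=(len(edges), edge_dim))   # edge embeddings e_vw

# Linear stand-ins for M_t and U_t (hypothetical random weights).
W_m = rng.normal(size=(2 * node_dim + edge_dim, node_dim))
W_u = rng.normal(size=(2 * node_dim, node_dim))

def message_passing_step(h, e, edges):
    """One round: aggregate messages over neighbors, then update nodes."""
    m = np.zeros_like(h)
    for k, (w, v) in enumerate(edges):
        # m_v += M_t(h_v, h_w, e_vw), here a single linear layer
        m[v] += np.concatenate([h[v], h[w], e[k]]) @ W_m
    # h_v = U_t(h_v, m_v), a linear layer followed by tanh
    return np.tanh(np.concatenate([h, m], axis=1) @ W_u)

h = message_passing_step(h, e, edges)
graph_embedding = h.mean(axis=0)   # simple mean "readout"
print(graph_embedding.shape)       # (8,)
```

Stacking several such rounds lets information propagate beyond nearest neighbors; the readout vector is then fed to fully connected layers for the prediction task.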
Recent applications in ML interatomic potentials (MLIPs)[58] have illustrated that GNNs have better
generalization and higher accuracy than other models[59-63]. Batatia et al. proposed a new equivariant
message passing neural network (MPNN) model called MACE[64,65], one of the most accurate MLIPs, as evidenced by
an F1 score of 0.669, a coefficient of determination (R²) of 0.697, and an MAE of
0.057 eV/atom[66]. MACE can also be applied to complex scenarios in heterogeneous catalysis such as
Pourbaix diagrams, linear scaling relationships (LSRs), CO (electro-)oxidation on Cu, and carborane
rearrangement[67].
Natural language processing
The advent of large language model (LLM)-based ChatGPT represents the most significant progress in
natural language processing and has drawn great attention from the scientific community[68]. LLMs are often
built on the transformer architecture, which is designed to effectively process sequential data such as text[69,70]. The critical
component of transformer architecture is the attention mechanism, which focuses on relevant parts of the
dataset to enhance performance. However, LLMs can sometimes produce misleading results in professional
contexts due to the immense diversity of large datasets[71]. Fortunately, transfer learning algorithms can be
utilized to adapt universal LLMs to specific tasks[72]. Although the application of LLMs in
electrocatalysis is still in the early stages, various attempts have been made. For example, Beltagy et al.
developed SciBERT, a pretrained BERT model, to automatically extract scientific knowledge from existing
papers[73]. Other efforts include CataLM[74] and InCrEDible-MaT-GO[75]. However, the application of LLMs
to functional materials such as electrocatalysts may require very large and complex models, which often
makes their predictions difficult to interpret. Another issue is that language-like representations of
material structures often lack inherent physics or chemistry information, which might require more
complicated models or training processes.
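The attention mechanism mentioned above can be illustrated with scaled dot-product attention, the core operation of the transformer. A NumPy sketch with random toy matrices (all dimensions chosen arbitrarily for illustration):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # pairwise relevance of positions
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights       # weighted mix of values

rng = np.random.default_rng(1)
seq_len, d_k = 5, 16
Q = rng.normal(size=(seq_len, d_k))   # queries
K = rng.normal(size=(seq_len, d_k))   # keys
V = rng.normal(size=(seq_len, d_k))   # values

out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)   # (5, 16); each row of w sums to 1
```

Each output position is a weighted average of all value vectors, with the weights expressing which parts of the sequence are most relevant; this is what lets transformers focus on relevant context.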

