
Wu et al. J. Mater. Inf. 2025, 5, 15  https://dx.doi.org/10.20517/jmi.2024.67    Page 5 of 15

               DL
               Apart from the aforementioned ML models, DL has attracted increasing interest for catalyst development
               due to its capability for more complicated material systems and properties. DL includes convolutional
               neural networks (CNNs) and graph neural networks (GNNs). CNNs can deal with data in Euclidean space,
               such as images, videos, etc., and have shown promising performance for the explorations of electrocatalyst
materials. For example, Yang et al. employed a CNN model to predict the adsorption energies of various adsorbates on 2D SACs based on their electronic density of states (DOS). They achieved a low mean absolute error (MAE) of 0.06 eV across various adsorbates such as CO2, COOH, CO, and CHO. By combining the CNN model with the volcano plot in the analysis of catalytic performance, the framework is useful for designing SACs as electrocatalysts for CO2RR [56]. On the other hand, GNNs are designed to solve problems of non-Euclidean data, including social networks, knowledge graphs, and molecules/materials. Given the node, edge, and global attributes, the information is transformed using "message passing" algorithms, which can be written as:

m_i^{t+1} = \sum_{j \in N(i)} M_t(h_i^t, h_j^t, e_{ij})    (2)

h_i^{t+1} = U_t(h_i^t, m_i^{t+1})    (3)

where m is the message, h and e are the embeddings of nodes and edges, respectively, M_t denotes the message update function, and U_t indicates the vertex update function. Afterward, the readout layer calculates node, edge, and global embeddings. Finally, the results can be predicted by adding fully connected layers to the embeddings corresponding to the task [57]. GNNs usually perform better than conventional ML models without requiring dedicated feature design. Specifically, simple and accessible features such as electronegativity, covalent radius, and group number [58] can be directly used as the input of GNNs.
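As a concrete illustration, one round of message passing per Eqs. (2) and (3) can be sketched in plain NumPy. This is a minimal toy version: the update functions M_t and U_t below are hypothetical placeholders standing in for what, in a real GNN, would be learned neural networks.

```python
import numpy as np

def message_passing_step(h, e, edges, M_t, U_t):
    """One round of message passing.
    h: (num_nodes, d) node embeddings; e: dict mapping (i, j) -> edge embedding;
    edges: list of directed (i, j) pairs; M_t, U_t: update functions."""
    m = np.zeros_like(h)
    # Eq. (2): aggregate messages from each neighbor j of node i
    for i, j in edges:
        m[i] += M_t(h[i], h[j], e[(i, j)])
    # Eq. (3): update every node embedding using its aggregated message
    return np.array([U_t(h[i], m[i]) for i in range(h.shape[0])])

# Toy update functions (placeholders for learned networks)
M_t = lambda hi, hj, eij: hj * eij       # message weighted by edge embedding
U_t = lambda hi, mi: np.tanh(hi + mi)    # vertex update

# A three-node toy graph with scalar edge embeddings
h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
e = {(0, 1): 0.5, (1, 0): 0.5, (1, 2): 2.0, (2, 1): 2.0}
h_next = message_passing_step(h, e, list(e), M_t, U_t)
```

Stacking several such rounds lets information propagate beyond immediate neighbors, which is why simple per-node features suffice as input.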
Recent applications in ML interatomic potentials (MLIPs) have illustrated that GNNs have better generalization and higher accuracy than other models [59-63]. Batatia et al. proposed a new equivariant message passing neural network (MPNN) model called MACE [64,65], one of the most accurate MLIPs, as evidenced by an F1 score of 0.669, a coefficient of determination (R²) of 0.697, and an MAE of 0.057 eV/atom [66]. MACE can also be applied to complex scenarios in heterogeneous catalysis such as Pourbaix diagrams, the linear scaling relationship (LSR), CO (electro-)oxidation on Cu, and carborane rearrangement [67].
               Natural language processing
The advent of large language model (LLM)-based ChatGPT represents the most significant progress in natural language processing and has drawn great attention from the scientific community [68]. LLMs are often built on the transformer architecture, which is designed to effectively process sequential data such as text [69,70]. The critical component of the transformer architecture is the attention mechanism, which focuses on the relevant parts of the input to enhance performance. However, LLMs can sometimes produce misleading results in professional contexts due to the immense diversity of large datasets [71]. Fortunately, transfer learning algorithms can be utilized to adapt universal LLMs to specific tasks [72]. Although the application of LLMs in
electrocatalysis is still in its early stages, various attempts have been made. For example, Beltagy et al. developed SciBERT, a pretrained BERT-based model, to automatically extract scientific knowledge from existing papers [73]. Other efforts include CataLM [74] and InCrEDible-MaT-GO [75]. However, applying LLMs to functional materials such as electrocatalysts may require very large and complex models, which often makes their predictions difficult to interpret. Another issue is that language-like representations of material structures often lack inherent physics or chemistry information, which might require more complicated models or training processes.
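The attention mechanism at the heart of the transformer architecture can be sketched as scaled dot-product attention. This minimal NumPy version is an illustrative simplification: it omits the learned query/key/value projections and multi-head structure of a real transformer.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query attends to all keys; the values are then mixed
    according to the resulting softmax weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # weighted mix of values

# Three "tokens" with 4-dimensional embeddings (self-attention: Q = K = V)
x = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0]])
out = scaled_dot_product_attention(x, x, x)
```

Because the softmax weights for each query sum to one, every output row is a convex combination of the value rows, which is what lets the model emphasize the most relevant parts of the sequence.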