
Page 4 of 15                         Wu et al. J. Mater. Inf. 2025, 5, 15  https://dx.doi.org/10.20517/jmi.2024.67

               ML
Linear regression (LR) is the simplest ML model. It assumes a linear relationship between the output (y) and
the input descriptors (X₁, X₂, …, Xₙ), written as:

y = w₀ + w₁X₁ + w₂X₂ + … + wₙXₙ + ϵ                                                                      (1)

where y is the predicted value of the model, w₀ is the bias, wₙ is the regression coefficient of the n-th
independent variable [n ∈ (1, 2, …, n)], and ϵ is the error of this model.
The Brønsted−Evans−Polanyi (BEP) principle and scaling relationships are two well-known empirical linear
rules in catalysis. The BEP relation states that the activation barrier of a reaction (Eₐ) scales linearly with the
corresponding reaction energy (ΔE), i.e., Eₐ = αΔE + β. Scaling relationships refer to the widely observed
linearity in the binding energies of different adsorbates that bind similarly to the catalysts, that is,
ΔE₁ = mΔE₂ + b. These linear relationships have been applied intensively to simplify and accelerate
catalyst design. However, they also impose inherent limitations on performance optimization [35-37].
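A BEP-type relation of this form can be fitted by ordinary least squares in a few lines. The sketch below uses purely illustrative, synthetic (ΔE, Eₐ) pairs, not data from any study cited here:

```python
import numpy as np

# Hypothetical BEP-type data: reaction energies dE (eV) and activation
# barriers Ea (eV) for a set of surfaces (values are illustrative only).
dE = np.array([-1.2, -0.8, -0.3, 0.1, 0.5, 0.9])
Ea = np.array([0.15, 0.32, 0.55, 0.74, 0.92, 1.13])

# Ordinary least squares for Ea = alpha * dE + beta:
# stack the descriptor with a column of ones to fit slope and intercept.
A = np.column_stack([dE, np.ones_like(dE)])
(alpha, beta), *_ = np.linalg.lstsq(A, Ea, rcond=None)

print(f"Ea ≈ {alpha:.2f}·ΔE + {beta:.2f}")
```

The fitted α and β play the same roles as in Eₐ = αΔE + β above; with real adsorption-energy data the same two-column design matrix applies unchanged.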
LR can be used to construct BEP relations and explore scaling relationships. For example, for high-
entropy alloy (HEA) based electrocatalysts, LR has been used to predict *OH and *O adsorption energies on
the HEA IrPdPtRhRu, suggesting a new HEA composition with better performance than pure Pt(111) [38]. In
addition, LR has been deployed to discover the active sites responsible for the ORR in nonplatinum porphyrin-
based electrocatalysts, in which two types of active sites were identified: Co sites associated with the
pyropolymer and Co particles covered by oxide layers [39]. Despite its simple formulation, the excellent
generalization and interpretability of LR make it appealing and widely deployed in catalyst design.

Support vector machine (SVM) is another widely used ML model [40]. It was initially designed to find a
hyperplane that separates samples from different groups, which makes it a useful solution for classification
tasks. By employing kernel functions, SVM can handle nonlinear relationships by mapping samples from a
lower-dimensional space to a higher-dimensional one. Moreover, support vector regression (SVR) handles
regression tasks by introducing a tolerance margin ϵ into the SVM. Both SVM and SVR have wide
applications in the electrochemical domain [41-45]. For example, Tamtaji et al. used SVR to predict the Gibbs
free energies of various reaction intermediates on single-atom catalysts (SACs) supported on graphene and
porphyrin [44]. Based on the trained SVR model, they reported that the most crucial factors in this system are
the number of pyridinic nitrogen atoms, the number of d electrons, and the number of valence electrons of
the reaction intermediate.
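The two SVR ingredients mentioned above, the kernel mapping and the tolerance margin ϵ, appear directly as hyperparameters in a typical implementation. A minimal sketch with scikit-learn on synthetic one-descriptor data (all values illustrative, unrelated to the cited studies):

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic, illustrative data: one descriptor x and a nonlinear target.
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 40).reshape(-1, 1)
y = np.sin(2 * x).ravel() + 0.05 * rng.standard_normal(40)

# The RBF kernel implicitly maps samples to a higher-dimensional space;
# epsilon is the tolerance margin within which errors are not penalized.
model = SVR(kernel="rbf", C=10.0, epsilon=0.05)
model.fit(x, y)

print("train R^2:", round(model.score(x, y), 3))
```

Swapping `kernel="rbf"` for `kernel="linear"` recovers an essentially linear fit, which makes the role of the kernel in capturing nonlinearity easy to see on this toy problem.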


Random forest (RF), which is regarded as one of the most successful ensemble models in ML, employs the
bagging algorithm. It uses decision trees as its base learners while introducing random feature selection
during training. RF is also useful for predicting the properties of electrocatalysts [46-53]. In a
recent investigation of double-atom catalysts, SVR, RF, extreme gradient boosting regression (XGBR), and an
artificial neural network (ANN) were utilized to predict the Gibbs free energy change of hydrogen
adsorption [49]. Among them, RF exhibited the best prediction performance owing to its ensemble nature and
is considered an effective model because it performs automatic feature selection during training. Another
well-known ensemble model is gradient boosting regression (GBR), with which improved performance has
been reported in various cases such as predicting the free energy of the N₂ electroreduction reaction and selecting
MXenes as HER catalysts [47,48,52,54,55]. While these ML models have shown promising performance for the
development of electrocatalysts, they often suffer from issues such as strong dependence on the chosen
features, limited generalizability, and difficulty extending to more complex prediction tasks.
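The ensemble behavior described above, bagged trees in RF versus sequentially boosted trees in GBR, and RF's built-in feature ranking can be sketched on synthetic data. The three "descriptors" below are invented stand-ins for catalyst features, not data from the cited works:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Synthetic descriptors standing in for catalyst features (illustrative only):
# the target depends on the first two columns; the third is pure noise.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 3))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] ** 2 + 0.02 * rng.standard_normal(200)

# RF: bagging over decision trees with random feature selection per split.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
# GBR: trees fitted sequentially, each correcting the previous residuals.
gbr = GradientBoostingRegressor(random_state=0).fit(X, y)

# RF's impurity-based importances show which descriptors drive the prediction;
# the noise column should rank lowest.
print("RF importances:", rf.feature_importances_.round(2))
print("RF R^2:", round(rf.score(X, y), 3), "GBR R^2:", round(gbr.score(X, y), 3))
```

The low importance assigned to the noise column illustrates the "automatic feature selection" property credited to RF in the text; for honest accuracy comparisons between RF and GBR, a held-out test split should of course be used rather than the training R² printed here.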