
Page 14 of 21 Chen et al. J Mater Inf 2022;2:19 https://dx.doi.org/10.20517/jmi.2022.23

optimal compositions with high activity. Combining HT DFT calculations, ML, data-guided combinatorial
synthesis and HT characterization, these works demonstrate an efficient methodology for HT closed-loop
materials design in the rising field of HEA catalysts.

ML models of CO2RR on HEA catalysts
The ever-increasing global demand for energy and the need to replace CO2-emitting fossil fuels with
renewable sources have driven interest in energy conversion and storage. In particular, the electrochemical
reduction of CO2 to chemical feedstocks is a hot topic because it is closely tied to both CO2 removal and
renewable energy generation. To accelerate catalyst discovery for the CO2RR, Zhong et al. developed an
ML-accelerated HT DFT framework and explored 12,229 surfaces and 228,969 adsorption sites on 244
copper-containing intermetallic crystals [85]. This work highlights the value of computation and ML in guiding
the experimental exploration of multi-metallic systems. By combining DFT with supervised ML, Pedersen et al.
presented a strategy for the probabilistic and unbiased discovery of high-performance CO2RR catalysts on
disordered CoCuGaNiZn and AgAuCuPdPt HEAs [86]. Gaussian process regressors were trained on
hundreds of DFT-calculated adsorption energies of CO* (on-top site) and H* (hollow site) on the (111) surfaces
of CoCuGaNiZn and AgAuCuPdPt, as illustrated in Figure 10A-F. The prediction errors of the Gaussian process
regressors are normally distributed and similar to those obtained in cross-validation.
As seen in Figure 10A-F, most predictions fall within the dotted lines (±0.1 eV deviation from the DFT
values), indicating that the Gaussian process regressors capture the essential features of the chemical
environment of the adsorption sites. The learning curves, which relate the prediction error to the number of
training samples, confirm that the prediction error of the Gaussian process regressors has converged for the
current number of DFT-calculated adsorption energies.
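To make this workflow concrete, here is a minimal sketch built on scikit-learn's `GaussianProcessRegressor` and run on synthetic data. It is not the featurisation or kernel used by Pedersen et al. The following are all assumptions made for illustration:

- the 15-feature encoding (element counts in three hypothetical neighbour shells of a five-element alloy site);
- the RBF-plus-noise kernel;
- the "DFT" energies, which come from an invented linear model plus noise.

The script fits a regressor and reports the share of held-out predictions inside a ±0.1 eV band, analogous to the dotted lines in a parity plot. It then computes a cross-validated learning curve of MAE against training-set size.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel
from sklearn.model_selection import learning_curve, train_test_split

rng = np.random.default_rng(0)

# Hypothetical site featurisation: counts of each of 5 elements in three
# neighbour shells (1, 6 and 3 atoms) around a site -> 15 features per site.
N_ELEMENTS = 5
SHELL_SIZES = (1, 6, 3)


def random_site_features(n_sites):
    """Element counts per shell for random sites on an equimolar 5-element alloy."""
    probs = [1.0 / N_ELEMENTS] * N_ELEMENTS
    shells = [rng.multinomial(n_atoms, probs, size=n_sites) for n_atoms in SHELL_SIZES]
    return np.hstack(shells).astype(float)


# Synthetic stand-in for DFT adsorption energies (eV): an invented linear
# dependence on the site features plus 0.03 eV of Gaussian noise.
X = random_site_features(400)
weights = rng.normal(0.0, 0.15, X.shape[1])
y = X @ weights + rng.normal(0.0, 0.03, len(X))


def make_gpr():
    # Scaled RBF kernel for the smooth trend plus a white-noise term.
    kernel = ConstantKernel(1.0) * RBF(length_scale=10.0) + WhiteKernel(1e-3)
    return GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
gpr = make_gpr().fit(X_train, y_train)
errors = gpr.predict(X_test) - y_test

mae = np.mean(np.abs(errors))
within_band = np.mean(np.abs(errors) <= 0.1)  # share inside a +/-0.1 eV parity band
print(f"test MAE = {mae:.3f} eV, within +/-0.1 eV: {within_band:.0%}")

# Learning curve: cross-validated MAE as a function of training-set size.
sizes, _, val_scores = learning_curve(
    make_gpr(), X, y, train_sizes=np.linspace(0.2, 1.0, 5),
    cv=5, scoring="neg_mean_absolute_error",
)
for n, score in zip(sizes, -val_scores.mean(axis=1)):
    print(f"{n:4d} training sites -> CV MAE {score:.3f} eV")
```

A flattening of the printed CV MAE at the largest training sizes is the kind of convergence that a learning curve is used to check. The actual numbers depend entirely on the synthetic data above.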
For an ML model, the choice of input features is vital to its precision and universality. More importantly,
it is essential to understand the structure-activity relationships of HEA catalysts. To address this aspect,
Roy et al. applied the permutation importance module implemented in the Python scikit-learn library to
quantify the contribution of each input feature to the output, as depicted in Figure 10G [53]. To determine
the correlations between the input features, a correlation matrix was generated, which allowed highly
correlated features to be eliminated to reduce the dimensionality of the data set. Moreover, the feature
importance readily reveals how strongly each metal in every region correlates with the corresponding
adsorption energy.
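The two analysis steps described above can be sketched with scikit-learn as follows. This is an illustrative sketch, not the code of Roy et al. The regression model (a random forest), the feature names, the 0.9 correlation threshold and the synthetic energies are all hypothetical choices. The sketch first uses a correlation matrix to drop one feature of each highly correlated pair. It then ranks the remaining features on held-out data with `sklearn.inspection.permutation_importance`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Hypothetical descriptors for 300 adsorption sites; "Cu_site_copy" is built
# as a near-duplicate of "Cu_site" so the correlation filter has work to do.
names = ["Cu_site", "Ni_nn", "Pd_nn", "Cu_site_copy", "Pt_nn"]
base = rng.integers(0, 4, size=(300, 4)).astype(float)
X = np.column_stack([base[:, 0], base[:, 1], base[:, 2],
                     base[:, 0] + rng.normal(0.0, 0.05, 300), base[:, 3]])
# Synthetic adsorption energies (eV) that ignore the Pt feature entirely.
y = -0.30 * X[:, 0] + 0.15 * X[:, 1] - 0.05 * X[:, 2] + rng.normal(0.0, 0.02, 300)


def drop_correlated(X, names, threshold=0.9):
    """Greedily keep features whose |Pearson r| with every kept feature is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]


X_red, kept = drop_correlated(X, names)
print("kept features:", kept)

X_tr, X_te, y_tr, y_te = train_test_split(X_red, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Permutation importance: mean drop in held-out R^2 when one column is shuffled.
result = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"{kept[i]:>10s}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```

The correlation filter runs first because strongly correlated features tend to share, and thereby dilute, permutation importance. Shuffling one feature leaves its near-duplicate intact, so the model can still recover most of the information.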

Descriptors in ML models of HEA catalysts
The key to constructing an ML model is designing effective descriptors, which is even more important for HEA
catalysts because of their complex active sites. Appropriate descriptors used as input features for an ML model
should be obtainable directly from databases or from the simplest DFT calculations, and should contain
sufficient information on the surface active sites. Several representations, such as coordination atom
fingerprints (CAFs), Coulomb matrices, the spectrum of London and Axilrod-Teller-Muto (SLATM), elemental
properties and SLATM (EP & SLATM), smooth overlap of atomic positions (SOAP), the Voronoi
connectivity-based crystal graph, the labeled site crystal graph (LSCG) and FCHL19, have recently been
reported [87-94]. Li et al. replaced atomic numbers with elemental groups and periods (GP) in the FCHL19,
LSCG, atomic number and coordination number (ANCN) and CAF representations, effectively improving the
prediction of adsorption energies on alloys [95]. This strategy enables ML models to learn from the periodic
table. Compared with the original ANCN, CAF, FCHL19 and LSCG representations, the GP-based variants
reduce the mean absolute error (MAE) of the predicted adsorption energies by up to ~0.2 eV. In particular,
with the GP-LSCG representation, the MAE is 0.05 eV (near chemical accuracy) for hydrogen adsorption and
~0.1 eV for other strongly binding adsorbates (C*, N*, O* and S*). Although this work mainly focuses on
bimetallic alloy systems, it has the potential to be extended to HEA catalysts. This is supported by another
research group, which proposed a transferable ML model that takes into account the intrinsic properties of
substrates and adsorbates [96]. Simply training the
