
Page 8 of 19                          Chen et al. J Mater Inf 2023;3:10  https://dx.doi.org/10.20517/jmi.2023.06

Table 5. List of the commonly used features of HEAs with the corresponding formula

                         Data feature                      Formula
Compositional feature    Molar fraction of components      c_i
Atomic features          Mean atomic radius
                         Atomic size difference
                         Valence electron concentration
                         Electronegativity
Thermodynamic features   Mixing enthalpy
                         Ideal mixing entropy
Physical features        Melting temperature
                         Elastic modulus
                         Bulk modulus




Shapley Additive Explanation (SHAP) value [132]. We believe that data miners need to develop
physics-informed data features, which can be derived from the fundamental theories of eutectic
formation, such as the Jackson-Hunt theory [133], to improve the predictive capability of the machine
learning models. At present, such features remain a subject of active research for EHEAs. By
comparison, the design of data labels for EHEAs is more straightforward: the label is either a
characteristic of a eutectic-related microstructure (i.e., the volume fraction of eutectic phases [79,80])
or the targeted property for regression ML modeling. For instance, Qiao et al. [113] used the difference
between the solidus and liquidus temperatures (i.e., the so-called melting range, as termed in Ref. [113])
as the data label, and the composition and phase fraction as the data features in the search for EHEAs,
which led to the discovery of a near-eutectic composition, AlCrFe2.5Ni2.5.
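The melting-range label described above can be sketched as follows. An ideal eutectic composition melts at a single temperature, so a small difference between the liquidus and solidus temperatures points to a near-eutectic alloy. The function name, the candidate names and all temperatures below are hypothetical placeholders, not data from Ref. [113]; the sketch only shows how such a label could be built and ranked, not how the ML model in that work was trained.

```python
def melting_range(t_solidus, t_liquidus):
    """Data label: liquidus minus solidus temperature (both in the same unit)."""
    if t_liquidus < t_solidus:
        raise ValueError("liquidus temperature must not be below the solidus")
    return t_liquidus - t_solidus

# Hypothetical candidates: (name, solidus in K, liquidus in K). Placeholder numbers only.
candidates = [
    ("alloy_A", 1600.0, 1655.0),
    ("alloy_B", 1618.0, 1623.0),
    ("alloy_C", 1580.0, 1700.0),
]

# Attach the label to every candidate and rank by the narrowest melting range.
labelled = [(name, melting_range(ts, tl)) for name, ts, tl in candidates]
ranked = sorted(labelled, key=lambda item: item[1])

for name, delta_t in ranked:
    print(f"{name}: melting range = {delta_t:.1f} K")
```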
Machine learning model
After the data are collected and their descriptors/labels are developed, the next task in data-driven
alloy design is to select a proper ML algorithm. To date, a number of ML algorithms have been used for
the design of HEAs, such as support vector machine (SVM) [116,125,126,128,134], artificial neural
network (ANN) [125,126,130,135], random forest (RF) [126,136,137], decision tree (DT) [138,139] and
k-nearest neighbors (KNN) [130,140]. The selection of the ML algorithm can be either
heuristic [116,141,142] or through benchmarking [125,135].
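Selection through benchmarking can be illustrated with a minimal sketch: several candidate classifiers are scored on the same data set and the best performer is kept. To stay self-contained, the example below hand-writes two simple classifiers (KNN and a nearest-centroid baseline, the latter being a stand-in that is not among the algorithms listed above) and compares them by k-fold cross-validated accuracy. The scoring scheme, the toy two-feature data set and the class names are all assumptions of this sketch; in practice an ML library and real alloy data would be used.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean feature vector is closest."""
    groups = {}
    for xi, yi in zip(train_X, train_y):
        groups.setdefault(yi, []).append(xi)
    centroids = {
        label: [sum(col) / len(pts) for col in zip(*pts)]
        for label, pts in groups.items()
    }
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

def cv_accuracy(predict, X, y, n_folds=5, seed=0):
    """Mean accuracy over n_folds folds of a shuffled k-fold split."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(predict(train_X, train_y, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

def make_toy_data(n_per_class=40, seed=0):
    """Two synthetic, well-separated classes in a 2D feature space (not real alloy data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in (("eutectic", 0.0), ("non-eutectic", 5.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
            y.append(label)
    return X, y

X, y = make_toy_data()
candidates = {"KNN (k=3)": knn_predict, "nearest centroid": centroid_predict}
benchmark = {name: cv_accuracy(fn, X, y) for name, fn in candidates.items()}
best = max(benchmark, key=benchmark.get)

for name, acc in benchmark.items():
    print(f"{name}: mean CV accuracy = {acc:.3f}")
print(f"selected: {best}")
```

Every candidate is evaluated on identical folds (same seed), so differences in the scores reflect the algorithms rather than the data split.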

Once the ML algorithm is selected, the ML model is trained, and the reliability of the training results
is usually evaluated against issues such as overfitting and underfitting through cross-validation
(CV) [143,144] and bootstrapping [122,131]. More specifically, the testing accuracy [116,126], the Kappa
index [128], the confusion matrix [118,145], and/or the receiver operating characteristic (ROC)
curves [144] are usually used as the metrics for evaluating classifiers, while the coefficient of
determination (R^2) [146] and the root mean square error