
Chen et al. J Mater Inf 2023;3:10  https://dx.doi.org/10.20517/jmi.2023.06       Page 7 of 19

               Figure 3. A typical categorization of data descriptors in high entropy alloy design. Reproduced with
               permission from Roy et al. [124]. Copyright 2021, Elsevier.

               could jeopardize the validity of the ML predictions [119]. We also note that metastability, or thermal
               history, is another issue that may affect data fidelity. In such cases, one may obtain different
               microstructures and properties from the same alloy composition, such as the AlCoCrFeNi₃ EHEA [120,121].
               Data features & labels
               After data collection, one needs to develop proper data features (or descriptors) and labels for the
               subsequent training of the ML models. Table 5 lists the commonly used features for the design of HEAs.
               Ideally, data features should be uncorrelated while containing all relevant information. In the data-driven
               design of HEAs [122,123], alloy composition is usually the first data feature to be included. However,
               alloy composition alone is generally believed to be insufficient. Therefore, other complementary data
               features of physical relevance and significance should be considered [112]. To date, nearly a hundred data
               features have been employed in the training and optimization of ML models, including the so-called
               atomic parameters [124-126], the environmental parameters [123], and the thermodynamic parameters,
               which can all be derived from alloy composition [125-128], as represented in Figure 3 and Table 5.
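
As a minimal sketch of how composition-derived descriptors might be computed, the Python snippet below evaluates two parameters widely used in the HEA literature, the ideal mixing entropy and the atomic size difference, for a given composition. The function names, the choice of these two descriptors, and the approximate atomic radii are assumptions made for illustration; they are not taken from Table 5 or from the cited works.

```python
import math

R_GAS = 8.314  # molar gas constant, J/(mol*K)

# Approximate metallic radii in angstroms. Illustrative values only (assumption);
# replace with the tabulated values used in your own dataset.
ATOMIC_RADIUS = {"Al": 1.43, "Co": 1.25, "Cr": 1.25, "Fe": 1.24, "Ni": 1.25}


def normalize(composition):
    """Convert molar ratios (e.g. {"Ni": 3}) into atomic fractions summing to 1."""
    total = sum(composition.values())
    return {el: amount / total for el, amount in composition.items()}


def mixing_entropy(fractions):
    """Ideal configurational entropy of mixing: -R * sum(c_i * ln c_i)."""
    return -R_GAS * sum(c * math.log(c) for c in fractions.values() if c > 0)


def atomic_size_difference(fractions, radii=ATOMIC_RADIUS):
    """delta = sqrt(sum(c_i * (1 - r_i / r_mean)^2)), returned as a fraction."""
    r_mean = sum(c * radii[el] for el, c in fractions.items())
    return math.sqrt(
        sum(c * (1.0 - radii[el] / r_mean) ** 2 for el, c in fractions.items())
    )


def featurize(composition, radii=ATOMIC_RADIUS):
    """Build a feature dictionary: atomic fractions plus two derived descriptors."""
    fractions = normalize(composition)
    features = {f"c_{el}": c for el, c in fractions.items()}
    features["dS_mix"] = mixing_entropy(fractions)
    features["delta"] = atomic_size_difference(fractions, radii)
    return features


if __name__ == "__main__":
    # Hypothetical input: molar ratios for an Al-Co-Cr-Fe-Ni alloy.
    alloy = {"Al": 1, "Co": 1, "Cr": 1, "Fe": 1, "Ni": 3}
    for name, value in featurize(alloy).items():
        print(f"{name}: {value:.4f}")
```

Each alloy then becomes one row of a feature table, with the composition fractions and the derived descriptors as columns. Because the derived descriptors are functions of composition, they may correlate with it, so checking the features for redundancy before training is a reasonable next step.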

               The formulation of complementary data features requires domain knowledge in materials science and
               physical metallurgy [124]. To date, data features for eutectic alloys can be divided into two groups:
               (1) those related to eutectic formation and growth; and (2) those correlated with mechanical
               properties [1]. However, unlike the Hume-Rothery rules for solid-solution HEAs [125,126,129], a
               well-established general theory that can underpin the correlation between alloy compositions and
               eutectics, if there is any, is still lacking. Therefore, most ML models for EHEAs reported in the
               literature are based solely on alloy composition as the data feature [Table 4], which may compromise
               their performance. In practice, one can identify the most important features using different
               approaches, such as the Pearson correlation coefficient (PCC) [130,131] and