Page 8 of 19 Chen et al. J Mater Inf 2023;3:10 https://dx.doi.org/10.20517/jmi.2023.06
Table 5. List of the commonly used features of HEAs with the corresponding formula

Data feature                                              Formula
Compositional feature    Molar fraction of components     c_i
Atomic features          Mean atomic radius
                         Atomic size difference
                         Valence electron concentration
                         Electronegativity
Thermodynamic features   Mixing enthalpy
                         Ideal mixing entropy
Physical features        Melting temperature
                         Elastic modulus
                         Bulk modulus
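
Most of the features listed in Table 5 are composition-weighted combinations of elemental properties. The sketch below shows one way they might be computed, assuming the definitions commonly used in the HEA literature rather than formulas recovered from the table: a rule-of-mixtures average for the mean radius, valence electron concentration, electronegativity, melting temperature and moduli; a root-mean-square relative deviation for the atomic size difference; -R Σ c_i ln c_i for the ideal mixing entropy; and Σ_{i<j} 4 H_ij c_i c_j for the mixing enthalpy. The function names and the dictionary-based inputs are illustrative choices, and all elemental and binary data must be supplied by the user.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def molar_fractions(amounts):
    """Normalize relative amounts, e.g. {"Al": 1, "Fe": 2.5}, to fractions c_i."""
    total = sum(amounts.values())
    return {el: n / total for el, n in amounts.items()}


def weighted_mean(c, prop):
    """Rule-of-mixtures average sum_i c_i * p_i (assumed form for mean radius,
    VEC, electronegativity, melting temperature and moduli)."""
    return sum(c[el] * prop[el] for el in c)


def atomic_size_difference(c, radius):
    """delta = sqrt(sum_i c_i * (1 - r_i / r_mean)**2) (assumed common definition)."""
    r_mean = weighted_mean(c, radius)
    return math.sqrt(sum(c[el] * (1 - radius[el] / r_mean) ** 2 for el in c))


def ideal_mixing_entropy(c):
    """dS_mix = -R * sum_i c_i * ln(c_i), in J/(mol*K)."""
    return -R * sum(x * math.log(x) for x in c.values() if x > 0)


def mixing_enthalpy(c, h_pair):
    """dH_mix = sum_{i<j} 4 * H_ij * c_i * c_j, with H_ij supplied by the caller
    as a dict keyed by frozenset({A, B}) (hypothetical input format)."""
    els = sorted(c)
    return sum(
        4 * h_pair[frozenset((a, b))] * c[a] * c[b]
        for i, a in enumerate(els)
        for b in els[i + 1:]
    )


# Example: valence electron counts for Al, Cr, Fe and Ni (3, 6, 8, 10).
c = molar_fractions({"Al": 1, "Cr": 1, "Fe": 2.5, "Ni": 2.5})
vec = weighted_mean(c, {"Al": 3, "Cr": 6, "Fe": 8, "Ni": 10})
s_mix = ideal_mixing_entropy(c)
```

The same weighted_mean helper can be reused for any property that is averaged linearly over the composition, so only the deviation-type and pairwise features need dedicated functions.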
Shapley Additive Explanation (SHAP) value [132]. We believe that data miners need to develop physics-informed
data features, which can be derived from the fundamental theories of eutectic formation, such as the
Jackson-Hunt theory [133], to improve the predictive power of the machine learning models. At present, the
development of such features remains an active area of research for EHEAs. By comparison, the design of data
labels for EHEAs is more straightforward: the label is either a characteristic of a eutectic-related
microstructure (e.g., the volume fraction of eutectic phases [79,80]) or the targeted property for regression
ML modeling. For instance, Qiao et al. [113] used the difference between the solidus and liquidus temperatures
(i.e., the so-called melting range, as termed in Ref. [113]) as the data label, and the composition and phase
fraction as the data features in the search for EHEAs, which led to the discovery of the near-eutectic
composition AlCrFe2.5Ni2.5.
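
As a small illustration of the melting-range label described above, the sketch below computes the liquidus-solidus difference for a few candidates and picks the narrowest one, on the assumption that a small melting range suggests a near-eutectic composition. The candidate names and temperatures are made up for illustration and do not come from Ref. [113]; how the two temperatures are obtained is left open.

```python
def melting_range(t_liquidus, t_solidus):
    """Return the liquidus-solidus gap in kelvin; an ideal eutectic gives zero."""
    if t_liquidus < t_solidus:
        raise ValueError("liquidus temperature must not be below the solidus")
    return t_liquidus - t_solidus


# Hypothetical candidates: (name, liquidus in K, solidus in K). Values are made up.
candidates = [
    ("candidate A", 1680.0, 1610.0),
    ("candidate B", 1625.0, 1618.0),
    ("candidate C", 1700.0, 1590.0),
]

# The melting range serves as the data label for each candidate.
labels = {name: melting_range(t_liq, t_sol) for name, t_liq, t_sol in candidates}

# The candidate with the narrowest range is the most eutectic-like one.
most_eutectic_like = min(labels, key=labels.get)
```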
Machine learning model
Once the data have been collected and their descriptors and labels developed, the next task in data-driven
alloy design is to select a proper ML algorithm. To date, a number of ML algorithms have been used for the
design of HEAs, such as support vector machine (SVM) [116,125,126,128,134], artificial neural network
(ANN) [125,126,130,135], random forest (RF) [126,136,137], decision tree (DT) [138,139] and k-nearest
neighbors (KNN) [130,140]. The selection of the ML algorithm can be either heuristic [116,141,142] or through
benchmarking [125,135].
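
Selection through benchmarking can be sketched as follows: each candidate algorithm is scored with the same cross-validation splits and the best mean score is chosen. This is only a minimal example using scikit-learn on a synthetic dataset from make_classification, which stands in for a real table of alloy features and phase labels. The hyperparameters, the number of folds and the accuracy metric are assumptions made for illustration, not settings taken from the cited studies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for (alloy features, class label) data.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=5, random_state=0
)

# Candidate algorithms; scale-sensitive models are wrapped with a scaler.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "ANN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Use identical folds for every model so the comparison is fair.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

best = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean CV accuracy = {score:.3f}")
```

With small datasets, as is typical for HEAs, the ranking can change with the random split, so repeating the comparison over several seeds is a sensible precaution.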
Once the ML algorithm is selected, the ML model is trained, and the reliability of the training results is
usually checked for issues such as overfitting and underfitting through cross-validation (CV) [143,144] and
bootstrapping [122,131]. More specifically, the testing accuracy [116,126], the Kappa index [128], the
confusion matrix [118,145], and/or the receiver operating characteristic (ROC) curves [144] are usually used
as the metrics for the evaluation of classifiers, while the coefficient of determination (R²) [146] and the
root mean square error