Page 10 of 45 Mooraj et al. J Mater Inf 2023;3:4 https://dx.doi.org/10.20517/jmi.2022.41
build an input space that is computationally efficient to analyze, as shown in previous examples. In the case
of SISSO, the features are compiled as mathematical functions (descriptors) by applying mathematical
operators to arbitrary groupings of the features. The descriptor space is then narrowed using sure
independence screening (SIS) to identify the descriptors that correlate most strongly with the target properties.
Then, the sparsifying operator (SO) produces a linear model of the descriptors that best predicts the target
property [91]. In this way, SISSO can produce models which converge even if the initial feature space is larger
than the data set. Additionally, Vazquez et al. point out that SISSO is computationally inexpensive
compared to typical ab-initio calculation methods like DFT [90]. Figure 4C shows the prediction of misfit
volume and yield strength vs. the actual values calculated by DFT. The accuracy of the prediction of the
misfit volume suggests that SISSO can reliably predict the mechanical properties of RHEA systems while
remaining computationally much cheaper than DFT calculations. While the yield strength prediction
overall shows a very low root mean squared error (RMSE), the R² value is quite large, which arises due to
limited experimental data and a lack of documentation of the processing conditions related to many
compositions in the yield strength database. This result highlights the need for larger, more robust, and
more detailed databases of experimental HEA data to improve the training quality of future ML models.
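The two SISSO stages described above can be illustrated with a minimal sketch. This is not the SISSO code used by Vazquez et al.: the operator set (squares, products, absolute differences), the single SIS pass, the function names, and the synthetic data are all assumptions made for illustration. Full SISSO implementations typically use a much larger operator set and repeat the screening on model residuals.

```python
import itertools
import numpy as np

def build_descriptors(X, names):
    """Apply a few operators to single features and to feature pairs."""
    cols, labels = [], []
    for j, name in enumerate(names):
        cols += [X[:, j], X[:, j] ** 2]
        labels += [name, f"{name}^2"]
    for i, j in itertools.combinations(range(len(names)), 2):
        cols += [X[:, i] * X[:, j], np.abs(X[:, i] - X[:, j])]
        labels += [f"{names[i]}*{names[j]}", f"|{names[i]}-{names[j]}|"]
    return np.column_stack(cols), labels

def sis(D, y, k):
    """Screening step: indices of the k descriptors most correlated with y."""
    Dc = D - D.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Dc, axis=0) * np.linalg.norm(yc)
    # Constant columns get zero correlation instead of a division by zero.
    corr = np.abs(Dc.T @ yc) / np.where(denom > 0, denom, np.inf)
    return np.argsort(corr)[::-1][:k]

def sparsifying_operator(D, y, candidates, dim):
    """Exhaustive search for the best linear model with `dim` descriptors."""
    best = None
    for combo in itertools.combinations(candidates, dim):
        A = np.column_stack([D[:, list(combo)], np.ones(len(y))])  # + intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rmse = float(np.sqrt(np.mean((A @ coef - y) ** 2)))
        if best is None or rmse < best[0]:
            best = (rmse, combo, coef)
    return best

# Synthetic data (assumption): 15 samples, 4 primary features, hidden target law.
rng = np.random.default_rng(0)
names = ["a", "b", "c", "d"]
X = rng.uniform(0.5, 2.0, size=(15, 4))
y = 3.0 * X[:, 0] * X[:, 1] - 2.0 * X[:, 2] ** 2 + 1.0

D, labels = build_descriptors(X, names)   # 20 descriptors for 15 samples
screened = sis(D, y, k=8)
rmse, combo, coef = sparsifying_operator(D, y, screened, dim=2)
print("descriptors:", D.shape[1], "samples:", D.shape[0])
print("selected:", [labels[i] for i in combo], "RMSE:", rmse)
```

The descriptor space here is already larger than the number of samples, yet the final model uses only two descriptors plus an intercept, which is why the least-squares fit stays well-posed. Screening first keeps the exhaustive search in the second stage small.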
As previously mentioned, ML models can predict phase formation using solely composition information.
This concept is taken even further by Wu et al., who used a NN to study how each element in an HEA
system affects phase formation and to predict the primary phase fraction after casting [6]. With this technique, they could
design near-eutectic compositions within the Al-Co-Cr-Fe-Ni system [6]. The database to train the model
was prepared using experimental data from the literature, and CALPHAD calculations were performed
using the nickel-based superalloy database TTNI8. Wu et al. chose to only use the elemental compositions
as the input nodes and the primary phase fraction as the output node. The primary phase fraction was
defined as 0 for eutectic compositions. In contrast, hyper- and hypo-eutectic compositions showed a
positive value when FCC was predicted as the primary phase and a negative value when body-centered
cubic (BCC) was predicted as the primary phase. After training and executing the NN, the authors identified 400
near-eutectic compositions and correlated them with the atomic fraction of each element. This plot is
shown in Figure 4D, where it can be seen that the majority of the near-eutectic compositions fall into the
region where the Al content (at. %) is between 15% and 20% and the Cr content is below 25%. The other
elements do not seem to significantly affect the formation of eutectic structures, which suggests that the Al
and Cr contents are most crucial for eutectic structure formation in this alloy system. Thus, the NN was first
used to predict the amount of Al that needed to be added to an equiatomic CoFeNi alloy to form a eutectic
microstructure and how much Cr could be added to maintain that microstructure. Finally, the ratios of the
other elements were further adjusted to predict a near-eutectic microstructure. The best composition based
on the criteria of stable eutectic microstructure was Ni₃₂Co₃₀Fe₁₀Cr₁₀Al₁₈. This work presents the potential of
ML models to refine a huge design space containing thousands of unique compositions down to a single
optimized composition that can then be experimentally studied in detail.
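The output encoding and the composition window described above can be written down in a short sketch. It does not reproduce the NN of Wu et al.; the function names, the 5 at.% grid step, and the inclusive treatment of the 15% to 20% Al bounds are assumptions for illustration. Only the sign convention, the Al and Cr limits, and the final composition come from the text.

```python
from itertools import product

ELEMENTS = ("Al", "Co", "Cr", "Fe", "Ni")

def signed_primary_fraction(primary_phase, fraction):
    """Encode the NN output: 0 eutectic, positive FCC primary, negative BCC primary."""
    if fraction < 0:
        raise ValueError("fraction must be non-negative")
    if primary_phase == "eutectic" or fraction == 0:
        return 0.0
    if primary_phase == "FCC":
        return float(fraction)
    if primary_phase == "BCC":
        return -float(fraction)
    raise ValueError(f"unknown primary phase: {primary_phase!r}")

def composition_grid(step):
    """Yield every Al-Co-Cr-Fe-Ni composition (at.%) on a grid that sums to 100."""
    levels = range(0, 101, step)
    for al, co, cr, fe in product(levels, repeat=4):
        ni = 100 - (al + co + cr + fe)   # Ni takes the balance
        if ni >= 0:
            yield dict(zip(ELEMENTS, (al, co, cr, fe, ni)))

def in_reported_window(comp):
    """Region of Figure 4D holding most near-eutectic compositions (bounds assumed inclusive for Al)."""
    return 15 <= comp["Al"] <= 20 and comp["Cr"] < 25

grid = list(composition_grid(step=5))            # assumed 5 at.% resolution
window = [c for c in grid if in_reported_window(c)]

# Best composition reported in the text, in at.%.
reported = {"Ni": 32, "Co": 30, "Fe": 10, "Cr": 10, "Al": 18}
print(len(grid), "grid compositions,", len(window), "inside the Al/Cr window")
print("reported composition inside window:", in_reported_window(reported))
```

Even at this coarse resolution the five-element grid holds thousands of compositions, which gives a sense of the design space a trained model has to narrow down. The window check is only a filter on composition; deciding which candidates are actually near-eutectic still requires the trained model or experiment.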
While ML techniques such as those discussed in this section can readily analyze extremely large data sets,
their accuracy depends heavily on the robustness and comprehensive nature of experimentally verified
training sets [77]. There is currently a severe lack of such high-fidelity datasets, which limits how accurately
ML models can predict the properties of future alloy systems [92]. In the meantime, as these
databases expand, the scientific community is also implementing other computational methods that do not
rely so heavily on previous results to predict future alloying behavior. These methods include first-principles
calculations, molecular dynamics (MD), and CALPHAD calculations and will be discussed in the following
sections.