Gao et al. J Mater Inf 2023;3:6 https://dx.doi.org/10.20517/jmi.2023.03 Page 3 of 15

Consequently, using the ML approach together with an exhaustive collection of literature data, this work
develops an accurate “composition-process-properties” dataset for the basic but technically important SLMed
Al-Si-(Mg) alloys, and then applies it to design SLMed Al-Si-(Mg) alloys with better mechanical properties
than those reported in the literature. The next section gives a brief introduction to the ML approach,
followed by the data collection and pre-processing of the dataset, including data cleaning and feature
analysis. After that, four ML models are employed to establish the quantitative
“composition-process-properties” relation in the SLMed Al-Si-(Mg) alloys, and the one that best reproduces
the training and testing sets is selected. The selected ML model is subsequently applied to discover novel
compositions and processing parameters of as-built Al-Si-(Mg) alloys with better mechanical properties than
those reported in the literature. Finally, the conclusions of this paper are drawn.

MACHINE LEARNING APPROACH
The machine learning approach can establish an intrinsic relation between the inputs and outputs by
systematically analyzing each characteristic variable in the data, thus enabling predictions for new data.
The basic ML workflow in alloy design mainly includes data collection, data cleaning, feature analysis,
model selection, model training, model verification, and prediction analysis. In this paper, four machine
learning models were built and comprehensively compared to select the one with the highest accuracy:
linear regression (LinearReg), multi-layer perceptron regression (MLPReg), random forest regression
(RFReg), and k-nearest neighbors (K-NN) regression.
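
The comparison of the four model families can be sketched as follows. This is a minimal illustration only: the use of scikit-learn, the synthetic data, the 80/20 split, and all hyperparameters are assumptions made for the example and are not taken from this paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the paper's dataset): 200 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 4))
y = (3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] * X[:, 3]
     + rng.normal(0.0, 0.05, size=200))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameters are illustrative choices, not the paper's settings.
models = {
    "LinearReg": LinearRegression(),
    "MLPReg": MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                           random_state=0),
    "RFReg": RandomForestRegressor(n_estimators=100, random_state=0),
    "K-NN": KNeighborsRegressor(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    # The pipeline fits the scaler on the training split only.
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    scores[name] = {
        "MAE": mean_absolute_error(y_test, pred),
        "R2": r2_score(y_test, pred),
    }

# Select the model with the highest R2 on the held-out split.
best = max(scores, key=lambda n: scores[n]["R2"])
for name, s in scores.items():
    print(f"{name:10s} MAE={s['MAE']:.3f}  R2={s['R2']:.3f}")
print("selected:", best)
```

Selecting by R2 on a single held-out split is the simplest possible criterion; cross-validation would give a more stable ranking on a small literature dataset.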

LinearReg is widely utilized to construct a linear relationship between the target values and the
characteristic variables by adjusting the regression coefficients. MLPReg is an artificial neural network
of perceptrons that has a single input layer, one or more hidden layers, and a single output layer, each
consisting of numerous neurons. The quantified relation between target values and variables is obtained
from the different weights and biases in each neuron. The RFReg model consists of multiple mutually
independent decision trees, each of which yields a prediction from randomly selected samples and
features; the final prediction is obtained by averaging the results of all trees. The model can also
calculate the importance of each feature. When the K-NN regressor is used for prediction, the mean target
value of the k nearest data points is taken as the predicted value.
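
The K-NN prediction rule can be illustrated with a short sketch in plain Python. Euclidean distance and an unweighted mean over the k nearest training points are assumed here; the function name `knn_predict` and the toy data are hypothetical and not from this paper.

```python
import math


def knn_predict(X_train, y_train, x_query, k=3):
    """Predict a target as the mean of the k nearest training targets."""
    if k < 1 or k > len(X_train):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each training target with its Euclidean distance to the query.
    by_distance = sorted(
        (math.dist(x, x_query), y) for x, y in zip(X_train, y_train)
    )
    nearest_targets = [y for _, y in by_distance[:k]]
    return sum(nearest_targets) / k


# Toy one-feature data (hypothetical values).
X_train = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
y_train = [5.0, 20.0, 30.0, 70.0, 100.0]

pred_k3 = knn_predict(X_train, y_train, (1.6,), k=3)  # neighbors at 2.0, 1.0, 3.0
pred_k1 = knn_predict(X_train, y_train, (1.6,), k=1)  # neighbor at 2.0 only
print(pred_k3, pred_k1)
```

With k = 1 the prediction is simply the target of the single closest point, while larger k smooths the prediction over a neighborhood.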

To prevent order-of-magnitude differences between the dimensions of the dataset from causing inaccurate
prediction results, all the variables need to be standardized before constructing the model. The
standardized data then have a mean of 0 and a variance of 1 (i.e., the same mean and variance as the
standard normal distribution). The standardization treatment is given as

$$Z = \frac{x - \mu}{\sigma}$$

where Z is the return value, x is the original value of the variable, μ is the mean of the training
samples, and σ is the standard deviation of the training samples.
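
A minimal sketch of this standardization in plain Python is given below. The mean and standard deviation are estimated from the training samples only and then reused for new data. The population standard deviation (dividing by n) is an assumption of this sketch, and the function names and the laser-power values are hypothetical.

```python
from statistics import fmean, pstdev


def fit_standardizer(train_values):
    """Return (mu, sigma) estimated from the training samples only."""
    mu = fmean(train_values)
    sigma = pstdev(train_values)  # population std (divides by n): an assumption
    if sigma == 0.0:
        raise ValueError("cannot standardize a constant feature")
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply Z = (x - mu) / sigma to each value."""
    return [(x - mu) / sigma for x in values]


# Hypothetical laser-power values in W, for illustration only.
train_power = [200.0, 250.0, 300.0, 350.0, 400.0]
test_power = [275.0, 450.0]

mu, sigma = fit_standardizer(train_power)
z_train = standardize(train_power, mu, sigma)
z_test = standardize(test_power, mu, sigma)  # reuses the training mu and sigma

print(mu, sigma)
print(z_train)
print(z_test)
```

Reusing the training statistics for the test data keeps information from the test set out of the model inputs.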

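Two metrics were utilized to evaluate the quality of the machine learning models, i.e., the mean
absolute error (MAE) and the coefficient of determination (R²). The MAE measures the average magnitude
of the deviation, while R² characterizes the goodness of fit of the model. They are respectively
defined as

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where n is the number of samples, y_i is the true value, ŷ_i is the predicted value, and ȳ is the mean of
the true values.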
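
Both metrics can be sketched in plain Python under their standard definitions: the MAE as the mean of the absolute differences between true and predicted values, and R² as one minus the ratio of the residual sum of squares to the total sum of squares. The function names and the sample values below are hypothetical and serve only as an illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of |y_i - yhat_i| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical true and predicted strengths in MPa, for illustration only.
y_true = [300.0, 350.0, 400.0, 450.0]
y_pred = [310.0, 340.0, 405.0, 445.0]

mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(mae, r2)
```

A lower MAE and an R² closer to 1 both indicate a better fit; R² is undefined when all true values are identical, a case this sketch does not handle.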
