Liu et al. J Mater Inf 2022;2:20 https://dx.doi.org/10.20517/jmi.2022.29 Page 3 of 12
10¹¹ K·s⁻¹. In the course of the simulation, the sample model was always in an isothermal-isobaric ensemble,
controlled by the Nosé-Hoover thermostat and barostat. All MD simulations were performed with the
large-scale atomic/molecular massively parallel simulator (LAMMPS).
Isothermal relaxation
To analyze the atomic motion behavior at different temperatures, we simulated a series of isothermal
relaxations of the HEMG sample. The HEMG sample was relaxed at specific temperatures from T = 100 K
to 2000 K and zero pressure for 2.2 ns. The trajectories of all atoms in the last 200 ps were collected for
subsequent ML.
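The per-atom displacements extracted from these stored trajectories underlie the mobility features described later. A minimal sketch of how such displacements could be computed over a lag Δt is given below; the trajectory array, its shapes, and the lag value are toy stand-ins, not the authors' actual simulation output.

```python
import numpy as np

# Stand-in trajectory: frames x atoms x xyz (toy sizes, random data in
# place of the real 200 ps MD output).
rng = np.random.default_rng(0)
traj = rng.normal(size=(201, 500, 3))

def mean_displacement(traj, dt_frames):
    """Time-averaged per-atom displacement magnitude over a lag of dt_frames."""
    d = traj[dt_frames:] - traj[:-dt_frames]        # (frames - dt) x atoms x 3
    return np.linalg.norm(d, axis=-1).mean(axis=0)  # average over time origins

r = mean_displacement(traj, 10)  # one value per atom
```

Averaging over all available time origins within the 200 ps window, as done here, improves the statistics of the per-atom mobility estimate.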
Tensile test and creep
At 100 K, a uniaxial tensile simulation along the z-axis was first carried out to measure the yield strength.
The tensile strain rate was set as a constant of 0.0017 ps⁻¹, corresponding to a loading rate of
ca. 0.1 GPa·ps⁻¹. Next, to simulate creep, a tensile stress (of 2.2-2.4 GPa) lower than the yield strength was
applied to the HEMG sample with a loading rate of 0.1 MPa·fs⁻¹ and a duration of 2 ns.
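As a consistency check on the stated rates (illustrative arithmetic, not part of the published workflow), the stress ramp at 0.1 MPa·fs⁻¹ reaches the lower applied stress of 2.2 GPa in only 22 ps, a small fraction of the 2 ns creep duration:

```python
# Ramp time to reach the applied creep stress at the stated loading rate.
rate_mpa_per_fs = 0.1      # 0.1 MPa/fs, equivalent to 0.1 GPa/ps
target_stress_gpa = 2.2    # lower end of the applied stress range
ramp_time_fs = target_stress_gpa * 1000.0 / rate_mpa_per_fs  # MPa / (MPa/fs)
print(ramp_time_fs)  # 22000.0 fs = 22 ps
```

Most of the 2 ns window is therefore spent at constant stress, i.e., in the creep regime proper.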
kNN ML
Dataset preparation
A multiclass classifier based on the kNN algorithm was developed for identifying temperature from atomic
mobility. The initial dataset was composed of the atomic trajectories from isothermal relaxation
simulations. Every atom at a given temperature corresponded to an instance in the dataset. Twenty
temperatures (from T = 100 to 2000 K) were chosen as the initial class labels. Thus, the initial dataset
contained one million instances (= 20 temperatures × 50,000 atoms). The initial 41 features for representing
atomic mobility were related to the configurations at different time intervals Δt, which are listed in
Supplementary Table 1. Different from the features used in Ref. [34], in this paper we further considered the
atomic mass effect and hence proposed the new features, i.e., lg[r²(Δt)/(Tm⁻¹)], where r(Δt) is the atomic
displacement and m is the atomic mass.
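The steps above can be sketched as follows. This is a reduced-size illustration of assembling the mass-scaled mobility features and fitting a kNN classifier: the displacements, masses, array sizes, and the value of k are assumptions, and the feature form lg[r²(Δt)/(Tm⁻¹)] follows the text but with unspecified units.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
n_atoms, n_lags = 200, 41                 # 41 features, toy atom count
temps = np.linspace(100, 2000, 20)        # 20 class labels (temperatures)

rows, labels = [], []
for T in temps:
    r = rng.random((n_atoms, n_lags)) + 1e-3                  # r(dt) per lag
    m = rng.choice([55.85, 58.69, 63.55], size=(n_atoms, 1))  # toy masses
    rows.append(np.log10(r**2 / (T / m)))   # lg[r^2(dt) / (T m^-1)]
    labels.append(np.full(n_atoms, T))

X, y = np.vstack(rows), np.concatenate(labels)
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
pred = clf.predict(X[:3])
```

Dividing the squared displacement by T·m⁻¹ folds the equipartition-style mass and temperature dependence into a single feature, so the classifier sees mobility differences beyond the trivial thermal scaling.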
Representative class selection and feature selection
To improve classification accuracy, it is necessary to remove indistinguishable classes and redundant
features prior to learning. During the representative class selection, the initial dataset was first randomly
divided into 250 subsets. For every subset, a clustering tree was constructed on the basis of the
agglomerative hierarchical clustering algorithm. Using the single-linkage criterion, leaf nodes were grouped
into one cluster when their distance fell below the cutoff of 3.6. A qualified cluster was defined as one
having at least 160 leaf nodes, and it was labeled by the modal temperature of its leaves. As shown in
Supplementary Figure 1, finally, seven representative classes were selected, i.e., T = 100, 800, 1000, 1100,
1200, 1300, and 2000 K.
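The subset-clustering step described above can be sketched with scipy's hierarchical clustering: build a single-linkage tree, cut it at a distance cutoff, and keep "qualified" clusters with enough leaves, labeling each by its modal temperature. The data, cutoff, and minimum cluster size below are toy stand-ins for the paper's values (cutoff 3.6, at least 160 leaves).

```python
import numpy as np
from collections import Counter
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
subset_X = rng.random((400, 7))           # one random subset (toy features)
subset_T = rng.choice([100, 800, 1000, 1100, 1200, 1300, 2000], size=400)

Z = linkage(subset_X, method="single")                 # single-linkage tree
cluster_ids = fcluster(Z, t=0.5, criterion="distance") # cut at the cutoff

qualified = {}
for cid, size in Counter(cluster_ids).items():
    if size >= 50:                                     # minimum-leaf threshold
        members = subset_T[cluster_ids == cid]
        qualified[cid] = Counter(members).most_common(1)[0][0]  # modal T
```

Clusters dominated by a single temperature survive this filter, which is how the indistinguishable intermediate classes get pruned down to the seven representative ones.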
Feature selection was performed by means of the ReliefF method [35]. The feature importance was evaluated
by the weight: