Page 4 of 14 Li et al. J Mater Inf 2024;4:4 | http://dx.doi.org/10.20517/jmi.2023.41
MATERIALS AND METHODS
The concept of local environment, for ML-based prediction of adsorption energy, is primarily rooted in the
analysis and feature extraction of the surrounding environment of the adsorbate. We first locate the position of
the adsorbate within the 3D structure and then sequentially extract the atoms neighboring the adsorbate at each
layer. Through feature engineering of the surrounding atoms at each layer, we obtain descriptors of
the local environment, which can be used as input to ML or neural network algorithms. This approach allows us
to capture the essential information about the immediate surroundings of the adsorbate and characterize its
local environment effectively. In this model, we use mean absolute error as the loss function and Adam as the
optimization algorithm to train the network.
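The layer-by-layer extraction described above can be sketched as a breadth-first traversal outward from the adsorbate. This is a minimal illustration, not the implementation used in this work: the function name `neighbor_shells`, the adjacency-dictionary input, and the toy graph are all hypothetical, and how two atoms are judged to be neighbors is left as an assumption.

```python
def neighbor_shells(adjacency, adsorbate, max_layers):
    """Group atoms into layers by hop distance from the adsorbate.

    adjacency: dict mapping atom index -> iterable of neighbor indices
    (hypothetical input; the neighbor definition is an assumption here).
    Returns a list whose entry k holds the atoms in layer k + 1.
    """
    visited = {adsorbate}
    frontier = [adsorbate]
    shells = []
    for _ in range(max_layers):
        next_frontier = []
        for atom in frontier:
            for nb in adjacency.get(atom, ()):
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append(nb)
        if not next_frontier:
            break  # no atoms left to reach
        shells.append(sorted(next_frontier))
        frontier = next_frontier
    return shells


# Toy neighbor graph: atom 0 plays the role of the adsorbate.
toy_adjacency = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3, 4],
    3: [1, 2, 5],
    4: [2],
    5: [3],
}
shells = neighbor_shells(toy_adjacency, adsorbate=0, max_layers=3)
print(shells)  # [[1, 2], [3, 4], [5]]
```

Each returned layer can then be passed separately to the feature-engineering step, so that descriptors are computed per layer rather than over the whole structure.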
Database
This work relies on two databases, one for 2D materials and one for 3D materials, each serving a distinct
purpose. The 2D material database contains 2,472 DFT calculations of hydrogen
adsorption energy on 2D materials [45]. These calculations are performed on surfaces obtained from our
2Dmatpedia database [46], which currently encompasses over 10,000 distinct 2D materials.
The 3D materials dataset comprises 47,279 DFT-calculated adsorption energy values. These calculations
were conducted using the Generalized Adsorption Simulator for Python [37,47]. The dataset includes
21,269 adsorption energies for hydrogen atoms, which are the central focus of this study. The remaining
26,010 adsorption energies pertain to other atoms. The dataset covers 52 chemical
elements and 1,952 bulk materials. It also includes
9,102 symmetrically distinct surfaces and 29,843 distinct coordination environments, all
characterized based on the surface and the adsorbate neighbors.
The breadth and diversity of these databases support the robustness and relevance of the findings of our
study. The data distribution is largely
normal and is therefore deemed suitable for ML methods. For details, please refer to [Supplementary S1]. The
training set comprises 80% of the dataset, while the test and validation sets account for 10% each.
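The 80/10/10 partition can be sketched as follows. This is a hedged illustration only: the text does not state whether the split is random, which seed is used, or how rounding is handled, so the shuffle, the `seed` parameter, and the function name `split_indices` are assumptions. The sample count of 21,269 is taken from the hydrogen subset of the 3D dataset purely as an example size.

```python
import random


def split_indices(n_samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts.

    The test set receives whatever remains after the train and validation
    parts, so every sample is used exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train, val, test = split_indices(21269)
print(len(train), len(val), len(test))
```

Because the fractions do not divide the sample count evenly, the three parts are only approximately 80%, 10%, and 10%; assigning the remainder to the test set is one of several reasonable conventions.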
Structure representation
This study uses a graph-based representation of structure built mostly from local properties. These
properties are determined from the differences in elemental properties between an atom and its neighboring
atoms. Specifically, the local property difference for each atom is calculated by taking the face-weighted
average of the absolute differences in elemental properties between that atom and each of its neighboring
atoms. Voronoi tessellation (VT) [48], also known as a Voronoi diagram or Voronoi partitioning, is a mathematical
method used to divide a space into regions based on distance to a specific set of points known
as Voronoi sites. Each point in the given space is associated with the closest Voronoi site, creating a Voronoi
cell around each site. These cells together form a tessellation that covers the entire space without overlap or
gaps. In addition, the study obtains data from the Open Quantum Materials Database (OQMD) [49], which
includes the specific volume, band gap energy, magnetic moment (per atom), and space group number of the 0 K
ground states. Across a total of 22 different elemental properties, the study calculates the mean, mean
absolute deviation, maximum, and minimum values of the local property differences for each atom, which are
used to create the elemental-property features.
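The face-weighted local property difference and its summary statistics can be illustrated with a short sketch. Two assumptions are made here and are not confirmed by the text: "face-weighted" is read as weighting each neighbor by the area of the Voronoi face it shares with the central atom, and the face areas are taken as given rather than computed. The function names and all numerical values are hypothetical.

```python
from statistics import mean


def local_property_difference(atom_value, neighbors):
    """Face-weighted mean of |p_i - p_j| over the neighbors of one atom.

    neighbors: list of (face_area, neighbor_value) pairs, assumed to come
    from a Voronoi tessellation computed elsewhere.
    """
    total_area = sum(area for area, _ in neighbors)
    weighted = sum(area * abs(atom_value - value) for area, value in neighbors)
    return weighted / total_area


def summarize(values):
    """Mean, mean absolute deviation, maximum, and minimum of a list."""
    avg = mean(values)
    mad = mean([abs(v - avg) for v in values])
    return {"mean": avg, "mad": mad, "max": max(values), "min": min(values)}


# Toy data for one elemental property: (own value, [(face area, neighbor value), ...]).
toy_atoms = [
    (2.0, [(1.0, 4.0), (3.0, 1.0)]),
    (1.0, [(2.0, 2.0), (2.0, 0.5)]),
    (3.0, [(1.0, 3.0), (1.0, 2.5)]),
]
differences = [local_property_difference(v, nbs) for v, nbs in toy_atoms]
stats = summarize(differences)
print(differences)  # [1.25, 0.75, 0.25]
print(stats)
```

Repeating this calculation for each elemental property yields one set of four statistics per property.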
The VT method offers several advantages, including freedom from parameter tuning, transferability, and reproducibility.
The infinite vertices issue is addressed without introducing any human input parameters, which
ensures the accuracy and integrity of the calculations. Furthermore, this approach is applicable to general hydrogen
surface adsorption problems and is independent of the symmetry and composition of the adsorbent
surface. However, when applying the original VT technique to the surface adsorption system, the adsorbate

