^{*}Correspondence to: Prof. Su-Huai Wei, School of Physics, Eastern Institute of Technology, No. 568, Tongxin Road, Zhuangshi Subdistrict, Zhenhai District, Ningbo 315200, Zhejiang, China. E-mail:

Crystal structure prediction (CSP) plays a crucial role in condensed matter physics and materials science, with its importance evident not only in theoretical research but also in the discovery of new materials and the advancement of novel technologies. However, due to the diversity and complexity of crystal structures, trial-and-error experimental synthesis is time-consuming, labor-intensive, and insufficient to meet the increasing demand for new materials. In recent years, machine learning (ML) methods have significantly boosted CSP. Here, we present a comprehensive overview of the ML models applied in CSP. We first introduce the general steps for CSP and highlight the bottlenecks in conventional CSP methods. We further discuss the representation of crystal structures and illustrate how ML-assisted CSP works. In particular, we review the applications of graph neural networks (GNNs) and ML force fields in CSP, which have been demonstrated to significantly speed up structure search and optimization. In addition, we provide an overview of advanced generative models in CSP, including variational autoencoders (VAEs), generative adversarial networks (GANs), and diffusion models. Finally, we discuss the remaining challenges in ML-assisted CSP.

With the rapid development of artificial intelligence, we are now in the so-called Big Data Era, a time when vast amounts of data are generated and collected from various sources at an unprecedented pace^{[1,2]}. In this context, the data-driven research paradigm has become mainstream in modern materials science^{[3–7]}. This paradigm leverages big data and machine learning (ML) technologies to accelerate the discovery and design of new materials, marking a shift from traditional research methods that rely on experiments and theory to more efficient and automated methodologies. The data-driven research paradigm focuses on extracting features and patterns from a large database to guide the design and property prediction of new materials^{[8,9]}. The typical workflow of data-driven materials science research includes: (ⅰ) collecting data; (ⅱ) building ML models; and (ⅲ) using ML models for rapid computation and data analysis^{[10]}. Obviously, data collection constitutes the basis of this workflow^{[11,12]}, with materials science data originating from two primary sources: experimental results and theoretical predictions.

Crystal structure prediction (CSP) serves as a vital data source for modern data-driven materials science research, providing structural information crucial for understanding the electronic, optical, and magnetic properties of materials^{[13,14]}. The goal of CSP is to determine the most stable arrangement of atoms solely based on chemical composition^{[15,16]}. CSP also explores metastable states that might possess unique properties^{[12,17]} and examines all possible compositions to discover new compounds^{[18,19]}. At 0 K, the free energy reduces to the enthalpy, consistent with the total energies calculated by most first-principles codes [e.g., the Vienna Ab-initio Simulation Package (VASP)^{[20–22]}]. Therefore, in most cases, we are looking for the global minimum on the potential energy surface.

As illustrated in the figure below, CSP is a central component of ML-driven materials design^{[23–26]}. In short, compared to data collection that relies solely on trial-and-error experimental synthesis - which is time-consuming and labor-intensive - CSP is more economical, environmentally friendly, and safer^{[27–29]}. CSP can be cast as a combinatorial problem, with general steps including^{[30]}: (ⅰ) space gridding; (ⅱ) atom arrangement; and (ⅲ) energy evaluation. By extensively repeating the last two steps, we can find low-energy arrangements of atoms. However, such exhaustive search is practical only when the number of candidate structures is small; it faces severe challenges when that number explodes.

ML-driven materials design. First, researchers use ML-based CSP methods to explore low-energy structures of target compositions in a short time. Then, the low-energy structures can be added to databases or used in quantum mechanical calculations. Finally, the potential candidates can be synthesized in experiment. ML: Machine learning; CSP: crystal structure prediction.

The main difficulty in CSP is that the number of possible structures increases explosively as the number of atoms in a unit cell increases^{[14,31]}. If we use the general steps mentioned above, the number of possible structures can be estimated using^{[14]}:

$$C = \frac{(V/\delta^{3})!}{\left[(V/\delta^{3}) - N\right]!\, \prod_{i} (n_{i}!)},$$

where $V$ is the volume of the unit cell, $\delta$ is the spacing of the grid used to discretize the cell, $N$ is the total number of atoms, and $n_{i}$ is the number of atoms of species $i$. Even for modest cell sizes, $C$ is astronomically large, which gives rise to the following challenges:
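To get a feel for this combinatorial explosion, the estimate can be evaluated exactly with Python's integer arithmetic; the sketch below assumes the count is the multinomial number of ways to place atoms (indistinguishable within each species) on $V/\delta^{3}$ grid sites, and the grid sizes used are arbitrary illustrations:

```python
from math import factorial, prod

def num_arrangements(grid_points, atom_counts):
    """Number of distinct ways to place the given atoms on a discretized
    cell with `grid_points` sites; atoms of the same species are
    indistinguishable, hence the division by the n_i! terms."""
    n_atoms = sum(atom_counts)
    return factorial(grid_points) // (
        factorial(grid_points - n_atoms) * prod(factorial(c) for c in atom_counts)
    )

# A modest 10 x 10 x 10 grid: the count explodes as atoms are added.
small = num_arrangements(1000, [2, 2])  # binary compound, 4 atoms
large = num_arrangements(1000, [5, 5])  # same grid, 10 atoms
```

Even going from 4 to 10 atoms on the same grid multiplies the count by many orders of magnitude, which is why exhaustive enumeration is hopeless for realistic cells.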

● High-dimensional potential energy surfaces^{[32,33]}: A large number of atoms in the unit cell leads to very high-dimensional potential energy surfaces. The number of possible structures on the potential energy surface increases exponentially with the number of atoms, making it extremely difficult to search for the global minimum in high-dimensional spaces. Simple exhaustive methods are not suitable for CSP.

● Computational cost^{[13,34]}: Determining the accurate energies of crystal structures typically requires first-principles calculations based on density functional theory (DFT). However, the computational complexity of DFT increases rapidly with the number of electrons, limiting the size of systems where DFT can be applied.

● Limitations of empirical force fields^{[35,36]}: Empirical force fields can be used for energy calculations and structure optimization because they are faster. However, due to their reliance on empirical parameters, they often fail to accurately describe the entire potential energy surface.

● Local minima^{[37,38]}: The potential energy surface contains numerous local minima corresponding to metastable structures. Without appropriate global search methods, structure searches can easily become trapped in local minima.

To overcome these challenges, various algorithms have been adopted in conventional CSP methods, including particle swarm optimization^{[39,40]}, genetic algorithm (GA)^{[41,42]}, Bayesian optimization^{[43,44]}, and simulated annealing^{[45,46]}. Nowadays, ML methods have been applied to CSP, greatly improving the efficiency of structure searches. These include graph neural networks (GNNs)^{[47,48]}, ML force fields^{[49,50]}, and generative models^{[51,52]}.

Conventional CSP methods mainly refer to those that do not use ML techniques. In this section, we briefly discuss these methods to convey their basic ideas, progress, and bottlenecks.

As shown in the figure below, conventional CSP methods iterate over three general steps: generating initial structures, optimizing them locally, and evolving new candidates with a global optimization algorithm.

General steps in CSP. (A) Initial structures generated randomly with physical constraints; (B) Structure optimization by DFT or classical force fields; (C) New structures generated by global optimization algorithms. CSP: Crystal structure prediction; DFT: density functional theory.

The random search algorithm is the most basic method in CSP^{[53,54]}. This method searches for the lowest-energy structures through extensive random exploration. For instance, ab initio random structure searching (AIRSS)^{[55]} is a typical implementation of the random search algorithm. AIRSS first generates a large number of structures randomly and then uses first-principles calculations to relax these structures. Using random search methods, novel structures have been discovered for defect clusters of various sizes^{[56]}, high-pressure phases of solid hydrogen^{[57]}, nitrogen^{[58]}, and lithium^{[59]}. Combining random search with a set of correlation functions as the objective, the well-known special quasirandom structures (SQS)^{[60,61]} approach has been developed for modeling the chemically disordered state within a fixed lattice for alloys with variable compositions. By using SQS, it is also possible to investigate order-disorder phase transitions^{[60,61]}, such as phase transitions in Fe-C alloys^{[62]}, BeZnO_{2} alloys^{[63]}, and Cs_{2}AgBiBr_{6} perovskite^{[64]}. Despite the accomplishments, random search algorithms face challenges due to the giant configurational space. To improve search efficiency, several strategies can be applied: using geometric constraints to reduce the search space^{[65,66]}, adopting ML models for rapid screening and energy calculations^{[67]}, and utilizing parallel computing to accelerate the search process^{[68]}. These strategies have made random search algorithms reasonably practical in the field of CSP, especially in the generation of initial structures.

Particle swarm optimization^{[69,70]} is a swarm-intelligence method inspired by the collective behavior of bird flocks and fish schools. In particle swarm optimization, particles move through the solution space, updating their positions and velocities based on their own experience and that of the other particles in the swarm. For instance, crystal structure analysis by particle swarm optimization (CALYPSO)^{[69]} is a CSP package built on this algorithm. The general workflow of CALYPSO includes the following steps: First, initial structures are randomly generated subject to physical constraints, including minimum interatomic distances and crystal symmetry. Then, structures are characterized using crystal fingerprints to eliminate duplicate or similar structures. After removing duplicates, local optimization is applied to candidate structures to reach the local minima. Finally, particle swarm optimization is used for structural evolution, generating initial structures for the next iteration. These steps are repeated until the convergence conditions are met. To date, a large number of functional materials have been discovered by CALYPSO, covering wide applications in lithium batteries^{[71,72]}, superconductors^{[73]}, photovoltaics^{[74]}, and electronics^{[75]}. Particle swarm optimization is simple to implement, with relatively few parameters that are easy to adjust. However, it may get trapped in local optima, especially in complex high-dimensional spaces or non-convex optimization problems.
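The velocity and position update at the heart of particle swarm optimization can be sketched in a few lines. This is a toy illustration, not CALYPSO itself; the quadratic "energy surface" and all hyperparameter values are illustrative assumptions:

```python
import random

def pso_minimize(f, dim=2, n_particles=20, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer: each particle keeps inertia w and
    is pulled toward its personal best (c1) and the swarm's global best (c2)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(x) for x in xs]
    pbest_f = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            fx = f(xs[i])
            if fx < pbest_f[i]:            # update personal best
                pbest[i], pbest_f[i] = list(xs[i]), fx
                if fx < gbest_f:           # update global best
                    gbest, gbest_f = list(xs[i]), fx
    return gbest, gbest_f

# Toy "energy surface": a shifted paraboloid with minimum at (1, -2).
best_x, best_f = pso_minimize(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2)
```

In a real CSP setting, `f` would be a (costly) energy evaluation and the particles would encode lattice parameters and atomic coordinates.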

The GA^{[17,76,77]} mimics the mechanism of natural selection, choosing the fittest individuals for reproduction. In CSP, each individual represents a candidate crystal structure, and the fitness of a configuration is primarily determined by its energy, with lower energy indicating higher fitness. For instance, Universal Structure Predictor: Evolutionary Xtallography (USPEX)^{[17]} is a GA-based CSP software widely used for discovering new materials, optimizing existing ones, and understanding the underlying principles of crystal formation. The core steps include: First, selecting two or more parent crystal structures from the existing population based on the fitness function. Then, a crossover operation is performed, where parts of the parent chromosomes are exchanged to generate new offspring crystal structures. Subsequently, with a certain probability, mutation is introduced in the offspring chromosomes, randomly altering some genes (e.g., unit cell parameters) to introduce genetic diversity. The new generation includes high-fitness individuals inherited from the parents and high-fitness offspring. This process iterates until the convergence conditions are met. USPEX has been widely utilized to identify various functional materials^{[78–81]}, such as the novel electride material Sr_{5}P_{3}^{[82]}, the hard metallic phase TiN_{2}^{[83]}, the high-temperature superconductor H_{3}S^{[84]}, and a transparent high-pressure phase of sodium^{[85]}. The GA demonstrates significant capability in handling complex, nonlinear optimization problems, effectively avoiding entrapment in local optima. However, GA-based methods for CSP sometimes require numerous evaluations of potential solutions to evolve optimal candidate structures. When combined with computationally intensive calculations such as DFT, the overall computational cost can become substantial, particularly in systems with large numbers of atoms where DFT calculations are especially time-consuming^{[86]}.
Fortunately, recent advancements, such as the integration of ML models in USPEX, have helped to alleviate some of these challenges^{[87]}.
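The selection-crossover-mutation loop described above can be condensed into a toy sketch. This operates on a generic parameter vector (the "chromosome") with a made-up energy function; it is not the USPEX implementation, which works on full crystal representations:

```python
import random

def ga_minimize(f, dim=4, pop_size=30, gens=100,
                mut_rate=0.2, mut_sigma=0.5, seed=0):
    """Toy genetic algorithm: lower energy = higher fitness, elitist
    truncation selection, one-point crossover, Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=f)                      # rank by energy
        parents = pop[: pop_size // 2]       # selection: keep the best half
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, dim)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            for d in range(dim):             # mutate single genes
                if rng.random() < mut_rate:
                    child[d] += rng.gauss(0.0, mut_sigma)
            children.append(child)
        pop = parents + children
    best = min(pop, key=f)
    return best, f(best)

# Toy "energy": sum of squares, global minimum 0 at the origin.
best, best_e = ga_minimize(lambda x: sum(xi * xi for xi in x))
```

Because the elite half of the population is carried over unchanged, the best energy found is monotonically non-increasing across generations.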

Making use of Bayes' theorem and Gaussian process regression, Bayesian optimization^{[88]} can significantly reduce computational time and accelerate the structure search by constructing surrogate models of the potential energy surface. It mainly consists of two parts: the surrogate model based on Gaussian process regression, and the acquisition function, which guides the search by balancing exploration and exploitation. Bayesian optimization has been widely applied to search for clusters, such as Cu_{15}^{[89]}, CuNi^{[90]}, and C_{24}^{[91]} clusters. Bayesian optimization exhibits great potential in CSP but still faces several challenges^{[92–94]}. First, updating the surrogate model and evaluating the acquisition function can be very time-consuming. In addition, the noise and uncertainty in actual calculations can affect the accuracy of the model and the stability of the optimization process.
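The acquisition-function half of this loop can be illustrated with the standard expected-improvement formula for minimization, computable with the standard library alone; the surrogate predictions fed to it below are made-up numbers:

```python
from math import erf, exp, pi, sqrt

def expected_improvement(mu, sigma, f_best):
    """Expected improvement (for minimization) of a candidate whose
    surrogate model predicts mean `mu` and uncertainty `sigma`, given
    the best energy `f_best` observed so far."""
    if sigma <= 0.0:
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    cdf = 0.5 * (1.0 + erf(z / sqrt(2.0)))    # standard normal CDF
    pdf = exp(-0.5 * z * z) / sqrt(2.0 * pi)  # standard normal PDF
    return (f_best - mu) * cdf + sigma * pdf

# Candidate A: slightly lower predicted energy, low uncertainty.
ei_a = expected_improvement(mu=-1.0, sigma=0.1, f_best=-0.9)
# Candidate B: higher predicted energy but very uncertain.
ei_b = expected_improvement(mu=-0.5, sigma=2.0, f_best=-0.9)
```

Because expected improvement grows with the predicted uncertainty, a structure with a mediocre mean but a poorly explored neighborhood can still be selected for the next DFT evaluation, which is exactly the exploration behavior described above.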

Simulated annealing^{[95–97]} is a stochastic search method inspired by annealing in solids, where atoms reach the lowest-energy state when a heated material is cooled slowly. The core principle is to temporarily allow the system to enter higher-energy states during the search, which helps avoid premature convergence to local optima. Simulated annealing has been successfully used to predict the crystal structures of LiF^{[97]}, GeF_{2}^{[98]} and BN^{[99]} and to investigate the properties of IrO_{2} and RuO_{2} surfaces^{[100]}. Simulated annealing is favored in optimization problems mainly for its simplicity and its effectiveness in escaping local minima, thereby increasing the likelihood of finding the global minimum. However, its performance depends heavily on parameter settings, such as the initial temperature, cooling rate, and termination temperature; determining optimal values for these parameters often requires extensive experience and numerous tests.
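A minimal simulated-annealing loop with the Boltzmann acceptance criterion might look as follows; the one-dimensional double-well "energy surface" and schedule parameters are illustrative assumptions:

```python
import random
from math import exp

def simulated_annealing(f, x0, t0=2.0, t_min=1e-3, cooling=0.99,
                        step=0.5, seed=0):
    """Minimal simulated annealing: uphill moves are accepted with
    Boltzmann probability exp(-dE/T), so the walker can escape local
    minima while the temperature T is high."""
    rng = random.Random(seed)
    x, e = x0, f(x0)
    best_x, best_e = x, e
    t = t0
    while t > t_min:
        x_new = x + rng.uniform(-step, step)
        e_new = f(x_new)
        if e_new < e or rng.random() < exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling   # geometric (slow) cooling schedule
    return best_x, best_e

# Double well: local minimum near x = -1, global minimum near x = +1.
f = lambda x: (x * x - 1) ** 2 - 0.3 * (x - 1)
best_x, best_e = simulated_annealing(f, x0=-1.2)  # start in the wrong basin
```

Started in the basin of the shallower minimum, the walker nevertheless ends up near the global one, illustrating why the initial temperature and cooling rate matter: cool too fast and it would freeze in the starting basin.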

Besides these ab initio methods, another widely used CSP approach is the template-based method. A well-known example of this approach is ion substitution^{[101]}. Traditionally, this method involves replacing an ion in the crystal structure of a known compound with a chemically similar ion, guided by empirical rules such as the Goldschmidt rules^{[102]}. This process has been further enhanced by a probabilistic model, which quantitatively predicts the likelihood of successful ionic substitution by analyzing a vast database of crystal structures^{[103]}. This data-driven model not only improves the accuracy of predicting new compounds, but also accelerates the materials discovery process by efficiently identifying novel structures with reduced computational resources. For instance, using this method, a comprehensive stability map of inorganic ternary metal nitrides has been constructed, leading to the synthesis of several new Zn- and Mg-based ternary nitrides^{[104]}.

Although conventional CSP methods have achieved remarkable accomplishments, most of them still suffer from low computational efficiency, in addition to the limitations of each global optimization algorithm mentioned above. The main time cost lies in the optimization of structures, as there is rarely a guarantee that the structures are near local minima on the potential energy surface, leading to tens of thousands of DFT calculations or time-consuming optimizations. Fortunately, advanced ML techniques have shed new light on tackling these challenges, opening up new possibilities in this direction.

In recent years, ML has achieved a better balance between speed and accuracy by embedding physical quantities such as energies, forces, stresses, and magnetic moments into neural networks^{[105]} and by training on large-scale data^{[106]}. Leveraging these advantages, ML models can be combined with CSP in the following four aspects.

● Crystal Structure Representation: ML-based structure representation methods can accurately capture the geometric and topological features of crystals^{[107–111]}, converting complex structural information into high-dimensional crystal feature vectors^{[109,112]}. These feature vectors not only contain sufficient structural information, but also exhibit rotational invariance, translational invariance, and index permutation invariance^{[107]}. Most importantly, these feature vectors can reveal intrinsic connections and differences between structures, greatly enhancing the effectiveness of structure clustering during the search process.

Application of machine-learning models in CSP. (A) Representation of atoms in machine-learning models. Reproduced with permission^{[107]}. Copyright 2018, American Physical Society; (B) Neighbor search using Voronoi tessellation and construction of a global periodic graph. Reproduced from Ref.^{[108]}. CC BY 4.0; (C) Representation of bonding between atoms in machine-learning models. Reproduced from Ref.^{[109]}. CC BY 4.0; (D) Architecture of crystal graph convolutional neural network. Reproduced with permission^{[107]}. Copyright 2018, American Physical Society; (E) Architecture of the neural network used in machine-learning force fields. Reproduced from Ref.^{[110]}. CC BY-NC 4.0; (F) VAE for stable structure generation. Reproduced from Ref.^{[111]}. CC BY-NC 4.0. CSP: Crystal structure prediction; VAE: variational autoencoder.

● Property Prediction and Rapid Screening: ML models, especially GNNs, can rapidly predict various material properties^{[113–115]}, such as energy, band gap, and performance for different applications. Moreover, by combining ML models with global optimization algorithms such as simulated annealing, GAs, and particle swarm optimization, low-energy crystal structures can be efficiently identified^{[116]}.

● Machine-Learning Force Field: ML force fields approach first-principles accuracy at a small fraction of the computational cost, greatly accelerating energy evaluation and structure optimization^{[117,118]}. Also, they enable high-throughput material screening and the construction of material databases^{[119,120]}.

● Generative Model: Generative models learn the distribution of known crystals and can sample new structures directly from a latent space^{[111]}, enabling the exploration of a more diverse range of crystal structures. Some advanced generative models provide better compositional and structural diversity than substitution-based enumeration in high-throughput calculations, as well as higher structure-generation efficiency^{[121,122]} than conventional CSP techniques.

In this section, we review applications of crystal structure characterization, property prediction, and ML force fields in structure generation, global structure search, and local structure optimization, respectively. Finally, we discuss generative models, which differ from the typical CSP workflow.

In ML-based CSP, once the initial structures are generated, suitable descriptors are needed to capture the geometric and topological information of the crystal structure. By converting crystal structures into a machine-readable format, we can effectively represent structures and learn the relationship between structure and properties. The descriptors used to construct ML models should meet the following three basic criteria^{[107,123]}:

1. Physical Consistency: The descriptors should maintain physical invariance, meaning that their values should not change under rotation or translation of the structure.

2. Index Invariance: The descriptors should be insensitive to the indexing order of the atoms. Even if the order or numbering of the atoms changes, the descriptor values should remain unchanged, ensuring model consistency and stability.

3. Discrimination: The descriptors should be able to distinguish different atomic environments. Similar local chemical environments should yield similar descriptors, while different local chemical environments should result in significantly different descriptors.
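A toy descriptor makes these criteria concrete: the sorted list of interatomic distances satisfies criteria 1 and 2 by construction, while its limited discriminating power hints at why criterion 3 requires richer descriptors in practice. The sketch below (2D points for brevity) checks the invariances numerically:

```python
from math import cos, sin, dist

def pair_distance_descriptor(positions):
    """Toy descriptor: sorted multiset of all interatomic distances.
    Distances are invariant to rotation/translation (criterion 1);
    sorting removes dependence on atom indexing (criterion 2). It is
    NOT guaranteed to distinguish every pair of structures, so real
    descriptors (e.g., SOAP) encode much more (criterion 3)."""
    n = len(positions)
    return sorted(dist(positions[i], positions[j])
                  for i in range(n) for j in range(i + 1, n))

atoms = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]

# Rotate by 30 degrees, translate by (3, -1), then permute atom order.
a = 0.5235987755982988  # 30 degrees in radians
moved = [(x * cos(a) - y * sin(a) + 3.0, x * sin(a) + y * cos(a) - 1.0)
         for x, y in atoms]
moved = [moved[2], moved[0], moved[1]]
```

The descriptor of `moved` matches that of `atoms` to floating-point precision, even though every coordinate and every index has changed.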

There are currently two main approaches to structure representation: continuous 3D voxel representation and matrix representation. In the continuous 3D voxel representation^{[124]}, encoders and decoders are employed to prepare 2D crystal graphs and to reconstruct 3D voxel images. In the matrix representation^{[125–127]}, crystal structure features such as lattice parameters, atomic occupation coordinates, and elemental properties are separated into different matrix rows and columns. Since widely used GNN and ML force fields mainly adopt the matrix representation, we will focus on the matrix representation, including the atom features and bonding features.

Atom features are used to describe different atoms in ML models^{[107,126,128]}. The initial atomic feature vectors contain various elemental properties^{[107,129–132]}, as listed in the table below. These descriptors can uniquely determine each element and include its main physical properties. In advanced GNNs, such as the message passing neural network (MPNN)^{[128]}, crystal graph convolutional neural network (CGCNN)^{[107]}, materials graph network (MEGNet)^{[126]}, and atomistic line graph neural network (ALIGNN)^{[133]}, the initial atomic features are processed through fully connected layers to construct atomic representations that are more strongly correlated with the target properties.

Atom features used in CGCNN

Property | Unit | Range | Number of categories
Group number | - | 1, 2, …, 18 | 18
Period number | - | 1, 2, …, 9 | 9
Electronegativity^{[129,130]} | - | 0.5-4.0 | 10
Covalent radius^{[131]} | pm | 25-250 | 10
Valence electrons | - | 1, 2, …, 12 | 12
First ionization energy | eV | 1.3-3.3 | 10
Electron affinity^{[132]} | eV | -3-3.7 | 10
Block | - | s, p, d, f | 4
Atomic volume | cm^{3}/mol | 1.5-4.3 | 10

These atom features are encoded using one-hot vectors. Reproduced with permission^{[107]}. Copyright 2018, American Physical Society. CGCNN: Crystal graph convolutional neural network.

Bonding features are used to describe the local environment of each atom. In GNNs, the bonding features are directly used as input to the ML model. In ML force fields, the inputs are the atomic positions, and the bonding features are obtained via symmetry functions. The Behler-Parrinello and smooth overlap of atomic positions (SOAP) descriptors are two commonly used bonding features, which we introduce below.

The Behler-Parrinello descriptor^{[32,134]} uses a set of symmetry functions to characterize the local chemical environment of each atom. It consists of two types of symmetry functions: radial and angular symmetry functions, which capture distance and angle information between atoms, respectively.
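A minimal sketch of one such radial symmetry function follows; the hyperparameter values (eta, r_s, r_c) are illustrative, and the standard cosine cutoff is assumed:

```python
from math import exp, cos, pi, dist

def cutoff(r, r_c):
    """Smooth cutoff: decays to zero (with zero slope) at r = r_c."""
    return 0.5 * (cos(pi * r / r_c) + 1.0) if r < r_c else 0.0

def radial_symmetry_function(i, positions, eta=1.0, r_s=1.0, r_c=4.0):
    """Behler-Parrinello-type radial symmetry function G_i: a sum of
    Gaussians of neighbor distances centered at r_s, damped by the
    cutoff so distant atoms contribute nothing."""
    g = 0.0
    for j, pos_j in enumerate(positions):
        if j == i:
            continue
        r_ij = dist(positions[i], pos_j)
        g += exp(-eta * (r_ij - r_s) ** 2) * cutoff(r_ij, r_c)
    return g

# A linear triatomic chain: the two end atoms have identical environments.
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
g_values = [radial_symmetry_function(i, chain) for i in range(3)]
```

The two end atoms get exactly the same G value while the middle atom, with two close neighbors, gets a larger one, showing how the descriptor separates inequivalent chemical environments.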

SOAP is another descriptor used to characterize the local environment of atoms^{[135]}. The SOAP descriptor represents the environment of each atom as a continuous density field, capturing the geometric properties of the atomic environment by calculating the overlap of density fields.

The calculation of the SOAP descriptor for the local environment of atom $i$ starts from a Gaussian-smoothed atomic density:

$$\rho_{i}(\mathbf{r}) = \sum_{j} \exp\!\left(-\frac{|\mathbf{r} - \mathbf{r}_{ij}|^{2}}{2\sigma^{2}}\right),$$

where the sum runs over the neighbors $j$ of atom $i$ within a cutoff radius, $\mathbf{r}_{ij}$ is the position of neighbor $j$ relative to atom $i$, and $\sigma$ controls the width of the Gaussians.

To incorporate angular information, the atomic density $\rho_{i}(\mathbf{r})$ is expanded in a basis of radial functions $g_{n}(r)$ and spherical harmonics $Y_{lm}(\hat{\mathbf{r}})$:

$$\rho_{i}(\mathbf{r}) = \sum_{nlm} c_{nlm}\, g_{n}(r)\, Y_{lm}(\hat{\mathbf{r}}).$$

The expansion coefficients are calculated by inner product:

$$c_{nlm} = \int g_{n}(r)\, Y_{lm}(\hat{\mathbf{r}})\, \rho_{i}(\mathbf{r})\, d\mathbf{r},$$

where $n$ indexes the radial basis functions and $l$, $m$ label the spherical harmonics.

To ensure the rotational invariance of the descriptor, the SOAP descriptor calculates the power spectrum of the expansion coefficients:

$$p_{nn'l} = \pi \sqrt{\frac{8}{2l+1}} \sum_{m} c_{nlm}\, c_{n'lm}^{*}.$$

The vector composed of the power spectrum components $p_{nn'l}$ serves as the rotationally invariant SOAP descriptor of the local atomic environment.

When constructing ML models, atoms can be encoded by property-based one-hot vectors. The Behler-Parrinello or SOAP descriptors generate a high-dimensional bonding feature vector for each atom. These vectors serve as input for the ML models, and the output is the total energy, enabling the ML model to map the local atomic environment to energy.

With the increasing size of open material databases^{[18,120,136–138]} and the development of ML models^{[139–141]}, it has become common practice to screen hundreds of thousands of materials to identify potential candidates^{[19,142,143]}. In a typical workflow for applying an ML model to screen structures in CSP^{[107,126,128,133,144]}, the ML model is pretrained on existing databases and then used to rapidly evaluate and rank candidate structures.

GNNs applied in CSP. (A) Prediction pipeline; (B) Examples of structures stored in database. Reproduced from Ref.^{[144]}. CC BY 4.0; (C) MPNN predicts the quantum properties of an organic molecule. Reproduced from Ref.^{[128]}. CC BY-NC 4.0; (D) Illustration of the CGCNN, including construction of the crystal graph and then building the structure of the convolutional neural network on top of the crystal graph. Reproduced with permission^{[107]}. Copyright 2018, American Physical Society; (E) Overview of MEGNet, in which the initial graph is represented by atomic, bond, and global state attributes. Reproduced with permission^{[126]}. Copyright 2019, American Chemical Society; (F) ALIGNN convolution layer alternates between message passing on the bond graph and its line graph. Reproduced from Ref.^{[133]}. CC BY 4.0. GNNs: Graph neural networks; CSP: crystal structure prediction; MPNN: message passing neural network; CGCNN: crystal graph convolutional neural network; MEGNet: materials graph network; ALIGNN: atomistic line graph neural network.

The MPNN^{[128]} provides a general framework for GNNs, in which a molecule or crystal is treated as a graph whose vertices are atoms and whose edges are bonds.

Specifically, MPNN updates the representation of each vertex in the graph through the following steps:

1. Message Passing: Each vertex $v$ aggregates messages from its neighbors $N(v)$ and updates its hidden state over $T$ steps:

$$m_{v}^{t+1} = \sum_{w \in N(v)} M_{t}\!\left(h_{v}^{t}, h_{w}^{t}, e_{vw}\right), \qquad h_{v}^{t+1} = U_{t}\!\left(h_{v}^{t}, m_{v}^{t+1}\right),$$

where $h_{v}^{t}$ is the hidden state of vertex $v$ at step $t$, $e_{vw}$ is the feature of the edge between $v$ and $w$, and $M_{t}$ and $U_{t}$ are the learned message and update functions.

2. Message Readout: After $T$ message-passing steps, a readout function $R$ maps the final vertex states to the target property:

$$\hat{y} = R\!\left(\left\{h_{v}^{T} \mid v \in G\right\}\right),$$

where $R$ must be invariant to permutations of the vertex indices to ensure index invariance.

Most GNNs can be represented using the MPNN framework, including the Molecular Fingerprint Convolution Network^{[145]}, the Gated Graph Neural Network^{[146]}, Interaction Networks^{[147]}, Molecular Graph Convolutional Networks^{[148]}, Deep Tensor Networks^{[127]}, and Graph Laplacian Matrix Networks^{[149]}. Here, we concentrate on three GNNs suitable for crystal property prediction: CGCNN^{[107]}, MEGNet^{[126]}, and ALIGNN^{[133]}, which can also be built using the MPNN framework.
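The message-passing/readout scheme can be condensed into a toy example with scalar vertex states, where simple hand-written functions stand in for the learned networks $M_t$, $U_t$, and $R$ (the message rule and weights below are illustrative assumptions):

```python
def message_passing(h, edges, n_steps=2, w=0.5):
    """Minimal MPNN with scalar vertex states h[v] and edge features
    e_vw. Message: M(h_v, h_w, e_vw) = w * (h_w + e_vw); update:
    U(h_v, m) = h_v + m; readout: permutation-invariant sum."""
    for _ in range(n_steps):
        msgs = {v: 0.0 for v in h}
        for (v, u), e in edges.items():          # messages flow both ways
            msgs[v] += w * (h[u] + e)
            msgs[u] += w * (h[v] + e)
        h = {v: h[v] + msgs[v] for v in h}       # vertex update
    return sum(h.values())                        # readout

# A triangle "molecule" with unit edge features.
h0 = {"a": 1.0, "b": 2.0, "c": 3.0}
edges = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0}
y = message_passing(h0, edges)
```

Because the readout is a plain sum, relabeling the vertices of this symmetric graph leaves the output unchanged, which is the index-invariance property required of $R$.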

CGCNN is a well-known GNN model designed for predicting crystal properties. In CGCNN, each crystal is represented as a graph whose nodes are atoms and whose edges are bonds, and the atomic feature vectors are updated by a gated graph convolution:

$$\mathbf{v}_{i}^{(t+1)} = \mathbf{v}_{i}^{(t)} + \sum_{j,k} \sigma\!\left(\mathbf{z}_{(i,j)_{k}}^{(t)} \mathbf{W}_{f}^{(t)} + \mathbf{b}_{f}^{(t)}\right) \odot g\!\left(\mathbf{z}_{(i,j)_{k}}^{(t)} \mathbf{W}_{s}^{(t)} + \mathbf{b}_{s}^{(t)}\right),$$

where $\mathbf{z}_{(i,j)_{k}}^{(t)} = \mathbf{v}_{i}^{(t)} \oplus \mathbf{v}_{j}^{(t)} \oplus \mathbf{u}_{(i,j)_{k}}$ concatenates the feature vectors of atoms $i$ and $j$ and the feature of the $k$-th bond between them, $\sigma$ denotes the sigmoid function acting as a gate, $g$ is a nonlinear activation, and $\mathbf{W}$ and $\mathbf{b}$ are learnable weights and biases.

Thus, the graph convolution operator effectively represents atomic interactions using the learned gate $\sigma(\cdot)$, which weights each neighbor's contribution according to the local chemical environment.
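With scalar features and hand-picked weights standing in for the learned parameters, one such gated convolution step might be sketched as follows (all numbers are illustrative, not trained values):

```python
from math import exp, tanh

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def gated_conv(v, bonds, w_f, b_f, w_s, b_s):
    """One CGCNN-style gated graph convolution with scalar features.
    For each bond (i, j) with feature u_ij, the concatenated vector
    z = (v_i, v_j, u_ij) passes through a sigmoid "gate" and a tanh
    "filter"; their product is added to the atom's feature."""
    v_new = list(v)
    for (i, j), u_ij in bonds.items():
        for a, b in ((i, j), (j, i)):             # update both endpoints
            z = (v[a], v[b], u_ij)
            gate = sigmoid(sum(zi * wi for zi, wi in zip(z, w_f)) + b_f)
            core = tanh(sum(zi * wi for zi, wi in zip(z, w_s)) + b_s)
            v_new[a] += gate * core
    return v_new

# Three atoms, two bonds, toy weights.
v0 = [0.1, -0.2, 0.3]
bonds = {(0, 1): 0.5, (1, 2): 1.5}
v1 = gated_conv(v0, bonds, w_f=(0.3, 0.3, 0.2), b_f=0.0,
                w_s=(0.5, -0.4, 0.1), b_s=0.1)
```

The gate takes values in (0, 1), so each bond's contribution is softly switched on or off depending on the pair of atoms and the bond feature, which is the interaction-weighting behavior described above.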

MEGNet is a universal property prediction model for molecules and crystals. Compared to CGCNN, MEGNet additionally encodes global state variables of the system (such as temperature, pressure, and entropy) and updates the atom, bond, and state attributes together during message passing, broadening the range of systems and properties it can describe.

ALIGNN is a GNN model designed for predicting crystal properties. It alternates message passing between the bond graph and its line graph, whose nodes correspond to bonds and whose edges correspond to bond angles, thereby explicitly incorporating angular information that CGCNN and MEGNet ignore.

Combining GNNs and CSP, Cheng et al. proposed a GNN-based structure search framework^{[67]}. It mainly includes three parts: (ⅰ) pre-training of ML models; (ⅱ) structure generation with physical constraints; and (ⅲ) structure search and optimization based on ML. In the framework, GNNs such as CGCNN, MEGNet, ALIGNN, and CHGNet can potentially be used as prediction models, while algorithms such as random search, simulated annealing, GAs, or particle swarm optimization can be employed for the global search.

Recently, the symmetry-based combinatorial crystal optimization program (SCCOP)^{[150,151]} has been developed for 2D materials, and its workflow is shown in the figure below.

Workflow of SCCOP for the search of two-dimensional materials. Step 1: generating structures by symmetry. Step 2: characterizing structures into crystal vectors and exploring the potential energy surface by Bayesian optimization. Step 3: updating the energy prediction model. Step 4: optimizing structures to obtain the lowest-energy configuration by ML and DFT. The whole program runs in a closed loop. Reproduced from Ref.^{[150]}. CC BY 4.0. SCCOP: Symmetry-based combinatorial crystal optimization program; ML: machine learning; DFT: density functional theory.

For the desired structures, SCCOP optimizes them with ML-accelerated simulated annealing, in conjunction with a limited number of DFT calculations, to obtain the lowest-energy structure.

To evaluate the effectiveness of SCCOP, it was applied to a total of 35 representative 2D materials. For one of these compounds, the database records a four-fold coordinated structure (-3.509 eV/atom), whereas SCCOP discovers a six-fold coordinated structure with lower energy (-3.591 eV/atom). SCCOP has been further applied to validate the stability of Cu- and Ag-based ternary compounds in the chalcopyrite structure prototype^{[152]}. It has also been successfully utilized to investigate the mixed-coordination structures of IB-VA-VIA_{2} compounds, which at low temperatures have lower free energy than the octahedrally coordinated structures observed in experiments^{[153]}. These applications highlight the wide-ranging applicability of SCCOP and demonstrate the feasibility of using GNNs to accelerate CSP.

Performance of SCCOP on 35 representative compounds. (A) Time cost and lowest energy for each compound, with all energy calculations evaluated with DFT; (B) Three lowest-energy structures identified by SCCOP. Each compound has been explored five times by SCCOP, with up to ten atoms in the unit cell. Reproduced from Ref.^{[150]}. CC BY 4.0. SCCOP: Symmetry-based combinatorial crystal optimization program; DFT: density functional theory.

Although ML models can identify potential candidates in a short time, high-accuracy structure optimization is still needed to fully relax structures to their local minima on the potential energy surface. In conventional CSP methods, structures are optimized by DFT or classical force fields. While DFT has high accuracy, it is time-consuming. Classical force fields are much faster than DFT, but often lack sufficient precision when dealing with complex systems, such as metal-organic frameworks and biological macromolecules, especially in scenarios involving intricate electronic effects and chemical reactions^{[154,155]}.

Currently, ML force fields show strong potential to speed up structure optimization. They can maintain the speed advantage of classical force fields while significantly improving prediction accuracy for complex systems, particularly in cases where classical force fields perform poorly. Through the collective efforts of the field, many ML force fields have been developed, e.g., MEGNet^{[126]}, CHGNet^{[105]}, NequIP^{[156]}, and MACE^{[157]}. ML force fields are commonly applied in studying the properties of new materials^{[158]}, the mechanisms of drug molecules^{[159]}, and the protein folding process^{[160]}. Thus, using ML force fields to replace time-consuming structural relaxation is a feasible way to speed up conventional CSP. In this section, we discuss ML force fields and their applications in CSP.

When constructing ML force fields, the total energy of the system is decomposed into a sum of atomic contributions^{[32]}:

$$E_{\mathrm{total}} = \sum_{i} E_{i},$$

Neural network of ML force fields. This feedforward neural network consists of an input layer, two hidden layers, and an output layer. The input is a coordination matrix. The hidden layers transform the inputs to the local environment of each atom, mapping it to local energy, and finally summing them to get the total energy. Reproduced from Ref.^{[110]}. CC BY-NC 4.0. ML: Machine learning.

where $E_{i}$ is the local energy of atom $i$, predicted by the neural network from the descriptor of its local chemical environment.

To train the ML force field, the simplest loss function only fits the energy:

$$\mathcal{L}_{E} = \frac{1}{N_{s}} \sum_{s} \left( E_{s}^{\mathrm{pred}} - E_{s}^{\mathrm{ref}} \right)^{2},$$

where $E_{s}^{\mathrm{pred}}$ and $E_{s}^{\mathrm{ref}}$ are the predicted and reference (e.g., DFT) energies of training structure $s$, and $N_{s}$ is the number of training structures. In practice, atomic forces (and often stresses) are included in the loss as well^{[161]}:

$$\mathcal{L} = w_{E}\, \mathcal{L}_{E} + \frac{w_{F}}{3N} \sum_{i=1}^{N} \left\| \mathbf{F}_{i}^{\mathrm{pred}} - \mathbf{F}_{i}^{\mathrm{ref}} \right\|^{2},$$

where $w_{E}$ and $w_{F}$ are weighting factors, $N$ is the number of atoms, and $\mathbf{F}_{i} = -\partial E / \partial \mathbf{r}_{i}$. Because each structure supplies $3N$ force components in addition to a single energy, force matching adds far more training labels per DFT calculation^{[162,163]}, thereby improving the efficiency of data utilization.
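A sketch of such a combined energy-plus-force objective for a single structure follows; the weights and the numbers fed in are illustrative assumptions:

```python
def combined_loss(e_pred, e_ref, f_pred, f_ref, w_e=1.0, w_f=0.1):
    """Energy-plus-force training loss for one structure. f_pred and
    f_ref are lists of 3-component force vectors, one per atom; the
    force term is normalized by the 3N force components."""
    n_atoms = len(f_ref)
    loss_e = (e_pred - e_ref) ** 2
    loss_f = sum((fp - fr) ** 2
                 for fpv, frv in zip(f_pred, f_ref)
                 for fp, fr in zip(fpv, frv)) / (3 * n_atoms)
    return w_e * loss_e + w_f * loss_f

# A perfect prediction gives zero loss; any energy or force error adds to it.
perfect = combined_loss(1.0, 1.0,
                        [(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)])
off = combined_loss(1.2, 1.0,
                    [(0.1, 0.0, 0.0)], [(0.0, 0.0, 0.0)])
```

In a real training loop this scalar would be minimized over batches of DFT-labeled structures, with the force term supplying most of the gradient signal for large cells.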

As shown in the figure below, ML force fields have been validated on the TiO_{2} system^{[110]}, which contains three different polymorphs: anatase, brookite, and rutile. They have also been demonstrated for other crystal systems such as Al_{2}O_{3}, Cu, Ge, and Si, as well as for MoS_{2} slabs and small molecular systems. These results show that trained ML force fields can adapt to various types of systems and fit the energy with high precision, indicating strong capability in force calculations.

Validity and time cost of machine-learning force fields. (A) Comparison of the DFT energies and the DeepPot-SE predicted energies on the testing snapshots. Reproduced from Ref.^{[110]}. Copyright 2018, Curran Associates Inc.; (B) Phonon band structure and DOS of fcc Al using DFT (blue dashed lines), and optimized (red solid lines) and original (green dashed lines) ML interatomic potentials. Reproduced with permission^{[162]}. Copyright 2019, AIP Publishing; (C) Correlation functions of liquid water from DPMD and PI-AIMD. Reproduced with permission^{[164]}. Copyright 2018, Elsevier; (D) Computational cost of MD steps versus system size with DPMD, TIP3P, PBE + TS, and PBE0 + TS. Reproduced with permission^{[164]}. Copyright 2018, Elsevier. DFT: Density functional theory; DOS: density of states; ML: machine learning; DPMD: deep potential molecular dynamics; PI-AIMD: path-integral Ab initio molecular dynamics; MD: molecular dynamics; TIP3P: transferable intermolecular potential with 3 points; PBE: Perdew-Burke-Ernzerhof functional; TS: Tkatchenko-Scheffler functional.

In addition to energy prediction, ML force fields can be used for more complex tasks, including phonon spectra and solid-liquid phase transitions. For example, the phonon band structure and density of states of fcc Al computed with an optimized ML interatomic potential closely reproduce the DFT reference^{[162]}. In the case of water and ice, ML force fields and DFT were used to simulate different thermodynamic conditions^{[164]}. The average energy, density, and radial distribution functions obtained from deep potential molecular dynamics agree well with path-integral ab initio molecular dynamics, while the computational cost per MD step remains orders of magnitude lower.
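The radial distribution functions compared in these studies can be computed from a single snapshot with a short histogram routine. The sketch below assumes an orthorhombic box and the minimum-image convention:

```python
import numpy as np

def radial_distribution(positions, box, n_bins=50, r_max=None):
    """Histogram-based radial distribution function g(r) for a periodic box.

    positions: (N, 3) Cartesian coordinates; box: (3,) orthorhombic box lengths.
    A minimal single-snapshot sketch; production codes average over a trajectory.
    """
    n = len(positions)
    box = np.asarray(box, float)
    if r_max is None:
        r_max = box.min() / 2
    # all pairwise displacement vectors under the minimum-image convention
    d = positions[:, None, :] - positions[None, :, :]
    d -= box * np.round(d / box)
    r = np.sqrt((d ** 2).sum(-1))[np.triu_indices(n, k=1)]
    hist, edges = np.histogram(r[r < r_max], bins=n_bins, range=(0, r_max))
    rho = n / box.prod()                                  # number density
    shell = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    g = hist / (shell * rho * n / 2)                      # normalize by ideal-gas pair counts
    return 0.5 * (edges[1:] + edges[:-1]), g
```

For an ideal gas, g(r) fluctuates around 1; peaks above 1 mark coordination shells such as the O-O and O-H shells of liquid water.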

To alleviate the bottleneck in CSP, ML force fields have been employed to replace the time-consuming DFT optimization. For example, the machine learning and graph theory assisted universal structure searcher (MAGUS) combines ML force fields with global optimization algorithms for structure search^{[165]}. Specifically, the initial population is first generated by seeding and random generation. In each generation, the structures in the population are optimized using DFT or other force fields. Next, duplicate structures are removed from the population to maintain diversity. The remaining structures are then selected for crossover and mutation to create offspring. Generally, structures with higher fitness are more likely to be chosen as parents for crossover and mutation. The selection process can also incorporate the confidence level of the fitness with Bayesian optimization methods. Using this approach, a new tungsten nitride (WN_{6}) has been discovered, which can be quenched to ambient pressure after high-pressure synthesis^{[166]}. Two different stable stoichiometries for helium-water compounds have also been predicted^{[167]}, both of which exhibit a superionic state at high pressures and temperatures.
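The evolutionary loop described above (random initialization, relaxation and evaluation, de-duplication, fitness-based parent selection, crossover, and mutation) can be sketched with a toy "energy" standing in for DFT; all numerical settings here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(x):
    """Toy stand-in for a DFT/force-field energy of a 'structure' x."""
    return np.sum((x - 1.0) ** 2)

def evolve(pop_size=20, dim=4, n_gen=30):
    """Minimal evolutionary structure-search loop (MAGUS-style schematic)."""
    pop = rng.uniform(-5, 5, (pop_size, dim))         # seeding / random generation
    for _ in range(n_gen):
        e = np.array([energy(x) for x in pop])        # 'relaxation' = evaluation here
        pop = pop[np.argsort(e)]                      # lower energy = higher fitness
        keep = [0]                                    # de-duplicate to keep diversity
        for i in range(1, pop_size):
            if all(np.linalg.norm(pop[i] - pop[j]) > 1e-3 for j in keep):
                keep.append(i)
        parents = pop[keep][: max(2, len(keep) // 2)]
        children = []
        while len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            mask = rng.random(dim) < 0.5              # uniform crossover
            child = np.where(mask, a, b) + rng.normal(0, 0.1, dim)  # mutation
            children.append(child)
        pop = np.array(children)
    return pop[np.argmin([energy(x) for x in pop])]

best = evolve()
```

In a real searcher, `energy` is a DFT or ML-force-field relaxation, de-duplication uses structural fingerprints, and selection can weight fitness by model uncertainty as in Bayesian optimization.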

Workflow of MAGUS. (A) Classical evolutionary algorithm; (B) Machine-learning CSP. Reproduced from Ref.^{[165]}. Copyright 2023, Oxford University Press. MAGUS: Machine learning and graph theory assisted universal structure searcher; CSP: crystal structure prediction.

Many conventional CSP methods have adopted ML force fields to accelerate the optimization process. To integrate ML force fields with CSP, a sampling strategy using disordered structures to train ML models has been developed^{[168]}. By combining ML force fields and CALYPSO, the putative global minimum structure for the B_{84} cluster has been uncovered, and the computational cost was reduced by 1-2 orders of magnitude compared to full DFT-based structure searches^{[169,170]}. In the ML-based USPEX, the methodology was first tested on the prediction of crystal structures of carbon, high-pressure phases of sodium, and boron allotropes. For these test cases, the main allotropes were reproduced, and a previously unknown 54-atom structure of boron was predicted with very moderate computational effort^{[87]}. Additionally, by integrating ML force fields with GAs, the structure prediction of inorganic crystals using neural network potentials with evolutionary and random searches (SPINNER) method has been presented, which identified experimentally known or theoretically more stable phases with a success rate of 80% for 60 ternary compounds^{[171]}; high-throughput discovery of oxide materials using SPINNER has also been conducted^{[172]}. Furthermore, ML-accelerated AIRSS has been applied to additional structure searches^{[173]}.

Despite these achievements, the implementation of ML force fields for structure optimization faces several challenges, including data requirements, model complexity, transferability, and computational efficiency. Solutions to these challenges include using data augmentation and transfer learning to enlarge datasets^{[174,175]}, applying explainable tools for better model interpretability^{[176]}, developing domain-specific and hybrid models to improve generalization^{[32]}, and employing model compression and efficient algorithms to enhance computational efficiency^{[177,178]}. These strategies assist researchers in effectively utilizing ML force fields for accurate and efficient structure optimization.

Combining ML models with the general CSP steps has achieved significant progress in CSP, but it still struggles with the vast search space of feasible materials. Nowadays, thanks to breakthroughs in image generation^{[179,180]}, video generation^{[181,182]}, and realistic text generation^{[183]}, generative models in materials science show an unprecedented ability to learn the mapping between the structure and property spaces.

Material property prediction and inverse design by generative models. (A) Schematic showing material property prediction from the structure space to the property space (downward arrow), and inverse material design from the property space back to the structure space (upward arrow). Reproduced from Ref.^{[111]}. CC BY-NC 4.0; (B) VAE. The VAE consists of an encoder that transforms the input sample feature vector to a latent distribution space, and a decoder that reconstructs the sample given the hidden distribution. The VAE also constrains the latent vector to follow a prior (typically Gaussian) distribution. Reproduced from Ref.^{[111]}. CC BY-NC 4.0; (C) GAN. GAN uses a generator to transform a random noise variable into the generated sample, and a discriminator to distinguish whether a sample is real or generated. Reproduced from Ref.^{[111]}. CC BY-NC 4.0; (D) Inorganic materials design with MatterGen. It generates stable materials by reversing a corruption process, iteratively denoising an initially random structure. Reproduced from Ref.^{[121]}. CC BY-NC 4.0. VAE: Variational autoencoder; GAN: generative adversarial network.

Among the generative models, VAEs, composed of an encoder and a decoder, minimize the reconstruction error between the decoded and input data. Representative VAE-based CSP models include iMatGen^{[184]} and the Fourier-transformed crystal properties (FTCP) framework^{[185]}. Specifically, iMatGen uses an invertible image-based representation to encode solid-state materials, leading to the generation of synthesizable V-O compounds. FTCP adds a target-learning branch to map latent points to target properties, resulting in the generation of 142 new crystals with desired ground- and excited-state properties. VAEs are relatively easy to train and provide more diversified structures that better cover the distribution compared to other generative models, but they may have a lower output validity rate.
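The VAE objective (reconstruction error plus a KL regularizer that pulls the latent distribution toward a standard Gaussian prior) can be illustrated with a linear toy model; the weight matrices here are placeholders, not a trained crystal model:

```python
import numpy as np

rng = np.random.default_rng(0)

def vae_step(x, W_enc, W_dec):
    """One forward pass of a linear toy VAE: encode -> reparameterize -> decode.

    W_enc maps the input to [mu, logvar]; W_dec maps the latent z back to input
    space. A minimal sketch of the reconstruction + KL objective.
    """
    h = x @ W_enc                                   # encoder
    mu, logvar = np.split(h, 2)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps             # reparameterization trick
    x_hat = z @ W_dec                               # decoder
    recon = np.sum((x - x_hat) ** 2)                # reconstruction error
    kl = -0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar))  # KL(q(z|x) || N(0, I))
    return x_hat, recon + kl
```

The reparameterization trick keeps sampling differentiable, so both terms of the loss can be minimized by gradient descent; generation then amounts to decoding z drawn from the prior.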

GANs use a minimax game theory approach, with a generator transforming a random latent variable into a sample and a discriminator distinguishing real from generated samples. GAN-based CSP models include the composition-conditioned crystal GAN^{[186]}, CrystalGAN^{[187]}, the zeolite GAN (ZeoGAN)^{[188]}, and the constrained crystals deep convolutional generative adversarial network (CCDCGAN)^{[189]}. For instance, the composition-conditioned crystal GAN extends the latent variable with composition information, so that the generated structures can be constrained to a target composition.
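The minimax objective can be made concrete on discriminator logits; this is the generic GAN loss (with the common non-saturating generator variant), not the specific architecture of any model above:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def gan_losses(d_real_logits, d_fake_logits):
    """Minimax GAN objectives evaluated on discriminator logits.

    The discriminator maximizes log D(x) + log(1 - D(G(z))); the generator
    minimizes the non-saturating loss -log D(G(z)).
    """
    d_loss = -np.mean(np.log(sigmoid(d_real_logits)) +
                      np.log(1.0 - sigmoid(d_fake_logits)))
    g_loss = -np.mean(np.log(sigmoid(d_fake_logits)))
    return d_loss, g_loss
```

At the equilibrium of the game, the discriminator outputs 0.5 everywhere (logits of zero), at which point the generator's samples are indistinguishable from the data.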

The diffusion model generates samples by learning a score network to reverse a fixed destruction process^{[190]}. In image generation, the diffusion process typically adds Gaussian noises. However, crystals have unique periodic structures and symmetries that require a customized diffusion process. In MatterGen, Zeni et al. designed a diffusion process that jointly corrupts the atom types, fractional coordinates, and periodic lattice toward a physically motivated prior distribution^{[121]}. Stable materials are then generated by reversing this corruption process, iteratively denoising an initially random structure.
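The standard Gaussian diffusion that such models generalize can be sketched as follows; a real crystal model replaces the Gaussian corruption with processes suited to the lattice, coordinates, and atom types, and learns the noise estimate `eps_hat` with a neural network (all schedule values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_schedule(T=100, beta_min=1e-4, beta_max=0.02):
    """Linear variance schedule beta_t and cumulative products alpha_bar_t."""
    betas = np.linspace(beta_min, beta_max, T)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def q_sample(x0, t, alpha_bars):
    """Forward (corruption) process: x_t = sqrt(abar_t) x0 + sqrt(1-abar_t) eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * eps, eps

def denoise_step(x_t, t, eps_hat, betas, alpha_bars):
    """One reverse (denoising) step given a score/noise estimate eps_hat."""
    alpha_t = 1.0 - betas[t]
    mean = (x_t - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps_hat) / np.sqrt(alpha_t)
    if t > 0:
        mean += np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)  # stochastic step
    return mean
```

Sampling starts from pure noise at t = T-1 and applies `denoise_step` down to t = 0; the quality of the generated structures hinges entirely on how well the network's `eps_hat` approximates the true noise.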

From the advanced generative models applied in CSP introduced above, we can see that the biggest difference between generative models and ML applied to the general CSP steps is that generative models, such as VAEs, GANs, and diffusion models, are end-to-end systems: structure generation, structure search, and structure optimization are all performed by neural networks, making it difficult to control each step. Interestingly, this is also the biggest advantage of current advanced ML models: they reduce human intervention. Parameters are determined by algorithms and training data, giving ML models the potential to extract better features and design better workflows than humans. However, there is still a long way to go, and more efforts are needed to fully control these advanced ML models.

At the end of this section, to help the readers quickly learn about the progress of CSP method development or to apply CSP codes in their research, we summarize the conventional and ML-based CSP methods in the two tables below.

Summary of CSP algorithm categories with their advantages and disadvantages

| Category | Advantages | Disadvantages |
| Conventional methods | Effective for complex search spaces | Computationally expensive |
| ML-based methods | Efficient with large datasets | Requires extensive training data |
| Generative models | Good for exploring novel structures | Computationally intensive |

CSP: Crystal structure prediction.

Some conventional and ML-based CSP codes, along with their applications

| Code | Algorithm | Applications |
| USPEX (2006)^{[66]} | Evolutionary algorithm | NaCl (2013)^{[191]}, W-B (2018)^{[192]} |
| XtalOPT (2010)^{[193]} | Evolutionary algorithm | NaH_{n} (2011)^{[194]}, H_{2}O (2012)^{[195]} |
| AIRSS (2011)^{[53,196]} | Random search | SiH_{4} (2006)^{[196]}, NH_{3±x} (2008)^{[197]} |
| CALYPSO (2012)^{[69,70]} | Particle swarm optimization | Li (2011)^{[198]}, LaH_{10} (2017)^{[199]}, P (2024)^{[200]} |
| GASP (2013)^{[201]} | Evolutionary algorithm | Li-Be (2008)^{[202]}, Li-Si (2013)^{[203]} |
| AGA (2013)^{[86]} | Adaptive GA | Zr-Co (2014)^{[204]}, MgO-SiO_{2} (2017)^{[205]} |
| MUSE (2014)^{[206]} | Evolutionary algorithm | IrB_{4} (2016)^{[207]}, NbSe_{2} (2017)^{[208]} |
| IM^{2}ODE (2015)^{[209]} | Differential evolution | TiO_{2} (2014)^{[210]}, 2D SiS (2016)^{[211]} |
| SYDSS (2018)^{[54]} | Random search | H_{2}O-NaCl (2018)^{[54]}, Cl-F (2020)^{[212]} |
| MAISE (2021)^{[213]} | Evolutionary algorithm | Fe-B (2010)^{[214]}, NaSn_{2} (2016)^{[215]} |
| GOFEE (2020)^{[216]} | Bayesian optimization & GA | C_{24} (2022)^{[91]}, carbon clusters (2022)^{[91]} |
| BEACON (2021)^{[89,90]} | Bayesian optimization | Cu_{15} (2021)^{[89]}, CuNi clusters (2021)^{[90]} |
| CrySPY (2021)^{[217]} | Bayesian optimization & GA | Y_{2}Co_{17} (2018)^{[218]}, Al_{2}O_{3} (2018)^{[218]} |
| FTCP (2022)^{[185]} | VAE | Au_{2}Sc_{2}O_{3} (2022)^{[185]}, Y_{2}Zn_{2}As_{2}O_{3} (2022)^{[185]} |
| GN-OA (2022)^{[67]} | GNN & optimization algorithms | Tested on typical compounds (2022)^{[67]} |
| MAGUS (2023)^{[165,219]} | GA & Bayesian optimization | WN_{6} (2018)^{[166]}, HeH_{2}O (2019)^{[167]} |
| SCCOP (2023)^{[150]} | GNN & simulated annealing | B-C-N (2023)^{[150]}, AgBiS_{2} (2024)^{[152,153]} |
| iMatGen (2019)^{[184]} | VAE | V-O (2019)^{[184]} |
| CrystalGAN (2019)^{[187]} | GAN | Pd-Ni-H (2019)^{[187]}, Mg-Ti-H (2019)^{[187]} |
| CCDCGAN (2021)^{[189]} | GAN | MoSe_{2} (2021)^{[189]} |
| MatterGen (2024)^{[121]} | Diffusion model | V-Sr-O (2024)^{[121]} |
| UniMat (2024)^{[220]} | Diffusion model | Tested on typical compounds (2024)^{[220]} |
| DiffCSP (2024)^{[221]} | Diffusion model | Tested on typical compounds (2024)^{[221]} |
| LLaMA-2 (2024)^{[222]} | Large language-based model | Tested on typical compounds (2024)^{[222]} |

When selecting CSP methods, it is important to consider the system's complexity and specific needs. GAs, such as those in USPEX, are effective for exploring large search spaces, making them ideal for complex, multi-modal problems. Random search methods in AIRSS provide a straightforward, computationally inexpensive option for initial explorations. Particle swarm optimization, as used in CALYPSO, is suitable for systems requiring quick convergence. For versatile applications, evolutionary algorithms in the genetic algorithm for structure and phase predictions (GASP) and the module for ab initio structure evolution (MAISE) are recommended. Bayesian optimization in global optimization with first-principles energy expressions (GOFEE) and BEACON excels in optimizing expensive functions with fewer evaluations, which is ideal for computationally intensive problems. Generative models, such as those in iMatGen and CrystalGAN, are excellent for innovative materials design and for exploring unknown structures by learning complex distributions. For systems requiring relevant property modeling, GNNs in SCCOP and the graph network-optimization algorithm (GN-OA) are powerful tools. Finally, to leverage large datasets, consider language-based models such as LLaMA-2. For beginners, starting with USPEX or AIRSS is recommended, while CALYPSO and MAGUS are better suited for complex systems. MatterGen and iMatGen are ideal for innovative designs, while IM^{2}ODE is well suited to multi-objective optimization targets.

In general, conventional CSP methods remain successful due to their proven reliability and ability to handle complex systems^{[14,73,191,192,198]}. These methods are grounded in fundamental physical and chemical principles, making them robust and trustworthy for a wide range of materials. They also benefit from incorporating geometric constraints and prior knowledge. Despite being computationally intensive, ongoing improvements and the integration of ML techniques have further solidified their status in modern materials science. ML-based CSP methods, in contrast, can significantly reduce computational time compared to conventional DFT-driven searches. Traditional CSP approaches can take days to weeks to predict a structure on a small server containing dozens to hundreds of CPU cores, while ML models, once trained, can predict structures in seconds to minutes using the same computational resources^{[87,150,170]}. This efficiency arises because ML approaches learn from existing data, enabling effective feature extraction, rapid structure screening, and optimization, thereby offering a more cost-effective alternative to conventional methods.

In this review, we discussed the current progress in CSP, particularly focusing on the applications of ML in CSP. To help the readers understand the basic concepts, progress, and challenges in this field, we first introduced the basics of conventional CSP methods. Next, we reviewed ML models combined with general CSP steps, including descriptors in structure generation, GNNs in structure search, and ML force fields in structure optimization. The application of ML models has significantly reduced the time required for CSP, and ML-based CSP methods have helped to find more low-energy structures for desired compositions^{[113–115]}. We further discussed generative models, which differ greatly from ML models combined with general CSP steps. Generative models for CSP are entirely based on neural networks without DFT calculations; thus, they can be applied to very large systems.

Although ML models have made significant progress in solving CSP, they still face several challenges: (ⅰ) Overfitting and mode collapse: ML models may overfit the database, preventing them from identifying low-energy structures in CSP, or may suffer mode collapse in generative models. To mitigate overfitting, techniques such as data augmentation^{[223]}, dropout regularization^{[224]}, and ensemble learning^{[225]} can be employed. Additionally, early stopping and cross-validation can help prevent overfitting by ensuring the model generalizes well to unseen data; (ⅱ) Limited training data: ML models are often trained on stable or metastable structures stored in databases, which represent only a small part of the complex potential energy surface; thus, the generalization of ML models cannot be guaranteed. To address this, transfer learning^{[175]} and active learning can be used to enhance model performance by incrementally expanding the training dataset with more diverse structures; (ⅲ) Mismatch between local fitting models and global optimization algorithms: in CSP, ML models lack a theoretical guarantee of global generalization, which may cause global optimization algorithms to fail to converge to the correct results. This issue can be tackled by techniques such as multi-fidelity modeling^{[226]}, which combines high-fidelity simulations with ML predictions to improve the reliability of global optimization. Despite these challenges, we remain optimistic that ML models will ultimately solve the challenging task of CSP, similar to the advancements seen in protein structure prediction^{[227]}, thereby boosting materials science research and the discovery and design of new materials.
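Early stopping, mentioned in point (ⅰ), reduces to a small generic loop; the callables and the patience value below are illustrative:

```python
def train_with_early_stopping(train_step, val_score, max_epochs=200, patience=10):
    """Generic early-stopping loop.

    Stops when the validation score (lower is better) has not improved for
    `patience` consecutive epochs. `train_step` and `val_score` are
    user-supplied callables; all settings here are illustrative.
    """
    best, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_step(epoch)                  # one pass of model training
        score = val_score(epoch)           # evaluate on held-out data
        if score < best - 1e-12:
            best, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break                          # no improvement for `patience` epochs
    return best, best_epoch
```

In practice the model weights at `best_epoch` are checkpointed and restored, so the returned model is the one that generalized best rather than the last one trained.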

Writing-original draft preparation: Li CN

Proposed the conception and design: Li CN, Liang HP, Zhang X

References collection: Li CN, Liang HP, Zhao BQ

Writing-review and editing: Zhang X, Wei SH

Supervision: Zhang X, Wei SH

Not applicable.

We acknowledge financial support from the National Natural Science Foundation of China (Nos. 52172136, 11774416, 11991060, 12088101, and U2230402).

All authors declared that there are no conflicts of interest.

Not applicable.

Not applicable.

© The Author(s) 2024.

Monticelli L, Tieleman DP. Force fields for classical molecular dynamics. In: Monticelli L, Salonen E, editors. Biomolecular simulations. Methods in molecular biology. Humana Press; 2013. pp. 197-213. doi: 10.1007/978-1-62703-017-5_8

Kennedy J, Eberhart R. Particle swarm optimization. In: Proceedings of ICNN'95 - International Conference on Neural Networks; 1995 Nov 27 - Dec 01; Perth, Australia. IEEE; 1995. pp. 1942-8. doi: 10.1109/ICNN.1995.488968

Gerges F, Zouein G, Azar D. Genetic algorithms with local optima handling to solve sudoku puzzles. In: Proceedings of the 2018 International Conference on Computing and Artificial Intelligence. Association for Computing Machinery; 2018. pp. 19-22. doi: 10.1145/3194452.3194463

Mockus J. The Bayesian approach to global optimization. In: Drenick RF, Kozin F, editors. System modeling and optimization. 1982. pp. 473-81. doi: 10.1007/BFb0006170

Močkus J. On Bayesian methods for seeking the extremum. In: Optimization Techniques IFIP Technical Conference Novosibirsk; 1974 Jul 1-7. 1975. pp. 400-4. doi: 10.1007/3-540-07165-2_55

Kingma DP, Welling M. Auto-encoding variational bayes. arXiv. [Preprint.] Dec 10, 2022 [accessed on 2024 Sep 23]. Available from:

Goodfellow IJ, Pouget-Abadie J, Mirza M, et al. Generative adversarial networks. arXiv. [Preprint.] Jun 10, 2014 [accessed on 2024 Sep 23]. Available from:

_{2}O-NaCl and carbon oxide compounds with a symmetry-driven structure search algorithm

_{2}alloy

_{2}AgBiBr

_{6}perovskite through order–disordered transition: a first-principle study

_{2}S

_{2}in Li–S batteries: a first-principles study

_{x}(BN)

_{1−x}biphenylene networks

_{2}

_{2}

_{2}with high-

_{c}superconductivity

Titsias M. Variational learning of Inducing variables in sparse Gaussian processes. In: Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics. 2009. pp. 567-74. Available from:

_{2}: a system featuring lone pair structure candidates

_{2}and RuO

_{2}

Zhang L, Han J, Wang H, Saidi WA, Car R, E W. End-to-end symmetry preserving inter-atomic potential energy model for finite and extended systems. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems; Montréal, Canada. 2018. Available from:

Zeni C, Pinsler R, Zügner D, et al. MatterGen: a generative model for inorganic materials design. arXiv. [Preprint.] Jan 29, 2024 [accessed on 2024 Sep 23]. Available from:

Xie T, Fu X, Ganea OE, Barzilay R, Jaakkola T. Crystal diffusion variational autoencoder for periodic material generation. arXiv. [Preprint.] Mar 14, 2022 [accessed on 2024 Sep 23]. Available from:

Hoffmann J, Maestrati L, Sawada Y, Tang J, Sellier JM, Bengio Y. Data-driven approach to encoding and decoding 3-D crystal structures. arXiv. [Preprint.] Sep 3, 2019 [accessed on 2024 Sep 23]. Available from:

Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE. Neural message passing for quantum chemistry. arXiv. [Preprint.] Jun 12, 2017 [accessed on 2024 Sep 23]. Available from:

Haynes WM. CRC handbook of chemistry and physics. CRC Press; 2014. Available from:

_{2}electrocatalysts using active machine learning

Duvenaud DK, Maclaurin D, Iparraguirre J, et al. Convolutional networks on graphs for learning molecular fingerprints. arXiv. [Preprint.] Nov 3, 2015 [accessed on 2024 Sep 23]. Available from:

Li Y, Tarlow D, Brockschmidt M, Zemel R. Gated graph sequence neural networks. arXiv. [Preprint.] Sep 22, 2017 [accessed on 2024 Sep 23]. Available from:

Battaglia PW, Pascanu R, Lai M, Rezende D, Kavukcuoglu K. Interaction networks for learning about objects, relations and physics. arXiv. [Preprint.] Dec 1, 2016 [accessed on 2024 Sep 23]. Available from:

Bruna J, Zaremba W, Szlam A, LeCun Y. Spectral networks and locally connected networks on graphs. arXiv. [Preprint.] May 21, 2014 [accessed on 2024 Sep 23]. Available from:

Batatia I, Kovács DP, Simm GNC, Ortner C, Csányi G. MACE: higher order equivariant message passing neural networks for fast and accurate force fields. arXiv. [Preprint.] Jan 26, 2023 [accessed on 2024 Sep 23]. Available from:

Han S, Pool J, Tran J, Dally WJ. Learning both weights and connections for efficient neural networks. arXiv. [Preprint.] Oct 30, 2015 [accessed on 2024 Sep 23]. Available from:

Ramesh A, Pavlov M, Goh G, et al. Zero-shot text-to-image generation. arXiv. [Preprint.] Feb 26, 2021 [accessed on 2024 Sep 23]. Available from:

Yu J, Xu Y, Koh JY, et al. Scaling autoregressive models for content-rich rext-to-image generation. arXiv. [Preprint.] Jun 22, 2022 [accessed on 2024 Sep 23]. Available from:

Ho J, Chan W, Saharia C, et al. Imagen video: high definition video generation with diffusion models. arXiv. [Preprint.] Oct 5, 2022 [accessed on 2024 Sep 23]. Available from:

Singer U, Polyak A, Hayes T, et al. Make-a-video: text-to-video generation without text-video data. arXiv. [Preprint.] Sep 29, 2022 [accessed on 2024 Sep 23]. Available from:

Anil R, Dai AM, Firat O, et al. PaLM 2 technical report. arXiv. [Preprint.] Sep 13, 2023 [accessed on 2024 Sep 23]. Available from:

Nouira A, Sokolovska N, Crivello JC. CrystalGAN: learning to discover crystallographic structures with generative adversarial networks. arXiv. [Preprint.] May 25, 2019 [accessed on 2024 Sep 23]. Available from:

Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. arXiv. [Preprint.] Dec 16, 2020 [accessed on 2024 Sep 23]. Available from:

_{n}(

_{C}superconducting lanthanum and yttrium hydrides at high pressure

_{3}post-perovskite in super-Earth mantles

_{use}: multi-algorithm collaborative crystal structure prediction

_{2}phases with low band gaps by a multiobjective global optimization approach

Yang S, Cho K, Merchant A, et al. Scalable diffusion for materials generation. arXiv. [Preprint.] Jun 3, 2024 [accessed on 2024 Sep 23]. Available from:

Jiao R, Huang W, Lin P, et al. Crystal structure prediction by joint equivariant diffusion. arXiv. [Preprint.] Mar 7, 2024 [accessed on 2024 Sep 23]. Available from:

Gruver N, Sriram A, Madotto A, Wilson AG, Zitnick CL, Ulissi Z. Fine-tuned language models generate stable inorganic materials as text. arXiv. [Preprint.] Feb 6, 2024 [accessed on 2024 Sep 23]. Available from:

Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting.

Dietterich TG. Ensemble methods in machine learning. In: Multiple classifier systems. Springer Berlin Heidelberg; 2000. pp. 1-15. doi: 10.1007/3-540-45014-9_1