Clustering Molecular Energy Landscapes by Adaptive Network Embedding

In order to efficiently explore the chemical space of all possible small molecules, a common approach is to compress the dimension of the system to facilitate downstream machine learning tasks. Towards this end, we present a data driven approach for clustering potential energy landscapes of molecular structures by applying recently developed Network Embedding techniques, to obtain latent variables defined through the embedding function. To scale up the method, we also incorporate an entropy sensitive adaptive scheme for hierarchical sampling of the energy landscape, based on Metadynamics and Transition Path Theory. By taking into account the kinetic information implied by a system's energy landscape, we are able to interpret dynamical node-node relationships in reduced dimensions. We demonstrate the framework through Lennard-Jones (LJ) clusters and a human DNA sequence.


Introduction
The motivation of the project is the fundamental question of chemical spaces: how many organic molecules can be formed, and of these, how can we identify molecules with useful properties which can be chemically synthesized. Understanding how such molecules function in biological systems will have a tremendous impact on the development of new drugs and new treatments of diseases [1]. For example, the GDB-17 dataset [2] takes into account only molecules allowed by valency rules, excluding those unstable or unsynthesizable due to strained topologies or reactive functional groups, thereby reducing the enumeration to a manageable database size of 166.4 billion molecules formed of up to 17 atoms of C, N, O, S, and halogens. Fast nearest-neighbor searching of large generated datasets like the GDBs has led to methods for virtual screening and visualization of druglike molecules, with early success in neurotransmitter receptor and transporter ligands.
Most applications in biology and chemistry, such as protein folding, involve systems that behave according to some potential energy landscape of complex structure with a large number of local minima, saddle points (transition states), entropic plateaus and deep energy wells. Existing methods for energy landscape analysis focus on identifying local minima via geometric optimization, and finding transition states connecting them using steepest descent pathways [3]. To understand the dynamics over multiple orders of magnitude in space and time scales, we can take the viewpoint of the system as a network of local energy minima and entropic basins, connected by edges weighted according to the energy and entropy barriers that must be crossed for transitions between metastable states.
The proposed research is to apply recent Network Embedding techniques [4,5,6,7] to develop a data driven approach for clustering of potential energy landscapes and identifying latent variables of molecular structures, to further facilitate sampling and optimization of the chemical spaces and developing generative models for druglike small molecules. The latent variables are given by the output of the embedding function. By incorporating energetic information, we will be able to interpret node-node relationships in reduced dimensions that are consistent with chemical kinetics and are more likely to be aligned with synthesizability.
One multiscale challenge is due to the presence of deep potential wells. Metadynamics [8] uses a non-Markovian random walk to explore an energy landscape, which is smoothed by additive Gaussian terms after each step, and so are the transition probabilities of the associated random walk. As this happens, the process is discouraged from revisiting the lowest energy states repeatedly. The eventual outcome is a flattened energy landscape, as well as more efficient random walk sampling. The original potential can be recreated by subtracting the additive Gaussian terms.
To study transition processes in complex systems with rugged energy landscapes dominated by entropic effects, such as transitions that involve an entropically favorable flat region of the potential surface which the system can leave only by decreasing its entropy, it is necessary to examine the ensemble of all the transition paths as a probability space. The Transition Path Theory (TPT) [9] studies statistical properties of the reactive trajectories such as rates and dominant reaction pathways through probability currents between adjacent states. In [10], the definition of probability current in TPT was generalized from edges to individual nodes and networks, for characterizing transition states in the form of subnetworks.
In this article, we use Network Embedding techniques in combination with Metadynamics and TPT to produce adaptive embeddings that hierarchically convey information about the system's behavior at different scales. We adjust the edge weights of the network in a way that parallels Metadynamics to encourage exploration away from the local energy minima, and adopt TPT to capture micro dynamical features of interest. It is shown that these embeddings provide an effective way to understand and visualize inter-node relationships.
The rest of this article is structured as follows. In Section 2, we provide some background on Network Embedding, Metadynamics and TPT. In Section 3, we discuss more details of the implementation and demonstrate our method through Lennard-Jones (LJ) clusters. Section 4 contains an application to a less homogeneous system: DNA folding in a human telomere.

Network Embedding
The basic setup is an undirected network $G(S, E)$ with node set $S$ and edge set $E$; the generalization to directed graphs is straightforward. Let $|S| = n$ and let $A \in \mathbb{R}^{n \times n}$ be the weighted adjacency matrix of the network, with weight $a_{ij} \ge 0$ between nodes $v_i$ and $v_j$. The output will be a map

$$f: S \to \mathbb{R}^d, \qquad v_i \mapsto z_i. \tag{1}$$

For the methods discussed in this paper, the encoding function (1), referred to as direct encoding, is simply a lookup matrix:

$$f(v_i) = Z e_i,$$

where $Z \in \mathbb{R}^{d \times n}$ contains the embedding vectors for all nodes $v_i \in S$, and $e_i$ is an indicator vector, so that $f(v_i)$ simply gives the $i$th column of $Z$. The set of trainable parameters for direct encoding approaches is the embedding matrix $Z$, which is optimized directly. The vector $a_i = \{a_{ik}\}_{k=1}^{n}$ denotes the first order proximity between node $v_i$ and the other nodes. The second order proximity between $v_i$ and $v_j$ can be determined by the similarity between $a_i$ and $a_j$, which compares the pair's neighborhood structures.
In DeepWalk and Node2vec [4,5], short random walk simulations are used to determine proximity for each pair of nodes. More specifically, a random walk runs through the nodes of the network, with transition rates determined by the edge weights. Two nodes are close to each other if there is a high probability that a random walk simulation containing one node will also contain the other. Similarities between embedded nodes are given by the following conditional probabilities based on the SkipGram model:

$$P(j|i) = \frac{\exp(z_i^T z_j)}{\sum_{k=1}^{n} \exp(z_i^T z_k)}, \tag{2}$$

where $P(j|i)$ denotes the conditional probability that a random walk starting at node $v_i$ will include node $v_j$, and $z_i$ is the embedding of $v_i$. The learning is achieved by minimizing the following cross entropy loss using the Stochastic Gradient Descent (SGD) method:

$$\mathcal{L} = \sum_{v_i \in S} \sum_{v_j \in R(i)} -\log P(j|i), \tag{3}$$

where $R(i)$ represents a $k$-step random walk trial starting from node $i$. The efficiency of evaluating (2) can be greatly improved by using negative sampling [11] that randomly selects edges favoring less frequent ones.
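As a minimal sketch of the SkipGram similarity and the cross entropy loss described above, the following pure-Python fragment assumes the standard softmax form $P(j|i) \propto \exp(z_i^T z_j)$ and represents each walk as a list of node indices whose first entry is the starting node. The names `skipgram_prob`, `cross_entropy_loss`, `Z` and `walks`, and the toy embeddings, are illustrative assumptions and not part of the original method.

```python
import math

def skipgram_prob(Z, i, j):
    """Softmax similarity P(j|i) = exp(z_i . z_j) / sum_k exp(z_i . z_k).

    Z is a list of embedding vectors, one per node (assumed layout).
    """
    scores = [sum(a * b for a, b in zip(Z[i], zk)) for zk in Z]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return exps[j] / sum(exps)

def cross_entropy_loss(Z, walks):
    """Sum of -log P(j|i) over every node j visited by a walk started at i."""
    loss = 0.0
    for walk in walks:
        i = walk[0]
        for j in walk[1:]:
            loss -= math.log(skipgram_prob(Z, i, j))
    return loss

# Toy data (hypothetical): nodes 0 and 1 are embedded close together, node 2 far away.
Z = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
walks = [[0, 1], [1, 0], [2, 2]]
loss = cross_entropy_loss(Z, walks)
```

In this toy setting the walk data agree with the geometry (nodes 0 and 1 co-occur and are embedded nearby), so the loss is small; SGD would move `Z` to decrease it further. Negative sampling would replace the full sum over `k` in the denominator with a small random sample.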
In this paper, we will adopt a Network Embedding scheme introduced in [7], where the embeddings are produced via a sparse approximation of random walks on networks. For a given undirected network $G(A)$ with adjacency matrix $A = (a_{ij})$, let $D$ be the diagonal matrix such that $D_{ii} = \sum_j a_{ij}$. The volume of the graph is given by $v = \sum_i D_{ii}$. The Laplacian $L = I - D^{-1}A$ has eigen-decomposition $L = \Phi \Lambda \Phi^T$, where $\Lambda$ represents the diagonal matrix of ordered eigenvalues so that $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n$, and the eigenvectors are given by the columns of $\Phi$, denoted by $\phi_1, \phi_2, \dots, \phi_n$. Assuming the network is connected, the discrete Green function satisfies

$$G L = L G = I - \phi_1 \phi_1^T, \qquad \text{i.e.} \qquad G = \sum_{k=2}^{n} \frac{1}{\lambda_k}\, \phi_k \phi_k^T.$$

We can further define the commute time $ct(i, j)$ to be the mean time for the Markov process prescribed by the transition probability matrix $T = D^{-1}A = I - L$ on the network to travel from node $v_i$ to node $v_j$, and back to $v_i$. It is shown in [12] that the coordinate matrix for embeddings that preserve commute times has the following form

$$\Theta = \sqrt{v}\, (\Lambda^{\dagger})^{1/2}\, \Phi^T,$$

where $\Lambda^{\dagger}$ denotes the pseudo-inverse of $\Lambda$. To produce an efficient approximation of $\Theta$, we can assume that $T$ is local in the sense that, at least asymptotically, its columns have small support, and high powers of $T$ will be of low rank, which can be justified for potential driven systems by disparate transition rates between different neighboring metastable states. Taking higher powers of $T$ is equivalent to running the Markov chain forward in time, which allows for representations of the random walk, i.e. reaction pathways, at different time scales. Making use of this sparsity, we can produce compressed approximations to the dyadic powers of $T$ with its principal components, by using fast algorithms such as Lanczos Bidiagonalization for singular value decomposition (SVD), with complexity depending linearly on the number of nonzero elements.
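To make the commute time concrete, here is a small sketch that computes it directly from its definition as a round-trip mean first-passage time of the walk $T = D^{-1}A$, by solving the linear hitting-time equations with plain Gaussian elimination. This is not the spectral or compressed route used in the paper; the helper names `solve`, `hitting_times` and `commute_time` and the three-node path graph are assumptions chosen for illustration.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def hitting_times(A, target):
    """Mean number of steps for the walk T = D^{-1} A to first reach `target`."""
    n = len(A)
    deg = [sum(row) for row in A]
    others = [i for i in range(n) if i != target]
    # (I - T) restricted to the non-target nodes, with a right-hand side of ones
    M = [[(1.0 if i == k else 0.0) - A[i][k] / deg[i] for k in others] for i in others]
    h = solve(M, [1.0] * len(others))
    out = [0.0] * n
    for idx, i in enumerate(others):
        out[i] = h[idx]
    return out

def commute_time(A, i, j):
    """Mean time to go from i to j and back to i."""
    return hitting_times(A, j)[i] + hitting_times(A, i)[j]

A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph 0 - 1 - 2 with unit weights
ct_ends = commute_time(A, 0, 2)
ct_adjacent = commute_time(A, 0, 1)
```

For this path graph the volume is 4 and the effective resistances are 2 (end to end) and 1 (adjacent nodes), so the commute times are 8 and 4, which matches the known identity that commute time equals volume times effective resistance.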
The following scheme is a modification of the Diffusion Wavelet algorithm [13]. Starting with $T_0 = T$, at each iteration we take $U_k$ and $\Sigma_k$ to be the top left singular vectors and singular values of $T_k$, and use them to form a compressed representation $T_{k+1}$ of the next dyadic power $T_k^2$. From this, we can have a low-rank approximation to the Green function of the random walk, using the Schultz method:

$$G = \sum_{m \ge 0} T^m = \prod_{k \ge 0} \left(I + T^{2^k}\right),$$

understood on the complement of the stationary eigenvector, with each dyadic power replaced by its compressed approximation. The embedding matrix $\Theta$ thus satisfies $\Theta^T \Theta = vG$, where $v$ is the volume of the network defined as above. Denoting the leading singular values and left singular vectors of the matrix $vG$ by $\Sigma_G$ and $U_G$, we take $\Theta := \Sigma_G^{1/2} U_G^T$. We can further use $\Theta$ as a starting point, introduce parameters by multiplying its $j$th column by a weight $c_j$, and optimize the $\{c_j\}$ to minimize the cross entropy loss (3) via SGD. Moreover, for robustness of the algorithm, with a certain probability we can reintroduce singular vectors that were removed in previous truncations, as a residual correction technique.
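The dyadic product behind the Schultz method can be checked numerically. The sketch below assumes a small substochastic matrix (spectral radius below one, as obtained for instance after removing the stationary component), skips the SVD compression step entirely, and only illustrates the identity $\sum_{m \ge 0} T^m = \prod_{k \ge 0}(I + T^{2^k})$ by repeated squaring. The function names and the 2×2 matrix are illustrative assumptions.

```python
def matmul(X, Y):
    """Dense matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def schultz_green(T, levels):
    """Approximate sum_{m>=0} T^m by the product prod_{k<levels} (I + T^(2^k))."""
    n = len(T)
    G = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    P = [row[:] for row in T]  # holds the current dyadic power T^(2^k)
    for _ in range(levels):
        I_plus_P = [[P[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                    for i in range(n)]
        G = matmul(G, I_plus_P)
        P = matmul(P, P)       # square to reach the next dyadic power
    return G

# Hypothetical substochastic matrix with eigenvalues 0.5 and 0.1
T = [[0.2, 0.3], [0.1, 0.4]]
G = schultz_green(T, 6)        # equals the partial sum of T^m for m < 64

# (I - T) G should be the identity up to the negligible remainder T^64
I_minus_T = [[(1.0 if i == j else 0.0) - T[i][j] for j in range(2)] for i in range(2)]
residual = matmul(I_minus_T, G)
```

Only six squarings are needed to sum 64 powers of `T`, which is the reason dyadic powers are attractive; in the scheme described above each squared operator would additionally be truncated to its leading singular components.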

Metadynamics
Metadynamics was introduced in [8] as a technique to aid in the exploration of energy landscapes. The scheme is to create a non-Markovian, and approximately self-avoiding, random walk by adjusting the gradient of the energy landscape after each step with the addition of the derivative of a Gaussian term. Over time, these Gaussians eventually fill up the valleys in the energy potential, which allows the random walk to explore other areas of the landscape and leads to a more complete picture of the system dynamics. Specifically, after each step, the parameter $\phi_i = -\partial E / \partial x_i$, which represents the negative derivative of the energy with respect to the $i$th parameter, is adjusted according to:

$$\phi_i \to \phi_i - \frac{\partial}{\partial x_i}\, W \prod_{j} \exp\left(-\frac{|x_j - x_j^t|^2}{2\delta^2}\right),$$

where $W$, $\delta$ and $x_i^t$ are the height, width and center of the Gaussian respectively, to be chosen based on prior knowledge of the energy landscape.
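The Gaussian adjustment can be sketched in one dimension as follows. The double-well energy $E(x) = (x^2 - 1)^2$, the deterministic overdamped update, and the parameter values are assumptions made purely for illustration; the actual Metadynamics scheme in [8] acts on collective variables and includes thermal fluctuations.

```python
import math

def bias(x, centers, W, delta):
    """Sum of deposited Gaussians of height W and width delta, evaluated at x."""
    return sum(W * math.exp(-(x - c) ** 2 / (2 * delta ** 2)) for c in centers)

def bias_gradient(x, centers, W, delta):
    """Derivative of the accumulated bias with respect to x."""
    return sum(-W * (x - c) / delta ** 2 * math.exp(-(x - c) ** 2 / (2 * delta ** 2))
               for c in centers)

def energy_gradient(x):
    """Gradient of the illustrative double well E(x) = (x^2 - 1)^2."""
    return 4 * x * (x ** 2 - 1)

def metadynamics(x0, steps, W=0.2, delta=0.3, dt=0.01):
    """Descend on E + bias, depositing one Gaussian at the current position per step."""
    x, centers = x0, []
    for _ in range(steps):
        centers.append(x)
        x -= dt * (energy_gradient(x) + bias_gradient(x, centers, W, delta))
    return x, centers

x_final, centers = metadynamics(-0.9, 50)
```

Subtracting `bias(x, centers, W, delta)` from the biased landscape recovers the original potential, which is the property mentioned in the introduction.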

Transition Path Theory
The Transition Path Theory (TPT) [9] studies statistical properties of the reactive trajectories such as rates and dominant pathways through probability currents between adjacent states. In the simplest setting, given reactant state $A$ and product state $B$, any equilibrium path $X(t)$ oscillates infinitely many times between $A$ and $B$, with each oscillation from $A$ to $B$ being a reaction event. The reactive trajectories are the successive pieces of $X(t)$ during which it has left $A$ and is on its way to $B$ next, without coming back to $A$.
The discrete forward committor $q_i^+$ is defined as the probability that the process starting in node $v_i$ will first reach $B$ rather than $A$, and the discrete backward committor $q_i^-$ is defined as the probability that the process arriving in node $v_i$ last came from $A$ rather than $B$. For Markov processes with infinitesimal generator $T = (t_{ij})$, the forward committor satisfies the discrete Dirichlet equations

$$\sum_{j \in S} t_{ij}\, q_j^+ = 0, \qquad i \in S \setminus (A \cup B), \tag{8}$$

with the boundary condition

$$q_i^+ = 0 \ \text{for } i \in A, \qquad q_i^+ = 1 \ \text{for } i \in B.$$

The backward committor satisfies a similar equation. For time reversible processes, $q_i^- = 1 - q_i^+$. The probability current of reactive trajectories is the average rate at which they flow from one state to another when the process is at statistical equilibrium with distribution $\pi$, and can be obtained by

$$f_{ij} = \pi_i\, q_i^-\, t_{ij}\, q_j^+ \quad (i \ne j), \qquad f_{ii} = 0. \tag{9}$$

To deal with the fact that transitions between any two states can go forward and backward, the effective current can be introduced as

$$f_{ij}^+ = \max\left(f_{ij} - f_{ji},\, 0\right).$$
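A small worked example may help fix the TPT quantities. The sketch below assumes a discrete-time reversible chain with transition matrix `P` (so the generator is $P - I$ and its off-diagonal entries are those of `P`), solves the committor equations by a simple fixed-point sweep rather than a direct linear solve, and uses $q^- = 1 - q^+$. The four-node path graph and all function names are illustrative assumptions.

```python
def forward_committor(P, A, B, sweeps=500):
    """Iterate q_i = sum_j P_ij q_j off A and B, with q = 0 on A and q = 1 on B."""
    n = len(P)
    q = [1.0 if i in B else 0.0 for i in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            if i not in A and i not in B:
                q[i] = sum(P[i][j] * q[j] for j in range(n))
    return q

def effective_current(P, pi, q):
    """Reactive current f_ij = pi_i (1 - q_i) P_ij q_j, then f+_ij = max(f_ij - f_ji, 0)."""
    n = len(P)
    f = [[pi[i] * (1.0 - q[i]) * P[i][j] * q[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    return [[max(f[i][j] - f[j][i], 0.0) for j in range(n)] for i in range(n)]

# Simple random walk on the path 0 - 1 - 2 - 3, reactant A = {0}, product B = {3}
P = [[0, 1, 0, 0], [0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5], [0, 0, 1, 0]]
pi = [1 / 6, 2 / 6, 2 / 6, 1 / 6]  # stationary distribution, proportional to degree
q = forward_committor(P, {0}, {3})
f_plus = effective_current(P, pi, q)
```

Here the committor is linear along the path, $q^+ = (0, 1/3, 2/3, 1)$, and the effective current equals $1/18$ on every forward edge, reflecting conservation of reactive flux from $A$ to $B$.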

Network Embedding with Metadynamics
In this section, we introduce the Network Embedding techniques for energy landscapes with Metadynamics adjustments through a few examples.

8 atom Lennard-Jones cluster
LJ clusters are often used as a model for atomic or molecular dynamics within a fluid, in which the potential energy $E$ of a given configuration of atoms depends only on the distances between atoms:

$$E = 4\epsilon \sum_{i<j} \left[\left(\frac{\sigma}{r_{ij}}\right)^{12} - \left(\frac{\sigma}{r_{ij}}\right)^{6}\right],$$

where $r_{ij}$ denotes the distance between atoms $i$ and $j$, and the parameters $\sigma$ and $\epsilon$ represent the pair equilibrium separation and well depth. In the experiments below, we adopt reduced units (i.e., $\sigma = \epsilon = 1$).
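For reference, the pairwise LJ energy in reduced units can be evaluated with a few lines of Python. This is a plain illustrative sketch, not the Pele implementation; the function name `lj_energy` and the test configurations are assumptions.

```python
import math

def lj_energy(coords, sigma=1.0, eps=1.0):
    """Total Lennard-Jones energy of a list of 3D coordinates (reduced units by default)."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            sr6 = (sigma / r) ** 6
            energy += 4.0 * eps * (sr6 * sr6 - sr6)
    return energy

r_min = 2 ** (1 / 6)  # distance at which a single pair term reaches its minimum
dimer = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]
triangle = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0),
            (r_min / 2, r_min * math.sqrt(3) / 2, 0.0)]
e_dimer = lj_energy(dimer)
e_triangle = lj_energy(triangle)
```

With $\sigma = \epsilon = 1$ the pair term vanishes at $r = 1$ and attains its minimum value $-1$ at $r = 2^{1/6}$, so the dimer energy is $-1$ and the equilateral trimer energy is $-3$.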
To apply the Network Embedding techniques to the LJ clusters, we first generate a database of local energy minima using the Pele software available at [14]. The database was produced using a basin-hopping run of 500 steps, and consists of 8797 local minima connected by 8099 transition states. The database also contains thermodynamic information, including the potential energy at each local minimum. The embeddings here are based on a network constructed with nodes given by the local minima, and edges located between each pair of nodes where a transition state has been identified. The adjacency matrix has entries given by the energy barriers between metastable states.
Initially, every node is embedded into the vector space. Since most points near the global minimum will be close to each other in terms of commute time, this typically results in these nodes being embedded around the global minimum. Then, removing nodes that are further away from the global minimum and re-embedding only the nodes in the residual cluster reveals new, more detailed information about the remaining nodes. We do this by creating a new, smaller adjacency matrix including only the re-embedded points, and adjusting the edge weights with Gaussian terms in the embedding coordinates $\theta_i$ of the nodes (energy minima), according to Metadynamics (equation 11). Adaptively choosing the center node and removing distant nodes, the process will produce a series of hierarchical embeddings that provide information about the full energy landscape. Each level provides a representation of the energy landscape at a different scale. Figure 1 gives the disconnectivity tree of the original potential and the embeddings after applying the Metadynamics adjustment by equation 11 for the 8-atom LJ cluster. The colors of each minimum on the disconnectivity tree are matched to those nodes' embeddings in Figure 1, with some of the nodes, colored red, embedded to the same place. In the embeddings, nodes with closer dynamic relationships are clustered together, as a result of the shorter commute time distance between them.
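The pruning step of the hierarchical procedure can be sketched as below. The selection rule (a Euclidean radius around the embedding of the chosen center node) and the names `prune_and_restrict`, `coords`, `center` and `radius` are assumptions for illustration; the Metadynamics adjustment of the retained edge weights is deliberately left out, since its exact form is given by equation 11.

```python
import math

def prune_and_restrict(A, coords, center, radius):
    """Keep nodes embedded within `radius` of node `center`.

    Returns the kept node indices and the adjacency matrix restricted to them,
    ready to be re-embedded at the next level.
    """
    keep = [i for i, c in enumerate(coords)
            if math.dist(c, coords[center]) <= radius]
    sub = [[A[i][j] for j in keep] for i in keep]
    return keep, sub

# Hypothetical 4-node example: node 3 is embedded far from the center node 0
A = [[0, 2, 1, 0],
     [2, 0, 1, 0],
     [1, 1, 0, 3],
     [0, 0, 3, 0]]
coords = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (2.0, 2.0)]
keep, sub = prune_and_restrict(A, coords, center=0, radius=0.5)
```

Repeating this step on `sub` with a new embedding and an adjusted set of weights yields the sequence of levels described above, each one zooming in on the nodes closest to the chosen center.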
On a larger scale, we can also see how the higher energy nodes relate to the nodes with the two lowest energies. Specifically, the global minimum (in dark blue) is closer to the nodes in the middle of the tree (in orange), while the second lowest local minimum is much closer to the three highest energy nodes. This allows us to draw conclusions about a lowest commute time path through these states: one of the orange colored states might transition directly to the global minimum, while one of the higher energy states would be more likely to transition to the second lowest energy state first, and then either remain there or transition on to the global minimum. For comparison, we also investigated the inter-node relationships by directly computing the commute times between each pair of nodes. The commute time between nodes $v_i$ and $v_j$, or mean time for the Markov process on the graph prescribed by the Laplacian $L$ to travel from $v_i$ to $v_j$ and back to $v_i$, is given by

$$ct(i, j) = v \sum_{k=2}^{n} \frac{1}{\lambda_k} \left(\phi_k(i) - \phi_k(j)\right)^2,$$

where $\lambda_k$ are the eigenvalues of $L$, and $\phi_k$ are the corresponding eigenvectors as defined in Section 2. The off-diagonal entries of the Laplacian for an LJ cluster with $\kappa$ degrees of freedom are determined by the energy barriers together with $O$ and $O_k$, the point group orders of the local minima and transition states, respectively, and $\bar{\nu}$ and $\bar{\nu}_k$, the corresponding mean vibrational frequencies.

Multi-level embeddings of LJ 38 cluster
Figure 3 shows the disconnectivity tree of the 38-atom LJ cluster. For simplicity, we construct the adjacency matrix for this network with entries given by the energy barriers between states. Figure 4 gives the hierarchical embedding of the LJ-38 cluster. The left image shows the initial embeddings, colored by their commute time distances from the global minimum. We re-embedded the points of interest with the Metadynamics adjustment to the potential with two iterations and obtained the right image. The hierarchical structure of the embedding process is organized as follows. In the first level embeddings (Fig. 4, left), the nodes are embedded consistently with their commute time distances from the global minimum, which correspond loosely to potential energy level. In particular, the nodes with energy $E > -170$, which are the highest energy clusters in the tree, are embedded further from the global minimum. In the next embedding, these nodes are removed due to their greater distances from the global minimum, and more central clusters are re-embedded. As we consider the second level and later embeddings, this pattern repeats: each re-embedding reveals a new "layer" of nodes that are embedded closer to the global minimum, which correspond to nodes on the disconnectivity tree that are of lower energy than the nodes removed in the previous level. In particular, at the 3rd level (Fig. 4, right), the embeddings have 4 small "spokes" originating from a central cluster containing the global minimum. However, these embeddings provide additional context: nodes that are embedded within a particular "spoke" are more closely related, which means the system is more likely to transition between these states. Since the nodes do not all come from the same group on the disconnectivity tree, these embeddings can also reveal interactions between nodes that aren't indicated on the disconnectivity tree.
Additionally, since the spokes are connected to the cluster containing the global minimum, we can conclude that each spoke represents a potential transition pathway from the outer edge of the cluster to the center. In other words, if the system is at a state represented by the outer point of one of the spokes, its most likely path toward the global minimum will be to travel through the states represented by other nodes in the same spoke.
We can apply similar reasoning to higher level embeddings. Each level reveals a more detailed picture of the dynamics of a different part of the system's energy landscape. The first levels give a coarse-grained picture, only identifying broad groups of high energy and low energy nodes, while later levels give a more fine-grained visualization of the nodes most closely related to the global minimum.
Often, we want a more detailed, finer grained visualization of the energy landscape than the disconnectivity tree in Figure 3 can provide. It is informative, therefore, to re-embed the parts of the network of greatest interest to gain further insight. Here we focus on the lowest energy parts of the LJ energy landscape. We repeated the above process using the subnetwork consisting only of the nodes with potential energy $< -170.9$, that is, the 163 lowest energy nodes. Figure 5 shows the results of this experiment. We now take a closer look at the information flow along the hierarchical sampling with Metadynamics. In the first level, without the Metadynamics adjustment, nodes in these embeddings (both Figure 4 and Figure 5) are clustered according to their similarity in terms of commute time. In other words, nodes that can be quickly and frequently reached from either the global minimum or the second lowest energy node will be grouped near them. As a result, most of the nodes we are most interested in end up in the same cluster, and the embeddings obtained from the first application of the embedding method only give useful clusters for nodes that are more distant in terms of commute time from the part of the graph of interest. As we progress through additional levels, we pull apart the cluster containing the global minimum, positioned near the origin in the first level's embeddings, until the final level's embeddings give details of the dynamics of the process within this cluster.
After the second level of embedding with Metadynamics adjustment, if two nodes share a cluster or are close in the embedding space, it indicates that the system can easily transition between those nodes, with a relatively low energy barrier. As a result, the groupings seen in these embeddings correspond to the groupings in the disconnectivity tree for this system [3,14]. For instance, one of the clusters in the second level embeddings corresponds to the global minimum and its nearest neighbors, pictured directly right of center in the disconnectivity tree in Figure 3. Clusters can be mapped to the disconnectivity tree by comparing the potential energies of the nodes within the cluster to the tree.
The difference is that after the third level, some of the clusters instead represent combinations of multiple disconnectivity tree groups; this is a result of re-embedding the nodes to spread out those that were previously near the origin. Such nodes ended up being embedded in or near the clusters they are most closely related to, even though they are not actually members of the respective tree groupings. For instance, the global minimum is embedded directly next to a node from a neighboring tree group.
In other words, the first level's embeddings tell us about higher energy nodes and those that are more distant from the global minimum, and further levels reveal information about parts of the network that are closer to the global minimum. Additionally, these embeddings are useful for identifying transition paths. The structure of the third level embeddings, in particular, reveals four transition paths: if the system is initialized from a node near the outside of one of these "spokes", its lowest energy path to the global minimum will involve following the spoke into the center cluster. We can also use these embeddings to observe the results of entropic changes to the system. In Figure 6, the embeddings for this cluster under two additional temperature conditions are shown. The results of the temperature change are consistent with what was observed previously with the 8-atom cluster. Namely, at lower temperatures (Fig. 6, left) closely related nodes are more likely to be embedded much nearer each other, creating the impression that there are fewer embeddings, while at higher temperatures (Fig. 6, right), there is greater variation in the node embeddings.

Human Telomere Folding
We now further develop the Network Embedding technique for organic molecular structures and apply it to a more complex problem: DNA folding in a human telomere. More specifically, we consider a sequence of 22 nucleotide bases, $\mathrm{A(G_3TTA)_3G_3}$, which repeats within human telomeres. This sequence is known to form a G-quadruplex, a type of secondary structure formed by groups of four guanine bases called G-tetrads. Its structure and potential energy landscape were previously investigated in [15], where the potential energy landscape was calculated using the HiRE-RNA model for coarse-grained DNA [16] with 6 or 7 atoms considered for each of the 22 nucleotides in the telomere. We use this database as a starting point. In particular, we construct a network with nodes given by the 4000 lowest-energy local minima, and edges between nodes determined by the transition states connecting them.

Figure 7: The four-strand G-quadruplex structure (PDB structure 1KF1), with guanine nucleotides colored green. Image produced with Chimera [17].

For the first experiments, we used a random walk based on the energy barriers between states. Figure 8 shows the results of the initial embedding and the fourth-level embedding with Metadynamics adjustments, based on an energy landscape adjusted by a Gaussian term with width 0.75 and height 1 after each successive embedding. As in the LJ cluster experiment, the first embedding includes all nodes in the network, while each successive embedding shows a re-embedding of the subnetwork of nodes most closely related to the global minimum (that is, all nodes whose previous embeddings lie within a small tolerance of the global minimum). As with the LJ clusters, each level of embedding represents "zooming in" on the part of the network around the global minimum. The first level gives us a global view: the global minimum and its nearest neighbors (with respect to commute times) are clustered at the origin, represented by a dark blue dot. The red and orange dots furthest away from the origin represent local minima which are more distantly related, requiring multiple steps or higher energies to transition to the global minimum. Potential transition paths can be identified by starting at one of these points, and moving toward the origin along nearby points. At the second and each of the following levels, the nodes embedded closest to the global minimum are re-embedded to give us a more detailed inspection of the relationships of those nodes. The likely transition paths here can be constructed similarly.

Multiscale embedding with TPT
We now make use of the multiscale nature of the molecular dynamics to speed up and scale up the computation. When a subset of the variables evolves more quickly than the others, dimension reduction can be achieved through the quasi-equilibrium of the fast variables, such that the averaged macroscopic effect can replace microscopic details. For molecular configurations where the energy landscape is "flatter" and state transitions occur faster, this principle of averaging applies. From the space perspective, if the transition paths between nodes representing different molecular configurations have relatively low energy barriers, the nodes should be closely related in terms of mean commute time. We can expect these low barrier transitions to describe subtler changes, likely involving position changes for only a small number of atoms. In other words, there is a time-space scale separation, i.e., on a global scale, transitions between two states tend to require larger, higher dimensional changes (associated with greater energy expenditures), while locally, transitions within some subnetwork around a point of interest require far fewer degrees of freedom.
For energy landscapes with a large number of local minima, the corresponding networks contain large numbers of nodes. In fact, for the LJ clusters, the number of minima increases exponentially with the number of atoms. In these situations, as we have seen, it is beneficial to have a method for Network Embedding that focuses on locally embedding regions of the network that are of particular interest, for example the subnetwork consisting of the global minimum and its nearest neighbors. Since the commute times between states in this subnetwork are short, the system moves quickly between these configurations and the nodes tend to be embedded near each other. The states in these subnetworks often represent molecular configurations that are similar, differing only by a simple conformational change; therefore these subnetworks can have dramatically reduced dimensions compared with the full network.
We now present an alternate formulation, which replaces the adjacency matrix with the probability current matrix from TPT. For large networks, particularly those with irregular structures, the committor functions needed to apply TPT to the full network may prove difficult or impractical to compute. Focusing our application of TPT onto a smaller, localized subnetwork avoids this difficulty, allowing us to take advantage of the additional information TPT offers. In the following experiments, we first embed all the local minima using the adjacency matrix constructed with energy barriers, and apply the Network Embedding with Metadynamics. At each level, we remove nodes that have longer commute times to the global minimum, until the system is reduced to a subnetwork such that we can apply TPT and compute the committor functions (8) and probability currents (9).
Then, each subnetwork is embedded into $\mathbb{R}^3$ using the effective probability current in place of the adjacency matrix. Here we apply the hierarchical embedding procedure to the subnetwork of nodes surrounding the global minimum as in previous sections, but the same process could be used to examine other domains of the network aside from the global minimum to obtain a complete picture of the energy landscape.
The distances between nodes within each subnetwork preserve the commute times. Since the cross entropy loss minimization is applied to the overall network at each step, we can expect the distances between node embeddings to be consistent with the likelihoods of a transition, similarly to the previous examples, and therefore we can interpret the embeddings and identify possible transition paths in the same way. We can demonstrate its efficacy on a smaller system, the 8-atom LJ cluster, as illustrated in Figure 9. We still see that the two nodes with potential energies near -19.2 (colored yellow in Figs. 1 and 2) are closely connected, and the nodes represented in red and orange are closer to the lowest energy nodes than to each other, but in this case the short distance between the 2 lowest energy nodes reflects a higher transition rate between them. The highest energy nodes, embedded in red, are also embedded separately in this case, whereas their adjacency matrix based embeddings were identical. These embeddings indicate that a transition path (shown in Fig. 9) from the highest energy node to the slightly lower energy node labeled 5 might pass through nodes 1 and 2 on the way. Direct simulations of this system confirm this transition path. Hence, these embeddings can be useful for predicting mechanisms by which a molecule changes between two configurations.
We now return to the human telomere molecule. Figure 10 shows the embeddings produced via the TPT process described above. The colors of local minima in Figure 10 are determined by the commute times between those nodes and the global minimum, which is embedded in dark blue. It is immediately apparent that the local minima are grouped according to these commute times, with separate clusters containing most of the red, yellow, and light blue nodes. Within each of these clusters, the nodes are relatively close in terms of commute times, indicating that these molecular configurations are similar up to some simple molecular change. As with the 8-atom LJ cluster, these embeddings suggest possible transition paths. For example, if the system starts at one of the states furthest from the global minimum (colored in red, clustered in the upper left of the right figure), one would expect a transition path to the global minimum to travel through the light blue and yellow clusters to reach the node in dark blue. It is also worth noting that the embeddings given by the TPT approach appear to retain more nodes compared to the adjacency matrix-based embeddings in Figure 8, which results from the greater number of nonzero edge weights in the probability current matrix.

Conclusion and future work
This article presents a framework for the analysis of energy landscape data. Adaptive network embeddings that combine the ideas of Metadynamics and TPT with Node Embedding techniques can be used to aid in the interpretation and simplification of energy landscape data. In this embedding scheme, the network itself (both its edge weights and the set of nodes under consideration) can be adjusted to more effectively focus on particular areas of the graph. Future research will involve applying and developing the method for functions of specific molecules in the context of certain chemical or biological reaction networks.
We anticipate that these energy landscape-based network embeddings can be used to advance the models currently used to explore the space of small molecules and identify potential new drugs. For example, the latent variables learned from molecular energy landscapes could be incorporated into Variational Autoencoders or other generative models as node attributes [18], in much the same way that 3D representations of molecules are already used. The inclusion of these additional latent variables would lead to a multi-modal generative method that takes into account kinetic information of the molecular systems for generating more realistic, chemically viable molecules.

Figure 1: Disconnectivity tree and Metadynamics based embeddings for the Lennard-Jones cluster with 8 atoms. Left: Disconnectivity tree of all local minima. Right: Embeddings for the local minima after applying the Metadynamics adjustment. Color scheme represents the potential energy, e.g., dark blue denotes the lowest and red the highest. Closely related minima have very similar or identical embeddings, e.g., both yellow minima are embedded at the yellow point on the right.

Figure 2: The 8-atom LJ network of local minima. Edge lengths are proportional to commute times. Node colors are chosen to match those in Figure 1.

Figure 3: Disconnectivity tree for the 38-atom LJ cluster. The structures of the two lowest-energy configurations are also pictured.

Figure 4: Hierarchical embeddings for the LJ cluster with 38 atoms. Pictured are the embeddings before (left) and after (right) applying Metadynamics. Color scheme denotes commute time from the global minimum, with dark blue being the shortest distances and red the furthest distances.

Figure 5: Embeddings of the local minima of the 38-atom LJ cluster with potential energies less than -170.9. The figure shows the output of 2 levels of embedding with Metadynamics adjustment. Color scheme denotes commute time from the global minimum.

Figure 6: Metadynamics-based embeddings for the 38-atom cluster at the temperatures T = 0.08 (left) and T = 1 (right). Color scheme denotes commute time from the global minimum.

Figure 8: Embeddings of the local minima network for the human telomere sequence, based on the adjacency matrix and the Metadynamics adjustment. Color scheme denotes commute time from the global minimum.

Figure 9: Hierarchical embeddings of the local minima network for the 8-atom LJ cluster, based on a TPT-based subnetwork consisting of the nodes clustered around the global minimum (in dark blue), and the Metadynamics adjustment. Colors of the embeddings denote potential energy, as in the previous illustrations.

Figure 10: Hierarchical embeddings of the local minima network for the human telomere sequence. Left: the embeddings of the full network based on the adjacency matrix. Right: embeddings of the subnetwork near the global minimum using Metadynamics and TPT. Colors of the embeddings denote commute time distance from the global minimum.
Making use of the sparsity of networks, recently developed Network Embedding methods can scale linearly with regard to the number of edges. Major techniques include factorization of functions of the adjacency matrix, random walk sampling of node neighborhoods, and deep network learning techniques. The idea is that nodes with close proximities in the network should have similar embeddings in the latent space.