﻿<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-id journal-id-type="nlm-ta">Complex Eng. Syst.</journal-id>
      <journal-id journal-id-type="publisher-id">COMENGSYS</journal-id>
      <journal-title-group>
        <journal-title>Complex Engineering Systems</journal-title>
      </journal-title-group>
      <issn pub-type="epub">2770-6249</issn>
      <publisher>
        <publisher-name>OAE Publishing Inc.</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
	 <article-id pub-id-type="doi">10.20517/ces.2025.88</article-id>
      <article-categories>
        <subj-group>
          <subject>Research Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Drug-target interaction prediction via hierarchical gated attention and information bottleneck</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Song</surname>
            <given-names>Shengli</given-names>
          </name>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Chen</surname>
            <given-names>Zihao</given-names>
          </name>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Wang</surname>
            <given-names>Yihan</given-names>
          </name>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Guo</surname>
            <given-names>Quanming</given-names>
          </name>
        </contrib>
        <contrib contrib-type="author" corresp="yes">
          <name>
            <surname>Guo</surname>
            <given-names>Yanbu</given-names>
          </name>
          <xref ref-type="corresp" rid="cor1" />
        </contrib>
      </contrib-group>
      <aff id="I">College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, Henan, China.</aff>
      <author-notes>
        <corresp id="cor1">Correspondence to: Dr. Yanbu Guo, College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, Henan, China. E-mail: <email>guoyanbu@zzuli.edu.cn</email></corresp>
     
	 
	 
	<fn fn-type="other">
          <p>
            <bold>Received:</bold> 28 Dec 2025 | <bold>First Decision:</bold> 13 Feb 2026 | <bold>Revised:</bold> 19 Mar 2026 | <bold>Accepted:</bold> 15 Apr 2026 | <bold>Published:</bold> 28 May 2026</p>
        </fn>
        <fn fn-type="other">
          <p>
            <bold>Academic Editor:</bold> Wenwu Yu | <bold>Copy Editor:</bold> Fangling Lan |  <bold>Production Editor:</bold> Fangling Lan</p>
        </fn>
      </author-notes>
	  <pub-date pub-type="ppub">
        <year>2026</year>
      </pub-date>
      <pub-date pub-type="epub">
        <day>28</day>
        <month>5</month>
        <year>2026</year>
      </pub-date>
     <volume>6</volume>
	  <issue>2</issue>
	 <elocation-id>10</elocation-id>
	 
	
	 
      <permissions>
        <copyright-statement>© The Author(s) 2026.</copyright-statement>
        <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
          <license-p>© The Author(s) 2026. <bold>Open Access</bold> This article is licensed under a Creative Commons Attribution 4.0 International License (<uri xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</uri>), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.</license-p>
        </license>
      </permissions>
      <abstract>
        <p>Accurate drug-target interaction (DTI) prediction is essential for drug repositioning and accelerating drug discovery. Deep learning methods have made remarkable progress over traditional biological experiments, yet existing models often fail to capture local node topologies and multi-view semantic dependencies simultaneously. Moreover, most methods rely on basic loss functions that cannot filter out redundant noise, hindering the learning of compact and discriminative node representations. In this work, we propose a DTI prediction framework that integrates a hierarchical gated multi-head attention (HGMA) mechanism with an information bottleneck (IB) strategy. HGMA adopts a two-layer architecture: the first layer performs weighted aggregation over semantic meta-paths, and the second layer fuses attention heads via an adaptive gating mechanism, enhancing drug and target representations. The IB module compresses inputs by removing task-irrelevant redundancy while preserving predictive information, improving discriminability and generalization. Extensive experiments show that our model consistently outperforms state-of-the-art methods in both accuracy and robustness.</p>
      </abstract>
      <kwd-group>
        <kwd>Complex biological systems</kwd>
        <kwd>drug-target interaction</kwd>
        <kwd>hierarchical gated mechanism</kwd>
        <kwd>graph attention</kwd>
        <kwd>information bottleneck</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec1">
      <title>1. INTRODUCTION</title>
      <p>Drug-target interaction (DTI) prediction is a fundamental task in drug discovery, facilitating the identification of potential interactions between drugs and their cognate biological targets<sup>[<xref ref-type="bibr" rid="B1">1</xref>]</sup>. Accurate DTI prediction provides critical insights into drug repositioning, polypharmacology, resistance mechanism analysis, and adverse effect prediction<sup>[<xref ref-type="bibr" rid="B2">2</xref>]</sup>. Although <italic>in vitro</italic> experimental methods for assessing DTIs are regarded as reliable, they are often expensive, time-consuming, and impractical for large-scale datasets. Consequently, computational approaches for DTI prediction have garnered increasing attention<sup>[<xref ref-type="bibr" rid="B3">3</xref>]</sup> owing to their ability to deliver rapid and accurate interaction predictions.</p>
      <p>Traditional computational methods fall into structure-based and ligand-based categories<sup>[<xref ref-type="bibr" rid="B4">4</xref>]</sup>. Structure-based methods (e.g., molecular docking) rely on three-dimensional (3D) protein structures, but their applicability is limited by the scarcity of experimentally resolved targets. Ligand-based methods assume that similar ligands exhibit similar activities, yet they depend heavily on known ligand-target interactions, which limits performance in data-sparse scenarios<sup>[<xref ref-type="bibr" rid="B5">5</xref>]</sup>. To address these limitations, machine learning-based approaches have been increasingly explored. Under the "guilt-by-association" principle (similar drugs tend to interact with similar targets), these methods infer unobserved DTIs from known interactions<sup>[<xref ref-type="bibr" rid="B6">6</xref>]</sup>. Machine learning methods for DTI prediction include traditional, network-based, and deep learning-based approaches. Traditional methods use handcrafted features from drug chemical structures and protein sequences, employing classifiers such as support vector machines (SVMs)<sup>[<xref ref-type="bibr" rid="B7">7</xref>]</sup>, logistic regression (LR)<sup>[<xref ref-type="bibr" rid="B8">8</xref>]</sup>, random forest (RF)<sup>[<xref ref-type="bibr" rid="B9">9</xref>]</sup>, and k-nearest neighbors (KNNs)<sup>[<xref ref-type="bibr" rid="B10">10</xref>]</sup>. For instance, Yamanishi <italic>et al.</italic><sup>[<xref ref-type="bibr" rid="B11">11</xref>]</sup> proposed a kernel-based method integrating chemical and genomic spaces for binary DTI classification, while Jacob <italic>et al.</italic><sup>[<xref ref-type="bibr" rid="B12">12</xref>]</sup> introduced a multi-task SVM framework to capture inter-target relationships. Despite their successes, reliance on manual features and shallow representations limits their ability to capture complex, nonlinear relationships in high-dimensional, sparse, and heterogeneous biomedical data.</p>
      <p>Graph-based deep learning methods have emerged as a promising paradigm for modeling biomedical network topologies. However, existing approaches face three substantial challenges. First, biomedical networks are heterogeneous: they contain diverse node and edge types whose high-order semantic dependencies standard graph neural networks (GNNs) struggle to capture. Meta-paths address this by offering multi-perspective semantics for node representations. Second, traditional graph convolutions aggregate neighbors indiscriminately, neglecting their varying importance and thereby limiting fine-grained local feature extraction. Attention mechanisms address this limitation by adaptively weighting task-relevant neighbors, enabling the simultaneous modeling of local and global dependencies. Third, heterogeneous networks contain noisy, redundant information, so naïve integration of multi-view features from meta-paths and attention mechanisms can yield ambiguous embeddings. To this end, the information bottleneck (IB) principle<sup>[<xref ref-type="bibr" rid="B13">13</xref>]</sup> compresses input data into compact drug-target representations, filtering noise while preserving predictive features.</p>
      <p>Motivated by these challenges, we propose a DTI prediction framework (HGMAIB) tailored to heterogeneous biomedical networks. HGMAIB has two core components. The hierarchical gated multi-head attention (HGMA) module employs a two-tier structure: the first layer performs weighted aggregation over semantic meta-paths to capture multi-view dependencies; the second layer adaptively fuses attention head outputs via a gating mechanism, enhancing local structural representations of drugs and targets. The IB module further compresses redundant information while retaining task-relevant features, enabling the learning of compact and discriminative latent representations. The main contributions are summarized as follows:</p>
      <p>• A HGMA mechanism is proposed to jointly model fine-grained local structural interactions and multi-view semantic information of drug and target nodes within heterogeneous biomedical networks.</p>
      <p>• An IB module is integrated to effectively suppress redundant features and noise while preserving key predictive information, yielding compact and discriminative joint representations for DTI prediction.</p>
      <p>• Extensive experiments conducted on two widely used benchmark DTI datasets demonstrate that the proposed HGMAIB model consistently outperforms several state-of-the-art methods.</p>
    </sec>
    <sec id="sec2">
      <title>2. RELATED WORK</title>
      <sec id="sec2-1">
        <title>2.1. Graph computational method-based DTI prediction</title>
        <p>Early DTI prediction methods framed the task as link prediction on heterogeneous networks, learning topology-preserving embeddings. DTINet<sup>[<xref ref-type="bibr" rid="B14">14</xref>]</sup> integrated multimodal features via random walks followed by dimensionality reduction. With the rise of GNNs<sup>[<xref ref-type="bibr" rid="B15">15</xref>]</sup>, NeoDTI<sup>[<xref ref-type="bibr" rid="B16">16</xref>]</sup> jointly modeled structural and attribute information through neighborhood aggregation, while GCN-DTI<sup>[<xref ref-type="bibr" rid="B17">17</xref>]</sup> and EEG-DTI<sup>[<xref ref-type="bibr" rid="B18">18</xref>]</sup> used graph convolutional networks (GCNs) to extract drug and protein features separately. However, these approaches rely on simplistic aggregation (e.g., mean or sum pooling), failing to capture fine-grained local structural characteristics. To better exploit semantic dependencies, researchers have introduced attention mechanisms and meta-paths. IMCHGAN<sup>[<xref ref-type="bibr" rid="B19">19</xref>]</sup> employed hierarchical attention to distinguish meta-path semantics, and AMGDTI<sup>[<xref ref-type="bibr" rid="B20">20</xref>]</sup> adaptively fused multi-view features from multiple meta-paths. DHGT-DTI<sup>[<xref ref-type="bibr" rid="B21">21</xref>]</sup> further proposed a dual-view heterogeneous graph to capture local and global interaction patterns. Nevertheless, naïve fusion of multi-view features often leads to information redundancy, and high-dimensional heterogeneous structures introduce noise, degrading representation compactness<sup>[<xref ref-type="bibr" rid="B22">22</xref>]</sup>. More recent methods adopt contrastive learning and causal inference to enhance model robustness and address data sparsity. 
CE-DTI<sup>[<xref ref-type="bibr" rid="B23">23</xref>]</sup> incorporated causal inference to mitigate bias; SGCL-DTI<sup>[<xref ref-type="bibr" rid="B24">24</xref>]</sup> and SHGCL-DTI<sup>[<xref ref-type="bibr" rid="B25">25</xref>]</sup> applied structure-aware contrastive learning to maximize mutual information between local and global views; DSS-DTI<sup>[<xref ref-type="bibr" rid="B26">26</xref>]</sup> used a dual-scale spatiotemporal framework; and MIDTI<sup>[<xref ref-type="bibr" rid="B27">27</xref>]</sup> employed multi-view interaction modeling. Despite these advances, balancing fine-grained local substructure extraction with global semantic redundancy reduction remains a key challenge, motivating a framework that integrates structural gating with information compression.</p>
      </sec>
      <sec id="sec2-2">
        <title>2.2. Graph attention network-based applications</title>
        <p>The attention mechanism, inspired by human visual cognition, enables models to focus on salient input features while suppressing irrelevant information. Graph Attention Networks (GATs)<sup>[<xref ref-type="bibr" rid="B28">28</xref>]</sup> extend this paradigm to non-Euclidean domains. Attention mechanisms were first introduced to address long-range dependencies in machine translation. The Transformer relies entirely on self-attention to capture global dependencies, outperforming recurrent neural networks (RNNs). Attention mechanisms also capture spatial relationships and regions of interest. Non-local neural networks<sup>[<xref ref-type="bibr" rid="B29">29</xref>]</sup> used self-attention to model pixel-level long-range dependencies, overcoming convolution’s receptive field limits. Vision Transformers<sup>[<xref ref-type="bibr" rid="B30">30</xref>]</sup> further demonstrate that purely attention-based architectures excel at image recognition. Moreover, biological data (e.g., molecular structures, protein interaction networks) naturally lend themselves to graph representations. AlphaFold<sup>[<xref ref-type="bibr" rid="B31">31</xref>]</sup> heavily utilizes attention to predict 3D protein structures by modeling pairwise residue relationships. For molecular property prediction, GATs learn molecular fingerprints by attending to critical atoms and functional groups<sup>[<xref ref-type="bibr" rid="B32">32</xref>]</sup>. In protein-protein interaction (PPI) prediction, attention identifies key interface residues by weighting biologically active regions. These successes underscore the ability of graph attention mechanisms to capture both local structural details and global functional dependencies in biomedical data.</p>
      </sec>
    </sec>
    <sec id="sec3">
      <title>3. METHODS</title>
      <sec id="sec3-1">
        <title>3.1. Overall framework</title>
        <p>The proposed HGMAIB model comprises four core modules [<xref ref-type="fig" rid="fig1">Figure 1</xref>]. (1) Node representation initialization: Node2Vec<sup>[<xref ref-type="bibr" rid="B33">33</xref>]</sup> generates low-dimensional embeddings for drug and protein nodes as informative input features. (2) Adaptive meta-path selection: the model automatically identifies discriminative semantic paths to guide high-order semantic information propagation and aggregation. (3) Multi-semantic structural modeling: a multi-step graph convolutional framework with weighted residual connections captures deep semantic dependencies while mitigating gradient vanishing and feature over-smoothing, and a hierarchical gated multi-head attention mechanism further learns node-level local structural patterns and integrates multi-perspective semantic information. (4) IB: the module compresses redundant information and extracts task-relevant joint embeddings, enhancing discriminative power and generalization.</p>
        <fig id="fig1" position="float">
          <label>Figure 1</label>
          <caption>
            <p>(A) Overall architecture of HGMAIB: (a) Node representation initialization - Node2Vec extracts drug and target features. (b) Adaptive meta-path search - A dynamic search strategy identifies informative semantic paths. (c) Multi-semantic structural modeling - Multi-step residual graph convolution and hierarchical gated multi-head attention (HGMA) capture deep dependencies. (d) Information bottleneck - Learned embeddings are refined to filter redundant noise. (B) HGMA stacking process: The hierarchical architecture performs weighted aggregation within each attention head, followed by a gating mechanism that integrates information across all heads.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="ces5088.fig.1.jpg" />
        </fig>
      </sec>
      <sec id="sec3-2">
        <title>3.2. Node representation initialization module</title>
        <p>By simulating biased random walks, Node2Vec captures both local and global structural information and generates a low-dimensional dense vector for each node, which serves as an input feature for the downstream GNNs [<xref ref-type="fig" rid="fig2">Figure 2</xref>]. First, an undirected weighted bipartite graph is constructed from the drug-target adjacency matrix, where drugs (D1-D4) and targets (T1-T4) form two distinct node types, and edge weights denote known interactions. To prevent data leakage, the bipartite graph used for Node2Vec initialization is rebuilt in each iteration of five-fold cross-validation: all test-set edges are masked and excluded from the adjacency matrix before random walk generation. Node2Vec is then applied to the masked graph to obtain the initial node embeddings. It simulates fixed-length random walks that interpolate between breadth-first search (BFS) and depth-first search (DFS). Two hyperparameters control the walk preferences: the return parameter p (the likelihood of revisiting the previous node) and the in-out parameter q (the tendency to explore local vs. distant nodes), thereby balancing structural equivalence and homophily.</p>
        <fig id="fig2" position="float">
          <label>Figure 2</label>
          <caption>
            <p>Workflow of Node2Vec for node representation learning.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="ces5088.fig.2.jpg" />
        </fig>
        <p>The resulting walk sequences are treated as “sentences” and fed into a skip-gram model<sup>[<xref ref-type="bibr" rid="B34">34</xref>]</sup>. Originally designed for natural language processing (NLP), the skip-gram model maximizes word co-occurrence probabilities; in graphs, it captures the co-occurrence of adjacent nodes within the same walk, encoding contextual structure. After training, each node is mapped to a dense low-dimensional embedding that preserves the drug-target graph topology. These embeddings serve as informative initial inputs for subsequent GNNs, enhancing their ability to perceive both topological and semantic features.</p>
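        <p>The masking and walk-generation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names <monospace>build_masked_graph</monospace> and <monospace>biased_walk</monospace>, the toy edge list, and the values of p and q are hypothetical, edge weights are ignored, and the transition weights follow the standard Node2Vec rule (1/p to return to the previous node, 1 for a neighbor shared with the previous node, 1/q otherwise). In a bipartite graph the shared-neighbor case never occurs, so only the 1/p and 1/q branches are exercised here.</p>
        <preformat><![CDATA[
```python
import random


def build_masked_graph(edges, test_edges):
    """Adjacency lists of an undirected graph with test-set edges removed."""
    held_out = {frozenset(e) for e in test_edges}
    adj = {}
    for u, v in edges:
        if frozenset((u, v)) in held_out:
            continue  # masked edge: never visible to the random walker
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def biased_walk(adj, start, length, p, q, rng):
    """One second-order random walk of at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = adj.get(cur, [])
        if not nbrs:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is unbiased
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay close to the previous node
            else:
                weights.append(1.0 / q)  # move outward
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Toy drug-target edges (hypothetical); one edge is held out as test data.
edges = [("D1", "T1"), ("D1", "T2"), ("D2", "T2"),
         ("D3", "T3"), ("D3", "T2"), ("D4", "T4")]
test_edges = [("D4", "T4")]

adj = build_masked_graph(edges, test_edges)
rng = random.Random(0)
walks = [biased_walk(adj, node, 5, p=1.0, q=0.5, rng=rng) for node in sorted(adj)]
```
]]></preformat>
        <p>The resulting <monospace>walks</monospace> play the role of the "sentences" passed to the skip-gram model. Because the held-out edge is removed before any walk is generated, its endpoints never co-occur in a training context through that edge.</p>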
      </sec>
      <sec id="sec3-3">
        <title>3.3. Adaptive meta-path selection module</title>
        <p>Inspired by AMGDTI<sup>[<xref ref-type="bibr" rid="B20">20</xref>]</sup>, we adopt a dynamic search strategy to model high-order semantic dependencies in heterogeneous graphs. This mechanism automatically generates discriminative meta-paths, constructing a guided graph structure that enhances semantic propagation and feature aggregation. Its key advantage is eliminating manual path design by adaptively identifying optimal meta-path combinations, thereby improving the model’s ability to represent complex semantic relationships.</p>
        <p>Specifically, this module constructs a directed acyclic adaptive meta-graph, denoted as M = (V<sub>m</sub>, E<sub>m</sub>). The node set V<sub>m</sub> represents the sequence of node feature states {S<sub>0</sub>, S<sub>1</sub>, …, S<sub>T</sub>} after each propagation step within the heterogeneous network, where the total number of nodes is determined by the predefined propagation steps T. The edge set E<sub>m</sub> encodes all possible information propagation strategies. For example, a directed edge labeled "Protein→Disease" from S<sub>0</sub> to S<sub>1</sub> indicates that the feature of a "Disease" node in S<sub>1</sub> is obtained by aggregating the features of "Protein" nodes in S<sub>0</sub>. This structure allows any previous state S<sub>i</sub>∈{S<sub>0</sub>, S<sub>1</sub>,…, S<sub>T-1</sub>} to influence the current state S<sub>T</sub> through skip connections, enabling the model to fully capture complex semantic information embedded in heterogeneous networks.</p>
        <p>The model adaptively determines each edge in the meta-graph by evaluating whether a previous state contributes to the current state and, if so, selecting the most appropriate propagation strategy. This approach treats all edge types as potential propagation patterns, enabling dynamic path selection. Two auxiliary connection types are also introduced: L<sub>1</sub> indicates that the current state is identical to the previous state, while L<sub>2</sub> indicates that the previous state does not affect the current state. The adaptive meta-graph therefore comprises 12 possible edge types: {L<sub>DP</sub>, L<sub>PD</sub>, L<sub>DS</sub>, L<sub>SD</sub>, L<sub>DE</sub>, L<sub>ED</sub>, L<sub>PE</sub>, L<sub>EP</sub>, L<sub>DD</sub>, L<sub>PP</sub>, L<sub>1</sub>, L<sub>2</sub>}. The first ten correspond to explicit biological relations within the heterogeneous network (e.g., L<sub>DP</sub>: Drug → Protein, L<sub>ED</sub>: Disease → Drug), while the last two are auxiliary designs intended to enhance the structural flexibility and semantic expressiveness of the model.</p>
        <p>To mathematically formulate the path-selection optimization and ensure reproducibility, we formally define the candidate connection constraints and the selection mechanism. Let <italic>C<sub>t,i</sub></italic> denote the set of possible connection types from a previous state <italic>S<sub>i</sub></italic> to the current state <italic>S<sub>t</sub></italic>. To avoid meaningless aggregations, <italic>C<sub>t,i</sub></italic> is dynamically constrained based on the propagation step t and the total steps T:</p>
       
	 <p><disp-formula> <label>(1)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned}  C_{t, i} &amp; =\left\{\begin{array}{ll}
E_{M}-\left\{L_{2}\right\}, &amp; i=t-1, t &lt; T \\
E_{M}, &amp; i &lt; t-1, t &lt; T \\
R, &amp; i = t-1, t = T \\
R \cup L_{1} \cup L_{2}, &amp; i &lt; t-1, t = T
\end{array}\right.  \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
		
        <p>where <italic>E<sub>M</sub></italic> represents the full set of 12 candidate edges, and R represents the subset of target-related edges. Subsequently, we assign a learnable structural parameter <inline-formula><tex-math id="M4">$$\theta_{t, i}^{n}$$</tex-math></inline-formula> to each possible connection. To balance exploration and exploitation, a random sampling strategy is introduced. The final selected connection <inline-formula><tex-math id="M4">$$C_{t, i}^{*}$$</tex-math></inline-formula> is defined as:</p>
     
	 <p><disp-formula> <label>(2)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned}  C_{t, i}^{*} &amp; =\left\{\begin{array}{ll}
\theta_{t, i}^{m}, &amp; 1-p_{i} \\
\operatorname{rand}\left(C_{t, i}\right), &amp; p_{i}
\end{array}\right.  \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
	 
	 
        <p>where <italic>m</italic> is the index of the edge type with the maximum parameter value (<inline-formula><tex-math id="M4">$$\theta_{t, i}^{m}=\max \left(\theta_{t, i}^{0}, \ldots, \theta_{t, i}^{n}\right)$$</tex-math></inline-formula>), and rand (<italic>C<sub>t,i</sub></italic>) denotes uniform random sampling from the candidate set. To encourage exploration early in training, the probability <italic>p<sub>i</sub></italic> ∈ (0,1) is initialized to a small value and gradually decays to zero over epochs, ensuring structural determinism during inference. For computational efficiency and reproducibility, the adaptive search space is implemented via dynamic binary masks applied to the initial heterogeneous adjacency matrices. Masked aggregations use sparse tensor operations to manage memory overhead in large-scale biomedical networks.</p>
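        <p>The candidate constraints of Equation (1) and the selection rule of Equation (2) can be sketched as follows. This is an illustrative sketch only: the text does not enumerate the target-related subset R, so the list <monospace>R_ASSUMED</monospace> (edge types that end at a drug or protein node) is an assumption, as are the function names and the toy parameter values.</p>
        <preformat><![CDATA[
```python
import random

# The 12 edge types listed in the text: ten relations plus two auxiliary links.
E_M = ["L_DP", "L_PD", "L_DS", "L_SD", "L_DE", "L_ED",
       "L_PE", "L_EP", "L_DD", "L_PP", "L_1", "L_2"]

# ASSUMPTION: R is not enumerated in the text; here it is taken to be the
# relations whose destination is a drug or protein node.
R_ASSUMED = ["L_DP", "L_PD", "L_SD", "L_ED", "L_EP", "L_DD", "L_PP"]


def candidate_set(t, i, T, R=R_ASSUMED):
    """Candidate connection types from state S_i to state S_t, as in Eq. (1)."""
    if t < T:
        if i == t - 1:
            return [e for e in E_M if e != "L_2"]  # adjacent states must connect
        return list(E_M)
    if i == t - 1:
        return list(R)
    return list(R) + ["L_1", "L_2"]


def select_connection(theta, candidates, p_explore, rng):
    """Eq. (2): explore uniformly with probability p_explore, else take the argmax."""
    if rng.random() < p_explore:
        return rng.choice(candidates)
    return max(candidates, key=lambda e: theta[e])


# Toy structural parameters (hypothetical values).
theta = {e: 0.0 for e in E_M}
theta["L_DP"] = 2.0

rng = random.Random(0)
greedy_choice = select_connection(theta, candidate_set(1, 0, 3), 0.0, rng)
explored_choice = select_connection(theta, candidate_set(3, 0, 3), 1.0, rng)
```
]]></preformat>
        <p>With the exploration probability set to zero, as at inference time, the selection reduces to the deterministic argmax over the structural parameters.</p>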
      </sec>
      <sec id="sec3-4">
        <title>3.4. Multi-semantic structural modeling module</title>
        <p>(1) Multi-step weighted residual graph convolution module</p>
          <p>This module aims to capture the topological information and structural dependencies of nodes along different semantic paths by integrating multi-hop graph propagation with residual learning. It enhances the expressive power of node features while effectively mitigating issues such as gradient vanishing and over-smoothing during deep propagation. Given the input feature matrix <inline-formula><tex-math id="M4">$$X \in R^{N \times d_{m}}$$</tex-math></inline-formula>, where N is the number of nodes and <italic>d<sub>m</sub></italic> is the input dimension, the model first applies a linear transformation to project all node features into a unified hidden space, as defined in Equation (3):</p>
         
<p><disp-formula> <label>(3)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned}  h^{(0)} = XW_{0} + b_{0} \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
<p>where <inline-formula><tex-math id="M4">$$W_{0} \in R^{d_{m} \times d_{hid}}$$</tex-math></inline-formula> denotes the learnable weight matrix and <italic>b</italic><sub>0</sub> is a bias term. The initial node representation at layer 0 is denoted as <italic>h</italic><sup>(0)</sup>. Subsequently, the model performs a total of T steps of graph convolutional propagation. Each step consists of two components. The first component is sequential graph convolution: at step <italic>t</italic> ∈ {1, 2, …, T}, sparse adjacency propagation is performed as follows.</p>
       <p><disp-formula> <label>(4)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned}  \mathrm{h}_{{seq }}^{(t)}  ={Op}\left(A_{{seq }}^{(t)}, h^{(t-1)}\right) \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
	   
	   
          <p>where <inline-formula><tex-math id="M4">$$A_{seq}^{(t)}$$</tex-math></inline-formula> is the adjacency matrix for the sequential path, <italic>Op</italic>(·) denotes sparse adjacency propagation, and <inline-formula><tex-math id="M4">$$h_{seq}^{(t)}$$</tex-math></inline-formula> represents the output of the graph convolution at step t. The second component is residual aggregation, introduced at each propagation step to mitigate gradient vanishing and feature over-smoothing. Specifically, at step t, all intermediate representations from previous steps <italic>s</italic> ∈ {0, 1, …, t-1} are incorporated. Each of these representations is propagated using its corresponding adjacency matrix <inline-formula><tex-math id="M4">$$A_{res}^{(t,s)}$$</tex-math></inline-formula>, and then combined via a learnable weighted summation, as defined in Equation (5):</p>
       

	   <p><disp-formula> <label>(5)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned} \mathrm{h}_{res}^{(t)}  =\sum_{s=0}^{t-1} \alpha_{s}^{(t)} {Op}\left(A_{res}^{(t, s)}, h^{(s)}\right) \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
		 
		 
		 
          <p>where <inline-formula><tex-math id="M4">$$\alpha_{s}^{(t)}$$</tex-math></inline-formula> denotes a learnable scalar residual weight at step t during training, <inline-formula><tex-math id="M4">$$A_{res}^{(t,s)}$$</tex-math></inline-formula> represents the adjacency matrix used for residual propagation from step s at step t, and <italic>h</italic><sup>(</sup><italic><sup>s</sup></italic><sup>)</sup> denotes the node representation obtained after the s-th propagation step. To adaptively balance the information flow across different propagation steps, the residual weights <inline-formula><tex-math id="M4">$$\alpha_{s}^{(t)}$$</tex-math></inline-formula> are implemented as learnable parameters. Specifically, these weights are initialized to a constant value of 1.0 to ensure stable and sufficient signal flow during the initial stages of training. During training, <inline-formula><tex-math id="M4">$$\alpha_{s}^{(t)}$$</tex-math></inline-formula> is jointly optimized with the network backbone using the NAdam optimizer via backpropagation. This dynamic adjustment allows the model to adaptively weight the contributions of higher-order dependencies while effectively mitigating the over-smoothing problem commonly associated with deep GNNs.</p>
		  
          <p>The final node representation at step t is obtained by summing the sequential propagation <inline-formula><tex-math id="M4">$$\mathrm{h}_{s e q}^{(t)}$$</tex-math></inline-formula> and residual aggregation results <inline-formula><tex-math id="M4">$$\mathrm{h}_{r e s}^{(t)}$$</tex-math></inline-formula>, as defined in Equation (6):</p>
         
		 <p><disp-formula> <label>(6)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned} \mathrm{h}_{{agg }}^{(t)}  =\mathrm{h}_{s e q}^{(t)}+\mathrm{h}_{r e s}^{(t)}  \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
		 
          <p>After T propagation steps, the final node representation is obtained as <inline-formula><tex-math id="M4">$$\mathrm{h}_{{agg }}^{(t)}$$</tex-math></inline-formula>. To improve stability, accelerate convergence, and enhance the model’s representational capacity, batch normalization (BN) and a non-linear activation function are applied to the final output <inline-formula><tex-math id="M4">$$\mathrm{h}_{{agg }}^{(t)}$$</tex-math></inline-formula>, defined as follows:</p>
        
		<p><disp-formula> <label>(7)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned}  \mathrm{h}^{(t)} =\sigma\left(B N\left(\mathrm{h}_{{agg }}^{(t)}\right)\right) \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
		
          <p>where σ is the Sigmoid Linear Unit (SiLU) activation function<sup>[<xref ref-type="bibr" rid="B35">35</xref>]</sup>, BN denotes batch normalization<sup>[<xref ref-type="bibr" rid="B36">36</xref>]</sup>, and <inline-formula><tex-math id="M4">$${h_{l}^{(T)}}$$</tex-math></inline-formula> represents the final node embedding obtained from the l-th semantic path after T propagation steps.</p>
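          <p>Equations (4)-(7) can be traced on a toy graph with the sketch below. It is a simplified illustration under several assumptions rather than the authors' implementation: <italic>Op</italic>(A, h) is taken to be the dense matrix product of A and h, batch normalization is reduced to per-feature standardization without learnable scale and shift, Equation (7) is applied at every step t as its index suggests, and all function names and matrices are hypothetical. The residual weights are set to 1.0, matching the initialization stated in the text.</p>
          <preformat><![CDATA[
```python
import math


def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, c):
    return [[c * x for x in row] for row in a]


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def batch_norm(h, eps=1e-5):
    """Per-feature standardization over nodes (learnable scale/shift omitted)."""
    cols = list(zip(*h))
    means = [sum(c) / len(c) for c in cols]
    variances = [sum((x - m) ** 2 for x in c) / len(c) for c, m in zip(cols, means)]
    return [[(x - m) / math.sqrt(v + eps) for x, m, v in zip(row, means, variances)]
            for row in h]


def propagate(h0, a_seq, a_res, alpha):
    """a_seq[t], a_res[(t, s)], alpha[(t, s)] for t = 1..T and s = 0..t-1."""
    states = [h0]
    for t in range(1, len(a_seq) + 1):
        h_seq = matmul(a_seq[t], states[t - 1])        # Eq. (4)
        h_res = scale(h_seq, 0.0)                      # zero matrix, same shape
        for s in range(t):                             # Eq. (5)
            term = matmul(a_res[(t, s)], states[s])
            h_res = add(h_res, scale(term, alpha[(t, s)]))
        h_agg = add(h_seq, h_res)                      # Eq. (6)
        normed = batch_norm(h_agg)
        states.append([[silu(x) for x in row] for row in normed])  # Eq. (7)
    return states[-1]


# Toy inputs (hypothetical): 3 nodes, hidden size 2, T = 2 propagation steps.
A = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a_seq = {1: A, 2: A}
a_res = {(1, 0): I3, (2, 0): I3, (2, 1): A}
alpha = {(1, 0): 1.0, (2, 0): 1.0, (2, 1): 1.0}  # initialized to 1.0 as in the text

h_final = propagate(h0, a_seq, a_res, alpha)
```
]]></preformat>
          <p>In a trainable version, the entries of <monospace>alpha</monospace> would be parameters updated by the optimizer together with the rest of the network. Here they are fixed constants so that the data flow of one forward pass is easy to follow.</p>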
         <p>(2) Hierarchical gated multi-head attention module</p>
          <p>To capture local structural dependencies and integrate multi-view semantic information, the HGMA module uses a hierarchical gated multi-head attention mechanism with two layers. The first layer performs weighted aggregation along distinct semantic paths within each attention head. The second layer then fuses information across heads via a gating mechanism. Concretely, after multi-path residual GCN propagation, the representations of each node across all semantic paths are collected. The final representations from all L paths are stacked along the path dimension into a 3D tensor as follows:</p>
         
		 <p><disp-formula> <label>(8)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned}  H =\left[h_{1}^{(T)}, h_{2}^{(T)}, \ldots, h_{L}^{(T)}\right] \in R^{B \times L \times d}  \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
		 
          <p>where <inline-formula><tex-math id="M4">$${h_{l}^{(T)}}$$</tex-math></inline-formula> denotes the final node representation under the <italic>l</italic>-th semantic path, B the number of nodes, L the number of semantic paths, and <italic>d</italic> the hidden dimension. This tensor is then fed into the attention layers to learn adaptive fusion weights across paths. The first layer applies a path-level attention mechanism to capture representation differences of the same node across different paths. Specifically, for the <italic>h</italic>-th attention head, a feedforward neural network nonlinearly maps the path representations. A shared linear transformation followed by a Tanh activation function<sup>[<xref ref-type="bibr" rid="B37">37</xref>]</sup> is applied to all path representations H, producing transformed representations of each path in the attention space as follows:</p>
         
		 <p><disp-formula> <label>(9)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned}  S^{(h)} =\tanh\left(H W_{h}^{(1)}\right) \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
		 
		 
          <p>where <inline-formula><tex-math id="M4">$$W_{h}^{(1)} \in R^{d \times d_{a}}$$</tex-math></inline-formula> is a learnable attention parameter and <italic>d<sub>a</sub></italic> denotes the attention dimension, so that <inline-formula><tex-math id="M4">$$S^{(h)} \in R^{B \times L \times d_{a}}$$</tex-math></inline-formula>. Next, an additional linear transformation compresses each path representation into a scalar score. A softmax operation<sup>[<xref ref-type="bibr" rid="B38">38</xref>]</sup> then normalizes these scores across paths, yielding the attention coefficients:</p>
        
		<p><disp-formula> <label>(10)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned} \alpha_{l}^{(h)}  =\operatorname{softmax}\left(S^{(h)} W_{h}^{(2)}\right)   \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
		
		
          <p>where <inline-formula><tex-math id="M4">$$W_{h}^{(2)} \in R^{d_{a} \times 1}$$</tex-math></inline-formula> is a learnable attention parameter, and <inline-formula><tex-math id="M4">$$ \alpha_{l}^{(h)} \in R^{B \times L \times 1}$$</tex-math></inline-formula> is the attention weight tensor representing the attention scores of each sample over different paths, with the scores normalized to sum to one across all paths. Subsequently, a weighted sum over all path representations is computed to obtain the aggregated node representation under the corresponding attention head as follows:</p>
         <p><disp-formula> <label>(11)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned} o^{(h)} =\sum_{l=1}^{L} \alpha_{l}^{(h)} \odot h_{l}^{(T)} \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
		
          <p>where ⊙ denotes element-wise multiplication, and <italic>h<sub>l</sub></italic> denotes the node representations under the <italic>l</italic>-th semantic path obtained from Eq. (7). After the first-level attention, each attention head <italic>h</italic> ∈ {1,…,M} outputs an aggregated node representation <italic>o</italic><sup>(</sup><italic><sup>h</sup></italic><sup>)</sup>. These representations are then stacked along the head dimension:</p>
          
		   <p><disp-formula> <label>(12)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned}  O_{stack} = [o^{(1)}, \ldots, o^{(M)}]  \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
		  
		  <p>To further regulate the contribution of each attention head to the final representation, a gating mechanism<sup>[<xref ref-type="bibr" rid="B39">39</xref>]</sup> is introduced to adaptively weight and fuse the outputs of all attention heads. Specifically, mean pooling is first applied along the feature dimension <italic>d</italic> to each attention head’s output to obtain an average activation value, reflecting the overall contribution of each head to the node representation. This value is then mapped to the range (0, 1) via a sigmoid function<sup>[<xref ref-type="bibr" rid="B37">37</xref>]</sup>, producing the gating weights <italic>G</italic> as follows:</p>
		  
         <p><disp-formula> <label>(13)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned} G = \sigma\left(\frac{1}{d} \sum_{i = 1}^{d} O_{ {stack }}\right)  \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
		  
          <p>where <italic>σ</italic>(•) denotes the sigmoid function, which maps each averaged activation to a gating value in the range (0, 1). Finally, the gating weights <italic>G</italic> are multiplied element-wise with the attention head outputs and summed along the head dimension to obtain the final fused representation as follows:</p>
         <p><disp-formula> <label>(14)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned} H_{{final }} = \sum_{h = 1}^{M} G \odot O_{{stack }} \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
          <p>where ⊙ denotes element-wise multiplication, and <italic>H<sub>final</sub></italic> denotes the final fused node representation obtained by aggregating the outputs of all attention heads through a learnable gating mechanism.</p>
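          <p>For a single node, the two attention levels of Eqs. (9) to (14) can be sketched as follows. This is an illustrative stdlib-only sketch under stated assumptions, not the released implementation: the function names, the random toy weights, and the per-node (rather than batched) layout are all hypothetical.</p>
<preformat>
```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def path_attention(paths, w1, w2):
    """One head, one node. paths: L vectors of size d; w1: d x d_a; w2: d_a."""
    scores = []
    for h_l in paths:
        # Eq. (9): tanh of the linear projection into the attention space.
        s = [math.tanh(sum(x * w1[j][k] for j, x in enumerate(h_l)))
             for k in range(len(w2))]
        # Eq. (10), before normalization: compress to one scalar per path.
        scores.append(sum(a * b for a, b in zip(s, w2)))
    alpha = softmax(scores)
    d = len(paths[0])
    # Eq. (11): attention-weighted sum over the L paths.
    o = [sum(alpha[l] * paths[l][j] for l in range(len(paths))) for j in range(d)]
    return o, alpha


def gated_fusion(head_outputs):
    """Eqs. (13) and (14): sigmoid of the feature mean gates each head."""
    d = len(head_outputs[0])
    gates = [1.0 / (1.0 + math.exp(-sum(o) / d)) for o in head_outputs]
    fused = [sum(g * o[j] for g, o in zip(gates, head_outputs)) for j in range(d)]
    return fused, gates


# Toy sizes: L = 3 paths, d = 4 features, d_a = 2, M = 4 heads.
rng = random.Random(0)
paths = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
heads = [([[rng.gauss(0, 1) for _ in range(2)] for _ in range(4)],
          [rng.gauss(0, 1) for _ in range(2)]) for _ in range(4)]
head_outputs = [path_attention(paths, w1, w2)[0] for w1, w2 in heads]
fused, gates = gated_fusion(head_outputs)
```
</preformat>
          <p>In this sketch the gate of each head is a scalar per node, so a head whose aggregated output has a strongly negative mean activation contributes little to the fused vector.</p>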
       </sec>
      <sec id="sec3-5">
        <title>3.5. Information bottleneck module</title>
        <p>To learn compact and discriminative drug-target representations, we integrate the IB principle<sup>[<xref ref-type="bibr" rid="B13">13</xref>,<xref ref-type="bibr" rid="B40">40</xref>]</sup> into our framework. IB extracts target-relevant information from inputs by balancing compression and prediction. We adopt a variational implementation, modeling each node representation as a latent probabilistic distribution. Variational inference approximates the posterior, and a Kullback-Leibler (KL) divergence term<sup>[<xref ref-type="bibr" rid="B41">41</xref>]</sup> regularizes it toward a standard normal prior. This mechanism retains discriminative features and filters redundancy. It guides the model to learn robust and compact embeddings. Specifically, for each drug and target node, the encoder outputs the mean <italic>μ</italic> and log-variance log<italic>σ</italic><sup>2</sup> of the latent representation, which parameterize a Gaussian distribution as follows:</p>
        <p><disp-formula> <label>(15)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned}  Z \sim \mathcal{N}\left(\mu , \operatorname{diag}\left(\sigma ^2\right)\right)  \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
<p>where <italic>diag</italic> denotes the operation of converting a vector into a diagonal matrix. Since direct sampling from this distribution does not support gradient backpropagation, the reparameterization trick<sup>[<xref ref-type="bibr" rid="B42">42</xref>]</sup> is employed to enable differentiable sampling of the latent variables:</p>
       <p><disp-formula> <label>(16)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned} Z =  \mu + \sigma \odot \varepsilon  \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>

	    <p>where <italic>ε</italic> denotes a random variable sampled from the standard normal distribution, and ⊙ represents element-wise multiplication. The sampled drug and target representations, <italic>Z<sub>s</sub></italic> and <italic>Z<sub>t</sub></italic>, are then used to compute the interaction prediction score, defined as their inner product:</p>
		
     <p><disp-formula> <label>(17)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned} y_{i j} = Z_{s_{i}}^{T} Z_{t_{j}}   \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
		
        <p>The prediction error is measured using the Binary Cross-Entropy (BCE) loss<sup>[<xref ref-type="bibr" rid="B43">43</xref>]</sup> between the predicted interaction score and the ground truth label:</p>
       <p><disp-formula> <label>(18)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned}  L_{B C E}=-\frac{1}{N} \sum_{i=1}^{N}\left[y_{i} \log \sigma\left(s_{i}\right)+\left(1-y_{i}\right) \log \left(1-\sigma\left(s_{i}\right)\right)\right]\end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
	   
        <p>where <italic>y<sub>i</sub></italic> is the ground truth label of the <italic>i</italic>-th drug-target pair, <italic>s<sub>i</sub></italic> is its predicted interaction score computed as in Eq. (17), N is the total number of training samples, and <italic>σ</italic> is the sigmoid function. To regulate the degree of information compression, the KL divergence is introduced as a regularization term, encouraging the latent variable distribution to approximate a unit Gaussian:</p>
     
	 <p><disp-formula> <label>(19)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned}  L_{K L} = \frac{1}{2} \sum_{i = 1}^{d}\left(\mu_{i}^{2}+\sigma_{i}^{2}-\log \sigma_{i}^{2}-1\right) \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>

        <p>The final loss function is formulated as:</p>
      
	  <p><disp-formula> <label>(20)</label> <tex-math id="E1"> $$ \begin{equation}  \begin{aligned} L = L_{B C E}+\beta \cdot\left(L_{K L}^{(s)}+L_{K L}^{(t)}\right) \end{aligned} \end{equation} $$ </tex-math>
</disp-formula></p>
	  
        <p>where <italic>L<sub>BCE</sub></italic> is the BCE loss, <italic>β</italic> is a hyperparameter controlling the strength of the IB regularization, and <inline-formula><tex-math id="M4">$$L_{K L}^{(s)}$$</tex-math></inline-formula> and <inline-formula><tex-math id="M4">$$L_{K L}^{(t)}$$</tex-math></inline-formula> denote the KL divergence losses for the drug and target representations, respectively. Theoretically, <italic>β</italic> acts as a Lagrange multiplier that controls the trade-off between the compression of input information and the preservation of predictive sufficiency. A carefully selected <italic>β</italic> ensures that the model filters out redundant structural noise inherent in heterogeneous networks while retaining discriminative features for DTI prediction.</p>
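        <p>The quantities in Eqs. (16) and (18) to (20) can be illustrated with a short stdlib-only sketch. The function names are hypothetical, and how the per-node KL terms are aggregated over nodes is not specified above, so the sketch assumes they have already been reduced to one scalar for drugs and one for targets.</p>
<preformat>
```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Eq. (16): z = mu + sigma * eps, with eps ~ N(0, 1) and sigma = exp(log_var / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Eq. (19): KL divergence between N(mu, diag(sigma^2)) and N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def softplus(x):
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bce_with_logits(scores, labels):
    """Eq. (18), using -log(sigmoid(s)) = softplus(-s) for numerical stability."""
    total = sum(y * softplus(-s) + (1 - y) * softplus(s) for s, y in zip(scores, labels))
    return total / len(scores)


def ib_loss(scores, labels, kl_drug, kl_target, beta=0.005):
    """Eq. (20): BCE plus beta-weighted KL terms (scalars, aggregation assumed)."""
    return bce_with_logits(scores, labels) + beta * (kl_drug + kl_target)


# One drug and one target with 2-dimensional latent variables (toy values).
rng = random.Random(0)
mu_s, lv_s = [0.5, -0.2], [0.0, -1.0]
mu_t, lv_t = [0.1, 0.3], [-0.5, 0.0]
z_s = reparameterize(mu_s, lv_s, rng)
z_t = reparameterize(mu_t, lv_t, rng)
score = sum(a * b for a, b in zip(z_s, z_t))  # Eq. (17): inner product
loss = ib_loss([score], [1], kl_to_standard_normal(mu_s, lv_s),
               kl_to_standard_normal(mu_t, lv_t))
```
</preformat>
        <p>Setting <italic>β</italic> to zero in this sketch recovers the plain BCE objective, which corresponds to the setting without the IB regularizer.</p>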
      </sec>
    </sec>
    <sec id="sec4">
      <title>4. EXPERIMENTAL SETUP</title>
      <sec id="sec4-1">
        <title>4.1. Datasets</title>
        <p>To evaluate the proposed method, we used two public benchmark datasets: the Luo dataset<sup>[<xref ref-type="bibr" rid="B14">14</xref>]</sup> and the Zheng dataset<sup>[<xref ref-type="bibr" rid="B44">44</xref>]</sup>. The Luo dataset comprises four biomedical entity types (drugs, proteins, diseases, side effects) and multiple relation types (e.g., drug-target, disease-protein, drug-side effect), offering a heterogeneous network with diverse structural information [<xref ref-type="table" rid="t1">Table 1</xref>]. The Zheng dataset provides rich attribute features—chemical substructures and substituents for drugs, and Gene Ontology (GO) terms for proteins—along with associated edges (e.g., drug-substructure, protein-GO), enabling fine-grained semantic modeling [<xref ref-type="table" rid="t2">Table 2</xref>].</p>
        <table-wrap id="t1">
          <label>Table 1</label>
          <caption>
            <p>Information on nodes and edges in the Luo dataset</p>
          </caption>
          <table frame="hsides" rules="groups">
            <thead>
              <tr>
                <td>
                  <bold>Node type</bold>
                </td>
                <td style="border-bottom:1;">
                  <bold>Num</bold>
                </td>
                <td style="border-bottom:1;">
                  <bold>Edge type</bold>
                </td>
                <td style="border-bottom:1;">
                  <bold>Num</bold>
                </td>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Drug</td>
                <td>708</td>
                <td>Drug-drug (interaction)</td>
                <td>10036</td>
              </tr>
              <tr>
                <td>Protein</td>
                <td>1512</td>
                <td>Drug-drug (similarity)</td>
                <td>501264</td>
              </tr>
              <tr>
                <td>Disease</td>
                <td>5603</td>
                <td>Drug-protein</td>
                <td>1923</td>
              </tr>
              <tr>
                <td>Side effect</td>
                <td>4192</td>
                <td>Drug-disease</td>
                <td>199214</td>
              </tr>
              <tr>
                <td />
                <td />
                <td>Drug-side effect</td>
                <td>80164</td>
              </tr>
              <tr>
                <td />
                <td />
                <td>Protein-disease</td>
                <td>1596745</td>
              </tr>
              <tr>
                <td />
                <td />
                <td>Protein-protein (interaction)</td>
                <td>7363</td>
              </tr>
              <tr>
                <td />
                <td />
                <td>Protein-protein (similarity)</td>
                <td>2286144</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <table-wrap id="t2">
          <label>Table 2</label>
          <caption>
            <p>Information on nodes and edges in the Zheng dataset</p>
          </caption>
          <table frame="hsides" rules="groups">
            <thead>
              <tr>
                <td>
                  <bold>Node type</bold>
                </td>
                <td style="border-bottom:1;">
                  <bold>Num</bold>
                </td>
                <td style="border-bottom:1;">
                  <bold>Edge type</bold>
                </td>
                <td style="border-bottom:1;">
                  <bold>Num</bold>
                </td>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Drug</td>
                <td>1094</td>
                <td>Drug-drug</td>
                <td>1196836</td>
              </tr>
              <tr>
                <td>Protein</td>
                <td>1556</td>
                <td>Drug-protein</td>
                <td>11819</td>
              </tr>
              <tr>
                <td>Chemical substructure</td>
                <td>881</td>
                <td>Drug-chemical substructure</td>
                <td>133880</td>
              </tr>
              <tr>
                <td>Side effect</td>
                <td>4063</td>
                <td>Drug-side effect</td>
                <td>122792</td>
              </tr>
              <tr>
                <td>Substituent</td>
                <td>738</td>
                <td>Drug-substituent</td>
                <td>20798</td>
              </tr>
              <tr>
                <td>GO term</td>
                <td>4098</td>
                <td>Protein-GO term</td>
                <td>35980</td>
              </tr>
              <tr>
                <td />
                <td />
                <td>Protein-protein</td>
                <td>2421136</td>
              </tr>
            </tbody>
          </table>
		    <table-wrap-foot>
            <fn>
              <p>GO: Gene ontology.</p>
            </fn>
          </table-wrap-foot>
        </table-wrap>
      </sec>
      <sec id="sec4-2">
        <title>4.2. Experimental settings</title>
        <p>For reproducibility, all code and datasets are available at <uri xlink:href="https://github.com/chenzh-23/HGMAIB">https://github.com/chenzh-23/HGMAIB</uri>. The models were developed with Python 3.8, PyTorch 1.11+cu113, and SciPy 1.10.1. Data are presented as mean ± standard deviation (SD); statistical significance was assessed by an independent two-sample <italic>t</italic>-test (<italic>P</italic> &lt; 0.05). Training and evaluation ran on a cloud server with an NVIDIA RTX 3090 GPU. HGMAIB was trained using the NAdam optimizer<sup>[<xref ref-type="bibr" rid="B45">45</xref>]</sup> (learning rate = 0.005, weight decay = 0) for 100 epochs, with 4 attention heads and hidden dimensions of 64 (Luo dataset) or 256 (Zheng dataset). Node2Vec generated initial embeddings (walk length = 100, 10 walks/node, p = q = 1). The IB loss weight β was set to 0.005.</p>
        <p>A five-fold cross-validation strategy was adopted. In each fold, 60% of the samples were used for training, 20% for validation (hyperparameter tuning), and 20% for testing. To address the severe imbalance between known and unknown interactions, negative samples were randomly undersampled to match the number of positive samples in each fold. Two standard evaluation metrics widely used in previous studies for assessing binary classification models in DTI prediction were adopted: the Area Under the Receiver Operating Characteristic Curve (AUC)<sup>[<xref ref-type="bibr" rid="B46">46</xref>]</sup> and the Area Under the Precision-Recall Curve (AUPRC)<sup>[<xref ref-type="bibr" rid="B47">47</xref>]</sup>.</p>
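        <p>One way to realize the balanced sampling and the 60/20/20 partition is sketched below. This is an assumption-laden illustration, not the released protocol: the function names are hypothetical, negatives are drawn once before splitting (a simplification of per-fold undersampling), and each fold is assumed to use one fifth of the data for testing, the next fifth for validation, and the remaining three fifths for training.</p>
<preformat>
```python
import random


def balanced_pairs(positives, n_drugs, n_targets, rng):
    """Draw as many unknown (drug, target) pairs as there are known positives."""
    known = set(positives)
    negatives = set()
    while len(negatives) != len(positives):
        pair = (rng.randrange(n_drugs), rng.randrange(n_targets))
        if pair not in known:
            negatives.add(pair)
    return [(p, 1) for p in positives] + [(p, 0) for p in sorted(negatives)]


def five_fold_splits(samples, rng, k=5):
    """Yield (train, val, test) lists with a 3:1:1 ratio of the k parts."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    parts = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        val_idx = (i + 1) % k
        train = [s for j in range(k) if j not in (i, val_idx) for s in parts[j]]
        yield train, parts[val_idx], parts[i]


# Toy example: 5 known interactions among 5 drugs and 5 targets.
rng = random.Random(0)
positives = [(i, i) for i in range(5)]
samples = balanced_pairs(positives, 5, 5, rng)
splits = list(five_fold_splits(samples, rng))
```
</preformat>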
      </sec>
    </sec>
    <sec id="sec5">
      <title>5. RESULTS AND DISCUSSION</title>
      <sec id="sec5-1">
        <title>5.1. Comparison with baselines</title>
        <p>To comprehensively evaluate HGMAIB, we compare it with twelve baseline methods spanning diverse modeling paradigms. For a fair comparison, architectural hyperparameters (e.g., number of layers, embedding dimensions) follow each baseline’s original optimal settings. Training hyperparameters (learning rate, weight decay, early stopping patience) were systematically retuned for each baseline on the Luo and Zheng datasets using validation set performance within our five-fold cross-validation framework. The baselines include:</p>
        <p>• Network-based DTI methods: DTINet<sup>[<xref ref-type="bibr" rid="B14">14</xref>]</sup> and NeoDTI<sup>[<xref ref-type="bibr" rid="B16">16</xref>]</sup>.</p>
        <p>• GNN-based DTI models: GCN-DTI<sup>[<xref ref-type="bibr" rid="B17">17</xref>]</sup>, EEG-DTI<sup>[<xref ref-type="bibr" rid="B18">18</xref>]</sup>, and IMCHGAN<sup>[<xref ref-type="bibr" rid="B19">19</xref>]</sup>.</p>
        <p>• Representation enhancement and contrastive learning-based methods: CE-DTI<sup>[<xref ref-type="bibr" rid="B23">23</xref>]</sup>, SGCL-DTI<sup>[<xref ref-type="bibr" rid="B24">24</xref>]</sup>, MIDTI<sup>[<xref ref-type="bibr" rid="B27">27</xref>]</sup>, and SHGCL-DTI<sup>[<xref ref-type="bibr" rid="B25">25</xref>]</sup>.</p>
        <p>• Recent state-of-the-art heterogeneous graph frameworks: AMGDTI<sup>[<xref ref-type="bibr" rid="B20">20</xref>]</sup>, DSS-DTI<sup>[<xref ref-type="bibr" rid="B26">26</xref>]</sup>, and DHGT-DTI<sup>[<xref ref-type="bibr" rid="B21">21</xref>]</sup>.</p>
        <p>As shown in <xref ref-type="table" rid="t3">Table 3</xref>, HGMAIB consistently achieves the best performance across all evaluation metrics on both benchmark datasets, significantly outperforming all baseline methods. Specifically, HGMAIB attains an AUC of 0.991 ± 0.002 and an AUPRC of 0.990 ± 0.002 on the Luo dataset, and an AUC of 0.988 ± 0.002 and an AUPRC of 0.984 ± 0.001 on the Zheng dataset, demonstrating its superior predictive accuracy and robustness.</p>
        <table-wrap id="t3">
          <label>Table 3</label>
          <caption>
            <p>Performance comparison with baseline methods on the Luo and Zheng datasets</p>
          </caption>
          <table frame="hsides" rules="groups">
            <thead>
              <tr>
                <td style="border-bottom:1;">
                  <bold>Methods</bold>
                </td>
                <td colspan="2" style="border-bottom:1;">
                  <bold>Luo dataset</bold>
                </td>
                <td colspan="2" style="border-bottom:1;">
                  <bold>Zheng dataset</bold>
                </td>
              </tr>
              <tr>
                <td style="border-bottom:1;" />
                <td style="border-bottom:1;">
                  <bold>AUC</bold>
                </td>
                <td style="border-bottom:1;">
                  <bold>AUPRC</bold>
                </td>
                <td style="border-bottom:1;">
                  <bold>AUC</bold>
                </td>
                <td style="border-bottom:1;">
                  <bold>AUPRC</bold>
                </td>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>DTINet</td>
                <td>0.879 ± 0.004**</td>
                <td>0.906 ± 0.003**</td>
                <td>0.889 ± 0.004**</td>
                <td>0.900 ± 0.004**</td>
              </tr>
              <tr>
                <td>NeoDTI</td>
                <td>0.955 ± 0.003**</td>
                <td>0.889 ± 0.004**</td>
                <td>0.946 ± 0.003**</td>
                <td>0.846 ± 0.005**</td>
              </tr>
              <tr>
                <td>GCN-DTI</td>
                <td>0.918 ± 0.005**</td>
                <td>0.897 ± 0.005**</td>
                <td>0.922 ± 0.004**</td>
                <td>0.914 ± 0.004**</td>
              </tr>
              <tr>
                <td>IMCHGAN</td>
                <td>0.956 ± 0.004**</td>
                <td>0.959 ± 0.003**</td>
                <td>0.946 ± 0.002**</td>
                <td>0.929 ± 0.003**</td>
              </tr>
              <tr>
                <td>SGCL-DTI</td>
                <td>0.977 ± 0.002**</td>
                <td>0.976 ± 0.002**</td>
                <td>0.942 ± 0.003**</td>
                <td>0.941 ± 0.003**</td>
              </tr>
              <tr>
                <td>EEG-DTI</td>
                <td>0.954 ± 0.003**</td>
                <td>0.964 ± 0.004**</td>
                <td>0.968 ± 0.002**</td>
                <td>0.968 ± 0.002**</td>
              </tr>
              <tr>
                <td>MIDTI</td>
                <td>0.978 ± 0.003**</td>
                <td>0.970 ± 0.002**</td>
                <td>0.957 ± 0.003**</td>
                <td>0.961 ± 0.003**</td>
              </tr>
              <tr>
                <td>CE-DTI</td>
                <td>0.976 ± 0.002**</td>
                <td>0.976 ± 0.002**</td>
                <td>0.972 ± 0.003**</td>
                <td>0.972 ± 0.002**</td>
              </tr>
              <tr>
                <td>SHGCL-DTI</td>
                <td>0.957 ± 0.004**</td>
                <td>0.958 ± 0.003**</td>
                <td>0.954 ± 0.002**</td>
                <td>0.949 ± 0.004**</td>
              </tr>
              <tr>
                <td>DSS-DTI</td>
                <td>0.986 ± 0.003*</td>
                <td>0.985 ± 0.002*</td>
                <td>0.972 ± 0.001**</td>
                <td>0.969 ± 0.002**</td>
              </tr>
              <tr>
                <td>DHGT-DTI</td>
                <td>0.965 ± 0.003**</td>
                <td>0.969 ± 0.001**</td>
                <td>0.973 ± 0.002**</td>
                <td>0.977 ± 0.001**</td>
              </tr>
              <tr>
                <td>AMGDTI</td>
                <td>0.977 ± 0.002**</td>
                <td>0.977 ± 0.002**</td>
                <td>0.973 ± 0.004**</td>
                <td>0.971 ± 0.002**</td>
              </tr>
              <tr>
                <td>HGMAIB</td>
                <td>0.991 ± 0.002</td>
                <td>0.990 ± 0.002</td>
                <td>0.988 ± 0.002</td>
                <td>0.984 ± 0.001</td>
              </tr>
            </tbody>
          </table>
          <table-wrap-foot>
            <fn>
              <p>Note: Results are presented as mean ± standard deviation. Statistical significance between the proposed HGMAIB and baseline methods was evaluated using an independent two-sample t-test based on summary statistics via SciPy (version 1.10.1). *<italic>P</italic> &lt; 0.05, **<italic>P</italic> &lt; 0.01. AUC: the Area Under the Receiver Operating Characteristic Curve; AUPRC: the Area Under the Precision-Recall Curve.</p>
            </fn>
          </table-wrap-foot>
        </table-wrap>
        <p>Compared with early network-based methods (DTINet, NeoDTI), HGMAIB achieves substantial gains, revealing the limitations of shallow feature projection and simple network integration. Against GNN-based models (GCN-DTI, IMCHGAN), HGMAIB improves performance by explicitly modeling heterogeneous semantic paths rather than relying solely on homogeneous aggregation. It also surpasses contrastive learning methods (CE-DTI, SGCL-DTI, MIDTI, SHGCL-DTI), which lack redundancy suppression; in contrast, HGMAIB’s information bottleneck retains task-relevant features, yielding more discriminative representations. Among recent heterogeneous frameworks, DSS-DTI and DHGT-DTI capture multi-scale or dual-view dependencies, but HGMAIB further integrates hierarchical gated multi-head attention with IB-guided refinement, jointly modeling multi-semantic structures for more robust and accurate DTI prediction. Collectively, these results demonstrate HGMAIB’s ability to capture complex heterogeneous semantics and generate highly discriminative representations.</p>
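        <p>The significance markers in Table 3 rest on a two-sample <italic>t</italic>-test computed from summary statistics. The sketch below reproduces the test statistic in plain Python for one comparison; it assumes equal variances (the pooled form) and five observations per method, one per fold, neither of which is stated explicitly in the table note. SciPy exposes the same computation, including the P value, as <monospace>scipy.stats.ttest_ind_from_stats</monospace>.</p>
<preformat>
```python
import math


def pooled_t_from_stats(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic and degrees of freedom from summary statistics.

    Assumes equal variances (pooled variance estimate).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df


# HGMAIB vs. DSS-DTI, AUC on the Luo dataset (Table 3).
# n = 5 per method is an assumption (one value per cross-validation fold).
t_stat, df = pooled_t_from_stats(0.991, 0.002, 5, 0.986, 0.003, 5)
print(f"t = {t_stat:.2f}, df = {df}")
```
</preformat>
        <p>Under these assumptions the statistic is roughly 3.1 with 8 degrees of freedom, which lies between the two-sided critical values for P = 0.05 (2.306) and P = 0.01 (3.355), in line with the single asterisk reported for DSS-DTI on the Luo dataset.</p>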
      </sec>
      <sec id="sec5-2">
        <title>5.2. Ablation study</title>
        <p>To validate the contribution of each component, we designed four ablation experiments targeting the multi-step weighted residual graph convolution, HGMA, IB, and adaptive meta-path search. The results are shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>.</p>
        <fig id="fig3" position="float">
          <label>Figure 3</label>
          <caption>
            <p>Performance comparison of ablation experiments: (A) Results on the Luo dataset; (B) results on the Zheng dataset.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="ces5088.fig.3.jpg" />
        </fig>
        <p>• HGMAIB w/o HGMA: This variant replaces the HGMA module with a single-head attention mechanism, while keeping the remaining architecture unchanged.</p>
        <p>• HGMAIB w/o IB: In this variant, the IB loss is substituted with the conventional BCE loss, and all other modules are preserved.</p>
        <p>• HGMAIB w/o Multi-hop GCN: This variant replaces the multi-step graph convolution module with a basic first-order graph convolution, removing residual connections and deep structural propagation, with the rest of the framework kept intact.</p>
        <p>• HGMAIB w/o Adaptive Meta-path: This variant removes the adaptive meta-path search module and instead adopts a fixed predefined meta-path, namely Drug → Disease → Protein → Protein, as the structural input for message propagation, while keeping all other components unchanged.</p>
        <p>The ablation results confirm that removing any component degrades performance. On the Luo dataset, replacing hierarchical gated multi-head attention with single-head attention lowers the AUC to 0.988 and the AUPRC to 0.985, highlighting the importance of multi-head attention with hierarchical gating for multi-semantic aggregation. Replacing the IB loss with standard BCE loss yields a slight performance decrease (AUC 0.987, AUPRC 0.984), validating IB’s role in suppressing redundancy. Removing the adaptive meta-path selection and adopting a fixed semantic path (Drug → Disease → Protein → Protein) further degrades performance (AUC 0.985, AUPRC 0.983). Replacing the multi-step graph convolution and residual connections with a single-layer convolution causes a sharp performance drop on the sparse Luo dataset (AUC and AUPRC both fall to 0.848). With only 1,923 edges among 708 drugs and 1,512 proteins (density ≈ 0.18%), deep multi-hop propagation and residual learning are essential to capture higher-order dependencies. In contrast, on the denser Zheng dataset (11,819 edges, density ≈ 0.69%, nearly four times denser than Luo), the performance loss is much milder (AUC 0.977, AUPRC 0.973). The Zheng dataset contains rich local attributes (881 chemical substructures, 4,098 GO terms), which provide sufficient 1-hop semantic signals, partly alleviating the need for deep propagation. These results confirm that each component of HGMAIB is indispensable, and that the benefit of deep structural propagation depends on the inherent complexity of the biomedical network.</p>
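        <p>The density figures quoted above follow directly from the counts in Tables 1 and 2, taking density as the number of known drug-protein edges divided by the number of possible drug-protein pairs. The helper name below is hypothetical.</p>
<preformat>
```python
def bipartite_density(n_edges, n_drugs, n_proteins):
    """Fraction of all possible drug-protein pairs that are known interactions."""
    return n_edges / (n_drugs * n_proteins)


luo = bipartite_density(1923, 708, 1512)      # counts from Table 1
zheng = bipartite_density(11819, 1094, 1556)  # counts from Table 2
ratio = zheng / luo
print(f"Luo: {luo:.2%}, Zheng: {zheng:.2%}, ratio: {ratio:.2f}")
```
</preformat>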
      </sec>
      <sec id="sec5-3">
        <title>5.3. Effect of node embedding methods</title>
        <p>To justify the use of Node2Vec for node initialization in HGMAIB, we compared it with LINE and GraphSAGE on the Luo dataset. All other components (network architecture, training procedure, hyperparameters) were kept identical. As shown in <xref ref-type="table" rid="t4">Table 4</xref>, Node2Vec achieves the highest predictive performance, outperforming both alternatives. This indicates that Node2Vec more effectively captures the structural and semantic information of heterogeneous nodes in biomedical networks, which is essential for accurate DTI prediction.</p>
        <table-wrap id="t4">
          <label>Table 4</label>
          <caption>
            <p>Effect of node embedding methods on HGMAIB performance on the Luo dataset</p>
          </caption>
          <table frame="hsides" rules="groups">
            <thead>
              <tr>
                <td style="border-bottom:1;">
                  <bold>Node embedding method</bold>
                </td>
                <td style="border-bottom:1;">
                  <bold>AUC</bold>
                </td>
                <td style="border-bottom:1;">
                  <bold>AUPRC</bold>
                </td>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Node2Vec</td>
                <td>0.991</td>
                <td>0.990</td>
              </tr>
              <tr>
                <td>LINE</td>
                <td>0.982</td>
                <td>0.976</td>
              </tr>
              <tr>
                <td>GraphSAGE</td>
                <td>0.980</td>
                <td>0.968</td>
              </tr>
            </tbody>
          </table>
		    <table-wrap-foot>
            <fn>
              <p>AUC: the Area Under the Receiver Operating Characteristic Curve; AUPRC: the Area Under the Precision-Recall Curve.</p>
            </fn>
          </table-wrap-foot>
        </table-wrap>
      </sec>
      <sec id="sec5-4">
        <title>5.4. Parameter sensitivity analysis</title>
        <p>Parameter sensitivity analysis was conducted on the Luo dataset to evaluate the impact of three key hyperparameters: the number of attention heads, learning rate, and the β coefficient. The number of attention heads (tested in {2, 4, 8, 16}) achieved peak performance with four heads [<xref ref-type="fig" rid="fig4">Figure 4A</xref>], indicating that a moderate size optimally balances feature aggregation and computational efficiency. The learning rate (tested in {5e-4, 1e-3, 5e-3, 1e-2}) yielded the best results at 5e-3 [<xref ref-type="fig" rid="fig4">Figure 4B</xref>], balancing convergence speed and stability—too small a value slows training, while too large a value causes instability.</p>
        <fig id="fig4" position="float">
          <label>Figure 4</label>
          <caption>
            <p>The performance of HGMAIB on the Luo dataset under different parameter settings: (A) Number of attention heads; (B) Learning rate; (C) β coefficient.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="ces5088.fig.4.jpg" />
        </fig>
        <p>The β parameter determines the weight of the IB loss and plays a key role in balancing representation learning and regularization. A small β may lead to under-regularization and overfitting, whereas an excessively large β can constrain the model and impair learning. β was evaluated over the set {1e-3, 5e-3, 1e-2, 5e-2}, with optimal performance observed at 5e-3, achieving a favorable trade-off between effective representation learning and appropriate regularization [<xref ref-type="fig" rid="fig4">Figure 4C</xref>].</p>
      </sec>
      <sec id="sec5-5">
        <title>5.5. t-SNE visualization of node embeddings</title>
        <p>To evaluate HGMAIB’s representation capability, t-SNE was applied to the node embeddings of the Zheng dataset, projecting drug (blue) and target (red) embeddings into a two-dimensional space. As shown in <xref ref-type="fig" rid="fig5">Figure 5A</xref>, the initial embeddings overlap substantially and form no distinct clusters, indicating limited semantic discriminability. After training [<xref ref-type="fig" rid="fig5">Figure 5B</xref>], the embeddings form compact intra-class clusters with clear inter-class separation and well-defined boundaries. This improvement suggests that HGMAIB captures heterogeneous semantic dependencies and topological correlations: the hierarchical gated multi-head attention adaptively aggregates multi-level semantics, while the IB filters redundant signals and preserves task-relevant features. The resulting embeddings are more discriminative, which is consistent with improved interpretability and generalization in DTI prediction and supports HGMAIB’s ability to learn robust representations from complex heterogeneous graphs.</p>
        <fig id="fig5" position="float">
          <label>Figure 5</label>
          <caption>
            <p>t-SNE visualization of drug and target embeddings on the Zheng dataset. (A) Initial embeddings before training; (B) Learned embeddings after training with the HGMAIB model.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="ces5088.fig.5.jpg" />
        </fig>
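        <p>A projection of this kind can be sketched as follows, assuming the scikit-learn implementation of t-SNE. The embeddings below are synthetic stand-ins drawn from two Gaussians, and the helper name, embedding dimension, perplexity, and random seed are illustrative assumptions rather than the settings used to produce Figure 5.</p>
        <preformat><![CDATA[
```python
import numpy as np
from sklearn.manifold import TSNE


def project_embeddings(drug_emb, target_emb, perplexity=15.0, seed=0):
    """Jointly project drug and target embeddings into 2-D with t-SNE."""
    stacked = np.vstack([drug_emb, target_emb])
    # 0 marks a drug row and 1 marks a target row (used only for colouring).
    labels = np.array([0] * len(drug_emb) + [1] * len(target_emb))
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    coords = tsne.fit_transform(stacked)
    return coords, labels


# Synthetic stand-ins for learned embeddings (NOT data from the paper).
rng = np.random.default_rng(0)
drug_emb = rng.normal(loc=0.0, scale=1.0, size=(40, 16))
target_emb = rng.normal(loc=3.0, scale=1.0, size=(30, 16))

coords, labels = project_embeddings(drug_emb, target_emb)
print(coords.shape)  # (70, 2): one 2-D point per node
```
]]></preformat>
        <p>The returned coordinates can then be drawn as a scatter plot coloured by the label array. Because t-SNE preserves local neighbourhoods rather than global distances, such a plot is best read qualitatively.</p>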
      </sec>
      <sec id="sec5-6">
        <title>5.6. Computational cost analysis</title>
        <p>To justify the computational cost of the proposed modules relative to baseline methods, we provide a theoretical complexity analysis and an empirical resource footprint of HGMAIB. Because the baseline methods are implemented in different frameworks, direct runtime comparisons would be highly hardware- and implementation-dependent; we therefore focus on asymptotic complexity and measured resource consumption.</p>
        <p>The computational cost of HGMAIB arises from three phases. In the multi-path graph representation learning phase, multi-step sequential propagation with residual aggregation over P meta-paths incurs a complexity of O(T<sup>2</sup> × |E| × D + P × N × D), where N is the number of nodes, |E| is the number of edges, D is the hidden dimension, and T is the number of propagation steps. In the hierarchical attention aggregation phase, path-level and head-level gated attention across M heads contributes O(N × P × D × M + N × M × D). Finally, the IB-based feature compression requires O(N × D<sup>2</sup>) operations. Consequently, the overall time complexity scales linearly with the number of edges and attention heads, avoiding exponential computational overhead. The space complexity (parameter size and feature storage) is bounded by O(|E| + N × D + P × N × D + N × P × M).</p>
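        <p>To make the relative size of these terms concrete, the following minimal Python sketch evaluates each expression with constants dropped. N and |E| are set to the per-subgraph averages reported for the Luo dataset and M = 4 matches the best head count in Section 5.4, whereas D = 64, P = 4, and T = 2 are hypothetical placeholders chosen only for illustration; the function names are likewise assumptions.</p>
        <preformat><![CDATA[
```python
def hgmaib_time_terms(n_nodes, n_edges, hidden_dim, n_paths, n_heads, n_steps):
    """Evaluate each term of the stated time complexity (constants dropped)."""
    propagation = n_steps ** 2 * n_edges * hidden_dim + n_paths * n_nodes * hidden_dim
    attention = (n_nodes * n_paths * hidden_dim * n_heads
                 + n_nodes * n_heads * hidden_dim)
    bottleneck = n_nodes * hidden_dim ** 2
    return {"propagation": propagation, "attention": attention, "bottleneck": bottleneck}


def hgmaib_space_term(n_nodes, n_edges, hidden_dim, n_paths, n_heads):
    """Evaluate the stated space bound |E| + N*D + P*N*D + N*P*M."""
    return (n_edges + n_nodes * hidden_dim
            + n_paths * n_nodes * hidden_dim + n_nodes * n_paths * n_heads)


# N and |E|: reported per-subgraph averages on the Luo dataset.
# hidden_dim, n_paths, n_steps: hypothetical placeholders, not the paper's settings.
terms = hgmaib_time_terms(n_nodes=3570, n_edges=8_100_000, hidden_dim=64,
                          n_paths=4, n_heads=4, n_steps=2)
dominant = max(terms, key=terms.get)
print(dominant)
print(terms)
```
]]></preformat>
        <p>Under these placeholder values the edge-dependent propagation term dominates, which is consistent with the linear dependence on |E| noted above.</p>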
        <p>Empirically, we recorded the computational resources consumed during training. On the Luo dataset, each training fold processes approximately 2,300 meta-path-guided subgraph instances (averaging 3,570 nodes and 8.1 million edges per subgraph). On a single NVIDIA RTX 3090 GPU, HGMAIB trains stably with an average training time of approximately 15 seconds per fold and a peak GPU memory usage of 0.38 GB. These results indicate that, despite incorporating multi-step residual propagation and hierarchical multi-path attention, the parameter footprint and computational overhead of HGMAIB remain competitive with those of basic GNNs, offering a favorable trade-off between model expressiveness and computational cost in large-scale heterogeneous biomedical networks.</p>
      </sec>
    </sec>
    <sec id="sec6">
      <title>6. CONCLUSION</title>
      <p>To address the challenge of accurate DTI identification in drug repositioning, this study proposes a predictive framework termed HGMAIB. Extensive evaluations demonstrate that HGMAIB achieves competitive predictive performance by capturing complex structural dependencies and fusing multi-perspective semantic information within heterogeneous biological networks. Furthermore, our findings indicate that integrating the HGMA with the IB module improves overall model efficacy by filtering out redundant noise to produce discriminative representations. Future research will focus on incorporating broader heterogeneous biomedical data and dynamic GNN techniques<sup>[<xref ref-type="bibr" rid="B48">48</xref>]</sup> to capture temporal biological dynamics, thereby further advancing robust DTI prediction and accelerating drug discovery.</p>
    </sec>
  </body>
  <back>
    <sec>
      <title>DECLARATIONS</title>
      <sec>
        <title>Authors’ contributions</title>
        <p>Substantial contributions to the conceptualization, methodology, writing, and visualization: Song, S.; Chen, Z.; Guo, Y.</p>
        <p>Formal analysis, methodology, and validation, along with data analysis: Wang, Y.; Guo, Q.</p>
      </sec>
      <sec>
        <title>Availability of data and materials</title>
        <p>The experimental dataset supporting this study can be obtained from <uri xlink:href="https://github.com/chenzh-23/HGMAIB">https://github.com/chenzh-23/HGMAIB</uri>.</p>
      </sec>
	  
      <sec>
        <title>AI and AI-assisted tools statement</title>
        <p>Not applicable.</p>
      </sec>
	  
      <sec>
        <title>Financial support and sponsorship</title>
        <p>This work was supported by the National Natural Science Foundation of China (Grant No. 62403437) and the Young Backbone Teacher Training Program of Zhengzhou University of Light Industry (Grant No. 13502010009).</p>
      </sec>
      <sec>
        <title>Conflicts of interest</title>
        <p>All authors declared that there are no conflicts of interest.</p>
      </sec>
      <sec>
        <title>Ethical approval and consent to participate</title>
        <p>Not applicable.</p>
      </sec>
      <sec>
        <title>Consent for publication</title>
        <p>Not applicable.</p>
      </sec>
      <sec>
        <title>Copyright</title>
        <p>© The Author(s) 2026.</p>
      </sec>
    </sec>
    <ref-list>
      <ref id="B1">
        <label>1</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Zhou</surname>
              <given-names>G</given-names>
            </name>
            <name>
              <surname>Xuan</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Zhang</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Wu</surname>
              <given-names>H</given-names>
            </name>
            <name>
              <surname>Gao</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>Drug repositioning based on a multiplex network by integrating disease, gene, and drug information</article-title>
          <source>Curr Bioinform</source>
          <year>2023</year>
          <volume>18</volume>
          <fpage>266</fpage>
          <lpage>75</lpage>
          <pub-id pub-id-type="doi">10.2174/1574893618666230223114427</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B2">
        <label>2</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Palhamkhani</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Alipour</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Dehnad</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Abbasi</surname>
              <given-names>K</given-names>
            </name>
            <name>
              <surname>Razzaghi</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Ghasemi</surname>
              <given-names>JB</given-names>
            </name>
          </person-group>
          <article-title>DeepCompoundNet: enhancing compound-protein interaction prediction with multimodal convolutional neural networks</article-title>
          <source>J Biomol Struct Dyn</source>
          <year>2023</year>
          <volume>43</volume>
          <fpage>1414</fpage>
          <lpage>23</lpage>
          <pub-id pub-id-type="doi">10.1080/07391102.2023.2291829</pub-id>
          <pub-id pub-id-type="pmid">38084744</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B3">
        <label>3</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Hu</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>Fu</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Ren</surname>
              <given-names>Z</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>SSELM-neg: spherical search-based extreme learning machine for drug-target interaction prediction</article-title>
          <source>BMC Bioinformatics</source>
          <year>2023</year>
          <volume>24</volume>
          <fpage>38</fpage>
          <pub-id pub-id-type="doi">10.1186/s12859-023-05153-y</pub-id>
          <pub-id pub-id-type="pmid">36737694</pub-id>
          <pub-id pub-id-type="pmcid">PMC9896467</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B4">
        <label>4</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Shi</surname>
              <given-names>W</given-names>
            </name>
            <name>
              <surname>Yang</surname>
              <given-names>H</given-names>
            </name>
            <name>
              <surname>Xie</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>Yin</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Zhang</surname>
              <given-names>Y</given-names>
            </name>
          </person-group>
          <article-title>A review of machine learning-based methods for predicting drug-target interactions</article-title>
          <source>Health Inf Sci Syst</source>
          <year>2024</year>
          <volume>12</volume>
          <fpage>30</fpage>
          <pub-id pub-id-type="doi">10.1007/s13755-024-00287-6</pub-id>
          <pub-id pub-id-type="pmid">38617016</pub-id>
          <pub-id pub-id-type="pmcid">PMC11014838</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B5">
        <label>5</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Zitnik</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Nguyen</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Leskovec</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Goldenberg</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Hoffman</surname>
              <given-names>MM</given-names>
            </name>
          </person-group>
          <article-title>Machine learning for integrating data in biology and medicine: Principles, practice, and opportunities</article-title>
          <source>Inf Fusion</source>
          <year>2019</year>
          <volume>50</volume>
          <fpage>71</fpage>
          <lpage>91</lpage>
          <pub-id pub-id-type="doi">10.1016/j.inffus.2018.09.012</pub-id>
          <pub-id pub-id-type="pmid">30467459</pub-id>
          <pub-id pub-id-type="pmcid">PMC6242341</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B6">
        <label>6</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Zhang</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>Z</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Meng</surname>
              <given-names>Z</given-names>
            </name>
            <name>
              <surname>Cui</surname>
              <given-names>W</given-names>
            </name>
          </person-group>
          <article-title>MHTAN-DTI: metapath-based hierarchical transformer and attention network for drug-target interaction prediction</article-title>
          <source>Brief Bioinform</source>
          <year>2023</year>
          <volume>24</volume>
          <fpage>bbad079</fpage>
          <pub-id pub-id-type="doi">10.1093/bib/bbad079</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B7">
        <label>7</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Yang</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Yang</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Pan</surname>
              <given-names>A</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Identifying depression in Parkinson's disease by using combined diffusion tensor imaging and support vector machine</article-title>
          <source>Front Neurol</source>
          <year>2022</year>
          <volume>13</volume>
          <fpage>878691</fpage>
          <pub-id pub-id-type="doi">10.3389/fneur.2022.878691</pub-id>
          <pub-id pub-id-type="pmid">35795798</pub-id>
          <pub-id pub-id-type="pmcid">PMC9251067</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B8">
        <label>8</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Yang</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>He</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Zhang</surname>
              <given-names>Z</given-names>
            </name>
            <name>
              <surname>Bo</surname>
              <given-names>X</given-names>
            </name>
          </person-group>
          <article-title>NegStacking: drug-target interaction prediction based on ensemble learning and logistic regression</article-title>
          <source>IEEE/ACM Trans Comput Biol Bioinform</source>
          <year>2021</year>
          <volume>18</volume>
          <fpage>2624</fpage>
          <lpage>34</lpage>
          <pub-id pub-id-type="doi">10.1109/tcbb.2020.2968025</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B9">
        <label>9</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>De</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Chowdhury</surname>
              <given-names>AS</given-names>
            </name>
          </person-group>
          <article-title>DTI based Alzheimer’s disease classification with rank modulated fusion of CNNs and random forest</article-title>
          <source>Expert Syst Appl</source>
          <year>2021</year>
          <volume>169</volume>
          <fpage>114338</fpage>
          <pub-id pub-id-type="doi">10.1016/j.eswa.2020.114338</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B10">
        <label>10</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Liu</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Pliakos</surname>
              <given-names>K</given-names>
            </name>
            <name>
              <surname>Vens</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Tsoumakas</surname>
              <given-names>G</given-names>
            </name>
          </person-group>
          <article-title>Drug-target interaction prediction via an ensemble of weighted nearest neighbors with interaction recovery</article-title>
          <source>Appl Intell</source>
          <year>2021</year>
          <volume>52</volume>
          <fpage>3705</fpage>
          <lpage>27</lpage>
          <pub-id pub-id-type="doi">10.1007/s10489-021-02495-z</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B11">
        <label>11</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Yamanishi</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Araki</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Gutteridge</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Honda</surname>
              <given-names>W</given-names>
            </name>
            <name>
              <surname>Kanehisa</surname>
              <given-names>M</given-names>
            </name>
          </person-group>
          <article-title>Prediction of drug-target interaction networks from the integration of chemical and genomic spaces</article-title>
          <source>Bioinformatics</source>
          <year>2008</year>
          <volume>24</volume>
          <fpage>i232</fpage>
          <lpage>40</lpage>
          <pub-id pub-id-type="doi">10.1093/bioinformatics/btn162</pub-id>
          <pub-id pub-id-type="pmid">18586719</pub-id>
          <pub-id pub-id-type="pmcid">PMC2718640</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B12">
        <label>12</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Jacob</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>Vert</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>Protein-ligand interaction prediction: an improved chemogenomics approach</article-title>
          <source>Bioinformatics</source>
          <year>2008</year>
          <volume>24</volume>
          <fpage>2149</fpage>
          <lpage>56</lpage>
          <pub-id pub-id-type="doi">10.1093/bioinformatics/btn409</pub-id>
          <pub-id pub-id-type="pmid">18676415</pub-id>
          <pub-id pub-id-type="pmcid">PMC2553441</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B13">
        <label>13</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Tishby</surname>
              <given-names>N</given-names>
            </name>
            <name>
              <surname>Pereira</surname>
              <given-names>FC</given-names>
            </name>
            <name>
              <surname>Bialek</surname>
              <given-names>W</given-names>
            </name>
          </person-group>
          <article-title>The information bottleneck method</article-title>
          <source>arXiv</source>
          <year>2000</year>
          <fpage>arXiv:physics/0004057</fpage>
          <pub-id pub-id-type="doi">10.48550/arXiv.physics/0004057</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B14">
        <label>14</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Luo</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Zhao</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Zhou</surname>
              <given-names>J</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information</article-title>
          <source>Nat Commun</source>
          <year>2017</year>
          <volume>8</volume>
          <fpage>573</fpage>
          <pub-id pub-id-type="doi">10.1038/s41467-017-00680-8</pub-id>
          <pub-id pub-id-type="pmid">28924171</pub-id>
          <pub-id pub-id-type="pmcid">PMC5603535</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B15">
        <label>15</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Scarselli</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Gori</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Tsoi</surname>
              <given-names>AC</given-names>
            </name>
            <name>
              <surname>Hagenbuchner</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Monfardini</surname>
              <given-names>G</given-names>
            </name>
          </person-group>
          <article-title>The graph neural network model</article-title>
          <source>IEEE Trans Neural Netw</source>
          <year>2009</year>
          <volume>20</volume>
          <fpage>61</fpage>
          <lpage>80</lpage>
          <pub-id pub-id-type="doi">10.1109/tnn.2008.2005605</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B16">
        <label>16</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Wan</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Hong</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>Xiao</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Jiang</surname>
              <given-names>T</given-names>
            </name>
            <name>
              <surname>Zeng</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>NeoDTI: neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions</article-title>
          <source>Bioinformatics</source>
          <year>2019</year>
          <volume>35</volume>
          <fpage>104</fpage>
          <lpage>11</lpage>
          <pub-id pub-id-type="doi">10.1093/bioinformatics/bty543</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B17">
        <label>17</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Zhao</surname>
              <given-names>T</given-names>
            </name>
            <name>
              <surname>Hu</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Valsdottir</surname>
              <given-names>LR</given-names>
            </name>
            <name>
              <surname>Zang</surname>
              <given-names>T</given-names>
            </name>
            <name>
              <surname>Peng</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>Identifying drug-target interactions based on graph convolutional network and deep neural network</article-title>
          <source>Brief Bioinform</source>
          <year>2021</year>
          <volume>22</volume>
          <fpage>2141</fpage>
          <lpage>50</lpage>
          <pub-id pub-id-type="doi">10.1093/bib/bbaa044</pub-id>
          <pub-id pub-id-type="pmid">32367110</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B18">
        <label>18</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Peng</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Guan</surname>
              <given-names>J</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>An end-to-end heterogeneous graph representation learning-based framework for drug-target interaction prediction</article-title>
          <source>Brief Bioinform</source>
          <year>2021</year>
          <volume>22</volume>
          <fpage>bbaa430</fpage>
          <pub-id pub-id-type="doi">10.1093/bib/bbaa430</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B19">
        <label>19</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Li</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Lv</surname>
              <given-names>H</given-names>
            </name>
            <name>
              <surname>Zhang</surname>
              <given-names>Z</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>Z</given-names>
            </name>
          </person-group>
          <article-title>IMCHGAN: inductive matrix completion with heterogeneous graph attention networks for drug-target interactions prediction</article-title>
          <source>IEEE/ACM Trans Comput Biol Bioinform</source>
          <year>2022</year>
          <volume>19</volume>
          <fpage>655</fpage>
          <lpage>65</lpage>
          <pub-id pub-id-type="doi">10.1109/tcbb.2021.3088614</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B20">
        <label>20</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Su</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Hu</surname>
              <given-names>Z</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>F</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>AMGDTI: drug-target interaction prediction based on adaptive meta-graph learning in heterogeneous network</article-title>
          <source>Brief Bioinform</source>
          <year>2024</year>
          <volume>25</volume>
          <fpage>bbad474</fpage>
          <pub-id pub-id-type="doi">10.1093/bib/bbad474</pub-id>
          <pub-id pub-id-type="pmid">38145949</pub-id>
          <pub-id pub-id-type="pmcid">PMC10749791</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B21">
        <label>21</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Wang</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Lei</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Guo</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>Chen</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Pan</surname>
              <given-names>Y</given-names>
            </name>
          </person-group>
          <article-title>DHGT-DTI: advancing drug-target interaction prediction through a dual-view heterogeneous network with GraphSAGE and graph transformer</article-title>
          <source>J Pharm Anal</source>
          <year>2025</year>
          <volume>15</volume>
          <fpage>101336</fpage>
          <pub-id pub-id-type="doi">10.1016/j.jpha.2025.101336</pub-id>
          <pub-id pub-id-type="pmid">41245658</pub-id>
          <pub-id pub-id-type="pmcid">PMC12616060</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B22">
        <label>22</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Zhang</surname>
              <given-names>Z</given-names>
            </name>
            <name>
              <surname>Zhou</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Qi</surname>
              <given-names>Y</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Leveraging 3D molecular spatial visual information and multi-perspective representations for drug discovery</article-title>
          <source>Adv Sci</source>
          <year>2025</year>
          <volume>13</volume>
          <fpage>e12453</fpage>
          <pub-id pub-id-type="doi">10.1002/advs.202512453</pub-id>
          <pub-id pub-id-type="pmid">41090528</pub-id>
          <pub-id pub-id-type="pmcid">PMC12786381</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B23">
        <label>23</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Qiao</surname>
              <given-names>G</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>G</given-names>
            </name>
            <name>
              <surname>Li</surname>
              <given-names>Y</given-names>
            </name>
          </person-group>
          <article-title>Causal enhanced drug-target interaction prediction based on graph generation and multi-source information fusion</article-title>
          <source>Bioinformatics</source>
          <year>2024</year>
          <volume>40</volume>
          <fpage>btae570</fpage>
          <pub-id pub-id-type="doi">10.1093/bioinformatics/btae570</pub-id>
          <pub-id pub-id-type="pmid">39312682</pub-id>
          <pub-id pub-id-type="pmcid">PMC11639159</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B24">
        <label>24</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Li</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Qiao</surname>
              <given-names>G</given-names>
            </name>
            <name>
              <surname>Gao</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>G</given-names>
            </name>
          </person-group>
          <article-title>Supervised graph co-contrastive learning for drug-target interaction prediction</article-title>
          <source>Bioinformatics</source>
          <year>2022</year>
          <volume>38</volume>
          <fpage>2847</fpage>
          <lpage>54</lpage>
          <pub-id pub-id-type="doi">10.1093/bioinformatics/btac164</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B25">
        <label>25</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Yao</surname>
              <given-names>K</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Li</surname>
              <given-names>W</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Semi-supervised heterogeneous graph contrastive learning for drug-target interaction prediction</article-title>
          <source>Comput Biol Med</source>
          <year>2023</year>
          <volume>163</volume>
          <fpage>107199</fpage>
          <pub-id pub-id-type="doi">10.1016/j.compbiomed.2023.107199</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B26">
        <label>26</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Wu</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Guo</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Ning</surname>
              <given-names>Q</given-names>
            </name>
            <name>
              <surname>Li</surname>
              <given-names>H</given-names>
            </name>
            <name>
              <surname>Guo</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Deng</surname>
              <given-names>Z</given-names>
            </name>
          </person-group>
          <article-title>Dss-Dti: drug-target interaction prediction method based on dual spatiotemporal scales</article-title>
          <source>SSRN</source>
          <year>2025</year>
          <pub-id pub-id-type="doi">10.2139/ssrn.5294391</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B27">
        <label>27</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Song</surname>
              <given-names>W</given-names>
            </name>
            <name>
              <surname>Xu</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>Han</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Tian</surname>
              <given-names>Z</given-names>
            </name>
            <name>
              <surname>Zou</surname>
              <given-names>Q</given-names>
            </name>
          </person-group>
          <article-title>Drug-target interaction predictions with multi-view similarity network fusion strategy and deep interactive attention mechanism</article-title>
          <source>Bioinformatics</source>
          <year>2024</year>
          <volume>40</volume>
          <fpage>btae346</fpage>
          <pub-id pub-id-type="doi">10.1093/bioinformatics/btae346</pub-id>
          <pub-id pub-id-type="pmid">38837345</pub-id>
          <pub-id pub-id-type="pmcid">PMC11164831</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B28">
        <label>28</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Velickovic</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Cucurull</surname>
              <given-names>G</given-names>
            </name>
            <name>
              <surname>Casanova</surname>
              <given-names>A</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Graph attention networks</article-title>
          <source>arXiv</source>
          <year>2017</year>
          <fpage>arXiv:1710.10903</fpage>
          <pub-id pub-id-type="doi">10.48550/arXiv.1710.10903</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B29">
        <label>29</label>
        <nlm-citation publication-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Wang</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Girshick</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Gupta</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>He</surname>
              <given-names>K</given-names>
            </name>
          </person-group>
          <comment>Non-local neural networks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2018 Jun 18-23; Salt Lake City, UT, USA. IEEE; 2018. pp. 7794-803.</comment>
          <pub-id pub-id-type="doi">10.1109/cvpr.2018.00813</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B30">
        <label>30</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Dosovitskiy</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Beyer</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>Kolesnikov</surname>
              <given-names>A</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>An image is worth 16x16 words: transformers for image recognition at scale</article-title>
          <source>arXiv</source>
          <year>2020</year>
          <fpage>arXiv:2010.11929</fpage>
          <pub-id pub-id-type="doi">10.48550/arXiv.2010.11929</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B31">
        <label>31</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Jumper</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Evans</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Pritzel</surname>
              <given-names>A</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Highly accurate protein structure prediction with AlphaFold</article-title>
          <source>Nature</source>
          <year>2021</year>
          <volume>596</volume>
          <fpage>583</fpage>
          <lpage>9</lpage>
          <pub-id pub-id-type="doi">10.1038/s41586-021-03819-2</pub-id>
          <pub-id pub-id-type="pmid">34265844</pub-id>
          <pub-id pub-id-type="pmcid">PMC8371605</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B32">
        <label>32</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Xiong</surname>
              <given-names>Z</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>D</given-names>
            </name>
            <name>
              <surname>Liu</surname>
              <given-names>X</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism</article-title>
          <source>J Med Chem</source>
          <year>2019</year>
          <volume>63</volume>
          <fpage>8749</fpage>
          <lpage>60</lpage>
          <pub-id pub-id-type="doi">10.1021/acs.jmedchem.9b00959</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B33">
        <label>33</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Balvir</surname>
              <given-names>SU</given-names>
            </name>
            <name>
              <surname>Raghuwanshi</surname>
              <given-names>MM</given-names>
            </name>
            <name>
              <surname>Borkar</surname>
              <given-names>PS</given-names>
            </name>
          </person-group>
          <article-title>Node2Vec and machine learning: a powerful duo for link prediction in social network</article-title>
          <source>J Electr Syst</source>
          <year>2024</year>
          <volume>20</volume>
          <fpage>639</fpage>
          <lpage>49</lpage>
          <pub-id pub-id-type="doi">10.52783/jes.1530</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B34">
        <label>34</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Mikolov</surname>
              <given-names>T</given-names>
            </name>
            <name>
              <surname>Sutskever</surname>
              <given-names>I</given-names>
            </name>
            <name>
              <surname>Chen</surname>
              <given-names>K</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          <source>Adv Neural Inform Process Syst</source>
          <year>2013</year>
          <fpage>arXiv:1310.4546</fpage>
          <pub-id pub-id-type="doi">10.48550/arXiv.1310.4546</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B35">
        <label>35</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Singh</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Patel</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Vijayvargiya</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Kumar</surname>
              <given-names>R</given-names>
            </name>
          </person-group>
          <article-title>Analyzing the impact of activation functions on the performance of the data-driven gait model</article-title>
          <source>Results Eng</source>
          <year>2023</year>
          <volume>18</volume>
          <fpage>101029</fpage>
          <pub-id pub-id-type="doi">10.1016/j.rineng.2023.101029</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B36">
        <label>36</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Balestriero</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Baraniuk</surname>
              <given-names>RG</given-names>
            </name>
          </person-group>
          <article-title>Batch normalization explained</article-title>
          <source>arXiv</source>
          <year>2022</year>
          <fpage>arXiv:2209.14778</fpage>
          <pub-id pub-id-type="doi">10.48550/arXiv.2209.14778</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B37">
        <label>37</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Dubey</surname>
              <given-names>SR</given-names>
            </name>
            <name>
              <surname>Singh</surname>
              <given-names>SK</given-names>
            </name>
            <name>
              <surname>Chaudhuri</surname>
              <given-names>BB</given-names>
            </name>
          </person-group>
          <article-title>Activation functions in deep learning: a comprehensive survey and benchmark</article-title>
          <source>Neurocomputing</source>
          <year>2022</year>
          <volume>503</volume>
          <fpage>92</fpage>
          <lpage>108</lpage>
          <pub-id pub-id-type="doi">10.1016/j.neucom.2022.06.111</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B38">
        <label>38</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Shen</surname>
              <given-names>K</given-names>
            </name>
            <name>
              <surname>Guo</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Tan</surname>
              <given-names>X</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>A study on ReLU and softmax in transformer</article-title>
          <source>arXiv</source>
          <year>2023</year>
          <fpage>arXiv:2302.06461</fpage>
          <pub-id pub-id-type="doi">10.48550/arXiv.2302.06461</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B39">
        <label>39</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Du</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Liu</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Peng</surname>
              <given-names>Z</given-names>
            </name>
            <name>
              <surname>Jin</surname>
              <given-names>X</given-names>
            </name>
          </person-group>
          <article-title>Gated attention fusion network for multimodal sentiment classification</article-title>
          <source>Knowl Based Syst</source>
          <year>2022</year>
          <volume>240</volume>
          <fpage>108107</fpage>
          <pub-id pub-id-type="doi">10.1016/j.knosys.2021.108107</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B40">
        <label>40</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Sun</surname>
              <given-names>Q</given-names>
            </name>
            <name>
              <surname>Li</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Peng</surname>
              <given-names>H</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Graph structure learning with variational information bottleneck</article-title>
          <source>AAAI</source>
          <year>2022</year>
          <volume>36</volume>
          <fpage>4165</fpage>
          <lpage>74</lpage>
          <pub-id pub-id-type="doi">10.1609/aaai.v36i4.20335</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B41">
        <label>41</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Wu</surname>
              <given-names>T</given-names>
            </name>
            <name>
              <surname>Tao</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>J</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Rethinking Kullback-Leibler divergence in knowledge distillation for large language models</article-title>
          <source>arXiv</source>
          <year>2024</year>
          <fpage>arXiv:2404.02657</fpage>
          <pub-id pub-id-type="doi">10.48550/arXiv.2404.02657</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B42">
        <label>42</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Huang</surname>
              <given-names>W</given-names>
            </name>
            <name>
              <surname>Chang</surname>
              <given-names>W</given-names>
            </name>
            <name>
              <surname>Yan</surname>
              <given-names>G</given-names>
            </name>
            <name>
              <surname>Yang</surname>
              <given-names>Z</given-names>
            </name>
            <name>
              <surname>Luo</surname>
              <given-names>H</given-names>
            </name>
            <name>
              <surname>Pei</surname>
              <given-names>H</given-names>
            </name>
          </person-group>
          <article-title>EEG-based motor imagery classification using convolutional neural networks with local reparameterization trick</article-title>
          <source>Expert Syst Appl</source>
          <year>2022</year>
          <volume>187</volume>
          <fpage>115968</fpage>
          <pub-id pub-id-type="doi">10.1016/j.eswa.2021.115968</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B43">
        <label>43</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Tian</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Yao</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Liu</surname>
              <given-names>H</given-names>
            </name>
            <name>
              <surname>Yang</surname>
              <given-names>Y</given-names>
            </name>
          </person-group>
          <article-title>Predicting molecular properties based on the interpretable graph neural network with multistep focus mechanism</article-title>
          <source>Brief Bioinform</source>
          <year>2023</year>
          <volume>24</volume>
          <fpage>bbac534</fpage>
          <pub-id pub-id-type="doi">10.1093/bib/bbac534</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B44">
        <label>44</label>
        <nlm-citation publication-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Zheng</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Peng</surname>
              <given-names>H</given-names>
            </name>
            <name>
              <surname>Zhang</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Gao</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Li</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <comment>Predicting drug targets from heterogeneous spaces using anchor graph hashing and ensemble learning. In 2018 International Joint Conference on Neural Networks (IJCNN); 2018 Jul 8-13; Rio de Janeiro, Brazil. IEEE; 2018. pp. 1-7.</comment>
          <pub-id pub-id-type="doi">10.1109/ijcnn.2018.8489028</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B45">
        <label>45</label>
        <nlm-citation publication-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Fan</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>Long</surname>
              <given-names>Z</given-names>
            </name>
          </person-group>
          <comment>Optimization of Nadam algorithm for image denoising based on convolutional neural network. In 2020 7th International Conference on Information Science and Control Engineering (ICISCE); 2020 Dec 18-20; Changsha, China. IEEE; 2020. pp. 957-61.</comment>
          <pub-id pub-id-type="doi">10.1109/icisce50968.2020.00197</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B46">
        <label>46</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Carrington</surname>
              <given-names>AM</given-names>
            </name>
            <name>
              <surname>Manuel</surname>
              <given-names>DG</given-names>
            </name>
            <name>
              <surname>Fieguth</surname>
              <given-names>PW</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Deep ROC analysis and AUC as balanced average accuracy, for improved classifier selection, audit and explanation</article-title>
          <source>IEEE Trans Pattern Anal Mach Intell</source>
          <year>2023</year>
          <volume>45</volume>
          <fpage>329</fpage>
          <lpage>41</lpage>
          <pub-id pub-id-type="doi">10.1109/tpami.2022.3145392</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B47">
        <label>47</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Lai</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Xu</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>Accurate protein function prediction via graph attention networks with predicted structure information</article-title>
          <source>Brief Bioinform</source>
          <year>2022</year>
          <volume>23</volume>
          <fpage>bbab502</fpage>
          <pub-id pub-id-type="doi">10.1093/bib/bbab502</pub-id>
          <pub-id pub-id-type="pmid">34882195</pub-id>
          <pub-id pub-id-type="pmcid">PMC8898000</pub-id>
        </nlm-citation>
      </ref>
      <ref id="B48">
        <label>48</label>
        <nlm-citation publication-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Zheng</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Yi</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>Wei</surname>
              <given-names>Z</given-names>
            </name>
          </person-group>
          <article-title>A survey of dynamic graph neural networks</article-title>
          <source>Front Comput Sci</source>
          <year>2024</year>
          <volume>19</volume>
          <fpage>196323</fpage>
          <pub-id pub-id-type="doi">10.1007/s11704-024-3853-2</pub-id>
        </nlm-citation>
      </ref>
    </ref-list>
  </back>
</article>