Taking autonomous driving as an example, embodied artificial intelligence enables vehicles not only to perceive their surroundings through advanced sensors such as LiDAR, radar, and cameras, but also to dynamically adapt to evolving traffic conditions, interact safely with pedestrians, and respond effectively to unforeseen events[4]. These intelligent agents leverage contextual understanding to make real-time decisions that prioritize both safety and operational efficiency[5]. Similarly, service robots equipped with embodied intelligence can skillfully navigate domestic and public environments, comprehend natural language instructions, and engage in complex interactions with both objects and humans[6]. Such advancements mark a significant step toward the meaningful integration of artificial intelligence into real-world environments.
This Special Issue aims to showcase recent advances in the rapidly evolving field of embodied artificial
intelligence. We received eight submissions from researchers across the globe, reflecting the growing
interest and momentum in this area. Following a rigorous peer-review process and valuable feedback from
expert reviewers, four articles were selected for publication. These contributions primarily address key
challenges in perception and localization, presenting novel insights and methodologies that advance the
capabilities of intelligent embodied systems.
2. CONTRIBUTED ARTICLES
Among the four accepted articles, three deep learning-based studies focus on leveraging single-modal neural networks to perform salient object detection[7], facial expression recognition[8], and infrared object detection[9], respectively. Although targeting different application domains, these studies share a unified vision: enabling machines to perceive, interpret, and respond to complex visual environments in real time.
Each work introduces specialized network architectures designed to enhance perception accuracy while
maintaining computational efficiency, reflecting a collective push toward deployable artificial intelligence in
resource-constrained or dynamic conditions. A notable commonality among these studies lies in their
emphasis on multi-scale feature learning. Whether through attention mechanisms, convolution-transformer
fusion, or scale-adaptive modules, all three approaches integrate hierarchical visual cues to capture both
global semantic context and fine-grained structural detail. This multi-scale strategy enables a crucial balance
between semantic comprehension and spatial precision, an essential attribute for embodied artificial
intelligence systems operating in unstructured or variable environments. Furthermore, each method
demonstrates strong empirical performance on public benchmark datasets, underscoring the robustness and
generalizability of the proposed designs.
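To make the multi-scale strategy concrete, the sketch below shows one way hierarchical visual cues can be aggregated with learned attention weights in PyTorch. It is a minimal illustration written for this editorial, not the architecture of any contributed article: the module name MultiScaleFusion, the pyramid scales, and the channel sizes are our own assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Aggregates a feature map over several pooling scales and fuses the
    branches with learned attention weights (illustrative only)."""

    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # One lightweight 1x1 convolution per pyramid level.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=1) for _ in scales
        )
        # Channel attention producing one weight per scale.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels * len(scales), len(scales), kernel_size=1),
            nn.Softmax(dim=1),
        )
        self.project = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = []
        for scale, branch in zip(self.scales, self.branches):
            # Pool to a coarser grid for wider context, then restore size.
            pooled = F.adaptive_avg_pool2d(x, (max(h // scale, 1), max(w // scale, 1)))
            feats.append(F.interpolate(branch(pooled), size=(h, w),
                                       mode="bilinear", align_corners=False))
        weights = self.attention(torch.cat(feats, dim=1))  # (B, n_scales, 1, 1)
        fused = sum(weights[:, i:i + 1] * f for i, f in enumerate(feats))
        return self.project(fused) + x  # residual path keeps fine detail

x = torch.randn(1, 64, 32, 32)
print(MultiScaleFusion(64)(x).shape)  # torch.Size([1, 64, 32, 32])

The residual connection at the end reflects a common design choice: the pooled branches supply global semantic context while the skip path preserves fine-grained spatial detail, mirroring the balance discussed above.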
Despite these shared foundations, each study makes distinct and complementary contributions. The study[7] focuses on lightweight salient object detection, introducing scale-adaptive feature extraction and multi-scale feature aggregation modules to achieve an optimal trade-off between efficiency and accuracy. The study[8] addresses facial expression recognition toward embodied artificial intelligence, proposing a multi-scale attention and convolution-transformer fusion network to enhance emotion-aware human-robot interaction[10]. Finally, the study[9] targets infrared object detection under adverse weather conditions, combining the MobileNetV3-YOLOv4 architecture with an image enhancement generative adversarial network to ensure high-precision detection on low-power edge devices.
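As an illustration of the enhance-then-detect pipeline behind the infrared study, the following sketch composes a toy enhancement generator with a placeholder lightweight detector. The class names EnhancementGenerator and TinyDetector and all layer choices are hypothetical stand-ins assumed for this example; they are not the MobileNetV3-YOLOv4 network or the generative adversarial network proposed in the paper.

import torch
import torch.nn as nn

class EnhancementGenerator(nn.Module):
    """Toy stand-in for an enhancement-GAN generator: predicts a residual
    correction for a degraded single-channel infrared frame."""

    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, x):
        return torch.clamp(x + self.body(x), 0.0, 1.0)

class TinyDetector(nn.Module):
    """Placeholder mobile-style detector: depthwise-separable convolutions
    feeding a head that predicts (objectness, cx, cy, w, h) per grid cell."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.Hardswish(),
            nn.Conv2d(16, 16, 3, stride=2, padding=1, groups=16),  # depthwise
            nn.Conv2d(16, 32, kernel_size=1), nn.Hardswish(),      # pointwise
        )
        self.head = nn.Conv2d(32, 5, kernel_size=1)

    def forward(self, x):
        return self.head(self.backbone(x))

def detect(frame, enhancer, detector):
    # Enhance first so the detector sees a cleaned-up frame.
    with torch.no_grad():
        return detector(enhancer(frame))

frame = torch.rand(1, 1, 128, 128)  # degraded infrared frame in [0, 1]
preds = detect(frame, EnhancementGenerator(), TinyDetector())
print(preds.shape)  # torch.Size([1, 5, 32, 32]): one prediction per cell

Splitting the pipeline this way lets the enhancement stage be trained on low-visibility data while the detector stays compact enough for low-power edge deployment.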
Taken together, these studies advance the frontier of efficient and adaptive visual perception. By
addressing complementary aspects of perception, ranging from the semantic understanding of human
emotions to the structural saliency of objects and the multi-modal robustness required under low-visibility
conditions, they demonstrate a coherent progression toward unified, context-aware perception systems.
Such efforts not only contribute to academic exploration but also carry significant practical implications for
real-world applications, from intelligent vehicles and service robots to next-generation embodied agents
capable of understanding and interacting with their surroundings in a human-like manner.

