
               for fast, energy-efficient data processing, but their integration into large-scale systems requires overcoming
               challenges related to fabrication, scalability, and reliability.

               In this study, we present a comprehensive review of human skin-inspired sensors combined with
               neuromorphic computing technologies. Unlike previously published reviews[37-39], we focus on the
               role of neuromorphic computing within the sensor systems themselves rather than on its integration at
               the system level, covering the specific sensor modalities and the integration of neuromorphic
               devices such as memristors and transistors, as shown in Figure 1. First, we introduce the multimodal
               perception of skin-inspired sensors in terms of their principles and types. Subsequently, we discuss the
               application of neuromorphic devices, including transistors and memristors, in skin-inspired sensors and
               provide detailed descriptions of each. In the section on neural network algorithms, we introduce the
               latest algorithmic developments in neuromorphic computing. Then we present examples of on-chip neuromorphic computing
               systems in applications such as human health monitoring, robotic skin, and other related fields, highlighting
               their potential for real-time processing and adaptive responses in these areas. Finally, we offer insights into
               the prospects of large-scale manufacturing of human skin-inspired neuromorphic sensors, emphasizing the
               technological advancements required to enhance scalability, performance, and integration into real-world
               applications.


               MULTIMODAL PERCEPTION
               The human perception system facilitates high-level cognition and learning through the integration and
               interaction of vision, hearing, touch, smell, and other senses. Numerous studies have utilized flexible
               sensors in skin-inspired systems to achieve perceptive abilities. To fully simulate the human perceptual
               system, it is essential to model multiple types of sensory signals.

               These neuromorphic sensors can only realize their full potential through multimodal perception, as shown
               in Figure 2. Multimodal perception refers to the integration of heterogeneous data acquired from various
               sensors (such as vision, touch, hearing, etc.) to provide a comprehensive understanding of the environment
               or target[40]. The core of this process lies in data fusion and collaborative analysis, which includes the
               following key principles: first, data acquisition involves obtaining multimodal data from different sensors
               (e.g., tactile arrays, temperature sensors), where each modality has distinct physical properties and
               spatiotemporal resolutions[41]. The data collected from these sensors is often complementary or redundant
               (e.g., vision cannot perceive object hardness, requiring assistance from tactile sensors). Through appropriate
               fusion strategies, the information utilization rate is optimized, and the accuracy of each modality’s
               measurements is enhanced. Once data collection is completed, feature extraction and representation are
               required to process the heterogeneous data. For example, features from visual data are typically extracted
               using convolutional neural networks (CNNs), while tactile data is analyzed by extracting pressure signal
               features based on time sequences[42]. Subsequently, methods such as sparse coding and graph neural
               networks are employed to establish associative models between different modalities[43]. For instance, tactile
               data can be integrated with visual texture features to improve object recognition accuracy. Finally, data
               fusion is performed. In strongly correlated scenarios (e.g., synchronized vision and touch), multimodal raw
               data is directly concatenated at the data level. In weakly correlated or asynchronous scenarios (e.g.,
               combining visual recognition with voice commands), each modality is independently processed before the
               decision results are fused. By combining these two approaches and optimizing the joint model, multimodal
               perception is achieved. In this section, we summarize recent developments in skin-inspired sensors, which
               have shown potential to achieve multimodal perception.
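
               As a concrete illustration of the two fusion strategies described above, the following minimal PyTorch sketch contrasts data-level fusion, in which visual and tactile features are concatenated before a joint classifier, with decision-level fusion, in which each modality is classified independently and the class scores are then combined. All module names, network sizes, and the score-averaging rule are illustrative assumptions for this sketch rather than implementations from the cited works.

               # Minimal sketch of data-level vs. decision-level multimodal fusion.
               # All architectures and dimensions are illustrative assumptions.
               import torch
               import torch.nn as nn

               class VisualEncoder(nn.Module):
                   """Small CNN extracting features from a camera image (e.g., 3x64x64)."""
                   def __init__(self, feat_dim=128):
                       super().__init__()
                       self.conv = nn.Sequential(
                           nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                           nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                       )
                       self.fc = nn.Linear(32, feat_dim)

                   def forward(self, x):
                       return self.fc(self.conv(x))

               class TactileEncoder(nn.Module):
                   """1-D convolution over a pressure time series from a tactile array."""
                   def __init__(self, n_taxels=16, feat_dim=128):
                       super().__init__()
                       self.conv = nn.Sequential(
                           nn.Conv1d(n_taxels, 32, 5, padding=2), nn.ReLU(),
                           nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                       )
                       self.fc = nn.Linear(32, feat_dim)

                   def forward(self, x):          # x: (batch, n_taxels, time)
                       return self.fc(self.conv(x))

               class EarlyFusionClassifier(nn.Module):
                   """Data-level fusion: concatenate features, then classify jointly."""
                   def __init__(self, n_classes=10, feat_dim=128):
                       super().__init__()
                       self.vis = VisualEncoder(feat_dim)
                       self.tac = TactileEncoder(feat_dim=feat_dim)
                       self.head = nn.Linear(2 * feat_dim, n_classes)

                   def forward(self, img, touch):
                       return self.head(torch.cat([self.vis(img), self.tac(touch)], dim=1))

               class LateFusionClassifier(nn.Module):
                   """Decision-level fusion: classify each modality separately, then average scores."""
                   def __init__(self, n_classes=10, feat_dim=128):
                       super().__init__()
                       self.vis = VisualEncoder(feat_dim)
                       self.tac = TactileEncoder(feat_dim=feat_dim)
                       self.vis_head = nn.Linear(feat_dim, n_classes)
                       self.tac_head = nn.Linear(feat_dim, n_classes)

                   def forward(self, img, touch):
                       logits_v = self.vis_head(self.vis(img))
                       logits_t = self.tac_head(self.tac(touch))
                       return (logits_v.softmax(-1) + logits_t.softmax(-1)) / 2

               # Example usage with synthetic data.
               img = torch.randn(4, 3, 64, 64)       # batch of camera frames
               touch = torch.randn(4, 16, 200)       # batch of 16-taxel pressure sequences
               print(EarlyFusionClassifier()(img, touch).shape)   # (4, 10) joint logits
               print(LateFusionClassifier()(img, touch).shape)    # (4, 10) fused probabilities

               In practice, data-level fusion suits tightly synchronized modalities, whereas decision-level fusion tolerates asynchronous or loosely coupled streams; as noted above, the two approaches can also be combined within a jointly optimized model.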