
               INTRODUCTION
Humans and animals alike are immersed in a physical environment filled with dynamic and complex sensory cues, such as tactile, visual, auditory, gustatory, and olfactory ones. These cues are captured and encoded by distinct sensory receptors, each specialized for a specific type of cue, and are then sent to the nervous system for processing to form senses[1-3]. In principle, each cue can provide an individual estimate of the same event. However, high-level perceptual activities, such as thinking, planning, and problem-solving, demand multiple sensory cues, which are integrated and regulated in cortical networks. Multisensory integration decreases perceptual ambiguity, enabling more accurate detection of events, and it can also improve perceptual sensitivity, allowing reactions to even slight changes in the environment[4-8].
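As a brief quantitative illustration of why integration helps (a textbook cue-combination result, not drawn from the cited works): if two cues give independent, unbiased estimates $\hat{s}_1$ and $\hat{s}_2$ of the same quantity with noise variances $\sigma_1^2$ and $\sigma_2^2$, the minimum-variance fusion is the reliability-weighted average

```latex
\hat{s} = \frac{\sigma_2^{2}\,\hat{s}_1 + \sigma_1^{2}\,\hat{s}_2}{\sigma_1^{2} + \sigma_2^{2}},
\qquad
\sigma_{\mathrm{comb}}^{2} = \frac{\sigma_1^{2}\,\sigma_2^{2}}{\sigma_1^{2} + \sigma_2^{2}}
\le \min\left(\sigma_1^{2},\, \sigma_2^{2}\right)
```

Because the combined variance never exceeds that of the most reliable single cue, fusion reduces ambiguity and sharpens sensitivity at the same time.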
Skin, the largest sensory organ of the human body, is responsible for detecting various stimuli. Wearable electronic skin (e-skin) devices have been developed to mimic and even go beyond human skin: they detect and distinguish different external stimuli, turning them into accessible signals for processing and recognition. These functionalities, together with soft mechanical and physical properties, give e-skins tremendous application potential in fields such as healthcare monitoring, human-machine interfaces (HMIs), and sensory skins for robotics[9-14]. Current e-skin devices mainly emphasize the acquisition and processing of unimodal sensory cues, involving a myriad of sensors based on nanomaterials and micro-/nanostructures. These sensors are designed to detect and measure strain, pressure, temperature, and optical and electrophysiological signals[15-24]. The main concerns are improving the physical properties of a specific sensor and developing new fabrication methods and signal-processing techniques. Although unimodal sensing has been well developed over the past few years, single-functional e-skin systems are insufficient for complex tasks and practical applications, such as robotic hands that must detect spatial distributions of signals and recognize objects[25-27]. Unlike unimodal sensing, multimodal sensing aims to endow e-skins with the same sensing modalities as human skin, or even more. Integrating sensors of different modalities, such as physical, electrophysiological, and chemical sensors, forms a multi-parameter sensing network for comprehensive sensing of stimuli from the surroundings. Obstacles still exist in simultaneously detecting multimodal signals, including the difficulty of differentiating multiple coupled signals, crosstalk between sensing components, and mechanical disturbances. Thus, novel material and structure designs are urgently needed to overcome these ongoing problems for reliable and accurate measurement.
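To make the decoupling obstacle concrete, the minimal sketch below (hypothetical sensitivity values; not a method from the cited works) models two sensing elements that each respond linearly to both temperature and strain. If the sensitivity matrix is known from calibration and well conditioned, the coupled outputs can be separated by inverting it:

```python
import numpy as np

# Hypothetical calibrated sensitivity matrix: each row maps the stimuli
# (temperature, strain) to one sensing element's relative output change.
S = np.array([[0.80, 0.10],   # element 1: mostly temperature-sensitive
              [0.15, 0.90]])  # element 2: mostly strain-sensitive

def decouple(outputs):
    """Recover (temperature, strain) from the coupled element outputs."""
    return np.linalg.solve(S, outputs)

# Coupled readings produced by temperature = 2.0 and strain = 0.5 (a.u.)
readings = S @ np.array([2.0, 0.5])
temperature, strain = decouple(readings)
print(temperature, strain)  # -> 2.0 0.5
```

When the off-diagonal cross-sensitivities grow, the matrix becomes ill conditioned and noise is amplified, which is one reason material and structure designs strive to keep each element as selective as possible.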


In addition to multimodal sensing, work on e-skin systems has been undertaken with the aim of realizing multimodal perception. From the perspective of neuroscience, high-level perceptual behaviors are attributed to the crossmodal synthesis of multimodal sensory information[15,28-31]. The multimodal perception of e-skins takes inspiration from the multisensory integration mechanism of cortical networks, emphasizing the fusion of sensory cues through hardware or algorithms [Figure 1]. Compared with multimodal sensing, studies on multimodal perception are much more limited due to inevitable challenges at both the device and software levels. As machine learning is well suited to tasks that involve multi-parameter inputs and lack explicit mathematical models, current e-skin systems implement multimodal perception mainly through software-level methods[27,32,33]. Still, software-level multimodal perception faces difficulties in fusing multimodal signals due to incompatibility between datasets, including combining datasets from heterogeneous modalities and dealing with missing data or differing levels of noise[34].
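The sketch below makes this incompatibility concrete (simulated data and arbitrary feature choices, assuming only NumPy): two modalities sampled at different rates are aligned to a common frame, normalized on their own scales, and a dropped-out modality is flagged with a validity mask rather than filled with fabricated values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated recordings: pressure sampled at 100 Hz, temperature at 10 Hz.
pressure = rng.normal(5.0, 2.0, size=1000)      # kPa-scale values
temperature = rng.normal(33.0, 0.5, size=100)   # degree-Celsius-scale values

# Align to a common 10 Hz frame by block-averaging the faster stream.
pressure_10hz = pressure.reshape(100, 10).mean(axis=1)

def zscore(x):
    # Per-modality normalization so neither physical scale dominates.
    return (x - x.mean()) / x.std()

features = np.stack([zscore(pressure_10hz), zscore(temperature)], axis=1)

# Missing-data handling: flag dropped temperature frames with a validity
# mask instead of silently imputing values.
mask = np.ones_like(features)
mask[40:50, 1] = 0.0              # temperature sensor offline for 10 frames
fused = np.concatenate([features * mask, mask], axis=1)
print(fused.shape)                # (100, 4): 2 masked features + 2 flags
```

Carrying the mask alongside the features lets a downstream learner distinguish "zero signal" from "no signal", one common workaround for missing modalities.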
Moreover, the sea of raw data collected by sensor networks has to be transmitted to computation units or cloud-based systems, which brings problems in terms of energy consumption, response time, data storage, and communication bandwidth[35].
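As a rough illustration of why transmitting every raw sample is wasteful (a send-on-delta event coder on simulated data; an illustrative scheme, not one proposed in the cited works), a mostly quiet channel can be reduced to a handful of events:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated raw stream: a quiet pressure channel with one brief touch.
signal = np.zeros(10_000)
signal[3000:3050] += 4.0
signal += rng.normal(0.0, 0.05, size=signal.size)

def send_on_delta(x, threshold):
    """Transmit a sample only when it moves more than `threshold`
    away from the last transmitted value."""
    events, last = [], x[0]
    for i, v in enumerate(x):
        if abs(v - last) > threshold:
            events.append((i, v))
            last = v
    return events

events = send_on_delta(signal, threshold=0.5)
print(len(events), "events instead of", signal.size, "raw samples")
```

The threshold trades fidelity for transmitted volume; performing such reduction at the sensing front end is the simplest instance of the near-sensor idea discussed next.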
To solve these significant problems, device-level multimodal perception has emerged through near-/in-sensor computing, where data processing is performed close to or even within the sensory units[36]. However, it requires more advanced computing devices that are suitable for