

               [Figure 4 caption, continued] Copyright © 2022, Nature Publishing Group; (E) Schematic diagram of bottom-up and top-down multisensory fusion.


               Emerging neuromorphic computing devices hold great potential for bottom-up multisensory fusion at the
               device level. A bimodal artificial sensory neuron was developed to achieve the sensory fusion processes[71]
               [Figure 4A]. Pressure sensors and photodetectors are integrated to transform tactile and visual stimuli into
               electrical signals. The combined signals are then transmitted via an ion cable to the synaptic transistor,
               where they are integrated to produce an excitatory postsynaptic current. As a result, somatosensory and
               visual information is fused at the device level, achieving multimodal perception integration after further
               data processing. In a multi-transparency pattern recognition task, the device achieved robust recognition
               even with smaller datasets, confirming its potential for applications in neurorobotics and artificial
               intelligence. However, the issue remains that the visual-haptic fusion matrix was implemented only as the
               feature extraction layers of artificial neural networks (ANNs); in other words, the device alone cannot
               accomplish multimodal perception tasks without additional algorithms.
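
               To make this device-level fusion concrete, the following minimal sketch (in Python) models the fused
               response as a weighted, saturating superposition of a pressure-evoked and a light-evoked postsynaptic
               current and feeds that single fused feature to a small ANN for a toy transparency-classification task.
               The weights, tanh nonlinearity, synthetic stimuli, and classifier are illustrative assumptions and do
               not reproduce the device model or dataset of ref. [71].

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def fused_epsc(pressure, light, w_p=0.6, w_v=0.4, i_max=1.0):
    # Toy fused excitatory postsynaptic current: a weighted sum of the tactile
    # and optical drives passed through a saturating nonlinearity (assumed form,
    # not the measured device response of ref. [71]).
    return i_max * np.tanh(w_p * pressure + w_v * light)

# Hypothetical task: recognize which transparency filter covers a pressed pattern;
# higher transparency lets more light reach the photodetector for the same press.
transparencies = [0.1, 0.5, 0.9]                      # class labels 0, 1, 2
X, y = [], []
for label, t in enumerate(transparencies):
    for _ in range(200):
        pressure = rng.uniform(0.8, 1.0)              # tactile stimulus
        light = t * rng.uniform(0.9, 1.1)             # optical stimulus
        X.append([fused_epsc(pressure, light)])       # single fused feature
        y.append(label)

# Only the fused current reaches the learning stage, mirroring fusion at the
# device level before any algorithmic processing.
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(np.array(X), np.array(y))
print("training accuracy:", clf.score(np.array(X), np.array(y)))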


               For software-level perception fusion, various machine learning algorithms, such as k-nearest-neighbor
               classifiers[73], support vector machines (SVMs)[73,74], and convolutional neural networks (CNNs)[75,76], are
               common strategies for data fusion. Among innovative e-skin systems, these advanced algorithms are
               implemented to achieve multimodal perception. Li et al. integrated flexible quadruple tactile sensors onto a
               robot hand to realize precise object identification [Figure 4B]. This novel skin-inspired quadruple tactile
               sensor was constructed in a multilayer architecture, which enables the grasping pressure, environmental
               temperature, and the temperature and thermal conductivity of objects to be perceived without interference. To
               realize accurate object recognition, the multimodal sensory information collected through this smart hand
               was fused into a 4 × 10 signal map at the dataset level. After being trained with multilayer perceptron
               networks (also known as ANNs), the smart robotic hand achieved a classification accuracy of 94% in a
               garbage sorting task[27]. Feature-based cognition fusion is also a common strategy, which involves extracting
               features from multisensory signals and concatenating them into a single feature vector. The feature vector is
               then fed into pattern recognition algorithms, such as neural networks, clustering algorithms, and template
               methods[77]. Wang et al. proposed a bio-inspired architecture for data fusion that can recognize human
               gestures by fusing visual data with somatosensory data from skin-like stretchable strain sensors
               [Figure 4C]. For early visual processing, the learning architecture uses a sectional CNN and then
               implements a sparse neural network for sensor data fusion and feature-level recognition, resembling the
               somatosensory-visual (SV) fusion hierarchy in the higher association cortices of the brain. Using stacked soft
               materials, the sensor section was designed to be highly stretchable, conformable, and adhesive, enabling the
               sensor to adhere tightly to the knuckle for precise monitoring of finger movement. This bioinspired
               algorithm can achieve a recognition accuracy of 100% on its own dataset and even maintain high recognition
               performance when tested on images acquired under non-ideal conditions[33]. Liu et al. reported a
               tactile-olfactory sensing system
               [Figure 4D]. The bimodal sensing array was integrated with mechanical hands. Olfactory and tactile data
               fusion was then achieved through a machine-learning strategy for robust object recognition in harsh
               environments. This artificial bimodal system could classify 11 objects with an accuracy of 96.9% in a
               simulated fire scenario[72]. Although more studies should be carried out on perception fusion models and near/in-
               sensor fusion devices, both types of bottom-up multimodal perception fusion still motivate the next
               generation of e-skins.
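
               As a minimal illustration of the software-level strategies above (dataset-level fusion into a joint
               signal map and feature-level fusion by concatenation), the sketch below flattens a synthetic 4 × 10
               tactile map, concatenates it with a short feature vector from a second modality, and trains a
               multilayer perceptron on the fused vector. The array shapes, synthetic data, and network size are
               placeholder assumptions and do not correspond to the pipelines or datasets of refs [27,33,72].

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
n_classes, n_per_class = 4, 60

X, y = [], []
for label in range(n_classes):
    for _ in range(n_per_class):
        # Hypothetical per-sample readings: a 4-channel x 10-taxel tactile map
        # (e.g., pressure, temperature, thermal-conductivity channels) plus a
        # short feature vector from a second modality (visual or olfactory).
        tactile_map = rng.normal(loc=label, scale=0.6, size=(4, 10))
        other_features = rng.normal(loc=label, scale=0.6, size=8)
        # Feature-level fusion: flatten and concatenate into one joint vector.
        X.append(np.concatenate([tactile_map.ravel(), other_features]))
        y.append(label)
X, y = np.array(X), np.array(y)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# A multilayer perceptron (ANN) classifies the fused feature vectors.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))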


               Top-down attention-based multimodal perception fusion
               Sensory responses in lower sensory cortices are modulated by attention and task engagement for the
               efficient perception of relevant sensory stimuli [Figure 4E]. When multimodal stimuli compete for
               processing resources, the saliency of individual stimuli in the potentially preferred modality may