[Figure 4 caption, continued: Copyright © 2022, Nature Publishing Group; (E) Schematic diagram of bottom-up and top-down multisensory fusion.]
Emerging neuromorphic computing devices hold great potential for bottom-up multisensory fusion at the device level. A bimodal artificial sensory neuron was developed to achieve this sensory fusion process[71] [Figure 4A]. Pressure sensors and photodetectors are integrated to transform tactile and visual stimuli into electrical signals, which are transmitted via an ion cable to a synaptic transistor and integrated there into an excitatory postsynaptic current. As a result, somatosensory and visual information are fused at the device level, and multimodal perception is achieved after further data processing. In a multi-transparency pattern recognition task, robust recognition performance, even with smaller training datasets, confirmed the potential of this approach for neurorobotics and artificial intelligence. However, the visual-haptic fusion matrix was implemented only as the feature extraction layer of an artificial neural network (ANN); in other words, the device alone cannot accomplish multimodal perception tasks without additional algorithms.
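To make the role of that downstream algorithm concrete, the sketch below (a hypothetical illustration, not the reported device or its code) treats the array of device-level fused responses as a visual-haptic feature matrix and passes it through a small fully connected ANN classifier; the array size, layer widths, and number of transparency classes are assumed.

```python
# Hypothetical sketch: device-level fused responses (EPSCs from an array of
# bimodal synaptic transistors) treated as a visual-haptic feature matrix
# that feeds a small fully connected ANN classifier.
import torch
import torch.nn as nn

N_ROWS, N_COLS = 5, 5        # assumed size of the bimodal device array
N_CLASSES = 4                # assumed number of transparency patterns

class FusionClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                      # fusion matrix -> feature vector
            nn.Linear(N_ROWS * N_COLS, 32),    # feature-extraction layer
            nn.ReLU(),
            nn.Linear(32, N_CLASSES),          # transparency pattern classes
        )

    def forward(self, epsc_matrix):
        return self.net(epsc_matrix)

# Batch of fused visual-haptic response matrices (random placeholders here;
# in practice these would be the measured excitatory postsynaptic currents).
epsc = torch.rand(8, N_ROWS, N_COLS)
logits = FusionClassifier()(epsc)
print(logits.shape)  # -> torch.Size([8, 4])
```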
For software-level perception fusion, various machine learning algorithms, such as k-nearest-neighbor classifiers[73], support vector machines (SVMs)[73,74], and convolutional neural networks (CNNs)[75,76], are common strategies for data fusion. Among innovative e-skin systems, these advanced algorithms are implemented to achieve multimodal perception. Li et al. integrated flexible quadruple tactile sensors onto a robot hand to realize precise object identification [Figure 4B]. This novel skin-inspired quadruple tactile sensor was constructed in a multilayer architecture, which enables it to perceive grasping pressure, environmental temperature, and the temperature and thermal conductivity of objects without interference. To realize accurate object recognition, the multimodal sensory information collected by this smart hand was fused into a 4 × 10 signal map at the dataset level. After training with a multilayer perceptron (MLP) network, a classic type of ANN, the smart robotic hand achieved a classification accuracy of 94% in a garbage sorting task[27].
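As a minimal sketch of this dataset-level fusion strategy, the example below flattens an assumed 4 × 10 signal map per grasp and trains a multilayer perceptron on synthetic data; the dataset size, class count, and network shape are illustrative, and random inputs can only yield chance-level accuracy.

```python
# Minimal sketch of dataset-level fusion, assuming each grasp yields a
# 4 x 10 signal map (four sensing channels x ten readings) that is flattened
# and classified with a multilayer perceptron; data and sizes are synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_samples, n_classes = 600, 6               # assumed dataset size and object classes
X = rng.normal(size=(n_samples, 4, 10))     # fused 4 x 10 signal maps
y = rng.integers(0, n_classes, n_samples)   # object labels

X_flat = X.reshape(n_samples, -1)           # dataset-level fusion: one vector per grasp
X_tr, X_te, y_tr, y_te = train_test_split(X_flat, y, test_size=0.2, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
mlp.fit(X_tr, y_tr)
print("test accuracy:", mlp.score(X_te, y_te))   # random data, so ~chance level
```

Swapping MLPClassifier for, e.g., sklearn.neighbors.KNeighborsClassifier would give the k-nearest-neighbor variant mentioned above.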
Feature-based cognition fusion is also a common strategy, which involves extracting features from multisensory signals and concatenating them into a single feature vector. The feature vector is then fed into pattern recognition algorithms, such as neural networks, clustering algorithms, and template methods. Wang et al. proposed a bio-inspired architecture for data fusion that can recognize human gestures by fusing visual data with somatosensory data from skin-like stretchable strain sensors[77]
[Figure 4C]. For early visual processing, the learning architecture uses a sectional CNN and then implements a sparse neural network for sensor data fusion and feature-level recognition, resembling the somatosensory-visual (SV) fusion hierarchy in the higher association cortices of the brain. Using stacked soft materials, the sensor section was designed to be highly stretchable, conformable, and adhesive, enabling it to adhere tightly to the knuckle for precise monitoring of finger movement. This bioinspired algorithm achieves a recognition accuracy of 100% on its own dataset and maintains high recognition accuracy even when tested on images acquired under non-ideal conditions.
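A hedged sketch of this feature-level fusion idea is given below: a small CNN extracts visual features that are concatenated with strain-sensor readings and classified by a fusion head. The image resolution, sensor count, gesture classes, and the dense fusion head (standing in for the sparse network of the original architecture) are assumptions for illustration only.

```python
# Hedged sketch of feature-level visual-somatosensory fusion: a small CNN
# extracts visual features, which are concatenated with strain-sensor features
# and passed to a fusion head for gesture classification.
import torch
import torch.nn as nn

N_STRAIN, N_GESTURES = 5, 10   # assumed strain channels and gesture classes

class VisualSomatosensoryNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(                      # early visual processing
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),                              # -> 16 * 4 * 4 = 256 features
        )
        self.fusion = nn.Sequential(                   # feature-level fusion head
            nn.Linear(256 + N_STRAIN, 64), nn.ReLU(),
            nn.Linear(64, N_GESTURES),
        )

    def forward(self, image, strain):
        visual = self.cnn(image)                       # visual feature vector
        fused = torch.cat([visual, strain], dim=1)     # concatenate the two modalities
        return self.fusion(fused)

model = VisualSomatosensoryNet()
logits = model(torch.rand(2, 1, 64, 64), torch.rand(2, N_STRAIN))
print(logits.shape)  # -> torch.Size([2, 10])
```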
Liu et al. reported a tactile-olfactory sensing system[33] [Figure 4D]. The bimodal sensing array was integrated with mechanical hands, and olfactory and tactile data fusion was then achieved through a machine-learning strategy for robust object recognition in harsh conditions. This artificial bimodal system could classify 11 objects with an accuracy of 96.9% in a simulated fire scenario[72].
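The sketch below illustrates generic bimodal fusion by concatenation for such a tactile-olfactory system, together with a crude robustness probe in which the tactile block is corrupted at test time; the feature dimensions and synthetic data are assumptions, and this is not the machine-learning strategy used in the cited work, only a generic stand-in.

```python
# Hypothetical sketch of bimodal tactile-olfactory fusion: features from a
# pressure array and a gas-sensor array are concatenated and classified with
# an SVM, then re-evaluated with the tactile block corrupted by noise.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_samples, n_classes = 1100, 11
tactile = rng.normal(size=(n_samples, 70))     # assumed pressure-array features
olfactory = rng.normal(size=(n_samples, 6))    # assumed gas-sensor features
y = rng.integers(0, n_classes, n_samples)      # 11 object classes

X = np.concatenate([tactile, olfactory], axis=1)   # bimodal fusion by concatenation
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

clf = SVC().fit(X_tr, y_tr)
print("clean test accuracy:", clf.score(X_te, y_te))

# Crude robustness probe: corrupt the tactile block to mimic harsh conditions.
X_te_noisy = X_te.copy()
X_te_noisy[:, :70] += rng.normal(scale=2.0, size=(len(X_te), 70))
print("noisy-tactile accuracy:", clf.score(X_te_noisy, y_te))
```

With random labels both scores sit near chance level; the point of the sketch is only the fusion and evaluation workflow, not the reported accuracy.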
Although more studies should be carried out on perception fusion models and near/in-sensor fusion devices, both types of bottom-up multimodal perception fusion still motivate the next generation of e-skins.
Top-down attention-based multimodal perception fusion
Sensory responses in lower sensory cortices are modulated by attention and task engagement for the efficient perception of relevant sensory stimuli [Figure 4E].
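In e-skin systems, a common computational analogue of such top-down modulation is to weight each modality's features with task-driven attention scores before fusing them. The sketch below is a generic illustration under assumed dimensions, not a scheme drawn from the cited works.

```python
# Generic sketch of attention-weighted multimodal fusion: a task/context vector
# produces per-modality attention weights that scale each modality's features
# before they are summed into a fused representation.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, ctx_dim=8, n_modalities=3):
        super().__init__()
        self.gate = nn.Linear(ctx_dim, n_modalities)   # top-down attention scores

    def forward(self, modality_feats, context):
        # modality_feats: (batch, n_modalities, feat_dim); context: (batch, ctx_dim)
        weights = torch.softmax(self.gate(context), dim=-1)      # (batch, n_modalities)
        fused = (weights.unsqueeze(-1) * modality_feats).sum(1)  # weighted sum of modalities
        return fused, weights

feats = torch.rand(4, 3, 16)      # e.g., tactile, visual, olfactory feature vectors
ctx = torch.rand(4, 8)            # task-engagement signal driving the attention
fused, w = AttentionFusion()(feats, ctx)
print(fused.shape, w.shape)       # -> torch.Size([4, 16]) torch.Size([4, 3])
```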
In scenarios where multimodal stimuli compete for processing resources, the saliency of individual stimuli in the potentially preferred modality may

