Tu et al. Soft Sci 2023;3:25 https://dx.doi.org/10.20517/ss.2023.15 Page 11 of 15
remain at a low level and thus affect accurate perception and cognition [46]. To address this, an attention-based
mechanism conditionally selects the salient modality among different signals. Although top-down
multisensory fusion mechanisms remain unexplored in the field of e-skins, research on attention-based
fusion mechanisms offers algorithm models that future e-skin systems can draw on.
Many attention-based fusion models have been constructed in other fields, such as video
description [78,79], event detection [80,81], and speech recognition [79,82]. For example, Zhou et al. presented a
robust attention-based dual-modal speech recognition system. By virtue of the multi-modality
attention-based method, the system can strike a balance between visual and audio information by fusing
their representations according to their importance. In addition, the attention given to each modality can be
adjusted over time by modeling the temporal variability of each modality with a long short-term memory network [82].
With further exploration in neuroscience and the development of advanced algorithm models, a top-down
attention-based fusion technique can advance the progress of smart skins.
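
The importance-weighted fusion described above can be illustrated with a minimal sketch. It is not the model of Zhou et al. and omits the long short-term memory component entirely; the function names `softmax` and `attention_fuse`, the dot-product scoring, and the `query` vector standing in for the top-down signal are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, query):
    """Fuse per-modality feature vectors into one vector.

    features: dict mapping a modality name to a feature vector (equal lengths).
    query:    hypothetical top-down context vector of the same length.
    Returns the fused vector and the attention weight of each modality.
    """
    names = list(features)
    # Importance score of a modality: dot product of its features with the query.
    scores = [sum(f * q for f, q in zip(features[n], query)) for n in names]
    weights = softmax(scores)
    dim = len(query)
    # Fused representation: attention-weighted sum of the modality features.
    fused = [
        sum(w * features[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


if __name__ == "__main__":
    feats = {"audio": [1.0, 0.0], "visual": [0.0, 1.0]}
    fused, weights = attention_fuse(feats, query=[2.0, 0.0])
    print(weights)  # the modality better aligned with the query gets more weight
    print(fused)
```

In an e-skin setting the dictionary keys could instead be tactile, thermal, or other sensing channels; the weighting logic would be unchanged.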
CONCLUSIONS AND FUTURE PERSPECTIVE
In this review, we surveyed recent work in the field of e-skins with multimodal sensing and
perception fusion. Although considerable progress in multimodal sensing integration has been made over
the last few years, challenges remain to be addressed. As a fast-growing research topic,
multimodal perception fusion deserves much deeper investigation. To realize the next generation of e-skins,
more attention should be paid to the following aspects:
(i) Decoupled sensing modalities without signal interference. Endowing e-skins with sensing abilities
that match or even exceed the basic functions of human skin deserves in-depth research. To
achieve higher-level perception, other sensing components, such as chemical, sound, and light
sensors, can be integrated with existing e-skin systems to enable more accurate detection of events. Nevertheless, a
single sensor can respond to several different stimuli, which introduces interference and degrades the accuracy of the
signal output for each sensing mode. Signal processing can sometimes minimize the effect of interference,
but it adds processing complexity. Therefore, multimodal sensing systems with
self-decoupled mechanisms are desirable because they simplify data processing and deliver more accurate
signals with less interference. Self-decoupling materials can remove signal interference intrinsically through
novel sensing mechanisms. Ionic materials are suitable for self-decoupling sensing systems owing to their
frequency-dependent ion relaxation dynamics: an ionic conductor can differentiate thermal and
mechanical information without signal interference. In addition, ferroelectric materials are candidates
for multimodal systems because of their triboelectric and pyroelectric effects. The different response and
relaxation times of the triboelectric and pyroelectric effects can be used to decouple pressure and thermal
signals. Because they can differentiate direction, magnetic mechanisms can also be used for force
self-decoupling: strain and pressure can be distinguished by detecting changes in magnetic flux density.
With the advantages of these novel self-decoupling materials, next-generation multimodal sensing
systems can meet practical demands in healthcare, HMIs, and robotics. Eliminating interference caused
by external stimuli remains a significant challenge for next-generation e-skins and demands more effort in finding
novel materials and integrating multiple sensing mechanisms.
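
The idea that two effects with different response and relaxation times can be told apart can be sketched in software. This is only a signal-processing analogy under stated assumptions, not the intrinsic material mechanism discussed above: the function `split_by_timescale`, the smoothing factor `alpha`, and the synthetic trace are hypothetical, and a real sensor would need calibrated time constants.

```python
def split_by_timescale(signal, alpha=0.05):
    """Split a sampled signal into slow and fast components.

    An exponential moving average tracks the slowly varying part
    (standing in for a slow, thermal-like response); the residual holds
    the fast transients (standing in for a pressure-like response).
    alpha is a hypothetical smoothing factor between 0 and 1.
    """
    slow, fast = [], []
    level = signal[0]
    for sample in signal:
        level += alpha * (sample - level)  # low-pass update
        slow.append(level)
        fast.append(sample - level)        # what the low-pass did not follow
    return slow, fast


if __name__ == "__main__":
    # Synthetic trace: a constant baseline with one short spike at index 50.
    trace = [1.0] * 50 + [6.0] + [1.0] * 49
    slow, fast = split_by_timescale(trace)
    print(round(slow[50], 3), round(fast[50], 3))
```

The two outputs always sum back to the input, so no information is discarded; the split only reassigns it by time scale.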
(ii) High-density, high-fidelity, and large-area integration. A highly integrated e-skin system with
multimodal sensing abilities will provide a device-level foundation for further research on multimodal
perception fusion and will contribute to a wider range of applications in smart healthcare, soft robotics,
and HMIs. However, highly integrated e-skin systems with various sensors, electrical interconnects, and
signal processing units face great challenges. The growing density of interconnect lines, the shrinking
spacing between them, and the lower signal intensity caused by sensor miniaturization induce signal

