different vibration patterns depending on the distance to the obstacle, which it determines with a stereo camera. By combining this distance measurement with the presentation of tactile patterns, the system provides better walking support for the visually impaired. To achieve high-performance object recognition and distance measurement while maintaining wearability, we use the deep learning model YOLO [7,8]. To realize real-time recognition on an onboard system with limited computing power, we reduce the number of parameters and layers to slim down the
model and increase the inference speed without compromising accuracy. To measure the distances to
obstacles, we employ a stereo camera system that uses the parallax between two cameras and measures the distances to multiple objects via feature point matching. We also design and fabricate a control circuit to drive the SMA actuators stably from the board. Small motors and SMA actuators are woven into the finger and palm parts of the tactile glove to provide a silent alarm through micro-vibration.
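As a rough illustration of the parallax-based distance measurement described above, the sketch below computes depth from the horizontal offset of a single matched feature point using the standard pinhole stereo relation Z = fB/d. The focal length, baseline, and pixel coordinates are illustrative assumptions, not values from our calibration.

```python
def distance_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Pinhole stereo model: depth Z = f * B / d, where d is the
    horizontal parallax (disparity) of a matched feature point."""
    disparity = x_left - x_right  # pixel offset between the two views
    if disparity <= 0:
        return float("inf")  # degenerate match or point at infinity
    return focal_px * baseline_m / disparity

# Illustrative values only: a focal length of 300 px at 320 x 240
# resolution and a 6 cm camera baseline.
z = distance_from_disparity(x_left=172.0, x_right=160.0,
                            focal_px=300.0, baseline_m=0.06)
print(f"estimated distance: {z:.2f} m")  # -> 1.50 m
```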
2. METHODS
2.1. System configuration
Figure 1 shows the overall structure of the wearable system. The wearable system in this study consists of
two major parts. The upper part of the figure shows the real-time object detection part that carries out
object recognition and distance measurement using Raspberry Pi and a stereo camera. The lower part
presents the tactile presentation part using Raspberry Pi Zero, a signal amplifier circuit, small vibration
motors, and SMA actuators. Information acquired from the stereo camera, which consists of two cameras, is
processed by the Raspberry Pi, and inference acceleration is carried out by a Neural Compute Stick 2 (NCS2) [9] to calculate the position and distance information of the detected objects. The calculated
information is transferred to the Raspberry Pi Zero in the tactile presentation part through TCP
communication. Then, according to the acquired position and distance information, a predetermined
vibration pattern is transmitted to the tactile display through the signal amplifier circuit, and the tactile
stimuli are presented to the user. Figure 2 shows the Raspberry Pi set-up for real-time object detection. A fan is installed to cool the chip, as prolonged operation could otherwise cause it to overheat.
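The TCP hand-off between the two boards can be sketched as follows; the address, port number, and JSON message format are assumptions for illustration, since the concrete wire format is an implementation detail.

```python
import json
import socket

# Hypothetical address and message format: the detection part pushes
# (label, position, distance) records to the tactile-presentation part.
TACTILE_HOST, TACTILE_PORT = "192.168.0.20", 5005  # Raspberry Pi Zero

def send_detections(detections):
    """Transfer detection results over a short-lived TCP connection."""
    payload = json.dumps(detections).encode("utf-8")
    with socket.create_connection((TACTILE_HOST, TACTILE_PORT)) as sock:
        sock.sendall(payload)

send_detections([{"label": "person", "position": "left", "distance_m": 1.4}])
```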
2.2. Real-time object recognition using compressed YOLO V3 and NCS2
The object identification set-up consists of a Raspberry Pi 3B+, a small camera, an NCS2, and a mobile power supply. One approach to maintaining the inference speed is to adopt a stereo camera with a relatively low resolution (320 × 240). For longer operation, a high-capacity battery package powered by two 18650 batteries (6,800 mAh) is attached to the Raspberry Pi. For the inference, it was
necessary to compress YOLO V3 by reducing the number of parameters and layers due to the limited
computing power of the Raspberry Pi.
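For reference, the low capture resolution can be requested from both cameras with OpenCV, as sketched below; the device indices are assumptions and depend on how the cameras enumerate on the Raspberry Pi.

```python
import cv2

# Open the two cameras of the stereo pair (indices 0 and 1 assumed).
caps = [cv2.VideoCapture(i) for i in (0, 1)]
for cap in caps:
    # Capturing at 320 x 240 keeps the detection pipeline real-time
    # on the Raspberry Pi's limited hardware.
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 320)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)

ok_left, frame_left = caps[0].read()
ok_right, frame_right = caps[1].read()
```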
2.2.1. The compression of YOLO V3
To compress the YOLO V3 model, unnecessary layers and channels in the network were removed in four steps: regular training, sparse training, layer and channel removal, and fine-tuning. These steps reduced the number of parameters in the compressed model to approximately 5% of
the original structure. The training results and the number of parameters of the model before and after
compression are described in the next section.
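The exact pruning procedure is not spelled out here; one common realization of the sparse-training and channel-removal steps is BatchNorm-scale ("network slimming") pruning, sketched below in PyTorch as a minimal illustration. The L1 weight and prune ratio are illustrative assumptions, not our tuned values.

```python
import torch
import torch.nn as nn

def add_bn_sparsity(model, lam=1e-4):
    """Sparse-training step: add an L1 subgradient on every BatchNorm
    scale factor so unimportant channels drift toward zero.
    Call between loss.backward() and optimizer.step()."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d) and m.weight.grad is not None:
            m.weight.grad.add_(lam * torch.sign(m.weight.data))

def channel_keep_masks(model, prune_ratio=0.95):
    """Channel-removal step: rank all BN scale factors globally and
    mark the smallest `prune_ratio` fraction for removal; the pruned
    network is then fine-tuned to recover accuracy."""
    scales = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    threshold = scales.sort().values[int(len(scales) * prune_ratio)]
    return [m.weight.data.abs() > threshold
            for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
```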
2.2.2. Acceleration of model inference by NCS2
Considering the mobility and portability of the system, a compact computer, a Raspberry Pi 3B+, is employed for system control. Since it is not equipped with a GPU, it lacks the computational power to perform object recognition with large-scale algorithms, such as YOLO V3, in real time. The small