
different vibration patterns depending on the distance to the obstacle, which is measured with a stereo camera. By combining distance measurement with the presentation of tactile patterns, the system can provide better walking support for the visually impaired. To achieve high-performance object recognition and distance measurement while maintaining wearability, we use the deep learning model YOLO[7,8]. To realize real-time recognition on an onboard system with limited computing power, we reduce the number of parameters and layers, slimming down the model and increasing the inference speed without compromising accuracy. To measure the distances to obstacles, we employ a stereo camera system that uses the parallax between two cameras and measures the distances of multiple objects with the feature point matching method. We also design and fabricate a control circuit to stably drive the SMA actuators from the board. Small vibration motors and SMA actuators are woven into the finger and palm parts of the tactile glove to realize a silent alarm through micro-vibration.
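
As a rough, hypothetical illustration of the feature point matching approach (not the paper's exact implementation), the following Python/OpenCV sketch estimates distances from the horizontal parallax of matched feature points; the ORB detector and the focal length and baseline values are assumptions that would come from calibration of the actual stereo rig.

    # Hypothetical sketch: distance from stereo parallax via feature matching.
    # ORB and the calibration constants below are illustrative assumptions.
    import cv2

    FOCAL_PX = 300.0    # focal length in pixels (from calibration; assumed here)
    BASELINE_M = 0.06   # spacing between the two cameras in meters (assumed)

    def estimate_distances(img_left, img_right):
        orb = cv2.ORB_create()
        kp_l, des_l = orb.detectAndCompute(img_left, None)
        kp_r, des_r = orb.detectAndCompute(img_right, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        distances = []
        for m in matcher.match(des_l, des_r):
            # Horizontal parallax (disparity) between matched feature points.
            disparity = kp_l[m.queryIdx].pt[0] - kp_r[m.trainIdx].pt[0]
            if disparity > 0:
                # Pinhole stereo model: Z = f * B / disparity.
                distances.append(FOCAL_PX * BASELINE_M / disparity)
        return distances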

2. METHODS
2.1. System configuration
Figure 1 shows the overall structure of the wearable system. The wearable system in this study consists of two major parts. The upper part of the figure shows the real-time object detection part, which carries out object recognition and distance measurement using a Raspberry Pi and a stereo camera. The lower part presents the tactile presentation part, which uses a Raspberry Pi Zero, a signal amplifier circuit, small vibration motors, and SMA actuators. Information acquired from the stereo camera, which consists of two cameras, is processed by the Raspberry Pi, and inference acceleration is carried out by a Neural Compute Stick 2 (NCS2)[9] to calculate the position and distance information of the detected objects. The calculated information is transferred to the Raspberry Pi Zero in the tactile presentation part through TCP communication. Then, according to the acquired position and distance information, a predetermined vibration pattern is transmitted to the tactile display through the signal amplifier circuit, and the tactile stimuli are presented to the user. Figure 2 shows the Raspberry Pi set-up for real-time object detection. A fan was installed to cool the chip, as its prolonged operation could result in overheating.
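
The TCP hand-off between the two boards could look roughly like the following Python sketch; the JSON message format, address, and port are assumptions, as the paper does not specify the wire format.

    # Hypothetical sketch of the TCP link; message format, address, and port
    # are assumed, not taken from the paper.
    import json
    import socket

    PI_ZERO_ADDR = ("192.168.0.2", 5000)  # illustrative address and port

    def send_detections(detections):
        # detections: e.g., [{"label": "person", "x": 120, "dist_m": 1.8}]
        with socket.create_connection(PI_ZERO_ADDR) as sock:
            sock.sendall(json.dumps(detections).encode() + b"\n")

    def serve_tactile(handle_detections, port=5000):
        # Raspberry Pi Zero side: receive detections line by line and map
        # each batch to a predetermined vibration pattern via the callback.
        with socket.socket() as srv:
            srv.bind(("", port))
            srv.listen(1)
            conn, _ = srv.accept()
            with conn, conn.makefile() as stream:
                for line in stream:
                    handle_detections(json.loads(line))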

2.2. Real-time object recognition using compressed YOLO V3 and NCS2
The object identification set-up consists of a Raspberry Pi 3b+, a small camera, an NCS2, and a mobile power supply. One approach to avoid slowing down the inference speed is to adopt a stereo camera with a relatively low resolution (320 × 240). For longer operation, a high-capacity battery pack with two 18650 cells (6,800 mAh) is attached to the Raspberry Pi. For the inference, it was necessary to compress YOLO V3 by reducing the number of parameters and layers due to the limited computing power of the Raspberry Pi.
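
For reference, capping the capture resolution might be done as in the brief OpenCV sketch below; the camera device indices are assumptions.

    # Hypothetical sketch: open the stereo pair at the reduced 320 x 240
    # resolution to keep per-frame processing cheap (device indices assumed).
    import cv2

    caps = [cv2.VideoCapture(i) for i in (0, 1)]
    for cap in caps:
        cap.set(cv2.CAP_PROP_FRAME_WIDTH, 320)
        cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)
    ok_l, frame_left = caps[0].read()
    ok_r, frame_right = caps[1].read()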


2.2.1. The compression of YOLO V3
To compress the YOLO V3 model, unnecessary layers and channels in the network were removed by executing four steps: regular learning, sparse learning, layer and channel removal, and fine-tuning. These steps reduced the number of parameters in the compressed model to approximately 5% of the original structure. The training results and the number of parameters of the model before and after compression are described in the next section.
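
A minimal, hypothetical PyTorch sketch of this four-step pipeline is given below, assuming the common recipe of using BatchNorm scale factors as channel-importance scores; the penalty weight and pruning ratio are illustrative, not the paper's values.

    # Hypothetical sketch of the four-step compression pipeline, assuming a
    # BatchNorm-gamma channel-pruning recipe; hyperparameters are illustrative.
    import torch
    import torch.nn as nn

    def bn_sparsity_penalty(model, lam=1e-4):
        # Step 2 (sparse learning): L1 penalty on every BN scale factor,
        # added to the detection loss to push unimportant channels to zero.
        return lam * sum(m.weight.abs().sum()
                         for m in model.modules()
                         if isinstance(m, nn.BatchNorm2d))

    def prune_masks(model, prune_ratio=0.8):
        # Step 3 (layer and channel removal): rank all scale factors globally
        # and mark the channels below the threshold for removal.
        gammas = torch.cat([m.weight.detach().abs().flatten()
                            for m in model.modules()
                            if isinstance(m, nn.BatchNorm2d)])
        threshold = gammas.sort().values[int(len(gammas) * prune_ratio)]
        return {name: m.weight.detach().abs() > threshold
                for name, m in model.named_modules()
                if isinstance(m, nn.BatchNorm2d)}

    # Steps 1 (regular learning) and 4 (fine-tuning) are ordinary training
    # runs on the detection loss; step 2 adds bn_sparsity_penalty(model) to
    # that loss, and step 4 retrains the network slimmed with prune_masks.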


2.2.2. Acceleration of model inference by NCS2
Considering the mobility and portability of the system, a compact computer, the Raspberry Pi 3b+, is employed for system control. Since it is not equipped with a GPU, it lacks the computational power to perform object recognition with large-scale algorithms, such as YOLO V3, in real time. The small