

Deep learning-based methods, on the other hand, aim to leverage data to learn both the feature extractor Eλ₁ and the decoder Dλ₂. Early approaches used multi-layer perceptrons (MLPs) to learn relevant needle features and classify them accordingly[70]. In the approach proposed by Geraldes and Rocha, the MLP took as input a region of interest (ROI) selected from the input ultrasound image and output a probability estimate of each pixel in the ROI being a needle; a threshold was then applied to the output to localize the needle[70]. This approach, however, yielded tip localization errors greater than 5 mm.
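To make this pipeline concrete, below is a minimal sketch of the MLP-plus-threshold idea in PyTorch. The network size, ROI dimensions, and threshold value are illustrative assumptions, not the configuration reported in [70].

```python
# Minimal sketch: per-pixel needle classification of an ROI with an MLP,
# followed by thresholding. Sizes and threshold are illustrative, not from [70].
import torch
import torch.nn as nn

class NeedleMLP(nn.Module):
    def __init__(self, roi_h: int = 32, roi_w: int = 32):
        super().__init__()
        n = roi_h * roi_w
        self.net = nn.Sequential(
            nn.Flatten(),                   # ROI patch -> feature vector
            nn.Linear(n, 256), nn.ReLU(),
            nn.Linear(256, n),              # one logit per ROI pixel
        )
        self.roi_shape = (roi_h, roi_w)

    def forward(self, roi: torch.Tensor) -> torch.Tensor:
        logits = self.net(roi)
        # Per-pixel probability of belonging to the needle
        return torch.sigmoid(logits).view(-1, *self.roi_shape)

roi = torch.rand(1, 32, 32)                 # ROI cropped from a B-mode image
probs = NeedleMLP()(roi)
needle_mask = probs > 0.5                   # thresholding localizes the needle
```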

Most deep learning methods use convolutional neural networks (CNNs) with multiple layers, whereby the first layers learn local features from the image and the deeper layers combine the local features to learn more global features. CNN-based approaches can be categorized into four groups: (1) classification; (2) regression; (3) segmentation; and (4) object detection. Using CNNs for classification is common in methods working with 3D ultrasound. For instance, in the approach by Pourtaherian et al., a CNN is used to classify voxels extracted from 3D ultrasound volumes as either needle or background, yielding a 3D voxel-wise segmentation map of the needle[71,72]. A cylindrical model is then fitted to this map using RANSAC to estimate the needle axis, which is used to determine the 2D plane containing the entire needle. Another approach would be to classify each scan plane in the 3D volume as either containing a needle or not, and then similarly combine and visualize the 2D plane that contains the entire needle[54]. While these approaches enhance the visualization of the needle in 3D ultrasound, they do not localize the needle tip.
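Below is a minimal sketch of the RANSAC axis-fitting step, with the cylindrical model simplified to a line with a radius tolerance. The function name and all parameter values are illustrative assumptions, not those of [71,72].

```python
# Minimal sketch: RANSAC needle-axis fitting to a 3D voxel segmentation map.
# The cylinder is simplified to a line plus a radius tolerance.
import numpy as np

def ransac_needle_axis(voxels: np.ndarray, n_iters: int = 500,
                       radius: float = 2.0, rng=np.random.default_rng(0)):
    """voxels: (N, 3) coordinates of voxels classified as 'needle'."""
    best_inliers, best_model = 0, None
    for _ in range(n_iters):
        p0, p1 = voxels[rng.choice(len(voxels), 2, replace=False)]
        d = p1 - p0
        norm = np.linalg.norm(d)
        if norm < 1e-6:
            continue                        # degenerate sample, resample
        d = d / norm
        # Distance of every voxel to the candidate axis (point p0, direction d)
        diff = voxels - p0
        dists = np.linalg.norm(diff - np.outer(diff @ d, d), axis=1)
        inliers = int((dists < radius).sum())   # voxels inside the 'cylinder'
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (p0, d)
    return best_model  # axis used to pick the 2D plane containing the needle
```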

With regression, the features extracted by the CNN are used to directly regress the needle tip coordinates (x, y)[65], or their proxy, by regressing four values representing the two opposite vertices of a tight bounding box centered around the needle tip[66] [Figure 3D]. These approaches are suitable for needle localization in both in-plane and out-of-plane insertions as they do not heavily rely on shaft information. The only downside of these approaches is that they do not enhance visualization of the entire needle during in-plane insertions.
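A minimal sketch of such a regression head is shown below in PyTorch; the backbone and head sizes are illustrative and do not reproduce the architectures of [65,66].

```python
# Minimal sketch: CNN regression head that directly outputs tip coordinates
# (out_dim=2) or bounding-box corners (out_dim=4). Sizes are illustrative.
import torch
import torch.nn as nn

class TipRegressor(nn.Module):
    def __init__(self, out_dim: int = 2):   # 2 -> (x, y) tip; 4 -> bbox corners
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),         # global features, any input size
        )
        self.head = nn.Linear(32, out_dim)   # directly regress coordinates

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(img).flatten(1)
        return self.head(feats)              # e.g. normalized (x, y) in [0, 1]

tip_xy = TipRegressor(out_dim=2)(torch.rand(1, 1, 256, 256))  # shape (1, 2)
```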

For segmentation, the high-level features extracted by the CNN are used in reverse to generate a probability map with pixel-wise probabilities of the existence of a needle[72-78]. This probability map can then be post-processed, usually by thresholding, to generate a binary segmentation map. Segmentation CNNs are the most commonly used deep learning approach for needle detection because they can detect the entire needle, including the shaft, while producing probabilities for their outputs, which leaves room for a variety of post-processing approaches [Figure 3C]. A special case of segmentation can be found in high dose rate (HDR) prostate brachytherapy applications, where multiple needles are segmented simultaneously[79-82]. In these applications, transverse 2D slices obtained from the 3D ultrasound volume are passed as input to a CNN trained to output the corresponding multi-needle segmentations for each slice. Unlike shaft segmentations for in-plane needle insertion, segmentations in HDR prostate brachytherapy slices are circular and centered around each needle in a given slice. These segmentations are then combined across slices; the centers of the circles are taken to be the needle shaft, and the most distal bright intensity is considered the needle tip.
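The sketch below illustrates this post-processing for a single needle: per-slice probability maps are thresholded, circle centers become shaft points, and the most distal detection is taken as the tip. The function name, threshold, and single-needle simplification are assumptions for illustration, not the procedures of [79-82].

```python
# Minimal sketch: shaft and tip recovery from per-slice segmentation maps,
# simplified to one needle. Threshold and structure are illustrative.
import numpy as np
from scipy import ndimage

def shaft_and_tip(prob_volume: np.ndarray, thresh: float = 0.5):
    """prob_volume: (n_slices, H, W) per-slice needle probability maps."""
    shaft = []
    for z, prob in enumerate(prob_volume):
        mask = prob > thresh                     # binary segmentation per slice
        if mask.any():
            cy, cx = ndimage.center_of_mass(mask)  # circle center = shaft point
            shaft.append((z, cy, cx))
    # Most distal slice with a detection is taken as the needle tip
    tip = shaft[-1] if shaft else None
    return shaft, tip
```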

Object detection methods are similar to segmentation methods but output bounding boxes encasing the detected needle. For instance, Mwikirize et al. used a CNN to automatically generate potential bounding box regions containing the needle and fed them to a region-based CNN (R-CNN) to classify which regions contained the needle[83]. On the other hand, Wang et al. used the YOLOX-Nano detector, which outputs bounding box predictions for each pixel; these predictions are then combined using non-max suppression to obtain a single bounding box indicating the predicted needle[84], as sketched below.
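Below is a minimal sketch of the non-max suppression step; the IoU threshold and box format are illustrative assumptions, not the YOLOX-Nano configuration of [84].

```python
# Minimal sketch: non-max suppression over per-pixel box predictions.
# Boxes are (x1, y1, x2, y2); IoU threshold is illustrative.
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """IoU between one box a and an array of boxes b."""
    x1 = np.maximum(a[0], b[:, 0]); y1 = np.maximum(a[1], b[:, 1])
    x2 = np.minimum(a[2], b[:, 2]); y2 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5):
    """Keep the highest-scoring boxes, suppressing overlapping ones."""
    order = scores.argsort()[::-1]          # indices, best score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        # Drop remaining boxes that overlap the kept box too strongly
        order = order[1:][iou(boxes[i], boxes[order[1:]]) < iou_thresh]
    return keep  # for a single needle, keep[0] is the final box
```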
Rubin et al. combined a 3D CNN, to extract temporal features from an ultrasound video stream, with a 2D YOLOv3-tiny object detector to