
Through this paper, the authors intend to provide a useful guide to researchers and practitioners who are interested in applying deep learning methods to rail track condition monitoring tasks. There are three major contributions of this paper. First, the evolution of deep learning and a collection of relevant deep learning methods provide clear coverage of the relevance, usefulness, and applicability of deep learning methods, enabling fellow researchers to navigate the often daunting deep learning domain with confidence. Second, a systematic search and review of the application publications demonstrates the relevance of deep learning methods to rail track condition monitoring tasks and provides insights into how such studies are carried out and what further research could follow. Third, two illustrative case studies demonstrate practical considerations and aim to motivate wider and more creative adoption of deep learning methods in the rail industry. This paper is organized as follows. Section 2 provides a historical overview of deep learning and briefly introduces common deep learning models. Section 3 reviews research adopting deep learning methods for rail track condition monitoring and anomaly detection. Section 4 discusses challenges and opportunities. Section 5 presents case studies applying deep learning to rail anomaly detection and classification, while Section 6 concludes the paper.

               2. DEEP LEARNING MODELS
               2.1. Historical overview of deep learning
We provide a simplified timeline for deep learning and its evolution, highlighting important issues and developments at critical junctures. A modern definition of deep learning reflects the current understanding of the topic: multiple layers of a deep learning model learn to represent the data with abstractions at multiple levels, and the intricate structure of large input data is discovered through the computations at each layer. Each layer computes its own representation from the representation of its previous layer according to the model's internal parameters, which are updated using the backpropagation algorithm. Images, video, speech, and audio data can be processed by deep convolutional nets, while sequential data such as text and speech can be handled by recurrent nets[9]. In the following paragraphs, we examine the journey of deep learning from a single neuron to its current status and thereby determine the scope of the review that follows.
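As a minimal sketch of this layered computation (an illustrative example of our own, not a model from the reviewed literature), the NumPy snippet below assumes a two-layer network and a toy regression task: the hidden layer re-represents the raw input, and all internal parameters are updated by backpropagation.

```python
# Minimal sketch: a two-layer network whose hidden layer re-represents the
# input and whose parameters are updated with backpropagation (toy example).
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn y = sin(x) on [-pi, pi]
X = rng.uniform(-np.pi, np.pi, size=(256, 1))
y = np.sin(X)

# Internal parameters of the two layers
W1 = rng.normal(0, 0.5, size=(1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, size=(16, 1)); b2 = np.zeros(1)

lr = 0.05
for step in range(2000):
    # Forward pass: each layer computes its representation from the previous one
    h = np.tanh(X @ W1 + b1)          # hidden representation
    y_hat = h @ W2 + b2               # output layer
    err = y_hat - y                   # error signal (MSE gradient up to a constant)

    # Backward pass: propagate the error signal layer by layer
    grad_W2 = h.T @ err / len(X)
    grad_b2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
    grad_W1 = X.T @ dh / len(X)
    grad_b1 = dh.mean(axis=0)

    # Gradient-descent update of the internal parameters
    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    W1 -= lr * grad_W1; b1 -= lr * grad_b1

print("final MSE:", float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2)))
```

The same forward and backward pattern underlies the convolutional and recurrent architectures discussed in the rest of this section.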

The McCulloch-Pitts (MCP) neuron, proposed in 1943, was the first computational model mimicking the functionality of a biological neuron and marks the start of the era of artificial neural networks. An aggregation of Boolean inputs determines the output through a threshold parameter[10]. The classical perceptron model[11], proposed in 1958, was further refined and analyzed[12] in 1969. The perceptron brought in the concept of numerical weights to measure the importance of inputs and a mechanism for learning those weights. The model is similar to, but more general than, the MCP neuron, as it takes weighted real-valued inputs and its threshold value is learnable. However, a single artificial neuron is incapable of implementing certain functions, such as the XOR logical function, and the belief that larger networks would suffer similar limitations cooled down artificial neural network development.
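The contrast between the two models, and the XOR limitation, can be made concrete with a small sketch (our own illustration, not code from the cited works): an MCP-style unit sums fixed Boolean inputs against a threshold, while a perceptron learns its weights and threshold, succeeding on the linearly separable AND function but failing on XOR.

```python
# Sketch: an MCP-style threshold unit with Boolean inputs, and a perceptron
# whose weights and threshold (bias) are learned with the classical rule.
import numpy as np

def mcp_neuron(x, threshold):
    """McCulloch-Pitts unit: fire iff the sum of Boolean inputs reaches the threshold."""
    return int(np.sum(x) >= threshold)

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Perceptron rule: nudge weights toward misclassified examples."""
    w = np.zeros(X.shape[1])
    b = 0.0                              # learnable threshold (as a bias term)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = int(xi @ w + b > 0)
            w += lr * (yi - pred) * xi
            b += lr * (yi - pred)
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])           # linearly separable: learnable
y_xor = np.array([0, 1, 1, 0])           # not linearly separable: a single unit fails

print("MCP unit as OR gate:", [mcp_neuron(x, threshold=1) for x in X])
for name, y in [("AND", y_and), ("XOR", y_xor)]:
    w, b = train_perceptron(X, y)
    preds = [int(xi @ w + b > 0) for xi in X]
    print(name, "predictions:", preds, "targets:", list(y))
```

Running the sketch yields perfect predictions for AND but persistent errors for XOR, which is precisely the single-neuron limitation noted above.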

The multi-layer perceptron (MLP) was proposed in 1986, where node outputs of hidden layers are calculated using the sigmoid function and the backpropagation algorithm is used to find the weights of the network model[13]. The universal approximation theorem of the MLP, proved in 1989, states that, for any given continuous function f(x), there is a feedforward neural network that can approximate it to any desired accuracy[14]. The LeNet network was proposed in 1989 to recognize handwritten digits with good performance[15]. In 1991, the vanishing gradient problem was discovered in backpropagation-trained networks: back-propagated error signals either shrink rapidly or grow out of bounds in typical deep or recurrent networks, because certain activation functions, such as the sigmoid, map a large input range onto a small output range between 0 and 1[16]. The LSTM model was proposed in 1997[17] and performs well in predicting sequential data. However, neural network research then made little progress until 2006.
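As a rough numeric illustration of the vanishing gradient effect described above (our own sketch, not taken from the cited work), the derivative of the sigmoid never exceeds 0.25, so an error signal propagated through a long chain of sigmoid layers is repeatedly attenuated:

```python
# Sketch: the back-propagated error through a 1-D chain of sigmoid "layers"
# is a product of terms sigma'(w*z)*w, each bounded in magnitude, so it shrinks
# rapidly with depth.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
depth = 30
z = 0.5                          # activation entering the chain
grad = 1.0                       # error signal arriving at the last layer

for layer in range(depth):
    w = rng.normal(0, 1.0)       # one weight per "layer" in this scalar chain
    a = sigmoid(w * z)
    grad *= a * (1 - a) * w      # chain rule: sigma'(w*z) * w, with sigma' <= 0.25
    z = a

print(f"error signal after {depth} sigmoid layers: {grad:.3e}")
```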
It is worth mentioning that statistical learning theory, a framework for machine learning, blossomed between 1986 and