Through this paper, the authors intend to provide a useful guide to researchers and practitioners who are
interested in applying deep learning methods to rail track condition monitoring tasks. There are three major
contributions of this paper. First, the evolution of deep learning and a collection of relevant deep learning methods are presented, giving clear coverage of their relevance, usefulness, and applicability and enabling fellow researchers to navigate the often daunting deep learning domain with confidence. Second, a systematic search and review of application publications demonstrates the relevance of deep learning methods to rail track condition monitoring tasks and provides insights into how such research is carried out and what further studies could follow. Third, two illustrative case studies demonstrate practical considerations and aim to motivate wider and more creative adoption of deep learning methods in the rail industry. This paper is organized as follows. Section 2 presents a historical
overview of deep learning and briefly introduces common deep learning models. Section 3 reviews research
adopting deep learning methods for rail track condition monitoring and anomaly detection. Section 4
discusses challenges and opportunities. Section 5 presents case studies applying deep learning to rail
anomaly detection and classification while Section 6 concludes the paper.
2. DEEP LEARNING MODELS
2.1. Historical overview of deep learning
We provide a simplified timeline of deep learning and its evolution, highlighting important issues and developments at critical junctures. A modern definition of deep learning captures the current understanding of the topic: the multiple layers of a deep learning model learn to represent the data with abstractions at multiple levels, and the intricate structure of large input data is discovered through the computations at each layer. Each layer computes its own representation from that of the previous layer according to the model's internal parameters, which are updated using the backpropagation algorithm. Images, video, speech, and audio data can be processed by deep convolutional nets, while sequential data such as text and speech can be handled by recurrent nets[9]. In the following paragraphs, we examine the journey of deep learning from a single neuron to its current status and thereby determine the scope of the review that follows.
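In our own generic notation (an illustrative formulation, not one drawn from the reviewed papers), this layer-wise computation can be written as:

```latex
% Layer l transforms the previous layer's representation h^{(l-1)}
% into h^{(l)} via learnable parameters W^{(l)}, b^{(l)} and a
% nonlinearity \sigma (e.g., the sigmoid):
h^{(l)} = \sigma\left( W^{(l)} h^{(l-1)} + b^{(l)} \right), \quad l = 1, \dots, L,
% and backpropagation applies the chain rule to update each layer's
% parameters by gradient descent on a loss J with learning rate \eta:
W^{(l)} \leftarrow W^{(l)} - \eta \, \frac{\partial J}{\partial W^{(l)}}.
```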
The McCulloch-Pitts (MCP) neuron, proposed in 1943, was the first computational model mimicking the functionality of a biological neuron, and it marks the start of the era of artificial neural networks. An aggregation of Boolean inputs determines the output through a threshold parameter[10]. The classical perceptron model, proposed in 1958[11], was further refined and analyzed in 1969[12]. The perceptron brought in the concept of numerical weights to measure the importance of inputs and a mechanism for learning those weights. The model is similar to, but more general than, the MCP neuron, as it takes weighted real-valued inputs and its threshold value is learnable. Because a single artificial neuron is incapable of implementing some functions, such as the logical XOR, and larger networks were argued to share similar limitations, artificial neural network development cooled down.
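As a minimal sketch of this limitation (our own illustration, not code from the surveyed works), the following Python snippet implements a thresholded artificial neuron and checks by brute force that, while AND is easily realized, no weight setting on a small integer grid reproduces XOR:

```python
import itertools

def neuron(x, w, theta):
    """Single artificial neuron: fire iff the weighted input sum
    reaches the threshold theta (step activation)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]

def realizable(target, grid=range(-3, 4)):
    """Brute-force search: does any (w1, w2, theta) on a small
    integer grid make the neuron reproduce the target function?"""
    return any(
        all(neuron(x, (w1, w2), t) == y for x, y in zip(inputs, target))
        for w1, w2, t in itertools.product(grid, repeat=3)
    )

print("AND realizable:", realizable(AND))  # True: e.g., w=(1,1), theta=2
print("XOR realizable:", realizable(XOR))  # False: XOR is not linearly separable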
The multi-layer perceptron (MLP) was proposed in 1986; the node outputs of its hidden layers are calculated using the sigmoid function, and backpropagation is used to find the weights of the network model[13]. The universal approximation theorem for MLPs, proved in 1989, states that, for any given continuous function f(x), there is a backpropagation neural network that can approximate it arbitrarily well[14]. The LeNet network was proposed in 1989 to recognize handwritten digits with good performance[15].
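To connect the MLP with the XOR limitation noted above, here is a small hand-wired sketch (our own example, assuming NumPy; the weights are illustrative, not learned) of a one-hidden-layer sigmoid MLP that does compute XOR, which a single neuron cannot:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hand-picked weights: hidden unit 1 approximates OR, hidden unit 2
# approximates AND, and the output unit fires for "OR and not AND",
# i.e., XOR.
W1 = np.array([[20.0, 20.0],
               [20.0, 20.0]])
b1 = np.array([-10.0, -30.0])
W2 = np.array([20.0, -20.0])
b2 = -10.0

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = sigmoid(W1 @ np.array(x) + b1)   # hidden-layer representation
    y = sigmoid(W2 @ h + b2)             # output-layer representation
    print(x, "->", round(float(y)))      # prints 0, 1, 1, 0
```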
In 1991, the vanishing gradient problem of the backpropagation neural network was discovered: back-propagated error signals either shrink rapidly or grow out of bounds in typical deep or recurrent networks, because certain activation functions, such as the sigmoid function, map a large input space into a small output space between 0 and 1[16].
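A small numerical sketch (our own illustration, with weights held at one for simplicity) shows why the sigmoid causes this: its derivative is at most 0.25, so the chain-rule product of derivatives across many layers shrinks roughly geometrically toward zero:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

grad = 1.0  # error signal at the output layer
z = 0.5     # a typical pre-activation value, held fixed for illustration

# Back-propagating through 20 sigmoid layers multiplies the error
# signal by sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) <= 0.25 each time.
for layer in range(1, 21):
    s = sigmoid(z)
    grad *= s * (1.0 - s)
    if layer % 5 == 0:
        print(f"after {layer:2d} layers: |error signal| ~ {grad:.2e}")
```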
The LSTM model was proposed in 1997 and performs well in predicting sequential data[17]. However, neural networks did not progress much from then until 2006. It is worth mentioning that statistical learning theory, a framework for machine learning, blossomed between 1986 and