Page 50 Harib et al. Intell Robot 2022;2(1):37-71 https://dx.doi.org/10.20517/ir.2021.19
Figure 3. Architecture of the proposed NNs in the work of Levin and Narendra [69]. NNs: Neural Networks.
techniques, performance may be assessed using cost functions such as the least mean squared error. With off-line approaches, all of the training data is available at the same time. With on-line approaches, by contrast, continuous learning is required, so the methods must be extremely efficient in order to keep up with events as they change over time.
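The on-line setting described above can be illustrated with the classic least-mean-squares (LMS) update, which adjusts a weight vector from one sample at a time rather than from a full batch. This is a minimal sketch, not taken from the surveyed works; the step size and data-generating process are illustrative assumptions.

```python
import numpy as np

def lms_online_update(w, x, d, mu=0.01):
    """One on-line least-mean-squares step: adapt the weights w from a
    single incoming sample (x, d) using the instantaneous squared error."""
    e = d - w @ x          # instantaneous prediction error
    return w + mu * e * x  # gradient step on the squared error

# Illustrative use: track a target linear map from streaming data.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
w = np.zeros(2)
for _ in range(2000):
    x = rng.normal(size=2)
    d = w_true @ x
    w = lms_online_update(w, x, d, mu=0.05)
```

Because each update touches only one sample, the cost per step is constant, which is what makes such methods efficient enough for the continuous-learning requirement mentioned above.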
Adaptive NNs have recently been used by a growing number of academics and researchers to construct
acceptable control rules for nonlinear systems. An overview of the primary recent literature that
implements adaptive NN-based techniques is given in Table 3 [73-82].
3.2. Inverted pendulum
Many researchers have studied learning control using the inverted pendulum problem. The canonical
underactuated system, called the cart-pole system, is illustrated in Figure 4. Because deriving the dynamics
is relatively simple, it is considered a basic control issue, yet it still hides some underlying complexity owing
to its underactuated character. The multiple obstacles that must be addressed to properly regulate such
extremely complex nonlinear unstable systems include severe nonlinearities, variable operating
circumstances, structured and unstructured dynamical uncertainties, and external disturbances. The
purpose of the control is to balance the pole by moving the cart, which has a restricted range of movements.
We distinguish between the position of the cart h and its velocity ḣ, and the angle of the pole θ with its
angular velocity θ̇.
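The four state variables above evolve under well-known equations of motion. The sketch below integrates one Euler step of the standard cart-pole dynamics; the parameter values (masses, pole half-length, step size) are the commonly used illustrative ones and should be treated as assumptions rather than the exact values of any particular paper.

```python
import math

# Illustrative cart-pole parameters (assumed, not taken from the survey).
GRAVITY = 9.8        # m/s^2
CART_MASS = 1.0      # kg
POLE_MASS = 0.1      # kg
POLE_HALF_LEN = 0.5  # m, half the pole length
DT = 0.02            # s, Euler integration step

def cart_pole_step(state, force):
    """Advance the state (h, h_dot, theta, theta_dot) by one Euler step
    under a horizontal force applied to the cart."""
    h, h_dot, theta, theta_dot = state
    total_mass = CART_MASS + POLE_MASS
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + POLE_MASS * POLE_HALF_LEN * theta_dot**2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LEN * (4.0 / 3.0 - POLE_MASS * cos_t**2 / total_mass))
    h_acc = temp - POLE_MASS * POLE_HALF_LEN * theta_acc * cos_t / total_mass
    return (h + DT * h_dot,
            h_dot + DT * h_acc,
            theta + DT * theta_dot,
            theta_dot + DT * theta_acc)
```

Note the coupling that makes the system underactuated: the single input `force` acts on the cart, and the pole angle is influenced only indirectly through `theta_acc`.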
In 1983, Barto et al. [83] showed how a system consisting of two neuron-like adaptive elements, an associative
search element (ASE) and adaptive critic element (ACE), can solve a difficult learning control problem such
as the cart-pole system. Their work was based on the addition of a single ACE to the ASE developed by
Michie and Chambers [84,85]. They partitioned the state space into
162 boxes. Their simulations revealed that the ASE/ACE system outperformed the boxes system in terms of
run time, and it was more likely than the boxes system to solve the problem within its first 100 failures. The
ASE/ACE system's high performance was almost entirely due to the ACE's provision of reinforcement
throughout the trials. With the boxes system, or with an ASE alone, learning occurs only upon failure, which
happens less and less frequently as learning progresses. With the ACE in place, the ASE can receive input on
every time step. As a result of the learning driven by this input, the system seeks out some regions of the
state space and avoids others.
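The 162-box partition mentioned above comes from coarsely quantising the four state variables (3 cart-position bins × 3 cart-velocity bins × 6 pole-angle bins × 3 angular-velocity bins). The sketch below shows one way to implement such a mapping; the specific cut-points are the thresholds commonly associated with this benchmark and should be read as assumptions rather than a verbatim reproduction of the original code.

```python
import math

def get_box(h, h_dot, theta, theta_dot):
    """Map a cart-pole state to one of 162 boxes
    (3 * 3 * 6 * 3 = 162), returning -1 on failure states.
    Thresholds are illustrative assumptions."""
    deg = math.pi / 180.0
    # Failure: cart off the track or pole beyond +/- 12 degrees.
    if abs(h) > 2.4 or abs(theta) > 12 * deg:
        return -1
    box = 0 if h < -0.8 else (1 if h < 0.8 else 2)
    box = box * 3 + (0 if h_dot < -0.5 else (1 if h_dot < 0.5 else 2))
    if theta < -6 * deg:   t = 0
    elif theta < -1 * deg: t = 1
    elif theta < 0.0:      t = 2
    elif theta < 1 * deg:  t = 3
    elif theta < 6 * deg:  t = 4
    else:                  t = 5
    box = box * 6 + t
    return box * 3 + (0 if theta_dot < -50 * deg else
                      (1 if theta_dot < 50 * deg else 2))
```

Each box then holds its own adaptive weights, which is what makes learning tractable for the ASE/ACE elements despite the continuous state space.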
Anderson [86] built on the work of Barto et al. [83] by applying a variant of the common error BP algorithm to two-
layer networks that learn to balance the pendulum given the inverted pendulum's real state variables as
input. Two years later, he summarized both aforementioned works [87] by discussing the neural network
structures and learning methods from a functional viewpoint and by presenting the experimental results. He
described NN learning techniques, which use two functions to learn how to construct action sequences. The
first is an action function, which converts the current state into control actions. The second is an evaluation
function, which converts the present state into an assessment of that state. Two sorts of networks thus
emerged: "action" and "evaluation" networks. This is a version of the adaptive critic architecture.
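The action/evaluation pairing described above survives today as the actor-critic scheme. The tabular sketch below is in the spirit of the ASE/ACE pair, with a temporal-difference error standing in for the critic's internal reinforcement signal; all names, hyperparameters, and the tabular (rather than network) representation are illustrative assumptions, not the original formulation.

```python
import numpy as np

N_STATES, N_ACTIONS = 162, 2              # e.g., one state per box, push left/right
actor = np.zeros((N_STATES, N_ACTIONS))   # action function: state -> preferences
critic = np.zeros(N_STATES)               # evaluation function: state -> value

def select_action(state, rng):
    """Softmax over the actor's preferences for this state."""
    p = np.exp(actor[state] - actor[state].max())
    p /= p.sum()
    return rng.choice(N_ACTIONS, p=p)

def learn_step(s, a, r, s_next, done, alpha=0.1, beta=0.5, gamma=0.95):
    """One actor-critic update driven by the TD error, which plays the
    role of the ACE's moment-to-moment reinforcement signal."""
    target = r if done else r + gamma * critic[s_next]
    td_error = target - critic[s]
    critic[s] += beta * td_error      # evaluation-network update
    actor[s, a] += alpha * td_error   # action-network update
    return td_error
```

Because the critic supplies a learning signal on every time step, the actor no longer has to wait for a failure to receive feedback, which is exactly the advantage the ACE gave the ASE.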