
Page 50                              Harib et al. Intell Robot 2022;2(1):37-71  https://dx.doi.org/10.20517/ir.2021.19










                          Figure 3. Architecture of the proposed NNs in the work of Levin and Narendra [69]. NNs: Neural Networks.

               techniques, performance may be assessed using cost functions such as least mean squared error. With
               off-line approaches, all of the training data are available at once. With on-line approaches, by contrast,
               learning must be continuous, so the methods must be efficient enough to keep up with events as they
               change over time.
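
               The contrast between the two settings can be illustrated with a minimal sketch. It assumes a scalar
               linear model y = w * x and a noiseless toy data set, neither of which comes from the works discussed
               here; the function names and the learning rate are hypothetical. The off-line fit uses all samples at
               once, while the on-line fit performs one cheap least-mean-squares (LMS) update per incoming sample.

```python
def mse(w, data):
    """Mean squared error of the scalar linear model y = w * x."""
    return sum((y - w * x) ** 2 for x, y in data) / len(data)


def fit_offline(data):
    """Off-line: the whole data set is available, so solve least squares in one shot."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / sxx


def fit_online(stream, lr=0.1, w=0.0):
    """On-line: one LMS update per sample, in arrival order (lr is an assumed value)."""
    for x, y in stream:
        w += lr * (y - w * x) * x
    return w


# Toy data (assumption): y = 2 * x, repeated so the on-line learner sees a long stream.
xs = [i / 10.0 for i in range(1, 11)]
data = [(x, 2.0 * x) for x in xs] * 50

w_off = fit_offline(data)
w_on = fit_online(iter(data))
```

               Both estimates approach the same weight on this toy problem; the on-line learner never needs to store
               the data, which is the property that matters when samples keep arriving.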


               Adaptive NNs have recently been used by a growing number of academics and researchers to construct
               acceptable control rules for nonlinear systems. Table 3 gives an overview of the main recent literature
               that implemented adaptive NN-based techniques [73-82].


               3.2. Inverted pendulum
               Many researchers have studied learning control using the inverted pendulum problem. The canonical
               underactuated system, called the cart-pole system, is illustrated in Figure 4. Because deriving the dynamics
               is relatively simple, it is considered a basic control problem, yet it still hides some underlying complexity
               owing to its underactuated character. The obstacles that must be addressed to properly regulate such
               highly nonlinear, unstable systems include severe nonlinearities, variable operating
               conditions, structured and unstructured dynamical uncertainties, and external disturbances. The
               purpose of the control is to balance the pole by moving the cart, which has a restricted range of movement.
               The state consists of the position of the cart h and its velocity ḣ, and the angle of the pole θ and its
               angular velocity θ̇.
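
               To make the four state variables concrete, the following minimal sketch advances the cart-pole state
               (h, ḣ, θ, θ̇) by one Euler step. It assumes the frictionless equations of motion commonly used for this
               benchmark; the masses, pole length, time step, and failure limits are assumed values that do not come
               from this section, and the function names are hypothetical.

```python
import math

# Assumed parameters (not given in the text); values often used for this benchmark.
GRAVITY = 9.8     # m/s^2
M_CART = 1.0      # kg
M_POLE = 0.1      # kg
HALF_LEN = 0.5    # m, half of the pole length
DT = 0.02         # s, Euler integration step


def cartpole_step(state, force):
    """Advance (h, h_dot, theta, theta_dot) by one Euler step; friction is neglected."""
    h, h_dot, theta, theta_dot = state
    total_mass = M_CART + M_POLE
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + M_POLE * HALF_LEN * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LEN * (4.0 / 3.0 - M_POLE * cos_t ** 2 / total_mass))
    h_acc = temp - M_POLE * HALF_LEN * theta_acc * cos_t / total_mass
    return (h + DT * h_dot,
            h_dot + DT * h_acc,
            theta + DT * theta_dot,
            theta_dot + DT * theta_acc)


def has_failed(state, h_limit=2.4, theta_limit=math.radians(12)):
    """Failure: the cart leaves its track or the pole tilts too far (assumed limits)."""
    h, _, theta, _ = state
    return abs(h) > h_limit or abs(theta) > theta_limit


def steps_until_failure(state, force=0.0, max_steps=1000):
    """Count Euler steps under a constant force until failure (or max_steps)."""
    for step in range(max_steps):
        if has_failed(state):
            return step
        state = cartpole_step(state, force)
    return max_steps
```

               Because only the cart is actuated, the pole angle can be influenced only indirectly through the force
               on the cart; with zero force, a slightly tilted pole falls within a fraction of a second.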

               In 1983, Barto et al. [83] showed how a system consisting of two neuronlike adaptive elements, an associative
               search element (ASE) and an adaptive critic element (ACE), can solve a difficult learning control problem such
               as the cart-pole system. Their work was based on the addition of a single ACE to the ASE developed in the
               works of Michie and Chambers [84,85]. They partitioned the state space into
               162 boxes. Their simulations revealed that the ASE/ACE system outperformed the boxes system in terms of
               run time. The ASE/ACE system was also more likely than the boxes system to solve the problem before
               accumulating 100 failures. The ASE/ACE system’s high performance was almost entirely due to the ACE’s
               provision of reinforcement throughout the trials. With the boxes system and with ASEs without an ACE,
               learning occurs only upon failure, which happens less frequently as learning progresses. With the ACE in
               place, an ASE can receive feedback on every time step. As a result of the learning driven by this feedback,
               the system seeks out some areas of the state space and avoids others.
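
               The mechanism described above can be sketched as follows. This is a simplified illustration rather than
               the authors' implementation: the bin edges that produce the 162 boxes, the learning rates, the trace
               decay rates, the discount factor, and the noise level are all assumed values, and the names `box_index`
               and `AseAce` are hypothetical. The point to notice is that `learn` computes an internal reinforcement
               signal on every step, not only on failure.

```python
import bisect
import math
import random

# Assumed bin edges giving 3 x 3 x 6 x 3 = 162 boxes (angles in radians).
H_EDGES = [-0.8, 0.8]
H_DOT_EDGES = [-0.5, 0.5]
THETA_EDGES = [math.radians(d) for d in (-6, -1, 0, 1, 6)]
THETA_DOT_EDGES = [math.radians(-50), math.radians(50)]
N_BOXES = 162

# Assumed learning constants (not taken from this section).
ALPHA, BETA = 1000.0, 0.5   # ASE and ACE learning rates
DELTA, LAMBDA = 0.9, 0.8    # decay of the ASE and ACE traces
GAMMA = 0.95                # discount factor of the ACE prediction
NOISE_STD = 0.01            # exploration noise on the ASE output


def box_index(state):
    """Map (h, h_dot, theta, theta_dot) to one of the 162 boxes."""
    h, h_dot, theta, theta_dot = state
    return (bisect.bisect_right(H_EDGES, h)
            + 3 * bisect.bisect_right(H_DOT_EDGES, h_dot)
            + 9 * bisect.bisect_right(THETA_EDGES, theta)
            + 54 * bisect.bisect_right(THETA_DOT_EDGES, theta_dot))


class AseAce:
    def __init__(self, seed=0):
        self.w = [0.0] * N_BOXES      # ASE action weights
        self.v = [0.0] * N_BOXES      # ACE prediction weights
        self.e = [0.0] * N_BOXES      # ASE eligibility traces
        self.xbar = [0.0] * N_BOXES   # ACE input traces
        self.prev_p = 0.0             # ACE prediction for the previous box
        self.rng = random.Random(seed)

    def act(self, box):
        """ASE: choose push direction (+1 or -1) for the active box, update traces."""
        y = 1 if self.w[box] + self.rng.gauss(0.0, NOISE_STD) >= 0 else -1
        for i in range(N_BOXES):
            self.e[i] *= DELTA
            self.xbar[i] *= LAMBDA
        self.e[box] += (1.0 - DELTA) * y
        self.xbar[box] += 1.0 - LAMBDA
        self.prev_p = self.v[box]
        return y

    def learn(self, next_box, failed):
        """ACE: form the internal reinforcement and update both weight vectors."""
        r = -1.0 if failed else 0.0
        p = 0.0 if failed else self.v[next_box]
        r_hat = r + GAMMA * p - self.prev_p   # available on every time step
        for i in range(N_BOXES):
            self.w[i] += ALPHA * r_hat * self.e[i]
            self.v[i] += BETA * r_hat * self.xbar[i]
        if failed:                            # start the next trial with empty traces
            self.e = [0.0] * N_BOXES
            self.xbar = [0.0] * N_BOXES
        return r_hat
```

               Removing the ACE amounts to replacing `r_hat` with the raw failure signal `r`, which is zero on every
               step except the last one of a trial; this is why learning without the critic slows down as failures
               become rarer.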


               Anderson [86] built on the work of Barto et al. [83] by applying a variant of the common error BP algorithm
               to two-layered networks that learn to balance the pendulum given the inverted pendulum’s real state
               variables as input. Two years later [87], he summarized both aforementioned works by discussing the neural
               network structures and learning methods from a functional viewpoint and by presenting the experimental
               results. He described NN learning techniques that use two functions to learn how to construct action
               sequences. The first is an action function, which maps the current state to control actions. The second is
               an evaluation function, which maps the current state to an assessment of that state. Two sorts of networks
               thus emerged: “action” and “evaluation” networks. This is an adaptive critic architecture version