where $\tilde{\omega}_{ci} = \omega_{ci} - \hat{\omega}_{ci}$ is the weight error vector. At present, in order to minimize the objective function $E_{ci} = (1/2)\,e_{ci}^{\mathsf T} e_{ci}$, the normalised steepest descent algorithm based on Equation (38) is employed as follows:

$$ \dot{\hat{\omega}}_{ci} = -\alpha_{ci}\,\frac{1}{(1+\phi_i^{\mathsf T}\phi_i)^2}\,\frac{\partial E_{ci}}{\partial \hat{\omega}_{ci}} = -\alpha_{ci}\,\frac{\phi_i\, e_{ci}}{(1+\phi_i^{\mathsf T}\phi_i)^2}, \tag{40} $$

where $\alpha_{ci} > 0$ represents the basic learning rate. Besides, $(1+\phi_i^{\mathsf T}\phi_i)^2$ is introduced for the normalization to simplify the critic error dynamics, and $\phi_i$ is derived as

$$ \phi_i = \nabla\sigma_{ci}(s_i)\big[F_i(s_i) + G_i(s_i)\hat{u}_i^{*}(s_i) + H_i(s_i)\hat{v}_i^{*}(s_i)\big]. \tag{41} $$
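For illustration, a minimal numerical sketch of the update (40) with the regressor (41) is given below. The names (critic_update, sigma_grad, f, g, h, u_hat, v_hat) are hypothetical placeholders rather than notation from the paper, and the critic error e_c is assumed to be supplied externally, computed from Equation (38).

```python
import numpy as np

# Minimal sketch of the normalised steepest descent critic update, Eq. (40).
# All model callables are hypothetical placeholders, not from the paper.
def critic_update(w_hat, s, e_c, sigma_grad, f, g, h, u_hat, v_hat, alpha=0.1):
    """One Euler step of Eq. (40); the integration step dt is folded into alpha."""
    # Regressor of Eq. (41): phi = grad(sigma_ci)(s) [F(s) + G(s) u + H(s) v]
    phi = sigma_grad(s) @ (f(s) + g(s) @ u_hat(s) + h(s) @ v_hat(s))
    norm = (1.0 + phi @ phi) ** 2  # normalisation term (1 + phi^T phi)^2
    return w_hat - alpha * phi * e_c / norm
```

In practice, $\phi_i$ and $e_{ci}$ would be evaluated along the closed-loop trajectory at every integration step of the weight dynamics.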
Usually, in the traditional weight training process, an appropriate initial weight vector must be selected for effective training. To remove the need for an initial admissible control law, an improved critic learning rule is presented in the following.
               4.2. Improved critic learning rule via neural networks
Herein, an additional Lyapunov function is introduced to improve the critic learning mechanism. Then, the following rational assumption is given.


Assumption 1. Consider the dynamics of the $i$th ATIS Equation (9) with the optimal cost function Equation (14) and the closed-loop optimal control policy Equation (32). We select $\Lambda_i(s_i)$ as a continuously differentiable Lyapunov function and have the following relation:

$$ \dot{\Lambda}_i(s_i) = \big(\nabla\Lambda_i(s_i)\big)^{\mathsf T}\big[F_i(s_i) + G_i(s_i)u_i^{*}(s_i) + H_i(s_i)v_i^{*}(s_i)\big] < 0. \tag{42} $$
In other words, there exists a positive definite matrix $\mathcal{B}$ such that

$$ \big(\nabla\Lambda_i(s_i)\big)^{\mathsf T}\big[F_i(s_i) + G_i(s_i)u_i^{*}(s_i) + H_i(s_i)v_i^{*}(s_i)\big] = -\big(\nabla\Lambda_i(s_i)\big)^{\mathsf T}\mathcal{B}\,\nabla\Lambda_i(s_i) \le -\lambda_{\mathcal{B}}\,\big\|\nabla\Lambda_i(s_i)\big\|^2, \tag{43} $$

where $\lambda_{\mathcal{B}}$ is the minimum eigenvalue of the matrix $\mathcal{B}$.
Remark 2. Herein, the motivation of selecting the cost function $J_i(s_i)$ is to obtain the optimal DTC strategy, which minimizes and maximizes $J_i(s_i)$ under the optimal control law and the worst disturbance law, respectively. Moreover, we can discuss the stability of closed-loop systems via the constructed optimal cost function. Besides, to be clear, $\Lambda_i(s_i)$ is obtained by properly selecting a quadratic polynomial in terms of the state vector; we generally choose $\Lambda_i(s_i) = 0.5\,s_i^{\mathsf T}s_i$.
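With this common choice, the quantities in Assumption 1 become explicit. As a short worked instance under the notation above (so that $\nabla\Lambda_i(s_i) = s_i$), Equations (42) and (43) specialize to

$$ \dot{\Lambda}_i(s_i) = s_i^{\mathsf T}\dot{s}_i = s_i^{\mathsf T}\big[F_i(s_i) + G_i(s_i)u_i^{*}(s_i) + H_i(s_i)v_i^{*}(s_i)\big] = -\,s_i^{\mathsf T}\mathcal{B}\,s_i \le -\lambda_{\mathcal{B}}\,\|s_i\|^2 < 0, \quad s_i \neq 0. $$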
When the condition $(\nabla\Lambda_i(s_i))^{\mathsf T}\big[F_i(s_i) + G_i(s_i)\hat{u}_i^{*}(s_i) + H_i(s_i)\hat{v}_i^{*}(s_i)\big] > 0$ occurs, the closed-loop system is unstable under the optimal control law Equation (36). In this case, an additional term is introduced to ensure the system stability. Based on Equation (36), some processing is performed as follows:
$$ -\frac{\partial\big[(\nabla\Lambda_i(s_i))^{\mathsf T}\big(F_i(s_i)+G_i(s_i)\hat{u}_i^{*}(s_i)+H_i(s_i)\hat{v}_i^{*}(s_i)\big)\big]}{\partial \hat{\omega}_{ci}} = -\Big(\frac{\partial \hat{u}_i^{*}(s_i)}{\partial \hat{\omega}_{ci}}\Big)^{\!\mathsf T}\frac{\partial\big[(\nabla\Lambda_i(s_i))^{\mathsf T}\big(F_i(s_i)+G_i(s_i)\hat{u}_i^{*}(s_i)+H_i(s_i)\hat{v}_i^{*}(s_i)\big)\big]}{\partial \hat{u}_i^{*}(s_i)} = \frac{1}{2}\,\nabla\sigma_{ci}(s_i)\,G_i(s_i)R_i^{-1}G_i^{\mathsf T}(s_i)\,\nabla\Lambda_i(s_i). \tag{44} $$
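The last equality in Equation (44) can be checked by differentiating along the control channel, as the right-hand side indicates. Assuming the critic-based control law of Equation (36) takes the standard form $\hat{u}_i^{*}(s_i) = -\frac{1}{2}R_i^{-1}G_i^{\mathsf T}(s_i)(\nabla\sigma_{ci}(s_i))^{\mathsf T}\hat{\omega}_{ci}$ with symmetric $R_i$ (a form assumed here for illustration, not quoted from this page), the two factors are

$$ \frac{\partial \hat{u}_i^{*}(s_i)}{\partial \hat{\omega}_{ci}} = -\frac{1}{2}R_i^{-1}G_i^{\mathsf T}(s_i)\big(\nabla\sigma_{ci}(s_i)\big)^{\mathsf T}, \qquad \frac{\partial\big[(\nabla\Lambda_i(s_i))^{\mathsf T} G_i(s_i)\hat{u}_i^{*}(s_i)\big]}{\partial \hat{u}_i^{*}(s_i)} = G_i^{\mathsf T}(s_i)\,\nabla\Lambda_i(s_i), $$

and their product with the leading minus sign yields $\frac{1}{2}\nabla\sigma_{ci}(s_i)G_i(s_i)R_i^{-1}G_i^{\mathsf T}(s_i)\nabla\Lambda_i(s_i)$.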
Thus, we describe the improved learning rule as

$$ \dot{\hat{\omega}}_{ci} = -\alpha_{ci}\,\frac{\phi_i}{(1+\phi_i^{\mathsf T}\phi_i)^2}\,e_{ci} + \frac{\alpha_{ci}}{2}\,\Pi_i(s_i,\hat{u}_i^{*},\hat{v}_i^{*})\,\nabla\sigma_{ci}(s_i)\,G_i(s_i)R_i^{-1}G_i^{\mathsf T}(s_i)\,\nabla\Lambda_i(s_i), \tag{45} $$
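A compact numerical sketch of the switched rule (45) follows. The definition of $\Pi_i(\cdot)$ continues beyond this excerpt; the sketch assumes, as is standard for such stabilizing terms, that $\Pi_i = 0$ when the Lyapunov derivative of Assumption 1 is negative and $\Pi_i = 1$ otherwise. All callables (sigma_grad, g, f_cl) are hypothetical placeholders.

```python
import numpy as np

# Sketch of the improved critic update, Eq. (45), with Lambda(s) = 0.5 s^T s,
# so grad(Lambda)(s) = s. The indicator Pi is an assumption (see lead-in).
def improved_critic_update(w_hat, s, e_c, phi, sigma_grad, g, R_inv, f_cl, alpha=0.1):
    grad_lam = s                                  # gradient of 0.5 s^T s
    pi = 0.0 if grad_lam @ f_cl(s) < 0 else 1.0   # assumed indicator Pi_i
    term1 = -alpha * phi * e_c / (1.0 + phi @ phi) ** 2
    term2 = 0.5 * alpha * pi * sigma_grad(s) @ g(s) @ R_inv @ g(s).T @ grad_lam
    return w_hat + term1 + term2                  # one Euler step, dt folded into alpha

# Toy usage with placeholder two-state dynamics and a three-neuron critic:
sigma_grad = lambda s: np.array([[2 * s[0], 0.0], [0.0, 2 * s[1]], [s[1], s[0]]])
g = lambda s: np.array([[0.0], [1.0]])
f_cl = lambda s: -s                               # toy stable closed-loop drift
w = improved_critic_update(np.zeros(3), np.array([0.1, -0.2]), e_c=0.05,
                           phi=np.ones(3), sigma_grad=sigma_grad, g=g,
                           R_inv=np.eye(1), f_cl=f_cl)
```

The stabilizing second term is switched on only when the Lyapunov test fails, so the rule reduces to the basic update (40) whenever the closed-loop system already behaves stably.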