In this paper, the decentralized tracking control (DTC) problem is investigated for a class of continuous-time nonlinear systems with external disturbances. First, the DTC problem is resolved by converting it into the optimal tracking controller design for augmented tracking isolated subsystems (ATISs), and a discounted cost function is taken into consideration. Then, in the case of external disturbances, the DTC scheme is effectively constructed by adding an appropriate feedback gain to each ATIS. In addition, utilizing the approximation property of neural networks, the critic network is constructed to solve the Hamilton-Jacobi-Isaacs (HJI) equation, from which the optimal tracking control law and the worst disturbance law are derived. Moreover, the updating rule is improved during the weight learning process, which removes the requirement for an initial admissible control law. Finally, a simulation example involving an interconnected spring-mass-damper system is given to verify the effectiveness of the DTC scheme.

For large-scale nonlinear interconnected systems, which are regarded as nonlinear plants consisting of many interconnected subsystems, decentralized control has become a research hotspot in the last few decades ^{[1–4]}. Compared with centralized control, decentralized control has the advantages of simplifying the controller structure and reducing the computational burden, and each local controller depends only on the information of its local subsystem. Meanwhile, with the development of science and technology, interconnected engineering applications have become increasingly complex, such as robotic systems ^{[5]} and power systems ^{[6, 7]}. In ^{[8–10]}, it was found that the decentralized control of a large-scale system is connected with the optimal control of the isolated subsystems, which means that optimal control methods ^{[11–14]} can be adopted to achieve the design purpose of the decentralized controllers. However, the optimal control of nonlinear systems often requires solving the Hamilton-Jacobi-Bellman (HJB) or Hamilton-Jacobi-Isaacs (HJI) equation, which can be accomplished by using the adaptive dynamic programming (ADP) method ^{[15, 16]}. Besides, in ^{[13, 14]}, the optimal feedback stabilization problem was discussed with a discounted guaranteed cost for nonlinear systems. It follows that the interconnection plays a significant role in designing the controller; hence, control schemes for interconnected systems can be classified into decentralized and distributed ones, and there is a certain distinction between them. For decentralized control, each sub-controller uses only local information, and the interconnections among subsystems are assumed to be weak in nature. Compared with decentralized control, distributed control ^{[17–19]} can be introduced to improve the performance of the subsystems when the interconnections among subsystems become strong.
In ^{[20]}, a distributed optimal observer was devised to estimate the nonlinear leader state for all followers. In ^{[21]}, distributed control was developed by means of online reinforcement learning for interconnected systems with exploration.

It is worth mentioning that the ADP algorithm has been extensively employed to deal with various optimal regulation and tracking problems ^{[22–24]}, with the goal that the actual signal tracks the reference signal in noisy and uncertain environments. ADP-based tracking control designs were further investigated in ^{[25–29]}. Therein, the DTC systems can be transformed into nominal augmented tracking isolated subsystems (ATISs), which are composed of the tracking error and the reference signal.

Game theory is a discipline that studies strategic interactions. It contains cooperative and noncooperative types, that is, zero-sum (ZS) games and non-ZS games. In particular, ZS games have been widely applied in many fields ^{[30–33]}. The objective of a ZS game is to derive the Nash equilibrium of nonlinear systems, which optimizes the cost function. In ^{[31]}, the finite-horizon H-infinity state estimator design was studied for periodic neural networks over multiple fading channels. The noncooperative control problem was formulated as a two-player ZS game in ^{[32]}. Further ZS-game-based designs were investigated in ^{[33–36]}.

As can be seen from the above, few studies combine the DTC problem with the ZS game problem. It is necessary to take a related discounted cost function into account for the DTC system, which transforms the DTC problem into an optimal control problem with disturbances. In practice, the existence of disturbances will have an unpredictable impact on the plant. Hence, it is of vital importance to consider the stability of the DTC system. In the experimental simulation, it is a challenge to achieve effective online weight training, which is implemented under the tracking control law and the disturbance control law. Consequently, in this paper, we put forward a novel ADP-based method to resolve the DTC problem with external disturbances for continuous-time (CT) nonlinear systems. More importantly, for the sake of overcoming the difficulty of selecting initial admissible control policies, an additional term is added during the weight updating process. Remarkably, we introduce a discount factor when maximizing and minimizing the corresponding cost function.

The contributions of this paper are as follows: First, considering the disturbance input in the DTC system, the strategy feasibility and the system stability are established through theoretical proofs. It is worth noting that a discount factor is introduced into the cost function. Moreover, in the process of online weight training, the DTC system can reach a stable state without selecting an initial admissible control law. Additionally, we present the experimental process for the spring-mass-damper system and obtain the desired tracking error curves as well as control strategy curves, which demonstrate that the tracking errors are uniformly ultimately bounded (UUB).

The whole paper is divided into six sections. The first section introduces the relevant background knowledge of the research content. The second section states the basic problems of the two-person ZS game and the DTC strategy. In the third section, we design the decentralized tracking controller by using the optimal control method through solving the HJI equations; meanwhile, the relevant lemma and theorem are given to validate the establishment of the DTC strategy. In the fourth section, the design method based on the adaptive critic is elaborated; most importantly, an improved critic learning rule is implemented via critic networks. In the fifth section, the practicability of this method is validated by an interconnected spring-mass-damper system. Finally, the sixth section presents conclusions and summarizes the overall research content of the paper.

Consider a CT nonlinear interconnected system with disturbances, which is composed of

where

where

In this paper, considering the nonlinear system Equation (1), a reference system is introduced as follows:

where

Noticing

where

We aim to design a pair of decentralized control policies
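Since the displayed equations are not reproduced in this version of the text, the following LaTeX sketch shows a representative formulation commonly used in ADP-based DTC designs; all symbols here ($f_i$, $g_i$, $k_i$, $\bar{Z}_i$, $\zeta_i$) are assumptions and may differ from the paper's Equations (1)-(5).

```latex
% i-th subsystem with control input u_i, disturbance v_i,
% and interconnection term \bar{Z}_i(x):
\dot{x}_i(t) = f_i\big(x_i(t)\big) + g_i\big(x_i(t)\big)u_i(t)
             + k_i\big(x_i(t)\big)v_i(t) + \bar{Z}_i\big(x(t)\big),
% reference system generating the desired trajectory r_i:
\dot{r}_i(t) = \zeta_i\big(r_i(t)\big),
% tracking error and augmented state of the i-th ATIS:
e_i(t) = x_i(t) - r_i(t), \qquad
X_i(t) = \big[\,e_i^{\top}(t),\; r_i^{\top}(t)\,\big]^{\top}.
```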

In this section, the optimal DTC strategy of the ATIS with the disturbance rejection is elaborated. It is addressed by solving the HJI equation with a discounted cost function. Then, we consider the nominal part of the augmented system Equation (5) as

We assume that

where

If the cost function in Equation (11) is continuously differentiable, the nonlinear Lyapunov equation is the infinitesimal form of Equation (11). The Lyapunov equation is as follows:

Define the Hamiltonian of the ith ATIS for the optimization problem as

To acquire the saddle point solution

Then, the optimal cost function

Due to the saddle point solution

Substituting the optimal tracking control strategy Equation (16) into Equation (15), the HJI equation for the
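As a reading aid, the discounted cost and the resulting saddle-point policies discussed above can be sketched as follows; the notation ($Q_i$, $R_i$, $\gamma_i$, $\lambda_i$, $G_i$, $K_i$) is assumed and should be matched against the paper's Equations (11)-(17).

```latex
% discounted cost of the i-th nominal ATIS with discount factor \lambda_i > 0:
J_i\big(X_i(t)\big) = \int_{t}^{\infty} e^{-\lambda_i(\tau - t)}
  \Big( X_i^{\top} Q_i X_i + u_i^{\top} R_i u_i
        - \gamma_i^{2} v_i^{\top} v_i \Big)\,\mathrm{d}\tau,
% setting \partial H_i/\partial u_i = 0 and \partial H_i/\partial v_i = 0 yields
% the optimal tracking control law and the worst disturbance law:
u_i^{*}(X_i) = -\tfrac{1}{2} R_i^{-1} G_i^{\top}(X_i)\,\nabla J_i^{*}(X_i), \qquad
v_i^{*}(X_i) = \tfrac{1}{2\gamma_i^{2}}\, K_i^{\top}(X_i)\,\nabla J_i^{*}(X_i).
```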

In the following, we present the DTC strategy by adding the feedback gain to the interconnected system Equation (5). Herein, the following lemma is given by

can ensure the

Substituting Equations (18) and (19) into Equation (20), we can rewrite it as

Observing Equation (21), we can obtain that

where

Considering Equation (2), the mentioned inequality

Herein, in order to transform Equation (24) to the compact form, we denote

Therefore, we introduce a 2

Next, Equation (24) can be transformed to the following compact form:

According to Equation (29), it can be concluded that when

Obviously, the key point of designing the DTC strategy is to obtain the optimal controller of the ATIS based on Theorem 1. Next, for the sake of obtaining optimal controllers for the

In this section, we show the process of finding the approximate optimal solution by employing the ADP method based on neural networks. The critic networks have the capability of approximating nonlinear mappings, so the approximate cost function can be derived for the DTC system. Hence,

where

Considering Equation (16), the optimal control policy for the

Utilizing Equations (31) and (32), the Hamiltonian associated with the

where

where

Based on Equation (35), we obtain the estimated value of

Considering Equations (34-36), the approximate Hamiltonian is expressed as

Then, we obtain an error function of the Hamiltonian, which is denoted as

where

where
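In standard adaptive-critic designs of this kind, the critic weights are tuned by normalized gradient descent on the squared Hamiltonian residual. A sketch under assumed notation ($\sigma_i$ activation vector, $\hat{W}_i$ weight estimate, $\alpha_i$ learning rate) follows; the exact quantities are those defined in the paper's equations above.

```latex
% critic approximation and Hamiltonian residual:
\hat{J}_i(X_i) = \hat{W}_i^{\top}\sigma_i(X_i), \qquad
e_{ci} = \hat{H}_i\big(X_i,\hat{u}_i,\hat{v}_i,\hat{W}_i\big),
% normalized gradient-descent update minimizing
% E_i = \tfrac{1}{2}\,e_{ci}^{\top}e_{ci}:
\dot{\hat{W}}_i = -\alpha_i\,
  \frac{\phi_i}{\big(1 + \phi_i^{\top}\phi_i\big)^{2}}\, e_{ci},
\qquad \phi_i = \frac{\partial e_{ci}}{\partial \hat{W}_i}.
```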

Usually, in the traditional weight training process, it is often necessary to select an appropriate initial weight vector for effective training. To remove the requirement for an initial admissible control law, an improved critic learning rule is presented in the following.

Herein, an additional Lyapunov function is introduced for the purpose of improving the critic learning mechanism. Then, the following rational assumption is given.

In other words, there exists a positive definite matrix

where

When the condition occurs, that is,

Thus, we describe the improved learning rule as

where

It is found that when the derivative of
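The improved learning rule described above can be sketched in code. This is a minimal illustration only: the function and variable names (`critic_update`, `phi`, `stab_dir`, `alpha_c`, `alpha_s`) are hypothetical, and the precise form of the additional stabilizing term is given by the paper's corresponding equations, not by this sketch.

```python
import numpy as np

# Illustrative sketch of an improved critic weight update. The regressor phi,
# utility U, auxiliary Lyapunov function J_s, and stabilizing direction are
# defined by the paper's equations; the names and form below are assumptions.

def critic_update(W, phi, utility, dJs_dt, stab_dir, alpha_c=0.5, alpha_s=0.1):
    """One step of the critic weight update.

    W        : critic weight vector
    phi      : regressor, i.e., d(sigma(X))/dt along the trajectory
    utility  : instantaneous cost U(X, u, v)
    dJs_dt   : derivative of the auxiliary Lyapunov function J_s along the flow
    stab_dir : direction of the additional stabilizing term
    """
    e = W @ phi + utility                   # approximate Hamiltonian residual
    m = (1.0 + phi @ phi) ** 2              # normalization term
    dW = -alpha_c * phi * e / m             # basic normalized gradient descent
    if dJs_dt >= 0:                         # instability indicator triggers the
        dW = dW + alpha_s * stab_dir        # extra term, removing the need for
                                            # an initial admissible control law
    return W + dW
```

The additional term is active only while the auxiliary Lyapunov derivative is non-negative, so it vanishes once the closed loop behaves stably and the basic gradient rule takes over.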

Control structure of the ATIS. ATIS: augmented tracking isolated subsystem.

In accordance with

In this section, we introduce a common mechanical vibration system, that is, the spring-mass-damper system. The structural diagram of the mechanical system is shown in

Simple diagram of the interconnected spring-mass-damper system.

In addition, let

For the object

and

where

where

and

Based on the online ADP algorithm, two critic networks are constructed as follows:

and

During the online learning process, we take basic learning rates and additional learning rates as

Herein, two probing noises are added within the first 400 steps to maintain the persistence-of-excitation condition of the ATIS. The weight convergence curves are shown in

Weight convergence processes of critic network 1 and critic network 2.
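A common way to realize such a probing signal is a small mix of sinusoids that is switched off after the excitation window. The sketch below is purely illustrative: the paper does not reproduce its exact noise signal here, so the frequencies, amplitude, and step size are assumptions.

```python
import numpy as np

# Illustrative probing noise maintaining persistence of excitation during the
# first steps of online critic training; the sinusoid mix and amplitude are
# assumed, not the paper's exact signal.

def probing_noise(step, active_steps=400, amp=0.5, dt=0.1):
    """Return the exploration noise injected at a given step."""
    if step >= active_steps:
        return 0.0                          # noise removed once weights converge
    t = dt * step
    return amp * (np.sin(t) ** 2 * np.cos(t)
                  + np.sin(2.0 * t) ** 2 * np.cos(0.1 * t))
```

After the window closes, the noise is removed so that the converged weights are no longer perturbed.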

Next, in order to make the system achieve the purpose of the optimal tracking, feedback gains are selected as

Tracking control inputs and disturbance inputs for subsystem 1 and subsystem 2.

Tracking error trajectories for subsystem 1 and subsystem 2.

In this paper, the optimal DTC strategy for CT nonlinear large-scale systems with external disturbances is proposed by employing the ADP algorithm. The approximate optimal control laws of the ATISs can achieve the trajectory tracking goal. Then, the establishment of the DTC strategy is derived by adding the appropriate feedback gain, whose feasibility has been proved via Lyapunov theory. Note that all the above-mentioned results are obtained by considering a discounted cost function. Then, only a series of single critic networks are employed to solve the HJI equations of

Made significant contributions to the conception and experiments: Fan W, Wang D

Made significant contributions to the writing: Fan W, Wang D

Made substantial contributions to the revision and translation: Liu A, Wang D

Not applicable

This work was supported in part by the National Natural Science Foundation of China (No. 62222301; No. 61890930-5 and No. 62021003); in part by the National Key Research and Development Program of China (No. 2021ZD0112302; No. 2021ZD0112301 and No. 2018YFC1900800-5); and in part by the Beijing Natural Science Foundation (No. JQ19013).

All authors declared that there are no conflicts of interest.

Not applicable.

Not applicable.

© The Author(s) 2023.