
Page 45                                                                  Qi et al. Intell Robot 2021;1(1):18-57 | http://dx.doi.org/10.20517/ir.2021.02


               Stackelberg game with multiple participants, including near devices (NDs), far devices (FDs) and relay devices
               (RDs). Taking into account the limited scope of the heterogeneous devices, the authors model this multi-agent
               system as a POMDP. Furthermore, they prove that MA-FRL is strongly convex and smooth, and derive
               its convergence rate in expectation.
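
               For reference, the two properties named above have the following standard definitions. The symbols
               $F$, $\mu$ and $L$ are illustrative assumptions and are not taken from the cited paper. The last line
               shows why the properties matter, using plain deterministic gradient descent with step size $1/L$ as a
               textbook example; it is not the expectation bound derived for MA-FRL.

```latex
% Assumed notation: F is a differentiable objective with minimum value F^*,
% \mu > 0 is the strong-convexity constant, L >= \mu is the smoothness constant.
\begin{align}
  F(y) &\ge F(x) + \nabla F(x)^{\top}(y - x) + \frac{\mu}{2}\,\lVert y - x\rVert^{2}
        && \text{($\mu$-strong convexity)} \\
  F(y) &\le F(x) + \nabla F(x)^{\top}(y - x) + \frac{L}{2}\,\lVert y - x\rVert^{2}
        && \text{($L$-smoothness)} \\
  F(x_{t}) - F^{*} &\le \Bigl(1 - \frac{\mu}{L}\Bigr)^{t}\,\bigl(F(x_{0}) - F^{*}\bigr)
        && \text{(gradient descent, step size $1/L$)}
\end{align}
```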


               Zhang et al. [102] address the challenges of cellular vehicle-to-everything (V2X) communication for
               future vehicular applications. They formulate a joint optimization problem of selecting the transmission
               mode and allocating resources, and propose a decentralized DRL algorithm that maximizes the available
               vehicle-to-infrastructure capacity while meeting the latency and reliability requirements of
               vehicle-to-vehicle (V2V) pairs. Because the local training data at each vehicle is limited, the federated
               learning algorithm runs on a small timescale, whereas the graph theory-based vehicle clustering algorithm
               runs on a large timescale.
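
               The separation between the small and large timescales can be sketched as a simple schedule. This is a
               minimal illustration, not the algorithm of [102]: the function name, the two period parameters and the
               operation labels are all hypothetical, and the clustering and aggregation steps themselves are left out.

```python
def two_timescale_schedule(num_steps, fl_period, cluster_period):
    """Return, for each time step, the list of operations that would run.

    Hypothetical schedule (an assumption, not taken from the cited paper):
    local DRL updates run every step, federated aggregation runs every
    `fl_period` steps (small timescale), and vehicle re-clustering runs
    every `cluster_period` steps (large timescale).
    """
    if fl_period <= 0 or cluster_period <= 0:
        raise ValueError("periods must be positive")
    if cluster_period < fl_period:
        raise ValueError("clustering should run on the larger timescale")

    schedule = []
    for step in range(num_steps):
        ops = []
        # Re-cluster first so that later aggregation uses the fresh clusters.
        if step % cluster_period == 0:
            ops.append("recluster")
        # Every vehicle trains on its own local data at every step.
        ops.append("local_update")
        # Aggregate local models at the end of each small-timescale window.
        if (step + 1) % fl_period == 0:
            ops.append("federated_average")
        schedule.append(ops)
    return schedule
```

               With `fl_period=2` and `cluster_period=4`, aggregation happens twice between consecutive re-clustering
               events, which is the sense in which clustering changes more slowly than the learned models.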


               The development of communication technologies for extreme environments, such as deep underwater
               exploration, is important. The architecture and philosophy of FRL are applied to smart ocean applications
               in the study of Kwon [103]. To deal with the nonstationary environment and unreliable channels of
               underwater wireless networks, the authors propose a multi-agent DRL-based algorithm that can realize FL
               computation with internet-of-underwater-things (IoUT) devices in the ocean environment. The cooperative
               model is trained by MADDPG for cell association and resource allocation problems. In terms of downlink
               throughput, the proposed MADDPG-based algorithm performs 80% and 41% better than the standard
               actor-critic and DDPG algorithms, respectively.


               5.3. FRL for control optimization
               Reinforcement learning-based control schemes are considered one of the most effective ways to learn a
               nonlinear control strategy in complex scenarios, such as robotics. An individual agent’s exploration of
               the environment is limited by its own field of view, and the agent usually needs a great deal of training
               to obtain the optimal strategy. The FRL-based approach has emerged as an appealing way to realize control
               optimization without exposing agent data or compromising privacy.

               Automated control of robots is a typical example of a control optimization problem. Liu et al. [57]
               discuss robot navigation scenarios and focus on how robots can transfer their experience so that they can
               make use of prior knowledge and quickly adapt to changing environments. As a solution, a cooperative
               learning architecture, called LFRL, is proposed for navigation in cloud robotic systems. Under this
               FRL-based architecture, the authors propose a corresponding knowledge fusion algorithm to upgrade the
               shared model deployed on the cloud. The paper also discusses the problems and feasibility of applying
               transfer learning algorithms when the tasks and network structures differ between the shared model and
               the local model.
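
               The idea of upgrading a shared cloud model from several robots' local models can be illustrated with
               plain weighted parameter averaging. This is a minimal sketch and a deliberate stand-in: it is ordinary
               federated averaging, not the knowledge fusion algorithm of LFRL, and the function name, the dictionary
               layout and the weights are hypothetical.

```python
def federated_average(local_models, weights=None):
    """Weighted average of per-robot parameter dictionaries.

    local_models: list of dicts mapping a parameter name to a list of floats
                  (hypothetical layout; all models must share the same keys).
    weights: optional non-negative weights, for example local sample counts.
    """
    if not local_models:
        raise ValueError("need at least one local model")
    if weights is None:
        weights = [1.0] * len(local_models)
    if len(weights) != len(local_models):
        raise ValueError("one weight per local model is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    shared = {}
    for name in local_models[0]:
        length = len(local_models[0][name])
        averaged = [0.0] * length
        for model, weight in zip(local_models, weights):
            values = model[name]
            if len(values) != length:
                raise ValueError(f"shape mismatch for parameter {name!r}")
            # Each robot contributes in proportion to its normalized weight.
            for i, value in enumerate(values):
                averaged[i] += (weight / total) * value
        shared[name] = averaged
    return shared


# Two robots with a single two-element parameter each.
robot_a = {"w": [1.0, 2.0]}
robot_b = {"w": [3.0, 4.0]}
shared_model = federated_average([robot_a, robot_b])
```

               Only parameters leave each robot in this sketch; the raw trajectories used for local training stay on
               the device, which is the property the FRL-based architecture relies on.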

               FRL is combined with autonomous driving of robotic vehicles in the study of Liang et al. [104]. To speed
               up the transfer of training from a simulation environment to a real-world environment, Liang et al. [104]
               present a federated transfer reinforcement learning (FTRL) framework for knowledge extraction, in which
               all the vehicles take their actions using the knowledge learned by others. The framework can potentially
               be used to train more powerful models by pooling the resources of multiple entities without revealing raw
               data in real-life scenarios. To evaluate the feasibility of the proposed framework, the authors perform
               real-life experiments on steering control tasks for collision avoidance with autonomous driving robotic
               cars, and demonstrate that the framework outperforms the non-federated local training process. Note that
               the framework can be considered an extension of HFRL, because the target tasks to be accomplished are
               highly related and all observation data are pre-aligned.


               FRL also appears as an attractive approach for enabling intelligent control of IoT devices without revealing