Page 45 Qi et al. Intell Robot 2021;1(1):18-57 | http://dx.doi.org/10.20517/ir.2021.02

Stackelberg game with multiple participants, including near devices (NDs), far devices (FDs) and relay devices
(RDs). Taking into account the limited scope of the heterogeneous devices, the authors model this multi-agent
system as a POMDP. Furthermore, the authors prove that MA-FRL is strongly convex and smooth, and derive
its convergence speed in expectation.
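
For reference, the two properties named above are usually stated for a differentiable objective function as
follows. The symbols f, \mu and L are generic notation assumed here for illustration. They are not taken from
the cited paper, whose own constants are not preserved in this text.

```latex
% Standard textbook definitions; f, \mu and L are assumed generic symbols.
\begin{align}
  f(y) &\ge f(x) + \nabla f(x)^{\top}(y - x) + \frac{\mu}{2}\,\lVert y - x \rVert^{2}
       && \text{($\mu$-strong convexity, } \mu > 0\text{)} \\
  \lVert \nabla f(x) - \nabla f(y) \rVert &\le L\,\lVert x - y \rVert
       && \text{($L$-smoothness, } L > 0\text{)}
\end{align}
```

Both inequalities are required to hold for all x and y in the domain of f. Together they bound the curvature
of f from below and from above, which is what typically makes an expected convergence rate derivable.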

Zhang et al. [102] address the challenges in cellular vehicle-to-everything (V2X) communication for
future vehicular applications. They formulate a joint optimization problem of selecting the transmission mode
and allocating resources, and propose a decentralized DRL algorithm that maximizes the amount of available
vehicle-to-infrastructure capacity while meeting the latency and reliability requirements of
vehicle-to-vehicle (V2V) pairs. Considering the limited local training data at vehicles, the federated
learning algorithm is run on a small timescale, whereas the graph theory-based vehicle clustering algorithm
is run on a large timescale.
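
The two-timescale schedule described above can be sketched as follows. This is a minimal illustration under
stated assumptions, not the algorithm of [102]: the function names, the one-dimensional positions, the
bucket-based clustering (a stand-in for the graph theory-based clustering) and the plain weight averaging
(a stand-in for the federated learning step) are all hypothetical.

```python
def average_weights(weight_list):
    """Element-wise mean of equally long weight vectors (plain federated averaging)."""
    n = len(weight_list)
    return [sum(column) / n for column in zip(*weight_list)]


def cluster_vehicles(positions, cell_size):
    """Hypothetical stand-in for clustering: bucket vehicles by road segment index."""
    clusters = {}
    for vehicle_id, position in positions.items():
        clusters.setdefault(int(position // cell_size), []).append(vehicle_id)
    return clusters


def run_two_timescale(positions, weights, steps, cluster_period, local_step, cell_size=100.0):
    """Re-cluster every `cluster_period` steps; train locally and aggregate at every step.

    Positions are held fixed in this sketch; a real system would update them over time.
    """
    weights = {v: list(w) for v, w in weights.items()}  # do not mutate the caller's data
    clusters, log = {}, []
    for t in range(steps):
        # Large timescale: regroup the vehicles only occasionally.
        if t % cluster_period == 0:
            clusters = cluster_vehicles(positions, cell_size)
            log.append(("cluster", t))
        # Small timescale: each vehicle updates its model on its own local data ...
        for v in weights:
            weights[v] = local_step(weights[v])
        # ... and the models are then averaged within each cluster.
        for members in clusters.values():
            mean = average_weights([weights[v] for v in members])
            for v in members:
                weights[v] = list(mean)
        log.append(("aggregate", t))
    return weights, clusters, log
```

Here `local_step` is any callable that maps a weight vector to an updated one, for example a single
gradient step on a vehicle's local data. Keeping clustering outside the inner loop reflects the idea that
group membership changes far more slowly than model parameters do.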

The development of communication technologies for extreme environments, such as deep underwater exploration,
is also important. The architecture and philosophy of FRL are applied to smart ocean applications in the
study of Kwon [103]. To deal with the nonstationary environment and unreliable channels of underwater
wireless networks, the authors propose a multi-agent DRL-based algorithm that can realize FL computation with
internet-of-underwater-things (IoUT) devices in the ocean environment. The cooperative model is trained by
MADDPG for cell association and resource allocation problems. In terms of downlink throughput, the proposed
MADDPG-based algorithm is found to perform 80% and 41% better than the standard actor-critic and DDPG
algorithms, respectively.

5.3. FRL for control optimization
Reinforcement learning-based control schemes are considered one of the most effective ways to learn a
nonlinear control strategy in complex scenarios, such as robotics. An individual agent's exploration of the
environment is limited by its own field of vision, and the agent usually needs a great deal of training to
obtain the optimal strategy. The FRL-based approach has emerged as an appealing way to realize control
optimization without exposing agent data or compromising privacy.

Automated control of robots is a typical example of a control optimization problem. Liu et al. [57] discuss
robot navigation scenarios and focus on how robots can transfer their experience so that they can make
use of prior knowledge and quickly adapt to changing environments. As a solution, a cooperative learning
architecture, called LFRL, is proposed for navigation in cloud robotic systems. Under this FRL-based
architecture, the authors propose a corresponding knowledge fusion algorithm to upgrade the shared model
deployed on the cloud. The paper also discusses the problems and feasibility of applying transfer learning
algorithms when the shared model and the local model differ in task or network structure.
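
The cloud-side update cycle described above can be illustrated with a generic sketch. The knowledge fusion
algorithm of [57] is not reproduced here. As a swapped-in stand-in, the code below uses sample-weighted
federated averaging, and the function names, the dict-of-lists model layout and the sample counts are all
hypothetical.

```python
from copy import deepcopy


def fuse_models(local_models, sample_counts):
    """Sample-weighted average of parameter dicts (a generic FedAvg-style step).

    This is NOT the LFRL knowledge fusion algorithm, only an assumed placeholder for it.
    """
    if not local_models or len(local_models) != len(sample_counts):
        raise ValueError("need one sample count per local model")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("sample counts must sum to a positive number")
    fused = {}
    for name in local_models[0]:
        length = len(local_models[0][name])
        fused[name] = [
            sum(model[name][i] * count for model, count in zip(local_models, sample_counts)) / total
            for i in range(length)
        ]
    return fused


def start_from_shared(shared_model):
    """A robot entering a new environment copies the shared model as its starting point."""
    return deepcopy(shared_model)
```

A robot would call `start_from_shared` before local training and later upload its trained parameters, which
the cloud merges with `fuse_models`. Weighting by sample count is one common design choice: it gives robots
with more experience a larger influence on the shared model.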

FRL is combined with autonomous driving of robotic vehicles in the study of Liang et al. [104]. To speed up
the transfer of training from a simulation environment to a real-world environment, Liang et al. [104]
present a federated transfer reinforcement learning (FTRL) framework for knowledge extraction, in which all
the vehicles take corresponding actions with the knowledge learned by others. The framework can potentially
be used to train more powerful tasks by pooling the resources of multiple entities without revealing raw data
in real-life scenarios. To evaluate the feasibility of the proposed framework, the authors perform real-life
experiments on steering control tasks for collision avoidance of autonomous driving robotic cars, and
demonstrate that the framework outperforms the non-federated local training process. Note that the framework
can be considered an extension of HFRL, because the target tasks to be accomplished are highly related and
all observation data are pre-aligned.

FRL also appears as an attractive approach for enabling intelligent control of IoT devices without revealing
