
Page 41                             Qi et al. Intell Robot 2021;1(1):18-57 | http://dx.doi.org/10.20517/ir.2021.02


                  is because VFRL can incorporate into the system model some agents that cannot generate rewards, so as to
                  integrate their partial observations of the environment based on FL while protecting privacy,
                  train a more robust RL agent, and further improve learning efficiency.


               4.4. Other types of FRL
               The above HFRL or VFRL algorithms borrow ideas from FL for federation between RL agents. Meanwhile,
               there are also some existing works on FRL that are less affected by FL. Hence, they do not belong to either
               HFRL or VFRL, but federation between agents is also implemented.

               The study by Hu et al. [86] is a typical example. It proposes a general FRL algorithm based on reward
               shaping, called federated reward shaping (FRS), which uses reward shaping to share federated information
               and thereby improve policy quality and training speed. FRS adopts the server-client architecture. The
               server holds the federated model, while each client completes its own tasks based on its local model.
               This algorithm can be combined with different kinds of RL algorithms. However, because FRS relies on
               reward shaping, it cannot be used when some agents receive no reward, as in VFRL. In addition, FRS
               performs knowledge aggregation by sharing high-level information, such as reward shaping values or
               embeddings, between client and server instead of sharing experience or policies directly. The convergence
               of FRS is also guaranteed since only a minor change is made during the learning process, namely the
               modification of the rewards in the replay buffer.
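
               The idea of federating through shaped rewards can be illustrated with a minimal sketch. It is not the
               algorithm of Hu et al. [86]: the averaging rule on the server, the potential-based shaping form
               r' = r + gamma * Phi(s') - Phi(s), the discount factor, and all function names are assumptions chosen
               for illustration. The sketch only shows the two properties described above: clients exchange
               high-level shaping values rather than experience, and the only local change is to the rewards stored
               in the replay buffer.

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor (assumed value, for illustration only)


def aggregate_potentials(client_potentials):
    """Server side: average the per-state shaping values reported by clients.

    Plain averaging is an assumed aggregation rule, not taken from the source.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for potentials in client_potentials:
        for state, value in potentials.items():
            sums[state] += value
            counts[state] += 1
    return {state: sums[state] / counts[state] for state in sums}


def shape_replay_buffer(buffer, potential, gamma=GAMMA):
    """Client side: rewrite only the rewards of stored transitions.

    Each transition is (state, action, reward, next_state). States without a
    federated shaping value are treated as having potential 0.
    """
    shaped = []
    for state, action, reward, next_state in buffer:
        bonus = gamma * potential.get(next_state, 0.0) - potential.get(state, 0.0)
        shaped.append((state, action, reward + bonus, next_state))
    return shaped


# Two clients report shaping values; no raw experience leaves a client.
federated = aggregate_potentials([{"s0": 0.0, "s1": 1.0}, {"s1": 3.0}])
local_buffer = [("s0", "right", 0.0, "s1")]
shaped_buffer = shape_replay_buffer(local_buffer, federated)
```

               Because states, actions and transitions in the buffer are untouched, the shaped buffer can be fed to
               any off-policy learner unchanged, which is consistent with the remark that FRS can be combined with
               different kinds of RL algorithms.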

               As another example, Anwar et al. [87] achieve federation between agents by means of smoothing average
               weights. This work analyzes multi-task FRL (MT-FedRL) algorithms in the presence of adversaries. Each
               agent interacts with and observes only its own environment, and these environments can be characterized
               by different MDPs. Different from HFRL, the state and action spaces do not need to be the same in these
               environments. The goal of MT-FedRL is to learn a unified policy, which is jointly optimized across all
               of the environments. MT-FedRL adopts policy gradient methods for RL. In other words, the policy is
               parameterized, and the optimal policy is learned by optimizing these policy parameters. The server-client
               architecture is also applied, and all agents share their own information with a centralized server. The
               role of the non-negative smoothing average weights is to achieve a consensus among the agents’ parameters.
               As a result, they help to incorporate the knowledge from other agents during federation.
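
               The consensus role of the smoothing average weights can be sketched as follows. This is a simplified
               illustration, not the update rule of Anwar et al. [87]: the single mixing weight alpha, the uniform
               server average, and the function names are assumptions, and the local policy gradient step between
               rounds is omitted. Each agent's parameter vector is pulled toward the server average with
               non-negative weights that sum to one.

```python
def server_average(params):
    """Server side: element-wise mean of all agents' parameter vectors."""
    n = len(params)
    return [sum(column) / n for column in zip(*params)]


def smooth(local, consensus, alpha):
    """Mix local parameters with the server average.

    alpha and (1 - alpha) are the non-negative smoothing weights (assumed form).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] so both weights are non-negative")
    return [alpha * l + (1.0 - alpha) * c for l, c in zip(local, consensus)]


def federation_round(params, alpha=0.5):
    """One round: agents upload parameters, server averages, agents smooth."""
    consensus = server_average(params)
    return [smooth(p, consensus, alpha) for p in params]


# Two agents with different policy parameters after local training.
agent_params = [[0.0, 4.0], [2.0, 0.0]]
after_one_round = federation_round(agent_params, alpha=0.5)
```

               With alpha = 1 the agents ignore each other, and with alpha = 0 they all jump to the average. For
               intermediate values, the gap between any two agents' parameters shrinks by a factor of alpha in each
               round while the average is preserved, which is the sense in which the weights drive the agents toward
               a consensus in this sketch.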



               5. APPLICATIONS OF FRL
               In this section, we provide an extensive discussion of the application of FRL in a variety of tasks, such as
               edge computing, communications, control optimization, and attack detection. This section is aimed at
               enabling readers to understand the applicable scenarios and research status of FRL.


               5.1. FRL for edge computing
               In recent years, edge equipment, such as BSs and road side units (RSUs), has been equipped with increasingly
               advanced communication, computing and storage capabilities. As a result, edge computing has been proposed
               to delegate more tasks to edge equipment in order to reduce the communication load and the corresponding
               delay. However, privacy protection remains challenging, since data owners may not trust a third-party
               edge server with their private information [4]. FRL offers a potential solution for achieving
               privacy-protected intelligent edge computing, especially in decision-making tasks like caching and
               offloading. Additionally, the multi-layer processing architecture of edge computing is also suitable for
               implementing FRL through the server-client model. Therefore, many researchers have focused on
               applying FRL to edge computing.

               The distributed data of large-scale edge computing architecture makes it possible for FRL to provide distributed
               intelligent solutions to achieve resource optimization at the edge. For mobile edge networks, a potential FRL