
Page 19                              Qi et al. Intell Robot 2021;1(1):18-57  I http://dx.doi.org/10.20517/ir.2021.02


               difficult to resolve, especially when privacy is concerned. Federated learning (FL) has therefore attracted
               increasing interest among ML researchers. Technically, FL is a decentralized collaborative approach that
               allows multiple partners to train on their respective data and build a shared model while maintaining privacy.
               With its innovative learning architecture and concepts, FL provides safer experience-exchange services and
               enhances the capabilities of ML in distributed scenarios.


               In ML, reinforcement learning (RL) is a branch that focuses on how individuals, i.e., agents, interact
               with their environment to maximize the cumulative reward. The process allows agents to improve their
               behavior in a trial-and-error manner: following a policy, they take actions to explore the environment
               and expect to be rewarded. Research on RL has been very active in recent years, and it has shown great
               potential in various applications, including games, robotics, communication, and so on.
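As a concrete illustration of this trial-and-error loop, the following is a minimal sketch of tabular Q-learning on a toy five-state chain environment. The environment, reward, and hyperparameters are our own illustrative assumptions, not taken from any cited work.

```python
import random

random.seed(0)  # for reproducibility of this toy run

# Hypothetical toy environment: a five-state chain; moving right from the
# last state yields reward 1 and ends the episode.
N_STATES, ACTIONS = 5, [0, 1]  # 0 = left, 1 = right

def step(state, action):
    if action == 1 and state == N_STATES - 1:
        return state, 1.0, True            # reached the goal
    nxt = max(0, state - 1) if action == 0 else state + 1
    return nxt, 0.0, False

def greedy(qs):
    best = max(qs)
    return random.choice([a for a in ACTIONS if qs[a] == best])

def q_learning(episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy exploration of the environment
            a = random.choice(ACTIONS) if random.random() < eps else greedy(q[s])
            s2, r, done = step(s, a)
            # TD update: move Q(s, a) toward reward + discounted future value
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

q = q_learning()
# The learned greedy policy should prefer "right" (action 1) in every state.
policy = [q[s].index(max(q[s])) for s in range(N_STATES)]
```

The agent starts with no knowledge, explores via random actions, and gradually shifts toward exploiting the actions that lead to reward, which is exactly the trial-and-error behavior described above.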


               However, many problems remain in the implementation of RL in practical scenarios. For example, when the
               action and state spaces are large, the performance of agents is sensitive to the collected samples, since
               it is nearly impossible to explore the entire sampling space. In addition, many RL algorithms learn slowly
               because of low sample efficiency. Exchanging information between agents can therefore greatly accelerate
               learning. Although distributed RL and parallel RL algorithms [1–3] can address these problems, they usually
               need to collect all the data, parameters, or gradients from each agent in a central server for model training.
               However, some tasks need to prevent agent information leakage and protect agent privacy during the application
               of RL. Agents’ distrust of the central server and the risk of eavesdropping on the transmission of raw data
               have become major bottlenecks for such RL applications. FL not only enables information exchange while
               avoiding privacy disclosure, but also helps adapt agents to their different environments. Another problem
               of RL is how to bridge the simulation-reality gap. Many RL algorithms require pre-training in simulated en-
               vironments as a prerequisite for application deployment, but one problem is that the simulated environments
               cannot accurately reflect the environments of the real world. FL can aggregate information from both
               environments and thus bridge the gap between them. Finally, in some cases, each agent in RL can observe
               only partial features. These features, whether observations or rewards, do not provide sufficient
               information for decision-making. Here, FL makes it possible to integrate this information through
               aggregation.
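The aggregation step referred to above can be sketched as a FedAvg-style weighted average of locally trained parameters: each agent uploads only its parameter vector and sample count, never its raw data. The agent parameters and sample counts below are purely illustrative assumptions.

```python
# Minimal sketch of FedAvg-style aggregation (assumed setup): the server
# returns the sample-weighted average of the agents' parameter vectors.
def federated_average(params, counts):
    """params: list of equal-length parameter vectors, one per agent;
    counts: number of local samples each agent trained on."""
    total = sum(counts)
    dim = len(params[0])
    return [sum(w * p[i] for p, w in zip(params, counts)) / total
            for i in range(dim)]

# Three hypothetical agents with different amounts of local experience.
agents = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
counts = [10, 30, 60]
global_model = federated_average(agents, counts)
# → [4.0, 5.0]  (weighted toward the agents with more samples)
```

Because only model parameters cross the network, the raw trajectories and observations that would reveal an agent's private environment stay local.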

               Thus, the above challenges give rise to the idea of federated reinforcement learning (FRL). As FRL can be
               considered an integration of FL and RL under privacy protection, several elements of RL can be presented in
               FL frameworks to deal with sequential decision-making tasks. For example, the three dimensions of sample,
               feature, and label in FL can be replaced by environment, state, and action, respectively, in FRL. Since FL
               can be divided into several categories according to the distribution characteristics of data, including
               horizontal federated learning (HFL) and vertical federated learning (VFL), we can similarly categorize FRL
               algorithms into horizontal federated reinforcement learning (HFRL) and vertical federated reinforcement
               learning (VFRL).
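In the horizontal case, agents share the same task but act in separate environments, so one communication round interleaves local RL updates with server-side aggregation. The sketch below illustrates this under assumed conditions: the state/action sizes, the sample transitions, and the `local_update`/`hfrl_round` helpers are all hypothetical, not a specific published algorithm.

```python
# Sketch of one HFRL communication round (assumed setup): each agent
# performs local Q-updates on its private transitions, and only the
# resulting Q-tables are uploaded and averaged.
N_STATES, N_ACTIONS = 3, 2

def local_update(q, transitions, alpha=0.5, gamma=0.9):
    """One pass of TD updates over an agent's private transitions."""
    for s, a, r, s2 in transitions:  # never leaves the agent
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
    return q

def hfrl_round(q_tables, all_transitions):
    updated = [local_update(q, t) for q, t in zip(q_tables, all_transitions)]
    # Server step: element-wise average of the agents' Q-tables.
    return [[sum(q[s][a] for q in updated) / len(updated)
             for a in range(N_ACTIONS)] for s in range(N_STATES)]

# Two hypothetical agents, each with a few private transitions
# (state, action, reward, next_state) from its own environment.
q0 = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
q1 = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
global_q = hfrl_round([q0, q1],
                      [[(0, 1, 1.0, 1)],
                       [(0, 1, 0.0, 2), (2, 0, 1.0, 0)]])
```

Repeating such rounds, with the averaged table redistributed to all agents, lets experience gathered in one environment benefit every participant without any raw transitions being shared.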


               Though a few survey papers on FL [4–6] have been published, to the best of our knowledge, there are currently no
               relevant survey papers focused on FRL. Due to the fact that FRL is a relatively new technique, most researchers
               may be unfamiliar with it to some extent. We hope to identify achievements from current studies and serve as
               a stepping stone to further research. In summary, this paper sheds light on the following aspects.

               1. Systematic tutorial on FRL methodology. As a review focusing on FRL, this paper tries to explain the knowl-
                  edge about FRL to researchers systematically and in detail. The definition and categories of FRL are intro-
                  duced first, including the system model, the algorithm process, etc. To clearly explain the frameworks of
                  HFRL and VFRL and the difference between them, two specific cases are introduced, i.e., autonomous
                  driving and smart grid. Moreover, we comprehensively introduce the existing research on FRL’s algorithm