Qi et al. Intell Robot 2021;1(1):18-57 | http://dx.doi.org/10.20517/ir.2021.02
difficult to resolve, especially when privacy is concerned. Federated learning (FL), in these cases, has attracted
increasing interest among ML researchers. Technically, FL is a decentralized collaborative approach that
allows multiple partners to train on their respective data and build a shared model while preserving privacy. With
its innovative learning architecture and concepts, FL provides safer experience-exchange services and enhances
the capabilities of ML in distributed scenarios.
In ML, reinforcement learning (RL) is one of the branches that focuses on how individuals, i.e., agents, interact
with their environment and maximize some portion of the cumulative reward. The process allows agents to
learn to improve their behavior in a trial and error manner. Through a set of policies, they take actions to
explore the environment and expect to be rewarded. RL has attracted intense research interest in recent years
and has shown great potential in various applications, including games, robotics, and communications.
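To make the trial-and-error loop described above concrete, the following is a minimal tabular Q-learning sketch. It is purely illustrative and not drawn from any algorithm surveyed here; the `ChainEnv` toy environment, its 5-state layout, and all hyperparameter values are assumptions chosen for brevity.

```python
import random

class ChainEnv:
    """Toy 5-state chain, for illustration only: the agent starts at
    state 0 and receives reward 1 upon reaching state 4."""
    actions = (1, -1)  # move right / move left

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(max(self.s + a, 0), 4)
        done = (self.s == 4)
        return self.s, (1.0 if done else 0.0), done

def train_q_learning(env, episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning: through trial and error, the agent nudges its
    value estimates toward the observed reward plus the discounted value
    of the best next action."""
    q = {}  # (state, action) -> estimated return
    for _ in range(episodes):
        s, done, steps = env.reset(), False, 0
        while not done and steps < 100:
            # epsilon-greedy policy: explore occasionally, otherwise exploit
            if random.random() < eps:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda x: q.get((s, x), 0.0))
            s2, r, done = env.step(a)
            best_next = max(q.get((s2, x), 0.0) for x in env.actions)
            target = r if done else r + gamma * best_next
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (target - q.get((s, a), 0.0))
            s, steps = s2, steps + 1
    return q
```

After training, the greedy policy recovered from the table moves right toward the rewarding state, which is exactly the "learn to improve behavior by trial and error" process described above.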
However, many problems remain when RL is implemented in practical scenarios. For example, when the
action and state spaces are large, an agent's performance depends heavily on the samples it collects, since
it is nearly impossible to explore the entire sampling space. In addition, many RL algorithms learn slowly
because of low sample efficiency. Information exchange between agents can therefore greatly accelerate
learning. Although distributed RL and parallel RL
algorithms [1–3] can be used to address the above problems, they usually need to collect all the data, parameters,
or gradients from each agent in a central server for model training. However, an important issue
is that some tasks need to prevent agent information leakage and protect agent privacy during the application
of RL. Agents' distrust of the central server and the risk of eavesdropping on raw-data transmission have
become major bottlenecks for such RL applications. FL can not only complete information exchange while
avoiding privacy disclosure, but also adapt various agents to their different environments. Another problem
of RL is how to bridge the simulation-reality gap. Many RL algorithms require pre-training in simulated
environments before deployment, but simulated environments cannot accurately reflect the real world.
FL can aggregate information from both environments and thus bridge the gap between them. Finally, in
some cases, only partial features can be observed by
each agent in RL. These features, whether observations or rewards, do not provide sufficient information
for decision-making. In such cases, FL makes it possible to integrate this information through aggregation.
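The aggregation step that lets agents exchange information without exposing raw data can be sketched in the style of federated averaging (FedAvg): each agent uploads only its locally trained parameter vector, and the server combines them. This is a minimal sketch of the general idea, not any specific FRL algorithm from the literature; the function name and the default equal weighting are illustrative assumptions.

```python
def federated_average(agent_params, weights=None):
    """Server-side aggregation in the FedAvg style: each agent shares only
    its parameter vector (never raw trajectories or observations), and the
    server returns their element-wise weighted mean as the shared model."""
    n = len(agent_params)
    if weights is None:
        weights = [1.0 / n] * n  # equal weighting by default
    dim = len(agent_params[0])
    return [sum(w * p[i] for w, p in zip(weights, agent_params))
            for i in range(dim)]
```

Because only parameters cross the network, the central server never sees an agent's local experience, which is the privacy property motivating FRL.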
Thus, the above challenges give rise to the idea of federated reinforcement learning (FRL). As FRL can be considered
an integration of FL and RL under privacy protection, several elements of RL can be presented in FL
frameworks to deal with sequential decision-making tasks. For example, the three dimensions of sample,
feature, and label in FL can be replaced by environment, state, and action, respectively, in FRL. Since FL can be
divided into several categories according to the distribution characteristics of data, including horizontal fed-
erated learning (HFL) and vertical federated learning (VFL), we can similarly categorize FRL algorithms into
horizontal federated reinforcement learning (HFRL) and vertical federated reinforcement learning (VFRL).
Though a few survey papers on FL [4–6] have been published, to the best of our knowledge, there are currently no
relevant survey papers focused on FRL. Because FRL is a relatively new technique, most researchers
may be unfamiliar with it. We hope to identify achievements from current studies and serve as
a stepping stone to further research. In summary, this paper sheds light on the following aspects.
1. Systematic tutorial on FRL methodology. As a review focusing on FRL, this paper tries to explain the knowl-
edge about FRL to researchers systematically and in detail. The definition and categories of FRL are introduced
first, including the system model, algorithm process, etc. To explain the frameworks of HFRL
and VFRL and the difference between them clearly, two specific cases are introduced, i.e., autonomous
driving and smart grid. Moreover, we comprehensively introduce the existing research on FRL’s algorithm