Figure 8. Comparison of horizontal federated reinforcement learning and vertical federated reinforcement learning.
In order to facilitate understanding and maintain consistency with FL, FRL is divided into two categories depending on environment partition [7], i.e., HFRL and VFRL. Figure 8 gives a comparison between HFRL and VFRL. In HFRL, the environment that each agent interacts with is independent of the others, while the state space and action space of different agents are aligned to solve similar problems. The action of each agent only affects its own environment and results in corresponding rewards. As an agent can hardly explore all states of its environment, multiple agents interacting with their own copies of the environment can accelerate training and improve model performance by sharing experience. Therefore, horizontal agents use a server-client or peer-to-peer model to transmit and exchange the gradients or parameters of their policy models (actors) and/or value function models (critics). In VFRL, multiple agents interact with the same global environment, but each can only observe limited state information within the scope of its view. Agents can perform different actions depending on the observed environment and receive local rewards or even no reward. Depending on the actual scenario, there may be some overlap between the agents' observations. In addition, all agents' actions affect the global environment dynamics and the total reward. As opposed to the horizontal arrangement of independent environments in HFRL, the vertical arrangement of observations in VFRL poses a more complex problem and is less studied in the existing literature.
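To make this distinction concrete, the following minimal Python sketch (not taken from the paper) contrasts how an observation is formed in the two settings: a horizontal agent reads the full state of its own private environment, whereas a vertical agent reads only a slice of one shared global state, and the slices of different agents may overlap. All names, dimensions, and view ranges below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Horizontal FRL: each agent holds its own independent copy of the environment.
local_env_states = {f"agent_{i}": rng.normal(size=4) for i in range(3)}

def hfrl_observation(agent_id):
    # The agent sees the full state of its private environment only.
    return local_env_states[agent_id]

# Vertical FRL: one shared global environment, partial views per agent.
global_state = rng.normal(size=12)
agent_views = {"agent_0": slice(0, 5),    # views may overlap, as noted above
               "agent_1": slice(3, 9),
               "agent_2": slice(8, 12)}

def vfrl_observation(agent_id):
    # Each agent observes only the slice of the global state inside its view.
    return global_state[agent_views[agent_id]]

print(hfrl_observation("agent_0").shape)   # (4,)  full private state
print(vfrl_observation("agent_1").shape)   # (6,)  partial, overlapping view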
4.2. Horizontal federated reinforcement learning
HFRL can be applied in scenarios in which the agents may be distributed geographically but face similar decision-making tasks and have very little interaction with each other in the observed environments. Each participating agent independently executes decision-making actions based on the current state of its environment and obtains positive or negative rewards for evaluation. Since the environment explored by any single agent is limited and each agent is unwilling to share its collected data, multiple agents train the policy and/or value model together to improve model performance and increase learning efficiency. The purpose of HFRL is to alleviate the sample-efficiency problem in RL and to help each agent quickly obtain the optimal policy that maximizes the expected cumulative reward for its specific task, while considering privacy protection.
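As a rough illustration of the server-client exchange described above, the sketch below has each agent perform local updates on a copy of a shared policy while a server averages the resulting parameters, in the spirit of FedAvg; only model parameters, never raw transitions, leave an agent. The environment interaction is replaced by synthetic gradients to keep the sketch self-contained, and every name and constant is an assumption for illustration rather than the algorithm of any specific HFRL scheme.

import numpy as np

rng = np.random.default_rng(1)
NUM_AGENTS, STATE_DIM, ROUNDS, LOCAL_STEPS = 3, 4, 5, 20

def local_train(theta, steps):
    # Stand-in for local RL training (e.g., policy-gradient updates on the
    # agent's own environment); here the linear policy weights are only
    # nudged with synthetic gradients so the example runs on its own.
    for _ in range(steps):
        theta = theta + 0.01 * rng.normal(size=theta.shape)
    return theta

# The server initializes a shared policy model; agents exchange parameters
# with the server but keep their experience data local.
global_theta = np.zeros(STATE_DIM)

for rnd in range(ROUNDS):
    local_thetas = [local_train(global_theta.copy(), LOCAL_STEPS)
                    for _ in range(NUM_AGENTS)]          # local training phase
    global_theta = np.mean(local_thetas, axis=0)         # server-side averaging
    print(f"round {rnd}: ||theta|| = {np.linalg.norm(global_theta):.4f}")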
In the HFRL problem, the environment, state space, and action space can replace the data set, feature space, and label space of basic FL. More formally, we assume that $N$ agents $\{\mathcal{F}_i\}_{i=1}^{N}$ can observe the environments $\{\mathcal{E}_i\}_{i=1}^{N}$ within their fields of vision, and $\mathcal{G}$ denotes the collection of all environments. The environment $\mathcal{E}_i$ where the $i$-th agent is located has a similar model, i.e., state transition probability and reward function, compared to the other environments. Note that the environment $\mathcal{E}_i$ is independent of the other environments, in that the state