Figure 11. Illustration of vertical federated reinforcement learning.
In short, the goal of VFRL is for agents interacting with the same environment to improve the performance of their policies and the effectiveness of learning them by sharing experiences without compromising privacy.
More formally, we denote $\{F_i\}_{i=1}^{N}$ as agents in VFRL, which interact with a global environment $E$. The $i$-th agent $F_i$ is located in the environment $E_i = E$, obtains the local partial observation $O_i$, and can perform the set of actions $A_i$. Different from HFRL, the state/observation and action spaces of two agents $F_i$ and $F_j$ may not be identical, but the aggregation of the state/observation spaces and action spaces of all the agents constitutes the global state and action spaces of the global environment $E$. The conditions for VFRL can be defined as follows, i.e.,

$$O_i \neq O_j, \quad A_i \neq A_j, \quad E_i = E_j = E, \quad \bigcup_{i=1}^{N} O_i = S, \quad \bigcup_{i=1}^{N} A_i = A, \qquad \forall\, i, j \in \{1, 2, \ldots, N\},\ i \neq j,$$

where $S$ and $A$ denote the global state space and action space of all participant agents, respectively. It can be seen that all the observations of the agents together constitute the global state space $S$ of the environment $E$. Besides, the environments $E_i$ and $E_j$ are the same environment $E$. In most cases, there is a great difference between the observations of two agents $F_i$ and $F_j$.
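For concreteness, a minimal instance of these conditions (an illustrative construction assumed here, not an example from the original text) with $N = 2$ agents could be

$$S = \{s^{(1)}, s^{(2)}, s^{(3)}\}, \qquad O_1 = \{s^{(1)}, s^{(2)}\}, \quad O_2 = \{s^{(3)}\}, \qquad A_1 = \{\text{left}, \text{right}\}, \quad A_2 = \emptyset,$$

so that $O_1 \neq O_2$ and $A_1 \neq A_2$, while $O_1 \cup O_2 = S$, $A_1 \cup A_2 = A = A_1$, and both agents act in the same environment $E_1 = E_2 = E$.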
Figure 11 shows the architecture of VFRL. The dataset and feature space in VFL correspond to the environment and state space, respectively. VFL divides the dataset vertically according to the features of samples, while VFRL divides agents based on the state spaces they observe from the global environment. Generally speaking, every agent has its local state, which can be different from that of the other agents, and the aggregation of these local partial states corresponds to the entire environment state [65]. In addition, after interacting with the environment, agents may generate their local actions, which correspond to the labels in VFL.
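As a rough illustration of this vertical partitioning of the state, the following minimal Python sketch gives each agent only its own slice of the same global state; the names (e.g., AGENT_SLICES) and the toy three-feature state are assumptions for illustration, not code from the surveyed work.

```python
import numpy as np

# Toy global state with three features, split vertically between two agents:
# agent A observes features 0-1, agent B observes feature 2. This mirrors how
# VFL partitions a dataset by feature columns.
AGENT_SLICES = {
    "A": slice(0, 2),
    "B": slice(2, 3),
}

def local_observations(global_state: np.ndarray) -> dict:
    """Return each agent's partial observation of the same global state."""
    return {name: global_state[sl] for name, sl in AGENT_SLICES.items()}

global_state = np.array([0.3, -1.2, 0.7])   # full state of environment E
obs = local_observations(global_state)
print(obs["A"])  # [ 0.3 -1.2] -> only agent A's slice
print(obs["B"])  # [0.7]       -> only agent B's slice
# Together the slices cover the whole state, but neither agent sees the
# other's features, matching the VFRL conditions above.
```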
Two types of agents can be defined for VFRL, i.e., decision-oriented agents and support-oriented agents. Decision-oriented agents $\{F_i\}_{i=1}^{K}$ can interact with the environment $E$ based on their local states $\{S_i\}_{i=1}^{K}$ and actions $\{A_i\}_{i=1}^{K}$. Meanwhile, support-oriented agents $\{F_i\}_{i=K+1}^{N}$ take no actions and receive no rewards but only the observations of the environment, i.e., their local states $\{S_i\}_{i=K+1}^{N}$. In general, the following six steps, as shown in Figure 12, are the basic procedure for VFRL, i.e.,