Figure 9. Illustration of horizontal federated reinforcement learning.
transition and reward models of $E_i$ do not depend on the states and actions of the other environments. Each
agent $F_i$ interacts with its own environment $E_i$ to learn an optimal policy. Therefore, the conditions for HFRL
are presented as follows:
$$\mathcal{S}_i = \mathcal{S}_j,\quad \mathcal{A}_i = \mathcal{A}_j,\quad E_i \neq E_j,\quad \forall\, i, j \in \{1, 2, \ldots, N\},\ E_i, E_j \in \mathcal{G},\ i \neq j,$$
where $\mathcal{S}_i$ and $\mathcal{S}_j$ denote the similar state spaces encountered by the $i$-th and $j$-th agents, respectively, and $\mathcal{A}_i$
and $\mathcal{A}_j$ denote their similar action spaces. The observed environments $E_i$ and $E_j$ are two different
environments that are assumed to be independent and, ideally, identically distributed.
Figure 9 illustrates HFRL in graphical form. Each agent is represented by a cuboid. The axes of the cuboid denote
three dimensions of information, i.e., the environment, state space, and action space. We can intuitively see
that all environments are arranged horizontally, and multiple agents have aligned state and action spaces. In
other words, each agent explores its respective environment independently and must learn an optimal policy
for a similar task. In HFRL, agents share their experience by exchanging masked models to enhance
sample efficiency and accelerate the learning process.
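
To make this exchange concrete, the following minimal sketch shows one possible realization of HFRL: several agents run tabular Q-learning in separate chain-world environments that share the same state and action spaces, and a coordinator averages their Q-tables in a FedAvg style. Everything here (the chain world, the tabular learner, and plain averaging in place of masked model exchange) is an illustrative assumption, not the algorithm prescribed by the paper.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2          # aligned S and A across all agents
GAMMA, ALPHA = 0.9, 0.1

def make_env(seed):
    """Each environment is a noisy 1-D chain; the noise realization differs
    per agent, standing in for E_i != E_j."""
    rng = np.random.default_rng(seed)
    def step(state, action):
        move = 1 if action == 1 else -1
        if rng.random() < 0.1:       # environment-specific stochasticity
            move = -move
        next_state = int(np.clip(state + move, 0, N_STATES - 1))
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        return next_state, reward
    return step

def local_training(q, env_step, episodes=200, rng=None):
    """One round of local Q-learning; only the table `q` ever leaves the agent."""
    rng = rng if rng is not None else np.random.default_rng()
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            # epsilon-greedy action selection
            a = int(rng.integers(N_ACTIONS)) if rng.random() < 0.2 else int(q[s].argmax())
            s2, r = env_step(s, a)
            q[s, a] += ALPHA * (r + GAMMA * q[s2].max() - q[s, a])
            s = s2
    return q

def federated_round(q_tables):
    """Coordinator step: average the agents' Q-tables (FedAvg-style)."""
    global_q = np.mean(q_tables, axis=0)
    return [global_q.copy() for _ in q_tables]

# Three agents, three independent environments, five communication rounds.
envs = [make_env(seed) for seed in (0, 1, 2)]
qs = [np.zeros((N_STATES, N_ACTIONS)) for _ in envs]
for _ in range(5):
    qs = [local_training(q, env) for q, env in zip(qs, envs)]
    qs = federated_round(qs)
print("greedy policy per state:", qs[0].argmax(axis=1))
```

Averaging raw Q-tables is the simplest stand-in for "exchanging masked models"; in practice, agents would exchange (possibly encrypted or perturbed) neural-network weights or gradients rather than tables.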
A typical example of HFRL is the autonomous driving system in IoV. As vehicles drive on roads throughout
the city and country, they can collect various environmental information and train the autonomous driving
models locally. Due to driving regulations, weather conditions, driving routes, and other factors, one vehicle
cannot be exposed to every possible situation in the environment. Moreover, all vehicles perform essentially the
same operations, including braking, acceleration, and steering. Therefore, vehicles driving on different roads,
in different cities, or even in different countries can share their learned experience with one another through FRL
without revealing their driving data, in keeping with the premise of privacy protection. In this case, even if a
vehicle has never encountered a particular situation itself, it can still take the best action by using the shared
model. Joint exploration by multiple vehicles also increases the chance of learning rare cases, which improves the
reliability of the model.
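
This rare-case benefit can be illustrated by continuing the sketch above (reusing its hypothetical helpers): if only one agent ever reaches the rewarding state, federated averaging still propagates that agent's experience to the others.

```python
# Continuing the earlier sketch (make_env, local_training, federated_round).
# Only agent 0 trains and therefore "sees" the goal; agents 1 and 2 do not.
qs = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(3)]
qs[0] = local_training(qs[0], envs[0], episodes=500)  # the "rare case" observer
qs = federated_round(qs)                              # experience is shared
# Agents 1 and 2 now prefer moving toward the goal from state N_STATES - 2,
# even though they never received the goal reward themselves.
print(qs[1][N_STATES - 2].argmax())  # expected: 1 (move right)
```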