


Figure 10. An example of horizontal federated reinforcement learning architecture.


For a better understanding of HFRL, Figure 10 shows an example of an HFRL architecture using the server-client model. The coordinator is responsible for establishing encrypted communication with the agents and for aggregating the shared models. The multiple parallel agents may consist of heterogeneous equipment (e.g., IoT devices, smartphones, and computers) and may be geographically distributed. It is worth noting that there is no specific requirement on the number of agents, and agents are free to join or leave. The basic procedure for conducting HFRL can be summarized as follows; a code sketch of the full loop is given after the list.

• Step 1: The initialization/join process covers two cases: either the agent has no model locally, or it already has a local model. In the first case, the agent directly downloads the shared global model from the coordinator. In the second case, the agent needs to confirm the model type and parameters with the central coordinator.
• Step 2: Each agent independently observes the state of its environment and determines its private strategy based on the local model. The selected action is evaluated through the resulting next state and the received reward. All agents train their respective models in state-action-reward-state (SARS) cycles.
• Step 3: Local model parameters are encrypted and transmitted to the coordinator. Agents may submit their local models at any time, as long as the trigger conditions are met.
• Step 4: The coordinator runs the specified aggregation algorithm to evolve the global federated model. There is no need to wait for submissions from all agents; appropriate aggregation conditions can be formulated depending on the available communication resources.
• Step 5: The coordinator sends the aggregated model back to the agents.
• Step 6: The agents improve their respective models by fusing the federated model.
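
To make these steps concrete, the following sketch runs the full HFRL loop with tabular Q-learning agents. It is a minimal illustration under assumed settings rather than the paper's algorithm: the ChainEnv toy environment, the hyperparameters, and the plain elementwise averaging used for aggregation are all hypothetical, and the encryption of Step 3 is omitted.

    import numpy as np

    N_STATES, N_ACTIONS = 8, 2
    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.2

    class ChainEnv:
        """Hypothetical toy environment standing in for one agent's private world."""
        def __init__(self, seed):
            self.rng = np.random.default_rng(seed)
            self.s = 0

        def reset(self):
            self.s = 0
            return self.s

        def step(self, a):
            # Action 1 moves right, action 0 moves left; reward 1 on reaching the end.
            self.s = min(max(self.s + (1 if a == 1 else -1), 0), N_STATES - 1)
            done = self.s == N_STATES - 1
            return self.s, float(done), done

    def local_training(q, env, episodes=5, max_steps=100):
        # Step 2: SARS cycles -- observe state, act, receive reward and next state.
        for _ in range(episodes):
            s = env.reset()
            for _ in range(max_steps):
                # Exploration/exploitation via epsilon-greedy action selection.
                if env.rng.random() < EPSILON:
                    a = int(env.rng.integers(N_ACTIONS))
                else:
                    a = int(np.argmax(q[s]))
                s_next, r, done = env.step(a)
                # Q-learning update from the (s, a, r, s') tuple.
                q[s, a] += ALPHA * (r + GAMMA * np.max(q[s_next]) - q[s, a])
                s = s_next
                if done:
                    break
        return q

    def aggregate(models):
        # Step 4: the coordinator averages whatever models were submitted.
        return np.mean(models, axis=0)

    # Step 1: initialization -- all agents start from a shared global model.
    global_q = np.zeros((N_STATES, N_ACTIONS))
    agents = [ChainEnv(seed) for seed in range(3)]

    for _ in range(20):  # federated rounds
        # Steps 2-3: each agent trains locally and submits (encryption omitted here).
        submitted = [local_training(global_q.copy(), env) for env in agents]
        # Steps 4-6: aggregate, send back, and fuse; here fusion is simply adopting
        # the federated model as the starting point of the next round.
        global_q = aggregate(submitted)

Averaging the Q-tables elementwise mirrors FedAvg-style aggregation; a practical deployment would weight each submission by the amount of local experience and exchange the parameters over the encrypted channel described in Step 3.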

Following the above architecture and process, applications suitable for HFRL should exhibit the following characteristics. First, the agents have similar tasks that require making decisions under dynamic environments. In contrast to the general FL setting, the goal of an HFRL-based application is to find the optimal strategy for maximizing future reward. To accomplish the task requirements, the optimal strategy directs the agents to perform certain actions, such as control, scheduling, navigation, etc. Second, the distributed agents maintain independent observations. Each agent can only observe the environment within its own field of view, which does not ensure that the collected data follow the same distribution. Third, it is important to protect the data that each agent collects