Figure 12. An example of the vertical federated reinforcement learning architecture.
• Step 1: Initialization is performed for all agent models.
• Step 2: Agents obtain states from the environment. For decision-oriented agents, actions are obtained based on the local models, and feedback is obtained through interaction with the environment, i.e., the states of the next time step and the rewards. The data tuple of state-action-reward-state (SARS) is used to train the local models.
• Step 3: All agents calculate the mid-products of the local models and then transmit the encrypted mid-
products to the federated model.
• Step 4: The federated model aggregates the mid-products and is trained based on the aggregated results.
• Step 5: The federated model encrypts model parameters such as weights and gradients and passes them back to the other agents.
• Step 6: All agents update their local models based on the received encrypted parameters.
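To make Steps 1-6 above concrete, the following Python sketch runs one training round in which the local models and the federated value head are plain linear maps. All class names, the toy TD-style update, and the identity `encrypt`/`decrypt` stubs are assumptions made only for illustration; a real deployment would use, e.g., an additively homomorphic or secret-sharing scheme to protect the mid-products.

```python
import numpy as np

# Placeholder "encryption": identity functions standing in for a real scheme
# (e.g., additively homomorphic encryption) -- an assumption for this sketch.
def encrypt(x): return x
def decrypt(x): return x

HIDDEN = 4  # dimension of the mid-product each agent transmits

class VerticalAgent:
    """A participant that observes only its own slice of the global state."""
    def __init__(self, feature_idx, seed):
        self.feature_idx = feature_idx                     # features this agent can see
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=(len(feature_idx), HIDDEN))  # Step 1: init
        self.x = None

    def observe(self, global_state):
        self.x = global_state[self.feature_idx]            # Step 2: partial observation

    def mid_product(self):
        return encrypt(self.x @ self.w)                    # Step 3: encrypted mid-product

    def update(self, enc_grad_hidden, lr=0.05):
        grad_hidden = decrypt(enc_grad_hidden)
        self.w -= lr * np.outer(self.x, grad_hidden)       # Step 6: local model update

class FederatedModel:
    """Aggregates the mid-products and trains a shared value head."""
    def __init__(self, seed=0):
        self.v = np.random.default_rng(seed).normal(scale=0.1, size=HIDDEN)

    def train_round(self, mid_products, reward, lr=0.05):
        h = np.sum([decrypt(m) for m in mid_products], axis=0)  # Step 4: aggregation
        td_error = reward - h @ self.v                           # toy one-step target
        self.v += lr * td_error * h                              # Step 4: federated training
        return encrypt(-td_error * self.v)                       # Step 5: encrypted gradient

# One illustrative round: two agents split a 5-dimensional global state.
agents = [VerticalAgent([0, 1, 2], seed=1), VerticalAgent([3, 4], seed=2)]
fed = FederatedModel()
state = np.random.default_rng(3).uniform(size=5)
for a in agents:
    a.observe(state)
enc_grad = fed.train_round([a.mid_product() for a in agents], reward=1.0)
for a in agents:
    a.update(enc_grad)                                           # Step 6
```

In this sketch the reward is handed directly to the federated model for brevity; per Step 2, it is the decision-oriented agents that actually interact with the environment and collect the SARS tuples.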
As an example of VFRL, consider a microgrid (MG) system in which household users, the power company, and the photovoltaic (PV) management company act as the agents. All the agents observe the same MG environment, but their local state spaces are quite different. The global state of the MG system generally consists of several dimensions/features, e.g., the state-of-charge (SOC) of the batteries, the load consumption of the household users, the power generation from PV, etc. The household agents can obtain the SOC of their own batteries and their own load consumption, the power company knows the load consumption of all the users, and the PV management company knows the power generation of PV. As for actions, the power company needs to make decisions on the power dispatch of the diesel generators (DG), while the household users can decide how to manage their electrical utilities through demand response. Finally, the power company observes rewards such as the cost of DG power generation and the balance between power generation and consumption, while the household users observe rewards such as their electricity bills, which depend on their power consumption. To learn the optimal policies, these agents need to communicate with one another and share their observations. However, the PV management company does not want to expose its data to other companies, and household users also want to keep their consumption data private. VFRL is therefore well suited to this setting: it can improve policy decisions without exposing the underlying data.
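As a compact illustration of this vertical split, the following sketch lists which part of the global MG state, which action, and which reward each type of agent can access. The feature names and the exact partition are assumptions for illustration only; the text above does not prescribe a specific schema.

```python
# Hypothetical vertical partition of the MG example (all names are illustrative).
AGENT_VIEWS = {
    "household_user": {                      # one entry per household in practice
        "state":  ["own_battery_soc", "own_load_consumption"],
        "action": "demand_response_schedule",
        "reward": "electricity_bill",
    },
    "power_company": {
        "state":  ["load_consumption_all_users"],
        "action": "dg_power_dispatch",
        "reward": "dg_cost_and_supply_demand_balance",
    },
    "pv_management_company": {
        "state":  ["pv_power_generation"],
        "action": None,                      # no dispatch decision in this example
        "reward": None,
    },
}
```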
Compared with HFRL, there are currently few works on VFRL. Zhuo et al. [65] present the federated deep reinforcement learning (FedRL) framework. The purpose of this work is to address the challenge in which the feature space of the states is small and the training data are limited. Transfer learning approaches in DRL are also candidate solutions for this case. However, in privacy-aware applications, directly transferring data or models should not be used. Hence, FedRL combines the advantages of FL with RL, which is suitable for cases where agents need to protect their privacy. The FedRL framework assumes that agents cannot share their partial observations of the environment and that some agents are unable to receive rewards. It builds a shared value