Figure 12. An example of vertical federated reinforcement learning architecture.


• Step 1: Initialization is performed for all agent models.
• Step 2: Agents obtain states from the environment. For decision-oriented agents, actions are selected based on the local models, and feedback is obtained through interaction with the environment, i.e., the states of the next time step and the rewards. The state-action-reward-state (SARS) tuples are used to train the local models.
• Step 3: All agents calculate the mid-products of their local models and then transmit the encrypted mid-products to the federated model.
• Step 4: The federated model aggregates the mid-products and is trained based on the aggregation results.
• Step 5: The federated model encrypts model parameters such as weights and gradients and passes them back to the other agents.
• Step 6: All agents update their local models based on the received encrypted parameters (a minimal code sketch of these steps is given after this list).
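The following minimal Python sketch illustrates one iteration of Steps 1-6 under simplifying assumptions. All names (LocalAgent, FederatedModel, encrypt, decrypt, the toy reward) are hypothetical placeholders for illustration, not components of any surveyed framework, and the encrypt/decrypt functions merely stand in for a real scheme such as additively homomorphic encryption.

# Minimal, self-contained sketch of Steps 1-6 (illustrative names only).
import numpy as np

def encrypt(x):   # placeholder: no real cryptography here
    return x

def decrypt(x):
    return x

class LocalAgent:
    def __init__(self, state_dim, n_actions=None):
        # Step 1: initialize the local model over this agent's private features.
        self.W = np.random.randn(state_dim, 4) * 0.01
        self.n_actions = n_actions          # None => not decision-oriented

    def act(self, local_state):
        # Step 2: decision-oriented agents act from their local model.
        return int(np.argmax(local_state @ self.W)) % self.n_actions

    def mid_product(self, local_state):
        # Step 3: compute and encrypt an intermediate result;
        # raw features never leave the agent.
        return encrypt(local_state @ self.W)

    def update(self, local_state, enc_signal):
        # Step 6: update the local model from the returned encrypted signal.
        self.W -= 0.1 * np.outer(local_state, decrypt(enc_signal))

class FederatedModel:
    def train_step(self, enc_mid_products, reward):
        # Steps 4-5: aggregate the mid-products, derive a toy training
        # signal from the reward, and return it encrypted.
        aggregated = sum(decrypt(m) for m in enc_mid_products)
        return encrypt(-reward * aggregated)

# Toy driver: two agents holding disjoint slices of a 5-dimensional state.
rng = np.random.default_rng(0)
agents = [LocalAgent(3, n_actions=2), LocalAgent(2)]   # second agent takes no actions
fed = FederatedModel()
for t in range(5):
    global_state = rng.normal(size=5)
    slices = [global_state[:3], global_state[3:]]       # vertical feature split
    action = agents[0].act(slices[0])                   # Step 2
    reward = 1.0 if action == 0 else -1.0               # toy environment feedback
    mids = [a.mid_product(s) for a, s in zip(agents, slices)]   # Step 3
    enc_signal = fed.train_step(mids, reward)           # Steps 4-5
    for a, s in zip(agents, slices):
        a.update(s, enc_signal)                         # Step 6

In this sketch the aggregation and training signal are deliberately trivial; the point is only the information flow, in which each agent keeps its feature slice local and exchanges encrypted intermediate results with the federated model.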

As an example of VFRL, consider a microgrid (MG) system in which the household users, the power company, and the photovoltaic (PV) management company act as the agents. All the agents observe the same MG environment, but their local state spaces are quite different. The global state of the MG system generally consists of several dimensions/features, e.g., the state-of-charge (SOC) of the batteries, the load consumption of the household users, and the power generation from PV. Each household agent can observe the SOC of its own battery and its own load consumption, the power company knows the load consumption of all users, and the PV management company knows the PV power generation. Regarding actions, the power company makes decisions on the power dispatch of the diesel generators (DG), while the household users manage their electrical appliances through demand response. Finally, the power company observes rewards such as the cost of DG power generation and the balance between power generation and consumption, while the household users observe rewards such as their electricity bills, which depend on their power consumption. To learn the optimal policies, these agents would need to communicate with each other and share their observations. However, the PV management company does not want to expose its data to other companies, and the household users also want to keep their consumption data private. VFRL is well suited to this setting: it can improve policy decisions without exposing any agent's raw data.
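To make the vertical partition concrete, the sketch below (with hypothetical class and field names, not taken from any cited work) shows how the global MG state could be split into the per-agent observation slices described above; in VFRL, only encrypted mid-products derived from these slices are exchanged, never the raw fields.

from dataclasses import dataclass
from typing import List

@dataclass
class HouseholdObservation:          # one instance per household agent
    battery_soc: float               # SOC of this household's own battery
    own_load_kw: float               # this household's own load consumption

@dataclass
class PowerCompanyObservation:
    all_loads_kw: List[float]        # load consumption of every user
    # its action: power dispatch of the diesel generators (DG)

@dataclass
class PVCompanyObservation:
    pv_generation_kw: float          # power generation from the PV plant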


Compared with HFRL, there are currently few works on VFRL. Zhuo et al. [65] present the federated deep reinforcement learning (FedRL) framework. The purpose of that work is to address the challenge in which the feature space of states is small and the training data are limited. Transfer learning approaches in DRL are also solutions for this case; however, in privacy-aware applications, directly transferring data or models is not acceptable. Hence, FedRL combines the advantages of FL with RL, which makes it suitable for cases in which agents must preserve their privacy. The FedRL framework assumes that agents cannot share their partial observations of the environment and that some agents are unable to receive rewards. It builds a shared value