



[Figure 9 schematic: agents A, B, ..., N arranged horizontally, each exchanging states and actions with its own environment A, B, ..., N under the heading "Horizontal Federated Reinforcement Learning".]

                                      Figure 9. Illustration of horizontal federated reinforcement learning.


transition and reward model of $E_i$ do not depend on the states and actions of the other environments. Each agent $F_i$ interacts with its own environment $E_i$ to learn an optimal policy. Therefore, the conditions for HFRL are presented as follows:




$$
\mathcal{S}_i = \mathcal{S}_j,\quad \mathcal{A}_i = \mathcal{A}_j,\quad E_i \neq E_j,\quad \forall\, i, j \in \{1, 2, \ldots, N\},\ E_i, E_j \in G,\ i \neq j,
$$




where $\mathcal{S}_i$ and $\mathcal{S}_j$ denote the similar state spaces encountered by the $i$-th and $j$-th agents, respectively, and $\mathcal{A}_i$ and $\mathcal{A}_j$ denote their similar action spaces. The observed environments $E_i$ and $E_j$ are two different environments that are assumed to be independent and, ideally, identically distributed.
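To make these conditions concrete, the following minimal sketch checks whether a set of agent/environment pairs is eligible for HFRL, i.e., aligned state and action spaces but distinct environment instances. The `Task` container and the `hfrl_applicable` helper are hypothetical names introduced for illustration, not part of the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    state_space: tuple    # e.g., observation dimensions
    action_space: tuple   # e.g., the set of available controls
    env_id: str           # identifier of the independent environment instance

def hfrl_applicable(tasks: list[Task]) -> bool:
    """True when all agents share state/action spaces but face distinct environments."""
    first = tasks[0]
    same_spaces = all(t.state_space == first.state_space and
                      t.action_space == first.action_space for t in tasks)
    distinct_envs = len({t.env_id for t in tasks}) == len(tasks)
    return same_spaces and distinct_envs

# Example: three vehicles with identical controls, each driving in a different city.
tasks = [Task((84, 84), ("brake", "accelerate", "steer"), f"city_{i}") for i in range(3)]
assert hfrl_applicable(tasks)
```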


Figure 9 shows HFRL in graphic form. Each agent is represented by a cuboid. The axes of the cuboid denote three dimensions of information, i.e., the environment, state space, and action space. We can intuitively see that all environments are arranged horizontally, and multiple agents have aligned state and action spaces. In other words, each agent explores independently in its respective environment and needs to obtain an optimal strategy for a similar task. In HFRL, agents share their experiences by exchanging masked models to enhance sample efficiency and accelerate the learning process.
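As an illustration of this model-exchange step, here is a minimal sketch assuming tabular Q-learning for the local updates and FedAvg-style averaging of the agents' Q-tables as the shared "masked model". The toy chain environment, slip probabilities, hyperparameters, and helper names are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2   # aligned state and action spaces across all agents

def step(state, action, slip, rng):
    """Toy chain environment: action 1 moves right, action 0 moves left.
    Each agent's instance differs by its slip probability (independent environments)."""
    if rng.random() < slip:              # environment-specific stochasticity
        action = 1 - action
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def local_q_learning(q, slip, seed, episodes=20, alpha=0.1, gamma=0.9, eps=0.2):
    """One agent's local training on its own environment; only the Q-table leaves the agent."""
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(np.argmax(q[s]))
            s2, r = step(s, a, slip, rng)
            q[s, a] += alpha * (r + gamma * np.max(q[s2]) - q[s, a])
            s = s2
    return q

global_q = np.zeros((N_STATES, N_ACTIONS))
agents = [(0.0, 1), (0.1, 2), (0.2, 3)]                      # (slip, seed) per agent
for _ in range(10):                                          # communication rounds
    local_qs = [local_q_learning(global_q.copy(), slip, seed) for slip, seed in agents]
    global_q = np.mean(local_qs, axis=0)                     # FedAvg over the exchanged models
print(np.argmax(global_q, axis=1))                           # aggregated greedy policy
```

Only the Q-tables are averaged between rounds; no raw transitions leave an agent, which mirrors the privacy premise described above.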


A typical example of HFRL is the autonomous driving system in IoV. As vehicles drive on roads throughout a city or country, they can collect various environmental information and train autonomous driving models locally. Due to driving regulations, weather conditions, driving routes, and other factors, a single vehicle cannot be exposed to every possible situation in the environment. Moreover, the vehicles perform essentially the same operations, including braking, acceleration, and steering. Therefore, vehicles driving on different roads, in different cities, or even in different countries can share their learned experience with each other through FRL without revealing their driving data, in keeping with the premise of privacy protection. In this case, even if other vehicles have never encountered a situation, they can still perform the best action by using the shared model. The joint exploration of multiple vehicles also increases the chance of learning rare cases, which ensures the reliability of the model.