sure of the federated model $M_{FED}$ is denoted as $V_{FED}$, including accuracy, recall, F1-score, etc., which should be a good approximation of the performance of the expected model $M_{SUM}$, i.e., $V_{SUM}$. In order to quantify the difference in performance, let $\delta$ be a non-negative real number; the federated learning model $M_{FED}$ is said to have $\delta$-performance loss if

$$|V_{FED} - V_{SUM}| < \delta.$$
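As a concrete illustration of this criterion, the sketch below checks the $\delta$-performance condition given evaluation results of the two models; the function name and the numeric values are hypothetical and only serve to make the definition executable.

```python
def has_delta_performance_loss(v_fed: float, v_sum: float, delta: float) -> bool:
    """Check the delta-performance criterion |V_FED - V_SUM| < delta.

    v_fed : performance measure of the federated model (e.g., test accuracy)
    v_sum : performance measure of the expected (centrally trained) model
    delta : non-negative tolerance
    """
    assert delta >= 0, "delta must be a non-negative real number"
    return abs(v_fed - v_sum) < delta

# Example: the federated model is within 1 percentage point of the expected model.
print(has_delta_performance_loss(v_fed=0.915, v_sum=0.920, delta=0.01))  # True
```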


Specifically, the FL model held by each party is basically the same as the ML model, and it also includes a set of parameters $w$ which is learned based on the respective training dataset $D_i$ [10]. A training sample $j$ typically contains both the input of the FL model and the expected output. For example, in the case of image recognition, the input is the pixels of the image, and the expected output is the correct label. The learning process is facilitated by defining a loss function on the parameter vector $w$ for every data sample $j$. The loss function represents the error of the model in relation to the training data. For each dataset $D_i$ at party $F_i$, the loss function on the collection of training samples can be defined as follows [11]:



$$F_i(w) = \frac{1}{|D_i|} \sum_{j \in D_i} f_j(w),$$

where $f_j(w)$ denotes the loss function of the sample $j$ with the given model parameter vector $w$ and $|\cdot|$ represents the size of a set. In FL, it is important to define the global loss function, since multiple parties jointly train a global statistical model without sharing their datasets. The common global loss function on all the distributed datasets is given by
                                                              
$$F(w) = \sum_{i=1}^{N} \rho_i F_i(w),$$

where $\rho_i$ indicates the relative impact of each party on the global model, with $\rho_i > 0$ and $\sum_{i=1}^{N} \rho_i = 1$. The term $\rho_i$ can be flexibly defined to improve training efficiency; the natural setting is averaging between parties, i.e., $\rho_i = 1/N$. The goal of the learning problem is to find the optimal parameter vector that minimizes the global loss function $F(w)$, i.e.,

$$w^* = \arg\min_{w} F(w).$$
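To make these definitions concrete, the following sketch evaluates the local losses $F_i(w)$ and the weighted global loss $F(w)$ for a toy least-squares model; the per-sample loss, the synthetic datasets, and the uniform weights $\rho_i = 1/N$ are illustrative assumptions rather than choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_j(w, x, y):
    """Per-sample loss f_j(w): squared error of a linear model (illustrative choice)."""
    return 0.5 * (x @ w - y) ** 2

def local_loss(w, D_i):
    """F_i(w) = (1 / |D_i|) * sum_{j in D_i} f_j(w)."""
    X, y = D_i
    return np.mean([f_j(w, X[j], y[j]) for j in range(len(y))])

def global_loss(w, datasets, rho):
    """F(w) = sum_i rho_i * F_i(w), with rho_i > 0 and sum_i rho_i = 1."""
    return sum(r * local_loss(w, D_i) for r, D_i in zip(rho, datasets))

# Three parties, each holding a small synthetic dataset (X, y).
datasets = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(3)]
rho = [1 / len(datasets)] * len(datasets)   # natural setting: rho_i = 1/N
w = np.zeros(5)
print(global_loss(w, datasets, rho))        # global loss at the initial parameter
```

Minimizing this global loss over $w$, e.g., with the gradient-based procedure described next, yields the optimal parameter $w^*$.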
Considering that FL is designed to adapt to various scenarios, the objective function may differ depending on the application. However, a closed-form solution is almost impossible to find for most FL models due to their inherent complexity. A canonical federated averaging algorithm (FedAvg) based on gradient-descent techniques is presented in the study by McMahan et al. [12], and it is widely used in FL systems. In general, the coordinator holds the initial FL model and is responsible for aggregation. Distributed participants know the optimizer settings and can upload information that does not compromise privacy. The specific architecture of FL will be discussed in the next subsection. Each participant uses its local data to perform one step (or multiple steps) of gradient descent on the current model parameter $\bar{w}(t)$ according to the following formula:

$$\forall i, \quad w_i(t+1) = \bar{w}(t) - \eta \nabla F_i(\bar{w}(t)),$$


where $\eta$ denotes the fixed learning rate of each gradient descent step. After receiving the local parameters from the participants, the central coordinator updates the global model using a weighted average, i.e.,

                                                              
$$\bar{w}(t+1) = \sum_{i=1}^{N} \frac{|D_i|}{|D|}\, w_i(t+1),$$
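Putting the two update rules together, one communication round of this gradient-descent-based scheme can be sketched as follows; the quadratic local objective, its closed-form gradient, and the helper names (`local_step`, `aggregate`) are illustrative assumptions and not the authors' implementation.

```python
import numpy as np

def grad_F_i(w_bar, D_i):
    """Gradient of a local least-squares loss F_i at w_bar (illustrative loss choice)."""
    X, y = D_i
    return X.T @ (X @ w_bar - y) / len(y)

def local_step(w_bar, D_i, eta=0.1):
    """Participant update: w_i(t+1) = w_bar(t) - eta * grad F_i(w_bar(t))."""
    return w_bar - eta * grad_F_i(w_bar, D_i)

def aggregate(local_weights, datasets):
    """Coordinator update: weighted average of local parameters with weights |D_i| / |D|."""
    sizes = np.array([len(y) for _, y in datasets], dtype=float)
    coeffs = sizes / sizes.sum()
    return sum(c * w_i for c, w_i in zip(coeffs, local_weights))

# A few rounds over three parties holding synthetic datasets of different sizes.
rng = np.random.default_rng(1)
datasets = [(rng.normal(size=(n, 5)), rng.normal(size=n)) for n in (20, 50, 30)]
w_bar = np.zeros(5)
for _ in range(10):                      # in practice, repeat until convergence
    local_weights = [local_step(w_bar, D_i) for D_i in datasets]
    w_bar = aggregate(local_weights, datasets)
print(w_bar)                             # global parameter after 10 rounds
```

In a real deployment each `local_step` runs on the participant's own machine, and only the updated parameters, never the raw data in $D_i$, are sent to the coordinator for aggregation.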