sure of the federated model $M_{FED}$ is denoted as $V_{FED}$, including accuracy, recall, F1-score, etc., which should be a good approximation of the performance of the expected model $M_{SUM}$, i.e., $V_{SUM}$. In order to quantify the difference in performance, let $\delta$ be a non-negative real number; the federated learning model $M_{FED}$ is said to have $\delta$-performance loss if

$$|V_{FED} - V_{SUM}| < \delta.$$
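As an illustrative example (with hypothetical numbers), if the model trained on the combined data reaches $V_{SUM} = 0.93$ test accuracy and the federated model reaches $V_{FED} = 0.91$, then $|V_{FED} - V_{SUM}| = 0.02$, so the federated model has $\delta$-performance loss for any $\delta > 0.02$.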
Specifically, the FL model held by each party is basically the same as a conventional ML model: it also consists of a set of parameters $w_i$ that is learned from the respective training dataset $D_i$ [10]. A training sample $j$ typically contains both the input of the FL model and the expected output. For example, in the case of image recognition, the input is the pixels of the image and the expected output is the correct label. The learning process is facilitated by defining a loss function on the parameter vector $w$ for every data sample $j$. The loss function represents the error of the model with respect to the training data. For each dataset $D_i$ at party $F_i$, the loss function on the collection of training samples can be defined as follows [11]:

$$F_i(w) = \frac{1}{|D_i|} \sum_{j \in D_i} f_j(w),$$
where $f_j(w)$ denotes the loss function of sample $j$ with the given model parameter vector $w$, and $|\cdot|$ represents the size of a set. In FL, it is important to define a global loss function, since multiple parties jointly train a global statistical model without sharing their datasets. The common global loss function on all the distributed datasets is given by
$$F(w) = \sum_{i=1}^{N} p_i F_i(w),$$

where $p_i$ indicates the relative impact of each party on the global model, with $p_i > 0$ and $\sum_{i=1}^{N} p_i = 1$.
This term can be flexibly defined to improve training efficiency. The natural setting is averaging between parties, i.e., $p_i = 1/N$. The goal of the learning problem is to find the optimal parameter vector $w^*$ that minimizes the global loss function $F(w)$. In formula form,

$$w^* = \arg\min_{w} F(w).$$
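The following is a minimal sketch of the formulation above, written in plain Python. The function name `sample_loss` and the toy squared-error loss are assumptions for illustration only (the text does not fix a particular per-sample loss); `w`, `datasets`, and `weights` correspond to $w$, $\{D_i\}$, and $\{p_i\}$.

```python
def local_loss(w, dataset, sample_loss):
    """F_i(w): average per-sample loss over one party's local dataset D_i."""
    return sum(sample_loss(w, x, y) for x, y in dataset) / len(dataset)

def global_loss(w, datasets, weights, sample_loss):
    """F(w): weighted sum of the local losses, with p_i > 0 and sum(p_i) = 1."""
    return sum(p * local_loss(w, D, sample_loss) for p, D in zip(weights, datasets))

# Toy usage with a hypothetical squared-error loss on scalar samples (x, y).
def sample_loss(w, x, y):
    return (w * x - y) ** 2

datasets = [[(1.0, 2.0), (2.0, 4.1)], [(3.0, 5.9)]]   # two parties' local data
weights = [1 / len(datasets)] * len(datasets)         # natural setting: p_i = 1/N
print(global_loss(0.5, datasets, weights, sample_loss))
```

Minimizing `global_loss` over `w` would then yield the optimal parameter $w^*$.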
Considering that FL is designed to adapt to various scenarios, the objective function may be defined differently depending on the application. However, a closed-form solution is almost impossible to find for most FL models due to their inherent complexity. A canonical federated averaging algorithm (FedAvg) based on gradient-descent techniques is presented in the study by McMahan et al. [12] and is widely used in FL systems. In general, the coordinator holds the initial FL model and is responsible for aggregation. Distributed participants know the optimizer settings and upload only information that does not compromise privacy. The specific architecture of FL will be discussed in the next subsection. Each participant uses its local data to perform one step (or multiple steps) of gradient descent on the current model parameter $\bar{w}(t)$ according to the following formula:

$$\forall i, \quad w_i(t+1) = \bar{w}(t) - \eta \nabla F_i(\bar{w}(t)),$$
where $\eta$ denotes a fixed learning rate of each gradient descent step. After receiving the local parameters from the participants, the central coordinator updates the global model using a weighted average, i.e.,

$$\bar{w}(t+1) = \sum_{i=1}^{N} p_i \, w_i(t+1),$$
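The sketch below illustrates one such synchronization round under the formulation above; it is not the authors' implementation. The gradient oracle `local_gradient`, the toy least-squares data, and the hyperparameters are assumptions for illustration. Each participant takes one gradient step from the current global parameters, and the coordinator aggregates the results with the weights $p_i$.

```python
import numpy as np

def local_update(w_global, dataset, local_gradient, lr):
    """One participant's step: w_i(t+1) = w_bar(t) - eta * grad F_i(w_bar(t))."""
    return w_global - lr * local_gradient(w_global, dataset)

def fedavg_round(w_global, datasets, weights, local_gradient, lr):
    """One round: local gradient steps at every party, then weighted averaging."""
    local_models = [local_update(w_global, D, local_gradient, lr) for D in datasets]
    return sum(p * w for p, w in zip(weights, local_models))

# Toy usage: hypothetical least-squares gradient on scalar samples (x, y).
def local_gradient(w, dataset):
    return np.mean([2 * (w * x - y) * x for x, y in dataset])

datasets = [[(1.0, 2.0), (2.0, 4.1)], [(3.0, 5.9)]]  # two parties' local data
weights = [0.5, 0.5]                                 # p_i > 0, summing to 1
w = 0.0
for t in range(50):                                  # repeated FedAvg rounds
    w = fedavg_round(w, datasets, weights, local_gradient, lr=0.05)
print(w)                                             # approaches the jointly fitted slope
```

In practice, each party would run several local epochs of stochastic gradient descent before uploading, but the aggregation step remains the weighted average shown above.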