
Qi et al. Intell Robot 2021;1(1):18-57 | http://dx.doi.org/10.20517/ir.2021.02       Page 24


               [Figure 3: diagram of two parties, Dataset A and Dataset B, each with its own labels. Labeled elements: Features, Aligned Features, Overlapping Sample IDs, Training Data, Horizontal Federated Learning.]

               Figure 3. Illustration of horizontal federated learning.


               2.3. Categories of federated learning
               Based on how data is partitioned across the feature and sample spaces, FL may be classified as HFL, VFL, or
               federated transfer learning (FTL) [8]. Figure 3, Figure 4, and Figure 5 illustrate these three federated learning
               categories for a two-party scenario. To define each category more clearly, some notation is formalized.
               We suppose that the $i$-th participant has its own dataset $\mathcal{D}_i$. The dataset comprises three types
               of data, i.e., the feature space $\mathcal{X}_i$, the label space $\mathcal{Y}_i$, and the sample ID space $\mathcal{I}_i$. In particular, the feature space
               $\mathcal{X}_i$ is a high-dimensional abstraction of the variables within each pattern sample. Various features are used
               to characterize the data held by the participant. All categories of association between input and task target are
               collected in the label space $\mathcal{Y}_i$. The sample ID space $\mathcal{I}_i$ is added in consideration of practical application
               requirements: the identification facilitates the discovery of possible connections among different features of the
               same individual.
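
The three components of a participant's dataset can be pictured with a small data structure. The following is only a minimal sketch: the class name `ParticipantDataset`, the helper `shared_ids`, and all feature names, labels, and IDs are hypothetical toy values chosen for illustration, not part of the original formalization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParticipantDataset:
    """Hypothetical container for one participant's dataset D_i."""
    feature_space: frozenset  # X_i: names of the variables describing a sample
    label_space: frozenset    # Y_i: possible task targets
    sample_ids: frozenset     # I_i: identifiers of the sampled individuals


def shared_ids(a: ParticipantDataset, b: ParticipantDataset) -> frozenset:
    """Return the sample IDs held by both participants.

    Matching IDs are what allow features held by different parties
    to be linked to the same individual.
    """
    return a.sample_ids & b.sample_ids


# Toy values, assumed purely for illustration.
party_a = ParticipantDataset(
    feature_space=frozenset({"f1", "f2"}),
    label_space=frozenset({"positive", "negative"}),
    sample_ids=frozenset({"id1", "id2", "id3"}),
)
party_b = ParticipantDataset(
    feature_space=frozenset({"f3", "f4"}),
    label_space=frozenset({"positive", "negative"}),
    sample_ids=frozenset({"id2", "id3", "id4"}),
)

common = sorted(shared_ids(party_a, party_b))
print(common)  # ['id2', 'id3']
```

Here the two parties describe the individuals `id2` and `id3` with different features, and the shared IDs are what make that connection discoverable.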

               HFL refers to the case in which the participants' datasets have only a small overlap in samples, while most of
               the data features are aligned. The word "horizontal" is derived from the term "horizontal partition", by analogy
               with data that is horizontally partitioned in the traditional tabular view of a database.
               As shown in Figure 3, the training data of two participants with aligned features is horizontally partitioned
               for HFL. A cuboid with a red border represents the training data required in learning. Specifically, a row
               corresponds to the complete data features collected from a sampling ID. Columns correspond to different sampling IDs.
               The overlapping part means that more than one participant may sample the same ID. HFL is
               also known as feature-aligned FL, sample-partitioned FL, or example-partitioned FL. Formally, the conditions
               for HFL can be summarized as

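               $$\mathcal{X}_i = \mathcal{X}_j,\quad \mathcal{Y}_i = \mathcal{Y}_j,\quad \mathcal{I}_i \neq \mathcal{I}_j,\quad \forall\, \mathcal{D}_i, \mathcal{D}_j,\ i \neq j,$$
               where $\mathcal{D}_i$ and $\mathcal{D}_j$ denote the datasets of participant $i$ and participant $j$, respectively. In both datasets, the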
               feature space $\mathcal{X}$ and label space $\mathcal{Y}$ are assumed to be the same, but the sampling ID space $\mathcal{I}$ is assumed to
               be different. The objective of HFL is to increase the amount of data with similar features while keeping the
               original data from being transmitted, thereby improving the performance of the trained model. Participants can
               still perform feature extraction and classification if new samples appear. HFL can be applied in various fields
               because it benefits from privacy protection and experience sharing [15]. For example, regional hospitals may
               receive different patients, while the clinical manifestations of patients with the same disease are similar.
               Because patient privacy must be protected, patient data cannot be shared. HFL provides a good way
               to jointly build an ML model for identifying diseases across hospitals.
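
The HFL conditions stated above can be checked mechanically for the hospital example. The sketch below assumes each space is represented as a plain Python set. The function name `is_horizontal` and every feature name, label, and patient ID are hypothetical. It only tests the partitioning condition and does not perform any federated training.

```python
def is_horizontal(features_i, labels_i, ids_i, features_j, labels_j, ids_j):
    """Check the HFL condition for two participants i and j:
    same feature space, same label space, different sample ID space.
    """
    return (
        features_i == features_j
        and labels_i == labels_j
        and ids_i != ids_j
    )


# Two regional hospitals record the same clinical variables and the same
# diagnosis labels, but mostly see different patients (toy values).
features = {"age", "blood_pressure", "glucose"}
labels = {"diseased", "healthy"}
hospital_a_ids = {"p01", "p02", "p03"}
hospital_b_ids = {"p03", "p04", "p05", "p06"}

hfl = is_horizontal(features, labels, hospital_a_ids,
                    features, labels, hospital_b_ids)
print(hfl)  # True

# Distinct patients covered jointly, versus each hospital on its own.
joint_patients = len(hospital_a_ids | hospital_b_ids)
print(len(hospital_a_ids), len(hospital_b_ids), joint_patients)  # 3 4 6
```

The joint count illustrates the stated objective: the hospitals cover more samples with the same features than either holds alone, even though the records themselves stay local.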

               VFL refers to the case where different participants with various targets usually have datasets that have different