Qi et al. Intell Robot 2021;1(1):18-57 I http://dx.doi.org/10.20517/ir.2021.02 Page 24
Figure 3. Illustration of horizontal federated learning.
2.3. Categories of federated learning
Based on the way data is partitioned within the feature and sample space, FL may be classified as HFL, VFL, or
federated transfer learning (FTL) [8]. Figure 3, Figure 4, and Figure 5 illustrate these three federated learning
categories for a two-party scenario. To define each category more clearly, some parameters are formalized.
We suppose that the i-th participant has its own dataset D_i. The dataset includes three types of data, i.e., the
feature space X_i, the label space Y_i, and the sample ID space I_i. In particular, the feature space X_i is a
high-dimensional abstraction of the variables within each pattern sample. Various features are used to
characterize the data held by the participant. All categories of association between input and task target are
collected in the label space Y_i. The sample ID space I_i is included in consideration of practical application
requirements. The identifiers can facilitate the discovery of possible connections among different features of
the same individual.
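The decomposition above can be sketched concretely. In this hypothetical snippet (the names and data are illustrative, not from the paper), each participant's dataset D_i is a record holding its feature matrix X, label vector Y, and sample-ID list I:

```python
from dataclasses import dataclass

@dataclass
class ParticipantDataset:
    """One participant's local dataset D_i = (X, Y, I)."""
    X: list[list[float]]  # feature space: one feature vector per sample
    Y: list[int]          # label space: one task label per sample
    I: list[str]          # sample ID space: identifies each individual

# Two hypothetical participants with the same features but different sample IDs
d_a = ParticipantDataset(X=[[1.0, 2.0], [3.0, 4.0]], Y=[0, 1], I=["u1", "u2"])
d_b = ParticipantDataset(X=[[5.0, 6.0]], Y=[1], I=["u3"])

# Shared sample IDs are what reveal possible connections among different
# features of the same individual across participants
shared_ids = set(d_a.I) & set(d_b.I)
print(shared_ids)  # empty here: no overlapping individuals
```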
HFL refers to the case in which participants' datasets have only a small sample overlap, while most of the data
features are aligned. The word "horizontal" is derived from the term "horizontal partition", by analogy with
data that is horizontally partitioned inside the traditional tabular view of a database. As shown in Figure 3,
the training data of two participants with aligned features is horizontally partitioned for HFL. A cuboid with
a red border represents the training data required in learning. In particular, a row corresponds to the complete
data features collected from one sample ID, and columns correspond to different sample IDs. The overlapping
part indicates that more than one participant may sample the same ID. HFL is also known as feature-aligned
FL, sample-partitioned FL, or example-partitioned FL. Formally, the conditions for HFL can be summarized as
X_i = X_j,  Y_i = Y_j,  I_i ≠ I_j,  ∀ D_i, D_j, i ≠ j,
where D_i and D_j denote the datasets of participant i and participant j, respectively. In both datasets, the
feature spaces X_i and X_j and the label spaces Y_i and Y_j are assumed to be the same, but the sample ID
spaces I_i and I_j are assumed to be different. The objective of HFL is to increase the amount of data with
similar features while keeping the original data from being transmitted, thus improving the performance of
the trained model. Participants can still perform feature extraction and classification when new samples
appear. HFL can be applied in various fields because it benefits from privacy protection and experience
sharing [15]. For example, regional hospitals may receive different patients, yet the clinical manifestations of
patients with the same disease are similar. It is imperative to protect patients' privacy, so data about patients
cannot be shared. HFL provides a good way for hospitals to jointly build an ML model for identifying diseases.
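As a minimal sketch of this idea (not the paper's specific algorithm), two hospitals holding feature-aligned, sample-disjoint data can jointly fit a shared model by exchanging only model parameters, as in simple federated averaging. All function names, the ridge-regression local learner, and the size-weighted averaging scheme here are illustrative assumptions:

```python
import numpy as np

def local_ridge_fit(X, y, lam=1.0):
    """Each participant fits a ridge-regression model on its own data only."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def federated_average(weights, sizes):
    """Server averages local models, weighted by local sample counts."""
    sizes = np.asarray(sizes, dtype=float)
    return sum(w * (n / sizes.sum()) for w, n in zip(weights, sizes))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Two hospitals: same feature space, disjoint patients (horizontal partition)
X_a = rng.normal(size=(50, 2)); y_a = X_a @ true_w + 0.1 * rng.normal(size=50)
X_b = rng.normal(size=(80, 2)); y_b = X_b @ true_w + 0.1 * rng.normal(size=80)

# Raw patient data never leaves a hospital; only model weights are shared
w_global = federated_average(
    [local_ridge_fit(X_a, y_a), local_ridge_fit(X_b, y_b)],
    sizes=[len(y_a), len(y_b)],
)
print(w_global)  # approximately recovers true_w without pooling patient data
```

The size-weighted average mirrors the intuition that HFL pools statistical strength across participants whose samples differ but whose features align.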
VFL refers to the case where different participants with various targets usually have datasets that have different