

Figure 5. Illustration of federated transfer learning.


For example, a bank and an e-commerce company in two different countries want to build a shared ML model for user risk assessment. Because of geographical restrictions, the user groups of these two organizations have limited overlap, and because their businesses differ, only a small number of data features are the same. In this case, it is important to introduce FTL to solve the problems of small unilateral data and scarce sample labels, and thereby improve model performance.
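To make this data partitioning concrete, the following minimal sketch builds two synthetic parties whose user IDs overlap only partially and whose feature columns are entirely different, then transfers knowledge through the small overlapping sample set. The party roles, feature dimensions, the plain least-squares alignment step, and the nearest-centroid scoring are illustrative assumptions only; a practical FTL system would additionally protect the exchanged intermediate results (e.g., with homomorphic encryption), which is omitted here.

    import numpy as np

    rng = np.random.default_rng(0)

    # Party A (e.g., the bank): users 0..999, 12 features, labels available.
    users_a = np.arange(0, 1000)
    x_a = rng.normal(size=(1000, 12))
    y_a = (x_a[:, :3].sum(axis=1) > 0).astype(int)

    # Party B (e.g., the e-commerce company): users 900..1899, 8 different features, no labels.
    users_b = np.arange(900, 1900)
    x_b = rng.normal(size=(1000, 8))

    # Only a small set of users (900..999) is shared, and no feature columns are shared.
    shared = np.intersect1d(users_a, users_b)        # 100 overlapping users
    a_idx = np.searchsorted(users_a, shared)
    b_idx = np.searchsorted(users_b, shared)

    # Transfer step (simplified): learn a linear map from B's feature space to A's
    # feature space using only the overlapping users.
    w, *_ = np.linalg.lstsq(x_b[b_idx], x_a[a_idx], rcond=None)
    x_b_mapped = x_b @ w                             # all of B's users in A's feature space

    # Nearest-centroid scoring as a stand-in for a real risk model trained on A's labels.
    centroid_pos = x_a[y_a == 1].mean(axis=0)
    centroid_neg = x_a[y_a == 0].mean(axis=0)
    scores_b = ((x_b_mapped - centroid_neg) ** 2).sum(1) - ((x_b_mapped - centroid_pos) ** 2).sum(1)
    print(scores_b.shape)                            # (1000,) risk scores for B's users

In the sketch, the map learned on the 100 overlapping users lets party B's users be represented in party A's feature space, after which party A's labels can be reused to score them despite the small unilateral data.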



3. REINFORCEMENT LEARNING
3.1. Reinforcement learning definition and basics
Generally, the field of ML includes supervised learning, unsupervised learning, RL, etc. [17]. While supervised and unsupervised learning attempt to make the agent reproduce a given data set, i.e., learn from pre-provided samples, RL makes the agent gradually stronger through interaction with the environment, i.e., the agent generates its own samples to learn from [18]. RL has been a very active research direction in ML in recent years and has made great progress in many applications, such as IoT [19–22], autonomous driving [23,24], and game design [25]. For example, the AlphaGo program developed by DeepMind is a good illustration of this way of thinking [26]: by playing game after game against different opponents, the agent gradually accumulates judgment about the sub-environment of each move and thus continuously improves its level.
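The interaction just described can be written as a simple loop over discrete time steps in which the agent observes a state, takes an action, and receives a reward from the environment. The toy environment and the random placeholder policy below are illustrative assumptions used only to make the loop concrete; a real RL agent would replace the random choice with a policy improved from the collected samples.

    import random

    class ToyEnvironment:
        """Toy environment whose state is an integer the agent tries to drive to zero."""

        def __init__(self):
            self.state = 5

        def step(self, action):
            # Apply the agent's action (-1 or +1); return the next state and a reward.
            self.state += action
            reward = 1.0 if self.state == 0 else -0.1
            done = self.state == 0
            return self.state, reward, done

    class RandomAgent:
        """Placeholder policy; a learning agent would improve it from experience."""

        def act(self, state):
            return random.choice([-1, 1])

        def observe(self, state, action, reward, next_state):
            pass  # a learning update (e.g., Q-learning) would go here

    env, agent = ToyEnvironment(), RandomAgent()
    state = env.state
    for t in range(100):                                  # discrete time steps
        action = agent.act(state)                         # action imposed on the environment
        next_state, reward, done = env.step(action)       # environment returns state and reward
        agent.observe(state, action, reward, next_state)  # sample generated by the interaction
        state = next_state
        if done:
            break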


The RL problem can be defined as a model of the agent-environment interaction, which is represented in Figure 6. The basic model of RL contains several important concepts, i.e.,

• Environment and agent: The agent is the part of an RL model that acts in an external environment, such as the player in the environment of a chess game. Agents can improve their behavior by interacting with the environment; specifically, they take a series of actions on the environment according to a set of policies and expect to obtain a high payoff or achieve a certain goal.
• Time step: The whole process of RL can be discretized into different time steps. At every time step, the environment and the agent interact accordingly.
• State: The state reflects the agent's observation of the environment. When the agent takes an action, the state changes as well; in other words, the environment moves to the next state.
• Actions: The agent assesses the environment, makes decisions, and finally takes certain actions. These actions are imposed on the environment.
• Reward: After receiving the action of the agent, the environment gives the agent the state of the current time step together with a reward that evaluates the quality of that action.