6. APPLICATIONS OF FAIR MACHINE LEARNING
This section surveys different application domains of machine learning and the work each domain has produced
to combat discrimination in its methods.
6.1. Missing data
One major challenge for fairness-enhancing algorithms is dealing with biases inherent in the dataset that are
caused by missing data. Selection biases arise when the distribution of the collected data does not reflect the
real characteristics of disadvantaged groups. Martínez-Plumed et al. [63] found that selection bias is mainly
caused by individuals in disadvantaged groups being reluctant to disclose information; e.g., people with high
incomes are more willing to share their earnings than people with low incomes, which leads to the biased
inference that training at the institution helps to raise earnings. To address this problem, Bareinboim et al. [64]
and Spirtes et al. [65] studied how to handle missing data and repair datasets containing selection biases through
causal reasoning, in order to improve fairness.
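For illustration, the following toy simulation (our own sketch, not taken from [63-65]; the response model and all numbers are assumptions) shows how income-dependent non-response makes statistics computed on the disclosed records diverge from the true population:

```python
# Toy illustration with assumed numbers: income-dependent non-response makes
# the disclosed records unrepresentative of the full population.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True (unobserved) annual incomes of the whole population, in thousands.
income = rng.lognormal(mean=3.5, sigma=0.6, size=n)

# Assumed response model: higher earners are more willing to disclose earnings.
p_disclose = 1 / (1 + np.exp(-(income - income.mean()) / income.std()))
disclosed = rng.random(n) < p_disclose

print(f"True mean income:          {income.mean():.1f}k")
print(f"Mean of disclosed incomes: {income[disclosed].mean():.1f}k")  # inflated
```

Any inference drawn from the disclosed subset alone (e.g., about the effect of a training program on earnings) inherits this distortion.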
On the other hand, the collected data represent only one side of reality; that is, they contain no information
about the population that was not selected. Biases may therefore arise from the process that decides which data
are or are not included in the dataset. For example, consider a dataset that records the information of
individuals whose loans were approved, together with information about their ability to repay those loans. Even if
an automatic decision system satisfying certain fairness requirements is trained on this dataset
to predict whether applicants repay their loans on time, such a predictor may be discriminatory when it is used
to assess the credit scores of future applicants, since the populations whose loans were not approved are not
sufficiently represented in the training data. Goel et al. [66] used a causal graph-based framework to model the
process by which data may be missing under different settings, i.e., different ways decisions were made in the past,
and proved that some data distributions can be inferred from the incomplete available data based on the causal graph.
Although the practical scenarios they discussed are not exhaustive, their work shows that the causal structure
can be used to determine the recoverability of the quantities of interest in any new scenario.
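As a rough illustration of this sample-selection problem (a hand-made sketch under assumed numbers, not the framework of Goel et al. [66]), the following simulation trains a repayment predictor only on approved applicants, whose past approval also depended on "soft" information that is never recorded as a feature; the resulting model is then badly calibrated for low-credit applicants in the full population:

```python
# Toy sketch under assumed numbers: repayment labels exist only for approved
# applicants, and past approvals also used "soft" information that the model
# never sees, so the fitted model is biased on the full applicant pool.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 100_000

credit = rng.normal(size=n)              # feature available to the model
soft_info = rng.normal(size=n)           # used by past loan officers, unrecorded
repaid = rng.random(n) < 1 / (1 + np.exp(-(credit + soft_info)))
approved = credit + soft_info > 0.5      # historical approval decisions

# Train only on approved applicants -- the only ones with observed labels.
model = LogisticRegression().fit(credit[approved].reshape(-1, 1), repaid[approved])

low_credit = credit < -1.0               # a segment rarely approved in the past
predicted = model.predict_proba(credit[low_credit].reshape(-1, 1))[:, 1].mean()
print(f"Predicted repayment rate, low-credit applicants: {predicted:.2f}")
print(f"Actual repayment rate, low-credit applicants:    {repaid[low_credit].mean():.2f}")
```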
A promising solution for dealing with missing data can thus be found in causality-based methods: causality
provides tools to improve fairness when the dataset suffers from discrimination caused by missing data.
6.2. Fair recommender systems
Recommender systems are recognized as one of the most effective ways to alleviate information overload. Nowadays,
they are widely used in various applications, such as e-commerce platforms, advertisements, news articles, jobs, etc.
They are used not only to analyze user behavior and infer users' preferences so as to provide personalized
recommendations, but also to benefit content providers with greater potential to make profits. Unfortunately,
fairness issues exist in recommender systems [67], which are challenging to handle and may degrade the
effectiveness of the recommendations. The discrimination embedded in recommender systems is mainly caused by the
following aspects. First, users can only act on the items that are exposed to them, so the observational data are
confounded by both the recommender's exposure mechanism and the users' preferences. Another major cause of
discrimination is that disadvantaged items are not representatively reflected in the observational data; that is,
some items are more popular than others and thus receive more user interactions. As a result, recommender systems
tend to expose users to these popular items, which leads to discrimination against unpopular items and prevents
the systems from providing sufficient opportunities for minority items. Finally, one characteristic of recommender
systems is the feedback loop: the items the system exposes determine user behavior, which is then circled back as
training data for the recommender. Such a feedback loop not only creates biases but also intensifies them over
time, resulting in the "rich get richer" Matthew effect.
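The following toy simulation (our own sketch with assumed parameters, not taken from [67]) illustrates this feedback loop: even when all items have the same true click probability, a recommender that repeatedly exposes the most-clicked items concentrates interactions on a few early winners:

```python
# Toy feedback-loop simulation with assumed parameters: items have identical
# true click probabilities, yet exposure driven by past clicks concentrates
# interactions on a few early winners ("rich get richer").
import numpy as np

rng = np.random.default_rng(2)
n_items, slots, rounds = 100, 10, 200

quality = np.full(n_items, 0.2)                    # equal true click probability
clicks = rng.poisson(1.0, n_items).astype(float)   # small random initial history

for _ in range(rounds):
    exposed = np.argsort(-clicks)[:slots]                    # show the most-clicked items
    clicks[exposed] += rng.random(slots) < quality[exposed]  # users click at the true rate

top10_share = np.sort(clicks)[-10:].sum() / clicks.sum()
print(f"Share of all clicks held by the top 10 of {n_items} items: {top10_share:.0%}")
```

Since every item is equally attractive in this sketch, any concentration of clicks is attributable purely to the exposure-feedback loop rather than to genuine differences in item quality.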