
Sathyan et al. Complex Eng Syst 2022;2:18  I http://dx.doi.org/10.20517/ces.2022.41  Page 5 of 12
               Figure 1. Schematic of the DNN used for classification into benign and malignant. The network uses 30 features and has three hidden layers
               (HL).


               3.2. SHAP
SHAP is another methodology for obtaining explanations for individual predictions; it can also provide insights into predictions made across a set of data points. SHAP is based on Shapley values, a concept derived from game theory [16]. This is a game-theoretic approach to explaining predictions made by any machine learning model. Game theory deals with how different players affect the overall outcome of a game. For the explainability of a machine learning model, SHAP treats the outcome of the trained model as the game and the input features used by the model as the players. Shapley values represent the contribution of each player (feature) to the game (prediction).

               Shapley values are based on the concept that each possible combination of features has an effect on the overall
               prediction made by the model. The SHAP process for explaining predictions is as follows [27] :

1. For a set of M features, there are 2^M possible combinations of features. For example, a dataset that consists of three input features (x1, x2, x3) will have the eight possible combinations: (a) no features, (b) x1, (c) x2, (d) x3, (e) (x1, x2), (f) (x2, x3), (g) (x1, x3), (h) (x1, x2, x3).
2. Models are trained for each of the 2^M combinations. Note that the model that uses no features simply outputs the mean of all output values in the training data. This is considered the baseline prediction (ȳ).
3. For the data point whose output needs to be explained, the remaining 2^M − 1 models are evaluated.
4. The marginal contribution of each model is computed. The marginal contribution of model-j is the difference between the prediction made by model-j (ỹ_j) and the baseline prediction:

   MC_j = ỹ_j − ȳ                                                                        (1)
               5. To obtain the overall effect of a feature on the prediction, the weighted mean of the marginal contributions
                  of every model containing that feature is evaluated. This is called the Shapley value of the feature for the
                  particular data point.
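The steps above can be sketched in a few lines of Python. The snippet below computes exact Shapley values by summing weighted marginal contributions over all feature subsets, using a hypothetical additive "game" (the `baseline` and `effects` values are illustrative assumptions, not from the paper); for such a game each feature's Shapley value recovers its effect exactly.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values for a set of features (players).

    value(S) returns the prediction of the model trained on the
    feature subset S; value(frozenset()) is the baseline prediction.
    """
    m = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        # Weighted mean of marginal contributions over every
        # subset that does not contain feature f.
        for k in range(m):
            for subset in combinations(others, k):
                s = frozenset(subset)
                weight = factorial(k) * factorial(m - k - 1) / factorial(m)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

# Hypothetical toy game: prediction = baseline + sum of included
# feature effects (values chosen only for illustration).
baseline = 10.0
effects = {"x1": 2.0, "x2": -1.0, "x3": 0.5}

def value(subset):
    return baseline + sum(effects[f] for f in subset)

phi = shapley_values(list(effects), value)
```

Note that this exhaustive approach trains and evaluates 2^M models; the SHAP library uses approximations to make the computation tractable for large M.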


               3.3. Deep neural network
We use a deep neural network (DNN) to classify each case into one of two classes: benign or malignant. The architecture of the DNN is shown in Figure 1. It uses the 30 features mentioned before to make predictions. The DNN was developed and trained in PyTorch [28]. Rectified linear units (ReLU) are used as the activation functions in the hidden layers, and softmax activation is used at the output layer to produce the probabilities of the two output classes.
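A minimal PyTorch sketch of such a classifier is shown below: 30 input features, three ReLU hidden layers, and a softmax over the two classes. The hidden-layer widths here are assumptions for illustration; the paper's actual layer sizes are given in Figure 1.

```python
import torch
import torch.nn as nn

class DiagnosisDNN(nn.Module):
    """30-feature classifier with three ReLU hidden layers and a
    softmax output over two classes (hidden widths are assumed)."""

    def __init__(self, n_features=30, hidden=(64, 32, 16), n_classes=2):
        super().__init__()
        layers = []
        prev = n_features
        for h in hidden:
            layers += [nn.Linear(prev, h), nn.ReLU()]
            prev = h
        layers.append(nn.Linear(prev, n_classes))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # Softmax converts logits to benign/malignant probabilities.
        return torch.softmax(self.net(x), dim=-1)

model = DiagnosisDNN()
probs = model(torch.randn(4, 30))  # batch of 4 cases
```

In practice one would train on raw logits with `nn.CrossEntropyLoss` (which applies log-softmax internally) and apply the softmax only at inference time.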