Shapey et al. Art Int Surg 2023;3:1-13 https://dx.doi.org/10.20517/ais.2022.31 Page 5
have occurred[31]. It is reported that, in due course, machine learning analysis could be incorporated in
real time.
MACHINE LEARNING, METHODOLOGY AND DATA
The frequentist approach to statistical analysis has been the most commonly used approach to
understanding and interpreting data in surgical care. Its broad philosophy is to consider, within the context
of narrow rules and tight assumptions, the likelihood of achieving the same result if a test were to be
repeated a given number of times. Other approaches to the analysis of data have recently gained favour;
for example, the Bayesian approach, which applies pre-existing data to estimate the
a priori (by theoretical deduction) conditional probability of a future event occurring.
The Bayesian approach represents a far more logical and intuitive approach to statistical analysis that is
highly relevant to the understanding of postoperative complications, but is currently under-utilised. In
contrast to the classical approach to statistical analysis, ML takes the relative certainty of known variables
and outcomes and applies algorithms to better appreciate the relationship between them. All algorithms,
regardless of their classification as frequentist statistical or ML methodology, have rules and prerequisites
that need to be followed. Consequently, the scientific basis for utilising a certain ML methodology ought to
be outlined on each occasion, lest the validity of the work performed should be challenged.
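To make the Bayesian idea concrete, the following minimal Python sketch updates a prior belief about a complication rate with newly observed cases. It assumes a Beta prior combined with binomial data, which is only one of many possible Bayesian models. The prior counts and the observed numbers are invented for illustration and do not come from any study.

```python
def update_beta(prior_a, prior_b, events, non_events):
    """Return the posterior Beta parameters after observing binomial data."""
    return prior_a + events, prior_b + non_events


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior: roughly 10 complications in 100 earlier operations.
prior_a, prior_b = 10, 90

# Hypothetical new data: 6 complications in 20 operations.
post_a, post_b = update_beta(prior_a, prior_b, events=6, non_events=14)

prior_rate = beta_mean(prior_a, prior_b)    # belief before the new data
posterior_rate = beta_mean(post_a, post_b)  # belief after the new data
observed_rate = 6 / 20                      # rate in the new data alone
```

In this sketch the posterior estimate falls between the prior rate and the newly observed rate, which is the sense in which pre-existing data inform the estimate.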
It is helpful to distinguish between supervised algorithms, where clearly labelled or defined
data are selected for the model, and unsupervised algorithms, where the algorithm labels the data and seeks to
determine the relationships between them. Reinforcement learning describes a situation where the machine
(i.e. a computer or robot) automatically processes the data for the first time and adapts its algorithms
according to the feedback it receives. Table 1 provides an overview of the potential application of the various ML methods, their
strengths and limitations, to improve our understanding and prediction of postoperative complications.
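The distinction between supervised and unsupervised algorithms can be sketched in a few lines of Python. The example below is an illustration only: the marker values and labels are hypothetical, and the two methods shown (a nearest-centroid classifier and a two-cluster k-means) were chosen for brevity and are not taken from Table 1.

```python
from statistics import mean

# Hypothetical postoperative marker values (arbitrary units).
values = [12, 15, 18, 20, 95, 110, 120, 130]
# Hypothetical outcome labels, 1 = complication. Used only when supervised.
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_supervised(xs, ys):
    """Supervised: learn one centroid per labelled class."""
    return {c: mean([x for x, y in zip(xs, ys) if y == c]) for c in set(ys)}


def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def cluster_unsupervised(xs, iterations=10):
    """Unsupervised: two-cluster k-means on unlabelled values.

    Assumes xs contains at least two distinct values.
    """
    lo, hi = min(xs), max(xs)
    for _ in range(iterations):
        near_lo = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        near_hi = [x for x in xs if abs(x - lo) > abs(x - hi)]
        lo, hi = mean(near_lo), mean(near_hi)
    return lo, hi


centroids = fit_supervised(values, labels)  # uses the labels
prediction = predict(centroids, 100)        # class for a new patient
clusters = cluster_unsupervised(values)     # never sees the labels
```

The supervised function needs the outcome labels to learn anything, whereas the clustering function recovers two groups from the values alone and leaves their clinical meaning to be interpreted afterwards.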
While it can be challenging to appreciate the mathematical equations that relate to the various ML
algorithms, many of them are named according to everyday aspects of life that illustrate their methodology.
For example, decision trees start with a trunk (i.e. the problem, or presenting state) and culminate in a series
of branches that represent the various options and their associated probability of the outcome in question
(e.g. survival). Random forests, therefore, represent the amalgamation of multiple trees in a given scenario.
Neural networks are described in a manner that represents the neurons and synapses (i.e. nodes) in the
human nervous system with the overall aim of replicating the higher functions of a human brain, albeit at a
digital level.
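The tree and forest analogy can be illustrated with a minimal Python sketch. The three trees below are written by hand, and their variables, thresholds and survival probabilities are entirely hypothetical. The sketch shows only the amalgamation step and omits the random resampling of patients and variables that is used to grow the trees of a real random forest.

```python
def tree_age_first(patient):
    """Hypothetical tree: splits on age, then on emergency status."""
    if patient["age"] >= 75:
        return 0.70 if patient["emergency"] else 0.85
    return 0.90 if patient["emergency"] else 0.98


def tree_albumin_first(patient):
    """Hypothetical tree: splits on serum albumin, then on age."""
    if patient["albumin"] < 30:
        return 0.65 if patient["age"] >= 75 else 0.80
    return 0.95


def tree_emergency_first(patient):
    """Hypothetical tree: splits on emergency status, then on albumin."""
    if patient["emergency"]:
        return 0.75 if patient["albumin"] < 30 else 0.88
    return 0.97


def forest_predict(trees, patient):
    """Average the survival probabilities returned by every tree."""
    return sum(tree(patient) for tree in trees) / len(trees)


# Hypothetical patient: the trunk is the presenting state,
# and each branch taken leads to a probability of survival.
patient = {"age": 80, "emergency": True, "albumin": 28}
trees = [tree_age_first, tree_albumin_first, tree_emergency_first]
survival = forest_predict(trees, patient)
```

Each tree follows its own sequence of branches for the same patient, and the forest combines their answers into a single estimate.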
The accuracy of ML rests on the reliability of the data entered, which comes in many forms and can be
handled in many ways. In the “real world”, missing data is a major problem and is most
commonly addressed by imputation, where a value is assigned to the missing data according to the distribution of
existing data. There are various methods for imputing data: modal, using the modal (most frequent) data
point; multiple, creating multiple versions of the same dataset, attributing different values from
within the given distribution to the missing data and calculating the mean value from the multiple datasets;
iterative, where multiple variables are taken into consideration together in order to provide an imputed
value; and arbitrary, which provides a random value from within a pre-defined range. There is also the
option of removing subjects with missing data from a dataset, but this is rarely advised
in large and complex datasets with significant amounts of missing data. The handling of data is of critical
importance because some ML algorithms cannot be legitimately performed if there is considerable missing
data or if it has been addressed in a certain way. Likewise, if the outcomes have not been labelled according
to clear definitions, then the validity of the results could be questioned.
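Several of the approaches to missing data described above can be sketched in Python using only the standard library. The variable, its values and the function names are hypothetical. Iterative imputation is omitted because it requires a model of several variables together, and the multiple imputation shown here is a deliberately simplified version that samples from the observed values.

```python
import random
from statistics import mean, mode

# Hypothetical ASA grades for eight patients; None marks a missing value.
asa = [2, 3, None, 2, 1, None, 2, 3]


def impute_modal(data):
    """Replace missing values with the most frequent observed value."""
    fill = mode([v for v in data if v is not None])
    return [fill if v is None else v for v in data]


def impute_arbitrary(data, low, high, rng):
    """Replace missing values with a random integer in [low, high]."""
    return [rng.randint(low, high) if v is None else v for v in data]


def impute_multiple_mean(data, n_datasets, rng):
    """Create several completed datasets by sampling observed values,
    then pool the mean of each dataset into a single estimate."""
    seen = [v for v in data if v is not None]
    means = []
    for _ in range(n_datasets):
        completed = [rng.choice(seen) if v is None else v for v in data]
        means.append(mean(completed))
    return mean(means)


def drop_missing(data):
    """Remove entries with missing values (complete-case analysis)."""
    return [v for v in data if v is not None]


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
modal = impute_modal(asa)
arbitrary = impute_arbitrary(asa, 1, 5, rng)
pooled_mean = impute_multiple_mean(asa, 5, rng)
complete_cases = drop_missing(asa)
```

Note that removing the two incomplete records discards a quarter of this small dataset, which illustrates why deletion is rarely advised when missing data are extensive.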