
Shapey et al. Art Int Surg 2023;3:1-13  https://dx.doi.org/10.20517/ais.2022.31       Page 5

               have occurred[31]. It is reported that, in due course, machine learning analysis could be incorporated in real
               time.

               MACHINE LEARNING, METHODOLOGY AND DATA
               The frequentist approach to statistical analysis has been the most commonly used approach to
               understanding and interpreting data in surgical care. Its broad philosophy is to consider, within the context
               of narrow rules and tight assumptions, the likelihood of achieving the same result if a test were to be
               repeated a given number of times. Different approaches to the analysis of data have recently gained favour;
               for example, the Bayesian approach, which applies pre-existing data to estimate the a priori (theoretically
               deduced) conditional probability of a future event occurring.
               The Bayesian approach represents a far more logical and intuitive approach to statistical analysis that is
               highly relevant to the understanding of postoperative complications, but is currently under-utilised. In
               contrast to the classical approach to statistical analysis, ML takes the relative certainty of known variables
               and outcomes and applies algorithms to better appreciate the relationship between them. All algorithms,
               regardless of their classification as frequentist statistical or ML methodology, have rules and prerequisites
               that need to be followed. Consequently, the scientific basis for utilising a certain ML methodology ought to
               be outlined on each occasion, lest the validity of the work performed should be challenged.
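The Bayesian reasoning described above can be sketched with Bayes' theorem. The sketch below uses entirely hypothetical numbers (baseline complication rate, test sensitivity and false-positive rate, all invented for illustration) to show how a prior probability of a postoperative complication is updated into a posterior once new evidence, such as a positive screening test, arrives.

```python
# Minimal sketch of Bayesian updating for a postoperative complication.
# All numerical values are hypothetical and chosen for illustration only.

def posterior_probability(prior, sensitivity, false_positive_rate):
    """P(complication | positive test) by Bayes' theorem."""
    # Total probability of a positive test, with and without a complication.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

prior = 0.10                # assumed baseline complication rate
sensitivity = 0.80          # assumed P(positive test | complication)
false_positive_rate = 0.15  # assumed P(positive test | no complication)

# A positive test raises the estimated risk from 10% to roughly 37%.
print(round(posterior_probability(prior, sensitivity, false_positive_rate), 3))
```

The point of the sketch is that the prior is revised, not replaced: the posterior depends jointly on the pre-existing rate and on how discriminating the new evidence is.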


               It is helpful to distinguish between supervised algorithms, where clearly labelled or defined
               data are selected for the model, and unsupervised algorithms, where the algorithm itself labels the data and
               seeks to determine the relationships within them. Reinforcement learning describes a situation where the
               machine (i.e. a computer or robot) processes the data automatically and adapts its algorithms according to
               the outcomes it observes. Table 1 provides an overview of the potential application of the various ML methods, their
               strengths and limitations, to improve our understanding and prediction of postoperative complications.
               While it can be challenging to appreciate the mathematical equations that relate to the various ML
               algorithms, many of them are named according to everyday aspects of life that illustrate their methodology.
               For example, decision trees start with a trunk (i.e. the problem, or presenting state) and culminate in a series
               of branches that represent the various options and their associated probability of the outcome in question
               (e.g. survival). Random forests, therefore, represent the amalgamation of multiple trees in a given scenario.
               Neural networks are described in a manner that represents the neurons and synapses (i.e. nodes) in the
               human nervous system with the overall aim of replicating the higher functions of a human brain, albeit at a
               digital level.
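The tree and forest metaphors above can be made concrete in a few lines. The sketch below encodes each "tree" as a set of nested branching rules on patient features, and a "random forest" as a majority vote across several trees. The features (age, albumin, ASA grade) and thresholds are hypothetical and illustrative only, not derived from any clinical dataset or from the methods in Table 1.

```python
# Toy illustration of the tree metaphor: each tree branches from a trunk
# (the presenting state) to an outcome; a forest amalgamates several trees.
# All features and thresholds are hypothetical, for illustration only.
from collections import Counter

def tree_a(patient):
    if patient["age"] > 70:
        return "high risk" if patient["albumin"] < 30 else "low risk"
    return "low risk"

def tree_b(patient):
    if patient["asa_grade"] >= 3:
        return "high risk"
    return "high risk" if patient["albumin"] < 25 else "low risk"

def tree_c(patient):
    return "high risk" if patient["age"] > 80 else "low risk"

def random_forest(patient, trees=(tree_a, tree_b, tree_c)):
    """Each tree votes on the outcome; the majority vote wins."""
    votes = Counter(t(patient) for t in trees)
    return votes.most_common(1)[0][0]

patient = {"age": 75, "albumin": 28, "asa_grade": 3}
print(random_forest(patient))  # prints "high risk" (two of three trees agree)
```

In practice, a random forest also trains each tree on a random subset of the data and features, which is what makes the ensemble more robust than any single tree; the voting step shown here is the part the metaphor describes.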

               The accuracy of ML rests on the reliability of the data entered, which comes in many forms and can be
               handled in many ways. In the "real world", missing data are a major problem and are most commonly
               addressed by imputation, whereby a value is inferred for a missing data point according to the distribution
               of the existing data. There are various methods for imputing data: modal, which uses the modal (most
               frequent) data point; multiple, which creates multiple versions of the same dataset, attributes different
               values from within the given distribution to the missing data, and calculates the mean value across the
               multiple datasets; iterative, where multiple variables are considered together in order to provide an
               imputed value; and arbitrary, which provides a random value from within a pre-defined range. There is also
               the option of removing subjects with missing data from the dataset, but this is infrequently advised
               in large and complex datasets with significant amounts of missing data. The handling of data is of critical
               importance because some ML algorithms cannot be legitimately performed if there is considerable missing
               data or if it has been addressed in a certain way. Likewise, if the outcomes have not been labelled according
               to clear definitions, then the validity of the results could be questioned.
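Two of the imputation strategies described above can be sketched on a toy list in which `None` marks a missing value: modal imputation fills each gap with the most frequent observed value, and multiple imputation is illustrated (in highly simplified form) by drawing several candidate values from the observed distribution and averaging them. A real analysis would use dedicated tooling such as scikit-learn's imputers; this sketch only shows the ideas.

```python
# Sketch of modal and (simplified) multiple imputation on toy data.
# None marks a missing value; the data are invented for illustration.
import random
from statistics import mean, mode

values = [4, 7, 7, None, 9, 7, None, 5]
observed = [v for v in values if v is not None]

# Modal imputation: replace every missing point with the most frequent value.
modal_filled = [v if v is not None else mode(observed) for v in values]
print(modal_filled)  # prints [4, 7, 7, 7, 9, 7, 7, 5]

# Multiple imputation, simplified: draw m candidate values from the observed
# distribution for one gap, then take their mean as the imputed value.
def multiple_impute(observed, m=5, seed=0):
    rng = random.Random(seed)
    return mean(rng.choice(observed) for _ in range(m))

print(round(multiple_impute(observed), 2))
```

Note that full multiple imputation analyses each completed dataset separately and pools the results, rather than averaging values up front; the averaging step here is a deliberate simplification of that idea.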