
Boyd et al. Art Int Surg 2024;4:316-23  https://dx.doi.org/10.20517/ais.2024.53

               have demonstrated a lack of sufficient understanding and knowledge of plastic surgery within the broader
               healthcare workforce [19,20]. A plastic surgery-specific AI tool or an NLP with additional plastic surgery
               training represents an opportunity to improve the knowledge and applicability of LLM integration within
               plastic surgery. Analyses of generic AI chatbots on plastic surgery in-service training examinations, for
               example, have demonstrated a wide range of accuracy, scoring at levels comparable to a first-year plastic
               surgery trainee [21,22]. The creation of a specialty-specific LLM has been previously explored, particularly in
               the field of otolaryngology, where an ENT-specific LLM called ChatENT was found to outperform existing
               LLMs and exhibited promise in medical and patient education [23]. An opportunity exists to develop a plastic
               surgery-focused LLM to deliver the most accurate and accessible information to patients and plastic
               surgeons alike. Such an LLM should also be customizable, so that individual surgeon preferences
               regarding perioperative instructions can be programmed. To ensure safety in the clinical
               application of these tools, appropriate escalation of patient inquiries for scenarios that merit urgent or
               emergent medical attention must be incorporated into the AI tool. Patients will inevitably utilize AI
               platforms to seek medical counsel independent of physician supervision. Patients have long used the
               Internet for self-diagnosis, self-referral, and research of their conditions [24,25]. Thus, studies of this nature are
               critical to ensure the reliability and accuracy of AI-generated health information to protect patients from
               misinformation [26].
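The escalation requirement described above can be illustrated with a minimal sketch of a triage filter that routes a patient message to a clinician before any AI-generated reply is sent. All keywords, function names, and routing strings here are hypothetical and for illustration only; a deployed system would need clinically validated triage criteria, not a simple keyword match.

```python
# Hypothetical sketch of an escalation filter for patient inquiries.
# Keywords and routing labels are illustrative, not a clinical standard.

URGENT_KEYWORDS = {
    "chest pain", "shortness of breath", "uncontrolled bleeding",
    "high fever", "pus", "wound opening", "severe pain",
}


def needs_escalation(message: str) -> bool:
    """Return True if the message describes a scenario that may merit
    urgent or emergent medical attention and should bypass the chatbot."""
    text = message.lower()
    return any(keyword in text for keyword in URGENT_KEYWORDS)


def route(message: str) -> str:
    """Route a patient inquiry: escalate to a clinician or allow an
    AI-assisted response (still subject to later physician review)."""
    if needs_escalation(message):
        return "ESCALATE: forward to on-call clinician"
    return "OK: AI-assisted response permitted"
```

In practice, a keyword list is only a lower bound on safety; the point of the sketch is that the escalation check runs before, and independently of, the language model's own output.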

               Since Doximity GPT and ChatGPT are backed by the same NLP program, they are subject to similar
               training data biases. While the data were largely deemed clinically reasonable by the study team, previous
               studies have identified inaccuracies and inadequacies when utilizing ChatGPT to answer common
               postoperative questions [27,28]. Additionally, the ever-evolving nature of clinical dogmas and accepted
               practices may not always align with the knowledge cut-off dates of these LLMs. Doximity GPT’s knowledge
               of clinical data extends only until September 2021, so novel medical or surgical information will not be
               included in any outputs. This highlights the importance of clinicians prioritizing clinical judgment and
               thoroughly reviewing any AI-generated output prior to distribution to patients.


               The ethical implications of incorporating AI tools into plastic surgery practice also warrant further
               discussion. Previous studies have highlighted the importance of informed consent, privacy protection, bias
               reduction, and regulation for these technologies [29,30]. Kenig et al. described the need for a partnership
               between physicians and lawmakers when creating guidelines and regulations for the use of AI in clinical
               practice, to ensure that the highest standards of quality and transparency are upheld [29]. They also suggest
               the creation of an independent body to aid in the testing and validation of healthcare-specific AI models.
               Further, these tools must be trained with diverse training data, as bias from training datasets may affect the
               accuracy of AI-generated responses for patients of diverse backgrounds. Periodic review and validation of
               AI models used in healthcare can aid in fostering fairness, equity, and higher quality of patient-facing data.


               Limitations of this study include the comparison between Doximity GPT and only one other NLP. While
               ChatGPT has been demonstrated previously to have the highest working knowledge in plastic surgery, this
               may have changed or evolved since that time [21]. Furthermore, Doximity GPT is powered by the updated
               ChatGPT 4.0. We elected to use ChatGPT 3.5 for comparison in this study, given it is freely accessible.
               Differences assessed by this study may be attributable to the subtle nuances between the two versions of the
               LLM. Assessing the accuracy of LLM outputs is a time-consuming process, as each output must be reviewed
               individually, and it remains difficult to determine accuracy objectively without relying on the clinical judgment of the
               study team. Future studies should seek to develop methodologies or tools that can more objectively
               determine medical accuracy on a broader scale. This difficulty further contributed to limiting the scope of
               the study, as the study team prioritized critical evaluation of each individual output rather than reviewing