Boyd et al. Art Int Surg 2024;4:316-23
DOI: 10.20517/ais.2024.53
Artificial Intelligence Surgery
Original Article Open Access
Analyzing the precision and readability of a
healthcare focused artificial intelligence platform on
common questions regarding breast augmentation
Carter J. Boyd, Lucas R. Perez Rivera, Kshipra Hemal, Thomas J. Sorenson, Chris Amro, Mihye Choi,
Nolan S. Karp
Hansjörg Wyss Department of Plastic Surgery, NYU Langone Health, New York, NY 10017, USA.
Correspondence to: Prof. Nolan S. Karp, Hansjörg Wyss Department of Plastic Surgery, NYU Langone Health, 305 East 47th,
suite 1A, New York, NY 10017, USA. E-mail: Nolan.karp@nyulangone.org
How to cite this article: Boyd CJ, Perez Rivera LR, Hemal K, Sorenson TJ, Amro C, Choi M, Karp NS. Analyzing the precision and
readability of a healthcare focused artificial intelligence platform on common questions regarding breast augmentation. Art Int
Surg 2024;4:316-23. https://dx.doi.org/10.20517/ais.2024.53
Received: 24 Jul 2024 First Decision: 19 Sep 2024 Revised: 25 Sep 2024 Accepted: 14 Oct 2024 Published: 19 Oct 2024
Academic Editor: Andrew Gumbs Copy Editor: Pei-Yun Wang Production Editor: Pei-Yun Wang
Abstract
Aim: The purpose of this study was to determine the quality and accessibility of the outputs of a healthcare-
specific artificial intelligence (AI) platform in response to common questions asked during the perioperative
period of a common plastic surgery procedure.
Methods: Doximity GPT (Doximity, San Francisco, CA) and ChatGPT 3.5 (OpenAI, San Francisco, CA) were
each queried with 20 common perioperative patient inquiries regarding breast augmentation. The structure,
content, and readability of the responses were compared using t-tests and chi-square tests, with P < 0.05
considered statistically significant.
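As background, the sketch below outlines one possible way to run this type of comparison. It assumes the 20 responses per platform have already been collected as plain-text strings; the libraries (textstat, scipy) and all function and variable names are illustrative choices, not the tools actually reported by the authors.

```python
# Illustrative sketch only: readability scoring and statistical comparison
# of two sets of AI-generated responses, assuming they are Python strings.
import textstat
from scipy import stats

READABILITY_SCALES = {
    "Flesch-Kincaid Reading Ease": textstat.flesch_reading_ease,
    "Flesch-Kincaid Grade Level": textstat.flesch_kincaid_grade,
    "Coleman-Liau Index": textstat.coleman_liau_index,
    "Automated Readability Index": textstat.automated_readability_index,
}

def compare_platforms(doximity_responses, chatgpt_responses):
    """Compare response length and readability between two sets of AI outputs."""
    # Length: independent-samples t-test on word counts
    dox_words = [len(r.split()) for r in doximity_responses]
    gpt_words = [len(r.split()) for r in chatgpt_responses]
    t, p = stats.ttest_ind(dox_words, gpt_words)
    print(f"Word count: t = {t:.2f}, P = {p:.4f}")

    # Readability: one t-test per validated scale
    for name, scale in READABILITY_SCALES.items():
        dox = [scale(r) for r in doximity_responses]
        gpt = [scale(r) for r in chatgpt_responses]
        t, p = stats.ttest_ind(dox, gpt)
        print(f"{name}: {sum(dox)/len(dox):.1f} vs. {sum(gpt)/len(gpt):.1f}, P = {p:.4f}")

def compare_categorical(table):
    """Chi-square test for categorical ratings (e.g., topic appropriateness).
    `table` is a 2x2 list: rows = platform, columns = appropriate / not."""
    chi2, p, _, _ = stats.chi2_contingency(table)
    return chi2, p
```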
Results: Out of 80 total AI-generated outputs, ChatGPT responses were significantly longer (331 vs. 218 words, P <
0.001). Doximity GPT outputs were structured as a letter from a medical provider to the patient, whereas ChatGPT
outputs were formatted as a bulleted list. Doximity GPT outputs were significantly more readable on all four validated
scales: Flesch-Kincaid Reading Ease (42.6 vs. 29.9, P < 0.001), Flesch-Kincaid Grade Level (grade 11.4 vs. 14.1, P < 0.001),
Coleman-Liau Index (grade 14.9 vs. 17, P < 0.001), and Automated Readability Index (grade 11.3 vs. 14.8, P < 0.001).
Regarding content, there was no difference between the two platforms in the appropriateness of the topic addressed
(99% appropriate overall), and the medical advice in all outputs was deemed reasonable.
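For context, the four readability scales reported above depend only on sentence, word, syllable, and character counts; the standard published forms of these formulas (given here as background, not reproduced from this article) are:

\begin{aligned}
\text{Flesch Reading Ease} &= 206.835 - 1.015\,\frac{\text{words}}{\text{sentences}} - 84.6\,\frac{\text{syllables}}{\text{words}}\\
\text{Flesch-Kincaid Grade Level} &= 0.39\,\frac{\text{words}}{\text{sentences}} + 11.8\,\frac{\text{syllables}}{\text{words}} - 15.59\\
\text{Coleman-Liau Index} &= 0.0588\,L - 0.296\,S - 15.8\\
\text{Automated Readability Index} &= 4.71\,\frac{\text{characters}}{\text{words}} + 0.5\,\frac{\text{words}}{\text{sentences}} - 21.43
\end{aligned}

where L is the average number of letters per 100 words and S is the average number of sentences per 100 words. A higher Reading Ease score indicates easier text, while the other three scales approximate a U.S. school grade level, so the lower Doximity GPT values correspond to greater readability.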
© The Author(s) 2024. Open Access This article is licensed under a Creative Commons Attribution 4.0
International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing,
adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as
long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and
indicate if changes were made.