Koss et al. Art Int Surg. 2025;5:116-25. https://dx.doi.org/10.20517/ais.2024.91
explain why it is not yet seen as a primary resource for making such important medical decisions.
The underutilization of ChatGPT for GAS information could also be attributed to broader concerns about
the accuracy and trustworthiness of LLMs in healthcare settings. Several studies have raised alarms about
the potential for LLMs to disseminate misinformation or oversimplify complex medical concepts, which can
lead to patient confusion or misinformed decisions[4,19]. Furthermore, the ability of LLMs to account for
patient-specific factors, such as individual medical histories or co-occurring health conditions, remains
limited. For transgender patients, whose healthcare needs are often highly specialized and require tailored
interventions, this lack of personalization could further diminish trust in ChatGPT as a reliable resource.
Interestingly, despite the limited use of ChatGPT for information on GAS, some participants did report that
it positively influenced their decision making. This indicates that while ChatGPT may not currently serve as
a primary resource, it may have greater utility as a supplementary tool, especially as LLMs evolve to
integrate more specialized medical data and provide real-time, accurate patient feedback. Furthermore,
research in fields such as ophthalmology and urology has shown that, while LLMs can provide reasonably
accurate and comprehensive information, they often fall short of addressing the full range of patient
needs[20,21], underscoring ChatGPT’s potential to complement, rather than replace, traditional sources of
medical information for GAS[22].
This study additionally identified several areas where ChatGPT’s content could be improved to better meet
the needs of individuals seeking information on GAS. Participants specifically noted that ChatGPT
provided insufficient detail on financial considerations, surgical techniques, and recovery processes, all key
elements required in the decision-making process for GAS. This mirrors findings from other research where
LLMs have been criticized for their inability to provide comprehensive, context-specific medical
information, particularly in areas that require detailed, patient-centered guidance[18]. These limitations are
particularly concerning in fields like transgender healthcare, where access to accurate, personalized, and
affirming medical information is often limited[23-25].
From a broader perspective, the findings of this study emphasize the ongoing need to guide patients toward
trusted, reputable sources of medical information, especially for GAS, where misinformation can have
serious and life-long consequences. Healthcare providers should direct patients to high-quality
resources, including peer-reviewed medical websites, GAS websites produced by academic
institutions (such as our institution’s transgender care website, https://genderaffirmingsurgicalcare.ucsf.edu/),
and consultations with trained professionals. It is equally important to approach the integration of
LLMs into healthcare with caution, emphasizing that these technologies should complement, rather than
replace, human expertise and empathy.
Our study is not without limitations. The small number of participants who used ChatGPT specifically for
information on GAS limits the generalizability of our results. Additionally, the reliance on self-reported data
introduces potential biases, such as over-reporting or under-reporting the use of ChatGPT or other
information sources. Self-reported demographics such as gender identity and sexual orientation may also
introduce bias into the results given differences in participant understanding of specific terms, which could
differ from widely accepted definitions. The use of Prolific carries additional limitations, including the
potential for participant bias due to financial incentives and a participant pool that may not fully
represent the general population. However, prior research has shown that data quality from Prolific is
comparable to or better than that of other commonly used platforms such as MTurk. Efforts were also made to reduce the risk of