
Abstract

<jats:p>Direct patient access to radiology reports is becoming standard clinical practice; however, the medical terminology used in these reports impedes patient comprehension. Large language models (LLMs) have been proposed as a tool for automatically generating reports adapted for patient understanding (patient-friendly radiology reports), yet their capabilities and limitations have not been systematically assessed. Purpose. To map the literature on the use of large language models and conventional approaches for generating patient-friendly radiology reports. Materials and methods. A systematic search was performed across five databases (PubMed, Google Scholar, Semantic Scholar, arXiv, and medRxiv) covering the period from January 2020 to December 2025. Publications addressing patient-friendly radiology reports for computed tomography, magnetic resonance imaging, radiography, mammography, fluorography, and dual-energy X-ray absorptiometry were eligible for inclusion. The primary search was conducted in English; Russian-language sources identified outside the main search strategy were also included. Data were extracted using a standardized form covering eight research questions. Screening and data extraction were performed by a single reviewer; two co-authors verified the extracted data against the original sources. Results. Of 1,615 records identified, 60 publications were included. More than half of the studies (n = 35; 58.3%) were published in 2024–2025; geographically, the United States predominated (n = 35; 58.3%). LLM-based approaches were employed in 33 studies (55.0%), predominantly using GPT-family models; conventional natural language processing (NLP) methods and manual simplification were used in 19 studies (31.7%), with the remaining publications comprising hybrid approaches and reviews. The most common study design was technical validation (n = 27; 45.0%).
LLMs improved report readability: the Flesch–Kincaid Grade Level decreased from baseline values of 10th–13th grade to 5th–12th grade. All comparative studies involving patients demonstrated a statistically significant improvement in comprehension with patient-friendly reports. The authors of the included studies systematically analyzed errors in LLM-generated texts: the rate of errors and hallucinations ranged from 0% to 50% depending on the model and prompt; no systematic error analysis was reported for conventional NLP approaches in the included studies. An empirically supported trade-off between readability and accuracy was identified: accuracy declined significantly when the target reading level was below the 11th grade. Only 3 of the 60 studies were randomized controlled trials. Conclusions. LLMs can improve readability and patient comprehension of radiology reports while maintaining clinical accuracy in most cases. According to some studies, the optimal target reading level is the 8th–11th grade; this recommendation requires confirmation in future research. A key prerequisite for clinical implementation is specialist verification of simplified reports, given the current maturity of quality control systems for generated texts. Priority directions include large-scale randomized controlled trials evaluating clinical outcomes and validation of approaches in non-English-language settings.</jats:p>
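
The readability figures above are reported as Flesch–Kincaid Grade Level (FKGL), which follows the standard formula 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The following is a minimal sketch of that calculation, not the method used by any of the included studies. The function names `count_syllables` and `fk_grade_level` are hypothetical, and the syllable counter is a rough vowel-group heuristic (an assumption), so its scores can differ from those of dictionary-based tools.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (heuristic, an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent unless the word ends in "le".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade_level(text: str) -> float:
    """Flesch-Kincaid Grade Level using the standard published coefficients."""
    # Sentences are split on terminal punctuation; words are alphabetic tokens.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    # Hypothetical report sentences, written only for illustration.
    original = "No evidence of acute cardiopulmonary abnormality."
    simplified = "Your heart and lungs look normal."
    print(f"original:   {fk_grade_level(original):.1f}")
    print(f"simplified: {fk_grade_level(simplified):.1f}")
```

Because FKGL depends only on sentence length and syllable counts, a lower score indicates shorter sentences and words; it says nothing about whether the simplified text is clinically accurate, which is why the included studies assessed errors separately.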


Keywords

reports, studies, radiology, approaches, grade
