News | Radiology Imaging | September 25, 2024

While OpenAI’s GPT-4 with vision demonstrated a level of competency in answering text-based radiology in-training examination questions, the model showed deficits in analyzing key radiologic images.


Sept. 3, 2024 — Researchers evaluating the performance of ChatGPT-4 Vision found that the model performed well on text-based radiology exam questions but struggled to answer image-related questions accurately. The study’s results were published in Radiology, a journal of the Radiological Society of North America (RSNA).

ChatGPT-4 Vision is the first version of the large language model that can interpret both text and images.

“ChatGPT-4 has shown promise for assisting radiologists in tasks such as simplifying patient-facing radiology reports and identifying the appropriate protocol for imaging exams,” said Chad Klochko, M.D., musculoskeletal radiologist and artificial intelligence (AI) researcher at Henry Ford Health in Detroit, Michigan. “With image processing capabilities, GPT-4 Vision allows for new potential applications in radiology.”

For the study, Dr. Klochko’s research team used retired questions from the American College of Radiology’s Diagnostic Radiology In-Training Examinations, a series of tests used to benchmark the progress of radiology residents. After excluding duplicates, the researchers used 377 questions across 13 domains, including 195 questions that were text-only and 182 that contained an image.

GPT-4 Vision answered 246 of the 377 questions correctly, achieving an overall score of 65.3%. The model correctly answered 81.5% (159) of the 195 text-only queries and 47.8% (87) of the 182 questions with images.
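The reported percentages follow directly from the counts above. As a minimal sketch (the variable and function names are illustrative and do not come from the study), the arithmetic can be checked like this:

```python
# Counts reported in the article: (correct, total) per question type.
reported = {
    "text-only": (159, 195),
    "image": (87, 182),
}


def accuracy_pct(correct: int, total: int) -> float:
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


# Accuracy for each question type.
per_type = {name: accuracy_pct(c, t) for name, (c, t) in reported.items()}

# The overall score pools both question types.
total_correct = sum(c for c, _ in reported.values())
total_questions = sum(t for _, t in reported.values())
overall = accuracy_pct(total_correct, total_questions)

print(per_type)
print(f"{total_correct} of {total_questions} correct: {overall}%")
```

The pooled figure is a question-weighted average of the two per-type accuracies, which is why it falls between them.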

“The 81.5% accuracy for text-only questions mirrors the performance of the model’s predecessor,” Dr. Klochko said. “This consistency on text-based questions may suggest that the model has a degree of textual understanding in radiology.”

Genitourinary radiology was the only subspecialty for which GPT-4 Vision performed better on questions with images (67%, or 10 of 15) than text-only questions (57%, or 4 of 7). The model performed better on text-only questions in all other subspecialties.

The model performed best on image-based questions in the chest and genitourinary subspecialties, correctly answering 69% and 67% of the image-containing questions, respectively. It performed worst on image-containing questions in the nuclear medicine domain, correctly answering only 2 of 10 questions.

The study also evaluated the impact of various prompts on the performance of GPT-4 Vision.

  • Original: You are taking a radiology board exam. Images of the questions will be uploaded. Choose the correct answer for each question. 
  • Basic: Choose the single best answer in the following retired radiology board exam question. 
  • Short instruction: This is a retired radiology board exam question to gauge your medical knowledge. Choose the single best answer letter and do not provide any reasoning for your answer. 
  • Long instruction: You are a board-certified diagnostic radiologist taking an examination. Evaluate each question carefully and if the question additionally contains an image, please evaluate the image carefully in order to answer the question. Your response must include a single best answer choice. Failure to provide an answer choice will count as incorrect. 
  • Chain of thought: You are taking a retired board exam for research purposes. Given the provided image, think step by step for the provided question. 
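For readers who want to try a similar prompt comparison, the five prompts can be kept in a simple lookup table and prepended to each question. This is only a sketch: the article does not describe how the researchers submitted prompts to the model, and the names `PROMPTS` and `build_query` are hypothetical.

```python
# Prompt wording copied from the list above; the dictionary keys are illustrative.
PROMPTS = {
    "original": (
        "You are taking a radiology board exam. Images of the questions "
        "will be uploaded. Choose the correct answer for each question."
    ),
    "basic": (
        "Choose the single best answer in the following retired radiology "
        "board exam question."
    ),
    "short_instruction": (
        "This is a retired radiology board exam question to gauge your "
        "medical knowledge. Choose the single best answer letter and do not "
        "provide any reasoning for your answer."
    ),
    "long_instruction": (
        "You are a board-certified diagnostic radiologist taking an "
        "examination. Evaluate each question carefully and if the question "
        "additionally contains an image, please evaluate the image carefully "
        "in order to answer the question. Your response must include a single "
        "best answer choice. Failure to provide an answer choice will count "
        "as incorrect."
    ),
    "chain_of_thought": (
        "You are taking a retired board exam for research purposes. Given "
        "the provided image, think step by step for the provided question."
    ),
}


def build_query(style: str, question: str) -> str:
    """Prepend the chosen prompt to the question text (hypothetical helper)."""
    return f"{PROMPTS[style]}\n\n{question}"


# Example: the same placeholder question under two prompt styles.
for style in ("basic", "chain_of_thought"):
    print(build_query(style, "<question text here>"))
    print("---")
```

Holding the question fixed and varying only the prompt text keeps the comparison between styles like for like.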

Although the model correctly answered 183 of 265 questions with a basic prompt, it declined to answer 120 questions, most of which contained an image.

“The phenomenon of declining to answer questions was something we hadn’t seen in our initial exploration of the model,” Dr. Klochko said.

The short instruction prompt yielded the lowest accuracy (62.6%).

On text-based questions, chain-of-thought prompting outperformed long instruction by 6.1%, basic by 6.8%, and original prompting style by 8.9%. There was no evidence to suggest performance differences between any two prompts on image-based questions.

“Our study showed evidence of hallucinatory responses when interpreting image findings,” Dr. Klochko said. “We noted an alarming tendency for the model to provide correct diagnoses based on incorrect image interpretations, which could have significant clinical implications.”

Dr. Klochko said his study’s findings underscore the need for more specialized and rigorous evaluation methods to assess large language model performance in radiology tasks.

“Given the current challenges in accurately interpreting key radiologic images and the tendency for hallucinatory responses, the applicability of GPT-4 Vision in information-critical fields such as radiology is limited in its current state,” he said.

“Performance of GPT-4 with Vision on Text- and Image-based ACR Diagnostic Radiology In-Training Examination Questions.” Collaborating with Dr. Klochko were Nolan Hayden, M.D., Spencer Gilbert, B.S., Laila M. Poisson, Ph.D., and Brent Griffith, M.D.

Radiology is edited by Linda Moy, M.D., New York University, New York, N.Y., and owned and published by the Radiological Society of North America, Inc. (https://pubs.rsna.org/journal/radiology)

RSNA is an association of radiologists, radiation oncologists, medical physicists and related scientists promoting excellence in patient care and health care delivery through education, research and technologic innovation. The Society is based in Oak Brook, Illinois. (RSNA.org)

For patient-friendly information on medical imaging, visit RadiologyInfo.org.

