Maranhao, Pedro Didier and Xavier, Vitoria de Araujo and Pereira, Heitor Mendes and Ren, Tsang Ing and Duggal, Anoushka and Seco de Herrera, Alba García and Abolghasemi, Vahid (2026) Evaluating Generative AI for Medical Image Captioning: A Benchmark Study on Radiology Images. In: SPIE Medical Imaging 2026: Imaging Informatics, 2026-02-15 - 2026-02-20, Vancouver, BC, Canada.
Abstract
The generation of clinically accurate and contextually rich captions for radiology images is a critical task in medical Artificial Intelligence (AI), with applications in education, documentation, and decision support. In this study, we benchmark the performance of leading generative AI models, including OpenAI’s GPT-4o, Google’s Gemini 2.5 Pro, Anthropic’s Claude Sonnet 4.5, and Meta’s LLaMA 4, on the recently released ROCOv2 (Radiology Objects in COntext Version 2) dataset. ROCOv2 offers a large-scale, multimodal resource of radiology images paired with expert-generated captions, enabling robust evaluation of vision-language models in the medical domain. We assess the models under zero-shot and few-shot prompting conditions and evaluate their outputs using automated metrics (BLEU, BERTScore, ROUGE, and METEOR). Our analysis highlights the capabilities and limitations of current generative models in understanding and describing complex radiological content, and discusses the potential for integrating these models into clinical workflows. This work provides a comprehensive evaluation of generative AI for medical image captioning and offers insights into future directions for improving reliability and clinical relevance in multimodal medical AI systems.
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Additional Information: | Published proceedings: _not provided_ |
| Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
| SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
| Depositing User: | Unnamed user with email elements@essex.ac.uk |
| Date Deposited: | 21 Apr 2026 11:57 |
| Last Modified: | 21 Apr 2026 11:58 |
| URI: | http://repository.essex.ac.uk/id/eprint/42820 |
Available files
Filename: SPIE_FINAL.pdf