GPT-4 Vision: Multi-Modal Evolution of ChatGPT and Potential Role in Radiology.

Javan, Ramin; Kim, Theodore; Mostaghni, Navid

Affiliation

Javan R; Department of Radiology, George Washington University School of Medicine and Health Sciences, Washington, USA.
Kim T; Department of Radiology, George Washington University School of Medicine and Health Sciences, Washington, USA.
Mostaghni N; College of Medicine, California University of Science and Medicine, Colton, USA.

Cureus ; 16(8): e68298, 2024 Aug.

Article in English | MEDLINE | ID: mdl-39350878

ABSTRACT

GPT-4 Vision (GPT-4V) represents a significant advancement in multimodal artificial intelligence, enabling text generation from images without specialized training. This marks the transformation of ChatGPT from a large language model (LLM) into the large multimodal model (LMM) promised with GPT-4. As these AI models continue to advance, they may enhance radiology workflow and aid with decision support. This technical note explores potential GPT-4V applications in radiology and evaluates performance on sample tasks. GPT-4V capabilities were tested using images from the web, personal and institutional teaching files, and hand-drawn sketches. Prompts evaluated scientific figure analysis, radiologic image reporting, image comparison, handwriting interpretation, sketch-to-code, and artistic expression. In this limited demonstration of GPT-4V's capabilities, it showed promise in classifying images, counting entities, comparing images, and deciphering handwriting and sketches. However, it exhibited limitations in detecting some fractures, discerning a change in the size of lesions, accurately interpreting complex diagrams, and consistently characterizing radiologic findings. Artistic expression responses were coherent. While GPT-4V may eventually assist with tasks related to radiology, current reliability gaps highlight the need for continued training and improvement before consideration for any medical use by the general public and, ultimately, clinical integration. Future iterations could enable a virtual assistant to discuss findings, improve reports, extract data from images, and provide decision support based on guidelines, white papers, and appropriateness criteria. Human expertise remains essential for safe practice, and partnerships between physicians, researchers, and technology leaders are necessary to safeguard against risks such as bias and privacy concerns.

Keywords

chatgpt; gpt-4; gpt-4 vision; large language models (llms); large multimodal model; lmm; multimodal ai

Full text: 1 | Collections: 01-international | Database: MEDLINE | Language: English | Journal: Cureus | Publication year: 2024 | Document type: Article | Country of affiliation: United States | Country of publication: United States
