Search | BVS Regional Portal

Vision-Language Model for Visual Question Answering in Medical Imagery.

Bazi, Yakoub; Rahhal, Mohamad Mahmoud Al; Bashmal, Laila; Zuair, Mansour.

Bioengineering (Basel) ; 10(3)2023 Mar 20.

Article in English | MEDLINE | ID: mdl-36978771

ABSTRACT

In the clinical and healthcare domains, medical images play a critical role. A mature medical visual question answering (VQA) system can improve diagnosis by answering clinical questions presented with a medical image. Despite its enormous potential in the healthcare industry and services, this technology is still in its infancy and is far from practical use. This paper introduces an approach based on a transformer encoder-decoder architecture. Specifically, we extract image features using the vision transformer (ViT) model, and we embed the question using a textual encoder transformer. Then, we concatenate the resulting visual and textual representations and feed them into a multi-modal decoder that generates the answer autoregressively. In the experiments, we validate the proposed model on two VQA datasets for radiology images termed VQA-RAD and PathVQA. The model shows promising results compared to existing solutions. It yields closed and open accuracies of 84.99% and 72.97%, respectively, for VQA-RAD, and 83.86% and 62.37%, respectively, for PathVQA. Other metrics, such as the BLEU score, which measures the alignment between the predicted and true answer sentences, are also reported.
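The fusion and decoding steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `fuse`, `greedy_decode`, and `toy_scores` are hypothetical, the decoder is replaced by a scripted stand-in, and greedy token selection is an assumption (the abstract states only that decoding is autoregressive).

```python
from typing import Callable, List, Sequence

Vector = List[float]


def fuse(visual_tokens: Sequence[Vector], text_tokens: Sequence[Vector]) -> List[Vector]:
    """Concatenate visual and textual token embeddings along the sequence axis."""
    return list(visual_tokens) + list(text_tokens)


def greedy_decode(
    memory: Sequence[Vector],
    next_token_scores: Callable[[Sequence[Vector], Sequence[int]], List[float]],
    bos_id: int,
    eos_id: int,
    max_len: int = 20,
) -> List[int]:
    """Generate answer token ids one at a time, feeding each choice back in."""
    answer = [bos_id]
    for _ in range(max_len):
        scores = next_token_scores(memory, answer)
        # Assumption: pick the highest-scoring token (greedy search).
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == eos_id:
            break
        answer.append(next_id)
    return answer[1:]  # drop the begin-of-sequence marker


def toy_scores(memory: Sequence[Vector], prefix: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for the multi-modal decoder: emits 2, 3, then EOS (id 1)."""
    script = [2, 3, 1]
    target = script[min(len(prefix) - 1, len(script) - 1)]
    return [1.0 if i == target else 0.0 for i in range(4)]


# Two visual tokens and one question token, each a 2-dimensional embedding.
memory = fuse([[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6]])
answer_ids = greedy_decode(memory, toy_scores, bos_id=0, eos_id=1)
```

In a real system, `toy_scores` would be replaced by a trained decoder that attends over `memory`, and the token ids would be mapped back to words by a tokenizer.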

Contrasting EfficientNet, ViT, and gMLP for COVID-19 Detection in Ultrasound Imagery.

Rahhal, Mohamad Mahmoud Al; Bazi, Yakoub; Jomaa, Rami M; Zuair, Mansour; Melgani, Farid.

J Pers Med ; 12(10)2022 Oct 12.

Article in English | MEDLINE | ID: mdl-36294846

ABSTRACT

A timely diagnosis of coronavirus is critical in order to control the spread of the virus. To aid in this, we propose in this paper a deep learning-based approach for detecting coronavirus patients using ultrasound imagery. We propose to exploit the transfer learning of an EfficientNet model pre-trained on the ImageNet dataset for the classification of ultrasound images of suspected patients. In particular, we contrast the results of EfficientNet-B2 with the results of ViT and gMLP. Then, we show the results of the three models when trained from scratch, i.e., without transfer learning. We view the detection problem from a multiclass classification perspective by classifying images as COVID-19, pneumonia, or normal. In the experiments, we evaluated the models on a publicly available ultrasound dataset. This dataset consists of 261 recordings (202 videos + 59 images) belonging to 216 distinct patients. The best results were obtained using EfficientNet-B2 with transfer learning. In particular, we obtained precision, recall, and F1 scores of 95.84%, 99.88%, and 97.41%, respectively, for detecting the COVID-19 class. EfficientNet-B2 with transfer learning presented an overall accuracy of 96.79%, outperforming gMLP and ViT, which achieved accuracies of 93.03% and 92.82%, respectively.
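For readers unfamiliar with the metrics quoted in this abstract, the sketch below shows one common way to derive per-class precision, recall, and F1, plus overall accuracy, from a three-class confusion matrix. The counts are invented for illustration and are not the paper's data; the function names are hypothetical, and the abstract does not say how the authors aggregated their scores.

```python
from typing import Dict, List

CLASSES = ["COVID-19", "pneumonia", "normal"]


def per_class_metrics(confusion: List[List[int]], classes: List[str]) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall, and F1 per class.

    confusion[r][c] counts samples whose true class is r and predicted class is c.
    """
    n = len(classes)
    metrics: Dict[str, Dict[str, float]] = {}
    for k, name in enumerate(classes):
        tp = confusion[k][k]
        fp = sum(confusion[r][k] for r in range(n)) - tp  # predicted k, truly another class
        fn = sum(confusion[k]) - tp                       # truly k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[name] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def overall_accuracy(confusion: List[List[int]]) -> float:
    """Fraction of all samples that lie on the diagonal of the confusion matrix."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical counts (rows: true class, columns: predicted class).
example = [
    [48, 2, 0],
    [1, 27, 2],
    [1, 1, 18],
]
scores = per_class_metrics(example, CLASSES)
accuracy = overall_accuracy(example)
```

Scores averaged over several cross-validation folds need not satisfy the harmonic-mean relation between precision, recall, and F1 exactly, since each fold's F1 is computed before averaging.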
