Search | BVS Regional Portal

Improving Laryngoscopy Image Analysis Through Integration of Global Information and Local Features in VoFoCD Dataset.

Dao, Thao Thi Phuong; Huynh, Tuan-Luc; Pham, Minh-Khoi; Le, Trung-Nghia; Nguyen, Tan-Cong; Nguyen, Quang-Thuc; Tran, Bich Anh; Van, Boi Ngoc; Ha, Chanh Cong; Tran, Minh-Triet.

J Imaging Inform Med ; 2024 May 29.

Article in English | MEDLINE | ID: mdl-38809338

ABSTRACT

The diagnosis and treatment of vocal fold disorders rely heavily on laryngoscopy. A comprehensive vocal fold diagnosis requires accurate identification of crucial anatomical structures and potential lesions during laryngoscopy observation. However, existing approaches have yet to explore joint optimization of the decision-making process, in which object detection and image classification are performed simultaneously. In this study, we provide a new dataset, VoFoCD, with 1724 laryngology images designed explicitly for object detection and image classification in laryngoscopy images. Images in the VoFoCD dataset are categorized into four classes and comprise six glottic object types. Moreover, we propose a novel Multitask Efficient trAnsformer network for Laryngoscopy (MEAL) to classify vocal fold images and detect glottic landmarks and lesions. To further facilitate interpretability for clinicians, MEAL provides attention maps that visualize the important learned regions, giving explainable artificial intelligence results that support clinical decision-making. We also analyze our model's effectiveness in simulated clinical scenarios in which shaking occurs during the laryngoscopy process. On our VoFoCD dataset, the proposed model achieves an accuracy of 0.951 for image classification and a mean average precision at an intersection-over-union (IoU) threshold of 0.5 (mAP50) of 0.874 for object detection. Our MEAL method integrates global knowledge, encompassing general laryngoscopy image classification, into local features, which refer to distinct anatomical regions of the vocal fold, particularly abnormal regions, including benign and malignant lesions. Our contribution can effectively aid laryngologists in visually identifying benign or malignant lesions of vocal folds and classifying images in the laryngeal endoscopy process.
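The mAP50 figure above depends on the intersection-over-union (IoU) test: a predicted box is counted as a correct detection only if its IoU with a ground-truth box of the same class is at least 0.5. The following is a minimal sketch of that test, not the authors' evaluation code. The function names `iou` and `is_true_positive` and the `(x_min, y_min, x_max, y_max)` box layout are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this layout is an assumption,
    not something the abstract specifies.
    """
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes have no intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: does the prediction match at the given IoU threshold?"""
    return iou(pred_box, gt_box) >= threshold
```

A full mAP50 computation would additionally rank predictions by confidence, build a precision-recall curve per class, and average the per-class average precision; that part is omitted here.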

Vision-Based Assistance for Vocal Fold Identification in Laryngoscopy with Knowledge Distillation.

Dao, Thao Thi Phuong; Pham, Minh-Khoi; Tran, Mai-Khiem; Ha, Chanh Cong; Van, Boi Ngoc; Tran, Bich Anh; Tran, Minh-Triet.

Stud Health Technol Inform ; 310: 946-950, 2024 Jan 25.

Article in English | MEDLINE | ID: mdl-38269948

ABSTRACT

Laryngoscopy images play a vital role in merging computer vision and otorhinolaryngology research. However, few studies offer laryngeal datasets for comparative evaluation. Hence, this study introduces a novel dataset focusing on vocal fold images. Additionally, we propose a lightweight network utilizing knowledge distillation, with our student model achieving around 98.4% accuracy, comparable to the original EfficientNetB1, while reducing model weights by up to 88%. We also present an AI-assisted smartphone solution, enabling a portable and intelligent laryngoscopy system that aids laryngoscopists in efficiently targeting vocal fold areas for observation and diagnosis. To sum up, our contribution includes a laryngeal image dataset and a compressed version of the efficient model, suitable for handheld laryngoscopy devices.
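The abstract does not say which distillation objective was used. As an illustration only, the sketch below shows one common formulation: a temperature-softened KL term that pulls the student's output distribution toward the teacher's, combined with ordinary cross-entropy on the true label. The function names and the `temperature` and `alpha` defaults are assumptions, not values from the study.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Generic soft-target distillation loss for a single example.

    temperature and alpha are illustrative defaults (assumptions).
    """
    # Softened distributions from teacher and student.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # KL(teacher || student) on the softened distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)

    # Standard cross-entropy of the student against the true label.
    ce = -math.log(softmax(student_logits)[label])

    # The T^2 factor keeps the soft-target gradients on a comparable scale.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the weighted cross-entropy remains, so the loss rewards both agreement with the teacher and correctness on the label.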

Subjects

Larynx, Vocal Cords, Humans, Laryngoscopy, Intelligence, Knowledge
