1.
Eur Arch Otorhinolaryngol ; 281(4): 2055-2062, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37695363

ABSTRACT

PURPOSE: To develop and validate a deep learning model for distinguishing healthy vocal folds (HVF) from vocal fold polyps (VFP) on laryngoscopy videos, and to demonstrate the ability of a previously developed informative-frame classifier to facilitate deep learning development.

METHODS: Following retrospective extraction of image frames from 52 HVF and 77 unilateral VFP videos, two researchers manually labeled each frame as informative or uninformative. A previously developed informative-frame classifier was used to extract informative frames from the same video set. Both sets of videos were independently divided into training (60%), validation (20%), and test (20%) sets by patient. Machine-labeled frames were independently verified by two researchers to assess the precision of the informative-frame classifier. Two models based on a pre-trained ResNet18 were trained to classify frames as containing HVF or VFP. The accuracy of the polyp classifier trained on machine-labeled frames was compared to that of the classifier trained on human-labeled frames. Performance was measured by accuracy and area under the receiver operating characteristic curve (AUROC).

RESULTS: When evaluated on a hold-out test set, the polyp classifier trained on machine-labeled frames achieved an accuracy of 85% and an AUROC of 0.84, whereas the classifier trained on human-labeled frames achieved an accuracy of 69% and an AUROC of 0.66.

CONCLUSION: An accurate deep learning classifier for vocal fold polyp identification was developed and validated with the assistance of a peer-reviewed informative-frame classifier for dataset assembly. The classifier trained on machine-labeled frames demonstrates improved performance compared to the classifier trained on human-labeled frames.
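As a rough illustration of the transfer-learning step described in the abstract above, the sketch below fine-tunes an ImageNet-pretrained ResNet18 for binary HVF/VFP frame classification and reports accuracy and AUROC on a hold-out set. It assumes PyTorch and torchvision with an ImageFolder-style directory layout; the paths, transforms, epoch count, and other hyperparameters are illustrative assumptions, not the authors' actual configuration.

```python
# Hypothetical sketch: fine-tune an ImageNet-pretrained ResNet-18 to classify
# laryngoscopy frames as healthy vocal folds (HVF) vs. vocal fold polyp (VFP).
# Directory layout, transforms, and hyperparameters are assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms
from sklearn.metrics import accuracy_score, roc_auc_score

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Frames are assumed to be pre-split *by patient* into train/ and test/ folders,
# each containing HVF/ and VFP/ subdirectories (ImageFolder convention, so
# class index 1 corresponds to VFP under alphabetical ordering).
tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
train_ds = datasets.ImageFolder("frames/train", transform=tfm)
test_ds = datasets.ImageFolder("frames/test", transform=tfm)
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)
test_dl = DataLoader(test_ds, batch_size=32)

# Replace the ImageNet classification head with a 2-class output.
model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 2)
model = model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):  # epoch count is an arbitrary choice for this sketch
    model.train()
    for x, y in train_dl:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

# Evaluate frame-level accuracy and AUROC on the hold-out test set.
model.eval()
probs, labels = [], []
with torch.no_grad():
    for x, y in test_dl:
        p = torch.softmax(model(x.to(device)), dim=1)[:, 1]  # P(VFP)
        probs.extend(p.cpu().tolist())
        labels.extend(y.tolist())
preds = [int(p >= 0.5) for p in probs]
print("accuracy:", accuracy_score(labels, preds))
print("AUROC:", roc_auc_score(labels, probs))
```

Note that, as in the study, the split must be made at the patient level so that frames from the same video never appear in both the training and test sets.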


Subject(s)
Deep Learning; Polyps; Humans; Laryngoscopy/methods; Vocal Cords/diagnostic imaging; Neural Networks, Computer; Retrospective Studies; Machine Learning; Polyps/diagnostic imaging
2.
Laryngoscope Investig Otolaryngol ; 7(2): 460-466, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35434326

ABSTRACT

Objective: This study aims to develop and validate a convolutional neural network (CNN)-based algorithm for automatic selection of informative frames in flexible laryngoscopic videos. The classifier has the potential to aid in the development of computer-aided diagnosis systems and to reduce data-processing time for clinician-computer scientist teams.

Methods: A dataset of 22,132 laryngoscopic frames was extracted from 137 flexible laryngostroboscopic videos from 115 patients. Of these, 55 videos were from healthy patients with no laryngeal pathology and 82 videos were from patients with vocal fold polyps. The extracted frames were manually labeled as informative or uninformative by two independent reviewers based on vocal fold visibility, lighting, focus, and camera distance, yielding 18,114 informative frames and 4,018 uninformative frames. The dataset was split into training and test sets. A pre-trained ResNet-18 model was trained using transfer learning to classify frames as informative or uninformative. Hyperparameters were set using cross-validation. The primary outcome was precision for the informative class; secondary outcomes were precision, recall, and F1-score for all classes. The frame-processing rates of the model and a human annotator were compared.

Results: When evaluated on a hold-out test set of 4,438 frames, the automated classifier achieved an informative-frame precision, recall, and F1-score of 94.4%, 90.2%, and 92.3%, respectively. The model processed frames 16 times faster than a human annotator.

Conclusion: The CNN-based classifier demonstrates high precision for classifying informative frames in flexible laryngostroboscopic videos. This model has the potential to aid researchers with dataset creation for computer-aided diagnosis systems by automatically extracting relevant frames from laryngoscopic videos.
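To show how such an informative-frame classifier could be applied during dataset assembly, the sketch below samples frames from a laryngoscopy video with OpenCV and keeps only those the model scores as informative. The checkpoint name, score threshold, sampling stride, and the assumption that class index 1 means "informative" are all illustrative choices, not details taken from the study.

```python
# Hypothetical sketch: filter a flexible laryngoscopy video down to frames an
# informative-frame classifier (fine-tuned ResNet-18) scores as informative.
# File names, threshold, and stride are assumptions for illustration only.
import os

import cv2
import torch
from torchvision import models, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Rebuild the 2-class ResNet-18 head and load saved weights (state_dict assumed).
model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model.load_state_dict(torch.load("informative_frame_classifier.pt",
                                 map_location=device))
model = model.to(device).eval()

tfm = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

os.makedirs("informative", exist_ok=True)
cap = cv2.VideoCapture("laryngoscopy_video.mp4")
kept, idx = 0, 0
while True:
    ok, frame_bgr = cap.read()
    if not ok:
        break
    if idx % 5 == 0:  # sample every 5th frame (arbitrary stride)
        rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        x = tfm(rgb).unsqueeze(0).to(device)
        with torch.no_grad():
            p_informative = torch.softmax(model(x), dim=1)[0, 1].item()
        if p_informative >= 0.5:  # class 1 assumed to mean "informative"
            cv2.imwrite(f"informative/frame_{idx:06d}.png", frame_bgr)
            kept += 1
    idx += 1
cap.release()
print(f"kept {kept} informative frames")
```

The precision, recall, and F1-score figures reported in the abstract would be computed on a held-out set of labeled frames, for example with scikit-learn's classification_report; the thresholding shown here is only a plausible way to apply the trained model downstream.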
