1.
Biomed Opt Express ; 15(4): 2175-2186, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38633078

ABSTRACT

Three-dimensional stacks acquired with confocal or two-photon microscopy are crucial for studying neuroanatomy. However, high-resolution image stacks acquired at multiple depths are time-consuming and susceptible to photobleaching. In vivo microscopy is further prone to motion artifacts. In this work, we suggest that deep neural networks with sine activation functions encoding implicit neural representations (SIRENs) are suitable for predicting intermediate planes and correcting motion artifacts, addressing the aforementioned shortcomings. We show that we can accurately estimate intermediate planes across multiple micrometers and fully automatically and unsupervised estimate a motion-corrected denoised picture. We show that noise statistics can be affected by SIRENs, however, rescued by a downstream denoising neural network, shown exemplarily with the recovery of dendritic spines. We believe that the application of these technologies will facilitate more efficient acquisition and superior post-processing in the future.
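
The abstract above names sine-activated networks (SIRENs) without describing their layers. As a rough, non-authoritative sketch of the general idea, the snippet below evaluates a tiny fully connected network whose hidden layers apply sin(w0 · (Wx + b)) and whose last layer is linear. The function names, the plain-list weight format, and the frequency factor `w0=30.0` (a value often used in the SIREN literature) are assumptions for illustration, not details taken from the paper.

```python
import math


def sine_layer(x, weights, biases, w0=30.0):
    """Return sin(w0 * (W x + b)) for one layer; weights is a list of rows."""
    return [
        math.sin(w0 * (sum(w * xi for w, xi in zip(row, x)) + b))
        for row, b in zip(weights, biases)
    ]


def siren_forward(coords, layers, w0=30.0):
    """Map input coordinates (e.g. x, y, z) to an output value.

    layers is a list of (weights, biases) pairs. All layers except the
    last use the sine activation; the last layer is purely linear.
    """
    h = list(coords)
    for weights, biases in layers[:-1]:
        h = sine_layer(h, weights, biases, w0)
    weights, biases = layers[-1]
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(weights, biases)]
```

In such a representation, an intermediate plane would be queried by evaluating the trained network at coordinates between the acquired depths; training itself is not shown here.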

2.
PLoS Comput Biol ; 20(2): e1011774, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38422112

ABSTRACT

Dendritic spines are the seat of most excitatory synapses in the brain, and a cellular structure considered central to learning, memory, and activity-dependent plasticity. The quantification of dendritic spines from light microscopy data is usually performed by humans in a painstaking and error-prone process. We found that human-to-human variability is substantial (inter-rater reliability 82.2±6.4%), raising concerns about the reproducibility of experiments and the validity of using human-annotated 'ground truth' as an evaluation method for computational approaches of spine identification. To address this, we present DeepD3, an open deep learning-based framework to robustly quantify dendritic spines in microscopy data in a fully automated fashion. DeepD3's neural networks have been trained on data from different sources and experimental conditions, annotated and segmented by multiple experts and they offer precise quantification of dendrites and dendritic spines. Importantly, these networks were validated in a number of datasets on varying acquisition modalities, species, anatomical locations and fluorescent indicators. The entire DeepD3 open framework, including the fully segmented training data, a benchmark that multiple experts have annotated, and the DeepD3 model zoo is fully available, addressing the lack of openly available datasets of dendritic spines while offering a ready-to-use, flexible, transparent, and reproducible spine quantification method.


Subject(s)
Benchmarking , Dendritic Spines , Humans , Reproducibility of Results , Brain , Coloring Agents
3.
J Voice ; 2024 Feb 22.
Article in English | MEDLINE | ID: mdl-38395653

ABSTRACT

The Glottal Area Waveform (GAW) is an important component in quantitative clinical voice assessment, providing valuable insights into vocal fold function. In this study, we introduce a novel method employing Variational Autoencoders (VAEs) to generate synthetic GAWs. Our approach enables the creation of synthetic GAWs that closely replicate real-world data, offering a versatile tool for researchers and clinicians. We elucidate the process of manipulating the VAE latent space using the Glottal Opening Vector (GlOVe). The GlOVe allows precise control over the synthetic closure and opening of the vocal folds. By utilizing the GlOVe, we generate synthetic laryngeal biosignals. These biosignals accurately reflect vocal fold behavior, allowing for the emulation of realistic glottal opening changes. This manipulation extends to the introduction of arbitrary oscillations in the vocal folds, closely resembling real vocal fold oscillations. The range of factor coefficient values enables the generation of diverse biosignals with varying frequencies and amplitudes. Our results demonstrate that this approach yields highly accurate laryngeal biosignals, with Normalized Mean Absolute Error values ranging from 9.6 ⋅ 10⁻³ to 1.20 ⋅ 10⁻² across the tested frequencies, alongside a remarkable training effectiveness, reflected in reductions of up to approximately 89.52% in key loss components. This proposed method may have implications for downstream speech synthesis and phonetics research, offering the potential for advanced and natural-sounding speech technologies.

4.
Neoplasia ; 49: 100953, 2024 03.
Article in English | MEDLINE | ID: mdl-38232493

ABSTRACT

PURPOSE: Individual prediction of treatment response is crucial for personalized treatment in multimodal approaches against head-and-neck squamous cell carcinoma (HNSCC). So far, no reliable predictive parameters for treatment schemes containing immunotherapy have been identified. This study aims to predict treatment response to induction chemo-immunotherapy based on the peripheral blood immune status in patients with locally advanced HNSCC. METHODS: The peripheral blood immune phenotype was assessed in whole blood samples in patients treated in the phase II CheckRad-CD8 trial as part of the pre-planned translational research program. Blood samples were analyzed by multicolor flow cytometry before (T1) and after (T2) induction chemo-immunotherapy with cisplatin/docetaxel/durvalumab/tremelimumab. Machine Learning techniques were used to predict pathological complete response (pCR) after induction therapy. RESULTS: The tested classifier methods (LDA, SVM, LR, RF, DT, and XGBoost) allowed a distinct prediction of pCR. Highest accuracy was achieved with a low number of features represented as principal components. Immune parameters obtained from the absolute difference (|T2-T1|) allowed the best prediction of pCR. In general, less than 30 parameters and at most 10 principal components were needed for highly accurate predictions. Across several datasets, cells of the innate immune system such as polymorphonuclear cells, monocytes, and plasmacytoid dendritic cells are most prominent. CONCLUSIONS: Our analyses imply that alterations of the innate immune cell distribution in the peripheral blood following induction chemo-immunotherapy are highly predictive for pCR in HNSCC.


Subject(s)
Carcinoma, Squamous Cell , Head and Neck Neoplasms , Humans , Squamous Cell Carcinoma of Head and Neck/therapy , Head and Neck Neoplasms/drug therapy , Carcinoma, Squamous Cell/drug therapy , Carcinoma, Squamous Cell/genetics , Induction Chemotherapy/methods , Immunophenotyping , Immunotherapy , CD8-Positive T-Lymphocytes , Immunity, Innate
5.
Cancer Med ; 2023 Dec 22.
Article in English | MEDLINE | ID: mdl-38132808

ABSTRACT

BACKGROUND: The significance of different histological spreading patterns of tumor tissue in oral tongue squamous cell carcinoma (TSCC) is well known. Our aim was to construct a numeric parameter on a continuous scale, that is, the modified Polsby-Popper (MPP) score, to describe the aggressiveness of tumor growth and infiltration, with the potential to analyze hematoxylin and eosin-stained whole slide images (WSIs) in an automated manner. We investigated the application of the MPP score in predicting survival and cervical lymph node metastases as well as in determining patients at risk in the context of different surgical margin scenarios. METHODS: We developed a semiautomated image analysis pipeline to detect areas belonging to the tumor tissue compartment. Perimeter and area measurements of all detected tissue regions were derived, and a specific mathematical formula was applied to reflect the perimeter/area ratio in a comparable, observer-independent manner across digitized WSIs. We demonstrated the plausibility of the MPP score by correlating it with well-established clinicopathologic parameters. We then performed survival analysis to assess the relevance of the MPP score, with an emphasis on different surgical margin scenarios. Machine learning models were developed to assess the relevance of the MPP score in predicting survival and occult cervical nodal metastases. RESULTS: The MPP score was associated with unfavorable tumor growth and infiltration patterns, the presence of lymph node metastases, the extracapsular spread of tumor cells, and higher tumor thickness. Higher MPP scores were associated with worse overall survival (OS) and tongue carcinoma-specific survival (TCSS), both when assessing all pT-categories and pT1-pT2 categories only; moreover, higher MPP scores were associated with a significantly worse TCSS in cases where a cancer-free surgical margin of <5 mm could be achieved on the main surgical specimen. 
This discriminatory capacity remained constant when examining pT1-pT2 categories only. Importantly, the MPP score could successfully define cases at risk in terms of metastatic disease in pT1-pT2 cancer where tumor thickness failed to exhibit a significant predictive value. Machine learning (ML) models incorporating the MPP score could predict the 5-year TCSS efficiently. Furthermore, we demonstrated that machine learning models that predict occult cervical lymph node involvement can benefit from including the MPP score. CONCLUSIONS: We introduced an objective, quantifiable, and observer-independent parameter, the MPP score, representing the aggressiveness of tumor growth and infiltration in TSCC. We showed its prognostic relevance especially in pT1-pT2 category TSCC, and its possible use in ML models predicting TCSS and occult lymph node metastases.
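
The abstract above does not state the formula behind the MPP score. For orientation only, the sketch below computes the classic, unmodified Polsby-Popper compactness, 4πA/P², from the area A and perimeter P of a region: a circle scores 1 and more irregular outlines score lower. The helper names and the polygon-based area and perimeter calculation are assumptions for illustration; the authors' modification and their image analysis pipeline are not reproduced.

```python
import math


def polygon_area_perimeter(vertices):
    """Area (shoelace formula) and perimeter of a simple closed polygon."""
    n = len(vertices)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the outline
        twice_area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2.0, perimeter


def polsby_popper(area, perimeter):
    """Classic Polsby-Popper compactness: 4 * pi * area / perimeter**2."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4.0 * math.pi * area / perimeter ** 2
```

Because the ratio is dimensionless, it can be compared across regions of different size, which fits the abstract's description of a perimeter/area measure that is comparable across slides.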

6.
Diagn Pathol ; 18(1): 121, 2023 Nov 03.
Article in English | MEDLINE | ID: mdl-37924082

ABSTRACT

PURPOSE: Although neural networks have shown remarkable performance in medical image analysis, their translation into clinical practice remains difficult due to their lack of interpretability. An emerging field that addresses this problem is Explainable AI. METHODS: Here, we aimed to investigate the ability of Convolutional Neural Networks (CNNs) to classify head and neck cancer histopathology. To this end, we manually annotated 101 histopathological slides of locally advanced head and neck squamous cell carcinoma. We trained a CNN to classify tumor and non-tumor tissue, and another CNN to semantically segment four classes - tumor, non-tumor, non-specified tissue, and background. We applied Explainable AI techniques, namely Grad-CAM and HR-CAM, to both networks and explored important features that contributed to their decisions. RESULTS: The classification network achieved an accuracy of 89.9% on previously unseen data. Our segmentation network achieved a class-averaged Intersection over Union score of 0.690, and 0.782 for tumor tissue in particular. Explainable AI methods demonstrated that both networks rely on features agreeing with the pathologist's expert opinion. CONCLUSION: Our work suggests that CNNs can predict head and neck cancer with high accuracy. Especially if accompanied by visual explanations, CNNs seem promising for assisting pathologists in the assessment of cancer sections.


Subject(s)
Head and Neck Neoplasms , Image Processing, Computer-Assisted , Humans , Image Processing, Computer-Assisted/methods , Neural Networks, Computer , Squamous Cell Carcinoma of Head and Neck
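
Entry 6 reports a class-averaged Intersection over Union (IoU) score. As a minimal sketch of how such a score is commonly computed, the snippet below takes flattened per-pixel class labels; the function names and the choice to skip classes absent from both prediction and ground truth are assumptions, not details from the paper.

```python
def class_iou(pred, truth, num_classes):
    """Per-class IoU from two equally long sequences of integer labels.

    Returns None for a class that occurs in neither sequence.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        ious.append(inter / union if union else None)
    return ious


def mean_iou(ious):
    """Average the per-class IoU values, ignoring undefined (None) classes."""
    valid = [v for v in ious if v is not None]
    if not valid:
        raise ValueError("no class present in prediction or ground truth")
    return sum(valid) / len(valid)
```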
7.
Front Psychiatry ; 14: 1197697, 2023.
Article in English | MEDLINE | ID: mdl-37953937

ABSTRACT

Background: Psychoactive agents for treating mental disorders have attracted growing scientific interest. However, research on the relationship between altered states of consciousness (ASCs) and ketamine's antidepressant properties is still limited. Likewise, approaches to sustain early treatment success over the long term are needed. Taking both aspects into account, the question arises whether the persistence of recurrent ASCs during the subsequent infusion sessions is crucial for the preservation of antidepressant effects during prolonged continued ketamine therapy. Aim: In this single case study, we explored whether recurrent ASC experiences across a large number of infusions are associated with improved antidepressant effects. Methods: A 62-year-old patient with treatment-resistant depression, who has been suffering from depressive episodes for over 20 years, was observed for 12 consecutive infusions across 16 weeks. ASCs during ketamine sessions were measured with the 5D-ASC, and pre/post-infusion depression scores with the BDI-II questionnaire. To emphasize psychoactive experiences, a personalized antidepressant dose regimen was used. Results: We found a strong correlation between the experienced ASCs during ketamine infusions and the antidepressant effect: the stronger the ASCs overall, the stronger the resulting antidepressant effect. This correlation was consistently observed throughout the infusion series, independent of the number of ketamine sessions completed before. However, despite a personalized dose regimen, neither peri-infusion ASCs nor antidepressant effects could be established on a regular basis, leading overall to no improvement in treatment outcome. Conclusion: Maintaining psychoactive effects over repeated ketamine infusions may be key to facilitating long-lasting antidepressant effects. 
However, for some depressed individuals maintenance of antidepressant effects and/or peri-infusion ASCs might not be achieved, even when personalized dosing is used.

8.
Bioengineering (Basel) ; 10(10)2023 Sep 25.
Article in English | MEDLINE | ID: mdl-37892855

ABSTRACT

As today's society ages, age-related diseases become more frequent. One very common but yet preventable disease is the development of pressure ulcers (PUs). PUs can occur if tissue is exposed to a long-lasting pressure load, e.g., lying on tissue without turning. The cure of PUs requires intensive care, especially for the elderly or people with preexisting conditions whose tissue needs longer healing times. The consequences are heavy suffering for the patient and extreme costs for the health care system. To avoid these consequences, our objective is to develop a pressure ulcer prophylaxis device. For that, we built a new sensor system able to monitor the pressure load and tissue vital signs in immediate local proximity at patient's predilection sites. In the clinical study, we found several indicators showing correlations between tissue perfusion and the risk of PU development, including strongly reduced SpO2 levels in body tissue prior to a diagnosed PU. Finally, we propose a prophylaxis system that allows for the prediction of PU developments in early stages before they become visible. This work is the first step in generating an effective system to warn patients or caregivers about developing PUs and taking appropriate preventative measures. Widespread application could reduce patient suffering and lead to substantial cost savings.

9.
IEEE J Transl Eng Health Med ; 11: 137-144, 2023.
Article in English | MEDLINE | ID: mdl-36816097

ABSTRACT

High-speed videoendoscopy is a major tool for quantitative laryngology. Glottis segmentation and glottal midline detection are crucial for computing vocal fold-specific, quantitative parameters. However, fully automated solutions show limited clinical applicability. Especially unbiased glottal midline detection remains a challenging problem. We developed a multitask deep neural network for glottis segmentation and glottal midline detection. We used techniques from pose estimation to estimate the anterior and posterior points in endoscopy images. Neural networks were set up in TensorFlow/Keras and trained and evaluated with the BAGLS dataset. We found that a dual decoder deep neural network termed GlottisNetV2 outperforms the previously proposed GlottisNet in terms of MAPE on the test dataset (1.85% to 6.3%) while converging faster. Using various hyperparameter tunings, we allow fast and directed training. Using temporal variant data on an additional data set designed for this task, we can improve the median prediction accuracy from 2.1% to 1.76% when using 12 consecutive frames and additional temporal filtering. We found that temporal glottal midline detection using a dual decoder architecture together with keypoint estimation allows accurate midline prediction. We show that our proposed architecture allows stable and reliable glottal midline predictions ready for clinical use and analysis of symmetry measures.


Subject(s)
Glottis , Vocal Cords , Neural Networks, Computer , Endoscopy
10.
J Speech Lang Hear Res ; 66(2): 565-572, 2023 02 13.
Article in English | MEDLINE | ID: mdl-36716396

ABSTRACT

PURPOSE: This research note illustrates the effects of video data with nonsquare pixels on the pixel-based measures obtained from videofluoroscopic swallow studies (VFSS). METHOD: Six pixel-based distance and area measures were obtained from two different videofluoroscopic study units, both yielding videos with nonsquare pixels with different pixel aspect ratios (PARs). The swallowing measures were obtained from the original VFSS videos and from the videos after their pixels were squared. RESULTS: The results demonstrated significant multivariate effects both in video type (original vs. squared) and in the interaction between video type and sample (two video recordings of different patients, different PARs, and opposing tilt angles of the external reference). A wide range of variabilities was observed on the pixel-based measures between original and squared videos with the percent deviation ranging from 0.1% to 9.1% with the maximum effect size of 7.43. CONCLUSIONS: This research note demonstrates the effect of disregarding PAR on distance and area pixel-based parameters. In addition, we present a multilevel roadmap to prevent possible measurement errors that could occur. At the planning stage, the PAR of the video source should be identified, and, at the analysis stage, video data should be prescaled prior to analysis with PAR-unaware software. No methodology in prior absolute or relative pixel-based studies reports adjustment for the PAR prior to measurement or identifies the PAR as a possible source of variation within the literature. Addressing PAR will improve the precision and stability of pixel-based VFSS findings and improve comparability within and across clinical and research settings. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.21957134.


Subject(s)
Deglutition Disorders , Humans , Deglutition Disorders/diagnostic imaging , Deglutition , Video Recording/methods , Software , Fluoroscopy/methods
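
Entry 10 recommends prescaling video data with nonsquare pixels before taking pixel-based measures. The sketch below illustrates the underlying arithmetic under one stated assumption: the pixel aspect ratio (PAR) is defined as pixel width divided by pixel height, so horizontal offsets are multiplied by the PAR to express them in square-pixel units. The function names are hypothetical and the note's own procedure may differ.

```python
import math


def squared_distance(p, q, par):
    """Distance between pixel coordinates p and q in square-pixel units.

    Assumes par = pixel width / pixel height, so only the horizontal
    offset is rescaled.
    """
    dx = (q[0] - p[0]) * par
    dy = q[1] - p[1]
    return math.hypot(dx, dy)


def squared_area(pixel_count, par):
    """Area of a region of pixel_count pixels in square-pixel units."""
    return pixel_count * par


def percent_deviation(original, corrected):
    """Deviation of an uncorrected measure relative to the corrected one."""
    return 100.0 * (original - corrected) / corrected
```

Note that the size of the error depends on the orientation of the measured line: a purely vertical distance is unaffected under this definition, while a purely horizontal one is off by the full PAR factor.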
11.
PLoS One ; 17(9): e0266989, 2022.
Article in English | MEDLINE | ID: mdl-36129922

ABSTRACT

Deep Learning has a large impact on medical image analysis and lately has been adopted for clinical use at the point of care. However, there is only a small number of reports of long-term studies that show the performance of deep neural networks (DNNs) in such an environment. In this study, we measured the long-term performance of a clinically optimized DNN for laryngeal glottis segmentation. We have collected the video footage for two years from an AI-powered laryngeal high-speed videoendoscopy imaging system and found that the footage image quality is stable across time. Next, we determined the DNN segmentation performance on lossy and lossless compressed data revealing that only 9% of recordings contain segmentation artifacts. We found that lossy and lossless compression is on par for glottis segmentation, however, lossless compression provides significantly superior image quality. Lastly, we employed continual learning strategies to continuously incorporate new data into the DNN to remove the aforementioned segmentation artifacts. With modest manual intervention, we were able to largely alleviate these segmentation artifacts by up to 81%. We believe that our suggested deep learning-enhanced laryngeal imaging platform consistently provides clinically sound results, and together with our proposed continual learning scheme will have a long-lasting impact on the future of laryngeal imaging.


Subject(s)
Larynx , Point-of-Care Systems , Artifacts , Glottis/diagnostic imaging , Image Processing, Computer-Assisted/methods , Larynx/diagnostic imaging , Neural Networks, Computer
12.
Front Pharmacol ; 13: 916641, 2022.
Article in English | MEDLINE | ID: mdl-35959442

ABSTRACT

Background: Cognition that is not dominated by thinking in terms of opposites (opposite diminishing) or by making judgments (non-judging) can be found both in Buddhist/mindfulness contexts and in mental states that are fostered by dissociative psychedelics (N-methyl-D-aspartate antagonists) such as ketamine. Especially for the Buddhist/mindfulness case, both opposite diminishing and non-judging have been proposed to relate to mental well-being. Whether ketamine-occasioned opposite diminishing and/or non-judging relate to increased mental well-being in the form of antidepressant response is unknown, and was investigated in the present study. Methods: In this open-label outpatient study, the dose level and frequency for the ketamine infusions were adjusted individually in close consultation with the patients suffering from depression with the overall goal of maximizing antidepressant benefits, a novel dose regimen that we term personalized antidepressant dosing. In general, treatment started with an initial series of ketamine infusions with a dosage of 0.5 mg/kg body weight and was then adjusted (usually increased). A possible relationship between ketamine-induced antidepressant benefits and retrospectively reported peri-infusion experiences of opposite diminishing and non-judging was assessed based on a total of 45 ketamine-infusion treatment sessions from 11 different patients suffering from depression. Opposite diminishing and non-judging were measured with the two items from the Altered States of Consciousness Inventory (ASCI) that measure these concepts. Depression was measured with the Beck Depression Inventory (BDI-II). Results: Peri-infusion experiences of both opposite diminishing and non-judging were associated with antidepressant responses, confirming our hypothesis. Furthermore, opposite diminishing and non-judging were closely related to one another while relating to antidepressant response in distinguishable ways. 
Conclusion: Future controlled randomized trials with dissociative and other psychedelics and with a larger number of participants are needed to establish the possible link of psychedelically induced opposite diminishing and non-judging with an antidepressant response more firmly.

13.
Sci Rep ; 12(1): 14292, 2022 08 22.
Article in English | MEDLINE | ID: mdl-35995933

ABSTRACT

Glottis segmentation is a crucial step to quantify endoscopic footage in laryngeal high-speed videoendoscopy. Recent advances in deep neural networks for glottis segmentation allow for a fully automatic workflow. However, exact knowledge of integral parts of these deep segmentation networks remains unknown, and understanding the inner workings is crucial for acceptance in clinical practice. Here, we show that a single latent channel as a bottleneck layer is sufficient for glottal area segmentation using systematic ablations. We further demonstrate that the latent space is an abstraction of the glottal area segmentation relying on three spatially defined pixel subtypes allowing for a transparent interpretation. We further provide evidence that the latent space is highly correlated with the glottal area waveform, can be encoded with four bits, and decoded using lean decoders while maintaining a high reconstruction accuracy. Our findings suggest that glottis segmentation is a task that can be highly optimized to gain very efficient and explainable deep neural networks, important for application in the clinic. In the future, we believe that online deep learning-assisted monitoring is a game-changer in laryngeal examinations.


Subject(s)
Glottis , Larynx , Endoscopy , Glottis/diagnostic imaging , Image Processing, Computer-Assisted , Neural Networks, Computer , Video Recording
14.
J Expo Sci Environ Epidemiol ; 32(5): 727-734, 2022 09.
Article in English | MEDLINE | ID: mdl-34611302

ABSTRACT

BACKGROUND: In the COVID-19 pandemic, singing came into focus as a high-risk activity for infection with airborne viruses and was therefore forbidden by many governmental administrations. OBJECTIVE: The aim of this study is to investigate the effectiveness of surgical masks regarding the spatial and temporal dispersion of aerosol and droplets during professional singing. METHODS: Ten professional singers performed a passage of Ludwig van Beethoven's "Ode to Joy" in two experimental setups, each with and without surgical masks. First, they sang with previously inhaled vapor of e-cigarettes. The emitted cloud was recorded by three cameras to measure its dispersion dynamics. Secondly, the naturally expelled larger droplets were illuminated by a laser light sheet and recorded by a high-speed camera. RESULTS: The exhaled vapor aerosols were decelerated and deflected by the mask and stayed in the singer's near-field around and above their heads. In contrast, without mask, the aerosols spread widely reaching distances up to 1.3 m. The larger droplets were reduced by up to 86% with a surgical mask worn. SIGNIFICANCE: The study shows that surgical masks are an effective tool to reduce the range of aerosol dispersion during singing. In combination with an appropriate aeration strategy for aerosol removal, choir singers could be positioned in a more compact assembly without contaminating neighboring singers.


Subject(s)
COVID-19 , Electronic Nicotine Delivery Systems , Singing , Humans , Masks , Pandemics , Respiratory Aerosols and Droplets
15.
Nat Commun ; 12(1): 6694, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34795244

ABSTRACT

Animals must adapt their behavior to survive in a changing environment. Behavioral adaptations can be evoked by two mechanisms: feedback control and internal-model-based control. Feedback controllers can maintain the sensory state of the animal at a desired level under different environmental conditions. In contrast, internal models learn the relationship between the motor output and its sensory consequences and can be used to recalibrate behaviors. Here, we present multiple unpredictable perturbations in visual feedback to larval zebrafish performing the optomotor response and show that they react to these perturbations through a feedback control mechanism. In contrast, if a perturbation is long-lasting, fish adapt their behavior by updating a cerebellum-dependent internal model. We use modelling and functional imaging to show that the neuronal requirements for these mechanisms are met in the larval zebrafish brain. Our results illustrate the role of the cerebellum in encoding internal models and how these can calibrate neuronal circuits involved in reactive behaviors depending on the interactions between animal and environment.


Subject(s)
Cerebellum/physiology , Feedback, Physiological/physiology , Feedback, Sensory/physiology , Zebrafish/physiology , Adaptation, Physiological/physiology , Animals , Animals, Genetically Modified , Brain/cytology , Brain/physiology , Cerebellum/cytology , Humans , Larva/genetics , Larva/physiology , Learning/physiology , Neurons/physiology , Zebrafish/genetics
16.
Sci Rep ; 11(1): 13760, 2021 07 02.
Article in English | MEDLINE | ID: mdl-34215788

ABSTRACT

High-speed videoendoscopy is an important tool to study laryngeal dynamics, to quantify vocal fold oscillations, to diagnose voice impairments at laryngeal level and to monitor treatment progress. However, there is a significant lack of an open source, expandable research tool that features latest hardware and data analysis. In this work, we propose an open research platform termed OpenHSV that is based on state-of-the-art, commercially available equipment and features a fully automatic data analysis pipeline. A publicly available, user-friendly graphical user interface implemented in Python is used to interface the hardware. Video and audio data are recorded in synchrony and are subsequently fully automatically analyzed. Video segmentation of the glottal area is performed using efficient deep neural networks to derive glottal area waveform and glottal midline. Established quantitative, clinically relevant video and audio parameters were implemented and computed. In a preliminary clinical study, we recorded video and audio data from 28 healthy subjects. Analyzing these data in terms of image quality and derived quantitative parameters, we show the applicability, performance and usefulness of OpenHSV. Therefore, OpenHSV provides a valid, standardized access to high-speed videoendoscopy data acquisition and analysis for voice scientists, highlighting its use as a valuable research tool in understanding voice physiology. We envision that OpenHSV serves as basis for the next generation of clinical HSV systems.


Subject(s)
Glottis/surgery , Laryngeal Diseases/surgery , Laryngoscopy/methods , Larynx/surgery , Adolescent , Adult , Female , Glottis/diagnostic imaging , Glottis/physiopathology , Humans , Laryngeal Diseases/diagnostic imaging , Laryngeal Diseases/pathology , Laryngoscopy/instrumentation , Larynx/diagnostic imaging , Larynx/pathology , Male , Middle Aged , Neural Networks, Computer , Video Recording , Vocal Cords/diagnostic imaging , Vocal Cords/physiopathology , Vocal Cords/surgery , Voice/physiology , Voice Disorders/diagnostic imaging , Voice Disorders/physiopathology , Voice Disorders/surgery , Voice Quality/physiology , Young Adult
17.
J Speech Lang Hear Res ; 64(6): 1889-1903, 2021 06 04.
Article in English | MEDLINE | ID: mdl-34000199

ABSTRACT

Purpose: High-speed videoendoscopy (HSV) is an emerging, but barely used, endoscopy technique in the clinic to assess and diagnose voice disorders because of the lack of dedicated software to analyze the data. HSV allows quantification of the vocal fold oscillations by segmenting the glottal area. This challenging task has been tackled by various studies; however, the proposed approaches are mostly limited and not suitable for daily clinical routine. Method: We developed user-friendly software in C# that allows the editing, motion correction, segmentation, and quantitative analysis of HSV data. We further provide pretrained deep neural networks for fully automatic glottis segmentation. Results: We freely provide our software Glottis Analysis Tools (GAT). Using GAT, we provide a general threshold-based region growing platform that enables the user to analyze data from various sources, such as in vivo recordings, ex vivo recordings, and high-speed footage of artificial vocal folds. Additionally, especially for in vivo recordings, we provide three robust neural networks at various speed and quality settings to allow a fully automatic glottis segmentation needed for application by untrained personnel. GAT further evaluates video and audio data in parallel and is able to extract various features from the video data, among others the glottal area waveform, that is, the changing glottal area over time. In total, GAT provides 79 unique quantitative analysis parameters for video- and audio-based signals. Many of these parameters have already been shown to reflect voice disorders, highlighting the clinical importance and usefulness of the GAT software. Conclusion: GAT is a unique tool to process HSV and audio data to determine quantitative, clinically relevant parameters for research, diagnosis, and treatment of laryngeal disorders. Supplemental Material: https://doi.org/10.23641/asha.14575533.


Subject(s)
Deep Learning , Larynx , Glottis , Humans , Laryngoscopy , Phonation , Software , Vibration , Video Recording , Vocal Cords
18.
PLoS One ; 16(2): e0246136, 2021.
Article in English | MEDLINE | ID: mdl-33529244

ABSTRACT

In voice research, uncovering relations between the oscillating vocal folds, being the sound source of phonation, and the resulting perceived acoustic signal is of great interest. This is especially the case in the context of voice disorders, such as functional dysphonia (FD). We investigated 250 high-speed videoendoscopy (HSV) recordings with simultaneously recorded acoustic signals (124 healthy females, 60 FD females, 44 healthy males, 22 FD males). 35 glottal area waveform (GAW) parameters and 14 acoustic parameters were calculated for each recording. Linear and non-linear relations between GAW and acoustic parameters were investigated using Pearson correlation coefficients (PCC) and distance correlation coefficients (DCC). Further, norm values for parameters obtained from 250 ms long sustained phonation data (vowel /i/) were provided. 26 PCCs in females (5.3%) and 8 in males (1.6%) were found to be statistically significant (|corr.| ≥ 0.3). Only minor differences were found between PCCs and DCCs, indicating that non-linear dependencies between parameters are weak. Fundamental frequency was involved in the majority of all relevant PCCs between GAW and acoustic parameters (19 in females and 7 in males). The most distinct difference between correlations in females and males was found for the parameter Period Variability Index. The study shows only weak relations between the investigated acoustic and GAW parameters. This indicates that the reduction of the complex 3D glottal dynamics to the 1D GAW may erase laryngeal dynamic characteristics that are reflected within the acoustic signal. Hence, other GAW parameters, 2D and 3D laryngeal dynamics, and vocal tract parameters should be further investigated for potential correlations with the acoustic signal.
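
To illustrate why the abstract above compares Pearson and distance correlation, the following minimal Python sketch computes both for a purely non-linear toy relationship. It is not the study's analysis code; the function names and the toy data are assumptions, and the distance correlation uses the standard sample definition based on double-centered distance matrices.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (captures linear dependence only)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def _centered_distances(v):
    """Pairwise absolute distances, double-centered by row/column/grand means."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # equals the column means (d is symmetric)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation (also sensitive to non-linear dependence)."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


# Hypothetical toy data: y depends on x only quadratically.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v ** 2 for v in x]

print(pearson(x, y))               # linear correlation vanishes here
print(distance_correlation(x, y))  # clearly above zero
```

When both coefficients are similar for a pair of parameters, as the abstract reports for most pairs, the dependence between them is close to linear.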


Subject(s)
Dysphonia/physiopathology , Glottis/physiopathology , Laryngoscopy/methods , Acoustics , Adult , Aged , Case-Control Studies , Female , Humans , Laryngoscopy/instrumentation , Male , Middle Aged , Video Recording , Voice Quality , Young Adult
19.
Sci Rep ; 10(1): 20723, 2020 11 26.
Article in English | MEDLINE | ID: mdl-33244031

ABSTRACT

A healthy voice is crucial for verbal communication and hence in daily as well as professional life. The basis for a healthy voice is the sound-producing vocal folds in the larynx. A hallmark of healthy vocal fold oscillation is the symmetric motion of the left and right vocal fold. Clinically, videoendoscopy is applied to assess the symmetry of the oscillation, which is then evaluated subjectively. High-speed videoendoscopy, an emerging method that allows quantification of the vocal fold oscillation, is more commonly employed in research due to the amount of data and the complex, semi-automatic analysis. In this study, we provide a comprehensive evaluation of methods that detect the glottal midline fully automatically. We used a biophysical model to simulate different vocal fold oscillations, extended the openly available BAGLS dataset using manual annotations, utilized both simulations and annotated endoscopic images to train deep neural networks at different stages of the analysis workflow, and compared these to established computer vision algorithms. We found that classical computer vision algorithms perform well on detecting the glottal midline in glottis segmentation data, but are outperformed by deep neural networks on this task. We further suggest GlottisNet, a multi-task neural architecture featuring the simultaneous prediction of both the opening between the vocal folds and the symmetry axis. By fully automating segmentation and midline detection, this is a major step towards clinical applicability of quantitative, deep learning-assisted laryngeal endoscopy.

20.
IEEE J Transl Eng Health Med ; 8: 2100511, 2020.
Article in English | MEDLINE | ID: mdl-32518739

ABSTRACT

BACKGROUND: Various voice assessment tools, such as questionnaires and aerodynamic voice characteristics, can be used to assess vocal function of individuals. However, not much is known about the best combinations of these parameters for identifying functional dysphonia in clinical settings. METHODS: This study investigated six scores from clinically commonly used questionnaires and seven acoustic parameters. 514 females and 277 males were analyzed. The subjects were divided into three groups: one healthy group (N01) (49 females, 50 males) and two disordered groups with perceptually hoarse (FD23) (220 females, 96 males) and perceptually not hoarse (FD01) (245 females, 131 males) sounding voices. A tree stumps Adaboost approach was applied to find the subset of parameters that best separates the groups. Subsequently, it was determined whether this parameter subset reflects treatment outcome for 120 female and 51 male patients by pairwise pre- and post-treatment comparisons of parameters. RESULTS: The questionnaire "Voice-related-quality-of-Life" and three objective parameters ("maximum fundamental frequency", "maximum Intensity" and "Jitter Percent") were sufficient to separate the groups (accuracy ranging from 0.690 (FD01 vs. FD23, females) to 0.961 (N01 vs. FD23, females)). Our study suggests that a reduced parameter subset (4 out of 13) is sufficient to separate these three groups. All parameters reflected treatment outcome for patients with hoarse voices; Voice-related-quality-of-Life showed improvement for the not hoarse group (FD01). CONCLUSION: Results show that single parameters are insufficient to separate voice disorders but a set of several well-chosen parameters is. These findings will help to optimize and reduce clinical assessment time.
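
The "tree stumps Adaboost approach" mentioned in the abstract above can be sketched in plain Python as follows. This is a generic, minimal AdaBoost with one-feature threshold rules, not the study's implementation; all function names and the toy data are assumptions for illustration. The features that the stumps end up using form the selected parameter subset.

```python
import math


def stump_predict(row, feature, threshold, sign):
    """Decision stump: predict `sign` if the feature is <= threshold, else -sign."""
    return sign if row[feature] <= threshold else -sign


def fit_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                err = sum(wi for wi, row, t in zip(w, X, y)
                          if stump_predict(row, f, thr, sign) != t)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best


def adaboost(X, y, rounds=5):
    """Train AdaBoost on labels in {-1, +1}; returns (alpha, feature, thr, sign) tuples."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, thr, sign = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, sign))
        # Increase the weight of misclassified samples, decrease the others.
        w = [wi * math.exp(-alpha * t * stump_predict(row, f, thr, sign))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Weighted vote of all stumps."""
    score = sum(a * stump_predict(row, f, thr, s) for a, f, thr, s in model)
    return 1 if score >= 0 else -1


def selected_features(model):
    """Indices of the features used by at least one stump."""
    return sorted({f for _, f, _, _ in model})


# Hypothetical toy data: only feature 0 separates the two groups.
X = [[1.0, 5.0, 0.3], [2.0, 1.0, 0.9], [3.0, 4.0, 0.1],
     [6.0, 2.0, 0.8], [7.0, 5.0, 0.2], [8.0, 1.0, 0.7]]
y = [-1, -1, -1, 1, 1, 1]

model = adaboost(X, y, rounds=3)
print(selected_features(model))  # [0]
```

On real data with overlapping groups, successive rounds would typically pick different features, and the number of rounds bounds the size of the selected subset.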
