Results 1 - 20 of 87
1.
Radiother Oncol ; 194: 110186, 2024 May.
Article in English | MEDLINE | ID: mdl-38412906

ABSTRACT

BACKGROUND: Accurate gross tumor volume (GTV) delineation is a critical step in radiation therapy treatment planning. However, it is reader dependent and thus susceptible to intra- and inter-reader variability. GTV delineation of soft tissue sarcoma (STS) often relies on CT and MR images. PURPOSE: This study investigates the potential role of 18F-FDG PET in reducing intra- and inter-reader variability, thereby improving reproducibility of GTV delineation in STS, without incurring additional costs or radiation exposure. MATERIALS AND METHODS: Three readers performed independent GTV delineation for 61 patients with STS, first using CT and MR images and then using CT, MR, and 18F-FDG PET images. Each reader performed a total of six delineation trials, three per imaging modality group. The Dice Similarity Coefficient (DSC) and Hausdorff distance (HD) were used to assess both intra- and inter-reader variability, using generated simultaneous truth and performance level estimation (STAPLE) GTVs as ground truth. Statistical analysis was performed using a Wilcoxon signed-rank test. RESULTS: There was a statistically significant decrease in both intra- and inter-reader variability in GTV delineation using CT, MR, and 18F-FDG PET images vs. CT and MR images alone. This was reflected in an increase in the DSC and a decrease in the HD for GTVs drawn from CT, MR, and 18F-FDG PET images vs. GTVs drawn from CT and MR, for all readers and across all three trials. CONCLUSION: Incorporating 18F-FDG PET alongside CT and MR images decreased intra- and inter-reader variability and thus increased the reproducibility of GTV delineation in STS.
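As a hedged illustration of the two agreement metrics named above, the following NumPy sketch computes a Dice score and a symmetric Hausdorff distance between binary GTV masks. The function names and toy masks are invented for the example; this is not the study's evaluation code.

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return 2.0 * inter / denom if denom else 1.0

def hausdorff(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Hausdorff distance between the foreground voxel sets
    of two binary masks (Euclidean, in voxel units)."""
    pa = np.argwhere(a.astype(bool))
    pb = np.argwhere(b.astype(bool))
    # pairwise distances between all foreground voxels of a and b
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

Higher DSC and lower HD against a STAPLE consensus mask then correspond to lower reader variability, as reported in the abstract.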


Subjects
Fluorodeoxyglucose F18, Magnetic Resonance Imaging, Positron-Emission Tomography, Sarcoma, Tumor Burden, Humans, Sarcoma/diagnostic imaging, Sarcoma/pathology, Sarcoma/radiotherapy, Positron-Emission Tomography/methods, Female, Male, Magnetic Resonance Imaging/methods, Middle Aged, Radiopharmaceuticals, Observer Variation, Adult, Aged, Reproducibility of Results, Tomography, X-Ray Computed/methods, Soft Tissue Neoplasms/diagnostic imaging, Soft Tissue Neoplasms/pathology, Soft Tissue Neoplasms/radiotherapy, Radiotherapy Planning, Computer-Assisted/methods
2.
Magn Reson Med ; 91(1): 61-74, 2024 01.
Article in English | MEDLINE | ID: mdl-37677043

ABSTRACT

PURPOSE: To improve the spatiotemporal quality of dynamic speech MRI through an improved data sampling and image reconstruction approach. METHODS: For data acquisition, we used a Poisson-disc random undersampling scheme that reduces undersampling coherence. For image reconstruction, we proposed a novel locally higher-rank partial separability model. This reconstruction model represents the oral and static regions using separate low-rank subspaces, thereby preserving their distinct temporal signal characteristics. A regionally optimized temporal basis was determined using a region-optimized virtual coil approach. Overall, we achieved better spatiotemporal image reconstruction quality with the potential to reduce total acquisition time by 50%. RESULTS: The proposed method was demonstrated through several 2-mm isotropic, 64-mm total thickness dynamic acquisitions at 40 frames per second and compared with the previous approach using a global subspace model along with other k-space sampling patterns. Individual timeframe images and temporal profiles of speech samples illustrate the ability of the Poisson-disc undersampling pattern to reduce total acquisition time. Temporal information in the sagittal and coronal directions also illustrates the effectiveness of the locally higher-rank operator and the regionally optimized temporal basis. To compare the reconstruction quality of different regions, voxel-wise temporal SNR analyses were performed. CONCLUSION: Poisson-disc sampling combined with a locally higher-rank model and a regionally optimized temporal basis can drastically improve spatiotemporal image quality and provide a 50% reduction in overall acquisition time.
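The sampling idea above can be caricatured with a toy dart-throwing sampler: random phase-encode locations are kept only if they lie at least a minimum distance from every accepted sample, which reduces undersampling coherence relative to purely uniform random sampling. This is a minimal sketch under assumed parameters (`radius`, `accel`), not the paper's sequence design, which also interleaves navigators.

```python
import numpy as np

def poisson_disc_mask(ny, nz, radius, accel, seed=0):
    """Binary k-space undersampling mask on an ny-by-nz phase-encode
    grid: dart throwing with a minimum spacing between samples."""
    rng = np.random.default_rng(seed)
    target = int(ny * nz / accel)          # number of sampled locations
    mask = np.zeros((ny, nz), dtype=bool)
    pts = []
    attempts = 0
    while len(pts) < target and attempts < 100 * target:
        y, z = int(rng.integers(ny)), int(rng.integers(nz))
        # accept only if far enough from every accepted sample
        if all((y - py) ** 2 + (z - pz) ** 2 >= radius ** 2
               for py, pz in pts):
            pts.append((y, z))
            mask[y, z] = True
        attempts += 1
    return mask
```

The minimum-distance constraint spreads samples more evenly than uniform random selection, so aliasing energy is less structured after low-rank reconstruction.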


Subjects
Magnetic Resonance Imaging, Speech, Magnetic Resonance Imaging/methods, Image Processing, Computer-Assisted/methods, Algorithms
3.
Interspeech ; 2023: 4189-4193, 2023 Aug.
Article in English | MEDLINE | ID: mdl-38107509

ABSTRACT

Finite element models (FEMs) of the tongue have facilitated speech studies through analysis of internal muscle forces indirectly derived from imaging data. In this work, we build a uniform hexahedral FEM of a tongue atlas constructed from magnetic resonance imaging data of a healthy population. The FEM is driven by the inverse internal tongue tissue kinematics of speakers, temporally aligned and deformed into the same atlas space while performing the speech task "a souk," allowing muscle activation predictions. This work aims to investigate the commonalities in tongue motor strategies in the articulation of "a souk" predicted by the inverse tongue atlas model. Our findings show variability among five speakers in estimated muscle activations, quantified with a similarity index based on a dynamic time warping function. Two speakers show a similarity index > 0.9 and two others < 0.7 with respect to a reference speaker for most tongue muscles. The relative motion tracking error of the model is less than 2%, which is promising for speech study applications.
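The similarity analysis above rests on dynamic time warping. A minimal textbook DTW distance between two activation time series can be sketched as follows; the abstract does not specify the authors' similarity index or normalization, so only the warping distance itself is shown.

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic time warping distance between two 1-D sequences,
    using absolute difference as the local cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of the three admissible warping steps
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because the warping path may repeat samples, sequences that differ only in speaking rate (e.g. a curve and a time-stretched copy of it) score a distance of zero.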

4.
Article in English | MEDLINE | ID: mdl-38009135

ABSTRACT

Investigating the relationship between internal tissue point motion of the tongue and oropharyngeal muscle deformation measured from tagged MRI and intelligible speech can aid in advancing speech motor control theories and developing novel treatment methods for speech-related disorders. However, elucidating the relationship between these two sources of information is challenging, due in part to the disparity in data structure between spatiotemporal motion fields (i.e., 4D motion fields) and one-dimensional audio waveforms. In this work, we present an efficient encoder-decoder translation network for exploring the predictive information inherent in 4D motion fields via 2D spectrograms as a surrogate of the audio data. Specifically, our encoder is based on 3D convolutional spatial modeling and transformer-based temporal modeling. The extracted features are processed by an asymmetric 2D convolution decoder to generate spectrograms that correspond to 4D motion fields. Furthermore, we incorporate a generative adversarial training approach into our framework to further improve the synthesis quality of our generated spectrograms. We experiment on 63 paired motion field sequences and speech waveforms, demonstrating that our framework enables the generation of clear audio waveforms from a sequence of motion fields. Thus, our framework has the potential to improve our understanding of the relationship between these two modalities and inform the development of treatments for speech disorders.
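The 2D spectrogram surrogate mentioned above is a standard short-time Fourier transform magnitude. A minimal Hann-windowed version can be sketched as follows; the frame length and hop size are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=64):
    """Magnitude spectrogram of a 1-D waveform via a Hann-windowed
    short-time Fourier transform. Returns (n_fft//2 + 1, n_frames)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # slice the waveform into overlapping windowed frames
    frames = np.stack([x[i * hop : i * hop + n_fft] * win
                       for i in range(n_frames)])
    # one-sided FFT per frame; transpose so frequency is the first axis
    return np.abs(np.fft.rfft(frames, axis=1)).T
```

For a pure tone, the energy concentrates in the FFT bin nearest the tone frequency (bin k covers k * fs / n_fft Hz), which is why a spectrogram is a workable fixed-structure proxy for the waveform.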

5.
Article in English | MEDLINE | ID: mdl-38031559

ABSTRACT

Cardiac cine magnetic resonance imaging (MRI) has been used to characterize cardiovascular diseases (CVD), often providing a noninvasive phenotyping tool. While recently developed deep learning-based approaches using cine MRI yield accurate characterization results, their performance is often degraded by small training samples. In addition, many deep learning models are deemed a "black box," remaining largely opaque as to how they yield a prediction and how reliable it is. To alleviate this, this work proposes a lightweight successive subspace learning (SSL) framework for CVD classification, based on an interpretable feedforward design, in conjunction with a cardiac atlas. Specifically, our hierarchical SSL model is based on (i) neighborhood voxel expansion, (ii) unsupervised subspace approximation, (iii) supervised regression, and (iv) multi-level feature integration. In addition, using two-phase 3D deformation fields, including end-diastolic and end-systolic phases, derived between the atlas and individual subjects as input offers an objective means of assessing CVD, even with small training samples. We evaluate our framework on the ACDC2017 database, comprising one healthy group and four disease groups. Compared with 3D CNN-based approaches, our framework achieves superior classification performance with 140× fewer parameters, which supports its potential value in clinical use.

6.
Article in English | MEDLINE | ID: mdl-37621417

ABSTRACT

New developments in dynamic magnetic resonance imaging (MRI) facilitate high-quality data acquisition of human velopharyngeal deformations in real-time speech. With recently established speech motion atlases, group analysis is made possible via spatially and temporally aligned datasets in the atlas space from a desired population of interest. In practice, when analyzing motion characteristics from various subjects performing a designated speech task, different subjects' velopharyngeal deformation patterns can vary during the pronunciation of the same utterance, regardless of the spatial and temporal alignment of their MRI. Since such variation can be subtle, identifying and extracting unique patterns from these high-dimensional datasets is a challenging task. In this work, we present a method that computes and visualizes subtle deformation variation patterns as principal components of a subject group's dynamic motion fields in the atlas space. Coupled with the real-time speech audio recordings made during image acquisition, the key time frames that contain maximum speech variation are identified by the principal components of temporally aligned audio waveforms, which in turn inform the temporal location of the maximum spatial deformation variation. The motion fields between the key frames and the reference frame for each subject are then computed and warped into the common atlas space, enabling direct extraction of motion variation patterns via quantitative analysis. The method was evaluated on a dataset of twelve healthy subjects. Subtle velopharyngeal motion differences were visualized quantitatively to reveal pronunciation-specific patterns among different subjects.
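Extracting variation patterns as principal components, as described above, can be sketched with a plain SVD-based PCA over flattened, atlas-aligned motion fields. The array layout and function name below are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np

def motion_pca(fields, k=2):
    """PCA of a subject group's motion fields in a common atlas space.
    fields: (n_subjects, n_features) array of flattened displacement
    fields. Returns the group mean, k principal variation patterns,
    per-subject scores, and the explained-variance ratios."""
    mean = fields.mean(axis=0)
    X = fields - mean                       # center across subjects
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    components = Vt[:k]                     # spatial variation patterns
    scores = U[:, :k] * s[:k]               # per-subject loadings
    explained = s ** 2 / (s ** 2).sum()
    return mean, components, scores, explained
```

Each component is itself a displacement field, so it can be warped back and visualized as a deformation variation pattern; the scores say how strongly each subject expresses it.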

7.
ArXiv ; 2023 May 30.
Article in English | MEDLINE | ID: mdl-37396599

ABSTRACT

Deep learning (DL) models for segmenting various anatomical structures have achieved great success via a static DL model that is trained in a single source domain. Yet, the static DL model is likely to perform poorly in a continually evolving environment, requiring appropriate model updates. In an incremental learning setting, we would expect that well-trained static models are updated, following continually evolving target domain data-e.g., additional lesions or structures of interest-collected from different sites, without catastrophic forgetting. This, however, poses challenges, due to distribution shifts, additional structures not seen during the initial model training, and the absence of training data in a source domain. To address these challenges, in this work, we seek to progressively evolve an "off-the-shelf" trained segmentation model to diverse datasets with additional anatomical categories in a unified manner. Specifically, we first propose a divergence-aware dual-flow module with balanced rigidity and plasticity branches to decouple old and new tasks, which is guided by continuous batch renormalization. Then, a complementary pseudo-label training scheme with self-entropy regularized momentum MixUp decay is developed for adaptive network optimization. We evaluated our framework on a brain tumor segmentation task with continually changing target domains-i.e., new MRI scanners/modalities with incremental structures. Our framework was able to well retain the discriminability of previously learned structures, hence enabling the realistic life-long segmentation model extension along with the widespread accumulation of big medical data.

8.
Cleft Palate Craniofac J ; : 10556656231183385, 2023 Jun 19.
Article in English | MEDLINE | ID: mdl-37335134

ABSTRACT

OBJECTIVE: To introduce a highly innovative imaging method to study the complex velopharyngeal (VP) system and introduce the potential future clinical applications of a VP atlas in cleft care. DESIGN: Four healthy adults participated in a 20-min dynamic magnetic resonance imaging scan that included a high-resolution T2-weighted turbo-spin-echo 3D structural scan and five custom dynamic speech imaging scans. Subjects repeated a variety of phrases while in the scanner as real-time audio was captured. SETTING: Multisite institution and clinical setting. PARTICIPANTS: Four adult subjects with normal anatomy were recruited for this study. MAIN OUTCOME: Establishment of a 4D atlas constructed from dynamic VP MRI data. RESULTS: Three-dimensional dynamic magnetic resonance imaging was successfully used to obtain high-quality dynamic speech scans in an adult population. Scans could be re-sliced in various imaging planes. Subject-specific MR data were then reconstructed and time-aligned to create a velopharyngeal atlas representing the averaged physiological movements across the four subjects. CONCLUSIONS: This preliminary study examined the feasibility of developing a VP atlas for potential clinical applications in cleft care. Our results indicate excellent potential for the development and use of a VP atlas for assessing VP physiology during speech.

9.
Med Image Anal ; 88: 102851, 2023 08.
Article in English | MEDLINE | ID: mdl-37329854

ABSTRACT

Self-training is an important class of unsupervised domain adaptation (UDA) approaches that are used to mitigate the problem of domain shift, when applying knowledge learned from a labeled source domain to unlabeled and heterogeneous target domains. While self-training-based UDA has shown considerable promise on discriminative tasks, including classification and segmentation, through reliable pseudo-label filtering based on the maximum softmax probability, there is a paucity of prior work on self-training-based UDA for generative tasks, including image modality translation. To fill this gap, in this work, we seek to develop a generative self-training (GST) framework for domain adaptive image translation with continuous value prediction and regression objectives. Specifically, we quantify both aleatoric and epistemic uncertainties within our GST using variational Bayes learning to measure the reliability of synthesized data. We also introduce a self-attention scheme that de-emphasizes the background region to prevent it from dominating the training process. The adaptation is then carried out by an alternating optimization scheme with target domain supervision that focuses attention on the regions with reliable pseudo-labels. We evaluated our framework on two cross-scanner/center, inter-subject translation tasks, including tagged-to-cine magnetic resonance (MR) image translation and T1-weighted MR-to-fractional anisotropy translation. Extensive validations with unpaired target domain data showed that our GST yielded superior synthesis performance in comparison to adversarial training UDA methods.


Subjects
Image Processing, Computer-Assisted, Learning, Humans, Bayes Theorem, Reproducibility of Results, Anisotropy, Uncertainty
10.
ArXiv ; 2023 May 23.
Article in English | MEDLINE | ID: mdl-37292465

ABSTRACT

Self-training is an important class of unsupervised domain adaptation (UDA) approaches that are used to mitigate the problem of domain shift, when applying knowledge learned from a labeled source domain to unlabeled and heterogeneous target domains. While self-training-based UDA has shown considerable promise on discriminative tasks, including classification and segmentation, through reliable pseudo-label filtering based on the maximum softmax probability, there is a paucity of prior work on self-training-based UDA for generative tasks, including image modality translation. To fill this gap, in this work, we seek to develop a generative self-training (GST) framework for domain adaptive image translation with continuous value prediction and regression objectives. Specifically, we quantify both aleatoric and epistemic uncertainties within our GST using variational Bayes learning to measure the reliability of synthesized data. We also introduce a self-attention scheme that de-emphasizes the background region to prevent it from dominating the training process. The adaptation is then carried out by an alternating optimization scheme with target domain supervision that focuses attention on the regions with reliable pseudo-labels. We evaluated our framework on two cross-scanner/center, inter-subject translation tasks, including tagged-to-cine magnetic resonance (MR) image translation and T1-weighted MR-to-fractional anisotropy translation. Extensive validations with unpaired target domain data showed that our GST yielded superior synthesis performance in comparison to adversarial training UDA methods.

11.
ArXiv ; 2023 Mar 17.
Article in English | MEDLINE | ID: mdl-36994161

ABSTRACT

Background: In medical imaging, images are usually treated as deterministic, while their uncertainties are largely underexplored. Purpose: This work aims at using deep learning to efficiently estimate posterior distributions of imaging parameters, which in turn can be used to derive the most probable parameters as well as their uncertainties. Methods: Our deep learning-based approaches are based on a variational Bayesian inference framework, which is implemented using two different deep neural networks based on conditional variational auto-encoder (CVAE), CVAE-dual-encoder and CVAE-dual-decoder. The conventional CVAE framework, i.e., CVAE-vanilla, can be regarded as a simplified case of these two neural networks. We applied these approaches to a simulation study of dynamic brain PET imaging using a reference region-based kinetic model. Results: In the simulation study, we estimated posterior distributions of PET kinetic parameters given a measurement of time-activity curve. Our proposed CVAE-dual-encoder and CVAE-dual-decoder yield results that are in good agreement with the asymptotically unbiased posterior distributions sampled by Markov Chain Monte Carlo (MCMC). The CVAE-vanilla can also be used for estimating posterior distributions, although it has an inferior performance to both CVAE-dual-encoder and CVAE-dual-decoder. Conclusions: We have evaluated the performance of our deep learning approaches for estimating posterior distributions in dynamic brain PET. Our deep learning approaches yield posterior distributions, which are in good agreement with unbiased distributions estimated by MCMC. All these neural networks have different characteristics and can be chosen by the user for specific applications. The proposed methods are general and can be adapted to other problems.

12.
J Speech Lang Hear Res ; 66(2): 513-526, 2023 02 13.
Article in English | MEDLINE | ID: mdl-36716389

ABSTRACT

PURPOSE: Muscle groups within the tongue in healthy and diseased populations show different behaviors during speech. Visualizing and quantifying the strain patterns of these muscle groups during tongue motion can provide insights into tongue motor control and the adaptive behaviors of a patient. METHOD: We present a pipeline to estimate the strain along the muscle fiber directions in the deforming tongue during speech production. A deep convolutional network estimates the crossing muscle fiber directions in the tongue using diffusion-weighted magnetic resonance imaging (MRI) data acquired at rest. A phase-based registration algorithm is used to estimate motion of the tongue muscles from tagged MRI acquired during speech. After transforming both muscle fiber directions and motion fields into a common atlas space, strain tensors are computed and projected onto the muscle fiber directions, forming so-called strains in the line of action (SLAs) throughout the tongue. SLAs are then averaged over individual muscles that have been manually labeled in the atlas space using high-resolution T2-weighted MRI. Data were acquired, and this pipeline was run, on a cohort of eight healthy controls and two glossectomy patients. RESULTS: The crossing muscle fibers reconstructed by the deep network show orthogonal patterns. The strain analysis results demonstrate consistency of muscle behaviors among some healthy controls during speech production. The patients show irregular muscle patterns, and their tongue muscles tend to show more extension than those of the healthy controls. CONCLUSIONS: The study showed visual evidence of correlation between two muscle groups during speech production. Patients tend to have different strain patterns compared to the controls. Analysis of variations in muscle strain can potentially help develop treatment strategies for oral diseases. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.21957011.
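Projecting a strain tensor onto a fiber direction, the core of the SLA computation above, is a small and well-defined calculation. Assuming a Green-Lagrange strain tensor derived from a deformation gradient F, the strain in the line of action is fᵀEf; the function name is illustrative.

```python
import numpy as np

def strain_in_line_of_action(F, fiber):
    """Strain along a muscle-fiber direction.
    F: 3x3 deformation gradient at a voxel.
    fiber: fiber direction (need not be unit length)."""
    E = 0.5 * (F.T @ F - np.eye(3))     # Green-Lagrange strain tensor
    f = fiber / np.linalg.norm(fiber)   # normalize the fiber direction
    return float(f @ E @ f)             # scalar strain along the fiber
```

A positive value indicates extension along the fiber and a negative value indicates shortening, which is what makes per-muscle averages of SLAs interpretable as contraction/extension patterns.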


Subjects
Magnetic Resonance Imaging, Speech, Humans, Speech/physiology, Magnetic Resonance Imaging/methods, Tongue/diagnostic imaging, Tongue/physiology, Glossectomy, Muscle Fibers, Skeletal
13.
Med Phys ; 50(3): 1539-1548, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36331429

ABSTRACT

BACKGROUND: In medical imaging, images are usually treated as deterministic, while their uncertainties are largely underexplored. PURPOSE: This work aims at using deep learning to efficiently estimate posterior distributions of imaging parameters, which in turn can be used to derive the most probable parameters as well as their uncertainties. METHODS: Our deep learning-based approaches are based on a variational Bayesian inference framework, which is implemented using two different deep neural networks based on conditional variational auto-encoder (CVAE), CVAE-dual-encoder, and CVAE-dual-decoder. The conventional CVAE framework, that is, CVAE-vanilla, can be regarded as a simplified case of these two neural networks. We applied these approaches to a simulation study of dynamic brain PET imaging using a reference region-based kinetic model. RESULTS: In the simulation study, we estimated posterior distributions of PET kinetic parameters given a measurement of the time-activity curve. Our proposed CVAE-dual-encoder and CVAE-dual-decoder yield results that are in good agreement with the asymptotically unbiased posterior distributions sampled by Markov Chain Monte Carlo (MCMC). The CVAE-vanilla can also be used for estimating posterior distributions, although it has an inferior performance to both CVAE-dual-encoder and CVAE-dual-decoder. CONCLUSIONS: We have evaluated the performance of our deep learning approaches for estimating posterior distributions in dynamic brain PET. Our deep learning approaches yield posterior distributions, which are in good agreement with unbiased distributions estimated by MCMC. All these neural networks have different characteristics and can be chosen by the user for specific applications. The proposed methods are general and can be adapted to other problems.


Subjects
Deep Learning, Bayes Theorem, Positron-Emission Tomography/methods, Computer Simulation, Neural Networks, Computer
14.
Med Image Anal ; 83: 102641, 2023 01.
Article in English | MEDLINE | ID: mdl-36265264

ABSTRACT

Unsupervised domain adaptation (UDA) has been a vital protocol for migrating information learned from a labeled source domain to facilitate the implementation in an unlabeled heterogeneous target domain. Although UDA is typically jointly trained on data from both domains, accessing the labeled source domain data is often restricted, due to concerns over patient data privacy or intellectual property. To sidestep this, we propose "off-the-shelf (OS)" UDA (OSUDA), aimed at image segmentation, by adapting an OS segmentor trained in a source domain to a target domain, in the absence of source domain data in adaptation. Toward this goal, we aim to develop a novel batch-wise normalization (BN) statistics adaptation framework. In particular, we gradually adapt the domain-specific low-order BN statistics, e.g., mean and variance, through an exponential momentum decay strategy, while explicitly enforcing the consistency of the domain shareable high-order BN statistics, e.g., scaling and shifting factors, via our optimization objective. We also adaptively quantify the channel-wise transferability to gauge the importance of each channel, via both low-order statistics divergence and a scaling factor. Furthermore, we incorporate unsupervised self-entropy minimization into our framework to boost performance alongside a novel queued, memory-consistent self-training strategy to utilize the reliable pseudo label for stable and efficient unsupervised adaptation. We evaluated our OSUDA-based framework on both cross-modality and cross-subtype brain tumor segmentation and cardiac MR to CT segmentation tasks. Our experimental results showed that our memory consistent OSUDA performs better than existing source-relaxed UDA methods and yields similar performance to UDA methods with source data.
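The low-order statistics adaptation described above can be caricatured as an exponential moving average of BatchNorm mean and variance toward target-domain batches. All names and the fixed decay below are illustrative assumptions; the paper's exponential momentum decay strategy and high-order consistency terms are not reproduced here.

```python
import numpy as np

def adapt_bn_stats(mean, var, batch, decay):
    """One source-free adaptation step of BatchNorm's low-order
    statistics (per-channel mean and variance) toward a target-domain
    batch of shape (batch_size, n_channels)."""
    b_mean = batch.mean(axis=0)
    b_var = batch.var(axis=0)
    new_mean = decay * mean + (1 - decay) * b_mean
    new_var = decay * var + (1 - decay) * b_var
    return new_mean, new_var
```

Iterating this over target batches drifts the normalization statistics toward the target domain without ever touching source data, which is the "off-the-shelf" constraint the abstract describes.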


Subjects
Brain Neoplasms, Learning, Humans, Brain Neoplasms/diagnostic imaging, Entropy, Heart, Motion
15.
Magn Reson Med ; 89(2): 652-664, 2023 02.
Article in English | MEDLINE | ID: mdl-36289572

ABSTRACT

PURPOSE: To enable a more comprehensive view of articulation during speech through near-isotropic 3D dynamic MRI with high spatiotemporal resolution and large vocal-tract coverage. METHODS: Using partial separability model-based low-rank reconstruction coupled with sparse acquisition of both spatial and temporal data, we are able to achieve near-isotropic-resolution 3D imaging with a high frame rate. The total acquisition time of the speech acquisition is shortened by introducing a sparse temporal sampling that interleaves one temporal navigator with four randomized phase- and slice-encoded imaging samples. Memory and computation time are improved by compressing coils based on the region of interest for low-rank constrained reconstruction with an edge-preserving spatial penalty. RESULTS: The proposed method has been evaluated through experiments on several speech samples, including a standard reading passage. A near-isotropic 1.875 × 1.875 × 2 mm3 spatial resolution, 64-mm through-plane coverage, and a 35.6-fps temporal resolution are achieved. Investigation and analysis of specific speech samples support novel insights into nonsymmetric tongue movement, velum raising, and coarticulation events, with adequate visualization of rapid articulatory movements. CONCLUSION: Three-dimensional dynamic images of the vocal tract structures during speech with high spatiotemporal resolution and axial coverage are capable of enhancing linguistic research, enabling visualization of soft tissue motions that are not possible with other modalities.


Subjects
Magnetic Resonance Imaging, Speech, Magnetic Resonance Imaging/methods, Imaging, Three-Dimensional/methods, Language, Linguistics
16.
IEEE Trans Biomed Eng ; 70(4): 1252-1263, 2023 04.
Article in English | MEDLINE | ID: mdl-36227815

ABSTRACT

Deep learning (DL)-based automatic sleep staging approaches have attracted much attention recently due in part to their outstanding accuracy. At the testing stage, however, the performance of these approaches is likely to be degraded, when applied in different testing environments, because of the problem of domain shift. This is because while a pre-trained model is typically trained on noise-free electroencephalogram (EEG) signals acquired from accurate medical equipment, deployment is carried out on consumer-level devices with undesirable noise. To alleviate this challenge, in this work, we propose an efficient training approach that is robust against unseen arbitrary noise. In particular, we propose to generate the worst-case input perturbations by means of adversarial transformation in an auxiliary model, to learn a wide range of input perturbations and thereby to improve reliability. Our approach is based on two separate training models: (i) an auxiliary model to generate adversarial noise and (ii) a target network to incorporate the noise signal to enhance robustness. Furthermore, we exploit novel class-wise robustness during the training of the target network to represent different robustness patterns of each sleep stage. Our experimental results demonstrated that our approach improved sleep staging performance on healthy controls, in the presence of moderate to severe noise levels, compared with competing methods. Our approach was able to effectively train and deploy a DL model to handle different types of noise, including adversarial, Gaussian, and shot noise.


Subjects
Electroencephalography, Sleep Stages, Reproducibility of Results, Normal Distribution
17.
Med Image Comput Comput Assist Interv ; 14226: 435-445, 2023 Oct.
Article in English | MEDLINE | ID: mdl-38651032

ABSTRACT

The tongue's intricate 3D structure, comprising localized functional units, plays a crucial role in the production of speech. When measured using tagged MRI, these functional units exhibit cohesive displacements and derived quantities that facilitate the complex process of speech production. Non-negative matrix factorization-based approaches have been shown to estimate the functional units through motion features, yielding a set of building blocks and a corresponding weighting map. Investigating the link between weighting maps and speech acoustics can offer significant insights into the intricate process of speech production. To this end, in this work, we utilize two-dimensional spectrograms as a proxy representation, and develop an end-to-end deep learning framework for translating weighting maps to their corresponding audio waveforms. Our proposed plastic light transformer (PLT) framework is based on directional product relative position bias and single-level spatial pyramid pooling, thus enabling flexible processing of weighting maps with variable size to fixed-size spectrograms, without input information loss or dimension expansion. Additionally, our PLT framework efficiently models the global correlation of wide matrix input. To improve the realism of our generated spectrograms with relatively limited training samples, we apply pair-wise utterance consistency with Maximum Mean Discrepancy constraint and adversarial training. Experimental results on a dataset of 29 subjects speaking two utterances demonstrated that our framework is able to synthesize speech audio waveforms from weighting maps, outperforming conventional convolution and transformer models.
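The factorization step named above, non-negative matrix factorization of motion features into building blocks and a weighting map, can be sketched with the classic Lee-Seung multiplicative updates. This is a generic NMF under assumed names, not the authors' motion-feature pipeline.

```python
import numpy as np

def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative feature matrix V (n x m) as V ≈ W H with
    W (n x k) building blocks and H (k x m) weighting map, using
    Lee-Seung multiplicative updates for the Frobenius objective."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # multiplicative updates keep W and H non-negative by design
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The non-negativity constraint is what makes the columns of W interpretable as additive building blocks (candidate functional units) rather than signed, cancelling components.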

18.
Med Image Comput Comput Assist Interv ; 14221: 46-56, 2023 Oct.
Article in English | MEDLINE | ID: mdl-38665992

ABSTRACT

Deep learning (DL) models for segmenting various anatomical structures have achieved great success via a static DL model that is trained in a single source domain. Yet, the static DL model is likely to perform poorly in a continually evolving environment, requiring appropriate model updates. In an incremental learning setting, we would expect that well-trained static models are updated, following continually evolving target domain data-e.g., additional lesions or structures of interest-collected from different sites, without catastrophic forgetting. This, however, poses challenges, due to distribution shifts, additional structures not seen during the initial model training, and the absence of training data in a source domain. To address these challenges, in this work, we seek to progressively evolve an "off-the-shelf" trained segmentation model to diverse datasets with additional anatomical categories in a unified manner. Specifically, we first propose a divergence-aware dual-flow module with balanced rigidity and plasticity branches to decouple old and new tasks, which is guided by continuous batch renormalization. Then, a complementary pseudo-label training scheme with self-entropy regularized momentum MixUp decay is developed for adaptive network optimization. We evaluated our framework on a brain tumor segmentation task with continually changing target domains-i.e., new MRI scanners/modalities with incremental structures. Our framework was able to well retain the discriminability of previously learned structures, hence enabling the realistic life-long segmentation model extension along with the widespread accumulation of big medical data.

19.
Article in English | MEDLINE | ID: mdl-36203947

ABSTRACT

Cycle reconstruction regularized adversarial training-e.g., CycleGAN, DiscoGAN, and DualGAN-has been widely used for image style transfer with unpaired training data. Several recent works, however, have shown that local distortions are frequent, and structural consistency cannot be guaranteed. Targeting this issue, prior works usually relied on additional segmentation or consistent feature extraction steps that are task-specific. To counter this, this work aims to learn a general add-on structural feature extractor, by explicitly enforcing the structural alignment between an input and its synthesized image. Specifically, we propose a novel input-output image patches self-training scheme to achieve a disentanglement of underlying anatomical structures and imaging modalities. The translator and structure encoder are updated, following an alternating training protocol. In addition, the information w.r.t. imaging modality can be eliminated with an asymmetric adversarial game. We train, validate, and test our network on 1,768, 416, and 1,560 unpaired subject-independent slices of tagged and cine magnetic resonance imaging from a total of twenty healthy subjects, respectively, demonstrating superior performance over competing methods.

20.
Article in English | MEDLINE | ID: mdl-36212702

ABSTRACT

Multimodal representation learning using visual movements from cine magnetic resonance imaging (MRI) and their acoustics has shown great potential to learn shared representation and to predict one modality from another. Here, we propose a new synthesis framework to translate from cine MRI sequences to spectrograms with a limited dataset size. Our framework hinges on a novel fully convolutional heterogeneous translator, with a 3D CNN encoder for efficient sequence encoding and a 2D transpose convolution decoder. In addition, a pairwise correlation of the samples with the same speech word is utilized with a latent space representation disentanglement scheme. Furthermore, an adversarial training approach with generative adversarial networks is incorporated to provide enhanced realism on our generated spectrograms. Our experimental results, carried out with a total of 63 cine MRI sequences alongside speech acoustics, show that our framework improves synthesis accuracy, compared with competing methods. Our framework thereby has shown the potential to aid in better understanding the relationship between the two modalities.
