Results 1 - 12 of 12
1.
Sci Rep ; 14(1): 15569, 2024 Jul 06.
Article in English | MEDLINE | ID: mdl-38971838

ABSTRACT

Auditory and visual signals are two primary perception modalities that are usually present together and correlate with each other, not only in natural environments but also in clinical settings. However, audio-visual modelling in the latter case can be more challenging, due to the different sources of audio/video signals and the noise (both signal-level and semantic-level) in auditory signals, which are usually speech audio. In this study, we consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations that benefit various clinical tasks, without relying on dense supervisory annotations from human experts for model training. A simple yet effective multi-modal self-supervised learning framework is presented for this purpose. The proposed approach is able to help find standard anatomical planes, predict the focus of the sonographer's gaze, and localise anatomical regions of interest during ultrasound imaging. Experimental analysis on a large-scale clinical multi-modal ultrasound video dataset shows that the proposed novel representation learning method provides good transferable anatomical representations that boost the performance of automated downstream clinical tasks, even outperforming fully-supervised solutions. Being able to learn such medical representations in a self-supervised manner will contribute to several aspects, including a better understanding of obstetric imaging, training new sonographers, more effective assistive tools for human experts, and enhancement of the clinical workflow.
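To make the transfer step concrete, here is a minimal sketch (in PyTorch, with made-up layer sizes) of how a pretrained self-supervised video encoder could be reused for a downstream task such as standard plane classification via a frozen linear probe; the stand-in encoder and plane count are placeholders, not the paper's actual model.

```python
# Hypothetical sketch: reusing a self-supervised encoder for a downstream
# standard-plane classification task via a frozen linear probe.
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    """Frozen pretrained encoder + trainable linear classifier."""
    def __init__(self, encoder: nn.Module, feat_dim: int, n_planes: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False          # keep the learned representation fixed
        self.head = nn.Linear(feat_dim, n_planes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            feats = self.encoder(frames)     # (B, feat_dim) anatomical representation
        return self.head(feats)              # (B, n_planes) plane logits

# Toy usage with a stand-in encoder; the real model would be the
# self-supervised video network described in the abstract.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
probe = LinearProbe(encoder, feat_dim=128, n_planes=14)
logits = probe(torch.randn(8, 3, 64, 64))    # 8 ultrasound frames
print(logits.shape)                          # torch.Size([8, 14])
```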


Subjects
Ultrasonography, Humans, Ultrasonography/methods, Female, Visual Perception/physiology, Video Recording
2.
Article in English | MEDLINE | ID: mdl-37665699

ABSTRACT

Monitoring the healthy development of a fetus requires accurate and timely identification of different maternal-fetal structures as they grow. To facilitate this objective in an automated fashion, we propose a deep-learning-based image classification architecture called COMFormer to classify maternal-fetal and brain anatomical structures present in 2-D fetal ultrasound (US) images. The proposed architecture classifies the two subcategories separately: maternal-fetal [abdomen, brain, femur, thorax, mother's cervix (MC), and others] and brain anatomical structures [trans-thalamic (TT), trans-cerebellum (TC), trans-ventricular (TV), and non-brain (NB)]. Our proposed architecture relies on a transformer-based approach that leverages spatial and global features using a newly designed residual cross-variance attention block. This block introduces an advanced cross-covariance attention (XCA) mechanism to capture a long-range representation from the input using spatial (e.g., shape, texture, intensity) and global features. To build COMFormer, we used a large publicly available dataset (BCNatal) consisting of 12,400 images from 1,792 subjects. Experimental results show that COMFormer outperforms recent CNN- and transformer-based models, achieving 95.64% and 96.33% classification accuracy on maternal-fetal and brain anatomy, respectively.
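As a rough illustration of the mechanism the block builds on, the following is a generic cross-covariance attention (XCA) head in the spirit of XCiT, where attention is computed between feature channels rather than tokens; the head count, dimensions, and residual wiring of COMFormer's actual block are not reproduced here and are assumptions.

```python
# Hedged sketch of a generic cross-covariance attention (XCA) head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class XCAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:          # x: (B, N, C)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 4, 1)                      # each (B, heads, C_h, N)
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)     # L2-normalise over tokens
        attn = (q @ k.transpose(-2, -1)) * self.temperature       # (B, heads, C_h, C_h)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).permute(0, 3, 1, 2).reshape(B, N, C)     # channel mixing over tokens
        return self.proj(out)

tokens = torch.randn(2, 196, 64)          # e.g. 14x14 patch tokens from a US image
print(XCAttention(64)(tokens).shape)      # torch.Size([2, 196, 64])
```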


Subjects
Brain, Ultrasonography, Prenatal, Female, Pregnancy, Humans, Brain/diagnostic imaging, Ultrasonography, Electric Power Supplies, Femur
3.
Med Image Anal ; 82: 102630, 2022 11.
Article in English | MEDLINE | ID: mdl-36223683

ABSTRACT

In this work, we present a novel gaze-assisted natural language processing (NLP)-based video captioning model to describe routine second-trimester fetal ultrasound scan videos in a vocabulary of spoken sonography. The primary novelty of our multi-modal approach is that the learned video captioning model is built using a combination of ultrasound video, tracked gaze, and textual transcriptions from speech recordings. The textual captions that describe the spatio-temporal scan video content are learnt from sonographer speech recordings. Caption generation is assisted by sonographer gaze-tracking information reflecting their visual attention while performing live imaging and interpreting a frozen image. To evaluate the effect of adding, or withholding, different forms of gaze on the video model, we compare spatio-temporal deep networks trained using three multi-modal configurations, namely: (1) a gaze-less neural network with only text and video as input, (2) a neural network additionally using real sonographer gaze in the form of attention maps, and (3) a neural network using automatically predicted gaze in the form of saliency maps instead. We assess algorithm performance through established general text-based metrics (BLEU, ROUGE-L, F1 score), a domain-specific metric (ARS), and metrics that consider the richness and efficiency of the generated captions with respect to the scan video. Results show that the proposed gaze-assisted models can generate richer and more diverse captions for clinical fetal ultrasound scan videos than those without gaze, at the expense of perceived sentence structure. The results also show that the generated captions are similar to sonographer speech in terms of discussing the visual content and the scanning actions performed.
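One simple way gaze information can steer a visual model is to use the gaze attention or saliency map as a spatial weighting over video features; the sketch below illustrates that idea only and is an assumption, not the captioning architecture evaluated in the paper.

```python
# Hedged sketch: pooling spatial video features under a gaze/saliency map.
import torch
import torch.nn.functional as F

def gaze_weighted_features(feat_map: torch.Tensor, gaze_map: torch.Tensor) -> torch.Tensor:
    """feat_map: (B, C, H, W) visual features; gaze_map: (B, 1, H', W') attention/saliency."""
    gaze = F.interpolate(gaze_map, size=feat_map.shape[-2:], mode="bilinear", align_corners=False)
    gaze = gaze / (gaze.sum(dim=(-2, -1), keepdim=True) + 1e-8)   # normalise to a spatial distribution
    return (feat_map * gaze).sum(dim=(-2, -1))                    # (B, C) gaze-pooled descriptor

feats = torch.randn(4, 512, 14, 14)
gaze = torch.rand(4, 1, 224, 224)
print(gaze_weighted_features(feats, gaze).shape)   # torch.Size([4, 512])
```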


Subjects
Algorithms, Neural Networks, Computer, Humans, Pregnancy, Female, Ultrasonography, Prenatal
4.
Med Image Underst Anal (2022) ; 13413: 187-198, 2022 Jul.
Article in English | MEDLINE | ID: mdl-36848308

ABSTRACT

Medical image captioning models generate text to describe the semantic contents of an image, aiding non-experts in understanding and interpretation. We propose a weakly-supervised approach to improve the performance of image captioning models on small image-text datasets by leveraging a large anatomically-labelled image classification dataset. Our method generates pseudo-captions (weak labels) for caption-less but anatomically-labelled (class-labelled) images using an encoder-decoder sequence-to-sequence model. The augmented dataset is then used to train an image-captioning model in a weakly supervised manner. For fetal ultrasound, we demonstrate that the proposed augmentation approach outperforms the baseline on semantics- and syntax-based metrics, with nearly twice the improvement on BLEU-1 and ROUGE-L. Moreover, models trained with the proposed data augmentation outperform those trained with existing regularization techniques. This work allows seamless automatic annotation of images that lack human-prepared descriptive captions for training image-captioning models. Using pseudo-captions in the training data is particularly useful for medical image captioning, where obtaining real image captions requires significant time and effort from medical experts.
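The sketch below illustrates the augmentation idea: caption-less, class-labelled images receive pseudo-captions from a label-to-text generator and are pooled with the human-captioned set. The `generate_pseudo_caption` callable is a hypothetical stand-in for the paper's encoder-decoder sequence-to-sequence model.

```python
# Hedged sketch of pseudo-caption (weak label) augmentation.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class USImage:
    path: str
    label: Optional[str] = None      # anatomical class, e.g. "femur"
    caption: Optional[str] = None    # human transcription, if available

def build_training_set(images: List[USImage],
                       generate_pseudo_caption: Callable[[str], str]) -> List[USImage]:
    augmented = []
    for img in images:
        if img.caption is None and img.label is not None:
            img.caption = generate_pseudo_caption(img.label)   # weak label
        if img.caption is not None:
            augmented.append(img)
    return augmented

# Toy stand-in for the sequence-to-sequence generator.
data = [USImage("a.png", caption="this is the fetal femur"), USImage("b.png", label="abdomen")]
print([x.caption for x in build_training_set(data, lambda lbl: f"view of the fetal {lbl}")])
```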

5.
Diagnostics (Basel) ; 13(1)2022 Dec 29.
Article in English | MEDLINE | ID: mdl-36611396

ABSTRACT

Medical image analysis methods for mammograms, ultrasound, and magnetic resonance imaging (MRI) cannot provide the underlying cellular-level features needed to understand the cancer microenvironment, which makes them unsuitable for breast cancer subtype classification. In this paper, we propose a convolutional neural network (CNN)-based breast cancer classification method for hematoxylin and eosin (H&E) whole slide images (WSIs). The proposed method incorporates fused mobile inverted bottleneck convolutions (FMB-Conv) and mobile inverted bottleneck convolutions (MBConv) with a dual squeeze and excitation (DSE) network to accurately classify breast cancer tissue into binary (benign and malignant) and eight subtype classes using histopathology images. For that, a pre-trained EfficientNetV2 network is used as a backbone, with a modified DSE block that combines spatial and channel-wise squeeze and excitation layers to highlight important low-level and high-level abstract features. Our method outperformed the ResNet101, InceptionResNetV2, and EfficientNetV2 networks on the publicly available BreakHis dataset for binary and multi-class breast cancer classification in terms of precision, recall, and F1-score at multiple magnification levels.
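One possible reading of the DSE idea, sketched below, is a block that applies channel-wise and spatial squeeze-and-excitation in parallel and sums the two re-weighted feature maps (similar in spirit to scSE); the exact wiring and reduction ratio used in the paper are assumptions here.

```python
# Hedged sketch of a dual squeeze-and-excitation block (channel-wise + spatial).
import torch
import torch.nn as nn

class DualSE(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel_se = nn.Sequential(                 # squeeze spatially, excite channels
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial_se = nn.Sequential(                 # squeeze channels, excite positions
            nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.channel_se(x) + x * self.spatial_se(x)

x = torch.randn(2, 64, 56, 56)
print(DualSE(64)(x).shape)   # torch.Size([2, 64, 56, 56])
```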

6.
Proc IEEE Int Symp Biomed Imaging ; 2021: 716-720, 2021 Apr.
Article in English | MEDLINE | ID: mdl-34413932

ABSTRACT

We propose a curriculum learning method for captioning fetal ultrasound images that trains a model to dynamically transition between two different modalities (image and text) as training progresses. Specifically, we propose a course-focused dual curriculum method, where a course is training with a curriculum based on only one of the two modalities involved in image captioning. We compare two configurations of the course-focused dual curriculum: an image-first configuration, which prepares the early training batches primarily on the complexity of the image information before slowly introducing an order of batches based on the complexity of the text information, and a text-first configuration, which operates in reverse. The evaluation results show that dynamically transitioning between text and images over training epochs improves results compared to the scenario where both modalities are considered in equal measure in every epoch.
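The following sketch shows one way an image-first dual curriculum could order samples: early epochs sort by an image-difficulty score, later epochs by a text-difficulty score, with a linear hand-over in between. The scoring functions and the blending rule are illustrative assumptions, not the paper's exact schedule.

```python
# Hedged sketch of an image-first dual-curriculum sample ordering.
from typing import Callable, List, Sequence

def dual_curriculum_order(samples: Sequence,
                          image_difficulty: Callable,
                          text_difficulty: Callable,
                          epoch: int, total_epochs: int,
                          image_first: bool = True) -> List:
    alpha = epoch / max(total_epochs - 1, 1)        # 0 -> first course, 1 -> second course
    if not image_first:
        alpha = 1.0 - alpha
    key = lambda s: (1 - alpha) * image_difficulty(s) + alpha * text_difficulty(s)
    return sorted(samples, key=key)                 # easiest-first ordering for this epoch

samples = [{"img_score": 0.2, "txt_score": 0.9}, {"img_score": 0.8, "txt_score": 0.1}]
print(dual_curriculum_order(samples, lambda s: s["img_score"], lambda s: s["txt_score"],
                            epoch=0, total_epochs=10))
```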

7.
Sci Rep ; 11(1): 14109, 2021 07 08.
Article in English | MEDLINE | ID: mdl-34238950

ABSTRACT

Ultrasound is the primary modality for obstetric imaging and is highly sonographer dependent. A long training period, insufficient recruitment, and poor retention of sonographers are among the global challenges in the expansion of ultrasound use. For the past several decades, technical advancements in clinical obstetric ultrasound scanning have largely concerned improving image quality and processing speed. By contrast, sonographers have been acquiring ultrasound images in much the same fashion for several decades. The PULSE (Perception Ultrasound by Learning Sonographer Experience) project is an interdisciplinary multi-modal imaging study that aims to offer insights into clinical sonography and to transform the process of obstetric ultrasound acquisition and image analysis by applying deep learning to large-scale multi-modal clinical data. A key novelty of the study is that we record full-length ultrasound video with concurrent tracking of the sonographer's eyes, voice, and the transducer while routine obstetric scans are performed on pregnant women. We provide a detailed description of the novel acquisition system and illustrate how our data can be used to describe clinical ultrasound. Being able to measure different sonographer actions or model tasks will lead to a better understanding of several topics, including how to effectively train new sonographers, monitor learning progress, and enhance the scanning workflow of experts.

8.
Med Image Comput Comput Assist Interv ; 12263: 534-543, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33103162

ABSTRACT

In medical imaging, manual annotations can be expensive to acquire and sometimes infeasible to access, making conventional deep learning-based models difficult to scale. As a result, it would be beneficial if useful representations could be derived from raw data without the need for manual annotations. In this paper, we propose to address the problem of self-supervised representation learning with multi-modal ultrasound video-speech raw data. For this case, we assume that there is a high correlation between the ultrasound video and the corresponding narrative speech audio of the sonographer. In order to learn meaningful representations, the model needs to identify such correlation and at the same time understand the underlying anatomical features. We designed a framework to model the correspondence between video and audio without any kind of human annotations. Within this framework, we introduce cross-modal contrastive learning and an affinity-aware self-paced learning scheme to enhance correlation modelling. Experimental evaluations on multi-modal fetal ultrasound video and audio show that the proposed approach is able to learn strong representations and transfers well to downstream tasks of standard plane detection and eye-gaze prediction.
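A common way to model such video-audio correspondence is a symmetric cross-modal contrastive (InfoNCE-style) loss over paired embeddings, sketched below; the affinity-aware self-paced weighting described in the abstract is not modelled in this minimal example.

```python
# Hedged sketch of a symmetric cross-modal contrastive loss between
# video and speech-audio embeddings.
import torch
import torch.nn.functional as F

def cross_modal_nce(video_emb: torch.Tensor, audio_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    v = F.normalize(video_emb, dim=-1)
    a = F.normalize(audio_emb, dim=-1)
    logits = v @ a.t() / temperature                     # (B, B) pairwise similarities
    targets = torch.arange(v.size(0), device=v.device)   # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = cross_modal_nce(torch.randn(16, 128), torch.randn(16, 128))
print(loss.item())
```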

9.
Article in English | MEDLINE | ID: mdl-33103165

ABSTRACT

We present a novel curriculum learning approach to train a natural language processing (NLP)-based fetal ultrasound image captioning model. Datasets containing medical images and corresponding textual descriptions are relatively rare and hence smaller than datasets of natural images and their captions. This fact inspired us to develop an approach to train a captioning model suitable for small-sized medical data. Our datasets are prepared using real-world ultrasound video along with synchronised and transcribed sonographer speech recordings. We propose a "dual-curriculum" method for the ultrasound image captioning problem, which builds and learns from curricula of image and text information. We compare several distance measures for creating the dual curriculum and observe the best performance using the Wasserstein distance for image information and the tf-idf metric for text information. The evaluation results show an improvement in all performance metrics when using curriculum learning over stochastic mini-batch training for the individual task of image classification, as well as when using the dual curriculum for image captioning.
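The two difficulty measures named above can be sketched as follows: a 1-D Wasserstein distance between an image's intensity distribution and a reference, and a simple tf-idf-based score per caption. How these scores are combined into the paper's dual curriculum is not shown; both scoring choices below are illustrative assumptions.

```python
# Hedged sketch of Wasserstein-based image scores and tf-idf-based text scores.
import numpy as np
from typing import List
from scipy.stats import wasserstein_distance
from sklearn.feature_extraction.text import TfidfVectorizer

def image_difficulty(image: np.ndarray, reference: np.ndarray) -> float:
    # 1-D Wasserstein distance between pixel-intensity distributions
    return wasserstein_distance(image.ravel(), reference.ravel())

def text_difficulty(captions: List[str]) -> np.ndarray:
    tfidf = TfidfVectorizer().fit_transform(captions)     # (n_captions, vocab)
    return np.asarray(tfidf.mean(axis=1)).ravel()         # a crude per-caption score

ref = np.random.rand(64, 64)
print(image_difficulty(np.random.rand(64, 64), ref))
print(text_difficulty(["fetal head measured", "this is the abdomen at the stomach level"]))
```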

10.
Article in English | MEDLINE | ID: mdl-31976493

ABSTRACT

We present an automatic natural language processing (NLP)-based image captioning method that describes fetal ultrasound video content by modelling the vocabulary commonly used by sonographers and sonologists. The generated captions are similar to the words spoken by a sonographer when describing the scan experience in terms of visual content and performed scanning actions. Using full-length second-trimester fetal ultrasound videos and text derived from accompanying expert voice-over audio recordings, we train deep learning models consisting of convolutional neural networks and recurrent neural networks in merged configurations to generate captions for ultrasound video frames. We evaluate different model architectures using established general metrics (BLEU, ROUGE-L) and application-specific metrics. Results show that the proposed models can learn joint representations of image and text to generate relevant and descriptive captions for anatomies, such as the spine, the abdomen, the heart, and the head, in clinical fetal ultrasound scans.
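A "merge"-style captioning model of this kind can be sketched as a CNN-feature branch and an RNN text branch whose outputs are concatenated to predict the next word; the layer sizes and the stand-in image features below are assumptions, not the configuration reported in the paper.

```python
# Hedged sketch of a merged CNN+RNN next-word captioning model.
import torch
import torch.nn as nn

class MergeCaptioner(nn.Module):
    def __init__(self, feat_dim: int, vocab_size: int, embed_dim: int = 256, hidden: int = 256):
        super().__init__()
        self.img_fc = nn.Linear(feat_dim, hidden)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden * 2, vocab_size)

    def forward(self, img_feats: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        img = torch.relu(self.img_fc(img_feats))          # (B, hidden) image branch
        _, (h, _) = self.lstm(self.embed(tokens))         # text summary: (1, B, hidden)
        merged = torch.cat([img, h[-1]], dim=-1)          # merge the two modalities
        return self.out(merged)                           # next-word logits (B, vocab)

model = MergeCaptioner(feat_dim=2048, vocab_size=500)
print(model(torch.randn(4, 2048), torch.randint(0, 500, (4, 12))).shape)  # torch.Size([4, 500])
```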

11.
Neurology ; 86(24): 2264-70, 2016 06 14.
Article in English | MEDLINE | ID: mdl-27170570

ABSTRACT

OBJECTIVE: To determine quantitative size thresholds for enlargement of the optic nerve, chiasm, and tract in children with neurofibromatosis type 1 (NF1). METHODS: Children 0.5-18.6 years of age who underwent high-resolution T1-weighted MRI were eligible for inclusion. The cohort consisted of children with NF1 with or without optic pathway gliomas (OPGs) and a control group who did not have other acquired, systemic, or genetic conditions that could alter their anterior visual pathway (AVP). Maximum and average diameter and volume of AVP structures were calculated from reconstructed MRI images. The 95th percentile of the control measurements was taken as the threshold for defining an abnormally large AVP measure. RESULTS: A total of 186 children (controls = 82; NF1noOPG = 54; NF1+OPG = 50) met the inclusion criteria. NF1noOPG and NF1+OPG participants demonstrated greater maximum optic nerve diameter and volume, optic chiasm volume, and total brain volume compared with controls (p < 0.05, all comparisons). Total brain volume, rather than age, predicted optic nerve and chiasm volume in controls (p < 0.05). Applying the 95th percentile threshold to all NF1 participants, the maximum optic nerve diameter (3.9 mm) and AVP volumes resulted in few false-positive errors (specificity >80%, all comparisons). CONCLUSIONS: Quantitative reference values for AVP enlargement will enhance the development of objective diagnostic criteria for OPGs secondary to NF1.
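The thresholding rule described in the methods can be sketched directly: the 95th percentile of a control-group measurement defines "abnormally large", and NF1 measurements above it are flagged. The numbers below are synthetic and for illustration only.

```python
# Hedged sketch of the 95th-percentile enlargement threshold on synthetic data.
import numpy as np

def enlargement_threshold(control_values: np.ndarray) -> float:
    return float(np.percentile(control_values, 95))

controls = np.random.normal(loc=3.0, scale=0.3, size=82)     # e.g. max optic nerve diameter (mm)
nf1 = np.array([3.2, 4.1, 3.8, 4.5])
threshold = enlargement_threshold(controls)
flags = nf1 > threshold                                      # True = abnormally enlarged
print(threshold, flags)
```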


Subjects
Magnetic Resonance Imaging, Neurofibromatosis 1/diagnostic imaging, Optic Nerve/diagnostic imaging, Adolescent, Brain/diagnostic imaging, Brain/growth & development, Child, Child, Preschool, Female, Humans, Image Interpretation, Computer-Assisted/methods, Infant, Magnetic Resonance Imaging/methods, Male, Neurofibromatosis 1/complications, Optic Nerve/growth & development, Optic Nerve Glioma/diagnostic imaging, Optic Nerve Glioma/etiology, Organ Size, Retrospective Studies, Young Adult
12.
IEEE Trans Med Imaging ; 35(8): 1856-65, 2016 08.
Article in English | MEDLINE | ID: mdl-26930677

ABSTRACT

Analysis of cranial nerve systems, such as the anterior visual pathway (AVP), from MRI sequences is challenging due to their long, thin architecture, structural variations along the path, and low contrast with adjacent anatomic structures. Segmentation of a pathologic AVP (e.g., with low-grade gliomas) poses additional challenges. In this work, we propose a fully automated partitioned shape model segmentation mechanism for the AVP, steered by multiple MRI sequences and deep learning features. Employing deep learning feature representation, this framework presents a joint partitioned statistical shape model able to deal with healthy and pathological AVPs. The deep learning assistance is particularly useful in poor-contrast regions, such as the optic tracts and pathological areas. Our main contributions are: 1) a fast and robust shape localization method using conditional space deep learning, 2) a volumetric multiscale curvelet transform-based intensity normalization method for a robust statistical model, and 3) optimally partitioned statistical shape and appearance models based on regional shape variations for greater local flexibility. Our method was evaluated on MRI sequences obtained from 165 pediatric subjects. A mean Dice similarity coefficient of 0.779 was obtained for segmentation of the entire AVP (optic nerve only = 0.791) using leave-one-out validation. Results demonstrate that the proposed localized shape and sparse appearance-based learning approach significantly outperforms current state-of-the-art segmentation approaches and is as robust as manual segmentation.
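For reference, the Dice similarity coefficient used to report segmentation accuracy (e.g., 0.779 for the whole AVP) can be computed as below; the masks here are synthetic toy data.

```python
# Sketch of the Dice similarity coefficient on synthetic binary masks.
import numpy as np

def dice_coefficient(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    return 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum() + eps)

pred = np.zeros((32, 32), dtype=bool); pred[8:20, 8:20] = True
gt = np.zeros((32, 32), dtype=bool);   gt[10:22, 10:22] = True
print(round(dice_coefficient(pred, gt), 3))
```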


Subjects
Visual Pathways, Humans, Magnetic Resonance Imaging, Models, Statistical, Reproducibility of Results