Results 1 - 20 of 103
1.
Med Phys ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38860890

ABSTRACT

BACKGROUND: Focusing on complicated pathological features, such as blurred boundaries, severe scale differences between symptoms, and background noise interference, we aim to enhance the reliability of multiple-lesion joint segmentation from medical images. PURPOSE: To propose a novel reliable multi-scale wavelet-enhanced transformer network that can provide accurate segmentation results together with a reliability assessment. METHODS: To enhance the model's capability to capture intricate pathological features in medical images, this work introduces a novel segmentation backbone that integrates a wavelet-enhanced feature extractor network with a multi-scale transformer module developed within the scope of this work. Simultaneously, to enhance the reliability of the segmentation outcomes, a novel uncertainty segmentation head is proposed. This segmentation head is rooted in subjective logic (SL) and contributes to generating the final segmentation results along with an associated overall uncertainty evaluation score map. RESULTS: Comprehensive experiments were conducted on the public AI-Challenge 2018 database for retinal edema lesion segmentation and on the segmentation of Thoracic Organs at Risk in CT images. The experimental results highlight the superior segmentation accuracy and heightened reliability achieved by the proposed method in comparison with other state-of-the-art segmentation approaches. CONCLUSIONS: Unlike previous segmentation methods, the proposed approach produces reliable segmentation results with an estimated uncertainty and higher accuracy, enhancing the overall reliability of the model. The code will be released at https://github.com/LooKing9218/ReMultiSeg.
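The abstract does not detail the SL-based uncertainty head. As a rough illustration only (not the authors' code), the sketch below shows how a subjective-logic-style evidential segmentation head is commonly implemented: per-pixel evidence parameterizes a Dirichlet distribution, and the vacuity K/S serves as the uncertainty score map. Names and shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def evidential_head(logits):
    """Illustrative subjective-logic (SL) style uncertainty head.

    logits: (B, K, H, W) raw per-pixel class scores.
    Returns per-pixel class probabilities and an uncertainty map in [0, 1].
    """
    evidence = F.softplus(logits)               # non-negative evidence per class
    alpha = evidence + 1.0                      # Dirichlet concentration parameters
    strength = alpha.sum(dim=1, keepdim=True)   # Dirichlet strength S
    prob = alpha / strength                     # expected class probabilities
    k = logits.shape[1]
    uncertainty = k / strength                  # vacuity: high when evidence is low
    return prob, uncertainty
```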

2.
Med Image Anal ; 96: 103214, 2024 May 22.
Article in English | MEDLINE | ID: mdl-38815358

ABSTRACT

Multi-modal ophthalmic image classification plays a key role in diagnosing eye diseases, as it integrates information from different sources so that the modalities complement one another. However, recent improvements have mainly focused on accuracy, often neglecting the importance of confidence and robustness in predictions across diverse modalities. In this study, we propose a novel multi-modality evidential fusion pipeline for eye disease screening. It provides a measure of confidence for each modality and elegantly integrates the multi-modality information from a multi-distribution fusion perspective. Specifically, our method first places normal inverse gamma prior distributions over pre-trained models to learn both aleatoric and epistemic uncertainty for each single modality. The normal inverse gamma distribution is then transformed into a Student's t distribution. Furthermore, within a confidence-aware fusion framework, we propose a mixture of Student's t distributions to effectively integrate the different modalities, endowing the model with heavy-tailed properties and enhancing its robustness and reliability. More importantly, a confidence-aware multi-modality ranking regularization term induces the model to rank the noisy single-modal and fused-modal confidences more reasonably, leading to improved reliability and accuracy. Experimental results on both public and internal datasets demonstrate that our model excels in robustness, particularly in challenging scenarios involving Gaussian noise and missing modalities. Moreover, our model exhibits strong generalization capability to out-of-distribution data, underscoring its potential as a promising solution for multimodal eye disease screening.
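For readers unfamiliar with the evidential formulation, the conversion mentioned above follows a standard identity: marginalizing a Gaussian likelihood over a normal inverse gamma (NIG) prior yields a Student's t predictive distribution. A minimal sketch of that mapping is given below; the variable names are assumptions, not the paper's code.

```python
import torch

def nig_to_student_t(gamma, nu, alpha, beta):
    """Map NIG(gamma, nu, alpha, beta) parameters to the Student's t
    predictive distribution St(loc, scale, df) obtained by marginalizing
    the Gaussian likelihood over the NIG prior."""
    df = 2.0 * alpha                                      # degrees of freedom
    loc = gamma                                           # predictive mean
    scale = torch.sqrt(beta * (1.0 + nu) / (nu * alpha))  # predictive scale
    return loc, scale, df

# Under this parameterization the usual uncertainty decomposition is:
# aleatoric = beta / (alpha - 1), epistemic = beta / (nu * (alpha - 1)).
```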

3.
Comput Biol Med ; 177: 108569, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38781640

ABSTRACT

Accurate segmentation of polyps in colonoscopy images has gained significant attention in recent years, given its crucial role in automated colorectal cancer diagnosis. Many existing deep learning-based methods follow a one-stage processing pipeline, often involving feature fusion across different levels or utilizing boundary-related attention mechanisms. Drawing on the success of applying Iterative Feedback Units (IFU) to image polyp segmentation, this paper proposes FlowICBNet by extending the IFU to the domain of video polyp segmentation. By harnessing the unique capability of IFU to propagate and refine past segmentation results, our method proves effective in mitigating challenges linked to the inherent limitations of endoscopic imaging, notably the presence of frequent camera shake and frame defocusing. Furthermore, in FlowICBNet, we introduce two pivotal modules: Reference Frame Selection (RFS) and Flow Guided Warping (FGW). These modules play a crucial role in filtering and selecting the most suitable historical reference frames for the task at hand. The experimental results on a large video polyp segmentation dataset demonstrate that our method can significantly outperform state-of-the-art methods by notable margins, achieving an average metric improvement of 7.5% on SUN-SEG-Easy and 7.4% on SUN-SEG-Hard. Our code is available at https://github.com/eraserNut/ICBNet.
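The Flow Guided Warping (FGW) module is described only at a high level here. As an assumption-level sketch of the general mechanism (not the released implementation), warping a reference frame's segmentation toward the current frame with a dense optical flow field is typically done through a normalized sampling grid:

```python
import torch
import torch.nn.functional as F

def flow_warp(reference, flow):
    """Warp a reference tensor (e.g. a past segmentation map) toward the
    current frame using a dense optical flow field.

    reference: (B, C, H, W); flow: (B, 2, H, W) in pixel units (dx, dy).
    """
    b, _, h, w = reference.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(reference.device)  # (2, H, W)
    coords = base.unsqueeze(0) + flow                                 # absolute coordinates
    # Normalize to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)                  # (B, H, W, 2)
    return F.grid_sample(reference, grid, mode="bilinear", align_corners=True)
```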


Subject(s)
Colonic Polyps , Humans , Colonic Polyps/diagnostic imaging , Colonoscopy/methods , Deep Learning , Image Interpretation, Computer-Assisted/methods , Video Recording , Colorectal Neoplasms/diagnostic imaging , Algorithms , Image Processing, Computer-Assisted/methods
4.
IEEE Trans Med Imaging ; PP, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38587957

ABSTRACT

Accurate retinal layer segmentation on optical coherence tomography (OCT) images is hampered by the difficulty of collecting OCT images with diverse pathological characteristics and a balanced distribution. Current generative models can produce highly realistic images and corresponding labels in unrestricted quantities by fitting the distributions of real collected data. Nevertheless, the diversity of their generated data is still limited due to the inherent imbalance of the training data. To address these issues, we propose an image-label pair generation framework that generates diverse and balanced potential data from imbalanced real samples. Specifically, the framework first generates diverse layer masks and then generates plausible OCT images corresponding to these layer masks, using two customized diffusion probabilistic models respectively. To learn from imbalanced data and facilitate balanced generation, we introduce pathology-related conditions to guide the generation processes. To enhance the diversity of the generated image-label pairs, we propose a potential structure modeling technique that transfers the knowledge of diverse sub-structures from mildly pathological or non-pathological samples to highly pathological samples. We conducted extensive experiments on two public datasets for retinal layer segmentation. First, our method generates OCT images with higher image quality and diversity than other generative methods. Furthermore, downstream retinal layer segmentation tasks trained extensively with the generated OCT images demonstrate improved results. The code is publicly available at: https://github.com/nicetomeetu21/GenPSM.

5.
Article in English | MEDLINE | ID: mdl-38530724

ABSTRACT

Disentanglement learning aims to separate explanatory factors of variation so that different attributes of the data can be well characterized and isolated, which promotes efficient inference for downstream tasks. Mainstream disentanglement approaches based on generative adversarial networks (GANs) learn interpretable data representations. However, most typical GAN-based works lack a discussion of the latent subspace, leading to insufficient consideration of the variation of independent factors. Although some recent research analyzes the latent space of pretrained GANs for image editing, it does not emphasize learning representations directly from the subspace perspective. Appropriate subspace properties could facilitate the corresponding feature representation learning to satisfy the independent-variation requirements of the obtained explanatory factors, which is crucial for better disentanglement. In this work, we propose a unified framework for ensuring disentanglement that fully investigates latent subspace learning (SL) in GANs. The novel GAN-based architecture explores orthogonal subspace representation (OSR) on a vanilla GAN and is named OSRGAN. To guide a subspace with strong correlation, less redundancy, and robust distinguishability, our OSR comprises three stages: self-latent-aware, orthogonal-subspace-aware, and structure-representation-aware. First, the self-latent-aware stage promotes a latent subspace strongly correlated with the data space to discover interpretable factors, but with poor independence of variation. Second, the subsequent orthogonal-subspace-aware stage adaptively learns 1-D linear subspaces spanned by a set of orthogonal bases in the latent space; there is less redundancy between them, expressing the corresponding independence. Third, the structure-representation-aware stage aligns the projection on the orthogonal subspaces with the latent variables. Accordingly, the feature representation in each linear subspace becomes distinguishable, enhancing the independent expression of interpretable factors. In addition, we design an alternating optimization step, achieving a tradeoff training of OSRGAN over the different properties. Although it strictly constrains orthogonality, the loss weight coefficient of the distinguishability induced by orthogonality can be adjusted and balanced against the correlation constraint. In effect, this tradeoff training prevents our OSRGAN from overemphasizing any single property and damaging the expressiveness of the feature representation; it takes into account both the interpretable factors and their independent variation characteristics. Meanwhile, the alternating optimization keeps the cost and efficiency of forward inference unchanged and does not increase the computational complexity. In theory, we clarify the significance of OSR, which brings better independence of factors along with interpretability, as the correlation can converge to a high range faster. Moreover, through a convergence behavior analysis, including the objective functions under different constraints and the evaluation curves over iterations, our model demonstrates enhanced stability and converges toward a higher peak for disentanglement. To assess the performance in downstream tasks, we compared state-of-the-art GAN-based and even VAE-based approaches on different datasets. Our OSRGAN achieves higher disentanglement scores on the FactorVAE, SAP, MIG, and VP metrics. All the experimental results illustrate that our novel GAN-based framework has considerable advantages in disentanglement.
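The orthogonal-subspace-aware stage is described only conceptually. A common way to encourage a set of basis vectors to be (approximately) orthogonal is a Gram-matrix penalty; the sketch below illustrates that general idea under the assumption that the bases are stored as rows of a learnable matrix, and is not the OSRGAN loss as published.

```python
import torch

def orthogonality_penalty(bases):
    """Penalize deviation of row vectors from an orthonormal set.

    bases: (K, D) matrix whose rows span K one-dimensional subspaces
    of a D-dimensional latent space.
    """
    gram = bases @ bases.t()                      # (K, K) pairwise inner products
    identity = torch.eye(bases.shape[0], device=bases.device)
    return ((gram - identity) ** 2).sum()         # squared Frobenius-norm penalty
```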

6.
IEEE Trans Med Imaging ; PP, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38530715

ABSTRACT

The instrument-tissue interaction detection task, which helps in understanding surgical activities, is vital for constructing computer-assisted surgery systems but faces many challenges. First, most models represent instrument-tissue interaction in a coarse-grained way that focuses only on classification and lacks the ability to automatically detect instruments and tissues. Second, existing works do not fully consider the intra- and inter-frame relations between instruments and tissues. In this paper, we propose to represent an instrument-tissue interaction as an ⟨instrument class, instrument bounding box, tissue class, tissue bounding box, action class⟩ quintuple and present an Instrument-Tissue Interaction Detection Network (ITIDNet) to detect the quintuple for surgical video understanding. Specifically, we propose a Snippet Consecutive Feature (SCF) Layer to enhance features by modeling the relationships of proposals in the current frame using global context information in the video snippet. We also propose a Spatial Corresponding Attention (SCA) Layer to incorporate features of proposals between adjacent frames through spatial encoding. To reason about relationships between instruments and tissues, a Temporal Graph (TG) Layer is proposed, with intra-frame connections to exploit relationships between instruments and tissues in the same frame and inter-frame connections to model the temporal information for the same instance. For evaluation, we build a cataract surgery video (PhacoQ) dataset and a cholecystectomy surgery video (CholecQ) dataset. Experimental results demonstrate the promising performance of our model, which outperforms other state-of-the-art models on both datasets.

7.
Article in English | MEDLINE | ID: mdl-38381644

ABSTRACT

Super-resolving the magnetic resonance (MR) image of a target contrast under the guidance of a corresponding auxiliary contrast, which provides additional anatomical information, is a new and effective solution for fast MR imaging. However, current multi-contrast super-resolution (SR) methods tend to concatenate the different contrasts directly, ignoring their relationships in different clues, e.g., in the high- and low-intensity regions. In this study, we propose a separable attention network (comprising high-intensity priority (HP) attention and low-intensity separation (LS) attention), named SANet. Our SANet can explore the high- and low-intensity regions in the "forward" and "reverse" directions with the help of the auxiliary contrast, while learning clearer anatomical structure and edge information for the SR of a target-contrast MR image. SANet provides three appealing benefits. First, it is the first model to explore a separable attention mechanism that uses the auxiliary contrast to predict the high- and low-intensity regions, diverting more attention to refining any uncertain details between these regions and correcting the fine areas in the reconstructed results. Second, a multistage integration module is proposed to learn the responses of multi-contrast fusion at multiple stages, capture the dependency between the fused representations, and boost their representation ability. Third, extensive experiments with various state-of-the-art multi-contrast SR methods on fastMRI and clinical in vivo datasets demonstrate the superiority of our model. The code is released at https://github.com/chunmeifeng/SANet.
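The high-/low-intensity split described above suggests a forward/reverse attention pattern. The sketch below shows one plausible realization in which an attention map predicted from the auxiliary contrast gates the target features in both directions; module names and shapes are assumptions, not the released SANet code.

```python
import torch
import torch.nn as nn

class SeparableAttention(nn.Module):
    """Illustrative forward/reverse attention driven by an auxiliary contrast."""

    def __init__(self, channels):
        super().__init__()
        self.to_attention = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, target_feat, aux_feat):
        att = torch.sigmoid(self.to_attention(aux_feat))  # high-intensity prior map
        high = target_feat * att                           # "forward" branch
        low = target_feat * (1.0 - att)                    # "reverse" branch
        return high, low
```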

8.
Article in English | MEDLINE | ID: mdl-38356213

ABSTRACT

RGB-D salient object detection (SOD) has gained tremendous attention in recent years. In particular, transformers have been employed and have shown great potential. However, existing transformer models usually overlook the vital edge information, which is a major issue restricting the further improvement of SOD accuracy. To this end, we propose a novel edge-aware RGB-D SOD transformer that explicitly models the edge information in a dual-band decomposition framework. Specifically, we employ two parallel decoder networks to learn the high-frequency edge and low-frequency body features from the low- and high-level features extracted from a two-stream multimodal backbone network, respectively. Next, we propose a cross-attention complementarity exploration module to enrich the edge/body features by exploiting the multimodal complementary information. The refined features are then fed into our proposed color-hint guided fusion module for enhancing the depth feature and fusing the multimodal features. Finally, the resulting features are fused using our deeply supervised progressive fusion module, which progressively integrates edge and body features for predicting saliency maps. Our model explicitly considers the edge information for accurate RGB-D SOD, overcoming the limitations of existing methods and effectively improving performance. Extensive experiments on benchmark datasets demonstrate that the proposed model is an effective RGB-D SOD framework that outperforms the current state-of-the-art models, both quantitatively and qualitatively. A further extension to RGB-T SOD demonstrates the promising potential of our model in various kinds of multimodal SOD tasks.

9.
IEEE Trans Med Imaging ; 43(5): 1945-1957, 2024 May.
Article in English | MEDLINE | ID: mdl-38206778

ABSTRACT

Color fundus photography (CFP) and optical coherence tomography (OCT) are two of the most widely used imaging modalities in the clinical diagnosis and management of retinal diseases. Despite the widespread use of multimodal imaging in clinical practice, few methods for the automated diagnosis of eye diseases effectively utilize the correlated and complementary information from multiple modalities. This paper explores how to leverage the information from CFP and OCT images to improve the automated diagnosis of retinal diseases. We propose a novel multimodal learning method, named geometric correspondence-based multimodal learning network (GeCoM-Net), to achieve the fusion of CFP and OCT images. Specifically, inspired by clinical observations, we consider the geometric correspondence between an OCT slice and the corresponding CFP region to learn the correlated features of the two modalities for robust fusion. Furthermore, we design a new feature selection strategy to extract discriminative OCT representations by automatically selecting the important feature maps from OCT slices. Unlike existing multimodal learning methods, GeCoM-Net is the first method that explicitly formulates the geometric relationships between an OCT slice and the corresponding region of the CFP image for CFP and OCT fusion. Experiments have been conducted on a large-scale private dataset and a publicly available dataset to evaluate the effectiveness of GeCoM-Net for diagnosing diabetic macular edema (DME), impaired visual acuity (VA) and glaucoma. The empirical results show that our method outperforms the current state-of-the-art multimodal learning methods, improving the AUROC score by 0.4%, 1.9% and 2.9% for DME, VA and glaucoma detection, respectively.


Subject(s)
Image Interpretation, Computer-Assisted , Multimodal Imaging , Tomography, Optical Coherence , Humans , Tomography, Optical Coherence/methods , Multimodal Imaging/methods , Image Interpretation, Computer-Assisted/methods , Algorithms , Retinal Diseases/diagnostic imaging , Retina/diagnostic imaging , Machine Learning , Photography/methods , Diagnostic Techniques, Ophthalmological , Databases, Factual
10.
Sci Data ; 11(1): 99, 2024 Jan 20.
Article in English | MEDLINE | ID: mdl-38245589

ABSTRACT

Pathologic myopia (PM) is a common blinding retinal degeneration suffered by the highly myopic population. Early screening of this condition can reduce the damage caused by the associated fundus lesions and therefore prevent vision loss. Automated diagnostic tools based on artificial intelligence methods can benefit this process by helping clinicians identify disease signs or screen mass populations using color fundus photographs as inputs. This paper provides insights into PALM, our open fundus imaging dataset for pathologic myopia recognition and anatomical structure annotation. Our database comprises 1200 images with associated labels for the pathologic myopia category and manual annotations of the optic disc, the position of the fovea and delineations of lesions such as patchy retinal atrophy (including peripapillary atrophy) and retinal detachment. In addition, this paper elaborates on other details, such as the labeling process used to construct the database and the quality and characteristics of the samples, and provides other relevant usage notes.


Subject(s)
Myopia, Degenerative , Optic Disk , Retinal Degeneration , Humans , Artificial Intelligence , Fundus Oculi , Myopia, Degenerative/diagnostic imaging , Myopia, Degenerative/pathology , Optic Disk/diagnostic imaging
11.
Genome Med ; 16(1): 12, 2024 Jan 12.
Article in English | MEDLINE | ID: mdl-38217035

ABSTRACT

Optimal integration of transcriptomics data and the associated spatial information is essential for fully exploiting spatial transcriptomics to dissect tissue heterogeneity and map out inter-cellular communication. We present SEDR, which uses a deep autoencoder coupled with a masked self-supervised learning mechanism to construct a low-dimensional latent representation of gene expression, which is then simultaneously embedded with the corresponding spatial information through a variational graph autoencoder. SEDR achieved higher clustering performance on manually annotated 10x Visium datasets and better scalability on high-resolution spatial transcriptomics datasets than existing methods. Additionally, we show SEDR's ability to impute and denoise gene expression (URL: https://github.com/JinmiaoChenLab/SEDR/ ).


Subject(s)
Cell Communication , Gene Expression Profiling , Humans , Cluster Analysis
12.
Br J Ophthalmol ; 108(4): 513-521, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-37495263

ABSTRACT

BACKGROUND: The crystalline lens is a transparent structure of the eye that focuses light on the retina. It becomes cloudy, hard and dense with increasing age, which causes the crystalline lens to gradually lose its function. We aim to develop a nuclear age predictor to reflect the degeneration of the crystalline lens nucleus. METHODS: First, we trained and internally validated the nuclear age predictor with a deep-learning algorithm, using 12 904 anterior segment optical coherence tomography (AS-OCT) images from four diverse Asian and American cohorts: Zhongshan Ophthalmic Center with Machine0 (ZOM0), Tomey Corporation (TOMEY), University of California San Francisco and the Chinese University of Hong Kong. External testing was done on three independent datasets: Tokyo University (TU), ZOM1 and Shenzhen People's Hospital (SPH). We also demonstrate the possibility of detecting nuclear cataracts (NCs) from the nuclear age gap. FINDINGS: In the internal validation dataset, the nuclear age could be predicted with a mean absolute error (MAE) of 2.570 years (95% CI 1.886 to 2.863). Across the three external testing datasets, the algorithm achieved MAEs of 4.261 years (95% CI 3.391 to 5.094) in TU, 3.920 years (95% CI 3.332 to 4.637) in ZOM1-NonCata and 4.380 years (95% CI 3.730 to 5.061) in SPH-NonCata. The MAEs for NC eyes were 8.490 years (95% CI 7.219 to 9.766) in ZOM1-NC and 9.998 years (95% CI 5.673 to 14.642) in SPH-NC. The nuclear age gap outperformed both ophthalmologists in detecting NCs, with areas under the receiver operating characteristic curve of 0.853 (95% CI 0.787 to 0.917) in ZOM1 and 0.909 (95% CI 0.828 to 0.978) in SPH. INTERPRETATION: The nuclear age predictor shows good performance, validating the feasibility of using AS-OCT images as an effective screening tool for nucleus degeneration. Our work also demonstrates the potential use of the nuclear age gap to detect NCs.
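The MAE values above are reported with 95% confidence intervals; a bootstrap over test subjects is one standard way to obtain such intervals. The following is only a generic sketch, not the authors' evaluation code.

```python
import numpy as np

def mae_with_bootstrap_ci(y_true, y_pred, n_boot=2000, seed=0):
    """Mean absolute error with a 95% percentile bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    errors = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    mae = errors.mean()
    boot = [rng.choice(errors, size=errors.size, replace=True).mean()
            for _ in range(n_boot)]
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return mae, (lo, hi)
```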


Subject(s)
Cataract , Lens, Crystalline , Humans , Child, Preschool , Infant , Lens, Crystalline/diagnostic imaging , Cataract/diagnosis , Retina , Algorithms , Tomography, Optical Coherence/methods
13.
Br J Ophthalmol ; 108(3): 432-439, 2024 02 21.
Article in English | MEDLINE | ID: mdl-36596660

ABSTRACT

BACKGROUND: Optical coherence tomography angiography (OCTA) enables fast and non-invasive high-resolution imaging of the retinal microvasculature and has been suggested as a potential tool for the early detection of retinal microvascular changes in Alzheimer's disease (AD). We developed a standardised OCTA analysis framework and compared its extracted parameters among controls and AD/mild cognitive impairment (MCI) subjects in a cross-sectional study. METHODS: We defined and extracted geometrical parameters of the retinal microvasculature at different retinal layers and in the foveal avascular zone (FAZ) from segmented OCTA images obtained using well-validated state-of-the-art deep learning models. We studied these parameters in 158 subjects (62 healthy controls, 55 AD and 41 MCI) using logistic regression to determine their potential in predicting the status of our subjects. RESULTS: In the AD group, there was a significant decrease in vessel area and length densities in the inner vascular complexes (IVC) compared with controls. The number of vascular bifurcations in AD was also significantly lower than that of healthy people. The MCI group demonstrated a decrease in vascular area and length densities, vascular fractal dimension and the number of bifurcations in both the superficial vascular complexes (SVC) and the IVC compared with controls. A larger vascular tortuosity in the IVC and a larger roundness of the FAZ in the SVC were also observed in MCI compared with controls. CONCLUSION: Our study demonstrates the applicability of OCTA to the diagnosis of AD and MCI, and provides a standard tool for future clinical service and research. Biomarkers from retinal OCTA images can provide useful information for clinical decision-making and the diagnosis of AD and MCI.
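As a toy illustration of the analysis described above (logistic regression over extracted vascular parameters), the sketch below uses scikit-learn; the feature list and the random placeholder data are assumptions, not the study's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: rows are subjects; columns are hypothetical OCTA-derived features such as
# vessel area density, vessel length density, fractal dimension, bifurcation
# count, tortuosity and FAZ roundness. y: 1 = AD/MCI, 0 = control.
X = np.random.rand(158, 6)          # placeholder data
y = np.random.randint(0, 2, 158)    # placeholder labels

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {auc.mean():.3f}")
```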


Subject(s)
Alzheimer Disease , Cognitive Dysfunction , Humans , Fluorescein Angiography/methods , Retinal Vessels/diagnostic imaging , Tomography, Optical Coherence/methods , Alzheimer Disease/diagnostic imaging , Microvessels/diagnostic imaging , Cognitive Dysfunction/diagnostic imaging
14.
IEEE Trans Med Imaging ; 43(3): 1237-1246, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37956005

ABSTRACT

Retinal arteriovenous nicking (AVN) manifests as a reduced venular caliber at an arteriovenous crossing. AVNs are signs of many systemic diseases, particularly cardiovascular diseases. Studies have shown that people with AVN are twice as likely to have a stroke. However, AVN classification faces two challenges. One is the lack of data, especially of AVNs compared with normal arteriovenous (AV) crossings. The other is the significant intra-class variation and minute inter-class differences: AVNs may differ in shape, scale, pose, and color, while an AVN may differ from a normal AV crossing only by a slight thinning of the vein. To address these challenges, first, we develop a data synthesis method to generate AV crossings, including normal crossings and AVNs. Second, to mitigate the domain shift between the synthetic and real data, an edge-guided unsupervised domain adaptation network is designed to guide the transfer of domain-invariant information. Third, a semantic contrastive learning branch (SCLB) is introduced, and a set of semantically related images, as a semantic triplet, is input to the network simultaneously to guide the network to focus on the subtle differences in venular width and to ignore differences in appearance. These strategies effectively mitigate the lack of data, the domain shift between synthetic and real data, and the significant intra- but minute inter-class differences. Extensive experiments have been performed to demonstrate the outstanding performance of the proposed method.
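The semantic contrastive learning branch takes a triplet of semantically related images; one common way to impose such a constraint on embeddings is a triplet margin loss. The sketch below illustrates that general mechanism only and is not the paper's exact loss.

```python
import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=1.0)

def semantic_triplet_step(encoder, anchor, positive, negative):
    """Pull together embeddings of images from the same class (e.g. two AVNs
    differing only in appearance) and push apart embeddings of different
    classes (e.g. AVN vs. normal AV crossing)."""
    return triplet_loss(encoder(anchor), encoder(positive), encoder(negative))
```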


Subject(s)
Cardiovascular Diseases , Retinal Diseases , Retinal Vein , Humans
15.
IEEE Trans Med Imaging ; 43(4): 1323-1336, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38015687

ABSTRACT

Medical imaging provides many valuable clues involving anatomical structure and pathological characteristics. However, image degradation is a common issue in clinical practice, which can adversely impact observation and diagnosis by physicians and algorithms. Although extensive enhancement models have been developed, these models require thorough pre-training before deployment and fail to take advantage of the potential value of the inference data available after deployment. In this paper, we propose an algorithm for source-free unsupervised domain adaptive medical image enhancement (SAME), which adapts and optimizes enhancement models using test data in the inference phase. A structure-preserving enhancement network is first constructed to learn a robust source model from synthesized training data. Then a teacher-student model is initialized with the source model and conducts source-free unsupervised domain adaptation (SFUDA) via knowledge distillation with the test data. Additionally, a pseudo-label picker is developed to boost the knowledge distillation of enhancement tasks. Experiments were implemented on ten datasets from three medical image modalities to validate the advantage of the proposed algorithm, and settings analysis and ablation studies were also carried out to interpret the effectiveness of SAME. The remarkable enhancement performance and benefits for downstream tasks demonstrate the potential and generalizability of SAME. The code is available at https://github.com/liamheng/Annotation-free-Medical-Image-Enhancement.


Subject(s)
Algorithms , Image Enhancement , Humans , Image Processing, Computer-Assisted
16.
IEEE Trans Med Imaging ; 43(5): 1715-1726, 2024 May.
Article in English | MEDLINE | ID: mdl-38153819

ABSTRACT

Fully-supervised learning requires massive amounts of high-quality annotated data, which are difficult to obtain for image segmentation since pixel-level annotation is expensive, especially for medical image segmentation tasks that need domain knowledge. As an alternative, semi-supervised learning (SSL) can effectively alleviate the dependence on annotated samples by leveraging abundant unlabeled samples. Among SSL methods, mean-teacher (MT) is the most popular. However, in MT, the teacher model's weights are completely determined by the student model's weights, which leads to a training bottleneck in the late training stages. Besides, only pixel-wise consistency is applied to the unlabeled data, which ignores category information and is susceptible to noise. In this paper, we propose a bilateral supervision network with a bilateral exponential moving average (bilateral-EMA), named BSNet, to overcome these issues. On the one hand, both the student and teacher models are trained on labeled data, and their weights are then updated with the bilateral-EMA, so that the two models can learn from each other. On the other hand, pseudo labels are used to perform bilateral supervision on the unlabeled data. Moreover, to strengthen this supervision, we adopt adversarial learning to enforce the network to generate more reliable pseudo labels for the unlabeled data. We conduct extensive experiments on three datasets to evaluate the proposed BSNet, and the results show that BSNet improves semi-supervised segmentation performance by a large margin and surpasses other state-of-the-art SSL methods.
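In a standard mean-teacher setup, the teacher tracks the student through an exponential moving average; the bilateral-EMA idea described above additionally lets the teacher's own training influence the student. The sketch below shows only the familiar one-directional EMA update as a reference point, not BSNet itself.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.99):
    """Classic mean-teacher update: teacher weights follow an exponential
    moving average of the student weights."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)
```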


Subject(s)
Algorithms , Image Processing, Computer-Assisted , Supervised Machine Learning , Humans , Image Processing, Computer-Assisted/methods , Neural Networks, Computer , Magnetic Resonance Imaging/methods
17.
Article in English | MEDLINE | ID: mdl-38039172

ABSTRACT

Federated learning (FL) collaboratively trains a shared global model across multiple local clients while keeping the training data decentralized to preserve data privacy. However, standard FL methods ignore the noisy-client issue, which may harm the overall performance of the shared model. We first investigate the critical issue caused by noisy clients in FL and quantify their negative impact on the representations learned by different layers. We make two key observations: 1) noisy clients can severely impact the convergence and performance of the global model in FL, and 2) noisy clients induce greater bias in the deeper layers of the global model than in the earlier layers. Based on these observations, we propose federated noisy client learning (Fed-NCL), a framework that conducts robust FL with noisy clients. Specifically, Fed-NCL first identifies the noisy clients by estimating the data quality and model divergence. Then, robust layerwise aggregation is proposed to adaptively aggregate the local models of each client to deal with the data heterogeneity caused by the noisy clients. We further perform label correction on the noisy clients to improve the generalization of the global model. Experimental results on various datasets demonstrate that our algorithm boosts the performance of different state-of-the-art systems with noisy clients. Our code is available at https://github.com/TKH666/Fed-NCL.
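The robust layerwise aggregation is described only conceptually. A minimal sketch of the general pattern, a per-layer weighted average of client state dicts with lower weights for clients judged noisy, is given below; the weighting scheme and names are assumptions, not the Fed-NCL implementation.

```python
import torch

def layerwise_aggregate(client_states, client_weights):
    """Weighted per-layer averaging of client models (FedAvg-style).

    client_states: list of state_dicts with identical keys.
    client_weights: list of dicts mapping layer name -> non-negative weight,
    e.g. down-weighted for clients estimated to be noisy on that layer.
    """
    aggregated = {}
    for name in client_states[0]:
        weights = torch.tensor([w[name] for w in client_weights], dtype=torch.float32)
        weights = weights / weights.sum()
        stacked = torch.stack([s[name].float() for s in client_states])
        shape = (-1,) + (1,) * (stacked.dim() - 1)     # broadcast weights over params
        aggregated[name] = (weights.view(shape) * stacked).sum(dim=0)
    return aggregated
```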

18.
Nat Commun ; 14(1): 6757, 2023 10 24.
Article in English | MEDLINE | ID: mdl-37875484

ABSTRACT

Failure to recognize samples from classes unseen during training is a major limitation of artificial intelligence in real-world implementations of retinal anomaly recognition and classification. We establish an uncertainty-inspired open set (UIOS) model, which is trained with fundus images of 9 retinal conditions. Besides assessing the probability of each category, UIOS also calculates an uncertainty score to express its confidence. Our UIOS model with a thresholding strategy achieves F1 scores of 99.55%, 97.01% and 91.91% for the internal testing set, the external target categories (TC)-JSIEC dataset and the TC-unseen testing set, respectively, compared with F1 scores of 92.20%, 80.69% and 64.74% for the standard AI model. Furthermore, UIOS correctly predicts high uncertainty scores, which would prompt the need for a manual check, on datasets of non-target-category retinal diseases, low-quality fundus images, and non-fundus images. UIOS provides a robust method for real-world screening of retinal anomalies.
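The thresholding strategy mentioned above can be summarized in a few lines: predictions whose uncertainty score exceeds a threshold are deferred to manual review. A schematic sketch follows; the threshold value and tensor names are placeholders, not the paper's settings.

```python
import torch

def screen_with_uncertainty(probs, uncertainty, threshold=0.5):
    """Return a predicted class per image, or -1 when the uncertainty score
    exceeds the threshold and the case should be referred for manual check.

    probs: (N, K) class probabilities; uncertainty: (N,) scores in [0, 1].
    """
    predictions = probs.argmax(dim=1)
    refer_to_human = uncertainty > threshold
    predictions[refer_to_human] = -1
    return predictions, refer_to_human
```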


Subject(s)
Eye Abnormalities , Retinal Diseases , Humans , Artificial Intelligence , Algorithms , Uncertainty , Retina/diagnostic imaging , Fundus Oculi , Retinal Diseases/diagnostic imaging
19.
Med Image Anal ; 90: 102938, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37806020

ABSTRACT

Glaucoma is a chronic neurodegenerative condition that is one of the world's leading causes of irreversible but preventable blindness. The blindness is generally caused by the lack of timely detection and treatment. Early screening is thus essential for early treatment to preserve vision and maintain quality of life. Colour fundus photography and optical coherence tomography (OCT) are the two most cost-effective tools for glaucoma screening. Both imaging modalities have prominent biomarkers to indicate glaucoma suspects, such as the vertical cup-to-disc ratio (vCDR) on fundus images and retinal nerve fiber layer (RNFL) thickness on OCT volumes. In clinical practice, it is often recommended to take both screenings for a more accurate and reliable diagnosis. However, although numerous algorithms have been proposed based on fundus images or OCT volumes for automated glaucoma detection, few methods leverage both modalities to achieve this goal. To fill this research gap, we set up the Glaucoma grAding from Multi-Modality imAges (GAMMA) Challenge to encourage the development of fundus & OCT-based glaucoma grading. The primary task of the challenge is to grade glaucoma from both 2D fundus images and 3D OCT scanning volumes. As part of GAMMA, we have publicly released a glaucoma-annotated dataset with both 2D fundus colour photography and 3D OCT volumes, which is the first multi-modality dataset for machine-learning-based glaucoma grading. In addition, an evaluation framework has been established to evaluate the performance of the submitted methods. During the challenge, 1272 results were submitted, and finally, the ten best-performing teams were selected for the final stage. We analyse their results and summarize their methods in this paper. Since all the teams submitted their source code in the challenge, we conducted a detailed ablation study to verify the effectiveness of the particular modules proposed. Finally, we identify the proposed techniques and strategies that could be of practical value for the clinical diagnosis of glaucoma. As the first in-depth study of fundus & OCT multi-modality glaucoma grading, we believe the GAMMA Challenge will serve as an essential guideline and benchmark for future research.


Subject(s)
Glaucoma , Humans , Glaucoma/diagnostic imaging , Retina , Fundus Oculi , Diagnostic Techniques, Ophthalmological , Blindness , Tomography, Optical Coherence/methods
20.
IEEE Trans Med Imaging ; 42(12): 3871-3883, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37682644

ABSTRACT

Multiple instance learning (MIL)-based methods have become the mainstream for processing megapixel-sized whole slide images (WSIs) with a pyramid structure in the field of digital pathology. Current MIL-based methods usually crop a large number of patches from the WSI at the highest magnification, resulting in a great deal of redundancy in the input and feature space. Moreover, the spatial relations between patches cannot be sufficiently modeled, which may weaken the model's discriminative ability on fine-grained features. To address these limitations, we propose a Multi-scale Graph Transformer (MG-Trans) with information bottleneck for whole slide image classification. MG-Trans is composed of three modules: a patch anchoring module (PAM), a dynamic structure information learning module (SILM), and a multi-scale information bottleneck module (MIBM). Specifically, PAM utilizes the class attention map generated from the multi-head self-attention of a vision Transformer to identify and sample the informative patches. SILM explicitly introduces local tissue structure information into the Transformer block to sufficiently model the spatial relations between patches. MIBM effectively fuses the multi-scale patch features by utilizing the principle of the information bottleneck to generate a robust and compact bag-level representation. Besides, we also propose a semantic consistency loss to stabilize the training of the whole model. Extensive studies on three subtyping datasets and seven gene mutation detection datasets demonstrate the superiority of MG-Trans.
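The patch anchoring module selects informative patches using the class attention map of a vision Transformer. A minimal sketch of top-k selection by attention score is shown below; the scoring and the value of k are illustrative assumptions rather than the MG-Trans implementation.

```python
import torch

def anchor_patches(patch_features, class_attention, k=256):
    """Keep the k patches that receive the highest class-token attention.

    patch_features: (N, D) embeddings of N patches from one slide.
    class_attention: (N,) attention weights from the class token to patches.
    """
    k = min(k, class_attention.numel())
    top_idx = torch.topk(class_attention, k).indices
    return patch_features[top_idx], top_idx
```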


Subject(s)
Image Processing, Computer-Assisted , Semantics