Results 1 - 20 of 44
1.
Article in English | MEDLINE | ID: mdl-38896517

ABSTRACT

Semi-supervised learning (SSL), which aims to learn from limited labeled data and massive amounts of unlabeled data, offers a promising way to exploit the vast archives of satellite Earth observation images. The fundamental concept underlying most state-of-the-art SSL methods is to generate pseudo-labels for unlabeled data from image-level predictions. However, complex remote sensing (RS) scene images frequently present challenges, such as interference from multiple background objects and significant intra-class differences, that result in unreliable pseudo-labels. In this paper, we propose SemiRS-COC, a novel semi-supervised classification method for complex RS scenes. Inspired by the idea that neighboring objects in feature space should share consistent semantic labels, SemiRS-COC utilizes the similarity between foreground objects in RS images to generate reliable object-level pseudo-labels, effectively addressing multiple background objects and significant intra-class differences in complex RS images. Specifically, we first design a Local Self-Learning Object Perception (LSLOP) mechanism, which transforms the interference from multiple background objects in RS images into usable annotation information, enhancing the model's object perception capability. Furthermore, we present a Cross-Object Consistency Pseudo-Labeling (COCPL) strategy, which generates reliable object-level pseudo-labels by comparing the similarity of foreground objects across different RS images, effectively handling significant intra-class differences. Extensive experiments demonstrate that our proposed method achieves excellent performance compared to state-of-the-art methods on three widely adopted RS datasets.
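
As a rough sketch of the cross-object consistency idea, the snippet below assigns each unlabeled foreground object the label of its most similar labeled object and keeps only confident matches; the function, threshold, and feature shapes are illustrative assumptions, not the authors' COCPL implementation.

```python
import torch
import torch.nn.functional as F

def object_pseudo_labels(obj_feats, labeled_feats, labeled_labels, threshold=0.9):
    """Assign pseudo-labels to foreground-object features by cosine
    similarity to labeled-object features (hypothetical sketch)."""
    obj_feats = F.normalize(obj_feats, dim=1)          # (N, D) unlabeled objects
    labeled_feats = F.normalize(labeled_feats, dim=1)  # (M, D) labeled objects
    sim = obj_feats @ labeled_feats.t()                # (N, M) cosine similarities
    conf, idx = sim.max(dim=1)                         # nearest labeled object
    pseudo = labeled_labels[idx]                       # inherit its class label
    mask = conf > threshold                            # keep only reliable matches
    return pseudo, mask

# Usage: features come from a shared backbone; only confident objects
# (mask == True) would contribute to the unsupervised loss term.
```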

2.
Mikrochim Acta ; 191(6): 343, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38801537

ABSTRACT

A portable, integrated electrochemical detection system has been constructed for on-site, real-time detection of chemical oxygen demand (COD). The system consists of four main parts: (i) a sensing electrode based on a copper-cobalt bimetallic oxide (CuCoOx)-modified screen-printed electrode; (ii) an integrated electrochemical detector for the conversion, amplification, and transmission of weak signals; (iii) a smartphone running a self-developed Android application for issuing commands and for receiving and displaying detection results; and (iv) a 3D-printed microfluidic cell for the continuous input of water samples. Benefiting from the superior catalytic capability of CuCoOx, the developed system shows a high detection sensitivity of 0.335 µA/(mg/L) and a low detection limit of 5.957 mg/L for COD determination, and it possesses high resistance to interference from chloride ions. Moreover, the system agrees well with the traditional dichromate method in COD detection of actual water samples. Owing to its cost effectiveness, portability, and suitability for point-of-care testing, the system shows great potential for water quality monitoring, especially in resource-limited remote areas.
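
Given the reported calibration figures, converting a current reading into a COD estimate is a one-line linear computation; the blank-current correction in this sketch is an assumption the abstract does not spell out.

```python
SENSITIVITY_UA_PER_MG_L = 0.335   # reported slope, µA per (mg/L)
LOD_MG_L = 5.957                  # reported detection limit, mg/L

def cod_from_current(i_sample_ua, i_blank_ua=0.0):
    """Convert an amperometric reading (µA) to a COD estimate (mg/L) via the
    linear calibration implied by the reported sensitivity; blank handling
    is an illustrative assumption."""
    cod = (i_sample_ua - i_blank_ua) / SENSITIVITY_UA_PER_MG_L
    return cod if cod >= LOD_MG_L else float("nan")  # below LOD: not quantifiable

print(cod_from_current(33.5))  # ~100 mg/L for a 33.5 µA signal
```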

3.
IEEE Trans Image Process ; 33: 3707-3721, 2024.
Article in English | MEDLINE | ID: mdl-38809730

ABSTRACT

Recent advances in deep learning have pushed forward the frontiers of real photograph denoising. However, due to the inherent pooling operations in the spatial domain, current CNN-based denoisers are biased toward low-frequency representations and discard high-frequency components. This leads to suboptimal visual quality, since image denoising aims to completely eliminate complex noise while recovering all fine-scale and salient information. In this work, we tackle this challenge from the frequency perspective and present a new solution pipeline, coined the frequency attention denoising network (FADNet). Our key idea is to build a learning-based frequency attention framework in which feature correlations over a broader frequency spectrum can be fully characterized, thus enhancing the representational power of the network across multiple frequency channels. Based on this, we design a cascade of adaptive instance residual modules (AIRMs). In each AIRM, we first transform the spatial-domain features into the frequency space. Then, a learning-based frequency attention framework is devised to explore the feature interdependencies in the frequency domain. In addition, we introduce an adaptive layer that leverages the guidance of the estimated noise map and intermediate features to address the challenge of model generalization under noise discrepancies. The effectiveness of our method is demonstrated on several real camera benchmark datasets, with superior denoising performance, generalization capability, and efficiency versus the state of the art.
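
A minimal sketch of the core idea, assuming a squeeze-and-excitation-style gate applied to per-channel spectral energy; the actual AIRM design is considerably more elaborate.

```python
import torch
import torch.nn as nn

class FrequencyAttention(nn.Module):
    """Channel attention applied in the frequency domain (illustrative
    sketch of the idea, not the authors' AIRM implementation)."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(),
            nn.Linear(channels // 4, channels), nn.Sigmoid())

    def forward(self, x):                              # x: (B, C, H, W)
        freq = torch.fft.rfft2(x, norm="ortho")        # complex spectrum per channel
        mag = freq.abs().mean(dim=(2, 3))              # (B, C) spectral energy
        w = self.gate(mag).unsqueeze(-1).unsqueeze(-1) # learned channel weights
        freq = freq * w                                # re-weight frequency channels
        return torch.fft.irfft2(freq, s=x.shape[-2:], norm="ortho")
```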

4.
IEEE Trans Image Process ; 32: 5705-5720, 2023.
Article in English | MEDLINE | ID: mdl-37843992

ABSTRACT

Color plays an important role in human visual perception, reflecting the spectrum of objects. However, existing infrared and visible image fusion methods rarely explore how to handle multi-spectral/channel data directly and achieve high color fidelity. This paper addresses the issue by proposing a novel method based on diffusion models, termed Dif-Fusion, to generate the distribution of the multi-channel input data, which improves multi-source information aggregation and the fidelity of colors. Specifically, instead of converting multi-channel images into single-channel data as in existing fusion methods, we create the multi-channel data distribution with a denoising network in a latent space via forward and reverse diffusion processes. Then, we use the denoising network to extract multi-channel diffusion features carrying both visible and infrared information. Finally, we feed the multi-channel diffusion features to the multi-channel fusion module to directly generate the three-channel fused image. To retain texture and intensity information, we propose a multi-channel gradient loss and an intensity loss. Along with the current evaluation metrics for measuring texture and intensity fidelity, we introduce Delta E as a new evaluation metric to quantify color fidelity. Extensive experiments indicate that our method is more effective than other state-of-the-art image fusion methods, especially in color fidelity. The source code is available at https://github.com/GeoVectorMatrix/Dif-Fusion.
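
For reference, a color-fidelity score in this spirit can be computed by converting both images to CIELAB and averaging the per-pixel Delta E; the CIE76 variant below is an assumption, as the abstract does not state which Delta E formula the paper uses.

```python
from skimage.color import rgb2lab, deltaE_cie76

def mean_delta_e(fused_rgb, reference_rgb):
    """Mean CIE76 Delta E between a fused image and a color reference,
    as a color-fidelity score (lower = better fidelity). Inputs are
    float RGB arrays in [0, 1] of shape (H, W, 3)."""
    lab_fused = rgb2lab(fused_rgb)
    lab_ref = rgb2lab(reference_rgb)
    return float(deltaE_cie76(lab_fused, lab_ref).mean())
```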

5.
Article in English | MEDLINE | ID: mdl-37656638

ABSTRACT

Despite the great potential of convolutional neural networks (CNNs) in various tasks, their resource-hungry nature greatly hinders wide deployment in cost-sensitive and low-powered scenarios, especially remote sensing applications. Existing model pruning approaches, implemented as a "subtraction" operation, impose a performance ceiling on the slimmed model. Self-knowledge distillation (Self-KD) resorts to auxiliary networks that are active only in the training phase to improve performance. However, the transferred knowledge is holistic and crude, and learning-based knowledge transfer is indirect and lossy. Here, we propose a novel model-compression method, termed block-wise partner learning (BPL), which comprises "extension" and "fusion" operations and frees the compressed model from the constraints of the baseline. Unlike Self-KD, the proposed BPL creates a partner for each block to enhance performance during training. For the model to absorb more diverse information, a diversity loss (DL) is designed to evaluate the difference between the original block and its partner, as sketched below. In addition, the partner is fused equivalently rather than discarded directly. After training, we can simply adopt the fused compressed model, which contains the enhancement information of the partners but has fewer parameters and a lower inference cost. As validated on the UC Merced land-use, NWPU-RESISC45, and RSD46-WHU datasets, BPL demonstrates superiority over other model-compression approaches. For example, it attains a substantial floating-point operations (FLOPs) reduction of 73.97% with only 0.24 accuracy (ACC.) loss for ResNet-50 on the UC Merced land-use dataset. The code is available at https://github.com/zhangxin-xd/BPL.
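
A hypothetical form of the diversity loss: penalize cosine similarity between a block's output and its partner's so that the two absorb different information. The paper's exact formulation may differ.

```python
import torch.nn.functional as F

def diversity_loss(block_out, partner_out):
    """Encourage a block and its partner to learn different information by
    penalizing cosine similarity of their flattened outputs (sketch)."""
    b = F.normalize(block_out.flatten(1), dim=1)   # (N, D) per-sample vectors
    p = F.normalize(partner_out.flatten(1), dim=1)
    return (b * p).sum(dim=1).mean()  # minimizing similarity maximizes diversity
```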

6.
IEEE Trans Image Process ; 32: 4689-4700, 2023.
Article in English | MEDLINE | ID: mdl-37561618

ABSTRACT

Network pruning is one of the chief means of improving the computational efficiency of deep neural networks (DNNs). Pruning-based methods generally discard network kernels, channels, or layers, which inevitably disrupts well-learned network correlations and thus degrades performance. In this work, we propose an Efficient Layer Compression (ELC) approach that compresses serial layers by decoupling and merging rather than pruning. Specifically, we first propose a novel decoupling module to decouple the layers, enabling us to readily merge serial layers that include both nonlinear and convolutional layers. Then, the decoupled network is losslessly merged based on an equivalent conversion of the parameters. In this way, ELC can effectively reduce the depth of the network without destroying the correlations of the convolutional layers. To the best of our knowledge, we are the first to exploit the mergeability of serial convolutional layers for lossless network layer compression. Experimental results on two datasets demonstrate that our method retains superior performance with FLOPs reductions of 74.1% for VGG-16 and 54.6% for ResNet-56. In addition, ELC improves inference speed by 2× on a Jetson AGX Xavier edge device.
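
The simplest instance of such an equivalent parameter conversion is merging a k×k convolution followed by a 1×1 convolution (with no nonlinearity in between) into one k×k convolution; the sketch below verifies the equivalence numerically and is only a toy case of what ELC generalizes to decoupled nonlinear layers.

```python
import torch
import torch.nn as nn

def merge_conv_then_1x1(conv_kxk: nn.Conv2d, conv_1x1: nn.Conv2d) -> nn.Conv2d:
    """Losslessly merge a kxk conv followed by a 1x1 conv into one kxk conv:
    y = W2 (W1 * x + b1) + b2  =>  W = W2 W1, b = W2 b1 + b2."""
    w1, b1 = conv_kxk.weight, conv_kxk.bias                          # (M,I,k,k), (M,)
    w2, b2 = conv_1x1.weight.squeeze(-1).squeeze(-1), conv_1x1.bias  # (O,M), (O,)
    merged = nn.Conv2d(conv_kxk.in_channels, conv_1x1.out_channels,
                       conv_kxk.kernel_size, padding=conv_kxk.padding, bias=True)
    merged.weight.data = torch.einsum("om,mikl->oikl", w2, w1)
    merged.bias.data = w2 @ b1 + b2
    return merged

# Sanity check: outputs match up to floating-point error.
c1, c2 = nn.Conv2d(8, 16, 3, padding=1), nn.Conv2d(16, 4, 1)
x = torch.randn(1, 8, 32, 32)
assert torch.allclose(c2(c1(x)), merge_conv_then_1x1(c1, c2)(x), atol=1e-5)
```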

7.
IEEE Trans Image Process ; 32: 3912-3923, 2023.
Article in English | MEDLINE | ID: mdl-37436852

ABSTRACT

Viewed neurologically, filter pruning is a procedure of forgetting followed by remembering (recovery). Prevailing methods first directly forget less important information from an unrobust baseline and expect to minimize the performance sacrifice. However, unsaturated base remembering imposes a ceiling on the slimmed model, leading to suboptimal performance. Moreover, forgetting significantly at the outset causes unrecoverable information loss. Here, we design a novel filter pruning paradigm termed Remembering Enhancement and Entropy-based Asymptotic Forgetting (REAF). Inspired by robustness theory, we first enhance remembering by over-parameterizing the baseline with fusible compensatory convolutions, which frees the pruned model from the constraints of the baseline at no inference cost. The collateral implication between the original and compensatory filters then necessitates a bilateral-collaborated pruning criterion: a filter pair is preserved only when the filter has the largest intra-branch distance and its compensatory counterpart has the strongest remembering-enhancement power. Further, Ebbinghaus-curve-based asymptotic forgetting is proposed to protect the pruned model from unstable learning. The number of pruned filters increases asymptotically during training, which allows the memory of the pretrained weights to be gradually concentrated in the remaining filters. Extensive experiments demonstrate the superiority of REAF over many state-of-the-art (SOTA) methods. For example, REAF removes 47.55% of FLOPs and 42.98% of parameters from ResNet-50 with only 0.98% top-1 accuracy loss on ImageNet. The code is available at https://github.com/zhangxin-xd/REAF.
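
One illustrative exponential form of an Ebbinghaus-style asymptotic schedule, in which the number of pruned filters rises gradually toward its target during training; the paper's actual curve and parameters are not given in the abstract.

```python
import math

def pruned_filters_at(epoch, total_epochs, final_prune_count, decay=0.2):
    """Asymptotic pruning schedule: prune few filters early, then approach
    the target count as training progresses (illustrative sketch)."""
    t = epoch / total_epochs
    frac = 1.0 - math.exp(-t / decay)       # rises quickly, then saturates
    frac /= 1.0 - math.exp(-1.0 / decay)    # normalize so frac(1.0) == 1
    return int(round(final_prune_count * frac))

print([pruned_filters_at(e, 100, 512) for e in (0, 10, 30, 60, 100)])
```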

8.
IEEE Trans Pattern Anal Mach Intell ; 45(7): 8827-8844, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37018311

ABSTRACT

Semi-supervised semantic segmentation aims to learn a semantic segmentation model from limited labeled images and abundant unlabeled images. The key to this task is generating reliable pseudo labels for the unlabeled images. Existing methods mainly focus on producing reliable pseudo labels based on the confidence scores of unlabeled images, while largely ignoring the labeled images with accurate annotations. In this paper, we propose a Cross-Image Semantic Consistency guided Rectifying (CISC-R) approach for semi-supervised semantic segmentation, which explicitly leverages the labeled images to rectify the generated pseudo labels. Our CISC-R is inspired by the fact that images belonging to the same class have high pixel-level correspondence. Specifically, given an unlabeled image and its initial pseudo labels, we first query a guiding labeled image that shares the same semantic information as the unlabeled image. Then, we estimate the pixel-level similarity between the unlabeled image and the queried labeled image to form a CISC map, which guides a reliable pixel-level rectification of the pseudo labels. Extensive experiments on the PASCAL VOC 2012, Cityscapes, and COCO datasets demonstrate that the proposed CISC-R significantly improves the quality of the pseudo labels and outperforms state-of-the-art methods. Code is available at https://github.com/Luffy03/CISC-R.
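
A sketch of the CISC-map computation under the assumption that both images are encoded into dense feature maps of shape (D, H, W); the querying step, thresholds, and the rectification itself are omitted.

```python
import torch.nn.functional as F

def cisc_map(unlabeled_feat, labeled_feat, labeled_mask, cls):
    """Per-pixel similarity between an unlabeled image's features and the
    class-`cls` pixels of a queried labeled image (illustrative sketch).
    unlabeled_feat/labeled_feat: (D, H, W); labeled_mask: (H, W) int labels."""
    fu = F.normalize(unlabeled_feat.flatten(1).t(), dim=1)  # (H*W, D)
    fl = F.normalize(labeled_feat.flatten(1).t(), dim=1)    # (H*W, D)
    fl = fl[labeled_mask.flatten() == cls]                  # keep class-cls pixels
    sim = fu @ fl.t()                                       # (H*W, N_cls)
    # Best match against any labeled class pixel gives the per-pixel score.
    return sim.max(dim=1).values.view(unlabeled_feat.shape[1:])
```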

9.
Front Med (Lausanne) ; 10: 1038534, 2023.
Article in English | MEDLINE | ID: mdl-36936204

ABSTRACT

Retinal images have proven significant in diagnosing multiple diseases such as diabetes, glaucoma, and hypertension. Retinal vessel segmentation is crucial for the quantitative analysis of retinal images. However, current methods mainly concentrate on the segmentation performance of the overall retinal vessel structure; small vessels do not receive enough attention because they occupy only a small fraction of the full retinal image. Yet small retinal vessels are much more sensitive to the blood circulation system and have great significance for the early diagnosis and warning of various diseases. This paper combines two unsupervised methods, local phase congruency (LPC) and orientation scores (OS), as attention with a deep learning network based on the U-Net, and proposes the U-Net using local phase congruency and orientation scores (UN-LPCOS), which shows a remarkable ability to identify and segment small retinal vessels. A new metric called sensitivity on small vessels (Sesv) is also proposed to evaluate performance on small-vessel segmentation. Our strategy was validated on both the DRIVE dataset and data from the Maastricht Study and achieved outstanding segmentation performance on both the overall vessel structure and small vessels.

10.
Comput Biol Med ; 155: 106658, 2023 03.
Article in English | MEDLINE | ID: mdl-36827787

ABSTRACT

A multiscale extension of the well-known block matching and 4D filtering (BM4D) method is proposed by analyzing and extending the wavelet subband denoising approach so that the proposed method avoids directly denoising detail subbands, which considerably simplifies the computations and makes multiscale processing feasible in 3D. To this end, we first derive the multiscale construction method in 2D and propose multiscale extensions of three 2D natural image denoising methods. Then, the derivation is extended to 3D by proposing mixed multiscale BM4D (mmBM4D) for optical coherence tomography (OCT) image denoising. We tested mmBM4D on three public OCT datasets captured by various imaging devices. The experiments revealed that mmBM4D significantly outperforms its original counterpart and performs on par with the state-of-the-art OCT denoising methods. In terms of peak signal-to-noise ratio (PSNR), mmBM4D surpasses the original BM4D by more than 0.68 dB on the first dataset. On the second and third datasets, significant improvements in the mean-to-standard-deviation ratio, contrast-to-noise ratio, and equivalent number of looks were achieved. Furthermore, the layer-quality preservation of the compared OCT denoising methods was evaluated on the downstream task of retinal layer segmentation.
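
A 2-D sketch of the multiscale construction described above, where an off-the-shelf denoiser is applied to approximation subbands only and detail subbands are never denoised directly; the mixing rules that make mmBM4D "mixed" are not reproduced here.

```python
import pywt

def multiscale_denoise(img, denoiser, levels=2, wavelet="db2"):
    """Recursive multiscale denoising: split into approximation + detail
    subbands, denoise only the approximation at each scale, reconstruct,
    then run a final full-resolution pass (illustrative sketch)."""
    if levels == 0:
        return denoiser(img)
    cA, details = pywt.dwt2(img, wavelet)                        # approx + details
    cA = multiscale_denoise(cA, denoiser, levels - 1, wavelet)   # recurse coarser
    rec = pywt.idwt2((cA, details), wavelet)
    rec = rec[:img.shape[0], :img.shape[1]]                      # crop any padding
    return denoiser(rec)                                         # finest-scale pass

# Usage: multiscale_denoise(noisy, denoiser=some_2d_denoiser_function)
```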


Subjects
Retina; Optical Coherence Tomography; Optical Coherence Tomography/methods; Signal-to-Noise Ratio; Data Collection; Algorithms; Image Processing, Computer-Assisted
11.
IEEE Trans Cybern ; 53(10): 6395-6407, 2023 Oct.
Article in English | MEDLINE | ID: mdl-35580100

ABSTRACT

Supervised deep learning techniques have been widely explored for real photograph denoising and have achieved notable performance. However, being tied to specific training data, most current image denoising algorithms can easily be restricted to certain noise types and exhibit poor generalizability across test sets. To address this issue, we propose a novel flexible and well-generalized approach, coined the dual meta attention network (DMANet). The DMANet is mainly composed of a cascade of self-meta attention blocks (SMABs) and collaborative-meta attention blocks (CMABs). These blocks have two advantages. First, they simultaneously take both spatial and channel attention into account, allowing our model to better exploit informative feature interdependencies. Second, the attention blocks are embedded with a meta-subnetwork, based on metalearning, that supports dynamic weight generation. Such a scheme provides a beneficial means for self and collaborative updating of the attention maps on the fly. Instead of directly stacking SMABs and CMABs to form a deep network architecture, we further devise a three-stage learning framework, where different blocks are utilized at each feature extraction stage according to the individual characteristics of the SMAB and CMAB. On five real datasets, we demonstrate the superiority of our approach against the state of the art. Unlike most existing image denoising algorithms, our DMANet not only possesses good generalization capability but can also be flexibly used to cope with unknown and complex real noise, making it highly competitive for practical applications.

12.
IEEE Trans Image Process ; 31: 7419-7434, 2022.
Article in English | MEDLINE | ID: mdl-36417727

ABSTRACT

Semantic segmentation methods based on deep neural networks have achieved great success in recent years. However, training such deep neural networks relies heavily on a large number of images with accurate pixel-level labels, which requires a huge amount of human effort, especially for large-scale remote sensing images. In this paper, we propose a point-based weakly supervised learning framework called the deep bilateral filtering network (DBFNet) for the semantic segmentation of remote sensing images. Compared with pixel-level labels, point annotations are usually sparse and cannot reveal the complete structure of the objects; they also lack boundary information, thus resulting in incomplete predictions within objects and the loss of object boundaries. To address these problems, we incorporate the bilateral filtering technique into deeply learned representations in two respects. First, since a target object contains smooth regions that always belong to the same category, we perform deep bilateral filtering (DBF) to filter the deep features by a nonlinear combination of nearby feature values, which encourages nearby and similar features to become closer, thus achieving consistent predictions in smooth regions. In addition, the DBF can distinguish the boundary by enlarging the distance between the features on different sides of an edge, thus preserving the boundary information well. Experimental results on two widely used datasets, the ISPRS 2-D semantic labeling Potsdam and Vaihingen datasets, demonstrate that our proposed DBFNet achieves highly competitive performance compared with state-of-the-art fully supervised methods. Code is available at https://github.com/Luffy03/DBFNet.
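
As a plain, non-learned reference point for the DBF operation, the sketch below bilaterally filters a feature map with fixed Gaussian spatial and range kernels; DBFNet's learned, differentiable variant differs in how these weights are produced.

```python
import torch
import torch.nn.functional as F

def deep_bilateral_filter(feat, k=5, sigma_s=2.0, sigma_r=0.5):
    """Bilateral filtering on deep feature maps: each feature vector becomes
    a weighted average of its neighbors, weighted by spatial distance and
    feature similarity (illustrative, fixed-kernel sketch)."""
    B, C, H, W = feat.shape
    pad = k // 2
    patches = F.unfold(feat, k, padding=pad).view(B, C, k * k, H * W)
    center = feat.view(B, C, 1, H * W)
    # Range kernel: similarity of each neighbor feature to the center feature.
    range_w = torch.exp(-((patches - center) ** 2).sum(1) / (2 * sigma_r ** 2))
    # Spatial kernel: fixed Gaussian over the k x k window offsets.
    ys, xs = torch.meshgrid(torch.arange(k), torch.arange(k), indexing="ij")
    spatial = torch.exp(-((ys - pad) ** 2 + (xs - pad) ** 2).float()
                        / (2 * sigma_s ** 2)).view(1, k * k, 1)
    w = range_w * spatial                       # (B, k*k, H*W)
    w = w / w.sum(dim=1, keepdim=True)          # normalize weights per pixel
    out = (patches * w.unsqueeze(1)).sum(2)     # weighted average per channel
    return out.view(B, C, H, W)
```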

13.
Article in English | MEDLINE | ID: mdl-36083964

ABSTRACT

As the foundation of image interpretation, semantic segmentation is an active topic in the field of remote sensing. Faced with the complex combinations of multiscale objects in remote sensing images (RSIs), the exploration and modeling of contextual information have become key to accurately identifying objects at different scales. Although several methods have been proposed in the past decade, they model global or local contextual information insufficiently, which easily results in the fragmentation of large-scale objects, the neglect of small-scale objects, and blurred boundaries. To address these issues, we propose a contextual representation enhancement network (CRENet) to strengthen global context (GC) and local context (LC) modeling in high-level features. The core components of the CRENet are the local feature alignment enhancement module (LFAEM) and the superpixel affinity loss (SAL). The LFAEM aligns and enhances the LC in low-level features by constructing contextual contrast through multilayer cascaded deformable convolution; the result is then supplemented with high-level features to refine the segmentation map. The SAL helps the network accurately capture the GC by supervising the semantic information and relationships learned from superpixels. The proposed method is plug-and-play and can be embedded in any FCN-based network. Experiments on two popular RSI datasets demonstrate the effectiveness of our proposed network, with competitive performance in both qualitative and quantitative respects.

14.
Med Biol Eng Comput ; 60(10): 2851-2863, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35931872

ABSTRACT

Deep learning's great success in image classification relies heavily on large-scale annotated datasets. However, obtaining labels for optical coherence tomography (OCT) data requires significant effort from professional ophthalmologists, which hinders the application of deep learning to OCT image classification. In this paper, we propose a self-supervised patient-specific features learning (SSPSF) method to reduce the amount of labeled data required for good OCT image classification results. Specifically, the SSPSF consists of a self-supervised learning phase and a downstream OCT image classification phase. The self-supervised learning phase contains two self-supervised patient-specific feature learning tasks: one learns to discriminate whether an OCT scan belongs to a specific patient, and the other learns invariant features related to patients. In addition, the proposed self-supervised learning model can learn inherent representations from OCT images without any manual labels, which provides good initialization parameters for the downstream OCT image classification model. The proposed SSPSF achieves classification accuracies of 97.74% and 98.94% on the public RETOUCH dataset and the AI Challenger dataset, respectively. The experimental results on the two public OCT datasets show the effectiveness of the proposed method compared with other well-known OCT image classification methods while using less annotated data.


Subjects
Optical Coherence Tomography; Humans; Optical Coherence Tomography/methods
15.
IEEE Trans Image Process ; 31: 5227-5241, 2022.
Article in English | MEDLINE | ID: mdl-35914047

ABSTRACT

Deep learning-based methods have produced significant gains for hyperspectral image (HSI) classification in recent years, leading to high-impact academic achievements and industrial applications. Despite their success in HSI classification, deep learning-based methods still lack the robustness to handle unknown objects in an open-set environment (OSE). Open-set classification deals with unknown classes that are not included in the training set, whereas in a closed-set environment (CSE) unknown classes do not appear in the test set. Existing open-set classifiers rely almost entirely on the supervision provided by the known classes in the training set, which specializes the learned representations to the known classes and makes it easy to misclassify unknown classes as known ones. To improve the robustness of HSI classification in OSE while maintaining the classification accuracy of known classes, we propose a spectral-spatial latent reconstruction framework that simultaneously conducts spectral feature reconstruction, spatial feature reconstruction, and pixel-wise classification in OSE. By reconstructing the spectral and spatial features of the HSI, the learned feature representation is enhanced, so as to retain the spectral-spatial information useful for rejecting unknown classes and distinguishing known classes. The proposed method uses latent representations for spectral-spatial reconstruction and achieves robust unknown detection without compromising the accuracy on known classes. Experimental results show that the proposed method outperforms existing state-of-the-art methods in OSE.

16.
IEEE Trans Image Process ; 31: 1870-1881, 2022.
Article in English | MEDLINE | ID: mdl-35139015

ABSTRACT

OCT fluid segmentation is a crucial task for diagnosis and therapy in ophthalmology. Current convolutional neural networks (CNNs) supervised by pixel-wise annotated masks achieve great success in OCT fluid segmentation. However, obtaining pixel-wise masks for OCT images is time-consuming, expensive, and requires expertise. This paper proposes an Intra- and inter-Slice Contrastive Learning Network (ISCLNet) for OCT fluid segmentation with only point supervision. Our ISCLNet learns visual representations by designing contrastive tasks that exploit the inherent similarity or dissimilarity in unlabeled OCT data. Specifically, we propose an intra-slice contrastive learning strategy to leverage the fluid-background similarity and the retinal layer-background dissimilarity. Moreover, we construct an inter-slice contrastive learning architecture to learn the similarity of adjacent OCT slices from one OCT volume. Finally, an end-to-end model combining the intra- and inter-slice contrastive learning processes learns to segment fluid under point supervision. Experimental results on two public OCT fluid segmentation datasets (i.e., AI Challenger and RETOUCH) demonstrate that the ISCLNet bridges the gap between fully-supervised and weakly-supervised OCT fluid segmentation and outperforms other well-known point-supervised segmentation methods.
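
A generic InfoNCE-style sketch of the inter-slice contrast, treating embeddings of adjacent slices from the same volume as positive pairs and other slices in the batch as negatives; the authors' exact loss and sampling scheme may differ.

```python
import torch
import torch.nn.functional as F

def inter_slice_infonce(z_a, z_b, temperature=0.1):
    """InfoNCE loss for paired slice embeddings: z_a[i] and z_b[i] come from
    adjacent slices of the same volume (positives); all other pairs in the
    batch act as negatives (illustrative sketch)."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)  # (N, D) each
    logits = z_a @ z_b.t() / temperature                         # (N, N) similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)       # diagonal positives
    return F.cross_entropy(logits, targets)
```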


Subjects
Image Processing, Computer-Assisted; Neural Networks, Computer; Image Processing, Computer-Assisted/methods; Retina; Supervised Machine Learning
17.
Article in English | MEDLINE | ID: mdl-37015552

ABSTRACT

Accurate retinal fluid segmentation on optical coherence tomography (OCT) images plays an important role in diagnosing and treating various eye diseases. State-of-the-art deep models have shown promising performance on OCT image segmentation given pixel-wise annotated training data. However, a learned model will perform poorly on OCT images obtained from different devices (domains) due to domain shift, a problem that severely limits the real-world application of OCT image segmentation, since device types usually differ across hospitals. In this paper, we study the task of cross-domain OCT fluid segmentation, where we are given a labeled dataset from a source device (domain) and an unlabeled dataset from a target device (domain), and the goal is to learn a model that performs well on the target domain. To solve this problem, we propose a novel Structure-guided Cross-Attention Network (SCAN), which leverages the retinal layer structure to facilitate domain alignment. Our SCAN is inspired by the fact that the retinal layer structure is robust across domains and can highlight regions that are important for fluid segmentation. In light of this, we build SCAN in a multi-task manner by jointly learning retinal structure prediction and fluid segmentation. To exploit the mutual benefit between layer structure and fluid segmentation, we further introduce a cross-attention module that measures the correlation between the layer-specific and fluid-specific features, encouraging the model to concentrate on highly relevant regions during domain alignment. Moreover, an adaptation difficulty map is computed from the retinal structure predictions of the different domains, which forces the model to focus on hard regions during structure-aware adversarial learning. Extensive experiments on the three domains of the RETOUCH dataset demonstrate the effectiveness of the proposed method and show that our approach produces state-of-the-art performance on cross-domain OCT fluid segmentation.
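
A minimal sketch of cross-attention in which fluid-branch features attend to layer-branch features; the head count, dimensions, and query/key assignment are illustrative assumptions rather than the paper's exact module.

```python
import torch.nn as nn

class LayerFluidCrossAttention(nn.Module):
    """Fluid features (queries) attend to retinal-layer features
    (keys/values) to borrow structural cues (illustrative sketch)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, fluid_feat, layer_feat):      # both: (B, C, H, W)
        B, C, H, W = fluid_feat.shape
        q = fluid_feat.flatten(2).transpose(1, 2)   # (B, H*W, C) queries
        kv = layer_feat.flatten(2).transpose(1, 2)  # (B, H*W, C) keys/values
        out, _ = self.attn(q, kv, kv)               # fluid attends to layers
        return out.transpose(1, 2).view(B, C, H, W)
```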

18.
IEEE Trans Med Imaging ; 40(12): 3641-3651, 2021 12.
Article in English | MEDLINE | ID: mdl-34197318

ABSTRACT

As labeled anomalous medical images are usually difficult to acquire, especially for rare diseases, deep learning methods that rely heavily on large amounts of labeled data cannot yield satisfactory performance. Compared to anomalous data, normal images without the need for lesion annotation are much easier to collect. In this paper, we propose an anomaly detection framework, namely SALAD, extracting Self-supervised and trAnsLation-consistent features for Anomaly Detection. The proposed SALAD is a reconstruction-based method that learns the manifold of normal data through an encode-and-reconstruct translation between the image and latent spaces. In particular, two constraints (i.e., a structure similarity loss and a center constraint loss) are proposed to regulate the cross-space (i.e., image and feature) translation, enforcing the model to learn translation-consistent and representative features from normal data. Furthermore, a self-supervised learning module is integrated into our framework to further boost anomaly detection accuracy by deeply exploiting useful information in the raw normal data. An anomaly score, as a measure to separate anomalous data from healthy data, is constructed based on the learned self-supervised and translation-consistent features. Extensive experiments are conducted on optical coherence tomography (OCT) and chest X-ray datasets. The experimental results demonstrate the effectiveness of our approach.
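
One plausible form of such a score, combining image-space reconstruction error with a latent-space consistency term; the weighting and exact terms are illustrative, not the paper's precise definition.

```python
def anomaly_score(x, x_rec, z, z_rec, alpha=0.5):
    """Per-sample anomaly score: higher means more anomalous. x/x_rec are
    images and their reconstructions; z/z_rec are latent codes before and
    after the encode-and-reconstruct translation (illustrative sketch)."""
    img_err = (x - x_rec).flatten(1).pow(2).mean(dim=1)  # image-space MSE
    lat_err = (z - z_rec).flatten(1).pow(2).mean(dim=1)  # translation gap
    return alpha * img_err + (1 - alpha) * lat_err
```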


Subjects
Optical Coherence Tomography
19.
IEEE Trans Med Imaging ; 40(6): 1591-1602, 2021 06.
Article in English | MEDLINE | ID: mdl-33625978

ABSTRACT

Recently, automatic diagnostic approaches have been widely used to classify ocular diseases. Most of these approaches are based on a single imaging modality (e.g., fundus photography or optical coherence tomography (OCT)), which usually reflects the oculopathy only to a certain extent and neglects modality-specific information among different imaging modalities. This paper proposes a novel modality-specific attention network (MSAN) for multi-modal retinal image classification, which can effectively utilize modality-specific diagnostic features from fundus and OCT images. The MSAN comprises two attention modules that extract the modality-specific features from fundus and OCT images, respectively. Specifically, for the fundus image, ophthalmologists need to observe local and global pathologies at multiple scales (e.g., from microaneurysms at the micrometer level and the optic disc at the millimeter level to blood vessels spanning the whole eye). Therefore, we propose a multi-scale attention module to extract both local and global features from fundus images. Moreover, large background regions exist in OCT images that are meaningless for diagnosis; thus, a region-guided attention module is proposed to encode the retinal layer-related features and ignore the background in OCT images. Finally, we fuse the modality-specific features to form a multi-modal feature and train the multi-modal retinal image classification network. The fusion of modality-specific features allows the model to combine the advantages of the fundus and OCT modalities for a more accurate diagnosis. Experimental results on a clinically acquired multi-modal retinal image (fundus and OCT) dataset demonstrate that our MSAN outperforms other well-known single-modal and multi-modal retinal image classification methods.


Subjects
Optic Disk; Retina; Diagnostic Techniques, Ophthalmological; Fundus Oculi; Retina/diagnostic imaging; Optical Coherence Tomography
20.
IEEE Trans Med Imaging ; 40(10): 2600-2614, 2021 10.
Article in English | MEDLINE | ID: mdl-33326376

ABSTRACT

Due to its noninvasive character, optical coherence tomography (OCT) has become a popular diagnostic method in clinical settings. However, the low-coherence interferometric imaging procedure is inevitably contaminated by heavy speckle noise, which impairs both visual quality and the diagnosis of various ocular diseases. Although deep learning has been applied to image denoising with promising results, the lack of well-registered clean and noisy image pairs makes it impractical for supervised learning-based approaches to achieve satisfactory OCT image denoising results. In this paper, we propose an unsupervised OCT image speckle reduction algorithm that does not rely on well-registered image pairs. Specifically, employing the ideas of disentangled representation and generative adversarial networks, the proposed method first disentangles the noisy image into content and noise spaces with corresponding encoders. Then, a generator is used to predict the denoised OCT image from the extracted content features. In addition, noise patches cropped from the noisy image are utilized to facilitate more accurate disentanglement. Extensive experiments have been conducted, and the results suggest that our proposed method is superior to the classic methods and demonstrates competitive performance against several recently proposed learning-based approaches in both quantitative and qualitative respects. Code is available at: https://github.com/tsmotlp/DRGAN-OCT.


Subjects
Image Processing, Computer-Assisted; Optical Coherence Tomography; Algorithms; Retina; Signal-to-Noise Ratio