Results 1 - 20 of 47
1.
Article in English | MEDLINE | ID: mdl-38885100

ABSTRACT

Multispectral (MS) and panchromatic (PAN) image fusion, also known as multispectral pansharpening, aims to obtain an MS image with both high spatial and high spectral resolution. However, because the noise and blur introduced during the imaging and transmission of data are usually neglected during training, many deep learning (DL) pansharpening methods fail on datasets containing noise and blur. To tackle this problem, a variational optimization-guided two-stage network (VOGTNet) for multispectral pansharpening is proposed in this work; the performance of variational optimization (VO)-based pansharpening methods relies on prior information and on estimates of the spatial-spectral degradation from the target image to the two original images. Concretely, we propose a dual-branch fusion network (DBFN) based on supervised learning and train it on datasets containing noise and blur, so that in the first stage it generates a prior fusion result that serves as prior information and removes noise and blur. Subsequently, we exploit the estimated spectral response function (SRF) and point spread function (PSF) to simulate the spectral and spatial degradation processes, respectively, so that the prior fusion result and an adaptive recovery model (ARM) jointly perform unsupervised learning on the original dataset to restore further image details, generating the high-resolution MS image in the second stage. Experimental results indicate that the proposed VOGTNet improves pansharpening performance and shows strong robustness against noise and blur. Furthermore, VOGTNet can be extended into a general pansharpening framework that improves the resistance to noise and blur of other supervised learning-based pansharpening methods. The source code is available at https://github.com/HZC-1998/VOGTNet.
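
The two degradations exploited in the second stage can be made concrete with a short sketch: a spatial degradation blurs with a PSF and downsamples, and a spectral degradation weights bands with an SRF. The Gaussian PSF, the x4 scale factor, and the equal-weight SRF below are illustrative assumptions, not the estimates VOGTNet actually learns.

```python
import torch
import torch.nn.functional as F

def gaussian_psf(ksize=7, sigma=1.5):
    # Illustrative separable Gaussian PSF; VOGTNet estimates the PSF instead.
    ax = torch.arange(ksize, dtype=torch.float32) - (ksize - 1) / 2
    g = torch.exp(-0.5 * (ax / sigma) ** 2)
    k = torch.outer(g, g)
    return k / k.sum()

def spatial_degrade(hrms, psf, scale=4):
    # Blur each band with the PSF, then downsample: HRMS -> simulated LRMS.
    c = hrms.shape[1]
    k = psf.expand(c, 1, -1, -1)                 # depthwise kernel, one per band
    pad = psf.shape[-1] // 2
    blurred = F.conv2d(F.pad(hrms, (pad,) * 4, mode="replicate"), k, groups=c)
    return blurred[..., ::scale, ::scale]

def spectral_degrade(hrms, srf):
    # Weight bands by the SRF: HRMS (B,C,H,W) -> simulated PAN (B,1,H,W).
    return torch.einsum("bchw,c->bhw", hrms, srf).unsqueeze(1)

hrms = torch.rand(1, 4, 64, 64)                        # toy 4-band MS image
lrms = spatial_degrade(hrms, gaussian_psf())           # (1, 4, 16, 16)
pan = spectral_degrade(hrms, torch.full((4,), 0.25))   # equal-weight SRF
```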

2.
Article in English | MEDLINE | ID: mdl-38833391

ABSTRACT

Accurately distinguishing between the background and anomalous objects within hyperspectral images poses a significant challenge. The primary obstacle lies in the inadequate modeling of prior knowledge, leading to a performance bottleneck in hyperspectral anomaly detection (HAD). In response to this challenge, we put forth a coupling paradigm that combines model-driven low-rank representation (LRR) methods with data-driven deep learning techniques by learning disentangled priors (LDP). LDP seeks to capture complete priors for effectively modeling the background, thereby extracting anomalies from hyperspectral images more accurately. LDP follows a model-driven deep unfolding architecture, where the prior knowledge is separated into an explicit low-rank prior formulated from expert knowledge and implicit priors learned by deep networks. The internal relationships between the explicit and implicit priors within LDP are modeled through a skip residual connection. Furthermore, we provide a mathematical proof of the convergence of our proposed model. Our experiments, conducted on multiple widely recognized datasets, demonstrate that LDP surpasses most current advanced HAD techniques, excelling in both detection performance and generalization capability.

3.
Article in English | MEDLINE | ID: mdl-38568772

ABSTRACT

The foundation model has recently garnered significant attention due to its potential to revolutionize visual representation learning in a self-supervised manner. While most foundation models are tailored to process RGB images for various visual tasks, there is a noticeable gap in research focused on spectral data, which offers valuable information for scene understanding, especially in remote sensing (RS) applications. To fill this gap, we create, for the first time, a universal RS foundation model, named SpectralGPT, which is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT). Compared to existing foundation models, SpectralGPT 1) accommodates input images with varying sizes, resolutions, time series, and regions in a progressive training fashion, enabling full utilization of extensive RS Big Data; 2) leverages 3D token generation for spatial-spectral coupling; 3) captures spectrally sequential patterns via multi-target reconstruction; and 4) is trained on one million spectral RS images, yielding models with over 600 million parameters. Our evaluation highlights significant performance improvements with pretrained SpectralGPT models, signifying substantial potential for advancing spectral RS Big Data applications in geoscience across four downstream tasks: single/multi-label scene classification, semantic segmentation, and change detection.

4.
Article in English | MEDLINE | ID: mdl-38381635

ABSTRACT

Multimodal image fusion involves tasks like pan-sharpening and depth super-resolution. Both aim to generate high-resolution target images by fusing complementary information from texture-rich guidance images and their low-resolution target counterparts, and both inherently involve reconstructing high-frequency information. Despite this inherent frequency-domain connection, most existing methods operate solely in the spatial domain and rarely explore solutions in the frequency domain. This study addresses that limitation by proposing solutions in both the spatial and frequency domains. To this end, we devise a Spatial-Frequency Information Integration Network, abbreviated as SFINet. SFINet includes a core module tailored for image fusion, consisting of three key components: a spatial-domain information branch, a frequency-domain information branch, and a dual-domain interaction. The spatial-domain branch employs invertible neural operators equipped with spatial convolutions to integrate local information from different modalities in the spatial domain. Meanwhile, the frequency-domain branch adopts a modality-aware deep Fourier transformation to capture an image-wide receptive field and explore global contextual information. In addition, the dual-domain interaction facilitates information flow and the learning of complementary representations. We further present an improved version of SFINet, SFINet++, which enhances the representation of spatial information by replacing the basic convolution unit of the original spatial-domain branch with an information-lossless invertible neural operator. We conduct extensive experiments to validate the effectiveness of the proposed networks and demonstrate their outstanding performance against state-of-the-art methods in two representative multimodal image fusion tasks: pan-sharpening and depth super-resolution. The source code is publicly available at https://github.com/manman1995/Awaresome-pansharpening.
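
A minimal sketch of a frequency-domain branch in this spirit: features are taken to the 2-D Fourier domain, their real and imaginary parts are filtered by 1x1 convolutions (each frequency couples all spatial positions, giving an image-wide receptive field), and the result is transformed back. The channel widths and activation are placeholders, not SFINet's actual design.

```python
import torch
import torch.nn as nn

class FourierBranch(nn.Module):
    # Filters features globally by operating on their 2-D Fourier spectrum.
    def __init__(self, channels):
        super().__init__()
        # 1x1 convs act per frequency bin; after the inverse transform every
        # output pixel has depended on the whole spatial extent of the input.
        self.filt = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(2 * channels, 2 * channels, 1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")        # complex (b,c,h,w//2+1)
        z = torch.cat([spec.real, spec.imag], dim=1)   # stack re/im as channels
        z = self.filt(z)
        re, im = z.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(re, im), s=(h, w), norm="ortho")

y = FourierBranch(8)(torch.rand(1, 8, 32, 32))         # same shape as input
```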

5.
IEEE Trans Image Process ; 32: 5877-5892, 2023.
Article in English | MEDLINE | ID: mdl-37889806

ABSTRACT

The synthesis of a high-resolution (HR) hyperspectral image (HSI) by fusing a low-resolution HSI with a corresponding HR multispectral image has emerged as a prevalent HSI super-resolution (HSR) scheme. Recent research has revealed that tensor analysis is an emerging tool for HSR. However, most off-the-shelf tensor-based HSR algorithms encounter challenges in rank determination and modeling capacity. To address these issues, we construct nonlocal patch tensors (NPTs) and characterize their low-rank structures with coupled Bayesian tensor factorization. It is worth emphasizing that the intrinsic global spectral correlation and nonlocal spatial similarity can be explored simultaneously under the proposed model. Moreover, benefiting from the technique of automatic relevance determination (ARD), we propose a hierarchical probabilistic framework based on Canonical Polyadic (CP) factorization, which incorporates a sparsity-inducing prior over the underlying factor matrices. We further develop an effective expectation-maximization-type optimization scheme to estimate the framework. In contrast to existing works, the proposed model can infer the latent CP rank of an NPT adaptively without parameter tuning. Extensive experiments on synthesized and real datasets illustrate the intrinsic capability of our model in rank determination as well as its superiority in fusion performance.
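
For orientation, a generic probabilistic CP factorization with an ARD-style sparsity prior has the following schematic form (the notation is generic, not the paper's exact hierarchy):

```latex
% Rank-R CP decomposition of a 3-way patch tensor
\mathcal{X} \;\approx\; \sum_{r=1}^{R} a_r \circ b_r \circ c_r ,
\qquad A = [a_1,\dots,a_R],\ B = [b_1,\dots,b_R],\ C = [c_1,\dots,c_R]

% ARD prior: one precision per CP component; a large \gamma_r shrinks
% component r toward zero, effectively pruning that rank-one term
a_r \sim \mathcal{N}\!\left(0,\ \gamma_r^{-1} I\right), \qquad
\gamma_r \sim \mathrm{Gamma}(\alpha_0, \beta_0)
```

Because each component carries its own precision, posterior inference drives the precisions of unneeded components to large values and prunes them, which is how a latent CP rank can be inferred without manual tuning.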

6.
Article in English | MEDLINE | ID: mdl-37527326

ABSTRACT

Convolutional neural networks (CNNs) have recently achieved outstanding performance in hyperspectral (HS) and multispectral (MS) image fusion. However, because of their local receptive fields, CNNs cannot exploit long-range dependence for HS and MS image fusion. To overcome this limitation, a transformer can leverage long-range dependence across the network inputs. Owing to this long-range modeling ability, transformers outperform pure CNNs on many tasks, whereas their use for HS and MS image fusion remains unexplored. In this article, we propose a spectral-spatial transformer (SST) to show the potential of transformers for HS and MS image fusion. We first devise two branches that extract spectral and spatial features from the HS and MS images with SST blocks, which explore the spectral and spatial long-range dependence, respectively. Afterward, the spectral and spatial features are fused, and the result is fed back to the spectral and spatial branches for information interaction. Finally, the high-resolution (HR) HS image is reconstructed through dense links from all the fused features to make full use of them. The experimental analysis demonstrates the high performance of the proposed approach compared with several state-of-the-art (SOTA) methods.

7.
IEEE Trans Image Process ; 32: 4649-4663, 2023.
Article in English | MEDLINE | ID: mdl-37552588

ABSTRACT

In this paper, we introduce a new algorithm based on archetypal analysis for blind hyperspectral unmixing, assuming linear mixing of endmembers. Archetypal analysis is a natural formulation for this task. This method does not require the presence of pure pixels (i.e., pixels containing a single material) but instead represents endmembers as convex combinations of a few pixels present in the original hyperspectral image. Our approach leverages an entropic gradient descent strategy, which (i) provides better solutions for hyperspectral unmixing than traditional archetypal analysis algorithms, and (ii) leads to efficient GPU implementations. Since running a single instance of our algorithm is fast, we also propose an ensembling mechanism along with an appropriate model selection procedure that make our method robust to hyper-parameter choices while keeping the computational complexity reasonable. By using six standard real datasets, we show that our approach outperforms state-of-the-art matrix factorization and recent deep learning methods. We also provide an open-source PyTorch implementation: https://github.com/inria-thoth/EDAA.
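
The entropic-gradient idea can be sketched compactly: archetypal analysis seeks X ≈ A B X with the rows of A and B constrained to the probability simplex, and an exponentiated-gradient (entropic mirror descent) step preserves those constraints by construction. The constant step size and random initialization below are simplifications; the paper's algorithm and its GPU implementation are considerably more refined.

```python
import numpy as np

def mirror_step(M, grad, eta):
    # Entropic (exponentiated-gradient) step: rows stay on the simplex.
    M = M * np.exp(-eta * grad)
    return M / M.sum(axis=1, keepdims=True)

def archetypal_analysis(X, p, iters=200, eta=0.05, seed=0):
    # Minimize ||X - A Z||^2 with Z = B X; rows of A (n,p) and B (p,n)
    # lie on the simplex, so endmembers Z are convex combinations of pixels.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = rng.dirichlet(np.ones(p), size=n)   # per-pixel abundances
    B = rng.dirichlet(np.ones(n), size=p)   # archetypes as pixel mixtures
    for _ in range(iters):
        Z = B @ X
        R = A @ Z - X                        # reconstruction residual
        A = mirror_step(A, 2 * R @ Z.T, eta)
        B = mirror_step(B, 2 * A.T @ R @ X.T, eta)
    return A, B @ X

X = np.random.rand(500, 50)                  # toy: 500 pixels, 50 bands
A, endmembers = archetypal_analysis(X, p=5)
```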

8.
Article in English | MEDLINE | ID: mdl-37379187

ABSTRACT

It is generally known that pan-sharpening is fundamentally a PAN-guided multispectral (MS) image super-resolution problem that involves learning the nonlinear mapping from low-resolution (LR) to high-resolution (HR) MS images. Since an infinite number of HR-MS images can be downsampled to produce the same LR-MS image, learning the mapping from LR-MS to HR-MS images is typically ill-posed, and the space of possible pan-sharpening functions can be extremely large, making it difficult to estimate the optimal mapping. To address this issue, we propose a closed-loop scheme that simultaneously learns the two opposite mappings, namely pan-sharpening and its corresponding degradation process, to regularize the solution space in a single pipeline. More specifically, an invertible neural network (INN) is introduced to perform a bidirectional closed loop: the forward operation performs LR-MS pan-sharpening, and the backward operation learns the corresponding HR-MS image degradation process. In addition, given the vital importance of high-frequency textures for pan-sharpened MS images, we further strengthen the INN with a dedicated multiscale high-frequency texture extraction module. Extensive experimental results demonstrate that the proposed algorithm performs favorably against state-of-the-art methods both qualitatively and quantitatively, with fewer parameters. Ablation studies also verify the effectiveness of the closed-loop mechanism in pan-sharpening. The source code is made publicly available at https://github.com/manman1995/pan-sharpening-Team-zhouman/.
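
A minimal sketch of the invertibility that makes such a closed loop possible, using a single additive coupling block; the paper's INN and its texture-extraction module are more elaborate:

```python
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    # Invertible block: splits channels, lets one half condition an additive
    # update of the other, so the inverse is exact (just subtract).
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.net = nn.Sequential(
            nn.Conv2d(half, half, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(half, half, 3, padding=1),
        )

    def forward(self, x):            # "pan-sharpening" direction
        x1, x2 = x.chunk(2, dim=1)
        return torch.cat([x1, x2 + self.net(x1)], dim=1)

    def inverse(self, y):            # "degradation" direction
        y1, y2 = y.chunk(2, dim=1)
        return torch.cat([y1, y2 - self.net(y1)], dim=1)

block = AdditiveCoupling(8)
x = torch.rand(1, 8, 16, 16)
assert torch.allclose(block.inverse(block(x)), x, atol=1e-6)
```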

9.
Article in English | MEDLINE | ID: mdl-37027760

ABSTRACT

Pansharpening refers to the fusion of a low spatial-resolution multispectral image with a high spatial-resolution panchromatic image. In this paper, we propose a novel low-rank tensor completion (LRTC)-based framework with several regularizers for multispectral image pansharpening, called LRTCFPan. The tensor completion technique is commonly used for image recovery, but it cannot be directly applied to pansharpening or, more generally, to super-resolution, because of a formulation gap. Different from previous variational methods, we first formulate a pioneering image super-resolution (ISR) degradation model, which equivalently removes the downsampling operator and transforms the tensor completion framework. Under this framework, the original pansharpening problem is solved by the LRTC-based technique together with some deblurring regularizers. From the perspective of regularization, we further explore a local-similarity-based dynamic detail mapping (DDM) term to capture the spatial content of the panchromatic image more accurately. Moreover, the low-tubal-rank property of multispectral images is investigated, and a low-tubal-rank prior is introduced for better completion and global characterization. To solve the proposed LRTCFPan model, we develop an alternating direction method of multipliers (ADMM)-based algorithm. Comprehensive experiments on reduced-resolution (i.e., simulated) and full-resolution (i.e., real) data show that the LRTCFPan method significantly outperforms other state-of-the-art pansharpening methods. The code is publicly available at: https://github.com/zhongchengwu/code_LRTCFPan.

10.
IEEE Trans Cybern ; 53(7): 4148-4161, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37022388

ABSTRACT

Hyperspectral image super-resolution (HISR) fuses a low-resolution hyperspectral image (LR-HSI) and a high-resolution multispectral image (HR-MSI) to generate a high-resolution hyperspectral image (HR-HSI). Recently, convolutional neural network (CNN)-based techniques have been extensively investigated for HISR, yielding competitive outcomes. However, existing CNN-based methods often require a huge number of network parameters, leading to a heavy computational burden and thus limiting their generalization ability. In this article, we fully consider the characteristics of HISR and propose a general CNN fusion framework with high-resolution guidance, called GuidedNet. This framework consists of two branches: 1) the high-resolution guidance branch (HGB), which decomposes the high-resolution guidance image into several scales, and 2) the feature reconstruction branch (FRB), which takes the low-resolution image and the multiscale high-resolution guidance images from the HGB to reconstruct the high-resolution fused image. GuidedNet effectively predicts the high-resolution residual details that are added to the upsampled HSI to simultaneously improve spatial quality and preserve spectral information. The framework is implemented using recursive and progressive strategies, which promote high performance with a significant reduction in network parameters while ensuring network stability by supervising several intermediate outputs. Additionally, the proposed approach is suitable for other resolution enhancement tasks, such as remote sensing pansharpening and single-image super-resolution (SISR). Extensive experiments on simulated and real datasets demonstrate that the proposed framework generates state-of-the-art outcomes for several applications (i.e., HISR, pansharpening, and SISR). Finally, an ablation study and further discussion of, for example, network generalization, the low computational cost, and the reduced number of network parameters are provided. The code link is: https://github.com/Evangelion09/GuidedNet.

11.
IEEE Trans Neural Netw Learn Syst ; 34(11): 9337-9351, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35320108

ABSTRACT

In practice, acquiring labeled samples for hyperspectral images (HSIs) is time-consuming and labor-intensive, which frequently induces model overfitting and performance degradation for supervised methodologies in HSI classification (HSIC). Fortunately, semisupervised learning can alleviate this deficiency, and the graph convolutional network (GCN) is one of the most effective semisupervised approaches, propagating node information in a transductive manner. In this study, we propose a cross-scale graph prototypical network (X-GPN) to achieve semisupervised high-quality HSIC. Specifically, considering the multiscale appearance of land covers in the same remotely captured scene, we involve neighborhoods of different scales to construct the adjacency matrices and simultaneously design a multibranch framework to investigate the abundant spectral-spatial features through graph convolutions. Furthermore, to exploit the complementary information between different scales, we employ standard 1-D convolution to capture intranode dependence and concatenate the output with the features generated at other scales. Intuitively, for a given sample, different branches should have different importance in predicting its category. Thus, we develop a self-branch attentional addition (SBAA) module to adaptively highlight the most critical features produced by the multiple branches. In addition, different from previous GCNs for HSIC, we devise an innovative prototypical layer comprising a distance-based cross-entropy (DCE) loss function and a novel temporal entropy-based regularizer (TER), which actively enhance the discrimination and representativeness of the node features and prototypes. Extensive experiments demonstrate that the proposed X-GPN is superior to classic and state-of-the-art (SOTA) methods in terms of classification performance.

12.
IEEE Trans Cybern ; 53(1): 679-691, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35609106

ABSTRACT

Recently, low-rank representation (LRR) methods have been widely applied to hyperspectral anomaly detection, due to their potential to separate backgrounds from anomalies. However, existing LRR models generally convert 3-D hyperspectral images (HSIs) into 2-D matrices, inevitably destroying the intrinsic 3-D structure of HSIs. To this end, we propose a novel tensor low-rank and sparse representation (TLRSR) method for hyperspectral anomaly detection. A 3-D tensor low-rank model is developed to separate the low-rank background part, represented by a tensorial background dictionary and corresponding coefficients; this representation characterizes the multiple-subspace property of the complex low-rank background. Based on the weighted tensor nuclear norm and the L_{F,1} sparse norm, the dictionary is designed so that its atoms are more relevant to the background. Moreover, principal component analysis (PCA) can be applied as a preprocessing step to extract a subset of HSI bands, retaining enough object information while reducing the computational time of the subsequent tensorial operations. The proposed model is efficiently solved by a well-designed alternating direction method of multipliers (ADMM). Experimental comparisons with existing algorithms establish the competitiveness of the proposed method with state-of-the-art competitors in the hyperspectral anomaly detection task.
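
The PCA preprocessing step mentioned above is straightforward to sketch; the number of retained components here is arbitrary, not the paper's setting:

```python
import numpy as np

def pca_bands(hsi, k):
    # Project an (H, W, B) hyperspectral cube onto its top-k principal
    # components, shrinking the spectral dimension before costly tensor steps.
    h, w, b = hsi.shape
    X = hsi.reshape(-1, b).astype(np.float64)
    X -= X.mean(axis=0)
    # Eigendecomposition of the B x B covariance (B is small relative to H*W).
    cov = X.T @ X / (X.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    top = vecs[:, np.argsort(vals)[::-1][:k]]
    return (X @ top).reshape(h, w, k)

cube = np.random.rand(100, 100, 189)           # toy HSI
reduced = pca_bands(cube, k=20)                # (100, 100, 20)
```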

13.
IEEE Trans Neural Netw Learn Syst ; 34(11): 9088-9101, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35263264

ABSTRACT

Pansharpening refers to the fusion of a panchromatic (PAN) image with high spatial resolution and a multispectral (MS) image with low spatial resolution, aiming to obtain a high spatial resolution MS (HRMS) image. In this article, we propose a novel deep neural network architecture with a level-domain-based loss function for pansharpening that takes into account the following double-type structures, i.e., double-level, double-branch, and double-direction, called the triple-double network (TDNet). With the structure of TDNet, the spatial details of the PAN image can be fully exploited and progressively injected into the low spatial resolution MS (LRMS) image, thus yielding a high spatial resolution output. The specific network design is motivated by the physical formula of traditional multi-resolution analysis (MRA) methods; hence, an effective MRA fusion module is also integrated into TDNet. Besides, we adopt a few ResNet blocks and some multi-scale convolution kernels to deepen and widen the network, effectively enhancing the feature extraction and robustness of the proposed TDNet. Extensive experiments on reduced- and full-resolution datasets acquired by the WorldView-3, QuickBird, and GaoFen-2 sensors demonstrate the superiority of the proposed TDNet compared with some recent state-of-the-art pansharpening approaches. An ablation study further corroborates the effectiveness of the proposed approach. The code is available at https://github.com/liangjiandeng/TDNet.
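
For reference, the classical MRA detail-injection formula that motivates TDNet's fusion module has the generic textbook form (not TDNet's learned variant):

```latex
\widehat{\mathrm{MS}}_k \;=\; \widetilde{\mathrm{MS}}_k \;+\; g_k \,\bigl(P - P_L\bigr),
\qquad k = 1, \dots, N
```

where \widetilde{\mathrm{MS}}_k is the upsampled k-th MS band, P is the PAN image, P_L is its low-pass version, and g_k is the injection gain for band k.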

14.
Article in English | MEDLINE | ID: mdl-36279325

ABSTRACT

Multimodal data provide complementary information about a natural phenomenon by integrating data from various domains with very different statistical properties. Capturing the intramodality and cross-modality information of multimodal data is the essential capability of multimodal learning methods. Geometry-aware data analysis approaches provide these capabilities by implicitly representing data in various modalities based on their underlying geometric structures. In many applications, data are also explicitly defined on an intrinsic geometric structure. Generalizing deep learning methods to non-Euclidean domains is an emerging research field that has recently been investigated in many studies, but most of the popular methods are developed for unimodal data. In this article, a multimodal graph wavelet convolutional network (M-GWCN) is proposed as an end-to-end network. M-GWCN simultaneously finds intramodality representations by applying the multiscale graph wavelet transform, which provides helpful localization properties in the graph domain of each modality, and cross-modality representations by learning permutations that encode correlations among the various modalities. M-GWCN requires neither homogeneous modalities with the same number of samples nor any prior knowledge indicating correspondences between modalities. Several semisupervised node classification experiments have been conducted on three popular unimodal explicit graph-based datasets and five multimodal implicit ones. The experimental results indicate the superiority and effectiveness of the proposed method compared with both spectral graph-domain convolutional neural networks and state-of-the-art multimodal methods.

15.
IEEE Trans Image Process ; 31: 6440-6454, 2022.
Article in English | MEDLINE | ID: mdl-36215361

ABSTRACT

Outlier detection aims to separate anomalous data from the inliers in a dataset. Recently, most deep learning methods for outlier detection have leveraged an auxiliary reconstruction task, assuming that outliers are more difficult to recover than normal samples (inliers). However, this assumption does not always hold in deep auto-encoder (AE)-based models: because such models do not constrain the feature learning, AE-based detectors may recover certain outliers even when no outliers appear in the training data. Instead, we argue that outlier detection can be performed in the feature space by measuring the distance between outliers' features and the consistency feature of the inliers. To achieve this, we propose an unsupervised outlier detection method using a memory module and a contrastive learning module (MCOD). The memory module constrains the consistency of the features, which represent only the normal data. The contrastive learning module learns more discriminative features, which sharpens the distinction between outliers and inliers. Extensive experiments on four benchmark datasets show that the proposed MCOD performs well and outperforms eleven state-of-the-art methods.


Subject(s): Algorithms, Learning
16.
Article in English | MEDLINE | ID: mdl-36301787

ABSTRACT

This article addresses the problem of building an out-of-the-box deep detector, motivated by the need to perform anomaly detection across multiple hyperspectral images (HSIs) without repeated training. To solve this challenging task, we propose a unified detector, the anomaly detection network (AUD-Net), inspired by few-shot learning. The crucial issues solved by AUD-Net include how to improve the generalization of the model to HSIs that contain different categories of land cover, and how to unify the different spectral sizes across HSIs. To achieve this, we first build a series of subtasks that classify the relations between the center of a dual window and its surroundings. Through relation learning, AUD-Net generalizes more easily to unseen HSIs, as the relations of the pixel pairs are shared among different HSIs. Second, to handle HSIs with various spectral sizes, we propose a pooling layer based on the vector of locally aggregated descriptors (VLAD), which maps variable-sized features to the same space and acquires fixed-sized relation embeddings. To determine whether the center of the dual window is an anomaly, we build a memory model with a transformer, which integrates the contextual relation embeddings in the dual window and estimates the relation embedding of the center. By computing the feature difference between the estimated relation embeddings of the centers and the corresponding real ones, centers with large differences are detected as anomalies, since they are more difficult to estimate from their surroundings. Extensive experiments on both a simulation dataset and 13 real HSIs demonstrate that the proposed AUD-Net generalizes strongly across HSIs and achieves significant advantages over detectors trained specifically for each HSI.
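
The pooling idea can be sketched in a few lines: a VLAD-style layer soft-assigns a variable-length set of descriptors to K learned centers and sums the residuals, so inputs of any length land in the same K x d space. The learned centers, the distance-based soft assignment, and the sizes below are generic NetVLAD-flavored choices, not AUD-Net's exact layer.

```python
import torch
import torch.nn as nn

class VLADPool(nn.Module):
    # Maps a variable-length set of d-dim descriptors to a fixed (K*d) vector
    # by soft-assigning each descriptor to K centers and summing residuals.
    def __init__(self, dim, num_centers):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_centers, dim))

    def forward(self, feats):                     # feats: (n, dim), n varies
        sim = -torch.cdist(feats, self.centers)   # closer center -> higher score
        assign = sim.softmax(dim=1)               # (n, K) soft assignment
        resid = feats.unsqueeze(1) - self.centers.unsqueeze(0)   # (n, K, dim)
        vlad = (assign.unsqueeze(-1) * resid).sum(dim=0)         # (K, dim)
        return nn.functional.normalize(vlad.flatten(), dim=0)

pool = VLADPool(dim=32, num_centers=8)
# Different input lengths, same fixed-size embedding:
print(pool(torch.rand(120, 32)).shape, pool(torch.rand(60, 32)).shape)
```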

17.
Article in English | MEDLINE | ID: mdl-35939475

ABSTRACT

This article focuses on end-to-end image matching through joint key-point detection and descriptor extraction. To find repeatable and highly discriminative key points, we improve the deep matching network from the perspectives of network structure and network optimization. First, we propose a concurrent multiscale detector (CS-det) network, which consists of several parallel convolutional networks that extract multiscale features and multilevel discriminative information for key-point detection; moreover, we introduce an attention module to adaptively fuse the response maps of the various features. Importantly, we propose two novel rank consistent losses (RC-losses) for network optimization, which significantly improve image matching performance. On the one hand, we propose a score rank consistent loss (RC-S-loss) to ensure that key points have high repeatability: different from a score difference loss, which focuses on the absolute score of an individual key point, our RC-S-loss pays more attention to the relative score of key points within the image. On the other hand, we propose a score-discrimination RC-loss to ensure that key points are highly discriminative, which reduces confusion with other key points in subsequent matching and further improves matching accuracy. Extensive experimental results demonstrate that the proposed CS-det improves the mean matching result of the deep detector by 1.4%-2.1%, and the proposed RC-losses boost matching performance by 2.7%-3.4% over the score difference loss. Our source code is available at https://github.com/iquandou/CS-Net.

18.
IEEE Trans Image Process ; 31: 5079-5092, 2022.
Article in English | MEDLINE | ID: mdl-35881603

ABSTRACT

Recently, embedding- and metric-based few-shot learning (FSL) has been introduced into hyperspectral image classification (HSIC) and has achieved impressive progress. To further enhance performance with few labeled samples, we propose in this paper a novel FSL framework for HSIC with a class-covariance metric (CMFSL). Overall, CMFSL learns global class representations for each training episode by interactively using training samples from the base and novel classes, and a synthesis strategy is employed on the novel classes to avoid overfitting. During meta-training and meta-testing, the class labels are determined directly by Mahalanobis distance measurement rather than by an extra classifier. Benefiting from the task-adapted class-covariance estimations, CMFSL can construct more flexible decision boundaries than the commonly used Euclidean metric. Additionally, a lightweight cross-scale convolutional network (LXConvNet) consisting of 3D and 2D convolutions is designed to thoroughly exploit the spectral-spatial information at the high-frequency and low-frequency scales with low computational complexity. Furthermore, we devise a spectral-prior-based refinement module (SPRM) in the initial stage of feature extraction, which not only forces the network to emphasize the most informative bands while suppressing the useless ones, but also alleviates the effects of the domain shift between the base and novel categories to learn a collaborative embedding mapping. Extensive experimental results on four benchmark datasets demonstrate that the proposed CMFSL outperforms state-of-the-art methods with few annotated samples.
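
A bare-bones version of Mahalanobis-distance classification against class prototypes, with a simple shrinkage term standing in for the paper's task-adapted covariance estimation:

```python
import numpy as np

def mahalanobis_classify(x, class_means, cov):
    # Assign x to the class whose mean is nearest under the Mahalanobis
    # metric induced by a shared covariance estimate.
    prec = np.linalg.inv(cov)
    dists = [(x - m) @ prec @ (x - m) for m in class_means]
    return int(np.argmin(dists))

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 16))                 # toy episode features
means = [feats[:50].mean(axis=0), feats[50:].mean(axis=0)]
# Shrinkage keeps the covariance invertible with few samples per episode.
cov = np.cov(feats, rowvar=False) + 0.1 * np.eye(16)
print(mahalanobis_classify(feats[0], means, cov))
```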

19.
Article in English | MEDLINE | ID: mdl-37015404

ABSTRACT

Learning-based infrared small object detection methods currently rely heavily on classification backbone networks, which tends to result in the loss of tiny objects and limited feature distinguishability as the network depth increases. Furthermore, small objects in infrared images frequently appear both bright and dark, posing severe demands on obtaining precise object contrast information. For this reason, we propose in this paper a simple and effective "U-Net in U-Net" framework, UIU-Net for short, to detect small objects in infrared images. As the name suggests, UIU-Net embeds a tiny U-Net into a larger U-Net backbone, enabling multi-level and multi-scale representation learning of objects. Moreover, UIU-Net can be trained from scratch, and the learned features effectively enhance global and local contrast information. More specifically, the UIU-Net model is divided into two modules: the resolution-maintenance deep supervision (RM-DS) module and the interactive-cross attention (IC-A) module. RM-DS integrates residual U-blocks into a deep supervision network to generate deep multi-scale resolution-maintained features while learning global context information. IC-A then encodes the local context information between low-level details and high-level semantic features. Extensive experiments conducted on two infrared single-frame image datasets, i.e., the SIRST and Synthetic datasets, show the effectiveness and superiority of the proposed UIU-Net in comparison with several state-of-the-art infrared small object detection methods. The proposed UIU-Net also generalizes well to video-sequence infrared small object datasets, e.g., the ATR ground/air video sequence dataset. The code for this work is openly available at https://github.com/danfenghong/IEEE_TIP_UIU-Net.

20.
IEEE Trans Neural Netw Learn Syst ; 33(8): 3372-3386, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33544676

ABSTRACT

Recently, the majority of successful matching approaches have been based on convolutional neural networks, which focus on learning invariant and discriminative features for individual image patches based on image content. However, the image patch matching task is essentially to predict the matching relationship of patch pairs, that is, matching (similar) or non-matching (dissimilar). Therefore, we consider feature relation (FR) learning to be more important than individual feature learning for the image patch matching problem. Motivated by this, we propose an element-wise FR learning network for image patch matching, which transforms the task into an image relationship-based pattern classification problem and dramatically improves generalization in image matching. Meanwhile, the proposed element-wise learning method encourages full interaction between feature information and can naturally learn FRs. Moreover, we propose to aggregate FRs from multiple levels, integrating multiscale FRs for more precise matching. Experimental results demonstrate that our proposal achieves superior performance on cross-spectral and single-spectral image patch matching, and good generalization to image patch retrieval.
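
A toy version of element-wise relation learning for pair classification: the pair is represented by the element-wise difference and product of the two features, and a small head predicts the matching logit. The specific relation functions and head are illustrative choices, not the paper's multilevel design.

```python
import torch
import torch.nn as nn

class ElementwiseRelation(nn.Module):
    # Classifies a patch pair from element-wise relations of their features
    # (difference and product) rather than from each feature alone.
    def __init__(self, dim):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(inplace=True),
            nn.Linear(dim, 1),        # matching logit
        )

    def forward(self, f1, f2):
        rel = torch.cat([(f1 - f2).abs(), f1 * f2], dim=-1)
        return self.head(rel)

net = ElementwiseRelation(dim=128)
logit = net(torch.rand(4, 128), torch.rand(4, 128))   # (4, 1) pair logits
```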
