Results 1 - 7 of 7
1.
Article in English | MEDLINE | ID: mdl-38954572

ABSTRACT

Multisource optical remote sensing (RS) image classification has attracted extensive research interest and demonstrated clear advantages. Existing approaches mainly improve classification performance by exploiting complementary information from multisource data, but they fall short in effectively extracting data features and exploiting the correlations among multisource optical RS images. To this end, this article proposes a generalized spatial-spectral relation-guided fusion network (S²RGF-Net) for multisource optical RS image classification. First, we design spatial- and spectral-domain-specific feature encoders based on data characteristics to deeply explore the rich feature information of optical RS data. Subsequently, two relation-guided fusion strategies are proposed at two levels (intradomain and interdomain) to integrate multisource image information effectively. In intradomain feature fusion, an adaptive de-redundancy fusion module (ADRF) eliminates redundancy so that the spatial and spectral features are each complete and compact. In interdomain feature fusion, we construct a spatial-spectral joint attention module (SSJA) based on interdomain relationships to enhance complementary features and thereby facilitate the later fusion. Experiments on various multisource optical RS datasets demonstrate that S²RGF-Net outperforms other state-of-the-art (SOTA) methods.
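As a rough illustration of interdomain attention in the spirit of SSJA, the sketch below re-weights each domain's channels using a descriptor pooled from the other domain, so that complementary channels are enhanced before fusion. The module name, the shared bottleneck, and all shapes are assumptions for illustration, not the authors' implementation.

```python
# A minimal sketch of cross-domain channel attention, assuming spatial and
# spectral features are (batch, channels, H, W) tensors of the same shape.
import torch
import torch.nn as nn

class JointChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Shared bottleneck mapping a pooled descriptor to channel weights.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, spatial: torch.Tensor, spectral: torch.Tensor):
        # Pool each domain to a channel descriptor of shape (B, C).
        desc_spa = spatial.mean(dim=(2, 3))
        desc_spe = spectral.mean(dim=(2, 3))
        # Each domain is re-weighted by attention derived from the *other*
        # domain, so complementary channels are mutually enhanced.
        w_spa = self.mlp(desc_spe).unsqueeze(-1).unsqueeze(-1)
        w_spe = self.mlp(desc_spa).unsqueeze(-1).unsqueeze(-1)
        return torch.cat([spatial * w_spa, spectral * w_spe], dim=1)
```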

2.
Article in English | MEDLINE | ID: mdl-37603473

ABSTRACT

Recently, the excellent performance of transformers has attracted the attention of the vision community. Visual transformer models usually reshape images into a sequence format and encode them sequentially. However, it is difficult to explicitly represent the relative relationships of distance and direction in visual data with typical 2-D spatial structures. Moreover, the temporal motion properties of consecutive frames are hardly exploited in dynamic video tasks such as tracking. Therefore, we propose a novel dynamic polar spatio-temporal encoding for video scenes. We use spiral functions in polar space to fully exploit the spatial dependences of distance and direction in real scenes. We then design a dynamic relative encoding mode for consecutive frames to capture the continuous spatio-temporal motion characteristics across video frames. Finally, we construct a complex-former framework with the proposed encoding applied to video-tracking tasks, in which the complex fusion mode (CFM) realizes the effective fusion of scenes and positions for consecutive frames. Theoretical analysis demonstrates the feasibility and effectiveness of the proposed method, and experimental results on multiple datasets validate that it improves tracker performance in various video scenarios.
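To make the idea of encoding distance and direction in polar space concrete, here is a toy sketch of a sinusoidal positional encoding built from a spiral of the polar coordinates of each feature-map position. The Archimedean-style spiral r + a·θ, the frequency schedule, and the requirement that dim be even are all assumptions; the paper's exact spiral functions and dynamic inter-frame update are not reproduced here.

```python
# A toy polar positional encoding for an H x W feature map, assuming each
# position is described by its distance r and direction theta from the
# map centre, combined through a simple spiral.
import math
import torch

def polar_encoding(h: int, w: int, dim: int, a: float = 0.5) -> torch.Tensor:
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    dy, dx = ys.float() - cy, xs.float() - cx
    r = torch.sqrt(dx ** 2 + dy ** 2)      # distance from the centre
    theta = torch.atan2(dy, dx)            # direction from the centre
    spiral = r + a * theta                 # illustrative spiral function
    # Standard sinusoidal frequency schedule; dim must be even.
    freqs = torch.exp(torch.arange(0, dim, 2).float() * (-math.log(1e4) / dim))
    enc = torch.zeros(h, w, dim)
    enc[..., 0::2] = torch.sin(spiral.unsqueeze(-1) * freqs)
    enc[..., 1::2] = torch.cos(spiral.unsqueeze(-1) * freqs)
    return enc  # (H, W, dim), added to patch embeddings before attention
```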

3.
IEEE Trans Cybern ; 53(2): 1012-1025, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36227820

ABSTRACT

Supervised salient object detection (SOD) methods achieve state-of-the-art performance by relying on human-annotated saliency maps, whereas unsupervised methods attempt to achieve SOD without any annotations. In unsupervised SOD, obtaining saliency in a completely unsupervised manner is a major challenge, and existing unsupervised methods usually obtain saliency by introducing other handcrafted-feature-based saliency methods. In general, the location information of salient objects is contained in the feature maps. Call the features belonging to salient objects salient features, and the features that do not belong to salient objects (such as the background) nonsalient features; if the feature maps can be divided into salient and nonsalient features in an unsupervised way, then the objects at the locations of the salient features are the salient objects. Based on this motivation, a novel method called learning salient feature (LSF) is proposed, which achieves unsupervised SOD by learning salient features from the data itself. The method takes enhancing salient features and suppressing nonsalient features as its objective. Furthermore, a salient object localization method is proposed to roughly locate objects where the salient features reside, yielding a salient activation map. Usually, the objects in the salient activation map are incomplete and contain a lot of noise. To address this issue, a saliency map update strategy is introduced to gradually remove noise and strengthen boundaries. Visualizations of images and their salient activation maps show that our method can effectively learn salient visual objects, and experiments show that it achieves superior unsupervised performance on a series of datasets.
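The sketch below shows one plausible way to derive a rough salient activation map from backbone features, in the spirit of the localization step described above: channels are weighted by their global energy, aggregated, normalized, and thresholded. The energy weighting and the threshold value are illustrative assumptions; LSF's actual objective and its saliency map update strategy are not reproduced here.

```python
# A minimal sketch of a salient activation map from a (B, C, H, W)
# feature tensor, followed by a coarse localization mask.
import torch

def salient_activation_map(feats: torch.Tensor, thresh: float = 0.5):
    # Weight channels by their global energy, then aggregate spatially.
    weights = feats.abs().mean(dim=(2, 3), keepdim=True)   # (B, C, 1, 1)
    act = (feats * weights).sum(dim=1, keepdim=True)       # (B, 1, H, W)
    # Min-max normalise each map to [0, 1].
    amin = act.amin(dim=(2, 3), keepdim=True)
    amax = act.amax(dim=(2, 3), keepdim=True)
    act = (act - amin) / (amax - amin + 1e-8)
    # Roughly locate salient objects by thresholding the activation map.
    return act, (act > thresh).float()
```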

4.
IEEE Trans Neural Netw Learn Syst ; 34(9): 5497-5516, 2023 Sep.
Article in English | MEDLINE | ID: mdl-34968181

ABSTRACT

Deep learning (DL) has made breakthroughs in many computer vision tasks, including visual tracking. From early research on the automatic acquisition of highly abstract feature representations, DL has since penetrated all aspects of tracking, to name a few: similarity metrics, data association, and bounding box estimation. Pure DL-based trackers have also achieved state-of-the-art performance through the community's sustained research. We believe it is time to comprehensively review the development of DL research in visual tracking. In this article, we survey the critical improvements that DL has brought to the field: deep feature representations, network architectures, and four crucial issues in visual tracking (spatiotemporal information integration, target-specific classification, target information updating, and bounding box estimation). For the first time, the scope of such a survey covers the two primary subtasks: single-object tracking and multiple-object tracking. We also analyze the performance of DL-based approaches and draw meaningful conclusions. Finally, we discuss several promising directions and tasks in visual tracking and related fields.

5.
IEEE Trans Image Process ; 31: 7306-7321, 2022.
Article in English | MEDLINE | ID: mdl-36383578

ABSTRACT

Since superpixel segmentation aggregates pixels based on similarity, the boundaries of some superpixels trace the outlines of objects, and superpixels thus provide prerequisites for learning structure-aware features. How to utilize these superpixel priors effectively is worth investigating. In this work, by constructing a graph within each superpixel and a graph among superpixels, we propose a novel multi-level feature network (MFNet) based on graph neural networks with the above superpixel priors. In MFNet, we learn three levels of features hierarchically: from pixel-level features to superpixel-level features, and then to image-level features. To address the problem that existing methods cannot represent superpixels well, we propose a graph-neural-network-based superpixel representation method, which takes the graph constructed from a single superpixel as input and extracts that superpixel's feature. To demonstrate the versatility of MFNet, we apply it to an image-level prediction task and a pixel-level prediction task by designing different prediction modules: an attention linear classifier prediction module for image-level prediction tasks, such as image classification, and an FC-based superpixel prediction module together with a decoder-based pixel prediction module for pixel-level prediction tasks, such as salient object detection. MFNet achieves competitive results on a number of datasets compared with related methods, and visualizations show that the object boundaries and outlines of the saliency maps predicted by MFNet are more refined and attend more closely to details.
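To illustrate the "graph among superpixels" idea, the sketch below builds an adjacency matrix from a precomputed superpixel label map (adjacent pixels with different labels imply adjacent superpixels) and runs one mean-aggregation hop over per-superpixel features. The adjacency construction, the single GCN-like hop, and the pure-loop implementation are illustrative assumptions; MFNet's actual modules are not reproduced here.

```python
# A minimal sketch of message passing over a superpixel graph, assuming a
# (H, W) integer label map with n superpixels and an (n, d) feature matrix.
import torch

def superpixel_adjacency(labels: torch.Tensor, n: int) -> torch.Tensor:
    adj = torch.zeros(n, n)
    # Horizontally and vertically neighbouring pixels with different
    # labels indicate that the two superpixels share a boundary.
    h_pairs = torch.stack([labels[:, :-1].reshape(-1), labels[:, 1:].reshape(-1)])
    v_pairs = torch.stack([labels[:-1, :].reshape(-1), labels[1:, :].reshape(-1)])
    for a, b in torch.cat([h_pairs, v_pairs], dim=1).t().tolist():
        if a != b:
            adj[a, b] = adj[b, a] = 1.0
    return adj

def gnn_step(feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    # One hop: mean of neighbour features added to each node's own feature.
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
    return feats + adj @ feats / deg
```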

6.
Article in English | MEDLINE | ID: mdl-36427283

ABSTRACT

The feature representation learning process largely determines the performance of networks in classification tasks. Combining multiscale geometric tools with networks can achieve better representation and learning; however, relatively fixed geometric features and multiscale structures are typically used. In this article, we propose a more flexible framework called the multiscale dynamic curvelet scattering network (MSDCCN), a data-driven dynamic network based on multiscale geometric prior knowledge. First, multiresolution scattering and multiscale curvelet features are efficiently aggregated at different levels. These features can then be reused in the network flexibly and dynamically, depending on a multiscale intervention flag whose initial value is based on a complexity assessment and which is updated according to feature sparsity statistics on the pretrained model. With this multiscale dynamic reuse structure, the feature representation learning process is improved during subsequent training, and multistage fine-tuning can be performed to further improve classification accuracy. Furthermore, a novel, more flexible multiscale dynamic curvelet scattering module is developed that can be embedded into other networks. Extensive experimental results show that MSDCCN achieves better classification accuracy, and the necessary evaluation experiments, including convergence, insight, and adaptability analyses, have been performed.
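The sketch below illustrates one plausible reading of the sparsity-driven intervention flag: fixed multiscale features are injected only while the learned features remain sparse. The near-zero criterion, the threshold, and the fusion by addition are all assumptions for illustration, not MSDCCN's exact rule.

```python
# A minimal sketch of a sparsity-driven multiscale intervention flag,
# assuming precomputed multiscale (e.g., curvelet/scattering) features
# of the same shape as the learned features.
import torch

def multiscale_flag(feats: torch.Tensor, tau: float = 0.6) -> bool:
    # Fraction of near-zero activations as a simple sparsity statistic.
    sparsity = (feats.abs() < 1e-3).float().mean().item()
    return sparsity > tau   # sparse features -> keep multiscale guidance

def maybe_fuse(feats: torch.Tensor, curvelet_feats: torch.Tensor) -> torch.Tensor:
    # Reuse the fixed multiscale features only while the flag indicates
    # the network still benefits from the geometric prior.
    return feats + curvelet_feats if multiscale_flag(feats) else feats
```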

7.
Article in English | MEDLINE | ID: mdl-35767481

ABSTRACT

Deep-learning-based clustering methods usually treat feature extraction and feature clustering as two independent steps, so the features of all images must be extracted before clustering, which is computationally expensive. Inspired by the self-organizing map network, a self-supervised self-organizing clustering network (S³OCNet) is proposed to jointly learn feature extraction and feature clustering, realizing a single-stage clustering method. To achieve joint learning, we propose a self-organizing clustering header (SOCH), which takes the weights of the self-organizing layer as the cluster centers and the outputs of the self-organizing layer as the similarities between a feature and the cluster centers. To optimize the network, we first convert the similarities into probabilities that represent a soft cluster assignment, then obtain a target for self-supervised learning by transforming the soft cluster assignment into a hard cluster assignment, and finally jointly optimize the backbone and the SOCH. By setting different feature dimensions, a multilayer SOCH strategy is further proposed in which SOCHs are cascaded, achieving clustering in multiple clustering spaces. S³OCNet is evaluated on widely used image classification benchmarks, including Canadian Institute For Advanced Research (CIFAR)-10, CIFAR-100, Self-Taught Learning (STL)-10, and Tiny ImageNet. Experimental results show that our method achieves significant improvements over related methods, and visualizations of features and images show that it achieves good clustering results.
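The described similarities-to-probabilities-to-hard-target loop maps naturally onto a small head module, sketched below: the layer's weight matrix acts as the cluster centers, cosine similarities scaled by a temperature give the soft assignment, and its argmax provides the hard target for a cross-entropy loss through which backbone and head are optimized jointly. The cosine normalization and temperature value are assumptions, not the authors' exact formulation.

```python
# A minimal sketch of a self-organizing clustering header in the spirit
# of SOCH, for (batch, feat_dim) features and n_clusters centers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SOCHead(nn.Module):
    def __init__(self, feat_dim: int, n_clusters: int, tau: float = 0.1):
        super().__init__()
        # The layer weight itself serves as the cluster centers.
        self.centers = nn.Parameter(torch.randn(n_clusters, feat_dim))
        self.tau = tau

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Cosine similarities between features and cluster centers: (B, K).
        return F.normalize(feats, dim=1) @ F.normalize(self.centers, dim=1).t()

    def loss(self, feats: torch.Tensor) -> torch.Tensor:
        logits = self.forward(feats) / self.tau   # soft cluster assignment
        hard = logits.argmax(dim=1).detach()      # hard assignment as target
        # Self-supervised loss; gradients flow to backbone and centers.
        return F.cross_entropy(logits, hard)
```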
