Results 1 - 9 of 9
1.
Article in English | MEDLINE | ID: mdl-38669166

ABSTRACT

The conventional approach to image recognition has been based on raster graphics, which can suffer from aliasing and information loss when scaled up or down. In this paper, we propose a novel approach that leverages the benefits of vector graphics for object localization and classification. Our method, called YOLaT (You Only Look at Text), takes the textual document of vector graphics as input, rather than rendering it into pixels. YOLaT builds multi-graphs to model the structural and spatial information in vector graphics and utilizes a dual-stream graph neural network (GNN) to detect objects from the graph. However, for real-world vector graphics, YOLaT models the data only with a flat GNN whose nodes are vertices, ignoring higher-level information in the vector data. Therefore, we propose YOLaT++, which performs multi-level abstraction feature learning from a new perspective: from primitive shapes to curves and points. Moreover, since few public datasets focus on vector graphics, data-driven learning cannot exert its full power on this format. We provide a large-scale and challenging dataset for chart-based vector graphics detection and chart understanding, termed VG-DCU, with vector graphics, raster graphics, annotations, and the raw data from which these vector charts were drawn. Experiments show that the YOLaT series outperforms both vector graphics and raster graphics-based object detection methods on both subsets of VG-DCU in terms of both accuracy and efficiency, showcasing the potential of vector graphics for image recognition tasks. Our code, models, and the VG-DCU dataset are available at: https://github.com/microsoft/YOLaT-VectorGraphicsRecognition.
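The graph-over-vector-data idea can be sketched in miniature: parse a path description into vertices, connect them, and aggregate neighbor features. The toy path grammar, chain-edge topology, and mean aggregation below are illustrative assumptions, not the paper's actual YOLaT architecture.

```python
def parse_path(path):
    """Parse a toy 'M x y L x y ...' path string into a list of (x, y) vertices."""
    tokens = path.replace(",", " ").split()
    verts, i = [], 0
    while i < len(tokens):
        if tokens[i] in ("M", "L"):
            verts.append((float(tokens[i + 1]), float(tokens[i + 2])))
            i += 3
        else:
            i += 1
    return verts

def message_pass(verts):
    """One mean-aggregation step: each node averages its own coordinates
    with those of its chain neighbors (previous and next vertex)."""
    n = len(verts)
    out = []
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        xs = [verts[i]] + [verts[j] for j in nbrs]
        out.append((sum(p[0] for p in xs) / len(xs),
                    sum(p[1] for p in xs) / len(xs)))
    return out

verts = parse_path("M 0 0 L 2 0 L 2 2")
feats = message_pass(verts)
```

The point of operating on the textual form is visible even here: the graph is built from exact coordinates, with no rasterization step to introduce aliasing.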

2.
IEEE Trans Image Process ; 33: 3200-3211, 2024.
Article in English | MEDLINE | ID: mdl-38687652

ABSTRACT

Person re-identification (ReID) typically encounters varying degrees of occlusion in real-world scenarios. While previous methods have addressed this using handcrafted partitions or external cues, they often compromise semantic information or increase network complexity. In this paper, we propose a new method from a novel perspective, termed OAT. Specifically, we first use a Transformer backbone with multiple class tokens for diverse pedestrian feature learning. Since the self-attention mechanism in the Transformer focuses solely on low-level feature correlations, it neglects higher-order relations among different body parts or regions. We therefore propose the Second-Order Attention (SOA) module to capture more comprehensive features. To address computational efficiency, we further derive approximation formulations for implementing second-order attention. Observing that the importance of the semantics associated with different class tokens varies due to the uncertainty of the location and size of the occlusion, we propose the Entropy Guided Fusion (EGF) module for multiple class tokens. By conducting uncertainty analysis on each class token, higher weights are assigned to tokens with lower information entropy and lower weights to tokens with higher entropy. This dynamic weight adjustment mitigates the impact of occlusion-induced uncertainty on feature learning, thereby facilitating the acquisition of discriminative class token representations. Extensive experiments on occluded and holistic person re-identification datasets demonstrate the effectiveness of the proposed method.
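The entropy-guided weighting described for the EGF module can be illustrated with a toy sketch. The softmax-over-negative-entropy weighting and the token/probability shapes are assumptions chosen for illustration, not the paper's exact formulation.

```python
import math

def entropy(p):
    """Shannon entropy of a probability vector; low entropy = confident token."""
    return -sum(x * math.log(x) for x in p if x > 0)

def entropy_guided_fusion(tokens, probs):
    """Fuse class-token feature vectors, weighting low-entropy (confident)
    tokens higher. Weights here are softmax(-entropy) over the tokens."""
    ents = [entropy(p) for p in probs]
    exps = [math.exp(-e) for e in ents]
    z = sum(exps)
    ws = [e / z for e in exps]
    dim = len(tokens[0])
    fused = [sum(w * t[d] for w, t in zip(ws, tokens)) for d in range(dim)]
    return fused, ws

# Token 0 predicts confidently (low entropy); token 1 is uncertain (high entropy).
fused, ws = entropy_guided_fusion([[1.0, 0.0], [0.0, 1.0]],
                                  [[0.9, 0.1], [0.5, 0.5]])
```

The confident token ends up with the larger fusion weight, which mirrors the stated intent: tokens whose semantics are obscured by occlusion (and hence uncertain) contribute less to the final representation.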

3.
IEEE Trans Image Process ; 33: 1348-1360, 2024.
Article in English | MEDLINE | ID: mdl-38335087

ABSTRACT

Prompt learning stands out as one of the most efficient approaches for adapting powerful vision-language foundation models like CLIP to downstream datasets by tuning learnable prompt vectors with very few samples. However, despite its success in achieving remarkable performance on in-domain data, prompt learning still faces the significant challenge of generalizing effectively to novel classes and domains. Some existing methods address this concern by dynamically generating distinct prompts for different domains. Yet, they overlook the inherent potential of prompts to generalize across unseen domains. To address these limitations, our study introduces an innovative prompt learning paradigm, called MetaPrompt, that aims to directly learn a domain-invariant prompt in few-shot scenarios. To facilitate learning prompts for image and text inputs independently, we present a dual-modality prompt tuning network comprising two pairs of coupled encoders. Our study centers on an alternate episodic training algorithm to enrich the generalization capacity of the learned prompts. In contrast to traditional episodic training algorithms, our approach incorporates both in-domain updates and domain-split updates in a batch-wise manner. For in-domain updates, we introduce a novel asymmetric contrastive learning paradigm, in which representations from the pre-trained encoder serve as supervision to regularize prompts from the prompted encoder. To enhance performance on out-of-domain distributions, we propose a domain-split optimization on visual prompts for cross-domain tasks, or on textual prompts for cross-class tasks, during domain-split updates. Extensive experiments across 11 datasets for base-to-new generalization and 4 datasets for domain generalization exhibit favorable performance.
Compared with the state-of-the-art method, MetaPrompt achieves an absolute gain of 1.02% on the overall harmonic mean in base-to-new generalization and consistently demonstrates superiority over all benchmarks in domain generalization.
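For context, the harmonic mean used to summarize base-to-new generalization balances base-class and new-class accuracy, so a gain on it requires doing well on both. The accuracy values below are hypothetical, purely to show the arithmetic.

```python
def harmonic_mean(base_acc, new_acc):
    """Harmonic mean H = 2 * base * new / (base + new), the usual summary
    metric for base-to-new generalization; it is dragged down by whichever
    of the two accuracies is lower."""
    return 2.0 * base_acc * new_acc / (base_acc + new_acc)

# Hypothetical scores: strong on base classes, somewhat weaker on new classes.
h = harmonic_mean(80.0, 70.0)
```

Unlike the arithmetic mean, H penalizes a method that buys base-class accuracy at the cost of new-class accuracy, which is why the benchmark reports it.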

4.
J Hazard Mater ; 465: 133281, 2024 03 05.
Article in English | MEDLINE | ID: mdl-38134688

ABSTRACT

Degraded mulch pollution is of great concern for agricultural soils. Although numerous studies have examined this issue from an environmental perspective, there is a lack of research focusing on crop-specific factors such as crop type. This study aimed to explore the correlation between meteorological and crop factors and mulch contamination. The first step was to estimate the amounts of mulch-derived microplastics (MPs) and phthalic acid esters (PAEs) released during the rapid expansion period (1993-2012) of mulch usage in China. Subsequently, Elastic Net (EN) and Random Forest (RF) models were employed to process a dataset that included meteorological, crop, and estimation data. At the national level, the RF model suggested that coldness in fall was crucial for MPs generation, while vegetables acted as a key factor in PAEs release. On a regional scale, the EN results showed that crops such as vegetables, cotton, and peanuts remained significantly involved in PAEs contamination. For MPs generation, coldness prevailed across all regions, and aridity was more critical in southern regions than in northern regions due to solar radiation. Finally, each region possessed specific crop types that could potentially influence its MPs contamination levels and provide guidance for developing sustainable ways to manage mulch contamination.
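The two-model analysis can be sketched with scikit-learn on synthetic data. The feature columns and target below are invented stand-ins for the study's meteorological, crop, and estimation variables; only the workflow (fit both models, compare feature rankings) reflects the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
# Assumed columns: fall coldness index, aridity index, vegetable share, cotton share.
X = rng.random((200, 4))
# Synthetic target: contamination driven mostly by column 0, partly by column 2.
y = 2.0 * X[:, 0] + 1.0 * X[:, 2] + 0.1 * rng.standard_normal(200)

en = ElasticNet(alpha=0.01).fit(X, y)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Both models expose a notion of importance: EN via coefficient magnitudes
# (sparse, linear), RF via impurity-based importances (nonlinear).
en_rank = np.argsort(-np.abs(en.coef_))
rf_rank = np.argsort(-rf.feature_importances_)
```

Comparing the two rankings is one reason to run both model families, as the study does: agreement between a sparse linear model and a tree ensemble strengthens a claim that a factor (e.g., fall coldness) drives the response.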


Subject(s)
Phthalic Acids , Soil Pollutants , Phthalic Acids/analysis , Plastics , Soil Pollutants/analysis , Agriculture , Soil , Vegetables , Microplastics , China , Esters/analysis
5.
IEEE Trans Image Process ; 32: 4223-4236, 2023.
Article in English | MEDLINE | ID: mdl-37405883

ABSTRACT

Occluded person re-identification (ReID) aims to match person images captured in severely occluded environments. Current occluded ReID works mostly rely on auxiliary models or employ a part-to-part matching strategy. However, these methods may be sub-optimal, since the auxiliary models are constrained by occlusion scenes and the matching strategy deteriorates when both the query and gallery sets contain occlusion. Some methods attempt to solve this problem by applying image occlusion augmentation (OA) and have shown great superiority in effectiveness and lightness. However, two defects exist in previous OA-based methods: 1) the occlusion policy is fixed throughout training and cannot be dynamically adjusted based on the current training status of the ReID network; 2) the position and area of the applied OA are completely random, with no reference to the image content when choosing the most suitable policy. To address these challenges, we propose a novel Content-Adaptive Auto-Occlusion Network (CAAO) that dynamically selects the proper occlusion region of an image based on its content and the current training status. Specifically, CAAO consists of two parts: the ReID network and the Auto-Occlusion Controller (AOC) module. The AOC automatically generates the optimal OA policy based on the feature map extracted from the ReID network and applies occlusion to the images for ReID network training. An on-policy reinforcement learning based alternating training paradigm is proposed to iteratively update the ReID network and the AOC module. Comprehensive experiments on occluded and holistic person ReID benchmarks demonstrate the superiority of CAAO.
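The alternating on-policy idea behind the AOC can be caricatured as a softmax policy over a few candidate occlusion regions, updated with a REINFORCE-style rule. The three-region action space and the reward stub are invented for illustration; the actual ReID update is left as a comment.

```python
import math
import random

random.seed(0)

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]

prefs = [0.0, 0.0, 0.0]             # one preference score per candidate region

def reward(region):
    """Stub reward: pretend occluding region 1 is most useful for training."""
    return [0.1, 1.0, 0.2][region]

for step in range(200):
    probs = softmax(prefs)
    region = random.choices(range(3), probs)[0]   # sample on-policy
    r = reward(region)
    baseline = sum(p * reward(i) for i, p in enumerate(probs))
    for i in range(3):                            # policy-gradient update
        grad = (1.0 if i == region else 0.0) - probs[i]
        prefs[i] += 0.1 * (r - baseline) * grad
    # ...here the ReID network would take a gradient step on images
    # occluded at `region`, and its loss would define the next reward.
```

The alternation is the key design choice: the controller's policy shifts as the (stubbed-out) ReID network improves, which is what a fixed, random occlusion policy cannot do.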

6.
Article in English | MEDLINE | ID: mdl-37015433

ABSTRACT

Occluded person re-identification (ReID) is a challenging task due to increased background noise and incomplete foreground information. Although existing human parsing-based ReID methods can tackle this problem with semantic alignment at the finest pixel level, their performance is heavily affected by the human parsing model. Most supervised methods train an extra human parsing model alongside the ReID model with cross-domain human-part annotations, suffering from expensive annotation costs and a domain gap; unsupervised methods integrate a feature clustering-based human parsing process into the ReID model, but the lack of supervision signals yields less satisfactory segmentation results. In this paper, we argue that pre-existing information in the ReID training dataset can be used directly as a supervision signal to train the human parsing model without any extra annotation. By integrating a weakly supervised human co-parsing network into the ReID network, we propose a novel framework that exploits shared information across different images of the same pedestrian, called the Human Co-parsing Guided Alignment (HCGA) framework. Specifically, the human co-parsing network is weakly supervised by three consistency criteria, namely global semantics, local space, and background. By feeding the semantic information and deep features from the person ReID network into the guided alignment module, features of the foreground and human parts can then be obtained for effective occluded person ReID. Experimental results on two occluded and two holistic datasets demonstrate the superiority of our method. In particular, on Occluded-DukeMTMC it achieves 70.2% Rank-1 accuracy and 57.5% mAP.

7.
IEEE Trans Image Process ; 30: 7776-7789, 2021.
Article in English | MEDLINE | ID: mdl-34495830

ABSTRACT

Person re-identification (ReID) aims to retrieve the pedestrian with the same identity across different views. Existing studies mainly focus on improving accuracy while ignoring efficiency. Recently, several hash-based methods have been proposed. Despite their improvement in efficiency, there still exists an unacceptable gap in accuracy between these methods and real-valued ones. Besides, few attempts have been made to explicitly reduce redundancy and improve the discrimination of hash codes simultaneously, especially for short ones. Integrating mutual learning is a possible way to reach this goal. However, it fails to utilize the complementary effect of teacher and student models, and it degrades the performance of teacher models by treating the two models equally. To address these issues, we propose salience-guided iterative asymmetric mutual hashing (SIAMH) to achieve high-quality hash code generation and fast feature extraction. Specifically, a salience-guided self-distillation branch (SSB) is proposed to enable SIAMH to generate hash codes based on salience regions, thus explicitly reducing the redundancy between codes. Moreover, a novel iterative asymmetric mutual training strategy (IAMT) is proposed to alleviate the drawbacks of common mutual learning; it continuously refines the discriminative regions for the SSB and extracts regularized dark knowledge for both models as well. Extensive experimental results on five widely used datasets demonstrate the superiority of the proposed method in efficiency and accuracy compared with existing state-of-the-art hashing and real-valued approaches. The code is released at https://github.com/Vill-Lab/SIAMH.
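The efficiency argument for hash-based retrieval rests on comparing compact binary codes with Hamming distance instead of real-valued features with Euclidean distance. A minimal sketch, with sign-thresholded toy features standing in for learned hash codes (SIAMH's actual codes come from a trained network):

```python
def binarize(features):
    """Sign-threshold real-valued features into a +/-1 hash code."""
    return [1 if f >= 0 else -1 for f in features]

def hamming(a, b):
    """Hamming distance between two +/-1 codes: number of differing positions."""
    return sum(1 for x, y in zip(a, b) if x != y)

query   = binarize([0.3, -1.2, 0.7, -0.1])
same_id = binarize([0.5, -0.9, 0.2, -0.4])   # similar signs -> small distance
diff_id = binarize([-0.8, 1.1, -0.6, 0.9])   # flipped signs -> large distance
```

Ranking a gallery by Hamming distance reduces to XOR-and-popcount bit operations, which is where the speed advantage over real-valued matching comes from; the accuracy gap the abstract mentions arises because binarization discards magnitude information.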


Subject(s)
Algorithms , Pedestrians , Humans
8.
IEEE Trans Image Process ; 30: 4212-4224, 2021.
Article in English | MEDLINE | ID: mdl-33822724

ABSTRACT

Person re-identification (re-id) suffers from the significant challenge of occlusion, where an image contains occlusions and less discriminative pedestrian information. Much existing work designs complex modules to capture auxiliary information (including human pose landmarks, mask maps, and spatial information) so that the network focuses on learning discriminative features from non-occluded body regions and achieves effective matching under spatial misalignment. Few studies have focused on data augmentation, given that existing single-based data augmentation methods bring limited performance improvement. To address the occlusion problem, we propose a novel Incremental Generative Occlusion Adversarial Suppression (IGOAS) network. It consists of 1) an incremental generative occlusion block that generates easy-to-hard occlusion data, making the network more robust to occlusion by gradually learning harder occlusions instead of the hardest occlusion directly, and 2) a global-adversarial suppression (G&A) framework with a global branch and an adversarial suppression branch. The global branch extracts steady global features from the images. The adversarial suppression branch, embedded with two occlusion suppression modules, minimizes the generated occlusion's response and strengthens attentive feature representation on non-occluded body regions. Finally, we obtain a more discriminative pedestrian feature descriptor, robust to the occlusion problem, by concatenating the two branches' features. Experiments on occluded datasets show the competitive performance of IGOAS. On Occluded-DukeMTMC, it achieves 60.1% Rank-1 accuracy and 49.4% mAP.
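The easy-to-hard idea can be sketched as an occlusion patch whose area grows with the training epoch. The linear schedule and zero-fill below are illustrative assumptions, not IGOAS's generative occlusion block.

```python
import random

def occlude(image, epoch, max_epoch, seed=None):
    """Zero out a square patch whose side grows linearly with the epoch,
    so augmentation moves from easy (tiny patch) to hard (large patch).
    `image` is a list of rows of pixel values; the input is not modified."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    side = max(1, int(min(h, w) * epoch / max_epoch))   # easy -> hard
    top = rng.randrange(h - side + 1)
    left = rng.randrange(w - side + 1)
    out = [row[:] for row in image]
    for r in range(top, top + side):
        for c in range(left, left + side):
            out[r][c] = 0
    return out

img = [[1] * 8 for _ in range(8)]
early = occlude(img, 1, 10, seed=0)   # small patch early in training
late = occlude(img, 9, 10, seed=0)    # large patch late in training
```

A curriculum like this is the stated motivation: the network first learns to cope with small missing regions before facing heavy occlusion, rather than seeing the hardest case from the start.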

9.
IEEE Trans Neural Netw Learn Syst ; 25(10): 1779-92, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25291733

ABSTRACT

Multilinear/tensor extensions of manifold learning based algorithms have been widely used in computer vision and pattern recognition. This paper first provides a systematic analysis of the multilinear extensions of the most popular methods by using alignment techniques, thereby obtaining a general tensor alignment framework. From this framework, it is easy to show that the manifold learning based tensor learning methods are intrinsically different from the alignment techniques. Based on the alignment framework, a robust tensor learning method called sparse tensor alignment (STA) is then proposed for unsupervised tensor feature extraction. Different from existing tensor learning methods, L1- and L2-norms are introduced to enhance robustness in the alignment step of STA. The advantage of the proposed technique is that the difficulty of selecting the local neighborhood size in manifold learning based tensor feature extraction algorithms can be avoided. Although STA is an unsupervised learning method, the sparsity encodes discriminative information in the alignment step and provides the robustness of STA. Extensive experiments on well-known image databases, as well as action and hand gesture databases, with object images encoded as tensors demonstrate that the proposed STA algorithm gives the most competitive performance compared with tensor-based unsupervised learning methods.


Subject(s)
Algorithms , Artificial Intelligence , Learning/physiology , Pattern Recognition, Automated/methods , Databases, Factual/statistics & numerical data , Face , Humans , Image Interpretation, Computer-Assisted , Linear Models , Pattern Recognition, Visual , Principal Component Analysis