1.
IEEE Trans Pattern Anal Mach Intell ; 46(7): 4908-4925, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38306258

ABSTRACT

Point-based object localization (POL), which pursues high-performance object sensing under low-cost data annotation, has attracted increased attention. However, the point annotation mode inevitably introduces semantic variance due to the inconsistency of annotated points. Existing POL methods rely heavily on strict annotation rules, which are difficult to define and apply, to handle the problem. In this study, we propose coarse point refinement (CPR), which to the best of our knowledge is the first attempt to alleviate semantic variance from an algorithmic perspective. CPR reduces the semantic variance by selecting a semantic centre point in a neighbourhood region to replace the initial annotated point. Furthermore, we design a sampling region estimation module to dynamically compute a sampling region for each object and use a cascaded structure to achieve end-to-end optimization. We further integrate a variance regularization into the structure to concentrate the predicted scores, yielding CPR++. We observe that CPR++ can obtain scale information and further reduce the semantic variance in a global region, thus guaranteeing high-performance object localization. Extensive experiments on four challenging datasets validate the effectiveness of both CPR and CPR++. We hope our work can inspire more research on designing algorithms rather than annotation rules to address the semantic variance problem in POL.
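
The refinement step described in this abstract (replacing a coarse annotated point with a semantic centre chosen from its neighbourhood) can be illustrated with a minimal sketch. It is not the authors' implementation: the per-pixel `score_map`, the square neighbourhood of size `radius`, the `threshold`, and the score-weighted averaging are all assumptions made for illustration, and the function name `refine_point` is hypothetical.

```python
def refine_point(score_map, point, radius=2, threshold=0.5):
    """Return a score-weighted centre of confident pixels near `point`.

    score_map: 2D list of per-pixel semantic scores in [0, 1] (assumed given,
               e.g. produced by some trained classifier).
    point:     (row, col) of the coarse annotated point.
    """
    rows, cols = len(score_map), len(score_map[0])
    r0, c0 = point
    total = sum_r = sum_c = 0.0
    # Scan the square neighbourhood, clipped to the image borders.
    for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1)):
            s = score_map[r][c]
            if s >= threshold:  # keep only confident pixels
                total += s
                sum_r += s * r
                sum_c += s * c
    if total == 0.0:
        # No confident pixel nearby: fall back to the original annotation.
        return (float(r0), float(c0))
    return (sum_r / total, sum_c / total)
```

Two annotators clicking on different parts of the same object would, under these assumptions, be mapped towards the same refined point as long as their neighbourhoods cover the same confident pixels.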

2.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 9454-9468, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37022836

ABSTRACT

With convolution operations, Convolutional Neural Networks (CNNs) are good at extracting local features but have difficulty capturing global representations. With cascaded self-attention modules, vision transformers can capture long-distance feature dependencies but unfortunately deteriorate local feature details. In this paper, we propose a hybrid network structure, termed Conformer, to take advantage of both convolution operations and self-attention mechanisms for enhanced representation learning. Conformer is rooted in the interactive coupling of CNN local features and transformer global representations at different resolutions. Conformer adopts a dual structure so that local details and global dependencies are retained to the maximum extent. We also propose a Conformer-based detector (ConformerDet), which learns to predict and refine object proposals by performing region-level feature coupling in an augmented cross-attention fashion. Experiments on ImageNet and MS COCO datasets validate Conformer's superiority for visual recognition and object detection, demonstrating its potential to be a general backbone network.


Subject(s)
Algorithms; Learning; Neural Networks, Computer
3.
IEEE Trans Image Process ; 32: 29-42, 2023.
Article in English | MEDLINE | ID: mdl-36459604

ABSTRACT

Unsupervised person re-identification (re-ID) remains a challenging task. While extensive research has focused on framework design and loss functions, this paper shows that the sampling strategy plays an equally important role. We analyze the reasons for the performance differences between various sampling strategies under the same framework and loss function. We suggest that deteriorated over-fitting is an important factor causing poor performance, and enhancing statistical stability can rectify this problem. Inspired by this, a simple yet effective approach is proposed, termed group sampling, which gathers samples from the same class into groups. The model is thereby trained using normalized group samples, which helps alleviate the negative impact of individual samples. Group sampling updates the pipeline of pseudo-label generation by guaranteeing that samples are more efficiently classified into the correct classes. It regulates the representation learning process, enhancing statistical stability for feature representation in a progressive fashion. Extensive experiments on Market-1501, DukeMTMC-reID and MSMT17 show that group sampling achieves performance comparable to state-of-the-art methods and outperforms the current techniques under purely camera-agnostic settings. Code is available at https://github.com/ucas-vg/GroupSampling.
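
The core idea of group sampling (gathering samples of the same class into groups before they are fed to the model) can be sketched as an index sampler. This is only an illustration under stated assumptions, not the code from the linked repository: the function name `group_sample`, the fixed `group_size`, and the shuffling scheme are hypothetical choices.

```python
import random
from collections import defaultdict


def group_sample(pseudo_labels, group_size, seed=0):
    """Return sample indices ordered so that same-class samples are contiguous.

    pseudo_labels: one (pseudo-)class label per sample.
    group_size:    assumed number of same-class samples per group.
    """
    rng = random.Random(seed)
    # Collect the indices of every class.
    by_class = defaultdict(list)
    for idx, label in enumerate(pseudo_labels):
        by_class[label].append(idx)
    # Split each class into groups of at most `group_size` samples.
    groups = []
    for indices in by_class.values():
        rng.shuffle(indices)
        for start in range(0, len(indices), group_size):
            groups.append(indices[start:start + group_size])
    # Shuffle the order of the groups, not the samples inside them.
    rng.shuffle(groups)
    return [idx for group in groups for idx in group]
```

With a sampler like this, every run of `group_size` consecutive indices belongs to a single class whenever the class sizes are multiples of `group_size`; the last group of a class is smaller otherwise.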

4.
IEEE Trans Neural Netw Learn Syst ; 33(1): 117-129, 2022 Jan.
Article in English | MEDLINE | ID: mdl-33119512

ABSTRACT

Visual commonsense knowledge has received growing attention in the reasoning of long-tailed visual relationships biased in terms of object and relation labels. Most current methods collect and utilize external knowledge for visual relationships by following the fixed reasoning path of {subject, object → predicate} to facilitate the recognition of infrequent relationships. However, knowledge incorporation along such a fixed multidependent path suffers from data set bias and the exponentially growing number of combinations of object and relation labels, and ignores the semantic gap between commonsense knowledge and real scenes. To alleviate this, we propose configurable graph reasoning (CGR) to decompose the reasoning path of visual relationships and the incorporation of external knowledge, achieving configurable knowledge selection and personalized graph reasoning for each relation type in each image. Given a commonsense knowledge graph, CGR learns to match and retrieve knowledge for different subpaths and selectively compose the knowledge-routed path. CGR adaptively configures the reasoning path based on the knowledge graph, bridges the semantic gap between commonsense knowledge and real-world scenes, and achieves better knowledge generalization. Extensive experiments show that CGR consistently outperforms previous state-of-the-art methods on several popular benchmarks and works well with different knowledge graphs. Detailed analyses demonstrate that CGR learns explainable and compelling configurations of reasoning paths.


Subject(s)
Algorithms; Neural Networks, Computer; Knowledge; Recognition, Psychology; Semantics
5.
IEEE Trans Image Process ; 30: 5096-5108, 2021.
Article in English | MEDLINE | ID: mdl-33999820

ABSTRACT

Conventional networks for object skeleton detection are usually hand-crafted. Despite their effectiveness, hand-crafted network architectures lack a theoretical basis and require intensive prior knowledge to implement representation complementarity for objects/parts at different granularities. In this paper, we propose an adaptive linear span network (AdaLSN), driven by neural architecture search (NAS), to automatically configure and integrate scale-aware features for object skeleton detection. AdaLSN is formulated with the theory of linear span, which provides one of the earliest explanations for multi-scale deep feature fusion. AdaLSN is materialized by defining a mixed unit-pyramid search space, which goes beyond many existing search spaces using unit-level or pyramid-level features. Within the mixed space, we apply genetic architecture search to jointly optimize unit-level operations and pyramid-level connections for adaptive feature space expansion. AdaLSN substantiates its versatility by achieving a significantly better accuracy-latency trade-off than the state of the art. It also demonstrates general applicability to image-to-mask tasks such as edge detection and road extraction. Code is available at https://github.com/sunsmarterjie/SDL-Skeleton.

6.
IEEE Trans Image Process ; 30: 3142-3153, 2021.
Article in English | MEDLINE | ID: mdl-33596173

ABSTRACT

Few-shot semantic segmentation remains an open problem because limited support (training) images are insufficient to represent the diverse semantics within target categories. Conventional methods typically model a target category solely using information from the support image(s), resulting in incomplete semantic activation. In this paper, we propose a novel few-shot segmentation approach, termed harmonic feature activation (HFA), which aims to implement a dense support-to-query semantic transform by incorporating the features of both query and support images. HFA is formulated as a bilinear model, which handles the pixel-wise dense correlation (bilinear feature activation) between query and support images in a systematic way. HFA incorporates a low-rank decomposition procedure, which speeds up bilinear feature activation with negligible performance cost. In addition, a semantic diffusion procedure is fused with HFA, which further improves the global harmony and local consistency of the feature activation. Extensive experiments on commonly used datasets (PASCAL VOC and MS COCO) show that HFA improves on the state of the art by significant margins. Code is available at https://github.com/Bibikiller/HFA.
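
Why a low-rank decomposition speeds up a bilinear model can be shown with a small generic sketch. It assumes the standard factorization W = U Vᵀ of the bilinear weight matrix, which may differ from the decomposition actually used in HFA; the vectors `q` and `s` stand in for a query feature and a support feature, and all function names are hypothetical.

```python
def matvec_t(M, x):
    """Compute M^T x, where M is a list of d rows with k columns each."""
    k = len(M[0])
    return [sum(M[i][j] * x[i] for i in range(len(x))) for j in range(k)]


def bilinear_full(q, W, s):
    """Full bilinear form q^T W s; costs O(d^2) for d-dimensional features."""
    return sum(q[i] * W[i][j] * s[j]
               for i in range(len(q)) for j in range(len(s)))


def bilinear_low_rank(q, U, V, s):
    """Same value when W = U V^T (assumed rank k); costs only O(d * k)."""
    uq = matvec_t(U, q)  # project the query feature to k dimensions
    vs = matvec_t(V, s)  # project the support feature to k dimensions
    return sum(a * b for a, b in zip(uq, vs))
```

When the rank k is much smaller than the feature dimension d, the two projections replace the d × d weight matrix, which is where the saving comes from in a dense, pixel-wise setting.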

7.
IEEE Trans Neural Netw Learn Syst ; 32(5): 1881-1895, 2021 May.
Article in English | MEDLINE | ID: mdl-32481230

ABSTRACT

This article establishes a baseline for object reflection symmetry detection in natural images by releasing a new benchmark named Sym-PASCAL and proposing an end-to-end deep learning approach for reflection symmetry. Sym-PASCAL spans challenges of multiobjects, object diversity, part invisibility, and clustered backgrounds, which go far beyond those in existing data sets. The end-to-end deep learning approach, referred to as a side-output residual network (SRN), leverages the output residual units (RUs) to fit the errors between the symmetry ground truth and the side outputs of multiple stages of a trunk network. By cascading RUs from deep to shallow, SRN exploits the "flow" of errors along multiple stages to effectively match object symmetry at different scales and suppress the clustered backgrounds. SRN is interpreted as a boosting-like algorithm, which assembles features using RUs during network forward and backward propagations. SRN is further upgraded to a multitask SRN (MT-SRN) for joint symmetry and edge detection, demonstrating its generality to image-to-mask learning tasks. Experimental results verify that the Sym-PASCAL benchmark is challenging related to real-world images, SRN achieves state-of-the-art performance, and MT-SRN has the capability to simultaneously predict edge and symmetry mask without loss of performance.

8.
IEEE Trans Pattern Anal Mach Intell ; 41(10): 2395-2409, 2019 Oct.
Article in English | MEDLINE | ID: mdl-30762529

ABSTRACT

Weakly supervised object detection is a challenging task in which only image category supervision is provided, yet object locations and object detectors must be learned at the same time. The inconsistency between the weak supervision and learning objectives introduces significant randomness to object locations and ambiguity to detectors. In this paper, a min-entropy latent model (MELM) is proposed for weakly supervised object detection. Min-entropy serves as a model to learn object locations and a metric to measure the randomness of object localization during learning. It aims to reduce, in a principled way, the variance of learned instances and alleviate the ambiguity of detectors. MELM is decomposed into three components: proposal clique partition, object clique discovery, and object localization. MELM is optimized with a recurrent learning algorithm, which leverages continuation optimization to solve the challenging non-convexity problem. Experiments demonstrate that MELM significantly improves the performance of weakly supervised object detection, weakly supervised object localization, and image classification compared with state-of-the-art approaches.
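
The use of entropy as a measure of localization randomness can be illustrated with a generic sketch. It computes the plain Shannon entropy of normalized proposal scores and is not the clique-based formulation of MELM; the function name `localization_entropy` and the non-negative `scores` input are assumptions for illustration.

```python
import math


def localization_entropy(scores):
    """Shannon entropy (in nats) of proposal scores normalized to sum to 1.

    scores: non-negative confidence scores, one per object proposal.
    A low value means the scores concentrate on a few proposals.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one score must be positive")
    # Zero-score proposals contribute nothing (limit of p * log p as p -> 0).
    probs = [s / total for s in scores if s > 0]
    return -sum(p * math.log(p) for p in probs)
```

Scores spread evenly over many proposals give a high entropy, while scores concentrated on a single proposal give an entropy of zero, so minimizing this quantity pushes the model towards a confident, less random localization.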

9.
IEEE Trans Image Process ; 26(12): 5575-5589, 2017 Dec.
Article in English | MEDLINE | ID: mdl-28574358

ABSTRACT

Tracking multiple persons is a challenging task when persons move in groups and occlude each other. Existing group-based methods have extensively investigated how to make group division more accurate in a tracking-by-detection framework; however, few of them quantify the group dynamics from the perspective of targets' spatial topology or consider the group in a dynamic view. Inspired by the sociological properties of pedestrians, we propose a novel socio-topology model with a topology-energy function to factor the group dynamics of moving persons and groups. In this model, minimizing the topology-energy variance in a two-level energy form is expected to produce smooth topology transitions, stable group tracking, and accurate target association. To search for the strong minimum in energy variation, we design discrete group-tracklet jump moves embedded in the gradient descent method, which ensures that the moves reduce the energy variation of group and trajectory alternately in the varying topology dimension. Experimental results on both RGB and RGB-D data sets show the superiority of our proposed model for multiple person tracking in crowd scenes.

10.
IEEE Trans Image Process ; 26(7): 3221-3234, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28422661

ABSTRACT

Scene images usually involve semantic correlations, particularly when considering large-scale image data sets. This paper proposes a novel generative image representation, correlated topic vector, to model such semantic correlations. Originating from the correlated topic model, correlated topic vector is intended to naturally utilize the correlations among topics, which are seldom considered in conventional feature encoding, e.g., Fisher vector, but do exist in scene images. It is expected that the involvement of correlations can increase the discriminative capability of the learned generative model and consequently improve the recognition accuracy. Incorporated with the Fisher kernel method, correlated topic vector inherits the advantages of Fisher vector. The contributions to the topics of visual words have been further employed by incorporating the Fisher kernel framework to indicate the differences among scenes. Combined with the deep convolutional neural network (CNN) features and Gibbs sampling solution, correlated topic vector shows great potential when processing large-scale and complex scene image data sets. Experiments on two scene image data sets demonstrate that correlated topic vector significantly improves on deep CNN features and outperforms existing Fisher kernel-based features.

11.
IEEE Trans Image Process ; 22(2): 778-89, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23060336

ABSTRACT

Human detection in images is challenged by the view and posture variation problem. In this paper, we propose a piecewise linear support vector machine (PL-SVM) method to tackle this problem. The motivation is to exploit the piecewise discriminative function to construct a nonlinear classification boundary that can discriminate multiview and multiposture human bodies from the backgrounds in a high-dimensional feature space. PL-SVM training is designed as an iterative procedure of feature space division and linear SVM training, aiming at the margin maximization of local linear SVMs. Each piecewise SVM model is responsible for a subspace, corresponding to a human cluster of a special view or posture. In the PL-SVM, a cascaded detector is proposed with block orientation features and histogram of oriented gradients features. Extensive experiments show that compared with several recent SVM methods, our method reaches the state of the art in both detection accuracy and computational efficiency, and it performs best when dealing with low-resolution human regions in cluttered backgrounds.
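
How several linear pieces combine into a nonlinear boundary can be shown with a short sketch of a piecewise linear decision function. It assumes that a sample is scored by the linear piece with the largest response, which is one common way to combine local linear classifiers and is not confirmed by the abstract; the names `pl_svm_score` and `pl_svm_predict` and the `(weights, bias)` representation are hypothetical.

```python
def pl_svm_score(x, pieces):
    """Score x with the best-responding linear piece.

    pieces: list of (weights, bias) pairs, one per local linear SVM
            (assumed already trained).
    """
    return max(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
               for w, b in pieces)


def pl_svm_predict(x, pieces):
    """Return +1 (e.g. human) if any piece accepts x, otherwise -1."""
    return 1 if pl_svm_score(x, pieces) > 0 else -1
```

With two pieces such as `([1.0, 0.0], -1.0)` and `([0.0, 1.0], -1.0)`, the positive region is the union of two half-planes, a boundary that no single linear SVM can represent.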


Subject(s)
Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Support Vector Machine; Activities of Daily Living; Animals; Databases, Factual; Humans; Video Recording