Results 1 - 11 of 11
1.
Article in English | MEDLINE | ID: mdl-38833389

ABSTRACT

Weakly supervised object localization (WSOL) stands as a pivotal endeavor within the realm of computer vision, entailing the localization of objects using only image-level labels. Contemporary approaches in WSOL have leveraged FPMs, yielding commendable outcomes. However, these existing FPM-based techniques are predominantly confined to rudimentary strategies of either augmenting the foreground or diminishing the background presence. We argue for the exploration and exploitation of the intricate interplay between the object's foreground and its background to achieve efficient object localization. In this manuscript, we introduce an innovative framework, termed adaptive zone learning (AZL), which operates on a coarse-to-fine basis to refine FPMs through a triad of adaptive zone mechanisms. First, an adversarial learning mechanism (ALM) is employed, orchestrating an interplay between the foreground and background regions. This mechanism accentuates coarse-grained object regions in a mutually adversarial manner. Subsequently, an oriented learning mechanism (OLM) is introduced, which harnesses local insights from both foreground and background in a fine-grained manner. This mechanism is instrumental in delineating object regions with greater granularity, thereby generating better FPMs. Furthermore, we propose a reinforced learning mechanism (RLM) as the compensatory mechanism for the adversarial design, by which the undesirable foreground maps are refined again. Extensive experiments on the CUB-200-2011 and ILSVRC datasets demonstrate that AZL achieves significant and consistent performance improvements over other state-of-the-art WSOL methods.

2.
IEEE Trans Image Process ; 33: 2895-2907, 2024.
Article in English | MEDLINE | ID: mdl-38607701

ABSTRACT

Transformer-based instance-level recognition has attracted increasing research attention recently due to its superior performance. However, although attempts have been made to encode masks as embeddings in Transformer-based frameworks, how to combine mask embeddings and spatial information in a transformer-based approach is still not fully explored. In this paper, we revisit the design of mask-embedding-based pipelines and propose an Instance Segmentation TRansformer (ISTR) with Mask Meta-Embeddings (MME), leveraging the strengths of transformer models in encoding embedding information and incorporating spatial information from mask embeddings. ISTR incorporates a recurrent refining head that consists of a Dynamic Box Predictor (DBP), a Mask Information Generator (MIG), and a Mask Meta-Decoder (MMD). To improve the quality of mask embeddings, MME interprets the mask encoding-decoding processes as a mutual information maximization problem, which unifies the objective functions of different decoding schemes such as Principal Component Analysis (PCA) and Discrete Cosine Transform (DCT) with a meta-formulation. Under the meta-formulation, a learnable Spatial Mask Tuner (SMT) is further proposed, which fuses the spatial and embedding information produced by MIG and can significantly boost the segmentation performance. The resulting variants, i.e., ISTR-PCA, ISTR-DCT, and ISTR-SMT, demonstrate the effectiveness and efficiency of incorporating mask embeddings into query-based instance segmentation pipelines. On the COCO dataset, ISTR surpasses all predominant mask-embedding-based models by a large margin, and achieves competitive performance compared to concurrent state-of-the-art models. On the Cityscapes dataset, ISTR also outperforms several strong baselines. Our code has been made available at: https://github.com/hujiecpp/ISTR.

3.
Article in English | MEDLINE | ID: mdl-37934637

ABSTRACT

Unsupervised domain adaptation (UDA) person reidentification (Re-ID) aims to identify pedestrian images within an unlabeled target domain with the help of an auxiliary labeled source-domain dataset. Many existing works attempt to recover reliable identity information by considering multiple homogeneous networks and then use the generated labels to train the model in the target domain. However, these homogeneous networks identify people in approximate subspaces and equally exchange their knowledge with others or their mean net to improve their ability, inevitably limiting the scope of available knowledge and leading them to make the same mistakes. This article proposes a dual-level asymmetric mutual learning (DAML) method to learn discriminative representations from a broader knowledge scope with diverse embedding spaces. Specifically, two heterogeneous networks mutually learn knowledge from asymmetric subspaces through pseudo-label generation in a hard distillation manner. The knowledge transfer between the two networks follows an asymmetric mutual learning (AML) scheme. The teacher network learns to identify both the target and source domain while adapting to the target domain distribution based on the knowledge of the student. Meanwhile, the student network is trained on the target dataset and employs the ground-truth label through the knowledge of the teacher. Extensive experiments on the Market-1501, CUHK-SYSU, and MSMT17 public datasets verify the superiority of DAML over state-of-the-art (SOTA) methods.

4.
IEEE Trans Pattern Anal Mach Intell ; 45(9): 11108-11119, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37023149

ABSTRACT

A resource-adaptive supernet adjusts its subnets for inference to fit the dynamically available resources. In this paper, we propose prioritized subnet sampling to train a resource-adaptive supernet, termed PSS-Net. We maintain multiple subnet pools, each of which stores the information of substantial subnets with similar resource consumption. Given a resource constraint, subnets conditioned on this constraint are sampled from a pre-defined subnet structure space, and high-quality ones are inserted into the corresponding subnet pool. Sampling then gradually shifts toward drawing subnets from the subnet pools. Moreover, when sampling from a subnet pool, a subnet with a better performance metric is assigned a higher priority for training PSS-Net. At the end of training, PSS-Net retains the best subnet in each pool to enable a fast switch between high-quality subnets for inference when the available resources vary. Experiments on ImageNet using MobileNet-V1/V2 and ResNet-50 show that PSS-Net outperforms state-of-the-art resource-adaptive supernets. Our project is publicly available at https://github.com/chenbong/PSS-Net.

5.
IEEE Trans Neural Netw Learn Syst ; 34(11): 9139-9148, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35294359

ABSTRACT

This article focuses on filter-level network pruning. A novel pruning method, termed CLR-RNF, is proposed. We first reveal a "long-tail" pruning problem in magnitude-based weight pruning methods and then propose a computation-aware measurement for individual weight importance, followed by a cross-layer ranking (CLR) of weights to identify and remove the bottom-ranked weights. Consequently, the per-layer sparsity makes up the pruned network structure in our filter pruning. Then, we introduce a recommendation-based filter selection scheme where each filter recommends a group of its closest filters. To pick the preserved filters from these recommended groups, we further devise a k-reciprocal nearest filter (RNF) selection scheme where the selected filters fall into the intersection of these recommended groups. Both our pruned network structure and the filter selection are nonlearning processes, which significantly reduces the pruning complexity and differentiates our method from existing works. We conduct image classification on CIFAR-10 and ImageNet to demonstrate the superiority of our CLR-RNF over state-of-the-art methods. For example, on CIFAR-10, CLR-RNF removes 74.1% of FLOPs and 95.0% of parameters from VGGNet-16 while even improving accuracy by 0.3%. On ImageNet, it removes 70.2% of FLOPs and 64.8% of parameters from ResNet-50 with only a 1.7% drop in top-five accuracy. Our project is available at https://github.com/lmbxmu/CLR-RNF.
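
The reciprocal-neighbor relation behind the RNF selection can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `k_reciprocal_neighbors` is hypothetical, Euclidean distance between flattened filters is an assumption, and the rule that finally picks the preserved filters is not given in the abstract, so it is left out.

```python
import numpy as np

def k_reciprocal_neighbors(filters, k):
    """For each filter, return the set of its k-reciprocal nearest neighbors.

    filters: array of shape (n_filters, ...); each filter is flattened.
    Assumption: closeness is measured by Euclidean distance.
    """
    flat = filters.reshape(len(filters), -1)
    # Pairwise Euclidean distances between flattened filters.
    diff = flat[:, None, :] - flat[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a filter never recommends itself
    # Each filter "recommends" its k closest filters.
    recommended = [set(np.argsort(row)[:k].tolist()) for row in dist]
    # j is a k-reciprocal neighbor of i when each recommends the other.
    return [
        {j for j in recommended[i] if i in recommended[j]}
        for i in range(len(flat))
    ]
```

With five one-dimensional "filters" at 0.0, 0.1, 5.0, 5.1, and 20.0 and k = 1, the first four pair up reciprocally, while the outlier at 20.0 recommends a neighbor that does not recommend it back and so has no reciprocal neighbors.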

6.
IEEE Trans Image Process ; 31: 3386-3398, 2022.
Article in English | MEDLINE | ID: mdl-35471883

ABSTRACT

Despite its exciting performance, the Transformer is criticized for its excessive parameters and computation cost. However, compressing the Transformer remains an open problem due to the internal complexity of its layer designs, i.e., Multi-Head Attention (MHA) and Feed-Forward Network (FFN). To address this issue, we introduce Group-wise Transformation towards a universal yet lightweight Transformer for vision-and-language tasks, termed LW-Transformer. LW-Transformer applies Group-wise Transformation to reduce both the parameters and computations of the Transformer, while also preserving its two main properties, i.e., the efficient attention modeling on diverse subspaces of MHA, and the expanding-scaling feature transformation of FFN. We apply LW-Transformer to a set of Transformer-based networks, and quantitatively measure them on three vision-and-language tasks and six benchmark datasets. Experimental results show that while saving a large number of parameters and computations, LW-Transformer achieves very competitive performance against the original Transformer networks for vision-and-language tasks. To examine its generalization ability, we apply LW-Transformer to the task of image classification, and build its network based on a recently proposed image Transformer called Swin-Transformer, where its effectiveness is also confirmed.
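
To make the parameter saving of a group-wise transformation concrete, here is a minimal sketch of a generic grouped linear map. It is an illustration under assumptions, not the LW-Transformer code: the function name `groupwise_linear` and the weight layout are hypothetical, and any further design details of the paper (which the abstract does not describe) are omitted.

```python
import numpy as np

def groupwise_linear(x, weights):
    """Apply an independent linear map to each feature group.

    x:       array of shape (batch, d_in)
    weights: array of shape (g, d_in // g, d_out // g), one matrix per group
    Returns an array of shape (batch, d_out).
    """
    g, d_in_g, d_out_g = weights.shape
    batch = x.shape[0]
    if x.shape[1] != g * d_in_g:
        raise ValueError("feature dimension must equal g * (d_in // g)")
    # Split the features into g contiguous groups.
    groups = x.reshape(batch, g, d_in_g)
    # Transform every group with its own small matrix.
    out = np.einsum("bgi,gio->bgo", groups, weights)
    # Concatenate the transformed groups back into one feature vector.
    return out.reshape(batch, g * d_out_g)
```

For d_in = d_out = 8 and g = 4, the grouped map holds 4 x 2 x 2 = 16 weights instead of the 64 of a dense 8 x 8 layer, and it is equivalent to a dense layer whose weight matrix is block-diagonal.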

7.
IEEE Trans Neural Netw Learn Syst ; 33(12): 7091-7100, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34125685

ABSTRACT

We propose a novel network pruning approach that preserves the information of pretrained network weights (filters). Network pruning with information preservation is formulated as a matrix sketch problem, which is efficiently solved by the off-the-shelf frequent direction method. Our approach, referred to as FilterSketch, encodes the second-order information of pretrained weights, which enables the representation capacity of pruned networks to be recovered with a simple fine-tuning procedure. FilterSketch requires neither training from scratch nor data-driven iterative optimization, leading to a several-orders-of-magnitude reduction of time cost in the optimization of pruning. Experiments on CIFAR-10 show that FilterSketch reduces 63.3% of floating-point operations (FLOPs) and prunes 59.9% of network parameters with negligible accuracy cost for ResNet-110. On ILSVRC-2012, it reduces 45.5% of FLOPs and removes 43.0% of parameters with only a 0.69% accuracy drop for ResNet-50. Our code and pruned models can be found at https://github.com/lmbxmu/FilterSketch.
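
The matrix sketch step can be illustrated with the generic Frequent Directions algorithm, which compresses an n x d matrix A into an ell x d sketch B while keeping B^T B close to A^T A (the second-order information). This is a minimal sketch of the textbook algorithm only; how FilterSketch maps sketched weights back to pruned filters is not described in the abstract and is not shown. The function name `frequent_directions` is hypothetical.

```python
import numpy as np

def frequent_directions(A, ell):
    """Return an (ell x d) sketch B of the (n x d) matrix A.

    Standard guarantee: ||A^T A - B^T B||_2 <= 2 * ||A||_F^2 / ell.
    """
    n, d = A.shape
    if not 2 <= ell <= d:
        raise ValueError("this sketch assumes 2 <= ell <= d")
    B = np.zeros((ell, d))
    next_zero = 0  # index of the first all-zero row of B
    for row in A:
        if next_zero == ell:
            # Sketch is full: rotate to singular directions and shrink.
            _, s, vt = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2] ** 2
            s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = s_shrunk[:, None] * vt  # rows ell//2 and beyond become zero
            next_zero = ell // 2
        B[next_zero] = row
        next_zero += 1
    return B
```

The shrinking step discards the weakest directions each time the sketch fills up, so the whole procedure needs a single pass over the rows and no training data.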

8.
Biomed Res Int ; 2016: 9406259, 2016.
Article in English | MEDLINE | ID: mdl-27847827

ABSTRACT

In this paper, we propose a deep architecture to dynamically learn the most discriminative features from data for both single-cell and object tracking in computational biology and computer vision. First, the discriminative features are automatically learned via a convolutional deep belief network (CDBN). Second, we design a simple yet effective method to transfer features learned by CDBNs on generic source tasks to object tracking tasks using only a limited amount of training data. Finally, to alleviate the tracker drifting problem caused by model updating, we jointly consider three different types of positive samples. Extensive experiments validate the robustness and effectiveness of the proposed method.


Subjects
Cell Tracking/methods , Image Interpretation, Computer-Assisted/methods , Machine Learning , Microscopy/methods , Neural Networks, Computer , Animals , Computer Simulation , Data Interpretation, Statistical , Humans , Image Enhancement/methods , Models, Statistical , Reproducibility of Results , Sensitivity and Specificity
9.
Biomed Res Int ; 2016: 8182416, 2016.
Article in English | MEDLINE | ID: mdl-27689090

ABSTRACT

Tracking individual cells/objects over time is important in understanding drug treatment effects on cancer cells and in video surveillance. A fundamental problem of individual cell/object tracking is to simultaneously address the cell/object appearance variations caused by intrinsic and extrinsic factors. In this paper, inspired by the architecture of deep learning, we propose a robust feature learning method for constructing discriminative appearance models without large-scale pretraining. Specifically, in the initial frames, an unsupervised method is first used to learn the abstract features of a target by combining classic principal component analysis (PCA) algorithms with recent deep learning representation architectures. We use learned PCA eigenvectors as filters and develop a novel algorithm to represent a target by composing a PCA-based filter bank layer, a nonlinear layer, and a patch-based pooling layer. Then, based on the feature representation, a neural network with one hidden layer is trained in a supervised mode to construct a discriminative appearance model. Finally, to alleviate the tracker drifting problem, a sample update scheme is carefully designed to keep track of the most representative and diverse samples during tracking. We test the proposed tracking method on two standard individual cell/object tracking benchmarks to show our tracker's state-of-the-art performance.
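
The idea of using PCA eigenvectors as filters can be sketched as follows. This is only an illustration of the filter bank layer under stated assumptions: the function name `learn_pca_filters`, dense patch extraction with stride 1, and per-patch mean removal are choices made here, not details confirmed by the abstract, and the nonlinear and pooling layers are omitted.

```python
import numpy as np

def learn_pca_filters(image, patch_size, n_filters):
    """Learn a bank of PCA filters from all patches of a 2-D image.

    Returns an array of shape (n_filters, patch_size, patch_size).
    """
    h, w = image.shape
    p = patch_size
    # Collect every p x p patch (stride 1) as a flattened row vector.
    patches = np.array([
        image[i:i + p, j:j + p].ravel()
        for i in range(h - p + 1)
        for j in range(w - p + 1)
    ])
    # Remove each patch's own mean (assumption: per-patch centering).
    patches = patches - patches.mean(axis=1, keepdims=True)
    # Right singular vectors are the eigenvectors of the patch scatter
    # matrix, ordered by decreasing eigenvalue.
    _, _, vt = np.linalg.svd(patches, full_matrices=False)
    # The leading eigenvectors, reshaped to 2-D, act as convolution filters.
    return vt[:n_filters].reshape(n_filters, p, p)
```

Because the filters come from a single decomposition of the target's own patches, they can be obtained from the initial frames without any large-scale pretraining.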

10.
PLoS One ; 9(5): e85236, 2014.
Article in English | MEDLINE | ID: mdl-24837851

ABSTRACT

With the blooming of online social media applications, Community Question Answering (CQA) services have become one of the most important online resources for information and knowledge seekers. A large number of high-quality question and answer pairs have been accumulated, which allow users not only to share their knowledge with others, but also to interact with each other. Accordingly, considerable effort has been devoted to question and answer retrieval in CQA services so as to help users find similar questions or the right answers. However, to our knowledge, less attention has been paid so far to question popularity in CQA. Question popularity can reflect the attention and interest of users. Hence, predicting question popularity can better capture the users' interest so as to improve the users' experience. Meanwhile, it can also promote the development of the community. In this paper, we investigate the problem of predicting question popularity in CQA. We first explore the factors that have an impact on question popularity by employing statistical analysis. We then propose a supervised machine learning approach to model these factors for question popularity prediction. The experimental results show that our proposed approach can effectively distinguish popular questions from unpopular ones in the Yahoo! Answers question and answer repository.


Subjects
Artificial Intelligence/trends , Community Participation/methods , Information Dissemination/methods , Social Media/statistics & numerical data , Social Media/trends , User-Computer Interface , Artificial Intelligence/statistics & numerical data , Community Participation/trends , Humans
11.
PLoS One ; 9(3): e71511, 2014.
Article in English | MEDLINE | ID: mdl-24595052

ABSTRACT

With the blooming of Web 2.0, Community Question Answering (CQA) services such as Yahoo! Answers (http://answers.yahoo.com), WikiAnswer (http://wiki.answers.com), and Baidu Zhidao (http://zhidao.baidu.com) have emerged as alternatives for knowledge and information acquisition. Over time, a large number of high-quality question and answer (Q&A) pairs contributed by human intelligence have been accumulated as a comprehensive knowledge base. Unlike search engines, which return long lists of results, searching in CQA services can obtain the correct answers to question queries by automatically finding similar questions that have already been answered by other users. Hence, it greatly improves the efficiency of online information retrieval. However, given a question query, finding similar and well-answered questions is a non-trivial task. The main challenge is the word mismatch between the question query (query) and the candidate question for retrieval (question). To investigate this problem, in this study, we capture the word semantic similarity between query and question by introducing a topic modeling approach. We then propose an unsupervised machine-learning approach to finding similar questions in CQA Q&A archives. The experimental results show that our proposed approach significantly outperforms the state-of-the-art methods.


Subjects
Archives , Information Storage and Retrieval , Search Engine , Algorithms , Cluster Analysis , Databases as Topic , Models, Theoretical
...