Results 1 - 14 of 14

1.
Article in English | MEDLINE | ID: mdl-37224360

ABSTRACT

The vision transformer (ViT) has become a leading tool in various computer vision tasks, owing to its self-attention mechanism, which learns visual representations explicitly through cross-patch information interactions. Despite this success, the literature seldom explores the explainability of ViT, and there is no clear picture of how the attention across patches affects performance or what further potential it holds. In this work, we propose a novel explainable visualization approach to analyze and interpret the crucial attention interactions among patches in ViT. Specifically, we first introduce a quantitative indicator of the impact of patch interaction and verify it on attention window design and on the removal of indiscriminative patches. We then exploit the effective responsive field of each patch in ViT and accordingly devise a window-free transformer (WinfT) architecture. Extensive experiments on ImageNet demonstrate that the proposed quantitative method facilitates ViT model learning, improving top-1 accuracy by up to 4.28%. More remarkably, results on downstream fine-grained recognition tasks further validate the generalization of our proposal.
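
The abstract does not spell out the quantification indicator itself; as a loose, hypothetical illustration of measuring patch interaction, the sketch below computes a standard scaled-dot-product attention matrix and reports, per patch, how much attention mass it spends on other patches rather than on itself. The choice of indicator and all names are assumptions, not the authors' method.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_patch_interaction(q, k):
    """Crude indicator of how much each patch relies on *other* patches.

    q, k: (num_patches, dim) query/key projections from one attention head.
    Returns a (num_patches,) vector: the attention mass each patch spends on
    patches other than itself (1.0 = fully cross-patch, 0.0 = purely self).
    """
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)   # (N, N) attention matrix
    self_mass = np.diag(attn)                       # attention each patch pays to itself
    return 1.0 - self_mass

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(196, 64))                  # e.g. 14x14 patches, head dim 64
    k = rng.normal(size=(196, 64))
    print("mean cross-patch attention mass:", cross_patch_interaction(q, k).mean())
```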

2.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 5158-5173, 2023 Apr.
Article in English | MEDLINE | ID: mdl-35917573

ABSTRACT

Variation in scale or aspect ratio has been one of the main challenges in tracking. To overcome it, most existing methods adopt either multi-scale search or anchor-based schemes, which rely on a predefined, handcrafted search space and therefore limit performance in complicated scenes. To address this problem, recent anchor-free trackers have been proposed that require no prior scale or anchor information. However, an inconsistency between classification and regression degrades their tracking performance. To address these issues, we propose a simple yet effective tracker (named Siamese Box Adaptive Network, SiamBAN) that learns a target-aware scale-handling scheme in a data-driven manner. Our basic idea is to predict target boxes in a per-pixel fashion through a fully convolutional network, which is anchor-free. Specifically, SiamBAN divides the tracking problem into classification and regression tasks, which directly predict objectness and regress bounding boxes, respectively. A no-prior-box design avoids tuning hyperparameters related to candidate boxes, which makes SiamBAN more flexible. SiamBAN further uses a target-aware branch to address the inconsistency problem. Experiments on benchmarks including VOT2018, VOT2019, OTB100, UAV123, LaSOT, and TrackingNet show that SiamBAN achieves promising performance and runs at 35 FPS.
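
To make the per-pixel, anchor-free prediction concrete, here is a minimal sketch (hypothetical tensor shapes, not the released SiamBAN code) of decoding a classification map and a per-location (left, top, right, bottom) regression map into a single box.

```python
import numpy as np

def decode_anchor_free(cls_map, reg_map, stride=8):
    """Decode a per-pixel, anchor-free head (SiamBAN-style, simplified).

    cls_map: (H, W) foreground score for each spatial location.
    reg_map: (4, H, W) distances (left, top, right, bottom) from each
             location's image-space centre to the box sides.
    Returns the box (x1, y1, x2, y2) at the highest-scoring location.
    """
    h, w = cls_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx = xs * stride + stride // 2      # map feature coordinates to image coordinates
    cy = ys * stride + stride // 2
    x1, y1 = cx - reg_map[0], cy - reg_map[1]
    x2, y2 = cx + reg_map[2], cy + reg_map[3]
    i, j = np.unravel_index(np.argmax(cls_map), cls_map.shape)
    return np.array([x1[i, j], y1[i, j], x2[i, j], y2[i, j]])
```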

3.
Article in English | MEDLINE | ID: mdl-36383579

ABSTRACT

Self-supervised video object segmentation (VOS) has recently attracted much interest. However, most proxy tasks are designed to train only a single backbone, which relies on a point-to-point correspondence strategy to propagate masks through a video sequence. Because of this simple pipeline, the performance of the single-backbone paradigm remains unsatisfactory. Instead of following the previous literature, we propose a self-supervised progressive network (SSPNet) consisting of a memory retrieval module (MRM) and a collaborative refinement module (CRM). The MRM performs point-to-point correspondence and produces a propagated coarse mask for a query frame through self-supervised pixel-level and frame-level similarity learning. The CRM, trained via cycle-consistency region tracking, aggregates the reference and query information and implicitly learns the collaborative relationship among them to refine the coarse mask. Furthermore, to learn semantic knowledge from unlabeled data, we design two novel mask-generation strategies that provide the CRM with training data carrying meaningful semantic information. Extensive experiments on DAVIS-17, YouTube-VOS, and SegTrack v2 demonstrate that our method surpasses state-of-the-art self-supervised methods and narrows the gap to fully supervised ones.
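
A minimal sketch of the point-to-point correspondence idea behind the memory retrieval step, assuming per-pixel embeddings are already available (shapes and the temperature are assumptions): each query pixel attends over the reference pixels and inherits a weighted copy of the reference mask.

```python
import numpy as np

def propagate_mask(ref_feat, ref_mask, qry_feat, temperature=0.07):
    """Point-to-point correspondence mask propagation (simplified MRM-style step).

    ref_feat: (N, C) reference-frame pixel embeddings (N = H*W).
    ref_mask: (N,)   reference-frame soft mask in [0, 1].
    qry_feat: (M, C) query-frame pixel embeddings.
    Returns a coarse (M,) mask for the query frame.
    """
    # l2-normalise so the dot product is a cosine affinity
    ref = ref_feat / np.linalg.norm(ref_feat, axis=1, keepdims=True)
    qry = qry_feat / np.linalg.norm(qry_feat, axis=1, keepdims=True)
    logits = qry @ ref.T / temperature          # (M, N) pixel-level affinities
    logits -= logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)     # softmax over reference pixels
    return attn @ ref_mask                      # each query pixel copies its correspondents' labels
```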

4.
IEEE Trans Cybern ; 52(9): 9090-9100, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33635812

ABSTRACT

Subspace clustering is a popular method for discovering the underlying low-dimensional structure of high-dimensional multimedia data (e.g., images, videos, and texts). In this article, we consider the large-scale subspace clustering (LS2C) problem, that is, partitioning a million data points with a million dimensions. To address this, we explore an independent, distributed, and parallel framework that divides the big data/variable matrices and the regularization by both columns and rows. Specifically, LS2C is decomposed into many independent subproblems by distributing these matrices across different machines by columns, since the regularization of the code matrix equals the sum of that of its submatrices (e.g., the squared Frobenius norm or the l1-norm). Consensus optimization is designed to solve these subproblems in parallel while saving communication costs. Moreover, we provide theoretical guarantees that LS2C can recover consensus subspace representations of high-dimensional data points under broad conditions. Compared with state-of-the-art LS2C methods, our approach achieves better clustering results on public datasets, including a million images and videos.
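
The column-wise decomposition rests on the fact that a separable regularizer (e.g., the squared Frobenius norm) of the code matrix is the sum of the regularizers of its column blocks, so each block can be coded independently. A toy sketch under simplifying assumptions (a fixed shared dictionary, ridge regularization, and no consensus/ADMM step):

```python
import numpy as np

def solve_block(D, X_block, lam=0.1):
    """Ridge-regularised coding of one column block against a shared dictionary D."""
    d = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(d), D.T @ X_block)

def distributed_coding(X, D, n_workers=4, lam=0.1):
    """Split the data columns, code each block independently, then stack.

    Because ||C||_F^2 = sum_j ||C_j||_F^2, the column blocks decouple exactly,
    so each block could run on a separate machine.
    """
    blocks = np.array_split(X, n_workers, axis=1)
    codes = [solve_block(D, Xj, lam) for Xj in blocks]   # embarrassingly parallel
    return np.hstack(codes)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 1000))        # 1000 points in 50 dimensions
    D = rng.normal(size=(50, 20))          # shared dictionary / basis
    C = distributed_coding(X, D)
    C_full = solve_block(D, X)             # single-machine solution
    print(np.allclose(C, C_full))          # True: splitting columns changes nothing
```

The actual LS2C framework also splits by rows and runs consensus optimization across machines; this sketch only shows why the column split is exact.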

5.
IEEE Trans Neural Netw Learn Syst ; 33(4): 1467-1481, 2022 Apr.
Article in English | MEDLINE | ID: mdl-33347415

ABSTRACT

Lifelong machine learning aims to learn a sequence of tasks by drawing on previous experience, e.g., a knowledge library or deep network weights. However, the knowledge libraries or deep networks of most recent lifelong learning models have a prescribed size, which can degrade performance on both learned and incoming tasks when facing a new task environment (cluster). To address this challenge, we propose a novel incremental clustered lifelong learning framework with two knowledge libraries, a feature learning library and a model knowledge library, called Flexible Clustered Lifelong Learning (FCL3). Specifically, the feature learning library, modeled by an autoencoder architecture, maintains a set of representations common across all observed tasks, while the model knowledge library grows by identifying and adding new representative models (clusters). When a new task arrives, our FCL3 model first transfers knowledge from these libraries to encode the new task, i.e., it effectively and selectively soft-assigns the new task to multiple representative models over the feature learning library. Then, either 1) a new task with a high outlier probability is judged to be a new representative and is used to redefine both the feature learning library and the representative models over time, or 2) a new task with a low outlier probability only refines the feature learning library. For model optimization, we cast this lifelong learning problem as an alternating direction minimization problem as each new task arrives. Finally, we evaluate the proposed framework on several multitask data sets, and the experimental results demonstrate that our FCL3 model achieves better performance than most lifelong learning frameworks, including batch clustered multitask learning models.
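
As a rough, hypothetical illustration of the "soft-assign or spawn a new representative" decision (the actual FCL3 objective and outlier test are more involved), the sketch below soft-assigns a new task's encoding to existing representative models and adds a new representative when no existing one explains it well.

```python
import numpy as np

def assign_or_add(task_vec, representatives, tau=1.0, outlier_thresh=0.5):
    """Soft-assign a new task to representative models; add a new one if it looks like an outlier.

    task_vec:        (d,) encoding of the new task in the shared feature space (assumed given).
    representatives: list of (d,) vectors for the existing representative models.
    Returns (weights, representatives); weights is None when a new representative was added.
    """
    if not representatives:
        return None, [task_vec]
    dists = np.array([np.linalg.norm(task_vec - r) for r in representatives])
    logits = -dists / tau
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                      # soft assignment over representatives
    if weights.max() < outlier_thresh:            # no representative fits well -> new cluster
        return None, representatives + [task_vec]
    return weights, representatives
```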


Subjects
Education, Continuing; Neural Networks, Computer
6.
IEEE Trans Neural Netw Learn Syst ; 33(11): 6627-6639, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34057899

ABSTRACT

Recent years have witnessed significant progress in person re-identification (reID), driven by expert-designed deep neural network architectures. Despite this remarkable success, such architectures often suffer from high model complexity and a time-consuming pretraining process, as well as a mismatch between image-classification-driven backbones and the reID task. To address these issues, we introduce neural architecture search (NAS) to automatically design person reID backbones, i.e., reID-NAS, which automatically searches attention-based network architectures from scratch. Unlike traditional NAS approaches that originated in image classification, we design a reID-specific search space and search objective that fit NAS to the reID task. In terms of the search space, reID-NAS includes a lightweight attention module that precisely locates arbitrary pedestrian bounding boxes and is automatically added as attention to the reID architectures. In terms of the search objective, reID-NAS introduces a new retrieval objective to search and train reID architectures from scratch. Finally, we propose a hybrid optimization strategy to improve search stability in reID-NAS. In our experiments, we validate the effectiveness of the different parts of reID-NAS and show that the searched architecture achieves a new state of the art, with an order of magnitude fewer parameters, on three person reID datasets. As a concomitant benefit, reID-NAS vastly reduces the reliance on pretraining, making it possible to directly search and train a lightweight reID model from scratch.


Subjects
Biometric Identification; Pedestrians; Humans; Biometric Identification/methods; Neural Networks, Computer; Image Processing, Computer-Assisted/methods
7.
IEEE Trans Pattern Anal Mach Intell ; 43(7): 2480-2495, 2021 Jul.
Article in English | MEDLINE | ID: mdl-31985406

ABSTRACT

Deep convolutional neural networks (CNNs) have recently achieved great success in image restoration (IR) while also providing hierarchical features. However, most deep CNN-based IR models do not make full use of the hierarchical features from the original low-quality images, resulting in relatively low performance. In this work, we propose a novel and efficient residual dense network (RDN) to address this problem by making a better trade-off between efficiency and effectiveness in exploiting the hierarchical features from all convolutional layers. Specifically, we propose the residual dense block (RDB) to extract abundant local features via densely connected convolutional layers. The RDB further allows direct connections from the state of the preceding RDB to all layers of the current RDB, leading to a contiguous memory mechanism. To adaptively learn more effective features from preceding and current local features and to stabilize the training of the wider network, we propose local feature fusion in the RDB. After fully obtaining dense local features, we use global feature fusion to jointly and adaptively learn global hierarchical features in a holistic way. We demonstrate the effectiveness of the RDN on several representative IR applications: single image super-resolution, Gaussian image denoising, image compression artifact reduction, and image deblurring. Experiments on benchmark and real-world datasets show that our RDN performs favorably against state-of-the-art methods on each IR task, both quantitatively and visually.
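
A minimal PyTorch sketch of a residual dense block consistent with the description above; channel count, growth rate, and layer count are illustrative, and the chaining of multiple RDBs plus global feature fusion are omitted.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Residual dense block (RDB), simplified: densely connected 3x3 convs,
    1x1 local feature fusion, and a local residual connection."""

    def __init__(self, channels=64, growth=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels + i * growth, growth, 3, padding=1),
                nn.ReLU(inplace=True),
            ))
        # local feature fusion: squeeze all concatenated features back to `channels`
        self.fusion = nn.Conv2d(channels + num_layers * growth, channels, 1)

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))  # dense connections
        return x + self.fusion(torch.cat(features, dim=1))      # local residual learning

if __name__ == "__main__":
    y = ResidualDenseBlock()(torch.randn(1, 64, 32, 32))
    print(y.shape)  # torch.Size([1, 64, 32, 32])
```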

8.
Comb Chem High Throughput Screen ; 22(10): 694-704, 2019.
Article in English | MEDLINE | ID: mdl-31793417

ABSTRACT

AIMS AND OBJECTIVE: Cancer is one of the deadliest diseases, taking the lives of millions every year. Traditional methods of treating cancer are expensive and toxic to normal cells. Fortunately, anti-cancer peptides (ACPs) can avoid these side effects. However, identifying and developing new anti-cancer peptides experimentally takes a great deal of time and money; it is therefore necessary to develop a fast and accurate computational model to identify anti-cancer peptides, and machine learning algorithms are a good choice. MATERIALS AND METHODS: In our study, a multi-classifier system combining multiple machine learning models was used to predict anti-cancer peptides. The individual learners are built from different feature information and algorithms, and they form a multi-classifier system by voting. RESULTS AND CONCLUSION: The experiments show that the overall prediction rate of each individual learner is above 80% and that the overall accuracy of the multi-classifier system for anti-cancer peptide prediction reaches 95.93%, which is better than existing prediction models.
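
A minimal sketch of the voting multi-classifier idea, assuming each base learner receives its own feature encoding of the same peptides; the specific classifiers and encodings below are placeholders, not the ones used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def majority_vote(feature_sets, labels, test_sets):
    """Train one learner per feature representation and combine them by majority vote.

    feature_sets / test_sets: lists of (n_samples, n_features_i) matrices, one
    encoding per learner (e.g. amino-acid composition, dipeptide composition, ...).
    labels: binary 0/1 class labels shared by all encodings of the training peptides.
    """
    learners = [SVC(),
                RandomForestClassifier(n_estimators=200),
                LogisticRegression(max_iter=1000)]
    votes = []
    for model, X_train, X_test in zip(learners, feature_sets, test_sets):
        model.fit(X_train, labels)
        votes.append(model.predict(X_test))
    votes = np.stack(votes)                       # (n_learners, n_test)
    return (votes.mean(axis=0) >= 0.5).astype(int)  # majority vote for 0/1 labels
```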


Subjects
Antineoplastic Agents/chemistry; Machine Learning; Peptides/chemistry
9.
Sensors (Basel) ; 19(5), 2019 Feb 27.
Article in English | MEDLINE | ID: mdl-30818796

ABSTRACT

Although widely used in many applications, accurate and efficient human action recognition remains a challenging area of research in computer vision. Most recent surveys have focused on narrow problems such as human action recognition from depth data, 3D skeleton data, or still images, spatiotemporal interest-point-based methods, and human walking motion recognition; there has been no systematic survey of human action recognition as a whole. To this end, we present a thorough review of human action recognition methods and provide a comprehensive overview of recent approaches, including progress in hand-designed action features for RGB and depth data, current deep learning-based action feature representation methods, advances in human-object interaction recognition, and the currently prominent topic of action detection. Finally, we present several recommendations for researchers. This survey provides an essential reference for those interested in further research on human action recognition.


Subjects
Pattern Recognition, Automated/methods; Vision, Ocular/physiology; Visual Perception/physiology; Algorithms; Human Activities; Humans; Motion; Skeleton/physiology; Surveys and Questionnaires
10.
Article in English | MEDLINE | ID: mdl-30530365

ABSTRACT

A class-agnostic tracker typically consists of three key components: a motion model, a target appearance model, and an updating strategy. However, most recent top-performing trackers mainly focus on constructing complicated appearance models and updating strategies while using comparatively simple and heuristic motion models, which may result in an inefficient search and degrade tracking performance. To address this issue, we propose a hierarchical tracker that learns to move and track by combining data-driven search at the coarse level with coarse-to-fine verification at the fine level. At the coarse level, a data-driven motion model learned by deep recurrent reinforcement learning provides the tracker with a coarse localization of the object. By formulating motion search as an action-decision problem in reinforcement learning, our tracker uses a recurrent convolutional neural network-based deep Q-network to effectively learn data-driven search policies. The learned motion model not only significantly reduces the search space but also provides more reliable regions of interest for further verification. At the fine level, a kernelized correlation filter (KCF)-based appearance model densely yet efficiently verifies a local region centered on the location predicted by the motion model. By using circulant matrices and the fast Fourier transform, a large number of candidate samples in the local region can be evaluated efficiently and effectively by the KCF-based appearance model. Finally, a simple yet robust estimator is designed to detect possible tracking failures. Experiments on OTB50 and OTB100 show that our tracker achieves better performance than state-of-the-art trackers.
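
To illustrate the FFT-based verification step, here is a linear (un-kernelized) correlation-filter sketch in the MOSSE/KCF spirit; the actual tracker uses a kernelized filter with richer features, so treat the formulas below as a simplified stand-in.

```python
import numpy as np

def gaussian_response(h, w, sigma=2.0):
    """Desired training response: a Gaussian peak at the patch centre."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))

def train_filter(patch, lam=1e-2):
    """Closed-form correlation filter in the Fourier domain (linear case)."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(gaussian_response(*patch.shape))
    return (G * np.conj(F)) / (F * np.conj(F) + lam)   # conjugate filter, one division per frequency

def verify(H_conj, candidate):
    """Evaluate all circular shifts of the candidate region at once; the response
    peak is the verified target location inside the region."""
    response = np.real(np.fft.ifft2(np.fft.fft2(candidate) * H_conj))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return response, peak
```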

11.
Biomed Res Int ; 2016: 9406259, 2016.
Article in English | MEDLINE | ID: mdl-27847827

ABSTRACT

In this paper, we propose a deep architecture that dynamically learns the most discriminative features from data for both single-cell and object tracking in computational biology and computer vision. First, discriminative features are automatically learned via a convolutional deep belief network (CDBN). Second, we design a simple yet effective method to transfer the generic features learned by CDBNs on source tasks to object tracking tasks using only a limited amount of training data. Finally, to alleviate the tracker drifting problem caused by model updating, we jointly consider three different types of positive samples. Extensive experiments validate the robustness and effectiveness of the proposed method.


Subjects
Cell Tracking/methods; Image Interpretation, Computer-Assisted/methods; Machine Learning; Microscopy/methods; Neural Networks, Computer; Animals; Computer Simulation; Data Interpretation, Statistical; Humans; Image Enhancement/methods; Models, Statistical; Reproducibility of Results; Sensitivity and Specificity
12.
Biomed Res Int ; 2016: 8182416, 2016.
Article in English | MEDLINE | ID: mdl-27689090

ABSTRACT

Tracking individual cells or objects over time is important for understanding the effects of drug treatment on cancer cells and for video surveillance. A fundamental problem in individual cell/object tracking is simultaneously handling appearance variations caused by intrinsic and extrinsic factors. In this paper, inspired by the architecture of deep learning, we propose a robust feature learning method for constructing discriminative appearance models without large-scale pretraining. Specifically, in the initial frames, an unsupervised method is first used to learn an abstract representation of the target by combining classic principal component analysis (PCA) with recent deep learning representation architectures. We use the learned PCA eigenvectors as filters and develop a novel algorithm that represents a target through a PCA-based filter bank layer, a nonlinear layer, and a patch-based pooling layer. Then, based on this feature representation, a neural network with one hidden layer is trained in a supervised manner to construct a discriminative appearance model. Finally, to alleviate the tracker drifting problem, a sample update scheme is carefully designed to keep track of the most representative and diverse samples during tracking. We test the proposed method on two standard individual cell/object tracking benchmarks, where it achieves state-of-the-art performance.
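
A rough sketch of the unsupervised first stage (PCA eigenvectors as a convolutional filter bank followed by a nonlinearity), under the assumptions that the input is a single grayscale region and that mean-removed patches feed a plain SVD; the patch-based pooling layer and the supervised network on top are omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def pca_filter_bank(image, patch=7, num_filters=8):
    """Learn convolution filters as PCA eigenvectors of local patches, then apply
    them with a nonlinearity.

    image: (H, W) grayscale target region from the initial frames.
    Returns (num_filters, H-patch+1, W-patch+1) feature maps.
    """
    # collect all overlapping patches and remove their mean
    patches = sliding_window_view(image, (patch, patch)).reshape(-1, patch * patch)
    patches = patches - patches.mean(axis=0, keepdims=True)
    # leading right singular vectors of the patch matrix = PCA filters
    _, _, vt = np.linalg.svd(patches, full_matrices=False)
    filters = vt[:num_filters]                                  # (num_filters, patch*patch)
    # "convolution" as dot products with every patch, followed by a tanh nonlinearity
    h = image.shape[0] - patch + 1
    w = image.shape[1] - patch + 1
    windows = sliding_window_view(image, (patch, patch)).reshape(h, w, -1)
    return np.tanh(np.einsum('hwp,fp->fhw', windows, filters))
```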

13.
Springerplus ; 5(1): 1226, 2016.
Article in English | MEDLINE | ID: mdl-27536510

ABSTRACT

Most methods for recognizing human actions are based on single-view video or multi-camera data. In this paper, we propose a novel multi-surface video analysis strategy in which a video is expressed by a three-surface motion feature (3SMF) and a spatio-temporal interest feature. The 3SMF is extracted from the motion history image on three different video surfaces: the horizontal-vertical, horizontal-time, and vertical-time surfaces. In contrast to several previous studies, the prior probability is estimated from the 3SMF rather than assumed to be uniform. Finally, we model the relationship score between each video and action as a probabilistic inference to bridge the feature descriptors and action categories. We demonstrate our method by comparing it with several state-of-the-art approaches on standard action recognition benchmarks.
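
A small sketch of the motion history image that 3SMF builds on, plus a crude stand-in for the three motion surfaces obtained by projecting per-frame motion energy onto the xy, xt, and yt planes; the thresholds, decay constant, and projection itself are illustrative assumptions, not the paper's exact feature.

```python
import numpy as np

def motion_history_image(frames, tau=20, thresh=25):
    """Classic motion history image (MHI): bright where motion is recent, fading with age.

    frames: (T, H, W) grayscale video.
    """
    frames = frames.astype(np.float32)
    mhi = np.zeros(frames.shape[1:], dtype=np.float32)
    for prev, cur in zip(frames[:-1], frames[1:]):
        moving = np.abs(cur - prev) > thresh
        mhi = np.where(moving, tau, np.maximum(mhi - 1, 0))   # refresh moving pixels, decay the rest
    return mhi / tau                                          # normalise to [0, 1]

def three_surface_views(frames, thresh=25):
    """Rough analogue of the three motion surfaces: project per-frame motion energy."""
    frames = frames.astype(np.float32)
    motion = (np.abs(np.diff(frames, axis=0)) > thresh).astype(np.float32)  # (T-1, H, W)
    # xy (horizontal-vertical), xt (horizontal-time), yt (vertical-time)
    return motion.sum(axis=0), motion.sum(axis=1), motion.sum(axis=2)
```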

14.
PLoS One ; 11(8): e0161808, 2016.
Article in English | MEDLINE | ID: mdl-27575684

ABSTRACT

For effective visual tracking, a robust feature representation composed of two separate components (i.e., feature learning and feature selection) is one of the key issues. A common assumption in visual tracking is that the raw video sequences are clean, whereas real-world data contain significant noise and irrelevant patterns; consequently, the learned features may be noisy and not all relevant. To address this problem, we propose a novel visual tracking method via a point-wise gated convolutional deep network (CPGDN) that performs feature learning and feature selection jointly in a unified framework. The proposed method performs dynamic feature selection on raw features through a gating mechanism, so it can adaptively focus on task-relevant patterns (i.e., the target object) while ignoring task-irrelevant patterns (i.e., the surrounding background). Specifically, inspired by transfer learning, we first pre-train an object appearance model offline to learn generic image features and then transfer the rich feature hierarchies of the offline pre-trained CPGDN to online tracking, where the model is fine-tuned to adapt to the specific tracked object. Finally, to alleviate the tracker drifting problem, motivated by the observation that a visual target should be an object rather than an arbitrary image region, we combine an edge-box-based object proposal method to further improve tracking accuracy. Extensive evaluation on the widely used CVPR2013 tracking benchmark validates the robustness and effectiveness of the proposed method.
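
One common form of a point-wise gated layer, shown as a hypothetical numpy sketch; the CPGDN in the paper is convolutional and pre-trained, so this only illustrates how a per-unit gate performs feature selection on raw features.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pointwise_gated_layer(x, W, b, V, c):
    """Point-wise gated feature layer: every raw feature response is multiplied by a
    learned gate in [0, 1], so task-irrelevant responses can be suppressed per unit.

    x: (batch, d_in) raw features;  W, V: (d_in, d_out);  b, c: (d_out,)
    """
    features = np.tanh(x @ W + b)       # candidate feature responses
    gates = sigmoid(x @ V + c)          # per-unit relevance in [0, 1]
    return features * gates             # gated (selected) features
```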


Subjects
Image Processing, Computer-Assisted/methods; Machine Learning; Models, Theoretical