Results 1 - 14 of 14
1.
IEEE Trans Pattern Anal Mach Intell ; 46(7): 4850-4865, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38261483

ABSTRACT

Although stereo image restoration has been extensively studied, most existing work focuses on restoring stereo images with limited horizontal parallax due to the binocular symmetry constraint. Stereo images with unlimited parallax (e.g., large ranges and asymmetrical types) are more challenging in real-world applications and have rarely been explored so far. To restore high-quality stereo images with unlimited parallax, this paper proposes an attention-guided correspondence learning method, which learns both self- and cross-view feature correspondence guided by parallax and omnidirectional attention. To learn cross-view feature correspondence, a Selective Parallax Attention Module (SPAM) is proposed to interact with cross-view features under the guidance of parallax attention that adaptively selects receptive fields for different parallax ranges. Furthermore, to handle asymmetrical parallax, we propose a Non-local Omnidirectional Attention Module (NOAM) to learn the non-local correlation of both self- and cross-view contexts, which guides the aggregation of global contextual features. Finally, we propose an Attention-guided Correspondence Learning Restoration Network (ACLRNet) built upon SPAMs and NOAMs to restore stereo images by associating the features of the two views based on the learned correspondence. Extensive experiments on five benchmark datasets demonstrate the effectiveness and generalization of the proposed method on three stereo image restoration tasks: super-resolution, denoising, and compression artifact reduction.
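A minimal PyTorch sketch of the cross-view part of the idea: attention computed along each image row (the epipolar line) from left-view queries to right-view keys and values. The module layout, names, and residual fusion are illustrative assumptions, not the authors' SPAM implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallaxCrossAttention(nn.Module):
    """Attend from each left-view pixel to all right-view pixels on the same row."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)

    def forward(self, feat_left, feat_right):
        b, c, h, w = feat_left.shape
        # (B*H, W, C) queries from the left view
        q = self.q(feat_left).permute(0, 2, 3, 1).reshape(b * h, w, c)
        # (B*H, C, W) keys and (B*H, W, C) values from the right view
        k = self.k(feat_right).permute(0, 2, 1, 3).reshape(b * h, c, w)
        v = self.v(feat_right).permute(0, 2, 3, 1).reshape(b * h, w, c)
        attn = F.softmax(torch.bmm(q, k) / c ** 0.5, dim=-1)  # (B*H, W, W) per-row attention
        out = torch.bmm(attn, v).reshape(b, h, w, c).permute(0, 3, 1, 2)
        return out + feat_left  # residual fusion of warped right-view context
```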

2.
IEEE Trans Image Process ; 32: 4701-4715, 2023.
Article in English | MEDLINE | ID: mdl-37549080

ABSTRACT

Existing low-light video enhancement methods are dominated by Convolutional Neural Networks (CNNs) trained in a supervised manner. Due to the difficulty of collecting paired dynamic low/normal-light videos in real-world scenes, they are usually trained on synthetic, static, and uniform-motion videos, which undermines their generalization to real-world scenes. Additionally, these methods typically suffer from temporal inconsistency (e.g., flickering artifacts and motion blur) when handling large-scale motion, since the local perception property of CNNs limits their ability to model long-range dependencies in both the spatial and temporal domains. To address these problems, we propose, to the best of our knowledge, the first unsupervised method for low-light video enhancement, named LightenFormer, which models long-range intra- and inter-frame dependencies with a spatial-temporal co-attention transformer to enhance brightness while maintaining temporal consistency. Specifically, an effective yet lightweight S-curve Estimation Network (SCENet) is first proposed to estimate pixel-wise S-shaped non-linear curves (S-curves) that adaptively adjust the dynamic range of an input video. Next, to model the temporal consistency of the video, we present a Spatial-Temporal Refinement Network (STRNet) to refine the enhanced video. The core module of STRNet is a novel Spatial-Temporal Co-attention Transformer (STCAT), which exploits multi-scale self- and cross-attention interactions to capture long-range correlations in both the spatial and temporal domains among frames for implicit motion estimation. To achieve unsupervised training, we further propose two non-reference loss functions based on the invertibility of the S-curve and the noise independence among frames. Extensive experiments on the SDSD and LLIV-Phone datasets demonstrate that LightenFormer outperforms state-of-the-art methods.
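A NumPy sketch of applying a pixel-wise S-shaped tone curve to a dark frame. The abstract does not specify SCENet's curve family, so a standard bias/gain S-curve with a per-pixel slope map is assumed here purely for illustration.

```python
import numpy as np

def apply_s_curve(frame, slope):
    """frame: float array in [0, 1]; slope: positive float array of identical shape."""
    x = np.clip(frame, 1e-6, 1 - 1e-6)
    # Schlick-style bias/gain curve: slope < 1 brightens mid-tones, slope > 1 adds contrast
    return x ** slope / (x ** slope + (1.0 - x) ** slope)

frame = np.random.rand(64, 64, 3).astype(np.float32) * 0.3   # synthetic dark input
slope = np.full_like(frame, 0.5)                              # uniform brightening map
enhanced = apply_s_curve(frame, slope)
```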

3.
Sensors (Basel) ; 23(9)2023 Apr 30.
Article in English | MEDLINE | ID: mdl-37177628

ABSTRACT

Hybrid models that combine convolution and transformer modules achieve impressive performance on human pose estimation. However, existing hybrid models for human pose estimation, which typically stack self-attention modules after convolution modules, are prone to mutual conflict. This conflict forces one type of module to dominate in these sequential hybrid models. Consequently, performance on high-precision keypoint localization is not consistent with overall performance. To alleviate this mutual conflict, we developed a hybrid parallel network that places the self-attention modules and the convolution modules in parallel, which helps leverage their complementary capabilities effectively. The parallel design lets the self-attention branch model long-range dependencies to enhance the semantic representation, while the local sensitivity of the convolution branch simultaneously contributes to high-precision localization. To further mitigate the conflict, we propose a cross-branch attention module to gate the features generated by both branches along the channel dimension. The hybrid parallel network achieves 75.6% and 75.4% AP on the COCO validation and test-dev sets, respectively, with consistent results on both high-precision localization and overall performance. The experiments show that our hybrid parallel network is on par with state-of-the-art human pose estimation models.
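A hedged PyTorch sketch of channel-wise gating between a convolution branch and a self-attention branch, in the spirit of the cross-branch attention module described above; the squeeze-and-excite-style layer layout is an assumption, not the authors' exact design.

```python
import torch
import torch.nn as nn

class CrossBranchGate(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # global context from both branches
            nn.Conv2d(2 * channels, channels, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2 * channels, 1),
        )

    def forward(self, conv_feat, attn_feat):
        # Per-channel gates for each branch, decided jointly from both branches
        gates = torch.sigmoid(self.fc(torch.cat([conv_feat, attn_feat], dim=1)))
        g_conv, g_attn = gates.chunk(2, dim=1)
        return g_conv * conv_feat + g_attn * attn_feat
```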


Subjects
Electric Power Supplies; Semantics; Humans
4.
Gene Expr Patterns ; 47: 119304, 2023 03.
Article in English | MEDLINE | ID: mdl-36754104

ABSTRACT

Most existing work on fine-grained image categorization and retrieval focuses on finding similar images from the same species and often gives little importance to inter-species similarities. However, these similarities may carry species correlations such as shared ancestors or similar habits, which are helpful in taxonomy and in understanding biological traits. In this paper, we devise a new fine-grained retrieval task that searches for similar instances from different species based on body parts. To this end, we propose a two-step strategy. In the first step, we search for parts visually similar to a query image using a deep convolutional neural network (CNN). To improve the quality of the retrieved candidates, structural cues are introduced into the CNN through a novel part-pooling layer, in which the receptive field of each part is adjusted automatically. In the second step, we re-rank the retrieved candidates to improve species diversity. We achieve this by formulating a novel ranking function that balances the similarity of the candidates to the queried parts against their similarity to the query species. We provide experiments on the benchmark CUB200 and Columbia Dogs datasets and demonstrate the clear benefits of our scheme.
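A NumPy sketch of the second-step re-ranking idea: trade off similarity to the queried part against similarity to the query species. The linear form and the weight `lam` are illustrative assumptions rather than the paper's exact ranking function.

```python
import numpy as np

def rerank(cand_part_feats, cand_species_feats, query_part, query_species, lam=0.5):
    """cand_*: (N, D) candidate features; query_*: (D,) query features."""
    def cos(a, b):
        a = a / np.linalg.norm(a, axis=-1, keepdims=True)
        b = b / np.linalg.norm(b)
        return a @ b
    # Reward part similarity, penalize belonging to the query's own species
    score = cos(cand_part_feats, query_part) - lam * cos(cand_species_feats, query_species)
    return np.argsort(-score)   # candidate indices, best trade-off first
```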


Subjects
Image Processing, Computer-Assisted; Neural Networks, Computer; Animals; Dogs; Image Processing, Computer-Assisted/methods; Phenotype
5.
IEEE Trans Image Process ; 30: 6130-6141, 2021.
Article in English | MEDLINE | ID: mdl-34185644

ABSTRACT

In recent years, supervised hashing has been shown to greatly boost the performance of image retrieval. However, its label-hungry nature requires massive label collection, making it intractable in practical scenarios. To liberate model training from laborious manual annotation, several unsupervised methods have been proposed. However, the following two factors make unsupervised algorithms inferior to their supervised counterparts: (1) without manually defined labels, it is difficult to capture the semantic information across data, which is crucial for guiding robust binary code learning; (2) the widely adopted relaxation of the binary constraints causes quantization error to accumulate during optimization. To address these problems, in this paper we propose a novel Unsupervised Discrete Hashing method (UDH). Specifically, to capture the semantic information, we propose a balanced graph-based semantic loss that explores the affinity priors in the original feature space. We then propose a novel self-supervised loss, termed the orthogonal consistent loss, which leverages the instance-level semantic loss and imposes independence among the codes. Moreover, by integrating discrete optimization into the proposed unsupervised framework, the binary constraints are consistently preserved, alleviating the influence of quantization errors. Extensive experiments demonstrate that UDH outperforms state-of-the-art unsupervised methods for image retrieval.
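A hedged PyTorch sketch of the two loss ideas mentioned above: a graph-based semantic loss that matches code similarities to feature-space affinities, and an orthogonality/independence penalty on the codes. The exact formulations and weighting in UDH are not given in the abstract, so these are common stand-ins.

```python
import torch
import torch.nn.functional as F

def semantic_graph_loss(codes, feats):
    # codes: (N, K) relaxed codes in [-1, 1]; feats: (N, D) original features
    f = F.normalize(feats, dim=1)
    affinity = f @ f.t()                              # cosine affinity prior
    code_sim = codes @ codes.t() / codes.shape[1]     # normalized code similarity in [-1, 1]
    return F.mse_loss(code_sim, affinity)

def independence_loss(codes):
    # Encourage decorrelated (near-orthogonal) code bits
    n, k = codes.shape
    gram = codes.t() @ codes / n
    return F.mse_loss(gram, torch.eye(k, device=codes.device))
```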

6.
Article in English | MEDLINE | ID: mdl-32191885

ABSTRACT

In recent years, hashing methods have proven to be effective and efficient for large-scale Web media search. However, existing general hashing methods have limited discriminative power for describing fine-grained objects that share a similar overall appearance but differ subtly. To solve this problem, we introduce, for the first time, the attention mechanism into the learning of fine-grained hashing codes. Specifically, we propose a novel deep hashing model, named deep saliency hashing (DSaH), which automatically mines salient regions and learns semantic-preserving hashing codes simultaneously. DSaH is a two-step end-to-end model consisting of an attention network and a hashing network. Our loss function contains three basic components: the semantic loss, the saliency loss, and the quantization loss. As the core of DSaH, the saliency loss guides the attention network to mine discriminative regions from pairs of images. We conduct extensive experiments on both fine-grained and general retrieval datasets for performance evaluation. Experimental results on fine-grained datasets, including Oxford Flowers, Stanford Dogs, and CUB Birds, demonstrate that DSaH performs best for the fine-grained retrieval task and beats the strongest competitor (DTQ) by approximately 10% on both Stanford Dogs and CUB Birds. DSaH is also comparable to several state-of-the-art hashing methods on CIFAR-10 and NUS-WIDE.
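A minimal PyTorch sketch of two of the three loss components named above, a pairwise semantic loss and a quantization loss; the concrete formulas are common deep-hashing choices assumed for illustration, not DSaH's definitions.

```python
import torch
import torch.nn.functional as F

def pairwise_semantic_loss(codes, sim):
    # codes: (N, K) tanh outputs; sim: (N, N) float, 1.0 for similar pairs, 0.0 otherwise
    inner = codes @ codes.t() / 2          # treat scaled inner products as pair logits
    return F.binary_cross_entropy_with_logits(inner, sim)

def quantization_loss(codes):
    # Push relaxed codes toward the binary vertices {-1, +1}
    return torch.mean((codes.abs() - 1.0) ** 2)
```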

7.
IEEE Trans Pattern Anal Mach Intell ; 41(5): 1116-1130, 2019 05.
Article in English | MEDLINE | ID: mdl-29993908

ABSTRACT

Convolutional Neural Networks (CNNs) have been applied to visual tracking with demonstrated success in recent years. Most CNN-based trackers utilize hierarchical features extracted from a certain layer to represent the target. However, features from a single layer are not always effective for distinguishing the target object from the background, especially in the presence of complicated interfering factors (e.g., heavy occlusion, background clutter, illumination variation, and shape deformation). In this work, we propose a CNN-based tracking algorithm that hedges deep features from different CNN layers to better distinguish target objects from background clutter. Correlation filters are applied to the feature maps of each CNN layer to construct a weak tracker, and all weak trackers are hedged into a strong one. For robust visual tracking, we propose a hedge method that adaptively determines the weights of the weak trackers by considering both the difference between historical and instantaneous performance and the differences among all weak trackers over time. In addition, we design a Siamese network to define the loss of each weak tracker for the proposed hedge method. Extensive experiments on large benchmark datasets demonstrate the effectiveness of the proposed algorithm against state-of-the-art tracking methods.
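A NumPy sketch of a Hedge-style weight update for combining per-layer weak trackers: experts with above-average loss on the current frame lose weight. This regret-based form is a simplification of the adaptive variant described in the abstract, with the learning rate `eta` assumed.

```python
import numpy as np

def hedge_update(weights, losses, eta=5.0):
    """weights: current expert weights (sum to 1); losses: per-expert loss this frame."""
    expected = np.dot(weights, losses)
    regret = losses - expected                          # instantaneous regret of each weak tracker
    weights = weights * np.exp(-eta * np.maximum(regret, 0.0))
    return weights / weights.sum()                      # renormalized ensemble weights
```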


Subjects
Image Processing, Computer-Assisted/methods; Neural Networks, Computer; Algorithms; Humans; Video Recording
8.
IEEE Trans Neural Netw Learn Syst ; 28(10): 2357-2370, 2017 10.
Article in English | MEDLINE | ID: mdl-27448375

ABSTRACT

In this paper, we propose a biologically inspired appearance model for robust visual tracking. Motivated in part by the success of the hierarchical organization of the primary visual cortex (area V1), we establish an architecture consisting of five layers: whitening, rectification, normalization, coding, and pooling. The first three layers stem from the models developed for object recognition. In this paper, our attention focuses on the coding and pooling layers. In particular, we use a discriminative sparse coding method in the coding layer along with spatial pyramid representation in the pooling layer, which makes it easier to distinguish the target to be tracked from its background in the presence of appearance variations. An extensive experimental study shows that the proposed method has higher tracking accuracy than several state-of-the-art trackers.
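A NumPy sketch of spatial-pyramid max pooling over sparse codes, one common way to realize the pooling layer described above; the grid levels and the max-pooling choice are assumptions for illustration.

```python
import numpy as np

def spatial_pyramid_pool(codes, positions, image_size, levels=(1, 2, 4)):
    """codes: (N, K) sparse codes of N local patches; positions: (N, 2) patch centers (row, col)."""
    h, w = image_size
    pooled = []
    for g in levels:
        cell = np.zeros((g, g, codes.shape[1]))
        rows = np.minimum((positions[:, 0] / h * g).astype(int), g - 1)
        cols = np.minimum((positions[:, 1] / w * g).astype(int), g - 1)
        for i in range(len(codes)):                     # max-pool codes falling in each cell
            cell[rows[i], cols[i]] = np.maximum(cell[rows[i], cols[i]], codes[i])
        pooled.append(cell.reshape(-1))
    return np.concatenate(pooled)                       # concatenated pyramid representation
```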

9.
IEEE Trans Image Process ; 24(11): 3386-99, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26099142

ABSTRACT

In this paper, we present a novel approach to non-rigid object contour tracking based on a supervised level set model (SLSM). In contrast to most existing trackers, which use a bounding box to specify the tracked target, the proposed method extracts the accurate contours of the target as tracking output, which achieves a better description of non-rigid objects while reducing background pollution of the target model. Moreover, conventional level set models only emphasize regional intensity consistency and consider no priors. In contrast, the curve evolution of the proposed SLSM is object-oriented and supervised by specific knowledge of the targets we want to track. Therefore, the SLSM can ensure more accurate convergence to the exact targets in tracking applications. In particular, we first construct the appearance model for the target in an online boosting manner owing to its strong discriminative power between the object and the background. Then, the learnt target model is incorporated to model the probabilities of the level set contour in a Bayesian manner, leading the curve to converge to the candidate region with the maximum likelihood of being the target. Finally, the accurate target region qualifies the samples fed to the boosting procedure as well as the target model prepared for the next time step. We first describe the proposed mechanism of the two-phase SLSM for single-target tracking and then give its generalized multi-phase version for multi-target tracking cases. A positive decrease rate is used to adjust the learning pace over time, enabling tracking to continue under partial and total occlusion. Experimental results on a number of challenging sequences validate the effectiveness of the proposed method.
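A NumPy sketch of one supervised level-set evolution step: the zero level set of `phi` is pushed outward where the learnt appearance model assigns high target probability. The simple log-odds speed term is an assumption standing in for the SLSM energy.

```python
import numpy as np

def level_set_step(phi, p_target, dt=0.5, eps=1e-6):
    """phi: signed level-set function (negative inside the contour); p_target: per-pixel target probability."""
    gy, gx = np.gradient(phi)
    grad_mag = np.sqrt(gx ** 2 + gy ** 2) + eps
    speed = np.log((p_target + eps) / (1.0 - p_target + eps))   # expand where the target is likely
    return phi - dt * speed * grad_mag                           # one explicit evolution step
```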

10.
Network ; 26(1): 1-24, 2015.
Article in English | MEDLINE | ID: mdl-25387273

ABSTRACT

Latching dynamics retrieve pattern sequences successively through neural adaptation and pattern correlation. We previously proposed a modular latching chain model in Song et al. (2014) to better accommodate the structured transitions in the brain. Different cortical areas have different network structures. To explore how structural parameters such as rewiring probability, threshold, noise, and feedback connections affect the latching dynamics, two different connection schemes, a K-nearest-neighbor network and a modular network, both having modular structure, are considered. Latching chains are measured using two proposed measures characterizing the length of intra-modular latching chains and the sequential inter-modular association transitions. Our main findings are: (1) with decreasing threshold coefficient and rewiring probability, both the K-nearest-neighbor network and the modular network undergo quantitatively similar phase changes; (2) the modular network exhibits selectively enhanced latching in the small-world range of connectivity; (3) the K-nearest-neighbor network is more robust to changes in rewiring probability, while the modular network is more robust to the presence of noise pattern pairs and to changes in the strength of feedback connections. Based on these findings, we discuss the relationships between latching chains in K-nearest-neighbor and modular networks and the different forms of cognition and information processing emerging in the brain.
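A NumPy sketch of the two connection schemes compared above: a K-nearest-neighbor ring with random rewiring (Watts-Strogatz style) versus an explicitly modular adjacency. Sizes and probabilities are illustrative, not the paper's settings.

```python
import numpy as np

def knn_ring(n=100, k=4, p_rewire=0.1, rng=np.random.default_rng(0)):
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            if rng.random() < p_rewire:          # rewire this edge to a random node
                j = rng.integers(n)
            adj[i, j] = adj[j, i] = True
    np.fill_diagonal(adj, False)
    return adj

def modular(n=100, modules=5, p_in=0.5, p_out=0.02, rng=np.random.default_rng(0)):
    labels = np.repeat(np.arange(modules), n // modules)
    same = labels[:, None] == labels[None, :]
    adj = rng.random((n, n)) < np.where(same, p_in, p_out)   # dense within modules, sparse between
    adj = np.triu(adj, 1)
    return adj | adj.T
```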


Subjects
Brain Mapping; Brain/physiology; Models, Neurological; Nerve Net/physiology; Neural Networks, Computer; Animals; Computer Simulation; Humans; Nonlinear Dynamics; Probability; Synapses/physiology
11.
IEEE Trans Image Process ; 23(11): 4649-62, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25029460

ABSTRACT

In this paper, we present a unified statistical framework for modeling both saccadic eye movements and visual saliency. By analyzing the statistical properties of human eye fixations on natural images, we found that human attention is sparsely distributed and usually deployed to locations with abundant structural information. These observations inspired us to model saccadic behavior and visual saliency based on super-Gaussian component (SGC) analysis. Our model sequentially obtains SGCs using projection pursuit and generates eye movements by selecting the location with the maximum SGC response. Besides simulating human saccadic behavior, we also demonstrate superior effectiveness and robustness over state-of-the-art methods through extensive experiments on synthetic patterns and human eye-fixation benchmarks. Multiple key issues in saliency modeling research, such as individual differences and the effects of scale and blur, are explored in this paper. Based on extensive qualitative and quantitative experimental results, we show the promising potential of statistical approaches for human behavior research.
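A NumPy sketch of the core idea: project whitened patches onto candidate directions, score each direction by a super-Gaussianity (kurtosis) measure, and fixate the location with the strongest response of the most super-Gaussian component. Random candidate directions replace the paper's projection pursuit for brevity.

```python
import numpy as np

def sgc_fixation(patches, positions, n_candidates=64, rng=np.random.default_rng(0)):
    """patches: (N, D) whitened, zero-mean patch vectors; positions: (N, 2) patch centers."""
    dirs = rng.standard_normal((n_candidates, patches.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    resp = patches @ dirs.T                                    # (N, n_candidates) responses
    kurt = np.mean(resp ** 4, axis=0) - 3 * np.mean(resp ** 2, axis=0) ** 2
    best = np.argmax(kurt)                                     # most super-Gaussian direction
    return positions[np.argmax(np.abs(resp[:, best]))]         # location of the next fixation
```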


Subjects
Attention/physiology; Fixation, Ocular/physiology; Models, Biological; Models, Statistical; Saccades/physiology; Visual Perception/physiology; Adult; Computer Simulation; Female; Humans; Male; Young Adult
12.
Cogn Neurodyn ; 8(1): 37-46, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24465284

ABSTRACT

Many cognitive tasks involve transitions between distinct mental processes, which may range from discrete states to complex strategies. The ability of cortical networks to combine discrete jumps with continuous glides along ever changing trajectories, dubbed latching dynamics, may be essential for the emergence of the unique cognitive capacities of modern humans. Novel trajectories have to be followed in the multidimensional space of cortical activity for novel behaviours to be produced; yet, not everything changes: several lines of evidence point at recurring patterns in the sequence of activation of cortical areas in a variety of behaviours. To extend a mathematical model of latching dynamics beyond the simple unstructured auto-associative Potts network previously analysed, we introduce delayed structured connectivity and hetero-associative connection weights, and we explore their effects on the dynamics. A modular model in the small-world regime is considered, with modules arranged on a ring. The synaptic weights include a standard auto-associative component, stabilizing distinct patterns of activity, and a hetero-associative component, favoring transitions from one pattern, expressed in one module, to the next, in the next module. We then study, through simulations, how structural parameters, like those regulating rewiring probability, noise and feedback connections, determine sequential association dynamics.
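A simplified binary-unit sketch (the paper uses Potts units) of combining an auto-associative Hebbian term, which stabilizes each stored pattern, with a hetero-associative term that pushes the network from one pattern toward the next; the coupling strength `g` is an assumed parameter.

```python
import numpy as np

def build_weights(patterns, g=0.3):
    """patterns: (P, N) array of +/-1 patterns stored in presentation order."""
    p, n = patterns.shape
    w_auto = patterns.T @ patterns / n            # auto-associative: stabilizes each pattern
    w_hetero = patterns[1:].T @ patterns[:-1] / n  # hetero-associative: maps pattern mu to mu+1
    w = w_auto + g * w_hetero
    np.fill_diagonal(w, 0.0)                       # no self-connections
    return w
```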

13.
SenseCam 2013 ; 2013: 80-81, 2013 Nov.
Article in English | MEDLINE | ID: mdl-30596207

ABSTRACT

We present an eating activity detection method based on automatically detecting dining plates in images acquired chronically by a wearable camera. Convex edge segments and their combinations within each input image are modeled with respect to their probabilities of belonging to candidate ellipses. A dining plate is then determined according to a confidence score. Finally, the presence or absence of an eating event in an image sequence is determined by analyzing successive frames. Our experimental results verify the effectiveness of this method.
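A NumPy sketch of the final step described above: declaring an eating event when plate detections persist over successive frames. The window size and vote thresholds are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def detect_eating(plate_confidences, window=30, conf_thresh=0.6, vote_thresh=0.5):
    """plate_confidences: per-frame plate confidence scores from the ellipse detector."""
    conf = np.asarray(plate_confidences, dtype=float)
    detections = (conf > conf_thresh).astype(float)
    events = []
    for start in range(0, len(conf) - window + 1):
        if detections[start:start + window].mean() > vote_thresh:
            events.append(start)                 # eating likely around this frame index
    return events
```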

14.
IEEE Trans Image Process ; 21(4): 2282-93, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22128004

ABSTRACT

A visual codebook serves as a fundamental component in many state-of-the-art computer vision systems. Most existing codebooks are built by quantizing local feature descriptors extracted from training images. Subsequently, each image is represented as a high-dimensional bag-of-words histogram. Such a highly redundant image description, in which only a few bins are nonzero and sparsely distributed, lacks efficiency in both storage and retrieval. Furthermore, most existing codebooks are built solely on the visual statistics of local descriptors, without considering the supervision labels coming from the subsequent recognition or classification tasks. In this paper, we propose a task-dependent codebook compression framework to handle these two problems. First, we propose to learn a compression function that maps an originally high-dimensional codebook into a compact codebook while maintaining its visual discriminability. This is achieved by a codeword sparse coding scheme with Lasso regression, which minimizes the descriptor distortions of training images after codebook compression. Second, we propose to adapt the codebook compression to the subsequent recognition or classification tasks. This is achieved by introducing a label constraint kernel (LCK) into our compression loss function. In particular, our LCK can model heterogeneous kinds of supervision, i.e., (partial) category labels, correlative semantic annotations, and image query logs. We validated our codebook compression on three computer vision tasks: 1) object recognition on PASCAL Visual Object Classes 07; 2) near-duplicate image retrieval on UKBench; and 3) web image search in a collection of 0.5 million Flickr photographs. Our compressed codebook shows superior performance over several state-of-the-art supervised and unsupervised codebooks.
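A hedged scikit-learn sketch of learning a sparse (Lasso) mapping from a high-dimensional bag-of-words histogram to a compact, task-aware code; regressing each compact dimension onto label-derived targets is an illustrative simplification of the paper's compression objective and its label constraint kernel.

```python
import numpy as np
from sklearn.linear_model import Lasso

def learn_compression(histograms, label_targets, alpha=0.01):
    """histograms: (N, D) bag-of-words histograms; label_targets: (N, d) targets built from labels."""
    d = label_targets.shape[1]
    mapping = np.zeros((histograms.shape[1], d))
    for j in range(d):                               # one sparse regression per compact dimension
        model = Lasso(alpha=alpha, max_iter=5000).fit(histograms, label_targets[:, j])
        mapping[:, j] = model.coef_
    return mapping                                   # compact code = histograms @ mapping
```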


Subjects
Algorithms; Data Compression/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Radiology Information Systems; Subtraction Technique; Artificial Intelligence; Image Enhancement/methods