Results 1 - 14 of 14
1.
IEEE Trans Image Process ; 32: 2481-2492, 2023.
Article in English | MEDLINE | ID: mdl-37083510

ABSTRACT

Despite the recent success achieved by deep neural networks (DNNs), it remains challenging to disclose and explain their decision-making process given the numerous parameters and complex non-linear functions involved. To address the problem, explainable AI (XAI) aims to provide explanations corresponding to the learning and prediction processes of deep learning models. In this paper, we propose a novel representation learning framework of Describe, Spot and eXplain (DSX). Based on the Transformer architecture, our proposed DSX framework is composed of two learning stages: descriptive prototype learning and discriminative prototype discovery. Given an input image, the former stage derives a set of descriptive representations, while the latter stage further identifies a discriminative subset, offering semantic interpretability for the corresponding classification tasks. While our DSX does not require any ground-truth attribute supervision during training, the derived visual representations can be practically associated with physical attributes provided by domain experts. Extensive experiments on fine-grained classification and person re-identification tasks qualitatively and quantitatively verify the use of our DSX model for offering semantically practical interpretability with satisfactory recognition performance.

2.
IEEE Trans Pattern Anal Mach Intell ; 44(8): 4212-4224, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33591911

ABSTRACT

Point clouds are among the most popular geometry representations in 3D vision. However, unlike 2D images with pixel-wise layouts, such representations contain unordered data points, which makes processing and understanding the associated semantic information quite challenging. Although a number of previous works attempt to analyze point clouds and achieve promising performances, their performance would degrade significantly when data variations like shift and scale changes are present. In this paper, we propose 3D graph convolution networks (3D-GCN), which uniquely learn 3D kernels with graph max-pooling mechanisms for extracting geometric features from point cloud data across different scales. We show that, with the proposed 3D-GCN, satisfactory shift and scale invariance can be jointly achieved, and that 3D-GCN can be applied to point cloud classification and segmentation tasks, with ablation studies and visualizations verifying the design of 3D-GCN.
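The graph max-pooling idea above can be illustrated with a minimal sketch (our own simplification, not the paper's learned 3D-kernel formulation): each point aggregates features over its k nearest neighbors by an element-wise max, which is invariant to the ordering of the points. All names and the toy data below are illustrative.

```python
import numpy as np

def knn_graph_max_pool(points, features, k=4):
    """Max-pool each point's features over its k nearest neighbors
    (self included) -- a simplified stand-in for graph max-pooling."""
    diff = points[:, None, :] - points[None, :, :]
    dist = (diff ** 2).sum(-1)                 # (N, N) squared distances
    nn = np.argsort(dist, axis=1)[:, :k]       # k nearest indices per point
    return features[nn].max(axis=1)            # element-wise max per neighborhood

pts = np.random.default_rng(0).normal(size=(5, 3))   # 5 points in 3D
feats = np.arange(10.0).reshape(5, 2)                # a 2-D feature per point
pooled = knn_graph_max_pool(pts, feats, k=2)
print(pooled.shape)
```

Because the max is taken over a neighborhood set, permuting the input points leaves each pooled feature unchanged, which is the property that makes such pooling attractive for unordered point-cloud data.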

3.
IEEE Trans Image Process ; 30: 9245-9258, 2021.
Article in English | MEDLINE | ID: mdl-34739379

ABSTRACT

Few-shot learning (FSL) refers to the learning task that generalizes from base to novel concepts with only a few examples observed during training. One intuitive FSL approach is to hallucinate additional training samples for novel categories. While this is typically done by learning from a disjoint set of base categories with a sufficient amount of training data, most existing works do not fully exploit the intra-class information from base categories, and thus there is no guarantee that the hallucinated data would accurately represent the class of interest. In this paper, we propose the Feature Disentanglement and Hallucination Network (FDH-Net), which jointly performs feature disentanglement and hallucination for FSL purposes. More specifically, our FDH-Net is able to disentangle input visual data into class-specific and appearance-specific features. With both data recovery and classification constraints, hallucination of image features for novel categories using appearance information extracted from base categories can be achieved. We perform extensive experiments on two fine-grained datasets (CUB and FLO) and two coarse-grained ones (mini-ImageNet and CIFAR-100). The results confirm that our framework performs favorably against state-of-the-art metric-learning and hallucination-based FSL models.

4.
Article in English | MEDLINE | ID: mdl-31751274

ABSTRACT

Learning interpretable data representations has been an active research topic in deep learning and computer vision. While representation disentanglement is an effective technique for addressing this task, existing works cannot easily handle problems in which manipulating and recognizing data across multiple domains is desirable. In this paper, we present a unified network architecture, the Multi-domain and Multi-modal Representation Disentangler (M2RD), with the goal of learning domain-invariant content representations alongside the associated domain-specific representations. By advancing adversarial learning and disentanglement techniques, the proposed model is able to perform continuous image manipulation across data domains with multiple modalities. More importantly, the resulting domain-invariant feature representation can be applied to unsupervised domain adaptation. Finally, our quantitative and qualitative results confirm the effectiveness and robustness of the proposed model over state-of-the-art methods on the above tasks.

5.
Article in English | MEDLINE | ID: mdl-31056497

ABSTRACT

Heterogeneous domain adaptation (HDA) addresses the task of associating data not only across dissimilar domains but also described by different types of features. Inspired by the recent advances of neural networks and deep learning, we propose a deep learning model of Transfer Neural Trees (TNT), which jointly solves cross-domain feature mapping, adaptation, and classification in a unified architecture. As the prediction layer in TNT, we introduce the Transfer Neural Decision Forest (Transfer-NDF), which is able to learn the neurons in TNT for adaptation by stochastic pruning. In order to handle semi-supervised HDA, a unique embedding loss term is introduced to TNT for preserving prediction and structural consistency between labeled and unlabeled target-domain data. We further show that our TNT can be extended to zero-shot learning for associating image and attribute data with promising performance. Finally, experiments on different classification tasks across features, datasets, and modalities verify the effectiveness of our TNT.

6.
IEEE Trans Cybern ; 48(1): 371-384, 2018 Jan.
Article in English | MEDLINE | ID: mdl-28129196

ABSTRACT

Compared to color images, the associated depth images captured by RGB-D sensors typically have lower resolution. The task of depth map super-resolution (SR) aims at increasing the resolution of the range data by utilizing the high-resolution (HR) color image, while the details of the depth information are properly preserved. In this paper, we present a joint trilateral filtering (JTF) algorithm for depth image SR. The proposed JTF first observes context information from the HR color image. In addition to the extracted spatial and range information of local pixels, our JTF further integrates local gradient information of the depth image, which allows the prediction and refinement of HR depth image outputs without artifacts like textural copies or edge discontinuities. Quantitative and qualitative experimental results demonstrate the effectiveness and robustness of our approach over prior depth map upsampling works.
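The trilateral weighting described above (spatial closeness, guide-image similarity, and depth-gradient similarity) can be sketched in 1D. The parameter names and values here are our own illustrative choices, not the paper's, and the toy signal contains a single depth edge aligned with a guide-image edge.

```python
import numpy as np

def joint_trilateral_1d(depth, guide, sigma_s=2.0, sigma_r=0.1, sigma_g=0.1, radius=3):
    """1-D joint trilateral filter sketch: each output sample is a
    weighted average of nearby depth samples, with weights combining
    spatial distance, guide (color) similarity, and gradient similarity."""
    grad = np.gradient(depth)
    out = np.empty_like(depth)
    n = len(depth)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        j = np.arange(lo, hi)
        w = (np.exp(-((j - i) ** 2) / (2 * sigma_s ** 2))                   # spatial
             * np.exp(-((guide[j] - guide[i]) ** 2) / (2 * sigma_r ** 2))   # guide range
             * np.exp(-((grad[j] - grad[i]) ** 2) / (2 * sigma_g ** 2)))    # depth gradient
        out[i] = (w * depth[j]).sum() / w.sum()
    return out

depth = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])   # low-res depth with an edge
guide = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])   # HR guide with matching edge
print(joint_trilateral_1d(depth, guide))
```

Because samples across the guide edge receive near-zero weight, the depth discontinuity survives the smoothing, which is the edge-preserving behavior the abstract relies on.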

7.
IEEE Trans Image Process ; 25(12): 5552-5562, 2016 12.
Article in English | MEDLINE | ID: mdl-27654485

ABSTRACT

Unsupervised domain adaptation deals with scenarios in which labeled data are available in the source domain, but only unlabeled data can be observed in the target domain. Since classifiers trained on source-domain data cannot be expected to generalize well to the target domain, how to transfer the label information from source- to target-domain data is a challenging task. A common technique for unsupervised domain adaptation is to match cross-domain data distributions, so that the domain and distribution differences can be suppressed. In this paper, we propose to utilize the label information inferred from the source domain, while the structural information of the unlabeled target-domain data is jointly exploited for adaptation purposes. Our proposed model not only reduces the distribution mismatch between domains but also simultaneously improves recognition of target-domain data. In the experiments, we show that our approach performs favorably against state-of-the-art unsupervised domain adaptation methods on benchmark data sets. We also provide convergence, sensitivity, and robustness analysis, which support the use of our model for cross-domain classification.
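As a concrete example of quantifying cross-domain distribution mismatch, the sketch below computes a linear-kernel maximum mean discrepancy (MMD), a common choice in distribution-matching adaptation. The paper's actual objective differs, so this is illustrative only; the data is synthetic.

```python
import numpy as np

def mmd_linear(Xs, Xt):
    """Squared MMD with a linear kernel: the squared distance
    between the empirical means of the two samples."""
    delta = Xs.mean(axis=0) - Xt.mean(axis=0)
    return float(delta @ delta)

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(500, 3))   # source samples
Xt = rng.normal(2.0, 1.0, size=(500, 3))   # target samples, shifted mean
print(mmd_linear(Xs, Xs), mmd_linear(Xs, Xt))
```

Identical samples give a discrepancy of zero, while the shifted target yields a large value; adaptation methods of this family learn a feature mapping that drives such a statistic toward zero.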

8.
IEEE Trans Image Process ; 24(6): 1722-34, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25769163

ABSTRACT

In this paper, we address the problem of robust face recognition with undersampled training data. Given only one or a few training images available per subject, we present a novel recognition approach that not only handles test images with large intraclass variations such as illumination and expression, but also handles images corrupted by occlusion or disguise not present during training. This is achieved by learning a robust auxiliary dictionary from subjects not of interest. Together with the undersampled training data, both intra- and interclass variations can thus be successfully handled, while unseen occlusions can be automatically disregarded for improved recognition. Our experiments on four face image datasets confirm the effectiveness and robustness of our approach, which is shown to outperform state-of-the-art sparse representation-based methods.

9.
IEEE Trans Image Process ; 24(4): 1330-40, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25700448

ABSTRACT

Given a query image containing the object of interest (OOI), we propose a novel learning framework for retrieving relevant frames from the input video sequence. While techniques based on object matching have been applied to solve this task, their performance is typically limited by a lack of capability in handling variations in the visual appearance of the OOI across video frames. Our proposed framework can be viewed as a weakly supervised approach, which only requires a small number of (randomly selected) relevant and irrelevant frames from the input video to achieve satisfactory retrieval performance. By utilizing frame-level label information of such video frames together with the query image, we propose a novel query-adaptive multiple instance learning algorithm, which exploits the visual appearance information of the OOI from the query and that of the aforementioned video frames. As a result, the derived learning model exhibits additional discriminating ability while retrieving relevant instances. Experiments on two real-world video data sets confirm the effectiveness and robustness of our proposed approach.

10.
IEEE Trans Image Process ; 23(8): 3294-307, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24951689

ABSTRACT

For the task of robust face recognition, we particularly focus on the scenario in which training and test image data are corrupted due to occlusion or disguise. Prior standard face recognition methods like Eigenfaces or state-of-the-art approaches such as sparse representation-based classification did not consider possible contamination of data during training, and thus their recognition performance on corrupted test data would be degraded. In this paper, we propose a novel face recognition algorithm based on low-rank matrix decomposition to address the aforementioned problem. Besides the capability of decomposing raw training data into a set of representative bases for better modeling the face images, we introduce a constraint of structural incoherence into the proposed algorithm, which enforces the bases learned for different classes to be as independent as possible. As a result, additional discriminating ability is added to the derived base matrices for improved recognition performance. Experimental results on different face databases with a variety of variations verify the effectiveness and robustness of our proposed method.
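A minimal sketch of the low-rank idea above: extract a representative basis per class via truncated SVD and measure cross-class incoherence as the Frobenius norm of the cross-Gram matrix. The paper enforces structural incoherence as a constraint during learning; here it is only measured, and all names and data are our own.

```python
import numpy as np

def class_lowrank_bases(data_by_class, rank=2):
    """Per-class rank-`rank` basis from the leading left singular
    vectors of that class's data matrix (columns = samples)."""
    bases = {}
    for label, X in data_by_class.items():
        U, _, _ = np.linalg.svd(X, full_matrices=False)
        bases[label] = U[:, :rank]
    return bases

def incoherence(Di, Dj):
    """Frobenius norm of Di^T Dj: smaller means the two class bases
    are closer to mutually orthogonal (more incoherent)."""
    return np.linalg.norm(Di.T @ Dj)

rng = np.random.default_rng(1)
data = {0: rng.normal(size=(10, 6)), 1: rng.normal(size=(10, 6))}  # dim 10, 6 samples each
bases = class_lowrank_bases(data, rank=2)
print(incoherence(bases[0], bases[1]))
```

Driving this cross-class incoherence term down while keeping each basis representative of its own class is what adds the discriminating ability the abstract describes.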


Subjects
Algorithms; Biometry/methods; Face/anatomy & histology; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Subtraction Technique; Artificial Intelligence; Humans; Image Enhancement/methods; Information Storage and Retrieval/methods; Reproducibility of Results; Sensitivity and Specificity
11.
IEEE Trans Image Process ; 23(5): 2009-18, 2014 May.
Article in English | MEDLINE | ID: mdl-24710401

ABSTRACT

We present a novel domain adaptation approach for solving cross-domain pattern recognition problems, i.e., problems in which the data or features to be processed and recognized are collected from different domains of interest. Inspired by canonical correlation analysis (CCA), we utilize the derived correlation subspace as a joint representation for associating data across different domains, and we advance reduced kernel techniques for kernel CCA (KCCA) if nonlinear correlation subspaces are desirable. Such techniques not only make KCCA computationally more efficient but also alleviate potential over-fitting problems. Instead of directly performing recognition in the derived CCA subspace (as prior CCA-based domain adaptation methods did), we advocate exploiting the domain transfer ability of this subspace, in which each dimension has a unique capability for associating cross-domain data. In particular, we propose a novel support vector machine (SVM) with a correlation regularizer, named correlation-transfer SVM, which incorporates the domain adaptation ability into classifier design for cross-domain recognition. We show that our proposed domain adaptation and classification approach can be successfully applied to a variety of cross-domain recognition tasks such as cross-view action recognition, handwritten digit recognition with different features, and image-to-text or text-to-image classification. From our empirical results, we verify that our proposed method outperforms state-of-the-art domain adaptation approaches in terms of recognition performance.

12.
IEEE Trans Med Imaging ; 32(12): 2262-73, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24001985

ABSTRACT

Computer-aided diagnosis (CAD) systems for gray-scale breast ultrasound images have the potential to reduce unnecessary biopsy of breast masses. The purpose of our study is to develop a robust CAD system based on texture analysis. First, gray-scale invariant features are extracted from ultrasound images via the multi-resolution ranklet transform. One can then apply linear support vector machines (SVMs) on the resulting gray-level co-occurrence matrix (GLCM)-based texture features to discriminate benign and malignant masses. To verify the effectiveness and robustness of the proposed texture analysis, breast ultrasound images obtained from three different platforms are evaluated based on cross-platform training/testing and leave-one-out cross-validation (LOO-CV) schemes. We compare our proposed features with those extracted by the wavelet transform in terms of receiver operating characteristic (ROC) analysis. The area-under-the-curve (AUC) values for the three databases via the ranklet transform are 0.918 (95% confidence interval [CI], 0.848 to 0.961), 0.943 (95% CI, 0.906 to 0.968), and 0.934 (95% CI, 0.883 to 0.961), respectively, while those via the wavelet transform are 0.847 (95% CI, 0.762 to 0.910), 0.922 (95% CI, 0.878 to 0.958), and 0.867 (95% CI, 0.798 to 0.914), respectively. Experiments with the cross-platform training/testing scheme between each database reveal that the diagnostic performance of our texture analysis using the ranklet transform is less sensitive to the sonographic ultrasound platform. Also, we adopt several co-occurrence statistics in terms of quantization levels and orientations (i.e., descriptor settings) for computing the co-occurrence matrices with 0.632+ bootstrap estimators to verify the use of the proposed texture analysis. These experiments suggest that texture analysis using multi-resolution gray-scale invariant features via the ranklet transform is useful for designing a robust CAD system.
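To make the co-occurrence features concrete, here is a minimal GLCM for a single pixel offset together with a contrast statistic. The quantization to 4 gray levels and the toy image are our own choices; real pipelines compute several offsets and statistics, as the abstract describes.

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=4):
    """Gray-level co-occurrence matrix for one offset (dx, dy),
    normalized to a joint probability table over level pairs."""
    P = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            P[img[y, x], img[y + dy, x + dx]] += 1
    return P / P.sum()

def contrast(P):
    """Haralick contrast: expected squared gray-level difference."""
    i, j = np.indices(P.shape)
    return ((i - j) ** 2 * P).sum()

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [2, 2, 3, 3],
                [2, 2, 3, 3]])          # 4-level quantized toy image
P = glcm(img, dx=1, dy=0, levels=4)
print(P.sum(), contrast(P))
```

Statistics like contrast, energy, and homogeneity computed from such tables form the feature vector fed to the SVM classifier.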

13.
IEEE Trans Image Process ; 22(7): 2600-10, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23529093

ABSTRACT

This paper presents a saliency-based video object extraction (VOE) framework. The proposed framework aims to automatically extract foreground objects of interest without any user interaction or the use of any training data (i.e., it is not limited to any particular type of object). To separate foreground and background regions within and across video frames, the proposed method utilizes visual and motion saliency information extracted from the input video. A conditional random field is applied to effectively combine the saliency-induced features, which allows us to deal with unknown pose and scale variations of the foreground object (and its articulated parts). Based on the ability to preserve both spatial continuity and temporal consistency in the proposed VOE framework, experiments on a variety of videos verify that our method is able to produce quantitatively and qualitatively satisfactory VOE results.

14.
Neural Netw ; 21(2-3): 502-10, 2008.
Article in English | MEDLINE | ID: mdl-18187285

ABSTRACT

We propose a new hierarchical design method, weighted support vector (WSV) k-means clustering, to design a binary hierarchical classification structure. This method automatically selects the classes to be separated at each node in the hierarchy, and allows visualization of clusters of high-dimensional support vector data; no prior hierarchical designs address this. At each node in the hierarchy, we use an SVRDM (support vector representation and discrimination machine) classifier, which offers generalization and good rejection of unseen false objects (rejection is not achieved with the standard SVMs). We give the basis and new insight into why a Gaussian kernel provides good rejection. Recognition and rejection test results on a real IR (infrared) database show that our proposed method outperforms the standard one-vs-rest methods and the use of standard SVM classifiers.
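The cluster-based hierarchy design can be sketched with plain (unweighted) 2-means on class mean vectors; the paper instead clusters weighted support-vector data and places SVRDM classifiers at the nodes, so this is only a structural illustration with names and data of our own.

```python
import numpy as np

def build_hierarchy(class_means, labels):
    """Recursively split classes into two groups via 2-means on
    their mean vectors, returning a nested tuple (binary tree)."""
    if len(labels) == 1:
        return labels[0]
    centers = class_means[:2].astype(float).copy()   # init: first two class means
    for _ in range(20):                              # Lloyd iterations
        d = ((class_means[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        for c in range(2):
            if (assign == c).any():
                centers[c] = class_means[assign == c].mean(axis=0)
    left = [i for i in range(len(labels)) if assign[i] == 0]
    right = [i for i in range(len(labels)) if assign[i] == 1]
    if not left or not right:                        # degenerate split: halve instead
        half = len(labels) // 2
        left, right = list(range(half)), list(range(half, len(labels)))
    return (build_hierarchy(class_means[left], [labels[i] for i in left]),
            build_hierarchy(class_means[right], [labels[i] for i in right]))

means = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
tree = build_hierarchy(means, ["a", "b", "c", "d"])
print(tree)
```

Each internal node of the resulting tree corresponds to one binary classifier, so a test sample is routed from the root down to a single leaf class.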


Subjects
Algorithms; Artificial Intelligence; Classification; Pattern Recognition, Automated/methods; Databases, Factual; Discriminant Analysis; Fourier Analysis; Image Interpretation, Computer-Assisted; Information Storage and Retrieval; Least-Squares Analysis