1.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 964-979, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35133959

ABSTRACT

In comparison to classical shallow representation learning techniques, deep neural networks have achieved superior performance in nearly every application benchmark. But despite their clear empirical advantages, it is still not well understood what makes them so effective. To approach this question, we introduce deep frame approximation: a unifying framework for constrained representation learning with structured overcomplete frames. While exact inference requires iterative optimization, it may be approximated by the operations of a feed-forward deep neural network. We indirectly analyze how model capacity relates to frame structures induced by architectural hyperparameters such as depth, width, and skip connections. We quantify these structural differences with the deep frame potential, a data-independent measure of coherence linked to representation uniqueness and stability. As a criterion for model selection, we show correlation with generalization error on a variety of common deep network architectures and datasets. We also demonstrate how recurrent networks implementing iterative optimization algorithms can achieve performance comparable to their feed-forward approximations while improving adversarial robustness. This connection to the established theory of overcomplete representations suggests promising new directions for principled deep network architecture design with less reliance on ad-hoc engineering.
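
The deep frame potential above is a data-independent coherence measure computed from a network's weights. As a rough illustration of the idea (not the paper's exact definition, which also accounts for the structure induced by depth and skip connections), the sketch below scores each layer by the mutual coherence of its weight columns and averages across layers; all names are ours.

```python
import numpy as np

def mutual_coherence(W):
    """Largest absolute inner product between distinct normalized columns of W.

    A classic data-independent measure of how far a dictionary/frame is from
    orthogonal; lower values are associated with uniqueness and stability of
    the induced representations.
    """
    cols = W / (np.linalg.norm(W, axis=0, keepdims=True) + 1e-12)
    gram = np.abs(cols.T @ cols)
    np.fill_diagonal(gram, 0.0)
    return gram.max()

def network_coherence(weights):
    """Average the per-layer coherence over a list of weight matrices."""
    return float(np.mean([mutual_coherence(W) for W in weights]))

# Toy comparison: wider (more overcomplete) layers tend to have higher
# worst-case coherence, i.e. they are harder to keep close to orthogonal.
rng = np.random.default_rng(0)
narrow = [rng.standard_normal((64, 64)) for _ in range(3)]
wide = [rng.standard_normal((64, 256)) for _ in range(3)]
print(network_coherence(narrow), network_coherence(wide))
```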

2.
IEEE Trans Pattern Anal Mach Intell ; 43(12): 4365-4377, 2021 Dec.
Article in English | MEDLINE | ID: mdl-32750772

ABSTRACT

Non-rigid structure from motion (NRSfM) refers to the problem of reconstructing cameras and the 3D point cloud of a non-rigid object from an ensemble of images with 2D correspondences. Current NRSfM algorithms are limited from two perspectives: (i) the number of images, and (ii) the type of shape variability they can handle. These difficulties stem from the inherent conflict between the conditioning of the system and the degrees of freedom that need to be modeled, a conflict that has hampered the practical utility of NRSfM for many applications within vision. In this paper we propose a novel hierarchical sparse coding model for NRSfM which overcomes (i) and (ii) to such an extent that NRSfM can be applied to problems in vision previously thought too ill-posed. Our approach is realized in practice as the training of an unsupervised deep neural network (DNN) auto-encoder with a unique architecture that is able to disentangle pose from 3D structure. Using modern deep learning computational platforms allows us to solve NRSfM problems at an unprecedented scale and shape complexity. Our approach requires no 3D supervision, relying solely on 2D point correspondences. Further, it is able to handle missing/occluded 2D points without the need for matrix completion. Extensive experiments demonstrate the performance of our approach, which exhibits superior precision and robustness to all available state-of-the-art methods, in some instances by an order of magnitude. We further propose a new quality measure (based on the network weights) which circumvents the need for 3D ground truth to ascertain the confidence we have in the reconstructability. We believe our work to be a significant advance over the state of the art in NRSfM.
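
To make the unsupervised training setup concrete, here is a heavily simplified sketch (ours; it is not the paper's hierarchical sparse-coding architecture): an auto-encoder whose encoder produces a low-dimensional shape code plus an orthographic camera, whose decoder maps the code to a 3D shape, and whose loss is purely 2D reprojection error with a soft orthonormality penalty standing in for proper camera handling.

```python
import torch
import torch.nn as nn

class TinyNrsfmAE(nn.Module):
    """Schematic NRSfM auto-encoder: 2D landmarks in, 3D shape + camera out."""

    def __init__(self, n_points, code_dim=8):
        super().__init__()
        self.n_points, self.code_dim = n_points, code_dim
        self.encoder = nn.Sequential(
            nn.Linear(2 * n_points, 128), nn.ReLU(),
            nn.Linear(128, code_dim + 6))            # shape code + flattened 2x3 camera
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, 3 * n_points))

    def forward(self, w2d):                           # w2d: (B, 2*n_points), (x1, y1, x2, y2, ...)
        h = self.encoder(w2d)
        code, cam = h[:, :self.code_dim], h[:, self.code_dim:].view(-1, 2, 3)
        shape3d = self.decoder(code).view(-1, self.n_points, 3)
        reproj = torch.einsum('brc,bpc->bpr', cam, shape3d)   # orthographic projection
        return reproj.reshape(w2d.shape), cam

def loss_fn(reproj, w2d, cam):
    """2D reprojection error plus a soft orthonormality penalty on the camera rows."""
    eye = torch.eye(2, device=cam.device)
    ortho = ((cam @ cam.transpose(1, 2) - eye) ** 2).mean()
    return ((reproj - w2d) ** 2).mean() + 0.1 * ortho

# One schematic step on random data standing in for observed 2D landmark tracks;
# training would loop a standard optimizer over such batches (no 3D ground truth).
model = TinyNrsfmAE(n_points=30)
w2d = torch.randn(16, 60)
reproj, cam = model(w2d)
print(loss_fn(reproj, w2d, cam).item())
```

The paper's block-sparse dictionary structure, handling of missing points, and the weight-based reconstructability measure are not reflected in this sketch.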

3.
IEEE Trans Pattern Anal Mach Intell ; 37(3): 529-40, 2015 Mar.
Article in English | MEDLINE | ID: mdl-26353259

ABSTRACT

Trajectory basis Non-Rigid Structure from Motion (NRSfM) refers to the process of reconstructing the 3D trajectory of each point of a non-rigid object from just their 2D projected trajectories. Reconstruction relies on two factors: (i) the condition of the composed camera and trajectory basis matrix, and (ii) whether the trajectory basis has enough degrees of freedom to model the 3D point trajectory. These two factors are inherently conflicting. Employing a trajectory basis with small capacity has the positive characteristic of reducing the likelihood of an ill-conditioned system (when composed with the camera) during reconstruction. However, it has the negative characteristic of increasing the likelihood that the basis will not be able to fully model the object's "true" 3D point trajectories. In this paper we draw upon a well-known result centered on the Restricted Isometry Property (RIP) condition for sparse signal reconstruction. RIP allows us to relax the requirement that the full trajectory basis composed with the camera matrix must be well conditioned. Further, we propose a strategy for learning an over-complete basis using convolutional sparse coding from naturally occurring point trajectory corpora, to increase the likelihood that the RIP condition holds for a broad class of point trajectories and camera motions. Finally, we propose an l1-inspired objective for trajectory reconstruction that is able to "adaptively" select the smallest sub-matrix from an over-complete trajectory basis that balances (i) and (ii). We present more practical 3D reconstruction results compared to the current state of the art in trajectory basis NRSfM.
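
The l1-inspired objective can be read as a standard sparse-recovery problem: the stacked 2D observations of a point equal the camera-composed over-complete trajectory basis times a coefficient vector that should be sparse. Below is a minimal ISTA solver for that generic problem, with a random matrix standing in for the composed camera/basis matrix; it sketches only the sparse-coding step, not the paper's adaptive sub-matrix selection.

```python
import numpy as np

def ista(A, w, lam=0.05, n_iter=500):
    """Iterative shrinkage-thresholding for min_b 0.5*||w - A b||^2 + lam*||b||_1.

    A : (2F, K) camera matrix composed with an over-complete trajectory basis
    w : (2F,)   stacked 2D observations of one point over F frames
    Returns a sparse coefficient vector selecting a small sub-basis.
    """
    L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of the gradient
    b = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = b - A.T @ (A @ b - w) / L             # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return b

# Toy example: a random over-complete basis and a 3-sparse ground-truth coefficient vector.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 120))
b_true = np.zeros(120)
b_true[[3, 50, 97]] = rng.standard_normal(3)
b_hat = ista(A, A @ b_true)
print(np.count_nonzero(np.abs(b_hat) > 1e-3))     # small support, ideally the 3 true elements
```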

4.
Article in English | MEDLINE | ID: mdl-27275131

ABSTRACT

By systematically varying the number of subjects and the number of frames per subject, we explored the influence of training set size on appearance- and shape-based approaches to facial action unit (AU) detection. Digital video and expert coding of spontaneous facial activity from 80 subjects (over 350,000 frames) were used to train and test support vector machine classifiers. Appearance features were shape-normalized SIFT descriptors and shape features were 66 facial landmarks. Ten-fold cross-validation was used in all evaluations. The number of subjects and the number of frames per subject differentially affected appearance- and shape-based classifiers. For appearance features, which are high-dimensional, increasing the number of training subjects from 8 to 64 incrementally improved performance, regardless of the number of frames taken from each subject (ranging from 450 through 3,600). In contrast, for shape features, increases in the number of training subjects and frames were associated with mixed results. In summary, maximal performance was attained using appearance features from large numbers of subjects with as few as 450 frames per subject. These findings suggest that variation in the number of subjects, rather than the number of frames per subject, yields the most efficient performance.

5.
IEEE Trans Pattern Anal Mach Intell ; 35(6): 1383-96, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23599053

ABSTRACT

In this paper, we propose a framework for both gradient descent image and object alignment in the Fourier domain. Our method centers upon the classical Lucas & Kanade (LK) algorithm where we represent the source and template/model in the complex 2D Fourier domain rather than in the spatial 2D domain. We refer to our approach as the Fourier LK (FLK) algorithm. The FLK formulation is advantageous when one preprocesses the source image and template/model with a bank of filters (e.g., oriented edges, Gabor, etc.) as 1) it can handle substantial illumination variations, 2) the inefficient preprocessing filter bank step can be subsumed within the FLK algorithm as a sparse diagonal weighting matrix, 3) unlike traditional LK, the computational cost is invariant to the number of filters and as a result is far more efficient, and 4) this approach can be extended to the Inverse Compositional (IC) form of the LK algorithm where nearly all steps (including Fourier transform and filter bank preprocessing) can be precomputed, leading to an extremely efficient and robust approach to gradient descent image matching. Further, these computational savings translate to nonrigid object alignment tasks that are considered extensions of the LK algorithm, such as those found in Active Appearance Models (AAMs).


Subjects
Algorithms; Fourier Analysis; Pattern Recognition, Automated; Facial Expression; Humans; Image Processing, Computer-Assisted/methods
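
The key computational claim of the FLK formulation above, that an entire filter bank collapses into one diagonal weighting in the Fourier domain, follows from Parseval's theorem. The sketch below (1D signals, circular convolution, names ours) checks numerically that the summed SSD over all filtered signals equals a single Fourier-weighted SSD whose cost does not depend on the number of filters.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 64
a, b = rng.standard_normal(n), rng.standard_normal(n)
filters = [rng.standard_normal(n) for _ in range(5)]   # stand-ins for Gabor/edge filters

def circ_conv(x, g):
    """Circular convolution via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(g)))

# Spatial domain: filter both signals with every filter, then sum the SSDs.
ssd_spatial = sum(np.sum((circ_conv(a, g) - circ_conv(b, g)) ** 2) for g in filters)

# Fourier domain: the whole bank reduces to one diagonal spectral weighting S.
S = sum(np.abs(np.fft.fft(g)) ** 2 for g in filters)
diff = np.fft.fft(a) - np.fft.fft(b)
ssd_fourier = np.sum(S * np.abs(diff) ** 2) / n        # 1/n from Parseval's theorem

print(np.allclose(ssd_spatial, ssd_fourier))           # True
```

Inside the LK iterations this weighting is applied once to the error and Jacobian terms, which is where the claimed invariance to the number of filters comes from.
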
6.
Perception ; 42(9): 950-70, 2013.
Article in English | MEDLINE | ID: mdl-24386715

ABSTRACT

Facial movement may provide cues to identity, by supporting the extraction of face shape information via structure-from-motion, or via characteristic patterns of movement. Currently, it is unclear whether familiar and unfamiliar faces derive the same benefit from these mechanisms. This study examined the movement advantage by asking participants to match moving and static images of famous and unfamiliar faces to facial point-light displays (PLDs) or shape-normalised avatars in a same/different task (experiment 1). In experiment 2 we also used a same/different task, but participants matched from PLD to PLD or from avatar to avatar. In both experiments, unfamiliar face matching was more accurate for PLDs than for avatars, but there was no effect of stimulus type on famous faces. In experiment 1, there was no movement advantage, but in experiment 2, there was a significant movement advantage for famous and unfamiliar faces. There was no evidence that familiarity increased the movement advantage. For unfamiliar faces, results suggest that participants were relying on characteristic movement patterns to match the faces, and did not derive any extra benefit from the structure-from-motion cues in the PLDs. The results indicate that participants may use static and movement-based cues in a flexible manner when matching famous and unfamiliar faces.


Subjects
Face; Movement/physiology; Pattern Recognition, Visual/physiology; Recognition, Psychology/physiology; Adolescent; Adult; Analysis of Variance; Australia; Cues; Female; Humans; Male; Middle Aged; Photic Stimulation/methods; Reaction Time/physiology; Students/psychology; Young Adult
7.
IEEE Trans Syst Man Cybern B Cybern ; 41(3): 664-74, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21097382

ABSTRACT

In a clinical setting, pain is reported either through patient self-report or via an observer. Such measures are problematic as they 1) are subjective and 2) give no specific timing information. Coding pain as a series of facial action units (AUs) can avoid these issues, as it provides an objective measure of pain on a frame-by-frame basis. Using video data from patients with shoulder injuries, in this paper we describe an active appearance model (AAM)-based system that can automatically detect the frames in video in which a patient is in pain. This pain data set highlights the many challenges associated with spontaneous emotion detection, particularly expression and head movement due to the patient's reaction to pain. We show that the AAM can deal with these movements and can achieve significant improvements in both AU and pain detection performance compared to current state-of-the-art approaches that utilize similarity-normalized appearance features only.


Subjects
Artificial Intelligence; Face/pathology; Image Interpretation, Computer-Assisted/methods; Pain Measurement/methods; Pain/pathology; Pattern Recognition, Automated/methods; Video Recording/methods; Humans; Pain/classification; Photography/methods
8.
Article in English | MEDLINE | ID: mdl-24598812

ABSTRACT

A real-time facial puppetry system is presented. Compared with existing systems, the proposed method requires no special hardware, runs in real time (23 frames per second), and requires only a single image of the avatar and user. The user's facial expression is captured through a real-time 3D non-rigid tracking system. Expression transfer is achieved by combining a generic expression model with synthetically generated examples that better capture person-specific characteristics. Performance of the system is evaluated on avatars of real people as well as masks and cartoon characters.

9.
IEEE Trans Pattern Anal Mach Intell ; 32(7): 1335-41, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20489236

ABSTRACT

Linear filters are ubiquitously used as a preprocessing step for many classification tasks in computer vision. In particular, applying Gabor filters followed by a classification stage, such as a support vector machine (SVM), is now common practice in computer vision applications like face identity and expression recognition. A fundamental problem occurs, however, with respect to the high dimensionality of the concatenated Gabor filter responses, in terms of memory requirements and computational efficiency during training and testing. In this paper, we demonstrate how the preprocessing step of applying a bank of linear filters can be reinterpreted as manipulating the type of margin being maximized within the linear SVM. This new interpretation leads to sizable memory and computational advantages with respect to existing approaches. The reinterpreted formulation turns out to be independent of the number of filters, thereby allowing the examination of feature spaces derived from arbitrarily large numbers of linear filters, a hitherto untestable prospect. Further, this new interpretation of filter banks gives new insights, beyond the often-cited biological motivations, into why preprocessing images with filter banks, like Gabor filters, improves classification performance.


Subjects
Algorithms; Image Processing, Computer-Assisted/methods; Linear Models; Artificial Intelligence; Eye Movements; Face; Fourier Analysis; Humans
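
One way to see the margin reinterpretation above concretely: the inner product between two concatenated filter-response vectors reduces to a single diagonally weighted inner product in the Fourier domain, so the kernel matrix a linear SVM needs can be formed without ever materializing the filter responses, at a cost that does not grow with the number of filters. The sketch below (1D signals, circular filtering, names ours) checks that equivalence numerically.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_filters = 64, 200
signals = rng.standard_normal((10, n))                 # toy 1D "images"
bank = rng.standard_normal((n_filters, n))             # stand-in linear filter bank

F = np.fft.fft(signals, axis=1)
G = np.fft.fft(bank, axis=1)

# Explicit feature map: concatenate every circular filter response (cost grows with n_filters).
responses = np.real(np.fft.ifft(F[:, None, :] * G[None, :, :], axis=2)).reshape(len(signals), -1)
K_explicit = responses @ responses.T

# Implicit form: one diagonal spectral weighting, independent of the filter count.
S = np.sum(np.abs(G) ** 2, axis=0)
K_implicit = np.real((F * S) @ np.conj(F).T) / n       # 1/n from Parseval's theorem

print(np.allclose(K_explicit, K_implicit))             # True
```
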
10.
Image Vis Comput ; 28(5): 781-789, 2010 May.
Article in English | MEDLINE | ID: mdl-25242852

ABSTRACT

In this paper we present a new discriminative approach to achieve consistent and efficient tracking of non-rigid object motion, such as facial expressions. By utilizing both spatial and temporal appearance coherence at the patch level, the proposed approach can reduce ambiguity and increase accuracy. Recent research demonstrates that feature-based approaches, such as constrained local models (CLMs), can achieve good performance in non-rigid object alignment/tracking using local region descriptors and a non-rigid shape prior. However, the matching performance of the learned generic patch experts is susceptible to local appearance ambiguity. Since there is no motion continuity constraint between neighboring frames of the same sequence, the resultant object alignment might not be consistent from frame to frame and the motion field might not be temporally smooth. In this paper, we extend the CLM method into the spatio-temporal domain by enforcing an appearance consistency constraint on each local patch between neighboring frames. More importantly, we show that the global warp update can be optimized jointly in an efficient manner using convex quadratic fitting. Finally, we demonstrate that our approach achieves improved performance for the task of non-rigid facial motion tracking on videos of clinical patients.
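
To illustrate the convex quadratic fitting step in isolation: each patch expert's response map over candidate displacements is approximated by a convex quadratic, whose minimizer gives that landmark's preferred displacement and whose curvature naturally weights the joint warp update. The sketch below fits such a surrogate to a toy cost-like response map; in an actual CLM the detector responses would first be converted into a cost, and this particular estimator is ours rather than the paper's.

```python
import numpy as np

def fit_convex_quadratic(response, coords):
    """Least-squares fit of r(d) ~ 0.5 d^T A d + b^T d + c over a grid of displacements,
    with A projected onto the positive-definite cone so the surrogate is convex.

    response : (m,)   cost-like response at each candidate displacement
    coords   : (m, 2) candidate displacements (dx, dy)
    Returns (A, b); the surrogate's minimizer is -A^{-1} b.
    """
    dx, dy = coords[:, 0], coords[:, 1]
    X = np.column_stack([0.5 * dx**2, dx * dy, 0.5 * dy**2, dx, dy, np.ones_like(dx)])
    p, *_ = np.linalg.lstsq(X, response, rcond=None)
    A = np.array([[p[0], p[1]], [p[1], p[2]]])
    w, V = np.linalg.eigh(A)
    A = V @ np.diag(np.maximum(w, 1e-3)) @ V.T          # enforce convexity
    return A, p[3:5]

# Toy example: a noisy bowl-shaped cost centred at displacement (1.5, -0.5).
rng = np.random.default_rng(4)
grid = np.arange(-3.0, 4.0)
coords = np.array([(x, y) for x in grid for y in grid])
response = np.sum((coords - np.array([1.5, -0.5])) ** 2, axis=1) + 0.1 * rng.standard_normal(len(coords))
A, b = fit_convex_quadratic(response, coords)
print(-np.linalg.solve(A, b))                           # approximately [1.5, -0.5]
```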

11.
Image Vis Comput ; 27(12): 1804-1813, 2009 Nov 01.
Article in English | MEDLINE | ID: mdl-20046797

ABSTRACT

Active appearance models (AAMs) have demonstrated great utility when employed for non-rigid face alignment/tracking. The "simultaneous" algorithm for fitting an AAM achieves good non-rigid face registration performance, but has poor real-time performance (2-3 fps). The "project-out" algorithm for fitting an AAM achieves faster-than-real-time performance (> 200 fps) but suffers from poor generic alignment performance. In this paper we introduce an extension to a discriminative method for non-rigid face registration/tracking referred to as a constrained local model (CLM). Our proposed method is able to achieve performance superior to the "simultaneous" AAM algorithm along with real-time fitting speeds (35 fps). We improve upon the canonical CLM formulation, to gain this performance, in a number of ways by employing: (i) linear SVMs as patch experts, (ii) a simplified optimization criterion, and (iii) a composite rather than additive warp update step. Most notably, our simplified optimization criterion for fitting the CLM divides the problem of finding a single complex registration/warp displacement into that of finding N simple warp displacements. From these N simple warp displacements, a single complex warp displacement is estimated using a weighted least-squares constraint. Another major advantage of this simplified optimization stems from its ability to be parallelized, which we also explore theoretically in this paper. We refer to our approach for fitting the CLM as the "exhaustive local search" (ELS) algorithm. Experiments were conducted on the CMU Multi-PIE database.
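
The ELS step of recovering one global warp from N simple per-patch displacements with a weighted least-squares constraint can be illustrated with a rigid similarity warp; the paper combines this with a non-rigid shape model, which is omitted here, and the names and toy data are ours.

```python
import numpy as np

def global_similarity_update(points, displacements, weights):
    """Weighted least-squares fit of one similarity warp (a, b, tx, ty) to N
    per-landmark displacement estimates: x' = a*x - b*y + tx, y' = b*x + a*y + ty.

    points        : (N, 2) current landmark positions
    displacements : (N, 2) locally estimated displacements (one per patch expert)
    weights       : (N,)   confidence of each local estimate
    """
    N = len(points)
    targets = (points + displacements).reshape(-1)       # interleaved x1, y1, x2, y2, ...
    J = np.zeros((2 * N, 4))
    J[0::2] = np.column_stack([points[:, 0], -points[:, 1], np.ones(N), np.zeros(N)])
    J[1::2] = np.column_stack([points[:, 1],  points[:, 0], np.zeros(N), np.ones(N)])
    w = np.sqrt(np.repeat(weights, 2))
    a, b, tx, ty = np.linalg.lstsq(J * w[:, None], targets * w, rcond=None)[0]
    return np.column_stack([a * points[:, 0] - b * points[:, 1] + tx,
                            b * points[:, 0] + a * points[:, 1] + ty])

# Toy example: noisy per-landmark displacements generated by a known similarity warp.
rng = np.random.default_rng(5)
pts = 50 * rng.standard_normal((66, 2))
true = np.column_stack([1.05 * pts[:, 0] - 0.1 * pts[:, 1] + 3,
                        0.1 * pts[:, 0] + 1.05 * pts[:, 1] - 2])
disp = true - pts + 0.5 * rng.standard_normal(pts.shape)
new_pts = global_similarity_update(pts, disp, np.ones(len(pts)))
print(np.abs(new_pts - true).mean())                     # small residual
```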

12.
Image Vis Comput ; 27(12): 1788-1796, 2009 Oct.
Article in English | MEDLINE | ID: mdl-22837587

ABSTRACT

Pain is typically assessed by patient self-report. Self-reported pain, however, is difficult to interpret and may be impaired or, in some circumstances (e.g., young children and the severely ill), not even possible. To circumvent these problems, behavioral scientists have identified reliable and valid facial indicators of pain. Hitherto, these methods have required manual measurement by highly skilled human observers. In this paper we explore an approach for automatically recognizing acute pain without the need for human observers. Specifically, our study was restricted to automatically detecting pain in adult patients with rotator cuff injuries. The system employed video input of the patients as they moved their affected and unaffected shoulder. Two types of ground truth were considered. Sequence-level ground truth consisted of Likert-type ratings by skilled observers. Frame-level ground truth was calculated from the presence/absence and intensity of facial actions previously associated with pain. Active appearance models (AAMs) were used to decouple shape and appearance in the digitized face images. Support vector machines (SVMs) were compared across several representations derived from the AAM and across ground truth of varying granularity. We explored two questions pertinent to the construction, design and development of automatic pain detection systems. First, at what level (i.e., sequence- or frame-level) should datasets be labeled in order to obtain satisfactory automatic pain detection performance? Second, how important is it, at both levels of labeling, that we non-rigidly register the face?

13.
Article in English | MEDLINE | ID: mdl-21278824

ABSTRACT

Pain is generally measured by patient self-report, normally via verbal communication. However, if the patient is a child or has limited ability to communicate (e.g., mute or mentally impaired patients, or patients on assisted breathing), self-report may not be a viable measurement. In addition, these self-report measures only relate to the maximum pain level experienced during a sequence, so a frame-by-frame measure is currently not obtainable. Using image data from patients with rotator-cuff injuries, in this paper we describe an AAM-based automatic system which can detect pain at a frame-by-frame level. We do this in two ways: directly (straight from the facial features), and indirectly (through the fusion of individual AU detectors). Our results show that the latter method achieves the best performance, as the most discriminative features from each AU detector (i.e., shape or appearance) are used.

14.
Article in English | MEDLINE | ID: mdl-20411036

ABSTRACT

Despite significant progress in deformable model fitting over the last decade, efficient and accurate person-independent face fitting remains a challenging problem. In this work, a reformulation of the generative fitting objective is presented, where only soft correspondences between the model and the image are enforced. This has the dual effect of improving robustness to unseen faces and affording a fitting time that scales linearly with the model's complexity. This approach is compared with three state-of-the-art fitting methods on the problem of person-independent face fitting, where it is shown to closely approach the accuracy of the currently best-performing method while affording significant computational savings.

15.
Article in English | MEDLINE | ID: mdl-25285316

ABSTRACT

Automatically recognizing pain from video is a very useful application as it has the potential to alert carers to patients who are in discomfort but would otherwise not be able to communicate such emotion (e.g., young children, patients in postoperative care, etc.). In previous work [1], a "pain-no pain" system was developed which used an AAM-SVM approach to good effect. However, as with any task involving a large amount of video data, there are memory constraints that need to be adhered to; in the previous work this was done by compressing the temporal signal using K-means clustering in the training phase. In visual speech recognition, it is well known that the dynamics of the signal play a vital role in recognition. As pain recognition is very similar to visual speech recognition (i.e., recognizing visual facial actions), it is our belief that compressing the temporal signal reduces the likelihood of accurately recognizing pain. In this paper, we show that by compressing the spatial signal instead of the temporal signal, we achieve better pain recognition. Our results show the importance of the temporal signal in recognizing pain; however, we also highlight some problems associated with this approach due to the randomness of a patient's facial actions.

16.
Article in English | MEDLINE | ID: mdl-20622926

ABSTRACT

Constrained local models (CLMs) have recently demonstrated good performance in non-rigid object alignment/tracking in comparison to leading holistic approaches (e.g., AAMs). A major problem hindering the further development of CLMs for non-rigid object alignment/tracking is how to jointly optimize the global warp update across all local search responses. Previous methods have either used general-purpose optimizers (e.g., simplex methods) or graph-based optimization techniques. Unfortunately, problems exist with both these approaches when applied to CLMs. In this paper, we propose a new approach for optimizing the global warp update in an efficient manner by enforcing convexity at each local patch response surface. Furthermore, we show that the classic Lucas-Kanade approach to gradient descent image alignment can be viewed as a special case of our proposed framework. Finally, we demonstrate that our approach achieves improved performance for the task of non-rigid face alignment/tracking on the MultiPIE database and the UNBC-McMaster archive.

17.
IEEE Workshop Multimed Signal Proc ; 2008: 337-342, 2008 Oct 08.
Article in English | MEDLINE | ID: mdl-20689666

ABSTRACT

A common problem that affects object alignment algorithms is when they have to deal with objects with unseen intra-class appearance variation. Several variants based on gradient-descent algorithms, such as the Lucas-Kanade (or forward-additive) and inverse-compositional algorithms, have been proposed to deal with this issue by solving for both alignment and appearance simultaneously. In [1], Baker and Matthews showed that without appearance variation, the inverse-compositional (IC) algorithm was theoretically and empirically equivalent to the forward-additive (FA) algorithm, whilst achieving a significant improvement in computational efficiency. With appearance variation, it would be intuitive that a similar benefit of the IC algorithm would be experienced over its FA counterpart. However, to date no such comparison has been performed. In this paper we remedy this situation by performing such a comparison. We show that the two algorithms are not equivalent due to the inclusion of the appearance variation parameters. Through a number of experiments on the MultiPIE face database, we show that greater refinement can be gained using the FA algorithm because it is a truer solution than the IC approach.
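
For readers who want to see the two update rules side by side, here is a minimal 1D, translation-only comparison with no appearance variation (the regime in which Baker and Matthews showed the two variants agree; the abstract's point is that this equivalence breaks once appearance parameters are included). The toy signal and names are ours.

```python
import numpy as np

def warp(signal, x, shift):
    """Sample `signal` (defined on grid x) at x + shift via linear interpolation."""
    return np.interp(x + shift, x, signal)

def estimate_shift(template, image, x, n_iter=50, inverse_compositional=False):
    """Gauss-Newton estimate of the translation p minimizing sum_x (I(x+p) - T(x))^2."""
    p = 0.0
    grad_T = np.gradient(template, x)              # precomputable once for IC
    for _ in range(n_iter):
        warped = warp(image, x, p)
        error = warped - template
        if inverse_compositional:
            g = grad_T                             # fixed template gradient
            dp = np.sum(g * error) / np.sum(g * g)
            p -= dp                                # compose with the inverted update
        else:
            g = np.gradient(warped, x)             # gradient of the warped image
            dp = -np.sum(g * error) / np.sum(g * g)
            p += dp                                # forward-additive update
    return p

# Toy example: a Gaussian bump shifted by 2.5 units; both variants should report ~ -2.5,
# the translation that maps the image back onto the template.
x = np.linspace(-20, 20, 401)
template = np.exp(-0.5 * (x / 3.0) ** 2)
image = np.exp(-0.5 * ((x + 2.5) / 3.0) ** 2)
print(estimate_shift(template, image, x),
      estimate_shift(template, image, x, inverse_compositional=True))
```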

18.
Article in English | MEDLINE | ID: mdl-20706553

ABSTRACT

In this paper, we present an approach we refer to as "least squares congealing", which provides a solution to the problem of aligning an ensemble of images in an unsupervised manner. Our approach circumvents many of the limitations of the canonical "congealing" algorithm. Specifically, we present an algorithm that: (i) is able to simultaneously, rather than sequentially, estimate warp parameter updates, (ii) exhibits fast convergence, and (iii) requires no pre-defined step size. We present alignment results which show an improvement in performance for the removal of unwanted spatial variation when compared with the related work of Learned-Miller on two datasets, the MNIST handwritten digit database and the MultiPIE face database.
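
As a minimal picture of the simultaneous-update idea, the sketch below congeals 1D signals under pure translations; it is ours and heavily simplified, stepping each signal toward the leave-one-out mean rather than using the paper's pairwise objective, and real congealing handles images and richer warps.

```python
import numpy as np

def congeal_translations(signals, x, n_iter=30):
    """Jointly estimate one translation per signal so the ensemble aligns.

    Every iteration updates all shifts simultaneously via a Gauss-Newton step
    toward the current leave-one-out mean of the warped ensemble; there is no
    reference template and no pre-defined step size.
    """
    n = len(signals)
    shifts = np.zeros(n)
    for _ in range(n_iter):
        warped = np.array([np.interp(x + s, x, f) for f, s in zip(signals, shifts)])
        updates = np.zeros(n)
        for i in range(n):
            target = (warped.sum(axis=0) - warped[i]) / (n - 1)   # leave-one-out mean
            g = np.gradient(warped[i], x)
            updates[i] = -np.sum(g * (warped[i] - target)) / np.sum(g * g)
        shifts += updates                                          # simultaneous update
        shifts -= shifts.mean()                                    # fix the global gauge
    return shifts

# Toy ensemble: one smooth bump observed at different offsets.
rng = np.random.default_rng(6)
x = np.linspace(-20, 20, 401)
true_offsets = rng.uniform(-2, 2, size=8)
signals = [np.exp(-0.5 * ((x + t) / 3.0) ** 2) for t in true_offsets]
est = congeal_translations(signals, x)
print(np.round(est + (true_offsets - true_offsets.mean()), 2))     # ~0 residual misalignment
```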
