Results 1 - 10 of 10
1.
IEEE Trans Pattern Anal Mach Intell ; 41(9): 2161-2175, 2019 09.
Article in English | MEDLINE | ID: mdl-29994653

ABSTRACT

Hand pose estimation, formulated as an inverse problem, is typically optimized by an energy function over pose parameters using a 'black box' image generation procedure, knowing little about either the relationships between the parameters or the form of the energy function. In this paper, we show significant improvement upon such black box optimization by exploiting high-level knowledge of the parameter structure and using a local surrogate energy function. Our new framework, hierarchical sampling optimization (HSO), consists of a sequence of discriminative predictors organized into a kinematic hierarchy. Each predictor is conditioned on its ancestors, and generates a set of samples over a subset of the pose parameters, with only one selected by the highly-efficient surrogate energy. The selected partial poses are concatenated to generate a full-pose hypothesis. Repeating the same process, several hypotheses are generated and the full energy function selects the best result. Under the same kinematic hierarchy, two methods based on decision forest and convolutional neural network are proposed to generate the samples and two optimization methods are studied when optimizing these samples. Experimental evaluations on three publicly available datasets show that our method is particularly impressive in low-compute scenarios where it significantly outperforms all other state-of-the-art methods.
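The sampling loop described in this abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the names `hierarchical_sampling`, `make_sampler`, `squared_error`, and `TARGET` are hypothetical, the toy samplers stand in for the decision-forest or CNN predictors, and the same toy error function plays the role of both the surrogate and the full energy.

```python
import random
from typing import Callable, List, Tuple

Pose = List[float]
# A sampler receives the partial pose chosen so far (its ancestors) and
# returns candidate values for its own block of pose parameters.
Sampler = Callable[[Pose, random.Random], List[Pose]]
Energy = Callable[[Pose], float]


def hierarchical_sampling(samplers: List[Sampler], surrogate_energy: Energy,
                          full_energy: Energy, n_hypotheses: int,
                          rng: random.Random) -> Tuple[Pose, float]:
    """Build several full-pose hypotheses level by level; keep the best one."""
    best_pose: Pose = []
    best_energy = float("inf")
    for _ in range(n_hypotheses):
        partial: Pose = []
        for sampler in samplers:  # walk the hierarchy from root to leaves
            candidates = sampler(partial, rng)
            # The cheap surrogate energy keeps exactly one sample per level.
            chosen = min(candidates, key=lambda c: surrogate_energy(partial + c))
            partial = partial + chosen  # concatenate the selected partial pose
        energy = full_energy(partial)  # the full energy ranks whole hypotheses
        if energy < best_energy:
            best_pose, best_energy = partial, energy
    return best_pose, best_energy


# Toy stand-ins (assumptions): a fixed target pose and noisy samplers.
TARGET = [1.0, 2.0, 3.0]


def make_sampler(index: int, n_samples: int = 8) -> Sampler:
    def sampler(ancestors: Pose, rng: random.Random) -> List[Pose]:
        # A real predictor would condition on `ancestors`; this toy ignores it.
        return [[TARGET[index] + rng.gauss(0.0, 0.5)] for _ in range(n_samples)]
    return sampler


def squared_error(pose: Pose) -> float:
    # zip() stops at the shorter sequence, so partial poses are scored too.
    return sum((p - t) ** 2 for p, t in zip(pose, TARGET))


samplers = [make_sampler(i) for i in range(len(TARGET))]
one, e_one = hierarchical_sampling(samplers, squared_error, squared_error,
                                   1, random.Random(0))
best, e_best = hierarchical_sampling(samplers, squared_error, squared_error,
                                     5, random.Random(0))
```

With the same seed, the five-hypothesis run starts from the same first hypothesis as the single-hypothesis run, so its final energy can only be equal or lower.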

2.
IEEE Trans Vis Comput Graph ; 21(5): 571-83, 2015 May.
Article in English | MEDLINE | ID: mdl-26357205

ABSTRACT

Recovery from tracking failure is essential in any simultaneous localization and tracking system. In this context, we explore an efficient keyframe-based relocalization method based on frame encoding using randomized ferns. The method enables automatic discovery of keyframes through online harvesting in tracking mode, and fast retrieval of pose candidates when tracking is lost. Frame encoding is achieved by applying simple binary feature tests which are stored in the nodes of an ensemble of randomized ferns. The concatenation of small block codes generated by each fern yields a global compact representation of camera frames. Based on those representations we define the frame dissimilarity as the block-wise Hamming distance (BlockHD). Dissimilarities between an incoming query frame and a large set of keyframes can be efficiently evaluated by simply traversing the nodes of the ferns and counting image co-occurrences in corresponding code tables. In tracking mode, those dissimilarities decide whether a frame/pose pair is considered as a novel keyframe. For tracking recovery, poses of the most similar keyframes are retrieved and used for reinitialization of the tracking algorithm. The integration of our relocalization method into a hand-held KinectFusion system allows seamless continuation of mapping even when tracking is frequently lost.
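A minimal sketch of the frame encoding and the block-wise distance described in this abstract. It assumes each fern is a list of hypothetical `(pixel_index, threshold)` tests and reads BlockHD as the fraction of fern blocks whose codes differ. The names `encode_frame` and `block_hd` are illustrative, and the brute-force comparison does not reproduce the code-table lookup the abstract mentions.

```python
from typing import List, Sequence, Tuple

Fern = List[Tuple[int, float]]  # hypothetical test: (pixel index, threshold)
Code = List[Tuple[bool, ...]]   # one small block code per fern


def encode_frame(pixels: Sequence[float], ferns: List[Fern]) -> Code:
    """Apply every fern's binary tests; each fern yields one block code."""
    return [tuple(pixels[i] > t for i, t in fern) for fern in ferns]


def block_hd(code_a: Code, code_b: Code) -> float:
    """Fraction of blocks that differ (0.0 = identical, 1.0 = all differ)."""
    if len(code_a) != len(code_b) or not code_a:
        raise ValueError("codes must be non-empty and of equal length")
    differing = sum(1 for a, b in zip(code_a, code_b) if a != b)
    return differing / len(code_a)


# Two ferns with two tests each, applied to tiny four-pixel "frames".
ferns = [[(0, 0.5), (1, 0.5)], [(2, 0.5), (3, 0.5)]]
code_a = encode_frame([0.9, 0.1, 0.9, 0.1], ferns)
code_b = encode_frame([0.9, 0.1, 0.1, 0.1], ferns)
dissimilarity = block_hd(code_a, code_b)  # only the second block differs
```

In a relocalizer along these lines, a frame whose smallest dissimilarity to all stored keyframes exceeds some threshold would be added as a new keyframe. The threshold value is a tuning parameter that the abstract does not give.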

3.
IEEE Trans Pattern Anal Mach Intell ; 35(12): 2821-40, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24136424

ABSTRACT

We describe two new approaches to human pose estimation. Both can quickly and accurately predict the 3D positions of body joints from a single depth image without using any temporal information. The key to both approaches is the use of a large, realistic, and highly varied synthetic set of training images. This allows us to learn models that are largely invariant to factors such as pose, body shape, field-of-view cropping, and clothing. Our first approach employs an intermediate body parts representation, designed so that an accurate per-pixel classification of the parts will localize the joints of the body. The second approach instead directly regresses the positions of body joints. By using simple depth pixel comparison features and parallelizable decision forests, both approaches can run super-real time on consumer hardware. Our evaluation investigates many aspects of our methods, and compares the approaches to each other and to the state of the art. Results on silhouettes suggest broader applicability to other imaging modalities.
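The abstract does not spell out the "simple depth pixel comparison features". One commonly used form, assumed here rather than taken from the text above, compares the depth at two probe offsets that are scaled by the inverse depth at the reference pixel, so the feature responds similarly whether the body is near or far. The function name `depth_feature`, the row/column convention, and the `background` constant are illustrative choices.

```python
from typing import List, Tuple


def depth_feature(depth: List[List[float]], x: Tuple[int, int],
                  u: Tuple[float, float], v: Tuple[float, float],
                  background: float = 1e6) -> float:
    """Difference of depths at two probes around pixel x = (row, col).

    Offsets u and v are divided by the depth at x (an assumed normalization);
    probes that fall outside the image read a large background depth.
    """
    d_x = depth[x[0]][x[1]]

    def probe(offset: Tuple[float, float]) -> float:
        r = x[0] + int(round(offset[0] / d_x))
        c = x[1] + int(round(offset[1] / d_x))
        if 0 <= r < len(depth) and 0 <= c < len(depth[0]):
            return depth[r][c]
        return background

    return probe(u) - probe(v)


# Tiny depth map: the right-hand column is farther away than the rest.
depth_map = [[2.0, 2.0, 4.0],
             [2.0, 2.0, 4.0],
             [2.0, 2.0, 4.0]]
response = depth_feature(depth_map, (1, 1), (0.0, 2.0), (0.0, -2.0))
off_image = depth_feature(depth_map, (1, 1), (0.0, 10.0), (0.0, -2.0))
```

A decision tree node would threshold such a response to send the pixel left or right. Each evaluation needs only three depth lookups, which is consistent with the speed the abstract reports.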


Subject(s)
Algorithms , Imaging, Three-Dimensional , Humans , Image Interpretation, Computer-Assisted , Regression Analysis
4.
IEEE Trans Cybern ; 43(5): 1314-7, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23955797

ABSTRACT

With the invention of the low-cost Microsoft Kinect sensor, high-resolution depth and visual (RGB) sensing has become available for widespread use as an off-the-shelf technology. This special issue is specifically dedicated to new algorithms and/or new applications based on the Kinect (or similar RGB-D) sensors. In total, we received over ninety submissions from more than twenty countries all around the world. The submissions cover a wide range of areas including object and scene classification, 3-D pose estimation, visual tracking, data fusion, human action/activity recognition, 3-D reconstruction, mobile robotics, and so on. After two rounds of review by at least two (mostly three) expert reviewers for each paper, the Guest Editors have selected twelve high-quality papers to be included in this highly popular special issue. The papers that comprise this issue are briefly summarized.


Subject(s)
Actigraphy/instrumentation , Actigraphy/methods , Computer Peripherals , Transducers , Video Games , Whole Body Imaging/instrumentation , Whole Body Imaging/methods , Humans , Imaging, Three-Dimensional/instrumentation , Imaging, Three-Dimensional/methods
5.
IEEE Trans Cybern ; 43(5): 1318-34, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23807480

ABSTRACT

With the invention of the low-cost Microsoft Kinect sensor, high-resolution depth and visual (RGB) sensing has become available for widespread use. The complementary nature of the depth and visual information provided by the Kinect sensor opens up new opportunities to solve fundamental problems in computer vision. This paper presents a comprehensive review of recent Kinect-based computer vision algorithms and applications. The reviewed approaches are classified according to the type of vision problems that can be addressed or enhanced by means of the Kinect sensor. The covered topics include preprocessing, object tracking and recognition, human activity analysis, hand gesture analysis, and indoor 3-D mapping. For each category of methods, we outline their main algorithmic contributions and summarize their advantages/differences compared to their RGB counterparts. Finally, we give an overview of the challenges in this field and future research trends. This paper is expected to serve as a tutorial and source of references for Kinect-based computer vision researchers.


Subject(s)
Algorithms , Artificial Intelligence , Computer Peripherals , Imaging, Three-Dimensional/methods , Pattern Recognition, Automated/methods , Whole Body Imaging/methods , Actigraphy/instrumentation , Actigraphy/methods , Computer Simulation , Image Enhancement/instrumentation , Image Enhancement/methods , Transducers , Video Games , Whole Body Imaging/instrumentation
6.
Inf Process Med Imaging ; 22: 184-96, 2011.
Article in English | MEDLINE | ID: mdl-21761656

ABSTRACT

This work addresses the challenging problem of simultaneously segmenting multiple anatomical structures in highly varied CT scans. We propose the entangled decision forest (EDF) as a new discriminative classifier which augments the state-of-the-art decision forest, resulting in higher prediction accuracy and shortened decision time. Our main contribution is two-fold. First, we propose entangling the binary tests applied at each tree node in the forest, such that the test result can depend on the result of tests applied earlier in the same tree and at image points offset from the voxel to be classified. This is demonstrated to improve accuracy and capture long-range semantic context. Second, during training, we propose injecting randomness in a guided way, in which node feature types and parameters are randomly drawn from a learned (nonuniform) distribution. This further improves classification accuracy. We assess our probabilistic anatomy segmentation technique using a labeled database of CT image volumes of 250 different patients from various scan protocols and scanner vendors. In each volume, 12 anatomical structures have been manually segmented. The database comprises highly varied body shapes and sizes, a wide array of pathologies, scan resolutions, and diverse contrast agents. Quantitative comparisons with state-of-the-art algorithms demonstrate both superior test accuracy and computational efficiency.


Subject(s)
Algorithms , Artificial Intelligence , Pattern Recognition, Automated/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Tomography, X-Ray Computed/methods , Cluster Analysis , Humans , Radiographic Image Enhancement/methods , Reproducibility of Results , Sensitivity and Specificity
7.
Neural Comput ; 23(3): 593-650, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21162663

ABSTRACT

Computer vision has grown tremendously in the past two decades. Despite all efforts, existing attempts at matching parts of the human visual system's extraordinary ability to understand visual scenes lack either scope or power. By combining the advantages of general low-level generative models and powerful layer-based and hierarchical models, this work aims at being a first step toward richer, more flexible models of images. After comparing various types of restricted Boltzmann machines (RBMs) able to model continuous-valued data, we introduce our basic model, the masked RBM, which explicitly models occlusion boundaries in image patches by factoring the appearance of any patch region from its shape. We then propose a generative model of larger images using a field of such RBMs. Finally, we discuss how masked RBMs could be stacked to form a deep model able to generate more complicated structures and suitable for various tasks such as segmentation or object recognition.

8.
IEEE Trans Pattern Anal Mach Intell ; 33(4): 838-45, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21079280

ABSTRACT

This paper proposes a novel method for recognizing faces degraded by blur using deblurring of facial images. The main issue is how to infer a Point Spread Function (PSF) representing the process of blur on faces. Inferring a PSF from a single facial image is an ill-posed problem. Our method uses learned prior information derived from a training set of blurred faces to make the problem more tractable. We construct a feature space such that blurred faces degraded by the same PSF are similar to one another. We learn statistical models that represent prior knowledge of predefined PSF sets in this feature space. A query image of unknown blur is compared with each model and the closest one is selected for PSF inference. The query image is deblurred using the PSF corresponding to that model and is thus ready for recognition. Experiments on a large face database (FERET) artificially degraded by focus or motion blur show that our method substantially improves the recognition performance compared to existing methods. We also demonstrate improved performance on real blurred images on the FRGC 1.0 face database. Furthermore, we show and explain how combining the proposed facial deblur inference with the local phase quantization (LPQ) method can further enhance the performance.


Subject(s)
Algorithms , Face/anatomy & histology , Image Processing, Computer-Assisted/methods , Humans , Image Enhancement/methods , Pattern Recognition, Automated/methods
9.
Med Image Comput Comput Assist Interv ; 12(Pt 2): 558-65, 2009.
Article in English | MEDLINE | ID: mdl-20426156

ABSTRACT

A new algorithm is presented for the automatic segmentation and classification of brain tissue from 3D MR scans. It uses discriminative Random Decision Forest classification and takes into account partial volume effects. This is combined with correction of intensities for the MR bias field, in conjunction with a learned model of spatial context, to achieve accurate voxel-wise classification. Our quantitative validation, carried out on existing labelled datasets, demonstrates improved results over the state of the art, especially for the cerebro-spinal fluid class which is the most difficult to label accurately.


Subject(s)
Brain/anatomy & histology , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Information Storage and Retrieval/methods , Magnetic Resonance Imaging/methods , Pattern Recognition, Automated/methods , Subtraction Technique , Algorithms , Artificial Intelligence , Data Interpretation, Statistical , Discriminant Analysis , Humans , Image Enhancement/methods , Reproducibility of Results , Sensitivity and Specificity
10.
IEEE Trans Pattern Anal Mach Intell ; 30(7): 1270-81, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18550908

ABSTRACT

Psychophysical studies [9], [17] show that we can recognize objects using fragments of outline contour alone. This paper proposes a new automatic visual recognition system based only on local contour features, capable of localizing objects in space and scale. The system first builds a class-specific codebook of local fragments of contour using a novel formulation of chamfer matching. These local fragments allow recognition that is robust to within-class variation, pose changes, and articulation. Boosting combines these fragments into a cascaded sliding-window classifier, and mean shift is used to select strong responses as a final set of detections. We show how learning can be performed iteratively on both training and test sets to bootstrap an improved classifier. We compare with other methods based on contour and local descriptors in our detailed evaluation over 17 challenging categories, and obtain highly competitive results. The results confirm that contour is indeed a powerful cue for multi-scale and multi-class visual object recognition.
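For readers unfamiliar with chamfer matching, the sketch below shows the plain textbook form: the mean distance from each point of a contour fragment to its nearest edge point in the image. It is not the novel formulation the abstract refers to, and the name `chamfer_distance` and the brute-force nearest-point search are illustrative choices.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def chamfer_distance(template: List[Point], edges: List[Point],
                     offset: Point = (0.0, 0.0)) -> float:
    """Mean nearest-edge distance for a template placed at `offset`."""
    if not template or not edges:
        raise ValueError("template and edges must be non-empty")
    total = 0.0
    for tx, ty in template:
        px, py = tx + offset[0], ty + offset[1]
        # Brute-force nearest edge point; a precomputed distance transform
        # is the usual way to make this lookup constant-time per point.
        total += min(math.hypot(px - ex, py - ey) for ex, ey in edges)
    return total / len(template)


fragment = [(0.0, 0.0), (1.0, 0.0)]             # a two-point contour fragment
image_edges = [(0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
unaligned = chamfer_distance(fragment, image_edges)
aligned = chamfer_distance(fragment, image_edges, offset=(0.0, 1.0))
```

Sliding the fragment over candidate offsets and keeping the low-distance placements gives a basic contour detector. The boosted cascade and mean-shift steps the abstract describes would operate on top of such responses.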


Subject(s)
Algorithms , Artificial Intelligence , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Image Enhancement/methods , Information Storage and Retrieval/methods , Reproducibility of Results , Sensitivity and Specificity