CSMMI: class-specific maximization of mutual information for action and gesture recognition.

Wan, Jun; Athitsos, Vassilis; Jangyodsuk, Pat; Escalante, Hugo Jair; Ruan, Qiuqi; Guyon, Isabelle.

IEEE Trans Image Process ; 23(7): 3152-65, 2014 Jul.

Article in English | MEDLINE | ID: mdl-24983106

ABSTRACT

In this paper, we propose a novel approach called class-specific maximization of mutual information (CSMMI) using a submodular method, which aims at learning a compact and discriminative dictionary for each class. Unlike traditional dictionary-based algorithms, which typically learn a shared dictionary for all of the classes, we unify the intraclass and interclass mutual information (MI) into a single objective function to optimize each class-specific dictionary. The objective function has two aims: 1) maximizing the MI between dictionary items within a specific class (intrinsic structure) and 2) minimizing the MI between the dictionary items in a given class and those of the other classes (extrinsic structure). We significantly reduce the computational complexity of CSMMI by introducing a novel submodular method, which is one of the important contributions of this paper. This paper also contributes a state-of-the-art end-to-end system for action and gesture recognition incorporating CSMMI, comprising feature extraction, learning an initial dictionary for each class by sparse coding, CSMMI via submodularity, and classification based on reconstruction errors. We performed extensive experiments on synthetic data and eight benchmark data sets. Our experimental results show that CSMMI outperforms shared dictionary methods and that our end-to-end system is competitive with other state-of-the-art approaches.
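
The final stage named in the abstract, classification based on reconstruction errors, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the dictionaries are hand-written toy data, and the coding step is simplified to a 1-sparse code (projection onto the single best atom), which is an assumption made to keep the example dependency-free.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def one_sparse_error(x: Vector, dictionary: List[Vector]) -> float:
    """Squared reconstruction error of x using the single best atom.

    Assumption: a 1-sparse code stands in for the sparse coding step,
    which the abstract does not specify in detail.
    """
    best = dot(x, x)  # error of the all-zero code
    for atom in dictionary:
        norm_sq = dot(atom, atom)
        if norm_sq == 0.0:
            continue  # skip degenerate atoms
        coeff = dot(x, atom) / norm_sq  # least-squares coefficient
        err = sum((xi - coeff * ai) ** 2 for xi, ai in zip(x, atom))
        best = min(best, err)
    return best


def classify(x: Vector, class_dictionaries: Dict[str, List[Vector]]) -> str:
    """Assign x to the class whose dictionary reconstructs it best."""
    return min(
        class_dictionaries,
        key=lambda label: one_sparse_error(x, class_dictionaries[label]),
    )


# Toy class-specific dictionaries (hypothetical labels and atoms).
dictionaries = {
    "wave": [[1.0, 0.0], [0.9, 0.1]],
    "punch": [[0.0, 1.0]],
}
label = classify([0.1, 2.0], dictionaries)
```

With these toy dictionaries, the feature `[0.1, 2.0]` lies almost along the single "punch" atom, so that class gives the smallest residual.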

Subject(s)

Algorithms , Artificial Intelligence , Gestures , Image Processing, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Databases, Factual , Humans , Movement , Sports

A unified framework for gesture recognition and spatiotemporal gesture segmentation.

Alon, Jonathan; Athitsos, Vassilis; Yuan, Quan; Sclaroff, Stan.

IEEE Trans Pattern Anal Mach Intell ; 31(9): 1685-99, 2009 Sep.

Article in English | MEDLINE | ID: mdl-19574627

ABSTRACT

Within the context of hand gesture recognition, spatiotemporal gesture segmentation is the task of determining, in a video sequence, where the gesturing hand is located and when the gesture starts and ends. Existing gesture recognition methods typically assume either known spatial segmentation or known temporal segmentation, or both. This paper introduces a unified framework for simultaneously performing spatial segmentation, temporal segmentation, and recognition. In the proposed framework, information flows both bottom-up and top-down. A gesture can be recognized even when the hand location is highly ambiguous and when information about when the gesture begins and ends is unavailable. Thus, the method can be applied to continuous image streams where gestures are performed in front of moving, cluttered backgrounds. The proposed method consists of three novel contributions: a spatiotemporal matching algorithm that can accommodate multiple candidate hand detections in every frame, a classifier-based pruning framework that enables accurate and early rejection of poor matches to gesture models, and a subgesture reasoning algorithm that learns which gesture models can falsely match parts of other longer gestures. The performance of the approach is evaluated on two challenging applications: recognition of hand-signed digits gestured by users wearing short-sleeved shirts, in front of a cluttered background, and retrieval of occurrences of signs of interest in a video database containing continuous, unsegmented signing in American Sign Language (ASL).
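
The idea of matching a gesture model against frames that each carry several candidate hand detections can be sketched as a dynamic-programming alignment. The abstract does not give the recurrence, so what follows is an assumed variant of dynamic time warping in which the state also records which candidate is chosen in each frame; the function name, the 2D point features, and the Euclidean local cost are all illustrative choices, not the paper's.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def spatiotemporal_match_cost(model: List[Point],
                              frames: List[List[Point]]) -> float:
    """Cost of the best alignment of `model` to `frames`.

    Each frame holds one or more candidate hand locations; the alignment
    picks one candidate per matched frame. Assumes every frame has at
    least one candidate.
    """
    m, n = len(model), len(frames)
    # cost[i][j][k]: best cost of aligning model[0..i] with frames[0..j],
    # where model[i] is matched to candidate k of frame j.
    cost = [[[math.inf] * len(frames[j]) for j in range(n)] for _ in range(m)]
    for j in range(n):
        for i in range(m):
            for k, candidate in enumerate(frames[j]):
                local = math.dist(model[i], candidate)
                if i == 0 and j == 0:
                    previous = 0.0
                else:
                    options = []
                    if i > 0:
                        # Same frame, same candidate, previous model state.
                        options.append(cost[i - 1][j][k])
                    if j > 0:
                        # Previous frame, any candidate, same model state.
                        options.append(min(cost[i][j - 1]))
                    if i > 0 and j > 0:
                        # Previous frame and previous model state.
                        options.append(min(cost[i - 1][j - 1]))
                    previous = min(options)
                cost[i][j][k] = local + previous
    return min(cost[m - 1][n - 1])


model = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
# Every frame contains the true hand position plus one distractor.
frames = [
    [(5.0, 5.0), (0.0, 0.0)],
    [(1.0, 1.0), (9.0, 9.0)],
    [(2.0, 2.0), (7.0, 0.0)],
]
total = spatiotemporal_match_cost(model, frames)
```

Because the candidate index is part of the state, a distractor detection in one frame does not force a bad match: the alignment simply routes through the candidate that fits the model.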

Subject(s)

Algorithms , Gestures , Hand/anatomy & histology , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Pattern Recognition, Automated/methods , Sign Language , Artificial Intelligence , Humans , Image Enhancement/methods , Reproducibility of Results , Sensitivity and Specificity

Detecting objects of variable shape structure with hidden state shape models.

Wang, Jingbin; Athitsos, Vassilis; Sclaroff, Stan; Betke, Margrit.

IEEE Trans Pattern Anal Mach Intell ; 30(3): 477-92, 2008 Mar.

Article in English | MEDLINE | ID: mdl-18195441

ABSTRACT

This paper proposes a method for detecting object classes that exhibit variable shape structure in heavily cluttered images. The term "variable shape structure" is used to characterize object classes in which some shape parts can be repeated an arbitrary number of times, some parts can be optional, and some parts can have several alternative appearances. Hidden State Shape Models (HSSMs), a generalization of Hidden Markov Models (HMMs), are introduced to model object classes of variable shape structure using a probabilistic framework. A polynomial inference algorithm automatically determines object location, orientation, scale and structure by finding the globally optimal registration of model states with the image features, even in the presence of clutter. Experiments with real images demonstrate that the proposed method can localize objects of variable shape structure with high accuracy. For the task of hand shape localization and structure identification, the proposed method is significantly more accurate than previously proposed methods based on chamfer-distance matching. Furthermore, by integrating simple temporal constraints, the proposed method gains speed-ups of more than an order of magnitude, and produces highly accurate results in experiments on non-rigid hand motion tracking.

Subject(s)

Artificial Intelligence , Biometry/methods , Hand/anatomy & histology , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Models, Statistical , Pattern Recognition, Automated/methods , Algorithms , Computer Simulation , Data Interpretation, Statistical , Humans , Information Storage and Retrieval/methods , Markov Chains , Reproducibility of Results , Sensitivity and Specificity

BoostMap: an embedding method for efficient nearest neighbor retrieval.

Athitsos, Vassilis; Alon, Jonathan; Sclaroff, Stan; Kollios, George.

IEEE Trans Pattern Anal Mach Intell ; 30(1): 89-104, 2008 Jan.

Article in English | MEDLINE | ID: mdl-18000327

ABSTRACT

This paper describes BoostMap, a method for efficient nearest neighbor retrieval under computationally expensive distance measures. Database and query objects are embedded into a vector space, in which distances can be measured efficiently. Each embedding is treated as a classifier that predicts for any three objects X, A, B whether X is closer to A or to B. It is shown that a linear combination of such embedding-based classifiers naturally corresponds to an embedding and a distance measure. Based on this property, the BoostMap method reduces the problem of embedding construction to the classical boosting problem of combining many weak classifiers into an optimized strong classifier. The classification accuracy of the resulting strong classifier is a direct measure of the amount of nearest neighbor structure preserved by the embedding. An important property of BoostMap is that the embedding optimization criterion is equally valid in both metric and non-metric spaces. Performance is evaluated in databases of hand images, handwritten digits, and time series. In all cases, BoostMap significantly improves retrieval efficiency with small losses in accuracy compared to brute-force search. Moreover, BoostMap significantly outperforms existing nearest neighbor retrieval methods, such as Lipschitz embeddings, FastMap, and VP-trees.
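
The correspondence between a linear combination of embedding-based classifiers and a distance measure can be checked on a small example. The sketch below is illustrative only: it uses one-dimensional embeddings defined by the distance to a reference object, fixes the weights by hand instead of learning them by boosting, and uses a toy stand-in for the expensive distance. All function names are hypothetical.

```python
from typing import Callable, List

Embedding = Callable[[float], float]


def toy_distance(x: float, y: float) -> float:
    """Stand-in for an expensive distance measure (an assumption)."""
    return abs(x - y) ** 0.5


def reference_embedding(ref: float) -> Embedding:
    """1D embedding: map an object to its distance from a reference object."""
    return lambda x: toy_distance(x, ref)


def triple_classifier(f: Embedding, q: float, a: float, b: float) -> float:
    """Positive when q is closer to a than to b under embedding f."""
    return abs(f(q) - f(b)) - abs(f(q) - f(a))


def combined_classifier(fs: List[Embedding], alphas: List[float],
                        q: float, a: float, b: float) -> float:
    """Weighted vote of the per-embedding classifiers."""
    return sum(w * triple_classifier(f, q, a, b) for f, w in zip(fs, alphas))


def weighted_l1(fs: List[Embedding], alphas: List[float],
                x: float, y: float) -> float:
    """Weighted L1 distance between the multi-dimensional embeddings."""
    return sum(w * abs(f(x) - f(y)) for f, w in zip(fs, alphas))


embeddings = [reference_embedding(0.0), reference_embedding(5.0)]
weights = [0.7, 0.3]  # hand-picked here; boosting would learn these
q, a, b = 1.0, 2.0, 9.0

vote = combined_classifier(embeddings, weights, q, a, b)
gap = (weighted_l1(embeddings, weights, q, b)
       - weighted_l1(embeddings, weights, q, a))
```

Here `vote` and `gap` agree up to floating-point rounding: the weighted vote says q is closer to a than to b exactly when the weighted L1 distance in the embedded space says so, which is why improving the combined classifier also improves the embedding.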

Subject(s)

Algorithms , Artificial Intelligence , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Pattern Recognition, Automated/methods , Subtraction Technique , Reproducibility of Results , Sensitivity and Specificity

Skin color-based video segmentation under time-varying illumination.

Sigal, Leonid; Sclaroff, Stan; Athitsos, Vassilis.

IEEE Trans Pattern Anal Mach Intell ; 26(7): 862-77, 2004 Jul.

Article in English | MEDLINE | ID: mdl-18579945

ABSTRACT

A novel approach for real-time skin segmentation in video sequences is described. The approach enables reliable skin segmentation despite wide variation in illumination during tracking. An explicit second order Markov model is used to predict evolution of the skin-color (HSV) histogram over time. Histograms are dynamically updated based on feedback from the current segmentation and predictions of the Markov model. The evolution of the skin-color distribution at each frame is parameterized by translation, scaling, and rotation in color space. Consequent changes in geometric parameterization of the distribution are propagated by warping and resampling the histogram. The parameters of the discrete-time dynamic Markov model are estimated using Maximum Likelihood Estimation and also evolve over time. The accuracy of the new dynamic skin color segmentation algorithm is compared to that obtained via a static color model. Segmentation accuracy is evaluated using labeled ground-truth video sequences taken from staged experiments and popular movies. An overall increase in segmentation accuracy of up to 24 percent is observed in 17 out of 21 test sequences. In all but one case, the skin-color classification rates for our system were higher, with background classification rates comparable to those of the static segmentation.

Subject(s)

Color , Colorimetry/methods , Image Interpretation, Computer-Assisted/methods , Lighting/methods , Pattern Recognition, Automated/methods , Skin Physiological Phenomena , Video Recording/methods , Algorithms , Artificial Intelligence , Humans , Image Enhancement/methods
