Results 1 - 14 of 14
1.
Entropy (Basel) ; 24(4)2022 Mar 25.
Article in English | MEDLINE | ID: mdl-35455120

ABSTRACT

This work proposes a new computational framework for learning a structured generative model for real-world datasets. In particular, we propose to learn a Closed-loop Transcription between a multi-class, multi-dimensional data distribution and a Linear discriminative representation (CTRL) in the feature space, where the feature space consists of multiple independent multi-dimensional linear subspaces. We argue that the optimal encoding and decoding mappings sought can be formulated as a two-player minimax game between the encoder and decoder over the learned representation. A natural utility function for this game is the so-called rate reduction, a simple information-theoretic measure of distances between mixtures of subspace-like Gaussians in the feature space. Our formulation draws inspiration from closed-loop error feedback in control systems and avoids the expensive evaluation and minimization of approximated distances between arbitrary distributions in either the data space or the feature space. To a large extent, this new formulation unifies the concepts and benefits of auto-encoding and GANs and naturally extends them to the setting of learning a representation that is both discriminative and generative for multi-class, multi-dimensional real-world data. Our extensive experiments on many benchmark imagery datasets demonstrate the tremendous potential of this new closed-loop formulation: under fair comparison, the visual quality of the learned decoder and the classification performance of the encoder are competitive with, and arguably better than, existing methods based on GANs, VAEs, or a combination of both. Unlike existing generative models, the learned features of the multiple classes are structured rather than hidden: different classes are explicitly mapped onto corresponding independent principal subspaces in the feature space, and diverse visual attributes within each class are modeled by the independent principal components within each subspace.
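The rate-reduction utility at the heart of this game can be made concrete in a few lines. The sketch below is an illustrative NumPy implementation, not the authors' code; `eps` is an assumed quantization precision, and hard class labels stand in for the paper's membership matrices:

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Rate R(Z): bits needed to encode the columns of Z (d x n) up to precision eps."""
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)
    return 0.5 * logdet

def rate_reduction(Z, labels, eps=0.5):
    """Delta R = R(Z) - sum_j (n_j/n) R(Z_j): rate of the whole feature set
    minus the label-weighted rates of the per-class parts."""
    n = Z.shape[1]
    r_whole = coding_rate(Z, eps)
    r_parts = sum((np.sum(labels == c) / n) * coding_rate(Z[:, labels == c], eps)
                  for c in np.unique(labels))
    return r_whole - r_parts
```

Rate reduction is maximized when the classes occupy independent subspaces, which is exactly the structured representation the paper seeks.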

2.
IEEE Trans Pattern Anal Mach Intell ; 33(10): 1991-2001, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21646678

ABSTRACT

State-of-the-art image retrieval systems achieve scalability by using a bag-of-words representation and textual retrieval methods, but their performance degrades quickly in the face image domain, mainly because they produce visual words with low discriminative power for face images and ignore the special properties of faces. The leading features for face recognition can achieve good retrieval performance, but these features are not suitable for inverted indexing because they are high-dimensional and global, and thus not scalable in either computational or storage cost. In this paper, we aim to build a scalable face image retrieval system. For this purpose, we develop a new scalable face representation using both local and global features. In the indexing stage, we exploit special properties of faces to design new component-based local features, which are subsequently quantized into visual words using a novel identity-based quantization scheme. We also use a very small Hamming signature (40 bytes) to encode the discriminative global feature for each face. In the retrieval stage, candidate images are first retrieved from the inverted index of visual words. We then use a new multireference distance to rerank the candidate images using the Hamming signature. On a one-million-face database, we show that our local features and global Hamming signatures are complementary: the inverted index based on local features provides candidate images with good recall, while multireference reranking with the global Hamming signature leads to good precision. As a result, our system is not only scalable but also outperforms a linear-scan retrieval system using the state-of-the-art face recognition feature in terms of retrieval quality.
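A 40-byte signature corresponds to 320 bits. The sketch below shows how such a signature might be built and used for reranking; binarization by sign and an averaged "multireference" distance are our assumptions, as the abstract does not specify the exact encoding:

```python
import numpy as np

def signature(feature_vec):
    """Binarize a global feature by sign and pack it into a 40-byte signature."""
    bits = (np.asarray(feature_vec)[:320] > 0).astype(np.uint8)
    return np.packbits(bits)              # 320 bits -> 40 bytes

def hamming(a, b):
    """Bit-level Hamming distance between two packed signatures."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

def rerank(candidates, query_sig, ref_sigs):
    """Rerank inverted-index candidates by the average Hamming distance to the
    query signature and a set of reference signatures (details assumed)."""
    refs = [query_sig] + list(ref_sigs)
    return sorted(candidates,
                  key=lambda c: sum(hamming(c['sig'], r) for r in refs) / len(refs))
```

Because the signatures are tiny and the distance is a popcount, reranking thousands of candidates per query is cheap even at million-image scale.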


Subject(s)
Biometric Identification/methods , Face/anatomy & histology , Image Processing, Computer-Assisted/methods , Information Storage and Retrieval/methods , Humans
3.
IEEE Trans Pattern Anal Mach Intell ; 33(2): 353-67, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21193811

ABSTRACT

In this paper, we study the salient object detection problem for images. We formulate this problem as a binary labeling task in which we separate the salient object from the background. We propose a set of novel features, including multiscale contrast, center-surround histogram, and color spatial distribution, to describe a salient object locally, regionally, and globally. A conditional random field is learned to effectively combine these features for salient object detection. Further, we extend the proposed approach to detect a salient object in sequential images by introducing dynamic salient features. We collected a large image database containing tens of thousands of images carefully labeled by multiple users, as well as a video segment database, and conducted a set of experiments on them to demonstrate the effectiveness of the proposed approach.
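The multiscale-contrast feature can be sketched as accumulating, over a coarse image pyramid, how much each pixel deviates from its surroundings. This toy version uses decimation for the pyramid and a global mean as a crude stand-in for the local window mean the paper uses; it assumes square images with power-of-two sides and a non-constant image:

```python
import numpy as np

def multiscale_contrast(img, levels=3):
    """Per-pixel contrast summed over pyramid levels: at each level, the squared
    difference between a pixel and the level mean (a crude stand-in for a local
    neighborhood mean), upsampled back to full resolution and accumulated."""
    contrast = np.zeros_like(img, dtype=float)
    cur = img.astype(float)
    for _ in range(levels):
        c = (cur - cur.mean()) ** 2
        scale = img.shape[0] // cur.shape[0]
        contrast += np.kron(c, np.ones((scale, scale)))   # upsample and accumulate
        cur = cur[::2, ::2]                               # next (coarser) level
    return contrast / contrast.max()                      # assumes a non-constant image
```

High values concentrate on regions that stand out from their background at several scales, which is the cue the learned CRF combines with the other features.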

4.
IEEE Trans Image Process ; 20(6): 1529-42, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21118774

ABSTRACT

In this paper, we propose a novel generic image prior, the gradient profile prior, which encodes prior knowledge of natural image gradients. In this prior, image gradients are represented by gradient profiles, which are 1-D profiles of gradient magnitudes perpendicular to image structures. We model the gradient profiles with a parametric gradient profile model. Using this model, the prior knowledge of gradient profiles, which we call the gradient profile prior, is learned from a large collection of natural images. Based on this prior, we propose a gradient field transformation to constrain the gradient fields of the high-resolution image and the enhanced image when performing single-image super-resolution and sharpness enhancement. With this simple but very effective approach, we are able to produce state-of-the-art results. The reconstructed high-resolution images and the enhanced images are sharp while exhibiting few ringing or jaggy artifacts.
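The parametric profile model can be illustrated with a generalized Gaussian, a common shape family for gradient profiles; the exponent value 1.6 and the transform-by-ratio step below are assumptions for illustration, not quoted from the paper:

```python
import numpy as np

def profile(x, sigma, lam=1.6):
    """Generalized-Gaussian model of gradient magnitude vs. distance x from an edge."""
    return np.exp(-np.abs(x / sigma) ** lam)

def transform_gradient(grad, x, sigma_lr, sigma_hr, lam=1.6):
    """Scale an observed (blurry) gradient profile by the ratio of a sharp target
    profile to the estimated blurry one: a sketch of the gradient field transform."""
    ratio = profile(x, sigma_hr, lam) / profile(x, sigma_lr, lam)
    return grad * ratio
```

Multiplying by the profile ratio narrows the gradient profile around the edge, which is what produces sharp results without ringing.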


Subject(s)
Algorithms , Artifacts , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Reproducibility of Results , Sensitivity and Specificity
5.
IEEE Trans Pattern Anal Mach Intell ; 28(8): 1341-6, 2006 Aug.
Article in English | MEDLINE | ID: mdl-16886868

ABSTRACT

In documents, tables are important structured objects that present statistical and relational information. In this paper, we present a robust system that is capable of detecting tables in free-style online ink notes and extracting their structure so that they can be further edited in multiple ways. First, the primitive structure of tables, i.e., candidates for ruling lines and table bounding boxes, is detected among drawing strokes. Second, the logical structure of tables is determined by normalizing the table skeletons, identifying the skeleton structure, and extracting the cell contents. The detection process resembles a decision tree, so invalid candidates can be ruled out quickly. Experimental results suggest that our system is robust and accurate in dealing with tables that have complex structure or are drawn in complex situations.
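A first test in such a decision-tree-like process might check whether an ink stroke is straight enough to be a ruling-line candidate. The filter below is hypothetical (the tolerance, the endpoint-line criterion, and all names are ours), but it shows the shape of a cheap early rejection step:

```python
import numpy as np

def is_ruling_line_candidate(stroke, tol=0.05):
    """Flag an ink stroke (sequence of (x, y) points) as a ruling-line candidate
    if every point lies close to the straight line joining its endpoints;
    the tolerance is relative to the stroke's endpoint distance."""
    p = np.asarray(stroke, float)
    a, b = p[0], p[-1]
    length = np.linalg.norm(b - a)
    if length == 0:
        return False
    # perpendicular distance of each point from the endpoint-to-endpoint line
    d = np.abs((b - a)[0] * (p - a)[:, 1] - (b - a)[1] * (p - a)[:, 0]) / length
    return bool(d.max() <= tol * length)
```

Strokes failing the test are discarded immediately, so only plausible ruling lines reach the more expensive skeleton analysis.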


Subject(s)
Artificial Intelligence , Computer Graphics , Documentation/methods , Electronic Data Processing/methods , Handwriting , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Algorithms , Image Enhancement/methods , Information Storage and Retrieval/methods , Online Systems , Statistics as Topic , User-Computer Interface
6.
IEEE Trans Pattern Anal Mach Intell ; 28(7): 1150-63, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16792103

ABSTRACT

Video stabilization is an important video enhancement technology that aims to remove annoying shaky motion from videos. We propose a practical and robust approach to video stabilization that produces full-frame stabilized videos with good visual quality. While most previous methods end up producing smaller-size stabilized videos, our completion method produces full-frame videos by naturally filling in missing image parts through local alignment of image data from neighboring frames. To achieve this, motion inpainting is proposed to enforce spatial and temporal consistency of the completion in both static and dynamic image areas. In addition, image quality in the stabilized video is enhanced with a new practical deblurring algorithm. Instead of estimating point spread functions, our method transfers and interpolates sharper image pixels from neighboring frames to increase the sharpness of the frame. The proposed video completion and deblurring methods enable us to develop a complete video stabilizer that naturally preserves the original image quality in the stabilized videos. The effectiveness of our method is confirmed by extensive experiments over a wide variety of videos.
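The smoothing step of a stabilizer can be sketched as low-pass filtering the cumulative camera path; the paper's contribution lies in what happens afterwards (motion inpainting of the exposed borders and deblurring), which is not modeled here. The moving-average choice and all names are ours:

```python
import numpy as np

def stabilize_path(frame_shifts, radius=2):
    """Smooth a cumulative camera path (per-frame 2-D shifts) with a moving
    average and return the compensating offset to warp each frame by.
    The borders this warp exposes are what motion inpainting must fill."""
    path = np.cumsum(np.asarray(frame_shifts, float), axis=0)   # camera trajectory
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    smooth = np.stack([np.convolve(path[:, k], kernel, mode='same')
                       for k in range(2)], axis=1)
    return smooth - path          # offset that moves each frame onto the smooth path
```

Applying the returned offsets yields a smooth trajectory but leaves empty margins in each frame, which is exactly why a full-frame method needs completion.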


Subject(s)
Algorithms , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Paintings , Photography/methods , Video Recording/methods , Artifacts , Computer Graphics , Motion , Numerical Analysis, Computer-Assisted , Reproducibility of Results , Sensitivity and Specificity , Signal Processing, Computer-Assisted , Subtraction Technique
7.
IEEE Trans Vis Comput Graph ; 12(1): 48-60, 2006.
Article in English | MEDLINE | ID: mdl-16382607

ABSTRACT

Expression mapping (also called performance driven animation) has been a popular method for generating facial animations. A shortcoming of this method is that it does not generate expression details such as the wrinkles due to skin deformations. In this paper, we provide a solution to this problem. We have developed a geometry-driven facial expression synthesis system. Given feature point positions (the geometry) of a facial expression, our system automatically synthesizes a corresponding expression image that includes photorealistic and natural looking expression details. Due to the difficulty of point tracking, the number of feature points required by the synthesis system is, in general, more than what is directly available from a performance sequence. We have developed a technique to infer the missing feature point motions from the tracked subset by using an example-based approach. Another application of our system is expression editing where the user drags feature points while the system interactively generates facial expressions with skin deformation details.
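The example-based inference of missing feature points can be sketched as a least-squares blend: solve for weights that reproduce the tracked subset from the example expressions, then apply the same weights to the full point set. The matrix shapes and the plain least-squares fit are our simplification of the paper's example-based approach:

```python
import numpy as np

def infer_missing_points(examples_tracked, examples_all, observed_tracked):
    """Express the observed (tracked) feature points as a least-squares blend of
    example expressions, then apply the same blend weights to the full example
    point sets to recover the untracked points."""
    # columns = example expressions; rows = stacked point coordinates
    w, *_ = np.linalg.lstsq(examples_tracked, observed_tracked, rcond=None)
    return examples_all @ w
```

When the tracked subset determines the blend weights well, the recovered full set carries the fine geometry needed to synthesize expression details such as wrinkles.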


Subject(s)
Computer Graphics , Face/anatomy & histology , Facial Expression , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Photography/methods , User-Computer Interface , Algorithms , Humans , Imaging, Three-Dimensional/methods , Pattern Recognition, Automated/methods , Video Recording/methods
8.
IEEE Trans Vis Comput Graph ; 11(5): 519-28, 2005.
Article in English | MEDLINE | ID: mdl-16144249

ABSTRACT

We present a system for decorating arbitrary surfaces with bidirectional texture functions (BTFs). Our system generates BTFs in two steps. First, we automatically synthesize a BTF over the target surface from a given BTF sample. Then, we let the user interactively paint BTF patches onto the surface such that the painted patches seamlessly integrate with the background patterns. Our system is based on a patch-based texture synthesis approach known as quilting. We present a graphcut algorithm for BTF synthesis on surfaces; the algorithm works well for a wide variety of BTF samples, including those that are problematic for existing algorithms. We also describe a graphcut texture painting algorithm for creating new surface imperfections (e.g., dirt, cracks, scratches) from existing imperfections found in input BTF samples. Using these algorithms, we can decorate surfaces with real-world textures that have spatially variant reflectance, fine-scale geometry details, and surface imperfections. A particularly attractive feature of BTF painting is that it allows us to capture imperfections of real materials and paint them onto geometry models. We demonstrate the effectiveness of our system with examples.
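The quilting foundation can be illustrated by its 1-D special case: a minimum-error boundary cut through the patch overlap, computed by dynamic programming. The paper's graph cut generalizes this to arbitrary patch boundaries on surfaces; the sketch below is the classic image-quilting seam, not the authors' algorithm:

```python
import numpy as np

def min_error_seam(err):
    """Minimum-cost top-to-bottom seam through an overlap-error map, by dynamic
    programming: each cell adds the cheapest of its three upper neighbors."""
    h, w = err.shape
    cost = err.astype(float).copy()
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 2, w)
            cost[i, j] += cost[i - 1, lo:hi].min()
    # backtrack from the cheapest bottom cell
    seam = [int(np.argmin(cost[-1]))]
    for i in range(h - 2, -1, -1):
        j = seam[-1]
        lo = max(j - 1, 0)
        seam.append(lo + int(np.argmin(cost[i, lo:min(j + 2, w)])))
    return seam[::-1]
```

Cutting along the seam hides the patch transition where the two patches already agree; graph cut does the same for 2-D boundary regions of irregular shape.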


Subject(s)
Algorithms , Computer Graphics , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Paintings , Information Storage and Retrieval/methods , Surface Properties
9.
IEEE Trans Pattern Anal Mach Intell ; 27(2): 271-7, 2005 Feb.
Article in English | MEDLINE | ID: mdl-15688564

ABSTRACT

This paper presents a novel and simple method of analyzing the motion of a large image sequence captured by a calibrated outward-looking video camera moving on a circular trajectory for large-scale environment applications. Previous circular motion algorithms mainly focus on inward-looking turntable-like setups. They are not suitable for outward-looking motion where the conic trajectory of corresponding points degenerates to straight lines. The circular motion of a calibrated camera essentially has only one unknown rotation angle for each frame. The motion recovery for the entire sequence computes only one fundamental matrix of a pair of frames to extract the angular motion of the pair using Laguerre's formula and then propagates the computation of the unknown rotation angles to the other frames by tracking one point over at least three frames. Finally, a maximum-likelihood estimation is developed for the optimization of the whole sequence. Extensive experiments demonstrate the validity of the method and the feasibility of the application in image-based rendering.


Subject(s)
Algorithms , Artificial Intelligence , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Photography/methods , Video Recording/methods , Cluster Analysis , Motion , Reproducibility of Results , Sensitivity and Specificity , Subtraction Technique
10.
IEEE Trans Vis Comput Graph ; 11(1): 25-34, 2005.
Article in English | MEDLINE | ID: mdl-15631126

ABSTRACT

We present a 2D feature-based technique for morphing 3D objects represented by light fields. Existing light field morphing methods require the user to specify corresponding 3D feature elements to guide morph computation. Since slight errors in 3D specification can lead to significant morphing artifacts, we propose a scheme based on 2D feature elements that is less sensitive to imprecise marking of features. First, 2D features are specified by the user in a number of key views in the source and target light fields. Then the two light fields are warped view by view as guided by the corresponding 2D features. Finally, the two warped light fields are blended together to yield the desired light field morph. Two key issues in light field morphing are feature specification and warping of light field rays. For feature specification, we introduce a user interface for delineating 2D features in key views of a light field, which are automatically interpolated to other views. For ray warping, we describe a 2D technique that accounts for visibility changes and present a comparison to the ideal morphing of light fields. Light field morphing based on 2D features makes it simple to incorporate previous image morphing techniques such as nonuniform blending, as well as to morph between an image and a light field.


Subject(s)
Algorithms , Computer Graphics , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Pattern Recognition, Automated/methods , Signal Processing, Computer-Assisted , Computer Simulation , Metamorphosis, Biological , Online Systems
11.
IEEE Trans Pattern Anal Mach Intell ; 26(2): 275-80, 2004 Feb.
Article in English | MEDLINE | ID: mdl-15376903

ABSTRACT

Self-calibration using pure rotation is a well-known technique and has been shown to be a reliable means for recovering intrinsic camera parameters. However, in practice, it is virtually impossible to ensure that the camera motion for this type of self-calibration is a pure rotation. In this paper, we present an error analysis of the recovered intrinsic camera parameters in the presence of translation. We derive closed-form error expressions for a single pair of images with nondegenerate motion; for multiple rotations, for which there are no closed-form solutions, the analysis is carried out through repeated experiments. Among other results, we show that translation-independent solutions do exist under certain practical conditions. Our analysis can be used to help choose the least error-prone approach (if multiple approaches exist) for a given set of conditions.
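The reason pure rotation is so convenient for self-calibration is that the induced image-to-image map H = K R K^-1 is independent of scene depth; any translation makes the map depth-dependent, which is the error source this paper analyzes. A small sketch (the intrinsics K and rotation R below are illustrative values, not from the paper):

```python
import numpy as np

def rotation_homography(K, R):
    """Inter-image homography of a purely rotating camera: H = K R K^-1.
    It maps projections correctly for every scene depth, which is what
    rotation-only self-calibration of K relies on."""
    return K @ R @ np.linalg.inv(K)

def project(K, X):
    """Pinhole projection of a 3-D point to inhomogeneous pixel coordinates."""
    x = K @ X
    return x[:2] / x[2]
```

Verifying that H transfers points consistently at very different depths is a quick sanity check that the motion really was (close to) a pure rotation.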


Subject(s)
Algorithms , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Photography/methods , Subtraction Technique , Calibration/standards , Data Interpretation, Statistical , Image Enhancement/instrumentation , Image Enhancement/standards , Image Interpretation, Computer-Assisted/standards , Imaging, Three-Dimensional/instrumentation , Imaging, Three-Dimensional/standards , Information Storage and Retrieval/methods , Information Storage and Retrieval/standards , Pattern Recognition, Automated , Photography/standards , Reproducibility of Results , Rotation , Sensitivity and Specificity , Signal Processing, Computer-Assisted
12.
IEEE Trans Pattern Anal Mach Intell ; 26(1): 45-62, 2004 Jan.
Article in English | MEDLINE | ID: mdl-15382685

ABSTRACT

A new approach to computing a panoramic (360-degree) depth map is presented in this paper. Our approach uses a large collection of images taken by a camera whose motion has been constrained to planar concentric circles. We resample regular perspective images to produce a set of multiperspective panoramas and then compute depth maps directly from these resampled panoramas. Our panoramas sample uniformly in three dimensions: rotation angle, inverse radial distance, and vertical elevation. The use of multiperspective panoramas eliminates the limited overlap present in the original input images and thus avoids the problems that arise in conventional multibaseline stereo. Our approach differs from stereo matching of single-perspective panoramic images taken from different locations, where the epipolar constraints are sine curves. For our multiperspective panoramas, the epipolar geometry, to a first-order approximation, consists of horizontal lines. Therefore, any traditional stereo algorithm can be applied to multiperspective panoramas with little modification. In this paper, we describe two reconstruction algorithms. The first is a cylinder sweep algorithm that uses a small number of resampled multiperspective panoramas to obtain dense 3D reconstruction. The second algorithm, in contrast, uses a large number of multiperspective panoramas and takes advantage of the approximately horizontal epipolar geometry inherent in multiperspective panoramas. It comprises a novel and efficient 1D multibaseline matching technique, followed by tensor voting to extract the depth surface. Experiments show that our algorithms are capable of producing comparably high-quality depth maps, which can be used for applications such as view interpolation.
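Because the epipolar geometry is approximately horizontal lines, matching reduces to a 1-D search along image rows, so a standard scanline matcher applies almost unchanged. A brute-force SSD sketch (window size, disparity range, and all names are our choices, not the paper's 1-D multibaseline method):

```python
import numpy as np

def scanline_disparity(left, right, row, patch=3, max_disp=8):
    """Brute-force SSD matching along one image row: for each pixel, find the
    horizontal offset d whose patch in the right image best matches the left
    patch. Valid here because the epipolar lines are (nearly) horizontal."""
    half = patch // 2
    h, w = left.shape
    disp = np.zeros(w, dtype=int)
    for x in range(half, w - half):
        ref = left[row - half:row + half + 1, x - half:x + half + 1]
        best, best_d = np.inf, 0
        for d in range(min(max_disp, x - half) + 1):
            cand = right[row - half:row + half + 1, x - d - half:x - d + half + 1]
            ssd = np.sum((ref.astype(float) - cand) ** 2)
            if ssd < best:
                best, best_d = ssd, d
        disp[x] = best_d
    return disp
```

A multibaseline version aggregates such scores across many panoramas before choosing d, which is what makes the matching robust enough for tensor voting to extract a clean depth surface.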


Subject(s)
Algorithms , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Pattern Recognition, Automated , Photogrammetry/methods , Signal Processing, Computer-Assisted , Subtraction Technique , Artificial Intelligence , Computer Graphics , Computer Simulation , Depth Perception , Image Enhancement/methods , Information Storage and Retrieval/methods , Numerical Analysis, Computer-Assisted , Reproducibility of Results , Sensitivity and Specificity
13.
IEEE Trans Pattern Anal Mach Intell ; 26(1): 83-97, 2004 Jan.
Article in English | MEDLINE | ID: mdl-15382688

ABSTRACT

Superresolution is a technique that can produce images of a higher resolution than that of the originally captured ones. Nevertheless, the improvement in resolution achievable with such techniques is very limited in practice. This makes it important to study the question: "Do fundamental limits exist for superresolution?" In this paper, we focus on a major class of superresolution algorithms, called reconstruction-based algorithms, which compute high-resolution images by simulating the image formation process. Assuming local translation among the low-resolution images, this paper is the first attempt to determine the explicit limits of reconstruction-based algorithms, under both real and synthetic conditions. Based on the perturbation theory of linear systems, we obtain the superresolution limits from a conditioning analysis of the coefficient matrix. Moreover, we determine the number of low-resolution images that is sufficient to achieve the limit. Both real and synthetic experiments are carried out to verify our analysis.
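The style of argument, perturbation theory applied to the coefficient matrix, can be sketched generically: with A the matrix mapping the unknown high-resolution signal to the stacked low-resolution observations, the relative reconstruction error is bounded by cond(A) times the relative observation error. The matrix below is a random full-rank stand-in, not the actual imaging model (which encodes blur, translation, and decimation):

```python
import numpy as np

# A stands in for the SR coefficient matrix; a random full-rank matrix is
# enough to demonstrate the conditioning bound that drives the SR limits.
rng = np.random.default_rng(0)
A = rng.normal(size=(12, 8))
x_true = rng.normal(size=8)                  # unknown high-res signal
b = A @ x_true                               # ideal stacked low-res observations
noise = 1e-6 * rng.normal(size=12)           # observation perturbation

x_hat, *_ = np.linalg.lstsq(A, b + noise, rcond=None)
kappa = np.linalg.cond(A)                    # conditioning of the coefficient matrix
rel_x = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
rel_b = np.linalg.norm(noise) / np.linalg.norm(b)
# perturbation bound: rel_x <= kappa * rel_b
```

The paper's analysis shows how kappa grows with the magnification factor for the real imaging matrix, which is what caps the achievable resolution gain.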


Subject(s)
Algorithms , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated , Signal Processing, Computer-Assisted , Subtraction Technique , Artificial Intelligence , Computer Graphics , Computer Simulation , Imaging, Three-Dimensional/methods , Linear Models , Numerical Analysis, Computer-Assisted , Reproducibility of Results , Sensitivity and Specificity
14.
IEEE Trans Vis Comput Graph ; 10(3): 278-89, 2004.
Article in English | MEDLINE | ID: mdl-18579959

ABSTRACT

The bidirectional texture function (BTF) is a 6D function that describes the appearance of a real-world surface as a function of lighting and viewing directions. The BTF can model the fine-scale shadows, occlusions, and specularities caused by surface mesostructures. In this paper, we present algorithms for efficient synthesis of BTFs on arbitrary surfaces and for hardware-accelerated rendering. For both synthesis and rendering, a main challenge is handling the large amount of data in a BTF sample. To address this challenge, we approximate the BTF sample by a small number of 4D point appearance functions (PAFs) multiplied by 2D geometry maps. The geometry maps and PAFs lead to efficient synthesis and fast rendering of BTFs on arbitrary surfaces. For synthesis, a surface BTF can be generated by applying a texton-based synthesis algorithm to a small set of 2D geometry maps while leaving the companion 4D PAFs untouched. As for rendering, a surface BTF synthesized using geometry maps is well suited for leveraging the programmable vertex and pixel shaders on graphics hardware. We present a real-time BTF rendering algorithm that runs at about 30 frames/second on a mid-level PC with an ATI Radeon 8500 graphics card. We demonstrate the effectiveness of our synthesis and rendering algorithms using both real and synthetic BTF samples.
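The data-reduction step, approximating the BTF sample by a few appearance functions multiplied by geometry maps, is in spirit a low-rank factorization. A truncated-SVD sketch (the paper's PAF/geometry-map construction is geometry-aware, not a plain SVD; the flattening, shapes, and names here are our assumptions):

```python
import numpy as np

def factor_btf(btf_matrix, k):
    """Approximate a flattened BTF sample (rows: texels, cols: view-light samples)
    by k rank-1 terms: per-texel 'geometry map' weights times k shared
    'appearance function' rows, via truncated SVD."""
    U, s, Vt = np.linalg.svd(btf_matrix, full_matrices=False)
    geometry_maps = U[:, :k] * s[:k]       # per-texel weights, one column per term
    pafs = Vt[:k]                          # shared appearance functions
    return geometry_maps, pafs
```

Storing k small maps plus k appearance functions instead of the full 6-D sample is what makes both synthesis and shader-based rendering tractable.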


Subject(s)
Algorithms , Computer Graphics , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , User-Computer Interface , Reproducibility of Results , Sensitivity and Specificity