Results 1 - 20 of 20
1.
Sci Adv ; 10(2): eadj3608, 2024 Jan 12.
Article in English | MEDLINE | ID: mdl-38198551

ABSTRACT

Embedded sensors in smart devices pose privacy risks, often unintentionally leaking user information. We investigate how combining an ambient light sensor with a device display can capture an image of touch interaction without a camera. By displaying a known video sequence, we use the light sensor to capture variations in reflected light intensity as the touching hand partially blocks the screen, formulating an inverse problem similar to single-pixel imaging. Because of the sensor's heavy quantization and low sensitivity, we propose an inversion algorithm involving an ℓp-norm dequantizer and a deep denoiser as a natural image prior to reconstruct images from the screen's perspective. We demonstrate eavesdropping on touch interactions and hand gestures using an off-the-shelf Android tablet. Despite limitations in resolution and speed, we aim to raise awareness of potential security/privacy threats induced by the combination of passive and active components in smart devices and to promote the development of ways to mitigate them.
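
The reconstruction is a single-pixel-imaging inverse problem: each sensor reading is approximately an inner product between a known display pattern and the unknown transmission (occlusion) map. A minimal numpy sketch under toy assumptions (a linear sensor, random binary display frames, 4-bit quantization, and Tikhonov-regularized least squares standing in for the paper's ℓp-norm dequantizer and deep denoiser):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16 * 16                     # unknown occlusion mask, 16x16 pixels (toy)
m = 400                         # number of displayed frames / sensor readings

# Ground-truth mask: a hand-like blob blocking part of the screen.
x_true = np.zeros((16, 16))
x_true[4:12, 6:11] = 1.0
x_true = x_true.ravel()

# Known display patterns: each row is one frame shown on the screen.
A = rng.integers(0, 2, size=(m, n)).astype(float)

# Sensor model: inner product with the unblocked light, then 4-bit quantization.
y = A @ (1.0 - x_true)
y_q = np.round(y / y.max() * 15) / 15 * y.max()

# Tikhonov-regularized least squares in place of the paper's structured priors.
lam = 1.0
transmission = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y_q)
x_hat = np.clip(1.0 - transmission, 0.0, 1.0).reshape(16, 16)
print("reconstruction error:", np.abs(x_hat.ravel() - x_true).mean())
```

The heavier quantization and the structured image prior are precisely what separate this toy from the paper's algorithm.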

2.
IEEE Trans Pattern Anal Mach Intell ; 43(12): 4229-4241, 2021 Dec.
Article in English | MEDLINE | ID: mdl-32078534

ABSTRACT

We present a method for predicting dense depth in scenarios where both a monocular camera and people in the scene are freely moving. Existing methods for recovering depth for dynamic, non-rigid objects from monocular video impose strong assumptions on the objects' motion and may recover only sparse depth. In this paper, we take a data-driven approach and learn human depth priors from a new source of data: thousands of Internet videos of people imitating mannequins, i.e., freezing in diverse, natural poses while a hand-held camera tours the scene. Because the people are stationary, multi-view geometric constraints hold, so training data can be generated using multi-view stereo reconstruction. At inference time, our method uses motion parallax cues from the static areas of the scene to guide the depth prediction. We evaluate our method on real-world sequences of complex human actions captured by a moving hand-held camera, show improvement over state-of-the-art monocular depth prediction methods, and demonstrate various 3D effects produced using our predicted depth.
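
The supervision relies on the frozen scenes being rigid; a stand-in for the multi-view stereo step is the basic rectified two-view relation Z = f * B / d (symbols and values here are illustrative, not the paper's pipeline):

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth of static points seen from two rectified views: Z = f * B / d.

    Valid only where the scene is rigid - the reason the mannequin-style
    videos (people holding still) make such supervision possible.
    """
    return focal_px * baseline_m / np.maximum(disparity_px, 1e-6)

disparity = np.array([2.0, 8.0, 32.0])       # pixels of parallax between frames
print(depth_from_disparity(disparity, focal_px=1000.0, baseline_m=0.1))
# [50.    12.5    3.125] meters: more parallax means nearer geometry
```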


Subject(s)
Algorithms , Cues , Freezing , Humans , Motion
3.
IEEE Trans Pattern Anal Mach Intell ; 41(9): 2236-2250, 2019 09.
Article in English | MEDLINE | ID: mdl-30004870

ABSTRACT

We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods that have tackled this problem in a deterministic or non-parametric way, we propose to model future frames in a probabilistic manner. Our probabilistic model makes it possible for us to sample and synthesize many possible future frames from a single input image. To synthesize realistic movement of objects, we propose a novel network structure, namely a Cross Convolutional Network; this network encodes image and motion information as feature maps and convolutional kernels, respectively. In experiments, our model performs well on synthetic data, such as 2D shapes and animated game sprites, and on real-world video frames. We present analyses of the learned network representations, showing that it implicitly learns a compact encoding of object appearance and motion. We also demonstrate a few of its applications, including visual analogy-making and video extrapolation.
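
The title operation, cross convolution, can be sketched directly: each feature map (encoding appearance) is convolved with its own predicted kernel (encoding motion). A minimal scipy sketch with illustrative shapes, not the paper's trained architecture:

```python
import numpy as np
from scipy.signal import convolve2d

def cross_convolve(feature_maps, kernels):
    """Convolve each feature map with its own per-sample predicted kernel.

    feature_maps: (C, H, W) image-derived features
    kernels:      (C, k, k) motion-derived kernels, one per channel
    """
    return np.stack([
        convolve2d(f, k, mode="same", boundary="symm")
        for f, k in zip(feature_maps, kernels)
    ])

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 32, 32))   # encoded image content
kerns = rng.standard_normal((8, 5, 5))     # encoded motion
print(cross_convolve(feats, kerns).shape)  # (8, 32, 32)
```

Unlike an ordinary convolution layer, the kernels here are network outputs that change per input, which is how sampled motion variables produce different futures from the same image.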

4.
Article in English | MEDLINE | ID: mdl-30136936

ABSTRACT

We present an algorithm for creating high resolution anatomically plausible images consistent with acquired clinical brain MRI scans with large inter-slice spacing. Although large data sets of clinical images contain a wealth of information, time constraints during acquisition result in sparse scans that fail to capture much of the anatomy. These characteristics often render computational analysis impractical as many image analysis algorithms tend to fail when applied to such images. Highly specialized algorithms that explicitly handle sparse slice spacing do not generalize well across problem domains. In contrast, we aim to enable application of existing algorithms that were originally developed for high resolution research scans to significantly undersampled scans. We introduce a generative model that captures fine-scale anatomical structure across subjects in clinical image collections and derive an algorithm for filling in the missing data in scans with large inter-slice spacing. Our experimental results demonstrate that the resulting method outperforms state-of-the-art upsampling and super-resolution techniques, and promises to facilitate subsequent analysis not previously possible with scans of this quality. Our implementation is freely available at https://github.com/adalca/papago.
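
The fill-in idea can be illustrated with a much simpler stand-in for the paper's generative model: learn a multivariate Gaussian over slice intensities across subjects and impute missing slices by their conditional mean given the acquired ones. A toy sketch with synthetic correlated data in place of real scans:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 12                                        # positions along the slice axis (toy)
mix = rng.standard_normal((D, D))
train = rng.standard_normal((200, D)) @ mix   # correlated "anatomy" across subjects
test = rng.standard_normal(D) @ mix           # a new subject's full (hidden) column

mu = train.mean(axis=0)
cov = np.cov(train, rowvar=False) + 1e-6 * np.eye(D)

obs = np.arange(0, D, 3)                 # acquired slices (large inter-slice spacing)
mis = np.setdiff1d(np.arange(D), obs)    # missing slices to impute

# Conditional mean of the missing slices given the observed ones:
S_oo = cov[np.ix_(obs, obs)]
S_mo = cov[np.ix_(mis, obs)]
imputed = mu[mis] + S_mo @ np.linalg.solve(S_oo, test[obs] - mu[obs])
print("imputation RMSE:", np.sqrt(np.mean((imputed - test[mis]) ** 2)))
```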

5.
IEEE Trans Pattern Anal Mach Intell ; 40(8): 1799-1813, 2018 08.
Article in English | MEDLINE | ID: mdl-28796608

ABSTRACT

We propose a novel method for template matching in unconstrained environments. Its essence is the Best-Buddies Similarity (BBS), a useful, robust, and parameter-free similarity measure between two sets of points. BBS is based on counting the number of Best-Buddies Pairs (BBPs): pairs of points in source and target sets that are mutual nearest neighbors, i.e., each point is the nearest neighbor of the other. BBS has several key features that make it robust against complex geometric deformations and high levels of outliers, such as those arising from background clutter and occlusions. We study these properties, provide a statistical analysis that justifies them, and demonstrate the consistent success of BBS on a challenging real-world dataset while using different types of features.
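
BBS itself takes only a few lines: count pairs of points that are mutual nearest neighbors across the two sets and normalize by the smaller set size. A direct numpy sketch:

```python
import numpy as np

def best_buddies_similarity(P, Q):
    """Fraction of point pairs that are mutual nearest neighbors.

    P: (n, d) source points; Q: (m, d) target points.
    """
    # Pairwise squared Euclidean distances, shape (n, m).
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
    nn_PtoQ = d2.argmin(axis=1)          # nearest Q-point for each P-point
    nn_QtoP = d2.argmin(axis=0)          # nearest P-point for each Q-point
    buddies = nn_QtoP[nn_PtoQ] == np.arange(len(P))
    return buddies.sum() / min(len(P), len(Q))

rng = np.random.default_rng(0)
P = rng.standard_normal((50, 3))
print(best_buddies_similarity(P, P + 0.01))                       # near 1.0
print(best_buddies_similarity(P, rng.standard_normal((50, 3))))   # much lower
```

Because a point only counts if the match is mutual, outliers with no true counterpart simply fail to form pairs rather than dragging the score, which is the source of the robustness the abstract describes.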

6.
Proc Natl Acad Sci U S A ; 114(44): 11639-11644, 2017 10 31.
Article in English | MEDLINE | ID: mdl-29078275

ABSTRACT

Although the human visual system is remarkable at perceiving and interpreting motions, it has limited sensitivity, and we cannot see motions that are smaller than some threshold. Although difficult to visualize, tiny motions below this threshold are important and can reveal physical mechanisms, or be precursors to large motions in the case of mechanical failure. Here, we present a "motion microscope," a computational tool that quantifies tiny motions in videos and then visualizes them by producing a new video in which the motions are made large enough to see. Three scientific visualizations are shown, spanning macroscopic to nanoscopic length scales. They are the resonant vibrations of a bridge demonstrating simultaneous spatial and temporal modal analysis, micrometer vibrations of a metamaterial demonstrating wave propagation through an elastic matrix with embedded resonating units, and nanometer motions of an extracellular tissue found in the inner ear demonstrating a mechanism of frequency separation in hearing. In these instances, the motion microscope uncovers hidden dynamics over a variety of length scales, leading to the discovery of previously unknown phenomena.
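
As a toy illustration of motion amplification (not the authors' phase-based motion microscope), a Eulerian-style amplifier band-passes each pixel's temporal signal with two exponential moving averages and adds the amplified band back:

```python
import numpy as np

def amplify_motion(frames, alpha=20.0, slow_a=0.95, fast_a=0.7):
    """Toy Eulerian-style magnification: amplify a temporal band per pixel.

    frames: (T, H, W) float video. Two exponential moving averages with
    different time constants form a temporal band-pass; the band is
    amplified by alpha and added back to each frame.
    """
    out = np.empty_like(frames)
    slow = fast = frames[0].copy()
    for i, f in enumerate(frames):
        slow = slow_a * slow + (1 - slow_a) * f
        fast = fast_a * fast + (1 - fast_a) * f
        out[i] = f + alpha * (fast - slow)
    return out

# A barely visible 0.2-unit flicker becomes clearly measurable.
t = np.arange(200)
frames = 100 + 0.2 * np.sin(2 * np.pi * 0.05 * t)[:, None, None] * np.ones((1, 8, 8))
mag = amplify_motion(frames)
print(frames.std(), mag.std())   # the amplified video varies far more
```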


Subject(s)
Image Processing, Computer-Assisted/methods , Microscopy/methods , Video Recording , Lasers , Motion
7.
Inf Process Med Imaging ; 10265: 659-671, 2017 Jun.
Article in English | MEDLINE | ID: mdl-29379264

ABSTRACT

We present an algorithm for creating high resolution anatomically plausible images consistent with acquired clinical brain MRI scans with large inter-slice spacing. Although large databases of clinical images contain a wealth of information, medical acquisition constraints result in sparse scans that miss much of the anatomy. These characteristics often render computational analysis impractical as standard processing algorithms tend to fail when applied to such images. Highly specialized or application-specific algorithms that explicitly handle sparse slice spacing do not generalize well across problem domains. In contrast, our goal is to enable application of existing algorithms that were originally developed for high resolution research scans to significantly undersampled scans. We introduce a model that captures fine-scale anatomical similarity across subjects in clinical image collections and use it to fill in the missing data in scans with large slice spacing. Our experimental results demonstrate that the proposed method outperforms current upsampling methods and promises to facilitate subsequent analysis not previously possible with scans of this quality.


Subject(s)
Algorithms , Brain/diagnostic imaging , Image Processing, Computer-Assisted , Magnetic Resonance Imaging , Humans , Image Enhancement , Image Interpretation, Computer-Assisted , Pattern Recognition, Automated , Reproducibility of Results , Sensitivity and Specificity
8.
IEEE Trans Pattern Anal Mach Intell ; 39(4): 732-745, 2017 04.
Article in English | MEDLINE | ID: mdl-27875214

ABSTRACT

The estimation of material properties is important for scene understanding, with many applications in vision, robotics, and structural engineering. This paper connects fundamentals of vibration mechanics with computer vision techniques in order to infer material properties from small, often imperceptible motions in video. Objects tend to vibrate in a set of preferred modes. The frequencies of these modes depend on the structure and material properties of an object. We show that by extracting these frequencies from video of a vibrating object, we can often make inferences about that object's material properties. We demonstrate our approach by estimating material properties for a variety of objects by observing their motion in high-speed and regular frame rate video.
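
The first step, recovering an object's dominant vibration frequencies from video, can be sketched by Fourier analysis of a temporal intensity trace (the paper extracts sub-pixel motion signals rather than raw intensity, so treat this as a simplified stand-in):

```python
import numpy as np

def dominant_frequencies(frames, fps, k=3):
    """Return the k strongest temporal frequencies in a video.

    frames: (T, H, W) array; fps: capture frame rate in Hz.
    """
    sig = frames.reshape(len(frames), -1).mean(axis=1)   # global intensity trace
    sig = sig - sig.mean()                               # remove the DC component
    spec = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fps)
    return freqs[np.argsort(spec)[::-1][:k]]

# Synthetic object vibrating at 12 Hz, filmed at 240 fps.
fps, T = 240, 960
t = np.arange(T) / fps
frames = np.sin(2 * np.pi * 12 * t)[:, None, None] * np.ones((1, 4, 4))
print(dominant_frequencies(frames, fps))   # 12 Hz dominates
```

Comparing the recovered modal frequencies against those predicted for candidate materials is what lets the observed motion constrain material properties.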

10.
Sci Am ; 312(1): 46-51, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25597109
11.
IEEE Trans Pattern Anal Mach Intell ; 34(4): 683-94, 2012 Apr.
Article in English | MEDLINE | ID: mdl-21844632

ABSTRACT

The restoration of a blurry or noisy image is commonly performed with a MAP estimator, which maximizes a posterior probability to reconstruct a clean image from a degraded image. A MAP estimator, when used with a sparse gradient image prior, reconstructs piecewise smooth images and typically removes textures that are important for visual realism. We present an alternative deconvolution method called iterative distribution reweighting (IDR) which imposes a global constraint on gradients so that a reconstructed image should have a gradient distribution similar to a reference distribution. In natural images, a reference distribution not only varies from one image to another, but also within an image depending on texture. We estimate a reference distribution directly from an input image for each texture segment. Our algorithm is able to restore rich mid-frequency textures. A large-scale user study supports the conclusion that our algorithm improves the visual realism of reconstructed images compared to those of MAP estimators.
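
The global constraint is a distance between gradient distributions; a sketch of measuring that mismatch with a gradient histogram and KL divergence, using an over-smoothed image as a stand-in for a texture-free MAP result:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gradient_hist(img, bins=64):
    """Normalized histogram of horizontal and vertical image gradients."""
    g = np.concatenate([np.diff(img, axis=0).ravel(),
                        np.diff(img, axis=1).ravel()])
    h, _ = np.histogram(g, bins=bins, range=(-1, 1), density=True)
    return h + 1e-12                      # avoid log(0) in the KL term

def kl_divergence(p, q):
    p, q = p / p.sum(), q / q.sum()
    return float((p * np.log(p / q)).sum())

rng = np.random.default_rng(0)
reference = rng.random((64, 64))              # stand-in for a textured image
oversmoothed = gaussian_filter(reference, 2)  # a MAP-style texture-free result

# IDR's global constraint drives this mismatch toward zero during deconvolution.
print(kl_divergence(gradient_hist(reference), gradient_hist(oversmoothed)))
print(kl_divergence(gradient_hist(reference), gradient_hist(reference)))  # 0.0
```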


Subject(s)
Algorithms , Vision, Ocular/physiology , Humans , Image Enhancement/methods , Image Processing, Computer-Assisted/methods
12.
IEEE Trans Pattern Anal Mach Intell ; 33(12): 2354-67, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21788664

ABSTRACT

Blind deconvolution is the recovery of a sharp version of a blurred image when the blur kernel is unknown. Recent algorithms have afforded dramatic progress, yet many aspects of the problem remain challenging and hard to understand. The goal of this paper is to analyze and evaluate recent blind deconvolution algorithms both theoretically and experimentally. We explain the previously reported failure of the naive MAP approach by demonstrating that it mostly favors no-blur explanations. We show that, using reasonable image priors, a naive simultaneous MAP estimation of both latent image and blur kernel is guaranteed to fail even with infinitely large images sampled from the prior. On the other hand, we show that, since the kernel size is often much smaller than the image size, a MAP estimation of the kernel alone is well constrained and is guaranteed to recover the true blur. The plethora of recent deconvolution techniques makes an experimental evaluation on ground-truth data important. As a first step toward this experimental evaluation, we have collected blur data with ground truth and compared recent algorithms under equal settings. Additionally, our data demonstrate that the shift-invariant blur assumption made by most algorithms is often violated.
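
The no-blur preference is easy to reproduce numerically: blurring shrinks gradient magnitudes, so under a sparse (hyper-Laplacian) gradient prior the blurry image scores as more probable than the sharp one. A small check, with random pixels standing in for a natural image:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sparse_gradient_cost(img, p=0.8):
    """Negative log-prior under a sparse (hyper-Laplacian) gradient model."""
    gx = np.diff(img, axis=1)
    gy = np.diff(img, axis=0)
    return (np.abs(gx) ** p).sum() + (np.abs(gy) ** p).sum()

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))            # stand-in for a natural image
blurred = uniform_filter(sharp, size=5)

print(sparse_gradient_cost(sharp))      # higher cost (less probable)
print(sparse_gradient_cost(blurred))    # lower cost: MAP of (image, kernel)
                                        # prefers the no-blur explanation
```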

13.
IEEE Trans Vis Comput Graph ; 17(9): 1273-85, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21041875

ABSTRACT

Computer-generated (CG) images have achieved high levels of realism. This realism, however, comes at the cost of long and expensive manual modeling, and often humans can still distinguish between CG and real images. We introduce a new data-driven approach for rendering realistic imagery that uses a large collection of photographs gathered from online repositories. Given a CG image, we retrieve a small number of real images with similar global structure. We identify corresponding regions between the CG and real images using a mean-shift cosegmentation algorithm. The user can then automatically transfer color, tone, and texture from matching regions to the CG image. Our system only uses image processing operations and does not require a 3D model of the scene, making it fast and easy to integrate into digital content creation workflows. Results of a user study show that our hybrid images appear more realistic than the originals.
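
The transfer step can be approximated globally (the paper works region by region after cosegmentation) with classic color moment matching: shift and scale each channel of the CG image to the statistics of the retrieved photograph. A toy sketch:

```python
import numpy as np

def match_color_moments(cg, real):
    """Match per-channel mean/std of `cg` to `real`. Arrays are (H, W, 3) in [0, 1]."""
    out = np.empty_like(cg)
    for c in range(3):
        src, ref = cg[..., c], real[..., c]
        out[..., c] = (src - src.mean()) / (src.std() + 1e-8) * ref.std() + ref.mean()
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
cg = rng.random((32, 32, 3)) * 0.5        # flat, dim "CG" image
real = rng.random((32, 32, 3)) ** 0.5     # brighter "photograph"
out = match_color_moments(cg, real)
print(out.mean(axis=(0, 1)), real.mean(axis=(0, 1)))  # channel means now roughly agree
```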

14.
IEEE Trans Pattern Anal Mach Intell ; 32(8): 1489-501, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20558879

ABSTRACT

The patch transform represents an image as a bag of overlapping patches sampled on a regular grid. This representation allows users to manipulate images in the patch domain, which then seeds the inverse patch transform to synthesize modified images. Possible modifications include the spatial locations of patches, the size of the output image, or the pool of patches from which an image is reconstructed. When no modifications are made, the inverse patch transform reduces to solving a jigsaw puzzle. The inverse patch transform is posed as a patch assignment problem on a Markov random field (MRF), where each patch should be used only once and neighboring patches should fit to form a plausible image. We find an approximate solution to the MRF using loopy belief propagation, introducing an approximation that encourages the solution to use each patch only once. The image reconstruction algorithm scales well with the total number of patches through label pruning. In addition, structural misalignment artifacts are suppressed through a patch jittering scheme that spatially jitters the assigned patches. We demonstrate the patch transform and its effectiveness on natural images.
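
The forward transform and the MRF's neighbor-fit term are simple to sketch: cut patches on a grid and score how well two patches abut by the squared seam difference (non-overlapping patches here for brevity; the paper samples overlapping ones):

```python
import numpy as np

def patch_transform(img, s):
    """Cut a (H, W) image into a bag of s-by-s patches on a regular grid."""
    H, W = img.shape
    return [img[i:i + s, j:j + s] for i in range(0, H - s + 1, s)
                                  for j in range(0, W - s + 1, s)]

def right_compatibility(a, b):
    """Seam cost of placing patch b to the right of patch a (lower = better)."""
    return ((a[:, -1] - b[:, 0]) ** 2).sum()

rng = np.random.default_rng(0)
img = rng.random((32, 32))
patches = patch_transform(img, 8)
# Pairwise seam costs seed the neighbor-fit term of the inverse transform's MRF.
cost = np.array([[right_compatibility(a, b) for b in patches] for a in patches])
print(cost.shape)   # (16, 16): one cost per ordered patch pair
```

The inverse transform then searches for the patch assignment minimizing these seam costs under the use-each-patch-once constraint, which is the jigsaw-puzzle problem the abstract mentions.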


Subject(s)
Artificial Intelligence , Image Processing, Computer-Assisted/methods , Models, Theoretical , Pattern Recognition, Automated , Algorithms , Humans , Markov Chains
15.
IEEE Trans Pattern Anal Mach Intell ; 30(11): 1958-70, 2008 Nov.
Article in English | MEDLINE | ID: mdl-18787244

ABSTRACT

With the advent of the Internet, billions of images are now freely available online and constitute a dense sampling of the visual world. Using a variety of non-parametric methods, we explore this world with the aid of a large dataset of 79,302,017 images collected from the Internet. Motivated by psychophysical results showing the remarkable tolerance of the human visual system to degradations in image resolution, the images in the dataset are stored as 32 x 32 color images. Each image is loosely labeled with one of the 75,062 non-abstract nouns in English, as listed in the WordNet lexical database. Hence, the image database gives comprehensive coverage of all object categories and scenes. The semantic information from WordNet can be used in conjunction with nearest-neighbor methods to perform object classification over a range of semantic levels, minimizing the effects of labeling noise. For certain classes that are particularly prevalent in the dataset, such as people, we are able to demonstrate a recognition performance comparable to class-specific Viola-Jones style detectors.
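
With data at this scale, recognition reduces to nearest-neighbor lookup in raw pixel space. A sketch of the basic classifier on flattened 32 x 32 images (random stand-ins for the dataset, which is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
n_train = 1000
train = rng.random((n_train, 32 * 32 * 3)).astype(np.float32)   # flattened 32x32 RGB
labels = rng.integers(0, 10, size=n_train)                      # noun indices (toy)

def knn_label(query, k=5):
    """Majority label among the k nearest training images (SSD distance)."""
    d = ((train - query) ** 2).sum(axis=1)
    nearest = labels[np.argsort(d)[:k]]
    return np.bincount(nearest).argmax()

noisy_copy = train[0] + 0.01 * rng.random(32 * 32 * 3).astype(np.float32)
print(knn_label(noisy_copy), labels[0])   # the lookup recovers the label
```

Voting over k neighbors, and pooling labels across WordNet semantic levels as in the paper, is what makes this robust to the loose labeling.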


Subject(s)
Database Management Systems , Databases, Factual , Documentation/methods , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Internet , Pattern Recognition, Automated/methods , Artificial Intelligence , Image Enhancement/methods
16.
IEEE Trans Pattern Anal Mach Intell ; 30(2): 299-314, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18084060

ABSTRACT

Image denoising algorithms often assume an additive white Gaussian noise (AWGN) process that is independent of the actual RGB values. Such approaches are not fully automatic and cannot effectively remove color noise produced by today's CCD digital cameras. In this paper, we propose a unified framework for two tasks: automatic estimation and removal of color noise from a single image using piecewise smooth image models. We introduce the noise level function (NLF), which is a continuous function describing the noise level as a function of image brightness. We then estimate an upper bound of the real noise level function by fitting a lower envelope to the standard deviations of per-segment image variances. For denoising, the chrominance of color noise is significantly removed by projecting pixel values onto a line fit to the RGB values in each segment. Then, a Gaussian conditional random field (GCRF) is constructed to obtain the underlying clean image from the noisy input. Extensive experiments are conducted to test the proposed algorithm, which is shown to outperform state-of-the-art denoising algorithms.
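
A crude version of the noise level function can be estimated by binning pixels by local brightness and taking a low percentile of the local standard deviations in each bin, since the smoothest patches expose the noise floor; the paper does this per segment with fitted lower envelopes. A toy sketch:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def estimate_nlf(img, bins=10, patch=7):
    """Crude NLF estimate: per-brightness lower envelope of local std.

    img: (H, W) float image in [0, 1]. Returns bin edges and noise levels.
    """
    mean = uniform_filter(img, patch)
    sq = uniform_filter(img ** 2, patch)
    std = np.sqrt(np.maximum(sq - mean ** 2, 0))
    edges = np.linspace(0, 1, bins + 1)
    nlf = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = std[(mean >= lo) & (mean < hi)]
        # Low percentile: the smoothest patches in a bin expose the noise floor.
        nlf.append(np.percentile(sel, 5) if sel.size else np.nan)
    return edges[:-1], np.array(nlf)

rng = np.random.default_rng(0)
clean = np.linspace(0, 1, 64)[None, :] * np.ones((64, 1))
noisy = clean + rng.normal(0, 0.02 + 0.05 * clean)   # brightness-dependent noise
print(estimate_nlf(np.clip(noisy, 0, 1))[1])          # estimates rise with brightness
```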

17.
IEEE Trans Pattern Anal Mach Intell ; 29(5): 854-69, 2007 May.
Article in English | MEDLINE | ID: mdl-17356204

ABSTRACT

We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (runtime) computational complexity and the (training-time) sample complexity scale linearly with the number of classes to be detected. We present a multitask learning procedure, based on boosted decision stumps, that reduces the computational and sample complexity by finding common features that can be shared across the classes (and/or views). The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required and, therefore, the runtime cost of the classifier, is observed to scale approximately logarithmically with the number of classes. The features selected by joint training are generic edge-like features, whereas the features chosen by training each class separately tend to be more object-specific. The generic features generalize better and considerably reduce the computational cost of multiclass object detection.
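
The heart of the method, selecting one weak learner that serves many classes at once, can be sketched with a single shared decision stump scored by its summed error across all one-vs-all problems (a toy simplification of the paper's boosted regression stumps):

```python
import numpy as np

def best_shared_stump(X, Y):
    """Pick the (feature, threshold) stump that best serves all classes jointly.

    X: (n, d) features; Y: (n, C) one-vs-all labels in {-1, +1}. Scoring by
    summed error across classes favors features useful to many classes -
    the essence of feature sharing in joint boosting.
    """
    best = (np.inf, None, None)
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            pred = np.where(X[:, f] > thr, 1, -1)[:, None]     # (n, 1)
            err = (pred != Y).mean(axis=0).sum()               # summed over classes
            err = min(err, (pred == Y).mean(axis=0).sum())     # allow flipped polarity
            if err < best[0]:
                best = (err, f, thr)
    return best

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
# Three classes that all depend on feature 4 (e.g., a shared edge-like feature):
Y = np.stack([np.sign(X[:, 4] + 0.1 * rng.standard_normal(200))
              for _ in range(3)], axis=1)
print(best_shared_stump(X, Y))   # selects feature 4
```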


Subject(s)
Algorithms , Artificial Intelligence , Cluster Analysis , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Subtraction Technique , Image Enhancement/methods , Reproducibility of Results , Sensitivity and Specificity
18.
IEEE Comput Graph Appl ; 27(2): 43-52, 2007.
Article in English | MEDLINE | ID: mdl-17388202

ABSTRACT

Defocus matting is a fully automatic and passive method for pulling mattes from video captured with coaxial cameras that have different depths of field and planes of focus. Nonparametric sampling can accelerate the video-matting process from minutes to seconds per frame. In addition, a super-resolution technique efficiently bridges the gap between mattes from high-resolution video cameras and those from low-resolution cameras. Off-center matting pulls mattes for an external high-resolution camera that doesn't share the same center of projection as the low-resolution cameras used to capture the defocus matting data.


Subject(s)
Algorithms , Computer Graphics , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Photography/methods , Signal Processing, Computer-Assisted , User-Computer Interface , Imaging, Three-Dimensional/methods
19.
J Vis ; 6(11): 1267-81, 2006 Nov 06.
Article in English | MEDLINE | ID: mdl-17209734

ABSTRACT

Vision is difficult because images are ambiguous about the structure of the world. For object color, the ambiguity arises because the same object reflects a different spectrum to the eye under different illuminations. Human vision typically does a good job of resolving this ambiguity, an ability known as color constancy. The past 20 years have seen an explosion of work on color constancy, with advances in both experimental methods and computational algorithms. Here, we connect these two lines of research by developing a quantitative model of human color constancy. The model includes an explicit link between psychophysical data and illuminant estimates obtained via a Bayesian algorithm. The model is fit to the data through a parameterization of the prior distribution of illuminant spectral properties. The fit to the data is good, and the derived prior provides a succinct description of human performance.
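
The Bayesian estimator can be sketched for a discrete set of candidate illuminants under a diagonal (von Kries) rendering model: combine a prior over illuminants with an image-formation likelihood and take the posterior mode. A toy sketch in which the scene reflectances are known, unlike the paper's setting where the illuminant prior is fit to psychophysical data:

```python
import numpy as np

rng = np.random.default_rng(0)
illums = rng.random((20, 3)) + 0.5                    # candidate illuminant RGB gains
prior = np.exp(-2.0 * ((illums - 1.0) ** 2).sum(1))   # favor near-neutral light
prior /= prior.sum()

# Toy scene: known reflectances viewed under an unknown illuminant.
surfaces = rng.random((30, 3))
true_illum = illums[7]
obs = surfaces * true_illum + rng.normal(0, 0.01, surfaces.shape)

def log_lik(L, sigma=0.01):
    """Gaussian image-formation likelihood under the diagonal model obs = S * L."""
    return -((obs - surfaces * L) ** 2).sum() / (2 * sigma ** 2)

log_post = np.log(prior) + np.array([log_lik(L) for L in illums])
print(np.argmax(log_post))   # 7: the posterior mode is the true illuminant
```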


Subject(s)
Bayes Theorem , Color Perception/physiology , Color , Models, Biological , Algorithms , Humans , Light , Psychophysics
20.
IEEE Trans Pattern Anal Mach Intell ; 27(9): 1459-72, 2005 Sep.
Article in English | MEDLINE | ID: mdl-16173188

ABSTRACT

Interpreting real-world images requires the ability to distinguish the different characteristics of the scene that lead to its final appearance. Two of the most important of these characteristics are the shading and reflectance of each point in the scene. We present an algorithm that uses multiple cues to recover shading and reflectance intrinsic images from a single image. Using both color information and a classifier trained to recognize gray-scale patterns, the algorithm classifies each image derivative, given the lighting direction, as being caused by shading or by a change in the surface's reflectance. The classifiers gather local evidence about the surface's form and color, which is then propagated using the Generalized Belief Propagation algorithm. The propagation step disambiguates areas of the image where the correct classification is not clear from local evidence. We use real-world images to demonstrate results and show how each component of the system affects them.
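
A stripped-down version of the per-derivative classification uses color alone: a derivative that changes chromaticity signals a reflectance change, while one that scales all channels equally is attributed to shading (the paper adds the trained gray-scale classifier and belief propagation on top). A toy sketch:

```python
import numpy as np

def classify_derivatives(img, thresh=0.05):
    """Label horizontal derivatives as shading (0) or reflectance (1).

    img: (H, W, 3) float RGB. A derivative that changes chromaticity
    (channel ratios) is taken as a reflectance change; one that scales
    all channels equally is taken as shading.
    """
    chrom = img / (img.sum(axis=2, keepdims=True) + 1e-6)  # channel ratios
    d_chrom = np.abs(np.diff(chrom, axis=1)).sum(axis=2)   # chromaticity change
    return (d_chrom > thresh).astype(int)

# Toy scene: a red surface under a shading gradient, plus a paint edge.
H, W = 8, 16
img = np.ones((H, W, 3)) * [0.8, 0.2, 0.2]
img *= np.linspace(0.4, 1.0, W)[None, :, None]   # smooth shading: same chromaticity
img[:, 10:] = [0.2, 0.8, 0.2]                    # reflectance edge: new chromaticity
print(classify_derivatives(img)[0])   # zeros except a 1 at the paint boundary
```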


Subject(s)
Algorithms , Artificial Intelligence , Colorimetry/methods , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Pattern Recognition, Automated/methods , Computer Graphics , Information Storage and Retrieval/methods , Numerical Analysis, Computer-Assisted , Signal Processing, Computer-Assisted