Results 1 - 19 of 19
2.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 14575-14589, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37725725

ABSTRACT

We propose a scheme for supervised image classification that uses privileged information, in the form of keypoint annotations for the training data, to learn strong models from small and/or biased training sets. Our main motivation is the recognition of animal species for ecological applications such as biodiversity modelling, which is challenging because of long-tailed species distributions due to rare species, and strong dataset biases such as repetitive scene background in camera traps. To counteract these challenges, we propose a visual attention mechanism that is supervised via keypoint annotations that highlight important object parts. This privileged information, implemented as a novel privileged pooling operation, is only required during training and helps the model to focus on regions that are discriminative. In experiments with three different animal species datasets, we show that deep networks with privileged pooling can use small training sets more efficiently and generalize better.
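The core idea — an attention map that is trained with keypoint supervision but needed only during training, and that softly weights spatial features before pooling — can be sketched in plain Python. This is a toy illustration, not the paper's exact formulation: the function names, the flattened spatial layout, and the MSE-style attention supervision are all illustrative assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def privileged_pool(features, scores, keypoint_mask=None):
    """Attention-weighted global pooling.

    features:      list of C-dim feature vectors, one per spatial location.
    scores:        raw attention logit per location.
    keypoint_mask: optional 0/1 privileged supervision per location
                   (training only); when given, also returns an auxiliary
                   loss pushing attention toward the annotated keypoints.
    """
    attn = softmax(scores)
    C = len(features[0])
    pooled = [sum(a * f[c] for a, f in zip(attn, features)) for c in range(C)]
    aux_loss = None
    if keypoint_mask is not None:
        # simple MSE between attention weights and the normalized keypoint mask
        total = sum(keypoint_mask) or 1
        target = [m / total for m in keypoint_mask]
        aux_loss = sum((a - t) ** 2 for a, t in zip(attn, target)) / len(attn)
    return pooled, aux_loss
```

At test time the mask is simply omitted, matching the property that the privileged information is only required during training.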

3.
Nat Ecol Evol ; 7(11): 1778-1789, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37770546

ABSTRACT

The worldwide variation in vegetation height is fundamental to the global carbon cycle and central to the functioning of ecosystems and their biodiversity. Geospatially explicit and, ideally, highly resolved information is required to manage terrestrial ecosystems, mitigate climate change and prevent biodiversity loss. Here we present a comprehensive global canopy height map at 10 m ground sampling distance for the year 2020. We have developed a probabilistic deep learning model that fuses sparse height data from the Global Ecosystem Dynamics Investigation (GEDI) space-borne LiDAR mission with dense optical satellite images from Sentinel-2. This model retrieves canopy-top height from Sentinel-2 images anywhere on Earth and quantifies the uncertainty in these estimates. Our approach improves the retrieval of tall canopies with typically high carbon stocks. According to our map, only 5% of the global landmass is covered by trees taller than 30 m. Further, we find that only 34% of these tall canopies are located within protected areas. Thus, the approach can serve ongoing efforts in forest conservation and has the potential to foster advances in climate, carbon and biodiversity modelling.
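Uncertainty-quantifying regression of the kind described here is commonly trained by letting the network predict both a mean and a (log-)variance per pixel and minimizing the Gaussian negative log-likelihood. The sketch below shows only that loss in plain Python; the paper's actual probabilistic deep model is far richer than this snippet.

```python
import math

def gaussian_nll(mean, log_var, target):
    """Per-pixel negative log-likelihood of target under N(mean, exp(log_var)).

    Predicting the log-variance keeps the variance positive and lets the
    model down-weight pixels it is uncertain about (e.g. tall, rarely
    observed canopies).
    """
    var = math.exp(log_var)
    return 0.5 * (math.log(2 * math.pi) + log_var + (target - mean) ** 2 / var)
```

For a fixed error, this loss is minimized when the predicted variance equals the squared error, which is what rewards calibrated uncertainty estimates.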


Subject(s)
Ecosystem , Forests , Trees , Biodiversity , Carbon
4.
Nat Food ; 4(5): 384-393, 2023 05.
Article in English | MEDLINE | ID: mdl-37225908

ABSTRACT

Côte d'Ivoire and Ghana, the world's largest producers of cocoa, account for two thirds of the global cocoa production. In both countries, cocoa is the primary perennial crop, providing income to almost two million farmers. Yet precise maps of the area planted with cocoa are missing, hindering accurate quantification of expansion in protected areas, production and yields and limiting information available for improved sustainability governance. Here we combine cocoa plantation data with publicly available satellite imagery in a deep learning framework and create high-resolution maps of cocoa plantations for both countries, validated in situ. Our results suggest that cocoa cultivation is an underlying driver of over 37% of forest loss in protected areas in Côte d'Ivoire and over 13% in Ghana, and that official reports substantially underestimate the planted area (up to 40% in Ghana). These maps serve as a crucial building block to advance our understanding of conservation and economic development in cocoa-producing regions.


Subject(s)
Cacao , Chocolate , Cote d'Ivoire , Ghana , Conservation of Natural Resources
5.
Sci Rep ; 12(1): 20085, 2022 11 22.
Article in English | MEDLINE | ID: mdl-36418443

ABSTRACT

Fine-grained population maps are needed in several domains, such as urban planning, environmental monitoring, public health, and humanitarian operations. Unfortunately, in many countries only aggregate census counts over large spatial units are collected; moreover, these are not always up-to-date. We present POMELO, a deep learning model that employs coarse census counts and open geodata to estimate fine-grained population maps with 100 m ground sampling distance. Moreover, the model can also estimate population numbers when no census counts at all are available, by generalizing across countries. In a series of experiments for several countries in sub-Saharan Africa, the maps produced with POMELO are in good agreement with the most detailed available reference counts: disaggregation of coarse census counts reaches R² values of 85-89%; unconstrained prediction in the absence of any counts reaches 48-69%.
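The disaggregation step can be illustrated with a tiny sketch: a model predicts a relative occupancy rate per pixel, and the coarse census count of each administrative unit is then distributed over its pixels in proportion to those rates, so the map always sums back to the official total. This is a plain-Python illustration of dasymetric disaggregation only; POMELO's actual per-pixel predictor is a deep network over open geodata.

```python
def disaggregate(pixel_rates, census_total):
    """Spread a coarse census count over pixels in proportion to
    predicted occupancy rates (dasymetric mapping)."""
    total_rate = sum(pixel_rates)
    if total_rate == 0:
        # no signal at all: fall back to a uniform split
        return [census_total / len(pixel_rates)] * len(pixel_rates)
    return [census_total * r / total_rate for r in pixel_rates]
```

By construction the disaggregated map is consistent with the census: summing the output recovers the input count exactly.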


Subject(s)
Censuses , Environmental Monitoring
6.
IEEE Trans Pattern Anal Mach Intell ; 44(3): 1623-1637, 2022 03.
Article in English | MEDLINE | ID: mdl-32853149

ABSTRACT

The success of monocular depth estimation relies on large and diverse training sets. Due to the challenges associated with acquiring dense ground-truth depth across different environments at scale, a number of datasets with distinct characteristics and biases have emerged. We develop tools that enable mixing multiple datasets during training, even if their annotations are incompatible. In particular, we propose a robust training objective that is invariant to changes in depth range and scale, advocate the use of principled multi-objective learning to combine data from different sources, and highlight the importance of pretraining encoders on auxiliary tasks. Armed with these tools, we experiment with five diverse training datasets, including a new, massive data source: 3D films. To demonstrate the generalization power of our approach we use zero-shot cross-dataset transfer, i.e. we evaluate on datasets that were not seen during training. The experiments confirm that mixing data from complementary sources greatly improves monocular depth estimation. Our approach clearly outperforms competing methods across diverse datasets, setting a new state of the art for monocular depth estimation.
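A scale- and shift-invariant objective of the kind advocated here can be sketched by first aligning the prediction to the ground truth with a closed-form least-squares scale and shift, then measuring the residual. This simplified plain-Python version omits details of the actual method, such as working in disparity space and robust trimming of outliers.

```python
def align_scale_shift(pred, gt):
    """Closed-form least-squares fit of s, t minimizing sum((s*p + t - g)^2).
    Assumes pred is not constant (otherwise the system is degenerate)."""
    n = len(pred)
    sp, sg = sum(pred), sum(gt)
    spp = sum(p * p for p in pred)
    spg = sum(p * g for p, g in zip(pred, gt))
    denom = n * spp - sp * sp
    s = (n * spg - sp * sg) / denom
    t = (sg - s * sp) / n
    return s, t

def scale_shift_invariant_loss(pred, gt):
    """Mean absolute error after optimal scale/shift alignment: predictions
    that differ from the ground truth only by an affine map cost nothing,
    which is what makes datasets with incompatible depth conventions mixable."""
    s, t = align_scale_shift(pred, gt)
    return sum(abs(s * p + t - g) for p, g in zip(pred, gt)) / len(pred)
```

Because the loss is blind to the absolute scale and offset of each dataset's annotations, ground truth from stereo films, laser scans, and SfM can supervise the same network.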


Subject(s)
Algorithms
7.
IEEE Trans Pattern Anal Mach Intell ; 44(8): 4081-4092, 2022 08.
Article in English | MEDLINE | ID: mdl-33687837

ABSTRACT

We propose a new STAckable Recurrent cell (STAR) for recurrent neural networks (RNNs), which has fewer parameters than the widely used LSTM [16] and GRU [10] cells while being more robust against vanishing or exploding gradients. Stacking recurrent units into deep architectures suffers from two major limitations: (i) many recurrent cells (e.g., LSTMs) are costly in terms of parameters and computation resources; and (ii) deep RNNs are prone to vanishing or exploding gradients during training. We investigate the training of multi-layer RNNs and examine the magnitude of the gradients as they propagate through the network in the "vertical" direction. We show that, depending on the structure of the basic recurrent unit, the gradients are systematically attenuated or amplified. Based on our analysis, we design a new type of gated cell that better preserves gradient magnitude. We validate our design on a large number of sequence modelling tasks and demonstrate that the proposed STAR cell allows one to build and train deeper recurrent architectures, ultimately leading to improved performance while being computationally more efficient.
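The single-gate design can be illustrated with a scalar toy version: one gate blends the previous hidden state with a candidate computed from the input, and the re-applied tanh keeps the state bounded. This is a simplified reading of the cell with scalar weights in place of matrices, written from the abstract's description; consult the paper for the exact formulation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def star_step(x_t, h_prev, w_z, b_z, w_k, u_k, b_k):
    """One step of a STAR-style single-gate recurrent cell (scalar toy).

    z: candidate state computed from the input alone.
    k: the single gate, deciding how much of the candidate
       replaces the previous hidden state.
    The outer tanh re-squashes the blended state into (-1, 1).
    """
    z = math.tanh(w_z * x_t + b_z)
    k = sigmoid(w_k * x_t + u_k * h_prev + b_k)
    return math.tanh((1.0 - k) * h_prev + k * z)

def run_sequence(xs, params, h0=0.0):
    h = h0
    for x in xs:
        h = star_step(x, h, *params)
    return h
```

Even in this toy form the parameter economy is visible: one gate and one candidate, versus the three gates plus cell state of an LSTM.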


Subject(s)
Algorithms , Neural Networks, Computer
8.
IEEE Trans Pattern Anal Mach Intell ; 38(10): 2054-68, 2016 10.
Article in English | MEDLINE | ID: mdl-26660703

ABSTRACT

The task of tracking multiple targets is often addressed with the so-called tracking-by-detection paradigm, where the first step is to obtain a set of target hypotheses for each frame independently. Tracking can then be regarded as solving two separate, but tightly coupled problems. The first is to carry out data association, i.e., to determine the origin of each of the available observations. The second problem is to reconstruct the actual trajectories that describe the spatio-temporal motion pattern of each individual target. The former is inherently a discrete problem, while the latter should intuitively be modeled in continuous space. Having to deal with an unknown number of targets, complex dependencies, and physical constraints, both are challenging tasks on their own and thus most previous work focuses on one of these subproblems. Here, we present a multi-target tracking approach that explicitly models both tasks as minimization of a unified discrete-continuous energy function. Trajectory properties are captured through global label costs, a recent concept from multi-model fitting, which we introduce to tracking. Specifically, label costs describe physical properties of individual tracks, e.g., linear and angular dynamics, or entry and exit points. We further introduce pairwise label costs to describe mutual interactions between targets in order to avoid collisions. By choosing appropriate forms for the individual energy components, powerful discrete optimization techniques can be leveraged to address data association, while the shapes of individual trajectories are updated by gradient-based continuous energy minimization. The proposed method achieves state-of-the-art results on diverse benchmark sequences.
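The role of label costs can be seen in a stripped-down 1-D analogue: an energy that sums per-observation fitting error plus a fixed cost for every label (track) actually used, so explanations with fewer tracks win once the label cost outweighs the extra fitting error. This toy sketch illustrates only that trade-off, not the paper's full discrete-continuous energy with pairwise terms and trajectory shapes.

```python
def labeled_energy(points, models, assignment, label_cost):
    """Fitting error plus a global cost for every model (label) in use.

    points:     1-D observations (stand-ins for detections).
    models:     1-D model positions (stand-ins for track hypotheses).
    assignment: index into models for each point (the data association).
    """
    fit = sum(abs(models[a] - p) for p, a in zip(points, assignment))
    return fit + label_cost * len(set(assignment))
```

Minimizing such an energy over discrete assignments is what the discrete optimization step handles; here we only evaluate it to show how the label cost steers the preferred explanation.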

9.
IEEE Trans Image Process ; 23(10): 4601-10, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25122570

ABSTRACT

Recent works on multi-model fitting are often formulated as an energy minimization task, where the energy function includes fitting error and regularization terms, such as low-level spatial smoothness and model complexity. In this paper, we introduce a novel energy with high-level geometric priors that consider interactions between geometric models, such that certain preferred model configurations may be induced. We argue that in many applications, such prior geometric properties are available and should be fruitfully exploited. For example, in surface fitting to point clouds, the building walls are usually either orthogonal or parallel to each other. Our proposed energy function is useful in dealing with unknown distributions of data errors and outliers, which are often the factors leading to biased estimation. Furthermore, the energy can be efficiently minimized using the expansion move method. We evaluate the performance on several vision applications using real data sets. Experimental results show that our method outperforms the state-of-the-art methods without a significant increase in computation.

10.
IEEE Trans Pattern Anal Mach Intell ; 36(1): 58-72, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24231866

ABSTRACT

Many recent advances in multiple target tracking aim at finding a (nearly) optimal set of trajectories within a temporal window. To handle the large space of possible trajectory hypotheses, it is typically reduced to a finite set by some form of data-driven or regular discretization. In this work, we propose an alternative formulation of multitarget tracking as minimization of a continuous energy. Contrary to recent approaches, we focus on designing an energy that corresponds to a more complete representation of the problem, rather than one that is amenable to global optimization. Besides the image evidence, the energy function takes into account physical constraints, such as target dynamics, mutual exclusion, and track persistence. In addition, partial image evidence is handled with explicit occlusion reasoning, and different targets are disambiguated with an appearance model. To nevertheless find strong local minima of the proposed nonconvex energy, we construct a suitable optimization scheme that alternates between continuous conjugate gradient descent and discrete transdimensional jump moves. These moves, which are executed such that they always reduce the energy, allow the search to escape weak minima and explore a much larger portion of the search space of varying dimensionality. We demonstrate the validity of our approach with an extensive quantitative evaluation on several public data sets.

11.
IEEE Trans Pattern Anal Mach Intell ; 35(11): 2608-23, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24051723

ABSTRACT

Geometric 3D reasoning at the level of objects has received renewed attention recently in the context of visual scene understanding. The level of geometric detail, however, is typically limited to qualitative representations or coarse boxes. This is linked to the fact that today's object class detectors are tuned toward robust 2D matching rather than accurate 3D geometry, encouraged by bounding-box-based benchmarks such as Pascal VOC. In this paper, we revisit ideas from the early days of computer vision, namely, detailed, 3D geometric object class representations for recognition. These representations can recover geometrically far more accurate object hypotheses than just bounding boxes, including continuous estimates of object pose and 3D wireframes with relative 3D positions of object parts. In combination with robust techniques for shape description and inference, we outperform state-of-the-art results in monocular 3D pose estimation. In a series of experiments, we analyze our approach in detail and demonstrate novel applications enabled by such an object class representation, such as fine-grained categorization of cars and bicycles, according to their 3D geometry, and ultrawide baseline matching.


Subject(s)
Algorithms , Artificial Intelligence , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Models, Theoretical , Pattern Recognition, Automated/methods , Photography/methods , Computer Simulation
12.
IEEE Trans Pattern Anal Mach Intell ; 35(4): 882-97, 2013 Apr.
Article in English | MEDLINE | ID: mdl-22889818

ABSTRACT

Following recent advances in detection, context modeling, and tracking, scene understanding has been the focus of renewed interest in computer vision research. This paper presents a novel probabilistic 3D scene model that integrates state-of-the-art multiclass object detection, object tracking and scene labeling together with geometric 3D reasoning. Our model is able to represent complex object interactions such as inter-object occlusion, physical exclusion between objects, and geometric context. Inference in this model allows us to jointly recover the 3D scene context and perform 3D multi-object tracking from a mobile observer, for objects of multiple categories, using only monocular video as input. Contrary to many other approaches, our system performs explicit occlusion reasoning and is therefore capable of tracking objects that are partially occluded for extended periods of time, or objects that have never been observed to their full extent. In addition, we show that a joint scene tracklet model for the evidence collected over multiple frames substantially improves performance. The approach is evaluated for different types of challenging onboard sequences. We first show a substantial improvement to the state of the art in 3D multipeople tracking. Moreover, a similar performance gain is achieved for multiclass 3D tracking of cars and trucks on a challenging dataset.


Subject(s)
Algorithms , Human Activities , Image Processing, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Automobiles , Cluster Analysis , Databases, Factual , Humans , Models, Theoretical , Video Recording , Walking
13.
Neural Comput ; 24(7): 1806-21, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22509972

ABSTRACT

Given the presence of massive feedback loops in brain networks, it is difficult to disentangle the contribution of feedforward and feedback processing to the recognition of visual stimuli, in this case, of emotional body expressions. The aim of the work presented in this letter is to shed light on how well feedforward processing explains rapid categorization of this important class of stimuli. By means of parametric masking, it may be possible to control the contribution of feedback activity in human participants. A close comparison is presented between human recognition performance and the performance of a computational neural model that exclusively modeled feedforward processing and was engineered to fulfill the computational requirements of recognition. Results show that the longer the stimulus onset asynchrony (SOA), the closer the performance of the human participants was to the values predicted by the model, with an optimum at an SOA of 100 ms. At short SOA latencies, human performance deteriorated, but the categorization of the emotional expressions was still above baseline. The data suggest that, although theoretically, feedback arising from inferotemporal cortex is likely to be blocked when the SOA is 100 ms, human participants still seem to rely on more local visual feedback processing to equal the model's performance.


Subject(s)
Brain/physiology , Computer Simulation , Emotions/physiology , Models, Neurological , Visual Perception/physiology , Female , Humans , Male , Young Adult
14.
IEEE Trans Pattern Anal Mach Intell ; 32(6): 1134-41, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20431137

ABSTRACT

Multibody structure from motion (SfM) is the extension of classical SfM to dynamic scenes with multiple rigidly moving objects. Recent research has unveiled some of the mathematical foundations of the problem, but a practical algorithm which can handle realistic sequences is still missing. In this paper, we discuss the requirements for such an algorithm, highlight theoretical issues and practical problems, and describe how a static structure-from-motion framework needs to be extended to handle real dynamic scenes. Theoretical issues include different situations in which the number of independently moving scene objects changes: Moving objects can enter or leave the field of view, merge into the static background (e.g., when a car is parked), or split off from the background and start moving independently. Practical issues arise due to small freely moving foreground objects with few and short feature tracks. We argue that all of these difficulties need to be handled online as structure-from-motion estimation progresses, and present an exemplary solution using the framework of probabilistic model-scoring.

15.
IEEE Trans Pattern Anal Mach Intell ; 31(10): 1831-46, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19696453

ABSTRACT

In this paper, we address the problem of multiperson tracking in busy pedestrian zones using a stereo rig mounted on a mobile platform. The complexity of the problem calls for an integrated solution that extracts as much visual information as possible and combines it through cognitive feedback cycles. We propose such an approach, which jointly estimates camera position, stereo depth, object detection, and tracking. The interplay between those components is represented by a graphical model. Since the model has to incorporate object-object interactions and temporal links to past frames, direct inference is intractable. We, therefore, propose a two-stage procedure: for each frame, we first solve a simplified version of the model (disregarding interactions and temporal continuity) to estimate the scene geometry and an overcomplete set of object detections. Conditioned on these results, we then address object interactions, tracking, and prediction in a second step. The approach is experimentally evaluated on several long and difficult video sequences from busy inner-city locations. Our results show that the proposed integration makes it possible to deliver robust tracking performance in scenes of realistic complexity.


Subject(s)
Image Processing, Computer-Assisted/methods , Locomotion/physiology , Models, Theoretical , Pattern Recognition, Automated/methods , Algorithms , Bayes Theorem , Computer Graphics , Feedback , Humans , Motion , Motor Vehicles , Reproducibility of Results , Video Recording
16.
IEEE Trans Pattern Anal Mach Intell ; 30(10): 1683-98, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18703824

ABSTRACT

We present a novel approach for multi-object tracking which considers object detection and spacetime trajectory estimation as a coupled optimization problem. Our approach is formulated in a Minimum Description Length hypothesis selection framework, which allows it to recover from mismatches and temporarily lost tracks. Building upon a state-of-the-art object detector, it performs multiview/multicategory object recognition to detect cars and pedestrians in the input images. The 2D object detections are checked for their consistency with (automatically estimated) scene geometry and are converted to 3D observations which are accumulated in a world coordinate frame. A subsequent trajectory estimation module analyzes the resulting 3D observations to find physically plausible spacetime trajectories. Tracking is achieved by performing model selection after every frame. At each time instant, our approach searches for the globally optimal set of spacetime trajectories which provides the best explanation for the current image and for all evidence collected so far, while satisfying the constraints that no two objects may occupy the same physical space, nor explain the same image pixels at any point in time. Successful trajectory hypotheses are then fed back to guide object detection in future frames. The optimization procedure is kept efficient through incremental computation and conservative hypothesis pruning. We evaluate our approach on several challenging video sequences and demonstrate its performance on both a surveillance-type scenario and a scenario where the input videos are taken from inside a moving vehicle passing through crowded city areas.


Subject(s)
Algorithms , Artificial Intelligence , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Pattern Recognition, Automated/methods , Photography/methods , Image Enhancement/methods , Motion , Motor Vehicles , Reproducibility of Results , Sensitivity and Specificity
17.
Neural Netw ; 21(9): 1238-46, 2008 Nov.
Article in English | MEDLINE | ID: mdl-18585892

ABSTRACT

Research into the visual perception of human emotion has traditionally focused on the facial expression of emotions. Recently, researchers have turned to the more challenging field of emotional body language, i.e. emotion expression through body pose and motion. In this work, we approach the recognition of basic emotional categories from a computational perspective. In keeping with recent computational models of the visual cortex, we construct a biologically plausible hierarchy of neural detectors, which can discriminate seven basic emotional states from static views of associated body poses. The model is evaluated against human test subjects on a recent set of stimuli manufactured for research on emotional body language.


Subject(s)
Expressed Emotion , Kinesics , Neural Networks, Computer , Algorithms , Attention/physiology , Humans , Photic Stimulation , Social Perception
18.
IEEE Trans Pattern Anal Mach Intell ; 29(9): 1661-7, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17627053

ABSTRACT

We propose a similarity measure based on a Spatial-color Mixture of Gaussians (SMOG) appearance model for particle filters. This improves on the popular similarity measure based on color histograms because it considers not only the colors in a region but also the spatial layout of the colors. Hence, the SMOG-based similarity measure is more discriminative. To efficiently compute the parameters for SMOG, we propose a new technique, with which the computational time is greatly reduced. We also extend our method by integrating multiple cues to increase the reliability and robustness. Experiments show that our method can successfully track objects in many difficult situations.


Subject(s)
Algorithms , Artificial Intelligence , Color , Colorimetry/methods , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Computer Simulation , Models, Statistical , Motion , Normal Distribution , Reproducibility of Results , Sensitivity and Specificity
19.
IEEE Trans Pattern Anal Mach Intell ; 28(6): 983-95, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16724591

ABSTRACT

Multibody structure-and-motion (MSaM) is the problem of establishing the multiple-view geometry of several views of a 3D scene taken at different times, where the scene consists of multiple rigid objects moving relative to each other. We examine the case of two views. The setting is the following: Given are a set of corresponding image points in two images, which originate from an unknown number of moving scene objects, each giving rise to a motion model. Furthermore, the measurement noise is unknown, and there are a number of gross errors, which are outliers to all models. The task is to find an optimal set of motion models for the measurements. It is solved through Monte Carlo sampling, careful statistical analysis of the sampled set of motion models, and simultaneous selection of multiple motion models to best explain the measurements. The framework is not restricted to any particular model selection mechanism because it is developed from a Bayesian viewpoint: Different model selection criteria are seen as different priors for the set of moving objects, which allow one to bias the selection procedure for different purposes.


Subject(s)
Algorithms , Artificial Intelligence , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Models, Biological , Movement/physiology , Pattern Recognition, Automated/methods , Computer Simulation , Humans , Information Storage and Retrieval/methods , Motion , Reproducibility of Results , Sensitivity and Specificity