1.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 3090-3106, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35536822

ABSTRACT

Detecting objects and estimating their viewpoints in images are key tasks of 3D scene understanding. Recent approaches have achieved excellent results on very large benchmarks for object detection and viewpoint estimation. However, performance still lags behind for novel object categories with few samples. In this paper, we tackle the problems of few-shot object detection and few-shot viewpoint estimation. We demonstrate on both tasks the benefits of guiding the network prediction with class-representative features extracted from data in different modalities: image patches for object detection, and aligned 3D models for viewpoint estimation. Despite its simplicity, our method outperforms state-of-the-art methods by a large margin on a range of datasets, including PASCAL and COCO for few-shot object detection, and Pascal3D+ and ObjectNet3D for few-shot viewpoint estimation. Furthermore, when the 3D model is not available, we introduce a simple category-agnostic viewpoint estimation method that exploits geometrical similarities and consistent pose labeling across different classes. While it moderately reduces performance, this approach still obtains better results than previous methods in this setting. Finally, for the first time, we tackle the combination of both few-shot tasks on three challenging benchmarks for viewpoint estimation in the wild, ObjectNet3D, Pascal3D+ and Pix3D, showing very promising results.
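
A minimal sketch of the general idea of guiding predictions with class-representative features, assuming NumPy: a class prototype is averaged from support features and used to reweight a query feature map channel-wise. This is a generic feature-reweighting scheme with illustrative names and shapes, not the authors' exact architecture.

```python
import numpy as np

def class_prototype(support_feats):
    # support_feats: (K, C) features extracted from K support samples of one class
    return support_feats.mean(axis=0)                  # (C,) class-representative vector

def reweight_query_features(query_fmap, prototype):
    # query_fmap: (C, H, W) feature map of the query image
    # Channel-wise modulation by the class prototype (sigmoid-gated here).
    gate = 1.0 / (1.0 + np.exp(-prototype))             # (C,)
    return query_fmap * gate[:, None, None]             # (C, H, W)

# Toy usage with random features standing in for a real backbone's output.
rng = np.random.default_rng(0)
proto = class_prototype(rng.normal(size=(5, 256)))
modulated = reweight_query_features(rng.normal(size=(256, 32, 32)), proto)
print(modulated.shape)  # (256, 32, 32)
```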

2.
Article in English | MEDLINE | ID: mdl-36155476

ABSTRACT

We propose a novel method, applicable to many scene understanding problems, that adapts the Monte Carlo Tree Search (MCTS) algorithm, originally designed to learn to play games of high state complexity. From a generated pool of proposals, our method jointly selects and optimizes the proposals that minimize an objective function. In our first application, floor plan reconstruction from point clouds, our method selects and refines room proposals, modelled as 2D polygons, by optimizing an objective function that combines a fitness term predicted by a deep network with regularization terms on the room shapes. We also introduce a novel differentiable method for rendering the polygonal shapes of these proposals. Our evaluations on the recent and challenging Structured3D and Floor-SP datasets show significant improvements over the state of the art in both speed and quality of the reconstructions, without imposing hard constraints or assumptions on the floor plan configurations. In our second application, we extend our approach to reconstruct general 3D room layouts from a color image and obtain accurate room layouts. We also show that our differentiable renderer can easily be extended to render 3D planar polygons and polygon embeddings. Our method shows high performance on the Matterport3D-Layout dataset, without introducing hard constraints on room layout configurations.

3.
Neuroimage ; 219: 117026, 2020 10 01.
Article in English | MEDLINE | ID: mdl-32522665

ABSTRACT

Whole brain segmentation of fine-grained structures using deep learning (DL) is a very challenging task, since the number of anatomical labels is very high compared to the number of available training images. To address this problem, previous DL methods proposed to use a single convolutional neural network (CNN) or a few independent CNNs. In this paper, we present a novel ensemble method based on a large number of CNNs processing different overlapping brain areas. Inspired by parliamentary decision-making systems, we propose a framework called AssemblyNet, made of two "assemblies" of U-Nets. Such a parliamentary system is capable of dealing with complex decisions and unseen problems and of reaching a relevant consensus. AssemblyNet introduces sharing of knowledge among neighboring U-Nets, an "amendment" procedure performed by the second assembly at higher resolution to refine the decision taken by the first one, and a final decision obtained by majority voting. During our validation, AssemblyNet showed competitive performance compared to state-of-the-art methods such as U-Net, Joint label fusion and SLANT. Moreover, we investigated the scan-rescan consistency and the robustness to disease effects of our method. These experiments demonstrate the reliability of AssemblyNet. Finally, we show the benefit of using semi-supervised learning to improve the performance of our method.
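
As a small illustration of the final consensus step only, the sketch below (NumPy assumed) takes integer label volumes produced by several segmentation networks and keeps the per-voxel majority label. It is a generic majority-voting routine; the knowledge sharing between U-Nets and the higher-resolution amendment procedure are not shown.

```python
import numpy as np

def majority_vote(label_maps, num_labels):
    # label_maps: list of (D, H, W) integer label volumes, one per network
    votes = np.zeros((num_labels,) + label_maps[0].shape, dtype=np.int32)
    for lm in label_maps:
        for lab in range(num_labels):
            votes[lab] += (lm == lab)                 # count votes for each label
    return votes.argmax(axis=0)                       # per-voxel consensus label

# Toy usage: three networks disagreeing on a 4x4x4 volume with 3 labels.
rng = np.random.default_rng(1)
maps = [rng.integers(0, 3, size=(4, 4, 4)) for _ in range(3)]
print(majority_vote(maps, num_labels=3).shape)        # (4, 4, 4)
```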


Subjects
Brain/diagnostic imaging; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Deep Learning; Humans; Software
4.
IEEE Trans Pattern Anal Mach Intell ; 42(8): 1898-1912, 2020 08.
Article in English | MEDLINE | ID: mdl-30932832

ABSTRACT

We propose an approach to estimating the 3D pose of a hand, possibly handling an object, given a depth image. We show that we can correct the mistakes made by a convolutional neural network trained to predict an estimate of the 3D pose by using a feedback loop. The components of this feedback loop are also deep networks, optimized using training data. This approach generalizes to a hand interacting with an object, and we therefore jointly estimate the 3D pose of the hand and the 3D pose of the object. Our approach performs on par with state-of-the-art methods for 3D hand pose estimation and outperforms state-of-the-art methods for joint hand-object pose estimation when using depth images only. Our approach is also efficient: our implementation runs in real time on a single GPU.

5.
IEEE Trans Pattern Anal Mach Intell ; 40(6): 1465-1479, 2018 06.
Article in English | MEDLINE | ID: mdl-28574342

ABSTRACT

We present an algorithm for estimating the pose of a rigid object in real time under challenging conditions. Our method effectively handles poorly textured objects in cluttered, changing environments, even when their appearance is corrupted by large occlusions, and it relies on grayscale images to handle metallic environments on which depth cameras would fail. As a result, our method is suitable for practical Augmented Reality applications, including industrial environments. At the core of our approach is a novel representation for the 3D pose of object parts: we predict the 3D pose of each part in the form of the 2D projections of a few control points. The advantages of this representation are threefold: we can predict the 3D pose of the object even when only one part is visible; when several parts are visible, we can easily combine them to compute a better pose of the object; and the 3D pose we obtain is usually very accurate, even when only a few parts are visible. We show how to use this representation in a robust 3D tracking framework. In addition to extensive comparisons with the state of the art, we demonstrate our method on a practical Augmented Reality application for maintenance assistance in the ATLAS particle detector at CERN.
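
To illustrate how 2D projections of control points predicted for several visible parts can be combined into a single object pose, here is a minimal sketch that stacks the 2D-3D correspondences and solves a standard PnP problem with OpenCV. It assumes OpenCV is available, known camera intrinsics, and at least four correspondences in total; the part predictors and the robust tracking framework are not shown.

```python
import numpy as np
import cv2  # OpenCV, assumed available

def pose_from_part_predictions(parts_3d, parts_2d, K):
    # parts_3d: list of (N_i, 3) control points of each visible part, in the object frame
    # parts_2d: list of (N_i, 2) predicted 2D projections of those control points
    # K: (3, 3) camera intrinsics
    obj = np.concatenate(parts_3d).astype(np.float64)   # stack correspondences
    img = np.concatenate(parts_2d).astype(np.float64)   # from all visible parts
    ok, rvec, tvec = cv2.solvePnP(obj, img, K, None, flags=cv2.SOLVEPNP_ITERATIVE)
    return ok, rvec, tvec   # axis-angle rotation and translation of the object
```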

6.
IEEE Trans Pattern Anal Mach Intell ; 39(5): 879-892, 2017 05.
Article in English | MEDLINE | ID: mdl-28113698

ABSTRACT

We propose an approach for detecting flying objects such as Unmanned Aerial Vehicles (UAVs) and aircraft when they occupy a small portion of the field of view, possibly move against complex backgrounds, and are filmed by a camera that itself moves. We argue that solving such a difficult problem requires combining both appearance and motion cues. To this end, we propose a regression-based approach for object-centric motion stabilization of image patches that allows us to achieve effective classification on spatio-temporal image cubes and outperform state-of-the-art techniques. As this problem has not yet been extensively studied, no test datasets are publicly available. We therefore built our own, for both UAVs and aircraft, and will make them publicly available so they can be used to benchmark future flying object detection and collision avoidance algorithms.

7.
IEEE Trans Pattern Anal Mach Intell ; 38(7): 1327-41, 2016 07.
Article in English | MEDLINE | ID: mdl-27295457

ABSTRACT

Finding the centerline and estimating the radius of linear structures is a critical first step in many applications, ranging from road delineation in 2D aerial images to modeling blood vessels, lung bronchi, and dendritic arbors in 3D biomedical image stacks. Existing techniques rely either on filters designed to respond to ideal cylindrical structures or on classification techniques. The former tend to become unreliable when the linear structures are very irregular, while the latter often have difficulty distinguishing centerline locations from neighboring ones, thus losing accuracy. We solve this problem by reformulating centerline detection as a regression problem. We first train regressors to return the distances to the closest centerline in scale-space, and we apply them to the input images or volumes. The centerlines and the corresponding scales then correspond to the regressors' local maxima, which can be easily identified. We show that our method outperforms state-of-the-art techniques on various 2D and 3D datasets. Moreover, our approach is very generic and also performs well on contour detection: we show an improvement over recent contour detection algorithms on the BSDS500 dataset.
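
The last step, reading centerline locations off the regressed map, amounts to non-maximum suppression. A minimal 2D sketch, assuming SciPy is available and a score map that peaks on centerlines (for instance, a decreasing function of the regressed distance); the regressor training itself is omitted.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def extract_centerline_points(score, radius=2, threshold=0.5):
    # score: (H, W) map regressed to peak on centerlines
    local_max = score == maximum_filter(score, size=2 * radius + 1)
    return np.argwhere(local_max & (score > threshold))   # (N, 2) centerline pixels

# Toy usage on a synthetic score map with a bright horizontal line.
score = np.zeros((64, 64)); score[32, :] = 1.0
print(len(extract_centerline_points(score)))              # points along row 32
```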

8.
IEEE Trans Pattern Anal Mach Intell ; 37(1): 94-106, 2015 Jan.
Article in English | MEDLINE | ID: mdl-26353211

ABSTRACT

Learning filters to produce sparse image representations in terms of over-complete dictionaries has emerged as a powerful way to create image features for many different purposes. Unfortunately, these filters are usually both numerous and non-separable, making their use computationally expensive. In this paper, we show that such filters can be computed as linear combinations of a smaller number of separable ones, thus greatly reducing the computational complexity at no cost in terms of performance. This makes filter learning approaches practical even for large images or 3D volumes, and we show that we significantly outperform state-of-the-art methods on the curvilinear structure extraction task, in terms of both accuracy and speed. Moreover, our approach is general and can be used on generic convolutional filter banks to reduce the complexity of the feature extraction step.
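
A minimal way to see why 2D filters can be replaced by combinations of separable ones is a truncated SVD of a single filter: each rank-1 term is the outer product of a 1D column filter and a 1D row filter, so the 2D convolution can be carried out with cheaper 1D passes. The NumPy sketch below illustrates only this per-filter principle; the paper instead learns a small shared separable basis for the whole filter bank.

```python
import numpy as np

def separable_approximation(filt, rank):
    # Approximate a 2D filter as a sum of `rank` separable (outer-product) filters.
    U, s, Vt = np.linalg.svd(filt, full_matrices=False)
    col_filters = U[:, :rank] * s[:rank]     # 1D vertical filters (scaled)
    row_filters = Vt[:rank, :]               # 1D horizontal filters
    approx = col_filters @ row_filters       # sum of the rank-1 terms
    return col_filters, row_filters, approx

# Toy usage: approximate a random 9x9 filter with 3 separable components.
f = np.random.default_rng(2).normal(size=(9, 9))
_, _, f_approx = separable_approximation(f, rank=3)
print(np.linalg.norm(f - f_approx) / np.linalg.norm(f))   # relative approximation error
```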

9.
IEEE Trans Pattern Anal Mach Intell ; 37(3): 597-610, 2015 Mar.
Article in English | MEDLINE | ID: mdl-26353264

ABSTRACT

We propose a novel and general framework to learn compact but highly discriminative floating-point and binary local feature descriptors. By leveraging the boosting trick, we first show how to efficiently train a compact floating-point descriptor that is very robust to illumination and viewpoint changes. We then present the main contribution of this paper: a binary extension of the framework that demonstrates the real advantage of our approach and allows us to compress the descriptor even further. Each bit of the resulting binary descriptor, which we call BinBoost, is computed with a boosted binary hash function, and we show how to efficiently optimize the hash functions so that they are complementary, which is key to compactness and robustness. As we do not put any constraints on the weak learner configuration underlying each hash function, our general framework allows us to optimize the sampling patterns of recently proposed hand-crafted descriptors and significantly improve their performance. Moreover, our boosting scheme can easily adapt to new applications and generalize to other types of image data, such as faces, while providing state-of-the-art results at a fraction of the matching time and memory footprint.
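
To make the boosted binary hash functions more concrete, the hypothetical sketch below computes each descriptor bit as the sign of a weighted sum of weak binary responses. The weak learners here are simple thresholded feature comparisons standing in for the learned gradient-based learners of the paper, and the weights are random placeholders; training is entirely omitted.

```python
import numpy as np

def binboost_bit(features, weak_learners, alphas):
    # One descriptor bit: sign of a boosted combination of weak responses in {-1, +1}.
    s = sum(a * h(features) for a, h in zip(alphas, weak_learners))
    return 1 if s > 0 else 0

def binboost_descriptor(features, hash_functions):
    # hash_functions: list of (weak_learners, alphas) pairs, one per output bit
    return np.array([binboost_bit(features, h, a) for h, a in hash_functions],
                    dtype=np.uint8)

# Toy usage: weak learners are feature comparisons, weights are random placeholders.
rng = np.random.default_rng(3)
weak = [lambda x, i=i, j=j: 1 if x[i] > x[j] else -1
        for i, j in rng.integers(0, 64, size=(8, 2))]
hashes = [(weak, rng.random(8)) for _ in range(16)]
print(binboost_descriptor(rng.random(64), hashes))   # 16-bit binary descriptor
```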

10.
IEEE Trans Vis Comput Graph ; 21(11): 1309-18, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26340773

ABSTRACT

We present a method for large-scale geo-localization and global tracking of mobile devices in urban outdoor environments. In contrast to existing methods, we instantaneously initialize and globally register a SLAM map by localizing the first keyframe with respect to widely available untextured 2.5D maps. Given a single image frame and a coarse sensor pose prior, our localization method estimates the absolute camera orientation from straight line segments and the translation by aligning the city map model with a semantic segmentation of the image. We use the resulting 6DOF pose, together with information inferred from the city map model, to reliably initialize and extend a 3D SLAM map in a global coordinate system, applying a model-supported SLAM mapping approach. We show the robustness and accuracy of our localization approach on a challenging dataset, and demonstrate unconstrained global SLAM mapping and tracking of arbitrary camera motion on several sequences.

11.
Med Image Comput Comput Assist Interv ; 16(Pt 1): 526-33, 2013.
Article in English | MEDLINE | ID: mdl-24505707

ABSTRACT

We present a novel, fully discriminative method for curvilinear structure segmentation that simultaneously learns a classifier and the features it relies on. Our approach requires almost no parameter tuning and, in the case of 2D images, removes the requirement for hand-designed features, thus freeing the practitioner from the time-consuming tasks of parameter and feature selection. Our approach relies on the Gradient Boosting framework to learn discriminative convolutional filters in closed form at each stage, and it can operate on raw image pixels as well as additional data sources, such as the output of other methods like the Optimally Oriented Flux. We show that it outperforms state-of-the-art curvilinear segmentation methods on both 2D images and 3D image stacks.


Subjects
Algorithms; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated/methods; Reproducibility of Results; Sensitivity and Specificity
12.
IEEE Trans Pattern Anal Mach Intell ; 34(5): 876-88, 2012 May.
Article in English | MEDLINE | ID: mdl-22442120

ABSTRACT

We present a method for real-time 3D object instance detection that does not require a time-consuming training stage and can handle untextured objects. At its core, our approach is a novel image representation for template matching designed to be robust to small image transformations. This robustness is based on spread image gradient orientations and allows us to test only a small subset of all possible pixel locations when parsing the image, and to represent a 3D object with a limited set of templates. In addition, we demonstrate that, if a dense depth sensor is available, we can extend our approach for even better performance by also taking 3D surface normal orientations into account. We show how to take advantage of the architecture of modern computers to build an efficient but very discriminative representation of the input images that can be used to consider thousands of templates in real time. We demonstrate in many experiments on real data that our method is much faster and more robust to background clutter than current state-of-the-art methods.
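
A heavily simplified sketch of the "spread gradient orientations" idea, assuming NumPy: quantized orientations are encoded as bit masks, each pixel's mask is OR-ed over a small neighborhood so matching tolerates small shifts, and a template is scored by counting features whose orientation bit is present. The bit-level optimizations for modern CPU architectures and the depth extension described in the paper are not shown.

```python
import numpy as np

def quantize_orientations(angles, n_bins=8):
    # Map gradient orientations (radians, taken modulo pi) to one-bit-per-bin masks.
    bins = ((angles % np.pi) / np.pi * n_bins).astype(np.int32).clip(0, n_bins - 1)
    return (1 << bins).astype(np.uint8)

def spread(masks, T=3):
    # OR each mask over a TxT neighborhood: the "spread" that gives shift tolerance.
    H, W = masks.shape
    out = np.zeros_like(masks)
    for dy in range(T):
        for dx in range(T):
            out[:H - dy, :W - dx] |= masks[dy:, dx:]
    return out

def template_score(spread_masks, template):
    # template: list of (y, x, bitmask) features from a reference view of the object.
    return sum(int((spread_masks[y, x] & m) != 0) for y, x, m in template)
```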


Subjects
Image Processing, Computer-Assisted/methods; Humans; Imaging, Three-Dimensional; Reproducibility of Results; Surface Properties
13.
Med Image Comput Comput Assist Interv ; 15(Pt 1): 189-97, 2012.
Article in English | MEDLINE | ID: mdl-23285551

ABSTRACT

Extracting linear structures, such as blood vessels or dendrites, from images is crucial in many medical imaging applications, and many handcrafted features have been proposed to solve this problem. However, such features rely on assumptions that are never entirely true. Learned features, on the other hand, can capture image characteristics that are difficult to define analytically, but they tend to be much slower to compute than handcrafted features. We propose to complement handcrafted methods with features found using very recent machine learning techniques, and we show that even a few filters are sufficient to efficiently leverage handcrafted features. We demonstrate our approach on the STARE, DRIVE, and BF2D datasets, and on 2D projections of neural images from the DIADEM challenge. Our proposal outperforms handcrafted methods and matches learning-only approaches at a fraction of their computational cost.


Subjects
Artificial Intelligence; Algorithms; Blood Vessels/pathology; Computer Simulation; Diagnostic Imaging/methods; Humans; Image Processing, Computer-Assisted; Models, Statistical; Neurons/pathology; Pattern Recognition, Automated/methods; Regression Analysis; Reproducibility of Results; Software
14.
IEEE Trans Pattern Anal Mach Intell ; 34(7): 1281-98, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22084141

ABSTRACT

Binary descriptors are becoming increasingly popular as a means to compare feature points very fast while requiring comparatively small amounts of memory. The typical approach to creating them is to first compute floating-point ones, using an algorithm such as SIFT, and then to binarize them. In this paper, we show that we can directly compute a binary descriptor, which we call BRIEF, on the basis of simple intensity difference tests. As a result, BRIEF is very fast both to build and to match. We compare it against SURF and SIFT on standard benchmarks and show that it yields comparable recognition accuracy, while running in an almost vanishing fraction of the time required by either.
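
Because BRIEF is built from simple intensity difference tests, a minimal sketch is short: sample fixed pixel pairs inside a smoothed patch, compare intensities to get bits, and match descriptors with the Hamming distance. The patch size and random sampling pattern below are illustrative, not the exact configuration of the paper.

```python
import numpy as np

def brief_descriptor(patch, pairs):
    # patch: (S, S) smoothed grayscale patch around a keypoint
    # pairs: (n_bits, 4) test locations (y1, x1, y2, x2); fixed for all keypoints
    bits = patch[pairs[:, 0], pairs[:, 1]] < patch[pairs[:, 2], pairs[:, 3]]
    return np.packbits(bits)                      # compact binary descriptor

def hamming(d1, d2):
    return int(np.unpackbits(d1 ^ d2).sum())      # matching = Hamming distance

# Toy usage: 256 random intensity-comparison tests inside 32x32 patches.
rng = np.random.default_rng(4)
tests = rng.integers(0, 32, size=(256, 4))
p1, p2 = rng.random((32, 32)), rng.random((32, 32))
print(hamming(brief_descriptor(p1, tests), brief_descriptor(p2, tests)))
```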

15.
IEEE Trans Vis Comput Graph ; 18(9): 1449-59, 2012 Sep.
Article in English | MEDLINE | ID: mdl-21931174

ABSTRACT

The contribution of this paper is twofold. First, we show how to extend the ESM algorithm to handle motion blur in 3D object tracking. ESM is a powerful algorithm for template matching-based tracking, but it can fail under motion blur. We introduce an image formation model that explicitly considers the possibility of blur, which results in a generalization of the original ESM algorithm. This allows the tracker to converge faster, more accurately, and more robustly, even under large amounts of blur. Our second contribution is an efficient method for rendering the virtual objects under the estimated motion blur. It renders two images of the object under 3D perspective and warps them to create many intermediate images. By fusing these images, we obtain a final image of the virtual objects that is blurred consistently with the captured image. Because warping is much faster than 3D rendering, we can create realistically blurred images at a very low computational cost.
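
A rough sketch of the warp-and-fuse rendering step, assuming OpenCV: the first rendered view is warped towards the second with interpolated warps and the intermediate images are averaged. Linearly interpolating a homography, as done here, is a crude stand-in for the warps derived from the estimated 3D motion in the paper.

```python
import numpy as np
import cv2  # OpenCV, assumed available

def blur_between_views(img_start, H_start_to_end, n_steps=16):
    # Approximate motion blur by averaging intermediate warps of the first view.
    h, w = img_start.shape[:2]
    identity = np.eye(3)
    acc = np.zeros(img_start.shape, dtype=np.float32)
    for t in np.linspace(0.0, 1.0, n_steps):
        H_t = (1.0 - t) * identity + t * H_start_to_end   # crude warp interpolation
        acc += cv2.warpPerspective(img_start, H_t, (w, h)).astype(np.float32)
    return (acc / n_steps).astype(img_start.dtype)
```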

16.
Med Image Comput Comput Assist Interv ; 13(Pt 2): 463-71, 2010.
Article in English | MEDLINE | ID: mdl-20879348

ABSTRACT

While there has been substantial progress in segmenting natural images, state-of-the-art methods that perform well in such tasks unfortunately tend to underperform when confronted with the different challenges posed by electron microscope (EM) data. For example, in EM imagery of neural tissue, numerous cells and subcellular structures appear within a single image, they exhibit irregular shapes that cannot be easily modeled by standard techniques, and confusing textures clutter the background. We propose a fully automated approach that handles these challenges by using sophisticated cues that capture global shape and texture information, and by learning the specific appearance of object boundaries. We demonstrate that our approach significantly outperforms state-of-the-art techniques and closely matches the performance of human annotators.


Subjects
Algorithms; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Microscopy, Electron/methods; Pattern Recognition, Automated/methods; Subcellular Fractions/ultrastructure; Animals; Humans; Reproducibility of Results; Sensitivity and Specificity
17.
IEEE Trans Pattern Anal Mach Intell ; 32(7): 1165-81, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20489222

ABSTRACT

We combine detection and tracking techniques to achieve robust 3D motion recovery of people seen from arbitrary viewpoints by a single and potentially moving camera. We rely on detecting key postures, which can be done reliably, on using a motion model to infer 3D poses between consecutive detections, and finally on refining them over the whole sequence using a generative model. We demonstrate our approach on golf motions filmed with a static camera and walking motions acquired with a potentially moving one. We show that our approach, although monocular, is both metrically accurate, because it integrates information over many frames, and robust, because it can recover from a few misdetections.


Subjects
Image Processing, Computer-Assisted/methods; Motion; Video Recording/methods; Algorithms; Artificial Intelligence; Golf; Humans; Normal Distribution; Posture; Walking
18.
IEEE Trans Pattern Anal Mach Intell ; 32(5): 815-30, 2010 May.
Article in English | MEDLINE | ID: mdl-20299707

ABSTRACT

In this paper, we introduce a local image descriptor, DAISY, which is very efficient to compute densely. We also present an EM-based algorithm to compute dense depth and occlusion maps from wide-baseline image pairs using this descriptor. This yields much better results in wide-baseline situations than the pixel- and correlation-based algorithms that are commonly used in narrow-baseline stereo. Also, using a descriptor makes our algorithm robust against many photometric and geometric transformations. Our descriptor is inspired by earlier ones such as SIFT and GLOH, but it can be computed much faster for our purposes. Unlike SURF, which can also be computed efficiently at every pixel, it does not introduce artifacts that degrade the matching performance when used densely. Our approach is the first algorithm to estimate dense depth maps from wide-baseline image pairs, and we show that it performs well through extensive experiments on depth estimation accuracy, occlusion detection, and comparisons against other descriptors on laser-scanned ground-truth scenes. We also tested our approach on a variety of indoor and outdoor scenes with different photometric and geometric transformations, and our experiments support our claim of robustness to these transformations.


Subjects
Algorithms; Artificial Intelligence; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated/methods; Photogrammetry/methods; Subtraction Technique
19.
IEEE Trans Pattern Anal Mach Intell ; 32(3): 448-61, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20075471

ABSTRACT

While feature point recognition is a key component of modern approaches to object detection, existing approaches require computationally expensive patch preprocessing to handle perspective distortion. In this paper, we show that formulating the problem in a naive Bayesian classification framework makes such preprocessing unnecessary and produces an algorithm that is simple, efficient, and robust. Furthermore, it scales well as the number of classes grows. To recognize the patches surrounding keypoints, our classifier uses hundreds of simple binary features and models class posterior probabilities. We make the problem computationally tractable by assuming independence between arbitrary sets of features. Even though this is not strictly true, we demonstrate that our classifier nevertheless performs remarkably well on image data sets containing very significant perspective changes.
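
A small, hypothetical sketch of the independence assumption, assuming NumPy: binary tests are grouped into independent sets (a grouping of this kind is often called "ferns"), each group indexes a table of trained class log-probabilities, and the log-probabilities are summed across groups before taking the most probable class. Training of the tables is omitted and all shapes are illustrative.

```python
import numpy as np

def fern_index(patch, tests):
    # tests: (S, 4) pixel pairs; each intensity comparison contributes one bit.
    bits = patch[tests[:, 0], tests[:, 1]] < patch[tests[:, 2], tests[:, 3]]
    return int(np.dot(bits, 1 << np.arange(len(tests))))

def classify_patch(patch, ferns, log_posteriors):
    # ferns: list of test arrays, one independent group of binary features each
    # log_posteriors: (n_ferns, 2**S, n_classes) trained log P(group outcome | class)
    score = sum(log_posteriors[f][fern_index(patch, tests)]
                for f, tests in enumerate(ferns))
    return int(score.argmax())                    # most probable keypoint class

# Toy usage: 10 groups of 8 tests over a 24x24 patch, 50 keypoint classes.
rng = np.random.default_rng(5)
ferns = [rng.integers(0, 24, size=(8, 4)) for _ in range(10)]
log_post = rng.normal(size=(10, 2 ** 8, 50))
print(classify_patch(rng.random((24, 24)), ferns, log_post))
```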

20.
IEEE Trans Pattern Anal Mach Intell ; 28(9): 1465-79, 2006 Sep.
Article in English | MEDLINE | ID: mdl-16929732

ABSTRACT

In many 3D object-detection and pose-estimation problems, runtime performance is of critical importance. However, there usually is time to train the system, which we show to be very useful. Assuming that several registered images of the target object are available, we developed a keypoint-based approach that is effective in this context by formulating wide-baseline matching of keypoints extracted from the input images to those found in the model images as a classification problem. This shifts much of the computational burden to a training phase, without sacrificing recognition performance. As a result, the algorithm is robust, accurate, and fast enough for frame-rate performance. This reduction in runtime computational complexity is our first contribution. Our second contribution is to show that, in this context, a simple and fast keypoint detector suffices to support detection and tracking even under large perspective and scale variations. While earlier methods require a detector that produces very repeatable results in general, which is usually very time-consuming, we simply find the most repeatable keypoints on the specific target object during the training phase. We have incorporated these ideas into a real-time system that detects planar, nonplanar, and deformable objects. It then estimates the pose of the rigid ones and the deformations of the others.


Subjects
Algorithms; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated/methods; Computer Simulation; Data Interpretation, Statistical; Information Storage and Retrieval/methods; Models, Statistical