1.
IEEE Trans Pattern Anal Mach Intell; 45(2): 1533-1544, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35298372

ABSTRACT

Domain adaptation is an important task to enable learning when labels are scarce. While most works focus only on the image modality, there are many important multi-modal datasets. In order to leverage multi-modality for domain adaptation, we propose cross-modal learning, where we enforce consistency between the predictions of two modalities via mutual mimicking. We constrain our network to make correct predictions on labeled data and consistent predictions across modalities on unlabeled target-domain data. Experiments in unsupervised and semi-supervised domain adaptation settings prove the effectiveness of this novel domain adaptation strategy. Specifically, we evaluate on the task of 3D semantic segmentation from either the 2D image, the 3D point cloud, or both. We leverage recent driving datasets to produce a wide variety of domain adaptation scenarios including changes in scene layout, lighting, sensor setup and weather, as well as the synthetic-to-real setup. Our method significantly improves over previous uni-modal adaptation baselines on all adaptation scenarios. Code will be made available upon publication.
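
The mutual-mimicking constraint lends itself to a compact loss. Below is a minimal PyTorch sketch of the idea, not the authors' released code: each modality's per-point prediction is pushed toward the other's (detached) distribution, and a standard cross-entropy term is added on labeled data. The weighting factor lambda_xm and the tensor shapes are illustrative assumptions.

import torch
import torch.nn.functional as F

def cross_modal_loss(logits_2d, logits_3d, labels=None, lambda_xm=0.1):
    """logits_*: (N, C) per-point class scores from the 2D and 3D branches.
    labels: (N,) ground-truth class indices, or None on unlabeled target data."""
    log_p2d = F.log_softmax(logits_2d, dim=1)
    log_p3d = F.log_softmax(logits_3d, dim=1)
    # Mutual mimicking: each branch mimics the other's (detached) distribution.
    xm = F.kl_div(log_p2d, log_p3d.exp().detach(), reduction="batchmean") \
       + F.kl_div(log_p3d, log_p2d.exp().detach(), reduction="batchmean")
    loss = lambda_xm * xm
    if labels is not None:  # supervised term on labeled source-domain points
        loss = loss + F.cross_entropy(logits_2d, labels) + F.cross_entropy(logits_3d, labels)
    return loss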

2.
IEEE Trans Pattern Anal Mach Intell; 44(10): 6043-6055, 2022 Oct.
Article in English | MEDLINE | ID: mdl-34086561

ABSTRACT

Reliably quantifying the confidence of deep neural classifiers is a challenging yet fundamental requirement for deploying such models in safety-critical applications. In this paper, we introduce a novel target criterion for model confidence, namely the true class probability (TCP). We show that TCP offers better properties for confidence estimation than the standard maximum class probability (MCP). Since the true class is by essence unknown at test time, we propose to learn the TCP criterion from data with an auxiliary model, introducing a specific learning scheme adapted to this context. We evaluate our approach on the tasks of failure prediction and of self-training with pseudo-labels for domain adaptation, both of which require effective confidence estimates. Extensive experiments are conducted to validate the relevance of the proposed approach in each task. We study various network architectures and experiment with small and large datasets for image classification and semantic segmentation. In every tested benchmark, our approach outperforms strong baselines.
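
As a rough illustration of the criterion (not the paper's implementation), the sketch below derives TCP targets from logits and ground-truth labels and trains an auxiliary confidence output by regression; the MSE objective and function names are assumptions.

import torch
import torch.nn.functional as F

def tcp_targets(logits, labels):
    """TCP regression targets: the softmax probability of the true class."""
    probs = F.softmax(logits, dim=1)
    return probs.gather(1, labels.unsqueeze(1)).squeeze(1)  # shape (N,)

def confidence_loss(predicted_conf, logits, labels):
    """MSE between the auxiliary head's confidence and the (detached) TCP."""
    return F.mse_loss(predicted_conf, tcp_targets(logits, labels).detach())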


Subjects
Algorithms, Deep Learning, Computer-Assisted Image Interpretation, Computer-Assisted Image Processing, Neural Networks (Computer), Semantic Web
3.
IEEE Trans Pattern Anal Mach Intell; 42(8): 1996-2010, 2020 Aug.
Article in English | MEDLINE | ID: mdl-30872223

ABSTRACT

Rotoscoping, the detailed delineation of scene elements through a video shot, is a painstaking task of tremendous importance in professional post-production pipelines. While pixel-wise segmentation techniques can help for this task, professional rotoscoping tools rely on parametric curves that offer artists much better interactive control over the definition, editing and manipulation of the segments of interest. Sticking to this prevalent rotoscoping paradigm, we propose a novel framework to capture and track the visual aspect of an arbitrary object in a scene, given an initial closed outline of this object. This model combines a collection of local foreground/background appearance models spread along the outline, a global appearance model of the enclosed object and a set of distinctive foreground landmarks. The structure of this rich appearance model allows simple initialization, efficient iterative optimization with exact minimization at each step, and online adaptation in videos. We further extend this model by so-called trimaps, which serve as an input to alpha-matting algorithms to allow truly seamless compositing. To this end, we leverage local classifiers attached to the roto-curves to define a confidence measure that is well suited to defining trimaps with adaptive bandwidths. The resulting trimaps are parametric, temporally consistent and remain fully editable by the artist. We demonstrate qualitatively and quantitatively the merit of this framework through comparisons with tools based on either dynamic segmentation with a closed curve or pixel-wise binary labelling.
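
To make the notion of local classifiers along the roto-curve concrete, here is an illustrative sketch assuming simple smoothed RGB histograms as the local foreground/background models and a confidence score derived from their disagreement; the paper's actual appearance models and confidence definition may differ.

import numpy as np

def build_local_model(fg_pixels, bg_pixels, bins=8):
    """fg_pixels, bg_pixels: (N, 3) RGB samples from narrow bands near one curve point."""
    def hist(p):
        h, _ = np.histogramdd(p, bins=(bins,) * 3, range=[(0, 256)] * 3)
        return (h + 1e-3) / (h.sum() + 1e-3 * bins ** 3)  # smoothed, normalised
    return hist(fg_pixels), hist(bg_pixels)

def local_confidence(pixel, fg_hist, bg_hist, bins=8):
    """Higher when the local models clearly separate this pixel (usable to set trimap width)."""
    idx = tuple((pixel.astype(int) * bins // 256).clip(0, bins - 1))
    p_fg, p_bg = fg_hist[idx], bg_hist[idx]
    return abs(p_fg - p_bg) / (p_fg + p_bg)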

4.
IEEE Trans Pattern Anal Mach Intell; 42(2): 357-370, 2020 Feb.
Article in English | MEDLINE | ID: mdl-30334783

ABSTRACT

In this work, we propose a novel model-based deep convolutional autoencoder that addresses the highly challenging problem of reconstructing a 3D human face from a single in-the-wild color image. To this end, we combine a convolutional encoder network with an expert-designed generative model that serves as decoder. The core innovation is the differentiable parametric decoder that encapsulates image formation analytically based on a generative model. Our decoder takes as input a code vector with exactly defined semantic meaning that encodes detailed face pose, shape, expression, skin reflectance, and scene illumination. Thanks to this new way of combining CNN-based and model-based face reconstruction, the CNN-based encoder learns to extract semantically meaningful parameters from a single monocular input image. For the first time, a CNN encoder and an expert-designed generative model can be trained end-to-end in an unsupervised manner, which renders training on very large (unlabeled) real-world datasets feasible. The obtained reconstructions compare favorably to current state-of-the-art approaches in terms of quality and richness of representation. This work is an extended version of [1], where we additionally present a stochastic vertex sampling technique for faster training of our networks, and moreover, we propose and evaluate analysis-by-synthesis and shape-from-shading refinement approaches to achieve a high-fidelity reconstruction.
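
The following is a highly simplified, hypothetical sketch of the differentiable-decoder idea: a code vector with a fixed semantic layout is mapped to geometry and reflectance through linear morphable-model bases, so the whole path from code to rendered image stays differentiable. The dimensions, layout and basis names are placeholders, not the paper's model.

import torch

def split_code(z):
    """Assumed layout: pose (6), shape (80), expression (64), reflectance (80), illumination (27)."""
    return torch.split(z, [6, 80, 64, 80, 27], dim=-1)

def decode(z, mean_shape, shape_basis, expr_basis, mean_albedo, albedo_basis):
    """z: 1-D code vector; bases: (3V, n) linear morphable-model matrices."""
    pose, shape, expr, refl, light = split_code(z)
    vertices = mean_shape + shape_basis @ shape + expr_basis @ expr  # (3V,) geometry
    albedo = mean_albedo + albedo_basis @ refl                       # (3V,) reflectance
    return vertices, albedo, pose, light  # handed to a differentiable renderer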


Subjects
Face/anatomy & histology, Face/diagnostic imaging, Three-Dimensional Imaging/methods, Unsupervised Machine Learning, Deep Learning, Female, Humans, Male, Neural Networks (Computer)
5.
IEEE Trans Pattern Anal Mach Intell; 37(9): 1890-1903, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26353134

ABSTRACT

Multiple view segmentation consists of segmenting objects simultaneously in several views. Compared to monocular settings, a key issue is ensuring the propagation of segmentation information between views while minimizing complexity and computational cost. In this work, we first investigate the idea that examining measurements at the projections of a sparse set of 3D points is sufficient to achieve this goal. The proposed algorithm softly assigns each of these 3D samples to the scene background if it projects on the background region in at least one view, or to the foreground if it projects on the foreground region in all views. Second, we show how other modalities such as depth may be seamlessly integrated in the model and benefit the segmentation. The paper exposes a detailed set of experiments used to validate the algorithm, showing results comparable with the state of the art, with reduced computational complexity. We also discuss the use of different modalities for specific situations, such as dealing with a low number of viewpoints or a scene with color ambiguities between foreground and background.
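
The soft assignment rule can be illustrated in a few lines, assuming each view provides a probability that the sample's projection lies on the foreground; this is a sketch of the rule, not the paper's full probabilistic model.

import numpy as np

def sample_foreground_probability(per_view_fg_probs):
    """per_view_fg_probs: P(foreground at the sample's projection) for each view.
    Foreground requires foreground in all views; one background view is enough to reject."""
    p = np.asarray(per_view_fg_probs, dtype=float)
    return p.prod()

# e.g. a sample seen as foreground in three views but background in one stays background-leaning:
# sample_foreground_probability([0.9, 0.8, 0.95, 0.1]) -> ~0.068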

6.
IEEE Trans Image Process; 24(1): 484-498, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25020094

ABSTRACT

We analyze the problem of how to correctly construct dense point trajectories from optical flow fields. First, we show that simple Euler integration is unavoidably inaccurate, no matter how good is the optical flow estimator. Then, an inverse integration scheme is analyzed which is more robust to bias and input noise and shows better stability properties. Our contribution is threefold: 1) a theoretical analysis that demonstrates why and in what sense inverse integration is more accurate; 2) a rich experimental validation both on synthetic and real (image) data; and 3) an algorithm for approximate online inverse integration. This new technique is precious whether one is trying to propagate information densely available on a reference frame to the other frames in the sequence or, conversely, to assign information densely over each frame by pulling it from the reference.
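
For concreteness, here is a small sketch of the plain (forward) Euler integration that the paper analyses; it accumulates interpolation error at every step, which is the failure mode the inverse scheme addresses. The bilinear sampler is a standard helper written for this sketch, not the authors' code.

import numpy as np

def bilinear(flow, x, y):
    """Sample an (H, W, 2) flow field at a sub-pixel position."""
    h, w = flow.shape[:2]
    x0, y0 = int(np.clip(x, 0, w - 2)), int(np.clip(y, 0, h - 2))
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * flow[y0, x0] + dx * (1 - dy) * flow[y0, x0 + 1]
            + (1 - dx) * dy * flow[y0 + 1, x0] + dx * dy * flow[y0 + 1, x0 + 1])

def euler_trajectory(flows, x0, y0):
    """flows: list of (H, W, 2) forward flow fields; returns the point's path over time."""
    path, (x, y) = [(x0, y0)], (x0, y0)
    for flow in flows:
        u, v = bilinear(flow, x, y)  # each step interpolates the flow at the current estimate
        x, y = x + u, y + v
        path.append((x, y))
    return path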

7.
IEEE Trans Image Process; 23(3): 1240-1254, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24723525

ABSTRACT

Line scratch detection in old films is a particularly challenging problem due to the variable spatiotemporal characteristics of this defect. Some of the main problems include sensitivity to noise and texture, and false detections due to thin vertical structures belonging to the scene. We propose a robust and automatic algorithm for frame-by-frame line scratch detection in old films, as well as a temporal algorithm for the filtering of false detections. In the frame-by-frame algorithm, we relax some of the hypotheses used in previous algorithms in order to detect a wider variety of scratches. This step's robustness and lack of external parameters are ensured by the combined use of an a contrario methodology and local statistical estimation. In this manner, over-detection in textured or cluttered areas is greatly reduced. The temporal filtering algorithm eliminates false detections due to thin vertical structures by exploiting the coherence of their motion with that of the underlying scene. Experiments demonstrate the ability of the resulting detection procedure to deal with difficult situations, in particular in the presence of noise, texture, and slanted or partial scratches. Comparisons show significant advantages over previous work.
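
A toy version of the a contrario test is sketched below, under the assumption that pixel-level votes are independent with probability p0 under the background (noise) model; the actual detector combines this with local statistical estimation, and the parameter names are illustrative.

from scipy.stats import binom

def is_meaningful_segment(votes, length, p0=0.1, n_tests=1e6, epsilon=1.0):
    """votes: pixels in a column segment supporting a scratch; length: segment height.
    The segment is kept if its expected number of false alarms (NFA) is below epsilon."""
    nfa = n_tests * binom.sf(votes - 1, length, p0)  # P[Binomial(length, p0) >= votes], scaled
    return nfa < epsilon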


Subjects
Algorithms, Artifacts, Image Enhancement/methods, Computer-Assisted Image Interpretation/methods, Motion Pictures, Automated Pattern Recognition/methods, Photography/methods, Reproducibility of Results, Sensitivity and Specificity
8.
IEEE Trans Image Process; 22(3): 1161-1174, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23193451

ABSTRACT

This paper describes new image prediction methods based on neighbor embedding (NE) techniques. Neighbor embedding methods are used here to approximate an input block (the block to be predicted) in the image as a linear combination of its K nearest neighbors. However, in order for the decoder to proceed similarly, the K nearest neighbors are found by computing distances between the known pixels in a causal neighborhood (called the template) of the input block and the co-located pixels in candidate patches taken from a causal window. Similarly, the weights used for the linear approximation are computed so as to best approximate the template pixels. Although efficient, these methods suffer from limitations when the template and the block to be predicted are not correlated, e.g., in non-homogeneous texture areas. To cope with these limitations, this paper introduces new image prediction methods based on NE techniques in which the K-NN search is done in two steps and aided, at the decoder, by a block correspondence map, hence the name map-aided neighbor embedding (MANE) method. Another optimized variant of this approach, called the oMANE method, is also studied. In these methods, several alternatives are also proposed for the K-NN search. The resulting prediction methods are shown to bring significant rate-distortion performance improvements when compared to H.264 Intra prediction modes (up to 44.75% rate saving at low bit rates).
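
The baseline template-based neighbor embedding that the MANE variants extend can be sketched in a few lines: neighbors are chosen by template distance so the decoder can repeat the search, and the linear weights are fit on the template pixels only. Array shapes and the least-squares fit below are assumptions made for illustration.

import numpy as np

def ne_predict(template, candidate_templates, candidate_blocks, k=4):
    """template: (T,) known causal pixels of the input block.
    candidate_templates: (M, T) and candidate_blocks: (M, B) taken from the causal window."""
    d = np.sum((candidate_templates - template) ** 2, axis=1)
    nn = np.argsort(d)[:k]  # K nearest neighbors by template distance
    # Least-squares weights that best reconstruct the template from its neighbors.
    w, *_ = np.linalg.lstsq(candidate_templates[nn].T, template, rcond=None)
    return w @ candidate_blocks[nn]  # predicted block pixels, shape (B,)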


Subjects
Algorithms, Data Compression/methods, Image Enhancement/methods, Computer-Assisted Image Interpretation/methods, Automated Pattern Recognition/methods, Subtraction Technique, Artificial Intelligence, Reproducibility of Results, Sensitivity and Specificity
9.
Inf Process Med Imaging; 23: 594-606, 2013.
Article in English | MEDLINE | ID: mdl-24684002

ABSTRACT

Particle filtering has recently been introduced to perform probabilistic tractography in conjunction with DTI and Q-Ball models to estimate the diffusion information. Particle filters are particularly well adapted to the tractography problem as they offer a way to approximate a probability distribution over all paths originated from a specified voxel, given the diffusion information. In practice however, they often fail at consistently capturing the multi-modality of the target distribution. For brain white matter tractography, this means that multiple fiber pathways are unlikely to be tracked over extended volumes. We propose to remedy this issue by formulating the filtering distribution as an adaptive M-component non-parametric mixture model. Such a formulation preserves all the properties of a classical particle filter while improving multi-modality capture. We apply this multi-modal particle filter to both DTI and Q-Ball models and propose to estimate dynamically the number of modes of the filtering distribution. We show on synthetic and real data how this algorithm outperforms the previous versions proposed in the literature.
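
The particle-filtering backbone can be sketched generically; the proposal and likelihood callables below stand in for the DTI/Q-Ball-specific terms, and the comment marks where the mixture-model (multi-modal) formulation would change the resampling step. This is an assumption-laden sketch, not the paper's filter.

import numpy as np

def pf_step(positions, directions, weights, propose, likelihood, step=0.5):
    """positions/directions: (N, 3) particle states; weights: (N,); propose/likelihood: callables."""
    new_dirs = np.array([propose(d) for d in directions])          # sample candidate directions
    new_pos = positions + step * new_dirs                          # advance particles along them
    w = weights * np.array([likelihood(p, d) for p, d in zip(new_pos, new_dirs)])
    w /= w.sum()
    # Plain multinomial resampling; a mixture-model filter would instead resample per
    # estimated mode so that several fiber pathways stay represented.
    idx = np.random.choice(len(w), size=len(w), p=w)
    return new_pos[idx], new_dirs[idx], np.full(len(w), 1.0 / len(w))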


Subjects
Algorithms, Diffusion Tensor Imaging/methods, Computer-Assisted Image Interpretation/methods, Three-Dimensional Imaging/methods, Multimodal Imaging/methods, Myelinated Nerve Fibers/ultrastructure, Pyramidal Tracts/anatomy & histology, Statistical Data Interpretation, Humans, Image Enhancement/methods, Reproducibility of Results, Sensitivity and Specificity
10.
IEEE Trans Pattern Anal Mach Intell; 34(9): 1704-1716, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22156101

ABSTRACT

This paper addresses the problem of large-scale image search. Three constraints have to be taken into account: search accuracy, efficiency, and memory usage. We first present and evaluate different ways of aggregating local image descriptors into a vector and show that the Fisher kernel achieves better performance than the reference bag-of-visual words approach for any given vector dimension. We then jointly optimize dimensionality reduction and indexing in order to obtain a precise vector comparison as well as a compact representation. The evaluation shows that the image representation can be reduced to a few dozen bytes while preserving high accuracy. Searching a 100 million image data set takes about 250 ms on one processor core.
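
As a hedged illustration of the aggregation-plus-compression pipeline, the sketch below uses VLAD-style aggregation of local descriptors as a simple stand-in for the Fisher vector, followed by PCA reduction; the normalisation choices and output dimension are assumptions, not the paper's exact settings.

import numpy as np

def vlad(descriptors, centroids):
    """descriptors: (N, D) local features; centroids: (K, D) visual vocabulary."""
    k, d = centroids.shape
    assign = np.argmin(((descriptors[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    v = np.zeros((k, d))
    for i in range(k):
        if np.any(assign == i):  # sum of residuals to the assigned centroid
            v[i] = (descriptors[assign == i] - centroids[i]).sum(axis=0)
    v = np.sign(v) * np.sqrt(np.abs(v))      # power normalisation
    v = v.ravel()
    return v / (np.linalg.norm(v) + 1e-12)   # L2 normalisation

def compress(vectors, out_dim=64):
    """Joint PCA reduction of a set of aggregated vectors to a compact code."""
    x = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:out_dim].T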

11.
IEEE Trans Pattern Anal Mach Intell; 33(1): 172-185, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21088326

ABSTRACT

This paper addresses recognition of human actions under view changes. We explore self-similarities of action sequences over time and observe the striking stability of such measures across views. Building upon this key observation, we develop an action descriptor that captures the structure of temporal similarities and dissimilarities within an action sequence. Despite this temporal self-similarity descriptor not being strictly view-invariant, we provide intuition and experimental validation demonstrating its high stability under view changes. Self-similarity descriptors are also shown to be stable under performance variations within a class of actions when individual speed fluctuations are ignored. If required, such fluctuations between two different instances of the same action class can be explicitly recovered with dynamic time warping, as will be demonstrated, to achieve cross-view action synchronization. More central to the current work, temporal ordering of local self-similarity descriptors can simply be ignored within a bag-of-features type of approach. Sufficient action discrimination is still retained in this way to build a view-independent action recognition system. Interestingly, self-similarities computed from different image features possess similar properties and can be used in a complementary fashion. Our method is simple and requires neither structure recovery nor multiview correspondence estimation. Instead, it relies on weak geometric properties and combines them with machine learning for efficient cross-view action recognition. The method is validated on three public data sets. It has similar or superior performance compared to related methods and it performs well even in extreme conditions, such as when recognizing actions from top views while using side views only for training.
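
The core quantity behind the descriptor, the temporal self-similarity matrix, is easy to sketch assuming one feature vector per frame (e.g. joint positions or appearance features); the method then builds local descriptors from patches of this matrix, which is not reproduced here.

import numpy as np

def self_similarity_matrix(frame_features):
    """frame_features: (T, D), one feature vector per frame; returns (T, T) pairwise distances."""
    diff = frame_features[:, None, :] - frame_features[None, :, :]
    return np.linalg.norm(diff, axis=-1)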


Subjects
Algorithms, Automated Pattern Recognition/methods, Artificial Intelligence, Computer Simulation, Humans, Movement
12.
IEEE Trans Image Process; 13(9): 1200-1212, 2004 Sep.
Article in English | MEDLINE | ID: mdl-15449582

ABSTRACT

A new algorithm is proposed for removing large objects from digital images. The challenge is to fill in the hole that is left behind in a visually plausible way. In the past, this problem has been addressed by two classes of algorithms: 1) "texture synthesis" algorithms for generating large image regions from sample textures and 2) "inpainting" techniques for filling in small image gaps. The former has been demonstrated for "textures"--repeating two-dimensional patterns with some stochasticity; the latter focus on linear "structures" which can be thought of as one-dimensional patterns, such as lines and object contours. This paper presents a novel and efficient algorithm that combines the advantages of these two approaches. We first note that exemplar-based texture synthesis contains the essential process required to replicate both texture and structure; the success of structure propagation, however, is highly dependent on the order in which the filling proceeds. We propose a best-first algorithm in which the confidence in the synthesized pixel values is propagated in a manner similar to the propagation of information in inpainting. The actual color values are computed using exemplar-based synthesis. In this paper, the simultaneous propagation of texture and structure information is achieved by a single, efficient algorithm. Computational efficiency is achieved by a block-based sampling process. A number of examples on real and synthetic images demonstrate the effectiveness of our algorithm in removing large occluding objects, as well as thin scratches. Robustness with respect to the shape of the manually selected target region is also demonstrated. Our results compare favorably to those obtained by existing techniques.
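
The best-first filling order rests on a per-pixel priority. The sketch below follows the confidence-times-data-term structure described here, with a hypothetical patch size and with the isophote and front-normal fields supplied by the caller; it is an illustration of the ordering rule, not the full inpainting loop.

import numpy as np

def priority(confidence, isophote, normal, patch_half=4, alpha=255.0):
    """confidence: (H, W) in [0, 1]; isophote, normal: (H, W, 2) fields on the fill front."""
    h, w = confidence.shape
    conf_term = np.zeros_like(confidence)
    for y in range(h):
        for x in range(w):
            ys = slice(max(0, y - patch_half), y + patch_half + 1)
            xs = slice(max(0, x - patch_half), x + patch_half + 1)
            conf_term[y, x] = confidence[ys, xs].mean()   # how much reliable data surrounds p
    data_term = np.abs((isophote * normal).sum(-1)) / alpha  # how strongly an isophote hits the front
    return conf_term * data_term  # fill the highest-priority front patch first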


Subjects
Algorithms, Computer Graphics, Image Enhancement/methods, Computer-Assisted Image Interpretation/methods, Automated Pattern Recognition, Computer-Assisted Signal Processing, Subtraction Technique, Hypermedia, Information Storage and Retrieval/methods, Computer-Assisted Numerical Analysis, Paintings, Reproducibility of Results, Sensitivity and Specificity
13.
IEEE Trans Image Process; 11(4): 393-407, 2002.
Article in English | MEDLINE | ID: mdl-18244642

ABSTRACT

This paper describes an original approach for content-based video indexing and retrieval. We aim to provide a global interpretation of the dynamic content of video shots without any prior motion segmentation and without any use of dense optic flow fields. To this end, we exploit the spatio-temporal distribution, within a shot, of appropriate local motion-related measurements derived from the spatio-temporal derivatives of the intensity function. These distributions are then represented by causal Gibbs models. To be independent of camera movement, the motion-related measurements are computed in the image sequence generated by compensating the estimated dominant image motion in the original sequence. The statistical modeling framework considered makes it feasible to compute exactly the conditional likelihood that a video shot belongs to a given motion class or, more generally, to an activity class. This property allows us to develop a general statistical framework for video indexing and retrieval with query-by-example. We build a hierarchical structure of the processed video database according to motion content similarity. This results in a binary tree where each node is associated with an estimated causal Gibbs model. We consider a similarity measure inspired by the Kullback-Leibler divergence. Then, retrieval with query-by-example is performed through this binary tree using the maximum a posteriori (MAP) criterion. We have obtained promising results on a set of various real image sequences.
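
A hedged stand-in for the retrieval step is sketched below: plain histograms replace the causal Gibbs models as the per-shot distribution of motion-related measurements, and ranking uses a KL-style divergence in place of the MAP traversal of the binary tree; bin counts and value ranges are assumptions.

import numpy as np

def motion_histogram(measurements, bins=32, value_range=(0.0, 8.0)):
    """Summarise a shot's local motion-related measurements as a smoothed distribution."""
    h, _ = np.histogram(measurements, bins=bins, range=value_range)
    return (h + 1e-6) / (h.sum() + 1e-6 * bins)

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def retrieve(query_hist, database_hists, top_k=5):
    """Query-by-example: return the shots whose motion distributions are closest to the query."""
    scores = [kl(query_hist, h) for h in database_hists]
    return np.argsort(scores)[:top_k]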
