Pesquisa | Portal Regional da BVS

Robust Shape Fitting for 3D Scene Abstraction.

Kluger, Florian; Brachmann, Eric; Yang, Michael Ying; Rosenhahn, Bodo.

IEEE Trans Pattern Anal Mach Intell ; 46(9): 6306-6325, 2024 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-38502630

RESUMO

Humans perceive and construct the world as an arrangement of simple parametric models. In particular, we can often describe man-made environments using volumetric primitives such as cuboids or cylinders. Inferring these primitives is important for attaining high-level, abstract scene descriptions. Previous approaches for primitive-based abstraction estimate shape parameters directly and are only able to reproduce simple objects. In contrast, we propose a robust estimator for primitive fitting, which meaningfully abstracts complex real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to a depth map. We condition the network on previously detected parts of the scene, parsing it one-by-one. To obtain cuboids from single RGB images, we additionally optimise a depth estimation CNN end-to-end. Naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene. We thus propose an improved occlusion-aware distance metric correctly handling opaque scenes. Furthermore, we present a neural network based cuboid solver which provides more parsimonious scene abstractions while also reducing inference time. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts.

RelTR: Relation Transformer for Scene Graph Generation.

Cong, Yuren; Yang, Michael Ying; Rosenhahn, Bodo.

IEEE Trans Pattern Anal Mach Intell ; 45(9): 11169-11183, 2023 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-37074895

RESUMO

Different objects in the same scene are more or less related to each other, but only a limited number of these relationships are noteworthy. Inspired by Detection Transformer, which excels in object detection, we view scene graph generation as a set prediction problem. In this article, we propose an end-to-end scene graph generation model Relation Transformer (RelTR), which has an encoder-decoder architecture. The encoder reasons about the visual feature context while the decoder infers a fixed-size set of triplets subject-predicate-object using different types of attention mechanisms with coupled subject and object queries. We design a set prediction loss performing the matching between the ground truth and predicted triplets for the end-to-end training. In contrast to most existing scene graph generation methods, RelTR is a one-stage method that predicts sparse scene graphs directly only using visual appearance without combining entities and labeling all possible predicates. Extensive experiments on the Visual Genome, Open Images V6, and VRD datasets demonstrate the superior performance and fast inference of our model.

Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges.

Ding, Jian; Xue, Nan; Xia, Gui-Song; Bai, Xiang; Yang, Wen; Yang, Michael Ying; Belongie, Serge; Luo, Jiebo; Datcu, Mihai; Pelillo, Marcello; Zhang, Liangpei.

IEEE Trans Pattern Anal Mach Intell ; 44(11): 7778-7796, 2022 11.

Artigo em Inglês | MEDLINE | ID: mdl-34613910

RESUMO

In he past decade, object detection has achieved significant progress in natural images but not in aerial images, due to the massive variations in the scale and orientation of objects caused by the bird's-eye view of aerial images. More importantly, the lack of large-scale benchmarks has become a major obstacle to the development of object detection in aerial images (ODAI). In this paper, we present a large-scale Dataset of Object deTection in Aerial images (DOTA) and comprehensive baselines for ODAI. The proposed DOTA dataset contains 1,793,658 object instances of 18 categories of oriented-bounding-box annotations collected from 11,268 aerial images. Based on this large-scale and well-annotated dataset, we build baselines covering 10 state-of-the-art algorithms with over 70 configurations, where the speed and accuracy performances of each model have been evaluated. Furthermore, we provide a code library for ODAI and build a website for evaluating different algorithms. Previous challenges run on DOTA have attracted more than 1300 teams worldwide. We believe that the expanded large-scale DOTA dataset, the extensive baselines, the code library and the challenges can facilitate the designs of robust algorithms and reproducible research on the problem of object detection in aerial images.

Assuntos

Algoritmos , Benchmarking

Multi-channel residual network model for accurate estimation of spatially-varying and depth-dependent defocus kernels.

Cao, Yanpeng; Ye, Zhangyu; He, Zewei; Yang, Jiangxin; Cao, Yanlong; Tisse, Christel-Loic; Yang, Michael Ying.

Opt Express ; 28(2): 2263-2275, 2020 Jan 20.

Artigo em Inglês | MEDLINE | ID: mdl-32121920

RESUMO

Digital projectors have been increasingly utilized in various commercial and scientific applications. However, they are prone to the out-of-focus blurring problem since their depth-of-fields are typically limited. In this paper, we explore the feasibility of utilizing a deep learning-based approach to analyze the spatially-varying and depth-dependent defocus properties of digital projectors. A multimodal displaying/imaging system is built for capturing images projected at various depths. Based on the constructed dataset containing well-aligned in-focus, out-of-focus, and depth images, we propose a novel multi-channel residual deep network model to learn the end-to-end mapping function between the in-focus and out-of-focus image patches captured at different spatial locations and depths. To the best of our knowledge, it is the first research work revealing that the complex spatially-varying and depth-dependent blurring effects can be accurately learned from a number of real-captured image pairs instead of being hand-crafted as before. Experimental results demonstrate that our proposed deep learning-based method significantly outperforms the state-of-the-art defocus kernel estimation techniques and thus leads to better out-of-focus compensation for extending the dynamic ranges of digital projectors.

Brain tumor classification and segmentation using sparse coding and dictionary learning.

Salman Al-Shaikhli, Saif Dawood; Yang, Michael Ying; Rosenhahn, Bodo.

Biomed Tech (Berl) ; 61(4): 413-29, 2016 Aug 01.

Artigo em Inglês | MEDLINE | ID: mdl-26351901

RESUMO

This paper presents a novel fully automatic framework for multi-class brain tumor classification and segmentation using a sparse coding and dictionary learning method. The proposed framework consists of two steps: classification and segmentation. The classification of the brain tumors is based on brain topology and texture. The segmentation is based on voxel values of the image data. Using K-SVD, two types of dictionaries are learned from the training data and their associated ground truth segmentation: feature dictionary and voxel-wise coupled dictionaries. The feature dictionary consists of global image features (topological and texture features). The coupled dictionaries consist of coupled information: gray scale voxel values of the training image data and their associated label voxel values of the ground truth segmentation of the training data. For quantitative evaluation, the proposed framework is evaluated using different metrics. The segmentation results of the brain tumor segmentation (MICCAI-BraTS-2013) database are evaluated using five different metric scores, which are computed using the online evaluation tool provided by the BraTS-2013 challenge organizers. Experimental results demonstrate that the proposed approach achieves an accurate brain tumor classification and segmentation and outperforms the state-of-the-art methods.

Assuntos

Neoplasias Encefálicas/diagnóstico por imagem , Interpretação de Imagem Assistida por Computador/métodos , Algoritmos , Bases de Dados Factuais , Humanos

Alzheimer's disease detection via automatic 3D caudate nucleus segmentation using coupled dictionary learning with level set formulation.

Al-Shaikhli, Saif Dawood Salman; Yang, Michael Ying; Rosenhahn, Bodo.

Comput Methods Programs Biomed ; 137: 329-339, 2016 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-28110736

RESUMO

BACKGROUND AND OBJECTIVE: This paper presents a novel method for Alzheimer's disease classification via an automatic 3D caudate nucleus segmentation. METHODS: The proposed method consists of segmentation and classification steps. In the segmentation step, we propose a novel level set cost function. The proposed cost function is constrained by a sparse representation of local image features using a dictionary learning method. We present coupled dictionaries: a feature dictionary of a grayscale brain image and a label dictionary of a caudate nucleus label image. Using online dictionary learning, the coupled dictionaries are learned from the training data. The learned coupled dictionaries are embedded into a level set function. In the classification step, a region-based feature dictionary is built. The region-based feature dictionary is learned from shape features of the caudate nucleus in the training data. The classification is based on the measure of the similarity between the sparse representation of region-based shape features of the segmented caudate in the test image and the region-based feature dictionary. RESULTS: The experimental results demonstrate the superiority of our method over the state-of-the-art methods by achieving a high segmentation (91.5%) and classification (92.5%) accuracy. CONCLUSIONS: In this paper, we find that the study of the caudate nucleus atrophy gives an advantage over the study of whole brain structure atrophy to detect Alzheimer's disease.

Assuntos

Doença de Alzheimer/diagnóstico , Automação , Núcleo Caudado/diagnóstico por imagem , Imageamento Tridimensional , Aprendizagem , Doença de Alzheimer/classificação , Humanos

3D automatic liver segmentation using feature-constrained Mahalanobis distance in CT images.

Salman Al-Shaikhli, Saif Dawood; Yang, Michael Ying; Rosenhahn, Bodo.

Biomed Tech (Berl) ; 61(4): 401-12, 2016 Aug 01.

Artigo em Inglês | MEDLINE | ID: mdl-26501155

RESUMO

Automatic 3D liver segmentation is a fundamental step in the liver disease diagnosis and surgery planning. This paper presents a novel fully automatic algorithm for 3D liver segmentation in clinical 3D computed tomography (CT) images. Based on image features, we propose a new Mahalanobis distance cost function using an active shape model (ASM). We call our method MD-ASM. Unlike the standard active shape model (ST-ASM), the proposed method introduces a new feature-constrained Mahalanobis distance cost function to measure the distance between the generated shape during the iterative step and the mean shape model. The proposed Mahalanobis distance function is learned from a public database of liver segmentation challenge (MICCAI-SLiver07). As a refinement step, we propose the use of a 3D graph-cut segmentation. Foreground and background labels are automatically selected using texture features of the learned Mahalanobis distance. Quantitatively, the proposed method is evaluated using two clinical 3D CT scan databases (MICCAI-SLiver07 and MIDAS). The evaluation of the MICCAI-SLiver07 database is obtained by the challenge organizers using five different metric scores. The experimental results demonstrate the availability of the proposed method by achieving an accurate liver segmentation compared to the state-of-the-art methods.

Assuntos

Imageamento Tridimensional/métodos , Fígado/diagnóstico por imagem , Tomografia Computadorizada por Raios X , Algoritmos , Bases de Dados Factuais , Humanos , Modelos Teóricos , Tomografia Computadorizada por Raios X/métodos

Multi-region labeling and segmentation using a graph topology prior and atlas information in brain images.

Al-Shaikhli, Saif Dawood Salman; Yang, Michael Ying; Rosenhahn, Bodo.

Comput Med Imaging Graph ; 38(8): 725-34, 2014 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-24998760

RESUMO

Medical image segmentation and anatomical structure labeling according to the types of the tissues are important for accurate diagnosis and therapy. In this paper, we propose a novel approach for multi-region labeling and segmentation, which is based on a topological graph prior and the topological information of an atlas, using a modified multi-level set energy minimization method in brain images. We consider a topological graph prior and atlas information to evolve the contour based on a topological relationship presented via a graph relation. This novel method is capable of segmenting adjacent objects with very close gray level in low resolution brain image that would be difficult to segment correctly using standard methods. The topological information of an atlas are transformed to the topological graph of a low resolution (noisy) brain image to obtain region labeling. We explain our algorithm and show the topological graph prior and label transformation techniques to explain how it gives precise multi-region segmentation and labeling. The proposed algorithm is capable of segmenting and labeling different regions in noisy or low resolution MRI brain images of different modalities. We compare our approaches with other state-of-the-art approaches for multi-region labeling and segmentation.

Assuntos

Encéfalo/anatomia & histologia , Documentação/métodos , Interpretação de Imagem Assistida por Computador/métodos , Imageamento por Ressonância Magnética/métodos , Modelos Anatômicos , Reconhecimento Automatizado de Padrão/métodos , Técnica de Subtração , Humanos , Aumento da Imagem/métodos , Sensibilidade e Especificidade , Terminologia como Assunto

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA