Results 1 - 20 of 22
1.
Med Image Anal ; 91: 103027, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37992494

ABSTRACT

Established surgical navigation systems for pedicle screw placement have been proven to be accurate, but still reveal limitations in registration or surgical guidance. Registration of preoperative data to the intraoperative anatomy remains a time-consuming, error-prone task that includes exposure to harmful radiation. Surgical guidance through conventional displays has well-known drawbacks, as information cannot be presented in-situ and from the surgeon's perspective. Consequently, radiation-free and more automatic registration methods with subsequent surgeon-centric navigation feedback are desirable. In this work, we present a marker-less approach that automatically solves the registration problem for lumbar spinal fusion surgery in a radiation-free manner. A deep neural network was trained to segment the lumbar spine and simultaneously predict its orientation, yielding an initial pose for preoperative models, which is then refined for each vertebra individually and updated in real-time with GPU acceleration while handling surgeon occlusions. Intuitive surgical guidance is provided through integration into an augmented reality based navigation system. The registration method was verified on a public dataset with a median of 100% successful registrations, a median target registration error of 2.7 mm, a median screw trajectory error of 1.6° and a median screw entry point error of 2.3 mm. Additionally, the whole pipeline was validated in an ex-vivo surgery, yielding a 100% screw accuracy and a median target registration error of 1.0 mm. Our results meet clinical demands and emphasize the potential of RGB-D data for fully automatic registration approaches in combination with augmented reality guidance.
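The paper's own refinement runs on GPU with occlusion handling; purely as an illustration of the rigid refinement idea, the sketch below solves one least-squares pose update with the standard Kabsch/SVD method, assuming model-to-scene correspondences are already given (in practice they must be estimated, e.g. by nearest-neighbor search as in ICP). The toy data and names are hypothetical.

```python
# Minimal sketch of one rigid-registration refinement step (Kabsch/SVD),
# in the spirit of per-vertebra pose refinement. This is NOT the paper's
# GPU pipeline; correspondences here are given, not estimated.
import numpy as np

def rigid_fit(src, dst):
    """Least-squares rotation R and translation t mapping src -> dst."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                             # proper rotation (det = +1)
    t = c_dst - R @ c_src
    return R, t

# Toy check: recover a known pose from noiseless correspondences.
rng = np.random.default_rng(0)
model = rng.normal(size=(200, 3))                  # "preoperative vertebra" points
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R_true *= np.sign(np.linalg.det(R_true))           # ensure det(R_true) = +1
scene = model @ R_true.T + np.array([5.0, -2.0, 1.0])
R, t = rigid_fit(model, scene)
print(np.allclose(R, R_true, atol=1e-8))           # True
```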


Subject(s)
Pedicle Screws , Spinal Fusion , Surgery, Computer-Assisted , Humans , Spine/diagnostic imaging , Spine/surgery , Surgery, Computer-Assisted/methods , Lumbar Vertebrae/diagnostic imaging , Lumbar Vertebrae/surgery , Spinal Fusion/methods
2.
Article in English | MEDLINE | ID: mdl-37021895

ABSTRACT

Finding local features that are repeatable across multiple views is a cornerstone of sparse 3D reconstruction. The classical image matching paradigm detects keypoints per image once and for all, which can yield poorly localized features and propagate large errors to the final geometry. In this paper, we refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views: we first adjust the initial keypoint locations prior to any geometric estimation, and subsequently refine points and camera poses as a post-processing step. This refinement is robust to large detection noise and appearance changes, as it optimizes a featuremetric error based on dense features predicted by a neural network. This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors, challenging viewing conditions, and off-the-shelf deep features. Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale. Our code is publicly available at https://github.com/cvg/pixel-perfect-sfm as an add-on to the popular Structure-from-Motion software COLMAP.
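To make the featuremetric idea concrete, here is a hedged toy sketch: a keypoint is moved by gradient descent so the dense feature sampled at its location matches a reference descriptor from another view. The "feature map" is a synthetic coordinate map (so the optimum is known); the paper instead optimizes learned CNN features jointly over whole tracks inside bundle adjustment.

```python
# Toy featuremetric refinement: minimize ||fmap(x, y) - ref||^2 over the
# continuous keypoint location (x, y) by finite-difference gradient descent.
import numpy as np

def bilinear(fmap, x, y):
    """Sample an HxWxC feature map at a continuous location (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return (fmap[y0, x0] * (1-dx)*(1-dy) + fmap[y0, x0+1] * dx*(1-dy)
            + fmap[y0+1, x0] * (1-dx)*dy + fmap[y0+1, x0+1] * dx*dy)

def refine(fmap, ref, x, y, lr=0.1, iters=50, eps=0.5):
    """Gradient descent on the featuremetric error at one keypoint."""
    for _ in range(iters):
        r = bilinear(fmap, x, y) - ref             # feature residual
        gx = (bilinear(fmap, x+eps, y) - bilinear(fmap, x-eps, y)) / (2*eps)
        gy = (bilinear(fmap, x, y+eps) - bilinear(fmap, x, y-eps)) / (2*eps)
        x -= lr * 2 * r @ gx                       # chain rule on ||r||^2
        y -= lr * 2 * r @ gy
    return x, y

# Synthetic 2-channel feature map whose feature at (x, y) is just (x, y),
# so the keypoint should converge to the reference location (40, 25).
ys, xs = np.mgrid[0:64, 0:64]
fmap = np.stack([xs, ys], axis=-1).astype(float)
ref = np.array([40.0, 25.0])
print(refine(fmap, ref, 35.0, 20.0))               # -> approx (40.0, 25.0)
```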

3.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 932-945, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35294342

ABSTRACT

3D hand pose estimation is a challenging problem in computer vision due to the high degrees-of-freedom of hand articulated motion space and large viewpoint variation. As a consequence, similar poses observed from multiple views can be dramatically different. In order to deal with this issue, view-independent features are required to achieve state-of-the-art performance. In this paper, we investigate the impact of view-independent features on 3D hand pose estimation from a single depth image, and propose a novel recurrent neural network for 3D hand pose estimation, in which a cascaded 3D pose-guided alignment strategy is designed for view-independent feature extraction and a recurrent hand pose module is designed for modeling the dependencies among sequential aligned features for 3D hand pose estimation. In particular, our cascaded pose-guided 3D alignments are performed in 3D space in a coarse-to-fine fashion: first, hand joints are predicted and globally transformed into a canonical reference frame; second, the palm of the hand is detected and aligned; third, local transformations are applied to the fingers to refine the final predictions. The proposed recurrent hand pose module for the aligned 3D representation extracts recurrent pose-aware features and iteratively refines the estimated hand pose. Our recurrent module can be utilized for both single-view estimation and sequence-based estimation with 3D hand pose tracking. Experiments show that our method improves the state-of-the-art by a large margin on popular benchmarks with simple yet efficient alignment and network architectures.
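As a concrete picture of the first, global alignment step, the sketch below expresses predicted joints in a canonical palm frame. The 21-joint layout and the frame built from wrist/index/pinky joints are illustrative assumptions, not the paper's learned transform.

```python
# Sketch of the "global alignment" idea: express predicted 3D hand joints in a
# canonical frame built from the palm so downstream features are view-independent.
import numpy as np

def canonical_frame(wrist, index_mcp, pinky_mcp):
    """Orthonormal frame: x across the palm, z out of the palm, origin at wrist."""
    x = index_mcp - pinky_mcp
    x /= np.linalg.norm(x)
    v = index_mcp - wrist
    z = np.cross(x, v); z /= np.linalg.norm(z)     # palm normal
    y = np.cross(z, x)                             # completes right-handed frame
    return np.stack([x, y, z]), wrist              # rows are the new axes

# Fake 21-joint hand, assuming a MediaPipe-style layout:
# joint 0 = wrist, 5 = index MCP, 17 = pinky MCP (an assumption for this demo).
joints = np.random.default_rng(1).normal(size=(21, 3))
R, origin = canonical_frame(joints[0], joints[5], joints[17])
aligned = (joints - origin) @ R.T                  # joints in the canonical frame
```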

4.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 2151-2165, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35344487

ABSTRACT

Undesirable reflections in photos taken in front of glass windows or doors often degrade the visual quality of the image. Separating the two layers benefits both human and machine perception. The polarization status of light changes after refraction or reflection, providing additional observations of the scene that can benefit reflection separation. Unlike previous works that take three or more polarization images as input, in this paper we propose to exploit physical constraints from a pair of unpolarized and polarized images to separate the reflection and transmission layers. Due to the simplified capturing setup, the system is more under-determined than existing polarization-based works. To solve this problem, we propose to first estimate the semi-reflector orientation to make the physical image formation well-posed, and then learn to reliably separate the two layers using additional networks based on both physical and numerical analysis. In addition, a motion estimation network is introduced to handle the misalignment of the paired input. Quantitative and qualitative experimental results show that our approach performs favorably against existing polarization-based and single-image-based solutions.
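For intuition about why the setup is under-determined, here is a hedged, textbook-style forward model of the unpolarized/polarized pair (not the paper's exact formulation): the pair yields only two observations per pixel, while the layer intensities and all polarization parameters are unknown.

```python
# Illustrative Malus-style mixing model for the unpolarized + polarized pair.
# The unpolarized shot gives I_un = T + Rf; a linear polarizer at angle phi
# attenuates each layer by its own degree/angle of polarization.
import numpy as np

def polarizer_obs(T, Rf, phi, dop_t, dop_r, phi_t, phi_r):
    """Intensity behind a linear polarizer at angle phi."""
    a_t = 0.5 * (1 + dop_t * np.cos(2 * (phi - phi_t)))
    a_r = 0.5 * (1 + dop_r * np.cos(2 * (phi - phi_r)))
    return a_t * T + a_r * Rf

T, Rf = 0.7, 0.4                                   # per-pixel layer intensities
I_un = T + Rf                                      # unpolarized observation
I_pol = polarizer_obs(T, Rf, phi=0.3, dop_t=0.2, dop_r=0.8,
                      phi_t=0.0, phi_r=1.1)
# Two equations, but T, Rf and the polarization parameters are all unknown:
# the system is under-determined, which is why the paper first estimates the
# semi-reflector orientation to pin down the reflection's polarization state.
print(I_un, I_pol)
```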

5.
IEEE Trans Pattern Anal Mach Intell ; 45(4): 4945-4963, 2023 Apr.
Article in English | MEDLINE | ID: mdl-35984800

ABSTRACT

In this paper, we propose efficient multi-view stereo methods for accurate and complete depth map estimation. We first present our basic methods with Adaptive Checkerboard sampling and Multi-Hypothesis joint view selection (ACMH & ACMH+). Based on our basic models, we develop two frameworks to deal with the depth estimation of ambiguous regions (especially low-textured areas) from two different perspectives: multi-scale information fusion and planar geometric clue assistance. For the former, we propose a multi-scale geometric consistency guidance framework (ACMM) to obtain reliable depth estimates for low-textured areas at coarser scales and guarantee that they can be propagated to finer scales. For the latter, we propose a planar prior assisted framework (ACMP), utilizing a probabilistic graphical model to contribute a novel multi-view aggregated matching cost. Finally, by taking advantage of the above frameworks, we design a multi-scale geometric consistency guided and planar prior assisted multi-view stereo (ACMMP). This greatly enhances the discrimination of ambiguous regions and helps their depth sensing. Experiments on extensive datasets show our methods achieve state-of-the-art performance, recovering accurate depth not only in low-textured areas but also in fine details. Related code is available at https://github.com/GhiXu.
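The checkerboard propagation at the heart of ACMH-style PatchMatch samplers can be sketched in a few lines: pixels of one checkerboard color adopt a neighbor's depth hypothesis when it lowers the matching cost. The cost below is a toy stand-in; real systems score photometric consistency across the jointly selected source views.

```python
# Red-black checkerboard propagation of depth hypotheses (toy version).
import numpy as np

def propagate(depth, cost_fn, parity):
    H, W = depth.shape
    new = depth.copy()
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            if (x + y) % 2 != parity:
                continue                           # update one color per pass
            cands = [depth[y, x], depth[y-1, x], depth[y+1, x],
                     depth[y, x-1], depth[y, x+1]]
            new[y, x] = min(cands, key=lambda d: cost_fn(y, x, d))
    return new

# Toy cost: distance to a hidden ground-truth depth; propagation spreads the
# few good hypotheses across the grid in alternating red/black sweeps.
gt = 5.0
cost = lambda y, x, d: abs(d - gt)
depth = np.random.default_rng(2).uniform(1, 10, size=(32, 32))
for it in range(8):
    depth = propagate(depth, cost, parity=it % 2)
```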

6.
IEEE Trans Pattern Anal Mach Intell ; 44(4): 2074-2088, 2022 04.
Article in English | MEDLINE | ID: mdl-33074802

ABSTRACT

Visual localization enables autonomous vehicles to navigate in their surroundings and augmented reality applications to link virtual to real worlds. Practical visual localization approaches need to be robust to a wide variety of viewing conditions, including day-night changes, as well as weather and seasonal variations, while providing highly accurate six degree-of-freedom (6DOF) camera pose estimates. In this paper, we extend three publicly available datasets containing images captured under a wide variety of viewing conditions, but lacking camera pose information, with ground truth pose information, making evaluation of the impact of various factors on 6DOF camera pose estimation accuracy possible. We also discuss the performance of state-of-the-art localization approaches on these datasets. Additionally, we release around half of the poses for all conditions, and keep the remaining half private as a test set, in the hopes that this will stimulate research on long-term visual localization, learned local image features, and related research areas. Our datasets are available at visuallocalization.net, where we are also hosting a benchmarking server for automatic evaluation of results on the test set. The presented state-of-the-art results are to a large degree based on submissions to our server.
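Benchmarks of this kind typically report the fraction of queries localized within paired position/rotation thresholds. A minimal sketch of the underlying 6DOF error computation (camera-center distance plus the angle of the relative rotation) follows; the threshold values in the comment are examples, not the benchmark's definition.

```python
# Standard 6DOF pose-error computation for localization evaluation.
import numpy as np

def pose_errors(R_est, c_est, R_gt, c_gt):
    """Position error (units of c, e.g. meters) and rotation error in degrees.
    c_est/c_gt are camera centers; R_est/R_gt are world-to-camera rotations."""
    dt = np.linalg.norm(c_est - c_gt)
    cos = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0   # angle of relative rotation
    dR = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return dt, dR

# Example: count queries within e.g. (0.25 m, 2 deg), (0.5 m, 5 deg), (5 m, 10 deg).
print(pose_errors(np.eye(3), np.zeros(3), np.eye(3), np.array([0.1, 0., 0.])))
```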


Subject(s)
Algorithms
7.
Int J Comput Assist Radiol Surg ; 16(5): 799-808, 2021 May.
Article in English | MEDLINE | ID: mdl-33881732

ABSTRACT

PURPOSE: Tracking of tools and surgical activity is becoming more and more important in the context of computer assisted surgery. In this work, we present a data generation framework, dataset and baseline methods to facilitate further research in the direction of markerless hand and instrument pose estimation in realistic surgical scenarios. METHODS: We developed a rendering pipeline to create inexpensive and realistic synthetic data for model pretraining. Subsequently, we propose a pipeline to capture and label real data with hand and object pose ground truth in an experimental setup to gather high-quality real data. We furthermore present three state-of-the-art RGB-based pose estimation baselines. RESULTS: We evaluate three baseline models on the proposed datasets. The best performing baseline achieves an average tool 3D vertex error of 16.7 mm on synthetic data as well as 13.8 mm on real data, which is comparable to the state-of-the-art in RGB-based hand/object pose estimation. CONCLUSION: To the best of our knowledge, we propose the first synthetic and real data generation pipelines to generate hand and object pose labels for open surgery. We present three baseline models for RGB-based object and object/hand pose estimation. Our realistic synthetic data generation pipeline may contribute to overcoming the data bottleneck in the surgical domain and can easily be transferred to other medical applications.
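The reported tool error is an average 3D vertex distance; a minimal sketch of such a metric is shown below, with an illustrative toy mesh (the paper's exact evaluation protocol may differ).

```python
# Average 3D vertex error: pose the tool mesh with the estimated and the
# ground-truth 6DOF poses and average the per-vertex Euclidean distance.
import numpy as np

def vertex_error(verts, R_est, t_est, R_gt, t_gt):
    """Mean distance (same units as verts, e.g. mm) between posed vertex sets."""
    p_est = verts @ R_est.T + t_est
    p_gt = verts @ R_gt.T + t_gt
    return np.linalg.norm(p_est - p_gt, axis=1).mean()

verts = np.random.default_rng(0).normal(size=(100, 3))   # toy tool mesh
R = np.eye(3)
print(vertex_error(verts, R, np.zeros(3), R, np.array([0.01, 0.0, 0.0])))  # 0.01
```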


Subject(s)
Deep Learning , Hand/diagnostic imaging , Imaging, Three-Dimensional/methods , Surgery, Computer-Assisted/methods , Algorithms , Calibration , Humans , Operating Rooms , Orthopedics/methods , Reproducibility of Results
8.
IEEE Trans Pattern Anal Mach Intell ; 43(4): 1293-1307, 2021 Apr.
Article in English | MEDLINE | ID: mdl-31722474

ABSTRACT

We seek to predict the 6 degree-of-freedom (6DoF) pose of a query photograph with respect to a large indoor 3D map. The contributions of this work are three-fold. First, we develop a new large-scale visual localization method targeted for indoor spaces. The method proceeds along three steps: (i) efficient retrieval of candidate poses that scales to large-scale environments, (ii) pose estimation using dense matching rather than sparse local features to deal with weakly textured indoor scenes, and (iii) pose verification by virtual view synthesis that is robust to significant changes in viewpoint, scene layout, and occlusion. Second, we release a new dataset with reference 6DoF poses for large-scale indoor localization. Query photographs are captured by mobile phones at a different time than the reference 3D map, thus presenting a realistic indoor localization scenario. Third, we demonstrate that our method significantly outperforms current state-of-the-art indoor localization approaches on this new challenging data. Code and data are publicly available.
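Step (i) is commonly implemented as nearest-neighbor search over global image descriptors; the sketch below shows that retrieval step with random stand-in embeddings. The paper's actual descriptors, dense matching, and view-synthesis verification are its own contributions and are not reproduced here.

```python
# Candidate retrieval: rank database images by cosine similarity of global
# descriptors and keep the poses of the top-k as candidates.
import numpy as np

def retrieve(query_desc, db_descs, k=10):
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q                                  # cosine similarity per image
    return np.argsort(-sims)[:k]                   # indices of top-k candidates

db = np.random.default_rng(3).normal(size=(1000, 256))   # stand-in embeddings
query = db[42] + 0.1 * np.random.default_rng(4).normal(size=256)
print(retrieve(query, db, k=5))                    # index 42 should rank first
```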

9.
IEEE Trans Pattern Anal Mach Intell ; 43(3): 814-829, 2021 03.
Article in English | MEDLINE | ID: mdl-31535984

ABSTRACT

Accurate visual localization is a key technology for autonomous navigation. 3D structure-based methods employ 3D models of the scene to estimate the full 6 degree-of-freedom (DOF) pose of a camera very accurately. However, constructing (and extending) large-scale 3D models is still a significant challenge. In contrast, 2D image retrieval-based methods only require a database of geo-tagged images, which is trivial to construct and to maintain. They are often considered inaccurate since they only approximate the positions of the cameras. Yet, the exact camera pose can theoretically be recovered when enough relevant database images are retrieved. In this paper, we demonstrate experimentally that large-scale 3D models are not strictly necessary for accurate visual localization. We create reference poses for a large and challenging urban dataset. Using these poses, we show that combining image-based methods with local reconstructions results in a higher pose accuracy compared to state-of-the-art structure-based methods, albeight at higher run-time costs. We show that some of these run-time costs can be alleviated by exploiting known database image poses. Our results suggest that we might want to reconsider the need for large-scale 3D models in favor of more local models, but also that further research is necessary to accelerate the local reconstruction process.

10.
Article in English | MEDLINE | ID: mdl-31613751

ABSTRACT

We address the problem of mesh reconstruction from live RGB-D video, assuming a calibrated camera and poses provided externally (e.g., by a SLAM system). In contrast to most existing approaches, we do not fuse depth measurements in a volume but in a dense surfel cloud. We asynchronously (re)triangulate the smoothed surfels to reconstruct a surface mesh. This novel approach makes it possible to maintain a dense surface representation of the scene during SLAM that can quickly adapt to loop closures, by deforming the surfel cloud and asynchronously remeshing the surface where necessary. The surfel-based representation also naturally supports strongly varying scan resolution; in particular, it reconstructs colors at the input camera's resolution. Moreover, in contrast to many volumetric approaches, ours can reconstruct thin objects since objects do not need to enclose a volume. We demonstrate our approach in a number of experiments, showing that it produces reconstructions that are competitive with the state-of-the-art, and we discuss its advantages and limitations. The algorithm (excluding loop closure functionality) is available as open source at https://github.com/puzzlepaint/surfelmeshing.
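A hedged sketch of the surfel-style fusion idea: each new depth measurement updates a surfel by a confidence-weighted running average rather than being integrated into a volume. Field names and the weight cap below are illustrative choices, not the paper's data structure.

```python
# Confidence-weighted surfel update (toy version of surfel-cloud depth fusion).
import numpy as np

def fuse(surfel, meas, w_meas=1.0):
    """Running weighted average of a surfel with a new measurement."""
    w = surfel["w"]
    for key in ("pos", "normal", "color"):
        surfel[key] = (w * surfel[key] + w_meas * meas[key]) / (w + w_meas)
    surfel["normal"] /= np.linalg.norm(surfel["normal"])  # keep unit length
    surfel["w"] = min(w + w_meas, 50.0)            # cap so old surfels stay adaptive
    return surfel

s = {"pos": np.zeros(3), "normal": np.array([0., 0., 1.]),
     "color": np.zeros(3), "w": 1.0}
m = {"pos": np.array([0., 0., 0.01]), "normal": np.array([0., 0.05, 1.]),
     "color": np.array([0.5, 0.5, 0.5])}
s = fuse(s, m)                                     # surfel nudged toward measurement
```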

11.
IEEE Trans Vis Comput Graph ; 23(11): 2455-2462, 2017 11.
Article in English | MEDLINE | ID: mdl-28809696

ABSTRACT

We present a real-time method for rendering novel virtual camera views from given RGB-D (color and depth) data of a different viewpoint. Missing color and depth information due to incomplete input or disocclusions is efficiently inpainted in a temporally consistent way. The inpainting takes the location of strong image gradients into account as likely depth discontinuities. We present our method in the context of a view correction system for mobile devices, and discuss how to obtain a screen-camera calibration and options for acquiring depth input. Our method has use cases in both augmented and virtual reality applications. We demonstrate the speed of our system and the visual quality of its results in multiple experiments in the paper as well as in the supplementary video.
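A simplified, hedged stand-in for the gradient-aware inpainting idea: unknown pixels are filled by weighted diffusion, with weights that drop across strong image gradients so fills do not bleed over likely depth discontinuities. The paper's method is real-time and temporally consistent; this sketch is neither, and the weighting scheme is an assumption.

```python
# Gradient-aware diffusion inpainting (toy version).
import numpy as np

def inpaint(img, mask, edges, iters=200):
    """img: HxW with unknowns zeroed, mask: True where known, edges: gradient magnitude."""
    out = img.copy()
    w = 1.0 / (1.0 + 10.0 * edges)                 # low weight across strong edges
    for _ in range(iters):
        num = np.zeros_like(out)
        den = np.zeros_like(out)
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            num += np.roll(w, (dy, dx), axis=(0, 1)) * np.roll(out, (dy, dx), axis=(0, 1))
            den += np.roll(w, (dy, dx), axis=(0, 1))
        out = np.where(mask, img, num / np.maximum(den, 1e-6))  # keep known pixels
    return out

rng = np.random.default_rng(5)
img = np.tile(np.linspace(0, 1, 32), (32, 1))      # toy ramp image
mask = rng.uniform(size=img.shape) > 0.3           # ~30% of pixels missing
edges = np.abs(np.gradient(img, axis=1))
print(inpaint(img * mask, mask, edges)[16, :4])    # missing values filled smoothly
```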

12.
IEEE Trans Pattern Anal Mach Intell ; 39(9): 1730-1743, 2017 09.
Article in English | MEDLINE | ID: mdl-28113966

ABSTRACT

Both image segmentation and dense 3D modeling from images represent an intrinsically ill-posed problem. Strong regularizers are therefore required to constrain the solutions from being 'too noisy'. These priors generally yield overly smooth reconstructions and/or segmentations in certain regions while they fail to constrain the solution sufficiently in other areas. In this paper, we argue that image segmentation and dense 3D reconstruction contribute valuable information to each other's task. As a consequence, we propose a mathematical framework to formulate and solve a joint segmentation and dense reconstruction problem. On the one hand, knowing the semantic class of the geometry provides information about the likelihood of the surface direction. On the other hand, the surface direction provides information about the likelihood of the semantic class. Experimental results on several datasets highlight the advantages of our joint formulation. We show how weakly observed surfaces are reconstructed more faithfully compared to a geometry-only reconstruction. Thanks to the volumetric nature of our formulation, we also infer surfaces which cannot be directly observed, for example the surface between the ground and a building. Finally, our method returns a semantic segmentation that is consistent across the whole dataset.
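Schematically (this is a generic form, not the paper's exact functional), the joint problem can be written as a volumetric multi-label energy whose transition costs depend on both the pair of semantic labels and the local surface direction:

$$
E(\ell) \;=\; \sum_{s \in V} \phi_s(\ell_s) \;+\; \sum_{(s,t) \in \mathcal{E}} \psi_{\ell_s \ell_t}\!\left(d_{st}\right),
$$

where $\ell_s$ ranges over free space and the semantic classes, $\phi_s$ aggregates the per-voxel data evidence, and $\psi_{\ell_s \ell_t}(d_{st})$ penalizes a transition between labels $\ell_s$ and $\ell_t$ across the voxel direction $d_{st}$. Making $\psi$ direction-dependent is precisely how the semantic class and the surface orientation inform each other: a ground-to-air transition is cheap for horizontal surface normals, expensive for vertical ones, and vice versa for walls.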

13.
IEEE Trans Pattern Anal Mach Intell ; 39(2): 327-341, 2017 02.
Article in English | MEDLINE | ID: mdl-27019476

ABSTRACT

In this paper, we explore the different minimal solutions for homography-based egomotion estimation of a calibrated camera with a known gravity vector. These solutions depend on the prior knowledge about the reference plane used by the homography. We then demonstrate that the number of matched points required varies from two to three, and that, depending on this plane, either a direct closed-form solution or a Gröbner-basis solution can be derived. Many experimental results on synthetic and real sequences in indoor and outdoor environments show the efficiency and robustness of our approach compared to standard methods.
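The reason a known gravity vector shrinks the minimal problem: pre-rotating each camera so gravity maps to a canonical "down" leaves only an in-plane (yaw) rotation plus translation to estimate, so fewer point matches suffice. The sketch below shows only that alignment rotation (the standard axis-angle/Rodrigues construction); the 2/3-point solvers themselves are the paper's contribution.

```python
# Rotation that maps a measured gravity direction to canonical "down".
import numpy as np

def align_to_down(g):
    """Rotation R with R @ (g/|g|) = [0, -1, 0] (Rodrigues formula)."""
    g = g / np.linalg.norm(g)
    down = np.array([0.0, -1.0, 0.0])
    v = np.cross(g, down)
    s, c = np.linalg.norm(v), g @ down
    if s < 1e-12:                                  # already (anti-)aligned
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + K + K @ K * ((1 - c) / s**2)

g_meas = np.array([0.1, -0.9, 0.2])                # e.g. IMU gravity in camera frame
R = align_to_down(g_meas)
print(R @ (g_meas / np.linalg.norm(g_meas)))       # -> approx [0, -1, 0]
```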

14.
IEEE Trans Pattern Anal Mach Intell ; 37(11): 2193-206, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26440261

ABSTRACT

We propose a method to detect changes in the geometry of a city using panoramic images captured by a car driving around the city. The proposed method can be used to significantly optimize the process of updating the 3D model of an urban environment that is changing over time, by restricting this process to only those areas where changes are detected. With this application in mind, we designed our algorithm to specifically detect only structural changes in the environment, ignoring any changes in its appearance, and also ignoring changes that are not relevant for update purposes, such as cars and people. The approach also accounts for the challenges involved in a large-scale application of change detection, such as inaccuracies in the input geometry, errors in the geo-location data of the images, as well as the limited amount of information due to sparse imagery. We evaluated our approach on a small-scale setup using high resolution, densely captured images and on a large-scale setup covering an entire city using the more realistic scenario of low resolution, sparsely captured images. A quantitative evaluation was also conducted for the large-scale setup, consisting of 14,000 images.

15.
IEEE Trans Pattern Anal Mach Intell ; 36(1): 157-70, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24231873

ABSTRACT

In this work, we present a unified view on Markov random fields (MRFs) and recently proposed continuous tight convex relaxations for multilabel assignment in the image plane. These relaxations are far less biased toward the grid geometry than Markov random fields on grids. It turns out that the continuous methods are nonlinear extensions of the well-established local polytope MRF relaxation. In view of this result, a better understanding of these tight convex relaxations in the discrete setting is obtained. Further, a wider range of optimization methods is now applicable to find a minimizer of the tight formulation. We propose two methods to improve the efficiency of minimization. One uses a weaker but more efficient continuously inspired approach as initialization and gradually refines the energy where necessary. The other reformulates the dual energy, enabling smooth approximations to be used for efficient optimization. We demonstrate the utility of our proposed minimization schemes in numerical experiments. Finally, we generalize the underlying energy formulation from isotropic metric smoothness costs to arbitrary nonmetric and orientation dependent smoothness terms.
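For reference, the local polytope relaxation the abstract builds on is the standard linear program over node and edge pseudo-marginals $\mu$ (written here in its textbook form, which the continuous methods extend nonlinearly):

$$
\min_{\mu \ge 0} \;\; \sum_{i \in V} \sum_{x_i} \theta_i(x_i)\, \mu_i(x_i) \;+\; \sum_{(i,j) \in \mathcal{E}} \sum_{x_i, x_j} \theta_{ij}(x_i, x_j)\, \mu_{ij}(x_i, x_j)
$$

subject to normalization, $\sum_{x_i} \mu_i(x_i) = 1$, and the marginalization constraints $\sum_{x_j} \mu_{ij}(x_i, x_j) = \mu_i(x_i)$ (and symmetrically over $x_i$), which couple the edge pseudo-marginals to the node pseudo-marginals without enforcing global consistency.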

16.
IEEE Trans Vis Comput Graph ; 20(2): 262-75, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24356368

ABSTRACT

Given the growth of Internet photo collections, we now have a visual index of all major cities and tourist sites in the world. However, it is still a difficult task to capture that perfect shot with your own camera when visiting these places, especially when your camera itself has limitations, such as a limited field of view. In this paper, we propose a framework to overcome the imperfections of personal photographs of tourist sites using the rich information provided by large-scale Internet photo collections. Our method deploys state-of-the-art techniques for constructing initial 3D models from photo collections. The same techniques are then used to register personal photographs to these models, allowing us to augment personal 2D images with 3D information. This strong available scene prior allows us to address a number of traditionally challenging image enhancement techniques and achieve high-quality results using simple and robust algorithms. Specifically, we demonstrate automatic foreground segmentation, mono-to-stereo conversion, field-of-view expansion, photometric enhancement, and additionally automatic annotation with geolocation and tags. Our method clearly demonstrates some possible benefits of employing the rich information contained in online photo databases to efficiently enhance and augment one's own personal photographs.

17.
IEEE Trans Pattern Anal Mach Intell ; 35(8): 2022-38, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23787350

ABSTRACT

A computational problem that arises frequently in computer vision is that of estimating the parameters of a model from data that have been contaminated by noise and outliers. More generally, any practical system that seeks to estimate quantities from noisy data measurements must have at its core some means of dealing with data contamination. The random sample consensus (RANSAC) algorithm is one of the most popular tools for robust estimation. Recent years have seen an explosion of activity in this area, leading to the development of a number of techniques that improve upon the efficiency and robustness of the basic RANSAC algorithm. In this paper, we present a comprehensive overview of recent research in RANSAC-based robust estimation by analyzing and comparing various approaches that have been explored over the years. We provide a common context for this analysis by introducing a new framework for robust estimation, which we call Universal RANSAC (USAC). USAC extends the simple hypothesize-and-verify structure of standard RANSAC to incorporate a number of important practical and computational considerations. In addition, we provide a general-purpose C++ software library that implements the USAC framework by leveraging state-of-the-art algorithms for the various modules. This implementation thus addresses many of the limitations of standard RANSAC within a single unified package. We benchmark the performance of the algorithm on a large collection of estimation problems. The implementation we provide can be used by researchers either as a stand-alone tool for robust estimation or as a benchmark for evaluating new techniques.
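The hypothesize-and-verify loop that USAC extends is easy to state; below is a vanilla RANSAC sketch for robust 2D line fitting as a stand-in problem (USAC adds, among other things, degeneracy tests, optimized model verification, and local optimization, none of which are shown here).

```python
# Vanilla RANSAC: sample a minimal set, fit a model, count inliers, keep the best.
import numpy as np

def ransac_line(pts, iters=500, thresh=0.05, seed=0):
    rng = np.random.default_rng(seed)
    best_inliers, best_model = None, None
    for _ in range(iters):
        i, j = rng.choice(len(pts), size=2, replace=False)  # minimal sample
        p, q = pts[i], pts[j]
        d = q - p
        n = np.array([-d[1], d[0]])                # line normal
        norm = np.linalg.norm(n)
        if norm < 1e-12:
            continue                               # degenerate sample
        n /= norm
        resid = np.abs((pts - p) @ n)              # point-to-line distances
        inliers = resid < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (p, n)
    return best_model, best_inliers

rng = np.random.default_rng(7)
x = rng.uniform(0, 1, 100)
pts = np.stack([x, 0.5 * x + 0.2], axis=1)         # points on y = 0.5x + 0.2
pts[:20] = rng.uniform(0, 1, (20, 2))              # 20% outliers
model, inl = ransac_line(pts)
print(inl.sum())                                   # ~80 inliers expected
```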

18.
IEEE Trans Pattern Anal Mach Intell ; 35(5): 1107-20, 2013 May.
Article in English | MEDLINE | ID: mdl-22868652

ABSTRACT

We present a supervised learning-based method to estimate a per-pixel confidence for optical flow vectors. Regions of low texture and pixels close to occlusion boundaries are known to be difficult for optical flow algorithms. Using a spatiotemporal feature vector, we estimate whether a flow algorithm is likely to fail in a given region. Our method is not restricted to any specific class of flow algorithm and does not make any scene-specific assumptions. By automatically learning this confidence, we can combine the output of several computed flow fields from different algorithms to select the best performing algorithm per pixel. Our optical flow confidence measure allows one to achieve better overall results by discarding the most troublesome pixels. We illustrate the effectiveness of our method on four different optical flow algorithms over a variety of real and synthetic sequences. For algorithm selection, we achieve the top overall results on a large test set, and at times even surpass the results of the best algorithm among the candidates.
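Once per-pixel confidences are available, the selection step is a per-pixel argmax across candidate algorithms; the sketch below shows that fusion with random stand-in confidence maps (the learned confidence classifier itself is the paper's contribution and is not reproduced).

```python
# Per-pixel algorithm selection: take the flow from the most confident
# candidate at every pixel.
import numpy as np

rng = np.random.default_rng(8)
H, W, A = 48, 64, 4                                # image size, 4 flow algorithms
flows = rng.normal(size=(A, H, W, 2))              # candidate flow fields (u, v)
confs = rng.uniform(size=(A, H, W))                # per-pixel confidence per algo

best = confs.argmax(axis=0)                        # winning algorithm per pixel
fused = np.take_along_axis(
    flows, best[None, :, :, None], axis=0)[0]      # (H, W, 2) fused flow field
```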

19.
Med Image Anal ; 16(1): 160-76, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21920798

ABSTRACT

Specialists often need to browse through libraries containing many diagnostic hysteroscopy videos searching for similar cases, or even to review the video of one particular case. Video searching and browsing can be used in many situations, such as case-based diagnosis in which videos of previously diagnosed cases are compared, case referrals, reviewing patient records, as well as supporting medical research (e.g. in human reproduction). However, in terms of visual content, diagnostic hysteroscopy videos contain a large amount of information, of which only a reduced number of frames are actually useful for diagnosis/prognosis purposes. In order to facilitate the browsing task, we propose in this paper a technique for estimating the clinical relevance of video segments in diagnostic hysteroscopies. Basically, the proposed technique associates clinical relevance with the attention attracted by a video segment during acquisition (i.e. during the diagnostic hysteroscopy conducted by a specialist). We show that the resulting video summary allows specialists to browse the video contents nonlinearly, while avoiding spending time on spurious visual information. We also review state-of-the-art methods for summarizing general videos and how they apply to diagnostic hysteroscopy videos (considering their specific characteristics), and conclude that our proposed method contributes to the field with a summarization and representation method specific to hysteroscopy videos. The experimental results indicate that our method tends to produce compact video summaries without discarding clinically relevant information.
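Heavily hedged sketch: one plausible proxy for "attention during acquisition" is dwell, i.e. segments where the endoscope moves slowly. The paper's actual relevance estimation is its own model; the code below only illustrates turning a per-frame motion signal into segment-level relevance scores, and every name and parameter in it is hypothetical.

```python
# Toy segment relevance from a per-frame camera-motion signal.
import numpy as np

def segment_relevance(frame_motion, seg_len=25):
    """Higher score for segments with low average inter-frame motion (dwell)."""
    n = len(frame_motion) // seg_len
    segs = np.asarray(frame_motion[: n * seg_len]).reshape(n, seg_len)
    return 1.0 / (1.0 + segs.mean(axis=1))         # slow motion -> high relevance

motion = np.abs(np.random.default_rng(9).normal(size=500))  # stand-in signal
print(segment_relevance(motion)[:5])               # one score per 25-frame segment
```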


Subject(s)
Attention , Data Mining/methods , Hysteroscopy/methods , Image Interpretation, Computer-Assisted/methods , Radiology Information Systems , User-Computer Interface , Video Recording/methods , Database Management Systems , Female , Humans