Search | VHL Regional Portal

1.

Deep Sensing for Compressive Video Acquisition.

Yoshida, Michitaka; Torii, Akihiko; Okutomi, Masatoshi; Taniguchi, Rin-Ichiro; Nagahara, Hajime; Yagi, Yasushi.

Sensors (Basel) ; 23(17), 2023 Aug 30.

Article in English | MEDLINE | ID: mdl-37687990

ABSTRACT

A camera captures multidimensional information of the real world by convolving it into two dimensions using a sensing matrix. The original multidimensional information is then reconstructed from captured images. Traditionally, multidimensional information has been captured by uniform sampling, but by optimizing the sensing matrix, we can capture images more efficiently and reconstruct multidimensional information with high quality. Although compressive video sensing requires random sampling as a theoretical optimum, when designing the sensing matrix in practice, there are many hardware limitations (such as exposure and color filter patterns). Existing studies have found random sampling is not always the best solution for compressive sensing because the optimal sampling pattern is related to the scene context, and it is hard to manually design a sampling pattern and reconstruction algorithm. In this paper, we propose an end-to-end learning approach that jointly optimizes the sampling pattern as well as the reconstruction decoder. We applied this deep sensing approach to the video compressive sensing problem. We modeled the spatio-temporal sampling and color filter pattern using a convolutional neural network constrained by hardware limitations during network training. We demonstrated that the proposed method performs better than the manually designed method in gray-scale video and color video acquisitions.
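The sensing step described in this abstract can be illustrated with a small sketch of a coded-exposure forward model, in which a short video is collapsed into one captured image by a per-pixel binary exposure mask. This is only an assumed, simplified model: the function names, the random mask, and the "each pixel is exposed in exactly one frame" constraint are hypothetical stand-ins for a hardware limitation, not the learned pattern or the network from the paper.

```python
import random


def make_exposure_mask(frames, height, width, seed=0):
    """Random binary mask of shape frames x height x width.

    Hypothetical hardware constraint: every pixel is exposed in exactly one frame.
    """
    rng = random.Random(seed)
    mask = [[[0] * width for _ in range(height)] for _ in range(frames)]
    for y in range(height):
        for x in range(width):
            mask[rng.randrange(frames)][y][x] = 1
    return mask


def capture(video, mask):
    """Collapse a T x H x W video into one H x W coded image: sum_t mask[t] * video[t]."""
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    return [
        [sum(mask[t][y][x] * video[t][y][x] for t in range(frames)) for x in range(width)]
        for y in range(height)
    ]


# Toy video: 4 frames of 2 x 3 pixels, where frame t has constant intensity t + 1.
video = [[[float(t + 1)] * 3 for _ in range(2)] for t in range(4)]
mask = make_exposure_mask(4, 2, 3)
coded = capture(video, mask)
```

In this toy setting each pixel of `coded` holds the intensity of the single frame in which it was exposed. A reconstruction decoder would then have to recover all four frames from this one image, which is the part the paper learns jointly with the pattern.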

2.

Long-Term Visual Localization Revisited.

Toft, Carl; Maddern, Will; Torii, Akihiko; Hammarstrand, Lars; Stenborg, Erik; Safari, Daniel; Okutomi, Masatoshi; Pollefeys, Marc; Sivic, Josef; Pajdla, Tomas; Kahl, Fredrik; Sattler, Torsten.

IEEE Trans Pattern Anal Mach Intell ; 44(4): 2074-2088, 2022 04.

Article in English | MEDLINE | ID: mdl-33074802

ABSTRACT

Visual localization enables autonomous vehicles to navigate in their surroundings and augmented reality applications to link virtual to real worlds. Practical visual localization approaches need to be robust to a wide variety of viewing conditions, including day-night changes, as well as weather and seasonal variations, while providing highly accurate six degree-of-freedom (6DOF) camera pose estimates. In this paper, we extend three publicly available datasets containing images captured under a wide variety of viewing conditions, but lacking camera pose information, with ground truth pose information, making evaluation of the impact of various factors on 6DOF camera pose estimation accuracy possible. We also discuss the performance of state-of-the-art localization approaches on these datasets. Additionally, we release around half of the poses for all conditions, and keep the remaining half private as a test set, in the hopes that this will stimulate research on long-term visual localization, learned local image features, and related research areas. Our datasets are available at visuallocalization.net, where we are also hosting a benchmarking server for automatic evaluation of results on the test set. The presented state-of-the-art results are to a large degree based on submissions to our server.
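As an illustration of what evaluating 6DOF camera pose accuracy can involve, the sketch below computes a position error and an orientation error between an estimated and a ground-truth pose. The abstract does not state which error measures the benchmark uses, so the Euclidean distance between camera centres and the angle of the relative rotation are assumptions here, as are all function names.

```python
import math


def position_error(c_est, c_gt):
    """Euclidean distance between estimated and ground-truth camera centres."""
    return math.dist(c_est, c_gt)


def rotation_error_deg(r_est, r_gt):
    """Angle (in degrees) of the relative rotation between two 3x3 rotation matrices."""
    # trace(R_gt^T R_est) equals the element-wise product summed over all entries.
    trace = sum(r_gt[i][j] * r_est[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against rounding before taking the arccosine.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def rot_z(deg):
    """Helper for the example: rotation about the z-axis by `deg` degrees."""
    a = math.radians(deg)
    return [
        [math.cos(a), -math.sin(a), 0.0],
        [math.sin(a), math.cos(a), 0.0],
        [0.0, 0.0, 1.0],
    ]


pos_err = position_error((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
rot_err = rotation_error_deg(rot_z(30.0), rot_z(10.0))
```

A pose could then be counted as correct when both errors fall below chosen thresholds. Any such thresholds would be a design choice of the evaluation and are not given in the abstract.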

Subject(s)

Algorithms

3.

NCNet: Neighbourhood Consensus Networks for Estimating Image Correspondences.

Rocco, Ignacio; Cimpoi, Mircea; Arandjelovic, Relja; Torii, Akihiko; Pajdla, Tomas; Sivic, Josef.

IEEE Trans Pattern Anal Mach Intell ; 44(2): 1020-1034, 2022 Feb.

Article in English | MEDLINE | ID: mdl-32795965

ABSTRACT

We address the problem of finding reliable dense correspondences between a pair of images. This is a challenging task due to strong appearance differences between the corresponding scene elements and ambiguities generated by repetitive patterns. The contributions of this work are threefold. First, inspired by the classic idea of disambiguating feature matches using semi-local constraints, we develop an end-to-end trainable convolutional neural network architecture that identifies sets of spatially consistent matches by analyzing neighbourhood consensus patterns in the 4D space of all possible correspondences between a pair of images without the need for a global geometric model. Second, we demonstrate that the model can be trained effectively from weak supervision in the form of matching and non-matching image pairs without the need for costly manual annotation of point to point correspondences. Third, we show the proposed neighbourhood consensus network can be applied to a range of matching tasks including both category- and instance-level matching, obtaining the state-of-the-art results on the PF, TSS, InLoc, and HPatches benchmarks.
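The "4D space of all possible correspondences" mentioned above can be pictured as a tensor that scores every location in one image against every location in the other. The sketch below builds such a tensor from two small feature grids using cosine similarity and reads off the raw best match per location. It is an assumed illustration only: the function names are hypothetical, and the learned neighbourhood consensus filtering that the paper applies to this tensor is not implemented here.

```python
import math


def l2_normalize(v):
    """Scale a feature vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def correlation_4d(feat_a, feat_b):
    """corr[i][j][k][l] = cosine similarity of feat_a[i][j] and feat_b[k][l]."""
    a = [[l2_normalize(f) for f in row] for row in feat_a]
    b = [[l2_normalize(f) for f in row] for row in feat_b]
    return [
        [[[sum(p * q for p, q in zip(fa, fb)) for fb in row_b] for row_b in b] for fa in row_a]
        for row_a in a
    ]


def best_matches(corr):
    """For each location (i, j) in image A, the highest-scoring location (k, l) in image B."""
    matches = {}
    for i, plane in enumerate(corr):
        for j, scores in enumerate(plane):
            candidates = [(k, l) for k in range(len(scores)) for l in range(len(scores[0]))]
            matches[(i, j)] = max(candidates, key=lambda kl: scores[kl[0]][kl[1]])
    return matches


# Toy 2 x 2 feature grids: image B is image A flipped horizontally.
feat_a = [[[1, 0, 0, 0], [0, 1, 0, 0]], [[0, 0, 1, 0], [0, 0, 0, 1]]]
feat_b = [row[::-1] for row in feat_a]
matches = best_matches(correlation_4d(feat_a, feat_b))
```

With repetitive patterns, several entries of the tensor would score equally well for one location. That ambiguity is what the neighbourhood consensus analysis described in the abstract is meant to resolve.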

4.

InLoc: Indoor Visual Localization with Dense Matching and View Synthesis.

Taira, Hajime; Okutomi, Masatoshi; Sattler, Torsten; Cimpoi, Mircea; Pollefeys, Marc; Sivic, Josef; Pajdla, Tomas; Torii, Akihiko.

IEEE Trans Pattern Anal Mach Intell ; 43(4): 1293-1307, 2021 Apr.

Article in English | MEDLINE | ID: mdl-31722474

ABSTRACT

We seek to predict the 6 degree-of-freedom (6DoF) pose of a query photograph with respect to a large indoor 3D map. The contributions of this work are three-fold. First, we develop a new large-scale visual localization method targeted for indoor spaces. The method proceeds along three steps: (i) efficient retrieval of candidate poses that scales to large-scale environments, (ii) pose estimation using dense matching rather than sparse local features to deal with weakly textured indoor scenes, and (iii) pose verification by virtual view synthesis that is robust to significant changes in viewpoint, scene layout, and occlusion. Second, we release a new dataset with reference 6DoF poses for large-scale indoor localization. Query photographs are captured by mobile phones at a different time than the reference 3D map, thus presenting a realistic indoor localization scenario. Third, we demonstrate that our method significantly outperforms current state-of-the-art indoor localization approaches on this new challenging data. Code and data are publicly available.

5.

Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization?

Torii, Akihiko; Taira, Hajime; Sivic, Josef; Pollefeys, Marc; Okutomi, Masatoshi; Pajdla, Tomas; Sattler, Torsten.

IEEE Trans Pattern Anal Mach Intell ; 43(3): 814-829, 2021 03.

Article in English | MEDLINE | ID: mdl-31535984

ABSTRACT

Accurate visual localization is a key technology for autonomous navigation. 3D structure-based methods employ 3D models of the scene to estimate the full 6 degree-of-freedom (DOF) pose of a camera very accurately. However, constructing (and extending) large-scale 3D models is still a significant challenge. In contrast, 2D image retrieval-based methods only require a database of geo-tagged images, which is trivial to construct and to maintain. They are often considered inaccurate since they only approximate the positions of the cameras. Yet, the exact camera pose can theoretically be recovered when enough relevant database images are retrieved. In this paper, we demonstrate experimentally that large-scale 3D models are not strictly necessary for accurate visual localization. We create reference poses for a large and challenging urban dataset. Using these poses, we show that combining image-based methods with local reconstructions results in a higher pose accuracy compared to state-of-the-art structure-based methods, albeit at higher run-time costs. We show that some of these run-time costs can be alleviated by exploiting known database image poses. Our results suggest that we might want to reconsider the need for large-scale 3D models in favor of more local models, but also that further research is necessary to accelerate the local reconstruction process.

6.

24/7 Place Recognition by View Synthesis.

Torii, Akihiko; Arandjelovic, Relja; Sivic, Josef; Okutomi, Masatoshi; Pajdla, Tomas.

IEEE Trans Pattern Anal Mach Intell ; 40(2): 257-271, 2018 02.

Article in English | MEDLINE | ID: mdl-28207385

ABSTRACT

We address the problem of large-scale visual place recognition for situations where the scene undergoes a major change in appearance, for example, due to illumination (day/night), change of seasons, aging, or structural modifications over time such as buildings being built or destroyed. Such situations represent a major challenge for current large-scale place recognition methods. This work has the following three principal contributions. First, we demonstrate that matching across large changes in the scene appearance becomes much easier when both the query image and the database image depict the scene from approximately the same viewpoint. Second, based on this observation, we develop a new place recognition approach that combines (i) an efficient synthesis of novel views with (ii) a compact indexable image representation. Third, we introduce a new challenging dataset of 1,125 camera-phone query images of Tokyo that contain major changes in illumination (day, sunset, night) as well as structural changes in the scene. We demonstrate that the proposed approach significantly outperforms other large-scale place recognition techniques on this challenging data.

7.

NetVLAD: CNN Architecture for Weakly Supervised Place Recognition.

Arandjelovic, Relja; Gronat, Petr; Torii, Akihiko; Pajdla, Tomas; Sivic, Josef.

IEEE Trans Pattern Anal Mach Intell ; 40(6): 1437-1451, 2018 06.

Article in English | MEDLINE | ID: mdl-28622667

ABSTRACT

We tackle the problem of large scale visual place recognition, where the task is to quickly and accurately recognize the location of a given query photograph. We present the following four principal contributions. First, we develop a convolutional neural network (CNN) architecture that is trainable in an end-to-end manner directly for the place recognition task. The main component of this architecture, NetVLAD, is a new generalized VLAD layer, inspired by the "Vector of Locally Aggregated Descriptors" image representation commonly used in image retrieval. The layer is readily pluggable into any CNN architecture and amenable to training via backpropagation. Second, we create a new weakly supervised ranking loss, which enables end-to-end learning of the architecture's parameters from images depicting the same places over time downloaded from Google Street View Time Machine. Third, we develop an efficient training procedure which can be applied on very large-scale weakly labelled tasks. Finally, we show that the proposed architecture and training procedure significantly outperform non-learnt image representations and off-the-shelf CNN descriptors on challenging place recognition and image retrieval benchmarks.
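To make the idea of a generalized VLAD layer more concrete, the following sketch aggregates a set of local descriptors into one vector by softly assigning each descriptor to cluster centres and summing the weighted residuals. It is a minimal illustration under stated assumptions: the softmax over negative squared distances, the `alpha` sharpness parameter, the fixed centres, and the function name are all hypothetical choices made here, whereas a trainable layer would learn its parameters via backpropagation.

```python
import math


def soft_vlad(descriptors, centers, alpha=10.0):
    """Soft-assignment VLAD: weighted residual sums per centre, flattened and L2-normalised."""
    k, d = len(centers), len(centers[0])
    vlad = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Soft assignment: softmax over negative scaled squared distances to each centre.
        logits = [-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        for idx, c in enumerate(centers):
            weight = exps[idx] / total
            for j in range(d):
                vlad[idx][j] += weight * (x[j] - c[j])
    flat = [v for row in vlad for v in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]


# Two 2-D descriptors, each lying close to one of two hypothetical cluster centres.
descriptors = [[0.1, 0.0], [1.0, 0.9]]
centers = [[0.0, 0.0], [1.0, 1.0]]
image_vector = soft_vlad(descriptors, centers)
```

The output has one residual block per centre, so its length is the number of centres times the descriptor dimension. Because every operation is differentiable, an aggregation of this kind can sit on top of a CNN and be trained end to end, which matches the property the abstract highlights.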

8.

Visual place recognition with repetitive structures.

Torii, Akihiko; Sivic, Josef; Okutomi, Masatoshi; Pajdla, Tomas.

IEEE Trans Pattern Anal Mach Intell ; 37(11): 2346-59, 2015 Nov.

Article in English | MEDLINE | ID: mdl-26440272

ABSTRACT

Repeated structures such as building facades, fences or road markings often represent a significant challenge for place recognition. Repeated structures are notoriously hard for establishing correspondences using multi-view geometry. They violate the feature independence assumed in the bag-of-visual-words representation, which often leads to over-counting evidence and significant degradation of retrieval performance. In this work we show that repeated structures are not a nuisance but, when appropriately represented, they form an important distinguishing feature for many places. We describe a representation of repeated structures suitable for scalable retrieval and geometric verification. The retrieval is based on robust detection of repeated image structures and a suitable modification of weights in the bag-of-visual-words model. We also demonstrate that the explicit detection of repeated patterns is beneficial for robust visual word matching for geometric verification. Place recognition results are shown on datasets of street-level imagery from Pittsburgh and San Francisco, demonstrating significant gains in recognition performance compared to the standard bag-of-visual-words baseline as well as the more recently proposed burstiness weighting and Fisher vector encoding.
