Results 1 - 20 of 32
1.
Sensors (Basel) ; 23(23)2023 Dec 02.
Article in English | MEDLINE | ID: mdl-38067953

ABSTRACT

Robust and precise visual localization over extended periods of time poses a formidable challenge in spatial vision. The primary difficulty lies in effectively handling the significant appearance variations caused by seasonal changes (summer, winter, spring, autumn) and diverse lighting conditions (dawn, day, sunset, night). With the rapid development of related technologies, more and more relevant datasets have emerged, which has also promoted the progress of 6-DOF visual localization for both autonomous vehicles and handheld devices. This manuscript endeavors to rectify the existing limitations of the current public benchmark for long-term visual localization, especially in the autonomous vehicle challenge. Considering that autonomous vehicle datasets are primarily captured by multi-camera rigs with fixed extrinsic calibration and consist of serialized image sequences, we present several modifications designed to enhance the rationality and comprehensiveness of the evaluation algorithm. We advocate standardized preprocessing procedures to minimize the possibility of human intervention influencing evaluation results. These procedures involve aligning the positions of the multiple cameras on the vehicle to a predetermined canonical reference system, replacing the individual camera positions with uniform vehicle poses, and incorporating sequence information to compensate for failed localized poses. These steps are crucial to ensuring a fair and accurate evaluation of algorithmic performance. Lastly, we introduce a novel indicator to resolve potential ties in the Schulze ranking among submitted methods. The inadequacies highlighted in this study are substantiated through simulations and real experiments, which demonstrate the necessity and effectiveness of our proposed amendments.

2.
IEEE Trans Cybern ; 52(2): 862-872, 2022 Feb.
Article in English | MEDLINE | ID: mdl-32413945

ABSTRACT

Camera translation averaging, which aims to recover the global camera locations from a given set of camera translation directions, is a challenging problem for Structure from Motion (SfM) in computer vision, largely because the relative translation directions derived from a set of noisy essential matrices are generally of low accuracy. To tackle this problem, we first reveal a novel but simple property of the camera translation matrix consisting of all the pairwise camera translations among an arbitrary set of cameras: the rank of this translation matrix is always at most 4. Then, by explicitly enforcing this rank property, we propose a novel translation estimation method for computing global camera locations, called TERE. Moreover, to further improve TERE in both accuracy and speed, we propose an iterative batch-based translation estimation method, called B-TERE, in which a small batch of cameras is selected without replacement from the given set according to a simple camera selection strategy at each iteration, and the locations of the selected cameras are estimated by TERE. Extensive experimental results on various datasets demonstrate that our proposed methods achieve better performance than several state-of-the-art methods.
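The rank property is easy to verify numerically. Below is a minimal NumPy sketch, assuming the translation matrix stacks the 3x1 blocks c_j - c_i (the pairwise translation from camera i to camera j) into a 3N x N matrix; the paper's exact construction may differ.

```python
# Minimal check of the rank <= 4 property of the pairwise camera
# translation matrix, under the block layout assumed above.
import numpy as np

rng = np.random.default_rng(0)
N = 20                                  # number of cameras
C = rng.standard_normal((N, 3))         # random camera centers c_1..c_N

# Block (i, j) of T is c_j - c_i, so column j stacks c_j - c_i over all i.
T = np.zeros((3 * N, N))
for i in range(N):
    for j in range(N):
        T[3 * i:3 * i + 3, j] = C[j] - C[i]

# Column j equals kron(1_N, c_j) minus the fixed stacked vector of all c_i,
# so every column lies in a (3 + 1)-dimensional subspace: rank(T) <= 4.
print(np.linalg.matrix_rank(T))         # prints 4 (generically)
```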


Subject(s)
Photography
3.
IEEE Trans Image Process ; 30: 6943-6956, 2021.
Article in English | MEDLINE | ID: mdl-34343091

ABSTRACT

In the zero-shot learning (ZSL) community, it is generally recognized that transductive learning performs better than inductive learning because unseen-class samples are also used in the training stage. How to generate pseudo labels for unseen-class samples and how to use such usually noisy pseudo labels are two critical issues in transductive learning. In this work, we introduce an iterative co-training framework that contains two different base ZSL models and an exchanging module. At each iteration, the two ZSL models are co-trained to separately predict pseudo labels for the unseen-class samples, the exchanging module exchanges the predicted pseudo labels, and the exchanged pseudo-labeled samples are added to the training sets for the next iteration. In this way, our framework gradually boosts ZSL performance by fully exploiting the potential complementarity of the two models' classification capabilities. In addition, our co-training framework is applied to generalized ZSL (GZSL), for which a semantic-guided out-of-distribution (OOD) detector is proposed to pick out the most likely unseen-class samples before class-level classification, alleviating the bias problem in GZSL. Extensive experiments on three benchmarks show that our proposed methods significantly outperform about 31 state-of-the-art ones.
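As a sketch of the exchange mechanism alone, the loop below co-trains two generic scikit-learn classifiers standing in for the two base ZSL models; the function name, confidence-based selection, and top_k are assumptions for illustration.

```python
# Pseudo-label exchange between two base models: each model labels the
# unlabeled pool, and its most confident predictions are handed to the
# OTHER model's training set.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

def co_train(X_lab, y_lab, X_unlab, iters=5, top_k=20):
    A, B = LogisticRegression(max_iter=1000), RandomForestClassifier()
    Xa, ya = X_lab.copy(), y_lab.copy()   # training pool for model A
    Xb, yb = X_lab.copy(), y_lab.copy()   # training pool for model B
    for _ in range(iters):
        A.fit(Xa, ya)
        B.fit(Xb, yb)
        pa, pb = A.predict_proba(X_unlab), B.predict_proba(X_unlab)
        # Each model picks its top_k most confident pseudo-labels ...
        ia = np.argsort(pa.max(axis=1))[-top_k:]
        ib = np.argsort(pb.max(axis=1))[-top_k:]
        # ... and the exchanging module hands them to the other model.
        # (A fuller implementation would also remove them from X_unlab.)
        Xb, yb = np.vstack([Xb, X_unlab[ia]]), np.concatenate([yb, A.classes_[pa[ia].argmax(1)]])
        Xa, ya = np.vstack([Xa, X_unlab[ib]]), np.concatenate([ya, B.classes_[pb[ib].argmax(1)]])
    return A, B
```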

4.
IEEE Trans Image Process ; 30: 7458-7471, 2021.
Article in English | MEDLINE | ID: mdl-34449362

ABSTRACT

Urban scene modeling is a challenging task for the photogrammetry and computer vision communities due to its large scale, structural complexity, and topological delicacy. This paper presents an efficient multistep modeling framework for large-scale urban scenes from aerial images. It takes aerial images and a textured 3D mesh model generated by an image-based modeling system as input, and outputs compact polygon models with semantics at different levels of detail (LODs). Based on the key observation that urban buildings usually have piecewise planar rooftops and vertical walls, we propose a segment-based modeling method consisting of three major stages: scene segmentation, roof contour extraction, and building modeling. By combining deep neural network predictions with geometric constraints of the 3D mesh, the scene is first segmented into three classes. Then, for each building mesh, 2D line segments are detected and used to slice the ground into polygon cells, and each cell is assigned a roof plane via an MRF optimization. Finally, the LOD model is obtained by extruding the cells to their corresponding planes. Compared with direct modeling in 3D space, we transform the mesh into a uniform 2D image grid representation and perform most of the modeling work in 2D space, which has the advantages of low computational complexity and high robustness. In addition, our method does not require any global prior, such as the Manhattan or Atlanta world assumption, making it flexible enough to model scenes with different characteristics and complexity. Experiments on both single buildings and large-scale urban scenes demonstrate that, by combining 2D photometric with 3D geometric information, the proposed algorithm is robust and efficient in urban scene LOD vectorized modeling compared with state-of-the-art approaches.
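As a toy illustration of the final extrusion step, the sketch below lifts a 2D polygon cell to its assigned roof plane; the plane parameters and cell geometry are made up, and the paper's MRF-based plane assignment is not reproduced here.

```python
# Extruding a 2D polygon cell to its roof plane ax + by + cz + d = 0.
import numpy as np

def extrude_cell(cell_xy, plane):
    """cell_xy: (K, 2) polygon vertices; plane: (a, b, c, d) with c != 0."""
    a, b, c, d = plane
    z = -(a * cell_xy[:, 0] + b * cell_xy[:, 1] + d) / c   # height on roof plane
    roof = np.column_stack([cell_xy, z])                   # roof polygon
    base = np.column_stack([cell_xy, np.zeros(len(cell_xy))])
    return roof, base                                      # vertical walls connect the two

roof, base = extrude_cell(np.array([[0., 0.], [10., 0.], [10., 8.], [0., 8.]]),
                          plane=(0.05, 0.0, 1.0, -6.0))
```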

5.
Front Comput Neurosci ; 14: 35, 2020.
Article in English | MEDLINE | ID: mdl-32477087

ABSTRACT

Recently, the DCNN (deep convolutional neural network) has been advocated as a general and promising modeling approach for neural object representation in the primate inferotemporal cortex. In this work, we show that an inherent non-uniqueness problem exists in DCNN-based modeling of image object representations. This non-uniqueness phenomenon reveals, to some extent, a theoretical limitation of this general modeling approach, and invites due attention in practice.

6.
Sensors (Basel) ; 19(6)2019 Mar 13.
Article in English | MEDLINE | ID: mdl-30871277

ABSTRACT

In this paper, we put forward a new method for surface reconstruction from image-based point clouds. In particular, we introduce a new visibility model for each line of sight that preserves scene details without decreasing the noise-filtering ability. To make the proposed method suitable for point clouds with heavy noise, we introduce a new likelihood energy term into the total energy of the binary labeling problem over the Delaunay tetrahedra, and we give its s-t graph implementation. In addition, we further improve the performance of the proposed method with a dense visibility technique, which helps keep object edges sharp. The experimental results show that the proposed method rivals state-of-the-art methods in terms of accuracy and completeness, and performs better with respect to detail preservation.
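To illustrate how such a binary labeling reduces to an s-t min-cut, here is a toy sketch with networkx; the unary and pairwise capacities are invented placeholders, not the paper's likelihood or visibility terms.

```python
# Binary (inside/outside) labeling of tetrahedra as an s-t min-cut.
import networkx as nx

G = nx.DiGraph()
unary = {0: (3.0, 1.0), 1: (0.5, 2.0), 2: (0.2, 2.5)}  # tet -> (source cap, sink cap)
pairwise = {(0, 1): 1.5, (1, 2): 1.5}                  # adjacent tetrahedra

for tet, (cs, ct) in unary.items():
    G.add_edge('s', tet, capacity=cs)   # cost of labeling tet "outside"
    G.add_edge(tet, 't', capacity=ct)   # cost of labeling tet "inside"
for (u, v), w in pairwise.items():
    G.add_edge(u, v, capacity=w)        # smoothness across the shared facet
    G.add_edge(v, u, capacity=w)

cut_value, (source_side, sink_side) = nx.minimum_cut(G, 's', 't')
print(source_side - {'s'}, sink_side - {'t'})   # the two label sets
```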

7.
Sensors (Basel) ; 18(7)2018 Jul 02.
Article in English | MEDLINE | ID: mdl-30004420

ABSTRACT

A multi-camera dense RGB-D SLAM (simultaneous localization and mapping) system has the potential both to speed up scene reconstruction and to improve localization accuracy, thanks to multiple mounted sensors and an enlarged effective field of view. To effectively tap the potential of such a system, two issues must be addressed: first, how to calibrate a system in which the sensors usually share a small or no common field of view, so as to maximally enlarge the effective field of view; second, how to fuse the location information from the different sensors. In this work, a three-Kinect system is reported. For system calibration, two methods are proposed: one, based on an improved hand-eye calibration, is suitable for systems with an inertial measurement unit (IMU); the other is for pure visual SLAM without any auxiliary sensors. In the RGB-D SLAM stage, we extend and improve a state-of-the-art single RGB-D SLAM method to the multi-camera setting. We track the poses of the multiple cameras independently and, at each moment, select the pose with the minimal error as the reference to correct the other cameras' poses. To optimize the initial pose estimates, we improve the deformation graph by adding a device-number attribute to distinguish surfels built by different cameras and perform deformations according to the device number. We verify the accuracy of our extrinsic calibration methods in the experiment section and show satisfactory reconstructed models produced by our multi-camera dense RGB-D SLAM. The RMSE (root-mean-square error) of the lengths measured in our reconstructed models is 1.55 cm, similar to state-of-the-art single-camera RGB-D SLAM systems.

8.
Article in English | MEDLINE | ID: mdl-29994526

ABSTRACT

Learning depth from a single image, an important issue in scene understanding, has attracted a lot of attention in the past decade. The accuracy of depth estimation has improved from conditional Markov random fields and non-parametric methods to, most recently, deep convolutional neural networks. However, inherent ambiguities exist in recovering 3D from a single 2D image. In this paper, we first prove the ambiguity between the focal length and monocular depth learning, and verify the result experimentally, showing that the focal length has a great influence on accurate depth recovery. In order to learn monocular depth with the focal length embedded, we propose a method to generate a synthetic varying-focal-length dataset from fixed-focal-length datasets, together with a simple and effective method to fill the holes in the newly generated images. For accurate depth recovery, we propose a novel deep neural network that infers depth by effectively fusing middle-level information on the fixed-focal-length dataset, outperforming state-of-the-art methods built on pretrained VGG. Furthermore, the newly generated varying-focal-length dataset is taken as input to the proposed network in both the learning and inference phases. Extensive experiments on the fixed- and varying-focal-length datasets demonstrate that the learned monocular depth with embedded focal length is significantly improved compared to that learned without the focal length information.
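One plausible way to synthesize new focal lengths from a fixed-focal-length image is center-crop-and-resize, under which the pinhole focal length scales by the crop factor; the sketch below illustrates this relation (the paper's exact generation and hole-filling procedure may differ).

```python
# Synthesizing a longer focal length by center-cropping; resizing the crop
# back to the original resolution scales the pinhole focal length by
# W / W_crop, while the metric depth values are unchanged.
import numpy as np

def crop_to_focal(image, depth, f, scale):
    """scale > 1 zooms in: effective focal length becomes f * scale."""
    H, W = image.shape[:2]
    h, w = int(round(H / scale)), int(round(W / scale))
    y0, x0 = (H - h) // 2, (W - w) // 2
    crop_img = image[y0:y0 + h, x0:x0 + w]
    crop_dep = depth[y0:y0 + h, x0:x0 + w]
    # Resizing crop_img/crop_dep back to (H, W) (e.g. with cv2.resize)
    # yields a view whose pinhole focal length is f * scale.
    return crop_img, crop_dep, f * scale
```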

10.
Neural Comput ; 30(2): 447-476, 2018 02.
Article in English | MEDLINE | ID: mdl-29162010

ABSTRACT

Under the goal-driven paradigm, Yamins et al. (2014; Yamins & DiCarlo, 2016) have shown that by optimizing only the final eight-way categorization performance of a four-layer hierarchical network, not only can its top output layer quantitatively predict IT neuron responses but its penultimate layer can also automatically predict V4 neuron responses. Currently, deep neural networks (DNNs) in the field of computer vision have reached image object categorization performance comparable to that of human beings on ImageNet, a data set that contains 1.3 million training images of 1000 categories. We explore whether the DNN neurons (units in DNNs) possess image object representational statistics similar to monkey IT neurons, particularly when the network becomes deeper and the number of image categories becomes larger, using VGG19, a typical and widely used deep network of 19 layers in the computer vision field. Following Lehky, Kiani, Esteky, and Tanaka (2011, 2014), where the response statistics of 674 IT neurons to 806 image stimuli are analyzed using three measures (kurtosis, Pareto tail index, and intrinsic dimensionality), we investigate the three issues in this letter using the same three measures: (1) the similarities and differences of the neural response statistics between VGG19 and primate IT cortex, (2) the variation trends of the response statistics of VGG19 neurons at different layers from low to high, and (3) the variation trends of the response statistics of VGG19 neurons when the numbers of stimuli and neurons increase. We find that the response statistics on both single-neuron selectivity and population sparseness of VGG19 neurons are fundamentally different from those of IT neurons in most cases; by increasing the number of neurons in different layers and the number of stimuli, the response statistics of neurons at different layers from low to high do not substantially change; and the estimated intrinsic dimensionality values at the low convolutional layers of VGG19 are considerably larger than the value of approximately 100 reported for IT neurons in Lehky et al. (2014), whereas those at the high fully connected layers are close to or lower than 100. To the best of our knowledge, this work is the first attempt to analyze the response statistics of DNN neurons with respect to primate IT neurons in image object representation.
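For reference, here is a sketch of the three measures on a (neurons x stimuli) response matrix; the Hill estimator and the participation-ratio intrinsic dimensionality are common choices, assumed here rather than taken from the papers' exact estimators.

```python
# Kurtosis-based selectivity/sparseness, a Hill-type Pareto tail index,
# and a PCA participation-ratio intrinsic dimensionality for a response
# matrix R of shape (n_neurons, n_stimuli).
import numpy as np
from scipy.stats import kurtosis

def selectivity(R):            # kurtosis across stimuli, one value per neuron
    return kurtosis(R, axis=1, fisher=False)

def sparseness(R):             # kurtosis across neurons, one value per stimulus
    return kurtosis(R, axis=0, fisher=False)

def hill_tail_index(x, k=50):  # Pareto tail index from the k largest responses
    x = np.sort(x[x > 0])[-k:]
    return 1.0 / np.mean(np.log(x / x[0]))

def intrinsic_dim(R):          # participation ratio of PCA eigenvalues
    lam = np.linalg.eigvalsh(np.cov(R))
    return lam.sum() ** 2 / (lam ** 2).sum()
```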


Subject(s)
Cerebral Cortex/physiology , Image Processing, Computer-Assisted/methods , Neural Networks, Computer , Neurons/physiology , Visual Perception/physiology , Action Potentials , Animals , Macaca mulatta , Models, Neurological , Pattern Recognition, Automated/methods , Statistics as Topic
11.
Front Comput Neurosci ; 11: 60, 2017.
Article in English | MEDLINE | ID: mdl-28747882

ABSTRACT

Lehky et al. (2011) provided a statistical analysis of the responses of 674 recorded neurons to 806 image stimuli in the anterior inferotemporal (AIT) cortex of two monkeys. In terms of kurtosis and the Pareto tail index, they observed that the population sparseness of both unnormalized and normalized responses is always larger than the single-neuron selectivity, and hence concluded that the critical features for individual neurons in the primate AIT cortex are not very complex, but that there is an indefinitely large number of them. In this work, we explore an "inverse problem" by simulation: assuming each neuron indeed responds to only a very limited number of stimuli among a very large number of neurons and stimuli, we assess whether the population sparseness is still always larger than the single-neuron selectivity. Our simulation results show that the population sparseness exceeds the single-neuron selectivity in most cases, even when the numbers of neurons and stimuli are much larger than several hundred, which confirms the observations of Lehky et al. (2011). In addition, we find that the variances of the computed kurtosis and Pareto tail index are quite large in some cases, which reveals some limitations of these two criteria when they are used for evaluating neural responses.
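A minimal version of such a simulation follows, reusing kurtosis-based measures; the activity level (10 effective stimuli per neuron) and the exponential response magnitudes are assumptions for illustration.

```python
# Simulate neurons that each respond to only a handful of stimuli, then
# compare single-neuron selectivity with population sparseness.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(1)
n_neurons, n_stimuli, k_active = 674, 806, 10   # sizes echo Lehky et al.

R = np.zeros((n_neurons, n_stimuli))
for n in range(n_neurons):
    idx = rng.choice(n_stimuli, size=k_active, replace=False)
    R[n, idx] = rng.exponential(1.0, size=k_active)  # sparse positive responses

sel = kurtosis(R, axis=1, fisher=False).mean()   # single-neuron selectivity
pop = kurtosis(R, axis=0, fisher=False).mean()   # population sparseness
print(f"selectivity={sel:.1f}, sparseness={pop:.1f}")
```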

12.
IEEE Trans Image Process ; 26(8): 3775-3788, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28534771

ABSTRACT

This paper aims at bridging two important trends in efficient graph cuts in the literature: one is to decompose a graph into several smaller subgraphs to take advantage of parallel computation; the other is to reuse the solution of the max-flow problem on a residual graph to boost efficiency on another, similar graph. Our proposed parallel dynamic graph cuts algorithm takes advantage of both and is extremely efficient for certain dynamically changing MRF models in computer vision. The performance of the proposed algorithm is validated on two typical dynamic graph cuts problems: foreground-background segmentation in video, where similar graph cuts problems need to be solved sequentially, and GrabCut, where graph cuts are used iteratively.

13.
IEEE Trans Image Process ; 25(12): 5511-5525, 2016 12.
Article in English | MEDLINE | ID: mdl-27654484

ABSTRACT

Graph cuts are widely used in computer vision. To speed up the optimization process and improve the scalability for large graphs, Strandmark and Kahl introduced a splitting method that splits a graph into multiple subgraphs for parallel computation in both shared and distributed memory models. However, this parallel algorithm (the parallel BK-algorithm) does not have a polynomial bound on the number of iterations and is found to be non-convergent in some cases, due to the possible multiple optimal solutions of its sub-problems. To remedy this non-convergence problem, in this paper we first introduce a merging method capable of merging any number of adjacent subgraphs that can hardly reach agreement on their overlapping regions in the parallel BK-algorithm. Based on the pseudo-Boolean representation of graph cuts, our merging method is shown to effectively reuse all the flows already computed in these subgraphs. Through both splitting and merging, we further propose a dynamic parallel and distributed graph cuts algorithm with guaranteed convergence to the globally optimal solution within a predefined number of iterations. In essence, this paper provides a general framework that allows more sophisticated splitting and merging strategies to be employed to further boost performance. Our dynamic parallel algorithm is validated with extensive experimental results.

14.
PLoS One ; 10(7): e0132354, 2015.
Article in English | MEDLINE | ID: mdl-26218615

ABSTRACT

Line triangulation, a classical geometric problem in computer vision, is to determine the 3D coordinates of a line based on its 2D image projections from more than two views of cameras with known projection matrices. Compared to point features, line segments are more robust to matching errors, occlusions, and image uncertainties. Besides the triangulation algorithm itself, a better metric is also needed to evaluate the 3D errors of line triangulation. In this paper, the line triangulation problem is investigated using the theory of Lagrange multipliers. The main contributions include: (i) based on the Lagrange multipliers theory, a formula to compute the Plücker correction is provided, from which a new linear algorithm, LINa, is proposed for line triangulation; (ii) two optimal algorithms, OPTa-I and OPTa-II, are proposed by minimizing the algebraic error; and (iii) two metrics on the 3D line space, the orthogonal metric and the quasi-Riemannian metric, are introduced for the evaluation of line triangulations. Extensive experiments on synthetic data and real images are carried out to validate and demonstrate the effectiveness of the proposed algorithms.
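As an illustration of the Lagrange-multiplier formulation, here is one closed-form Plücker correction it yields; this is a sketch consistent with the Klein-quadric constraint d·m = 0, not necessarily the paper's exact formula.

```python
# Closest (d, m) with d.m = 0 to an estimate (d0, m0), via a Lagrange
# multiplier. Stationarity of ||d-d0||^2 + ||m-m0||^2 + 2*mu*(d.m) gives
# d = (d0 - mu*m0)/(1-mu^2), m = (m0 - mu*d0)/(1-mu^2), where mu solves
# s*mu^2 - (|d0|^2 + |m0|^2)*mu + s = 0 with s = d0.m0; the smaller root
# is the minimal correction.
import numpy as np

def plucker_correct(d0, m0):
    s = float(d0 @ m0)
    if abs(s) < 1e-12:
        return d0, m0                      # already on the quadric
    p = float(d0 @ d0 + m0 @ m0)
    roots = np.roots([s, -p, s])           # real roots, product equals 1
    mu = roots[np.argmin(np.abs(roots))].real
    d = (d0 - mu * m0) / (1 - mu ** 2)
    m = (m0 - mu * d0) / (1 - mu ** 2)
    return d, m

d, m = plucker_correct(np.array([1.0, 0.2, 0.0]), np.array([0.1, 1.0, 0.3]))
print(d @ m)    # ~0 up to floating-point error
```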


Subject(s)
Models, Theoretical , Mathematics
15.
IEEE Trans Image Process ; 24(11): 3561-73, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26111397

ABSTRACT

One potentially effective means of large-scale 3D scene reconstruction is to reconstruct the scene in a global manner, rather than incrementally, by fully exploiting available auxiliary information on the imaging conditions, such as camera location from the Global Positioning System (GPS), orientation from an inertial measurement unit (or compass), and focal length from EXIF tags. However, such auxiliary information, though informative and valuable, is usually too noisy to be directly usable. In this paper, we present an approach that takes advantage of such noisy auxiliary information to improve structure-from-motion solving. More specifically, we introduce two effective iterative global optimization algorithms initialized with the noisy auxiliary information. One is a robust rotation averaging algorithm that deals with a contaminated epipolar graph; the other is a robust scene reconstruction algorithm that deals with noisy GPS data for camera center initialization. We found that by exclusively focusing on the estimated inliers at the current iteration, the optimization process initialized with such noisy auxiliary information converges well and efficiently. Our proposed method is evaluated on real images captured by unmanned aerial vehicles, StreetView cars, and conventional digital cameras. Extensive experimental results show that our method performs similarly to or better than many state-of-the-art reconstruction approaches in terms of reconstruction accuracy and completeness, while being more efficient and scalable for large-scale image datasets.
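As a toy illustration of the "focus on the current inliers" idea, the sketch below robustly averages a single set of rotations with SciPy; the angular threshold and the single-rotation setting are simplifications of the paper's graph-based algorithms.

```python
# Iteratively re-average only the rotation measurements within an angular
# threshold of the current estimate, so outliers stop influencing the mean.
import numpy as np
from scipy.spatial.transform import Rotation as R

def robust_mean_rotation(rots, thresh_deg=20.0, iters=10):
    est = rots.mean()                          # initial (contaminated) mean
    for _ in range(iters):
        ang = np.degrees((rots * est.inv()).magnitude())  # angular residuals
        idx = np.flatnonzero(ang < thresh_deg)            # current inlier set
        if len(idx) == 0:
            break
        est = rots[idx].mean()                 # re-average over inliers only
    return est

rots = R.from_rotvec(np.vstack([0.05 * np.random.randn(40, 3),    # inliers near identity
                                np.pi * np.random.rand(10, 3)]))  # gross outliers
print(robust_mean_rotation(rots).magnitude())  # close to 0
```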

16.
IEEE Trans Image Process ; 23(1): 308-18, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24240002

ABSTRACT

Depth-map merging is an effective approach to 3D modeling for reconstructing large-scale scenes from multiple images. Besides generating a high-quality depth map for each image, selecting suitable neighboring images for each image is also an important step in the reconstruction pipeline, one to which, unfortunately, little attention has been paid in the literature until now. This paper tackles this issue for large-scale scene reconstruction, where many unordered images are captured and used with substantial variations in scale and view angle. We formulate neighboring image selection as a combinatorial optimization problem and use a quantum-inspired evolutionary algorithm to seek its optimal solution. Experimental results on a ground-truth dataset show that our approach can significantly improve the quality of the depth maps, as well as the final 3D reconstruction results, with high computational efficiency.


Subject(s)
Algorithms , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Models, Theoretical , Pattern Recognition, Automated/methods , Subtraction Technique , Computer Simulation , Image Enhancement/methods , Reproducibility of Results , Sensitivity and Specificity
17.
IEEE Trans Pattern Anal Mach Intell ; 34(10): 2031-45, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22201063

ABSTRACT

This paper proposes a novel method for interest region description that pools local features based on their intensity orders in multiple support regions. Pooling by intensity orders is not only invariant to rotation and monotonic intensity changes, but also encodes ordinal information into the descriptor. Two kinds of local features are used in this paper, one based on gradients and the other on intensities; hence, two descriptors are obtained: the Multisupport Region Order-Based Gradient Histogram (MROGH) and the Multisupport Region Rotation and Intensity Monotonic Invariant Descriptor (MRRID). Thanks to the intensity order pooling scheme, the two descriptors are rotation invariant without estimating a reference orientation, which appears to be a major error source for most existing methods, such as the Scale Invariant Feature Transform (SIFT), SURF, and DAISY. Promising experimental results on image matching and object recognition demonstrate the effectiveness of the proposed descriptors compared to state-of-the-art descriptors.
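A sketch of the core pooling idea follows: samples in the support region are grouped by intensity order rather than by spatial position, so no reference orientation is needed; the pooled feature and group count here are simplifications of the full descriptors.

```python
# Intensity-order pooling: sort samples by intensity, split into ordinal
# groups, pool a local feature per group, and concatenate. The grouping
# depends only on intensity ranks, hence is rotation invariant.
import numpy as np

def intensity_order_pool(intensities, features, n_groups=4):
    """intensities: (P,) sample intensities; features: (P, D) local features."""
    order = np.argsort(intensities)              # ordinal ranking of samples
    groups = np.array_split(order, n_groups)     # equal-size intensity-order bins
    desc = np.concatenate([features[g].sum(axis=0) for g in groups])
    return desc / (np.linalg.norm(desc) + 1e-12) # L2-normalized descriptor
```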

18.
Neural Comput ; 22(8): 2161-91, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20438337

ABSTRACT

Markov random field (MRF) models and belief propagation have given birth to stereo vision algorithms with top performance. This article explores their biological plausibility. First, an MRF model guided by physiological and psychophysical facts was designed. Typically, an MRF-based stereo vision algorithm employs a likelihood function that reflects the local similarity of two regions and a potential function that models the continuity constraint. In our model, the likelihood function is constructed on the basis of the disparity energy model, because complex cells are considered front-end disparity encoders in the visual pathway. Our likelihood function is also relevant to several psychological findings. The potential function in our model is constrained by the psychological finding that the strength of the cooperative interaction minimizing relative disparity decreases as the separation between stimuli increases. Our model is tested on three kinds of stereo images. In simulations on images with repetitive patterns, we demonstrate that our model can account for the human depth percepts that were previously explained by the second-order mechanism. In simulations on random-dot stereograms and natural scene images, we demonstrate that false matches introduced by the disparity energy model can be reliably removed using our model. A comparison with the coarse-to-fine model shows that our model is able to compute the absolute disparity of small objects with larger relative disparity. We also relate our model to several physiological findings. The hypothesized neurons of the model are selective for absolute disparity and have facilitative extra-receptive fields. There are plenty of such neurons in the visual cortex. In conclusion, we think that stereopsis can be implemented by neural networks resembling MRFs.


Subject(s)
Computer Simulation , Depth Perception/physiology , Models, Neurological , Neurons/physiology , Algorithms , Humans
19.
Guang Pu Xue Yu Guang Pu Fen Xi ; 29(6): 1702-6, 2009 Jun.
Article in Chinese | MEDLINE | ID: mdl-19810565

ABSTRACT

With recent technological advances in wide-field survey astronomy and the implementation of several large-scale astronomical survey proposals (e.g., SDSS, 2dF, and LAMOST), celestial spectra are becoming very abundant and rich. Therefore, research on automated classification methods based on celestial spectra has been attracting more and more attention in recent years. Feature extraction is a fundamental problem in automated spectral classification; it not only influences the difficulty and complexity of the problem, but also determines the performance of the designed classification system. The available feature extraction methods for spectral classification are usually unsupervised, e.g., principal component analysis (PCA), the wavelet transform (WT), artificial neural networks (ANNs), and rough set theory. These methods extract features not by their capability to classify spectra, but by some ability to approximate the original celestial spectra; therefore, the features they extract are usually not the best ones for classification. In the present work, the authors point out the necessity of investigating supervised feature extraction, by analyzing the characteristics of the spectral classification research in the available literature and the limitations of unsupervised feature extraction methods. The authors also study supervised feature extraction based on the relevance vector machine (RVM) and its application to Seyfert spectra classification. The RVM is a recently introduced method based on Bayesian methodology, automatic relevance determination (ARD), regularization techniques, and a hierarchical prior structure. With this method, one can easily fuse the information in the training data with prior knowledge and beliefs about the problem, and the RVM can effectively extract features and reduce the data based on classification capability. Extensive experiments show its superior performance in dimensionality reduction and feature extraction for Seyfert classification.

20.
Guang Pu Xue Yu Guang Pu Fen Xi ; 27(9): 1898-901, 2007 Sep.
Article in Chinese | MEDLINE | ID: mdl-18051557

ABSTRACT

With the recent technological advances in wide-field survey astronomy and the implementation of several large-scale astronomical survey proposals, celestial spectra are becoming very rich, and the study of automated processing methods is attracting more and more attention. In the present work, the authors point out that it is necessary to investigate supervised feature extraction, by analyzing the characteristics of the spectral classification research in the literature and the limitations of unsupervised feature extraction methods. The authors study supervised feature extraction based on Fisher discriminant analysis (FDA) and its application to galaxy spectra classification. FDA can effectively reduce dimensionality and extract features based on classification capability by fusing the information in the training data. Experiments show its superior performance in dimensionality reduction for galaxy spectra classification.
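A minimal sketch of this supervised feature-extraction step, using scikit-learn's LinearDiscriminantAnalysis as the Fisher discriminant on synthetic spectra; the data shapes and labels are invented for illustration.

```python
# Project high-dimensional spectra onto at most (n_classes - 1) discriminant
# axes chosen for class separability, unlike unsupervised PCA directions.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2000))   # 300 spectra, 2000 flux bins (synthetic)
y = rng.integers(0, 3, size=300)       # 3 galaxy classes (synthetic labels)

fda = LinearDiscriminantAnalysis(n_components=2)
X_low = fda.fit_transform(X, y)        # (300, 2) supervised features
```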
