Results 1 - 20 of 28
1.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 5697-5711, 2023 May.
Article in English | MEDLINE | ID: mdl-36279351

ABSTRACT

In this paper, we present a simple yet effective approach for instance segmentation on 3D point clouds with strong robustness. Previous top-performing methods for this task adopt a bottom-up strategy, which often involves inefficient operations or complex pipelines, such as grouping over-segmented components, introducing heuristic post-processing steps, and designing complex loss functions. As a result, the inevitable variation in instance sizes makes these methods vulnerable and sensitive to the values of pre-defined hyper-parameters. To this end, we instead propose a novel pipeline that applies dynamic convolution to generate instance-aware parameters in response to the characteristics of the instances. The representation capability of the parameters is greatly improved by gathering homogeneous points that have identical semantic categories and close votes for the geometric centroids. Instances are then decoded via several simple convolution layers, whose parameters are generated depending on the input. In addition, to introduce a large context while keeping computational overhead limited, a lightweight transformer is built upon the bottleneck layer to capture long-range dependencies. With non-maximum suppression (NMS) as the only post-processing step, we demonstrate a simpler and more robust approach that achieves promising performance on various datasets: ScanNetV2, S3DIS, and PartNet. The consistent improvements on both voxel- and point-based architectures demonstrate the effectiveness of the proposed method. Code is available at: https://git.io/DyCo3D.
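The core idea of generating convolution parameters from an instance descriptor can be sketched in a few lines. This is a minimal numpy illustration, not the paper's implementation: the "controller" mapping the instance feature to conv weights is a random placeholder (in the actual method it is learned), and all shapes are hypothetical.

```python
import numpy as np

def dynamic_conv1x1(point_feats, instance_feat, n_out):
    """Apply a 1x1 convolution whose weights are generated from an
    instance-level feature vector (illustrative shapes only)."""
    c_in = point_feats.shape[1]
    # A tiny "controller": a linear map from the instance feature to the
    # flattened conv parameters (weights + biases). Learned in the paper;
    # a seeded random placeholder here.
    rng = np.random.default_rng(0)
    controller = rng.standard_normal((instance_feat.size, c_in * n_out + n_out))
    params = instance_feat @ controller
    w = params[: c_in * n_out].reshape(c_in, n_out)
    b = params[c_in * n_out :]
    # A 1x1 conv over a point set is just a per-point linear layer.
    return point_feats @ w + b

feats = np.ones((5, 4))   # 5 points, 4 channels
inst = np.ones(3)         # instance-aware descriptor
masks = dynamic_conv1x1(feats, inst, n_out=2)
print(masks.shape)        # (5, 2)
```

The key property is that the filter weights `w` and `b` differ per instance, since they are functions of `instance_feat` rather than fixed parameters.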

2.
IEEE Trans Image Process ; 30: 2947-2962, 2021.
Article in English | MEDLINE | ID: mdl-33471753

ABSTRACT

Most learning-based super-resolution (SR) methods aim to recover a high-resolution (HR) image from a given low-resolution (LR) image by learning on LR-HR image pairs. SR methods learned on synthetic data do not perform well on real-world images, due to the domain gap between artificially synthesized and real LR images. Some efforts have thus been made to capture real-world image pairs. However, the captured LR-HR image pairs usually suffer from unavoidable misalignment, which hampers the performance of end-to-end learning. Here, focusing on real-world SR, we ask a different question: since misalignment is unavoidable, can we propose a method that does not need LR-HR image pairing and alignment at all and utilizes real images as they are? Hence we propose a framework to learn SR from an arbitrary set of unpaired LR and HR images and see how far we can go in such a realistic and "unsupervised" setting. To do so, we first train a degradation generation network to generate realistic LR images and, more importantly, to capture their distribution (i.e., learning to zoom out). Instead of assuming the domain gap has been eliminated, we minimize the discrepancy between the generated data and real data while learning a degradation-adaptive SR network (i.e., learning to zoom in). The proposed unpaired method achieves state-of-the-art SR results on real-world images, even on datasets that favour paired-learning methods.
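The "learning to zoom out" step replaces a hand-crafted degradation such as the one sketched below. This toy stand-in (box-filter downsampling plus Gaussian noise, with made-up parameter values) shows what a degradation maps HR to LR; the paper's point is that a network learns this mapping, and its output distribution, from real data instead.

```python
import numpy as np

def zoom_out(hr, scale=2, noise_std=0.05, seed=0):
    """Hand-crafted degradation: box-filter downsampling by `scale`
    plus sensor-like Gaussian noise (a stand-in for the learned
    degradation generation network)."""
    h, w = hr.shape
    lr = hr[: h - h % scale, : w - w % scale]
    lr = lr.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    rng = np.random.default_rng(seed)
    return lr + rng.normal(0.0, noise_std, lr.shape)

hr = np.linspace(0, 1, 64).reshape(8, 8)
lr = zoom_out(hr)
print(lr.shape)  # (4, 4)
```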

3.
Neural Netw ; 126: 250-261, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32272429

ABSTRACT

Depth is one of the key factors behind the success of convolutional neural networks (CNNs). Since ResNet (He et al., 2016), we have been able to train very deep CNNs, as the gradient vanishing issue has been largely addressed by the introduction of skip connections. However, we observe that, when the depth is very large, the intermediate layers (especially shallow layers) may fail to receive sufficient supervision from the loss, due to the severe transformation along the long backpropagation path. As a result, the representation power of intermediate layers can be very weak, and the model becomes redundant with limited performance. In this paper, we first investigate the supervision vanishing issue in existing backpropagation (BP) methods. We then propose an effective method to address it, called Multi-way BP (MW-BP), which relies on multiple auxiliary losses added to the intermediate layers of the network. The proposed MW-BP method can be applied to most deep architectures, such as ResNet and MobileNet, with slight modifications. Our method often gives rise to much more compact models (denoted "Mw+Architecture") than existing methods. For example, MwResNet-44 with 44 layers performs better than ResNet-110 with 110 layers on CIFAR-10 and CIFAR-100. More critically, the resultant models even outperform the light models obtained by state-of-the-art model compression methods. Last, our method inherently produces multiple compact models with different depths at the same time, which is helpful for model selection. Extensive experiments on both image classification and face recognition demonstrate the superiority of the proposed method.


Subject(s)
Data Compression/methods , Databases, Factual , Neural Networks, Computer , Pattern Recognition, Automated/methods , Humans
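The training objective behind auxiliary supervision of intermediate layers reduces to a weighted sum of losses. A minimal sketch, with hypothetical loss values and weights (the paper's actual weighting scheme may differ):

```python
def multi_way_loss(main_loss, aux_losses, weights):
    """Combine the final loss with auxiliary losses attached to
    intermediate layers, so shallow layers receive direct supervision."""
    return main_loss + sum(w * l for w, l in zip(weights, aux_losses))

# Two auxiliary heads at intermediate depths, down-weighted vs. the main head.
total = multi_way_loss(1.0, aux_losses=[0.5, 0.25], weights=[0.4, 0.2])
print(total)  # 1.25
```

Backpropagating `total` sends gradient to shallow layers both through the long path from the main loss and through the short paths from the auxiliary losses.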
4.
IEEE Trans Neural Netw Learn Syst ; 31(12): 5468-5482, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32078566

ABSTRACT

As an integral component of blind image deblurring, non-blind deconvolution removes image blur with a given blur kernel, which is essential but difficult due to the ill-posed nature of the inverse problem. The predominant approach is based on optimization subject to regularization functions that are either manually designed or learned from examples. Existing learning-based methods have shown superior restoration quality but are not practical enough due to their restricted and static model design: they focus solely on learning a prior and require the noise level to be known for deconvolution. We address the gap between the optimization- and learning-based approaches by learning a universal gradient descent optimizer. We propose a recurrent gradient descent network (RGDN) by systematically incorporating deep neural networks into a fully parameterized gradient descent scheme. A hyperparameter-free update unit, shared across steps, generates the updates from the current estimates based on a convolutional neural network. By training on diverse examples, the RGDN learns an implicit image prior and a universal update rule through recursive supervision. The learned optimizer can be applied repeatedly to improve the quality of diverse degenerated observations. The proposed method possesses strong interpretability and high generalization. Extensive experiments on synthetic benchmarks and challenging real-world images demonstrate that the proposed deep optimization method is effective and robust, produces favorable results, and is practical for real-world image deblurring applications.
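The recurrent structure can be sketched as ordinary gradient descent in which the update is produced by a learned function of the gradient. In this toy numpy sketch, the "update unit" is a plain scaled-gradient function on a tiny linear problem, standing in for the paper's shared CNN unit:

```python
import numpy as np

def rgdn_step(x, y, K, update_fn):
    """One recurrent step for deconvolution y = K x + n.
    `update_fn` stands in for the learned update unit shared across steps."""
    grad = K.T @ (K @ x - y)   # gradient of the data-fitting term
    return x - update_fn(grad)

# Toy 2x2 "blur" operator, solved by iterating the same step.
K = np.array([[2.0, 0.0], [0.0, 1.0]])
y = np.array([4.0, 3.0])
x = np.zeros(2)
for _ in range(100):
    x = rgdn_step(x, y, K, lambda g: 0.2 * g)
print(np.round(x, 3))  # close to [2. 3.]
```

Because the same update unit is reused at every step, the learned optimizer can in principle be unrolled for as many iterations as a given observation requires.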

5.
IEEE Trans Neural Netw Learn Syst ; 31(10): 4170-4184, 2020 Oct.
Article in English | MEDLINE | ID: mdl-31899434

ABSTRACT

Low-rank representation-based approaches that assume low-rank tensors and exploit their low-rank structure with appropriate prior models have underpinned much of the recent progress in tensor completion. However, real tensor data only approximately comply with the low-rank requirement in most cases, viz., the tensor consists of low-rank (e.g., the principal part) as well as non-low-rank (e.g., details) structures, which limits the completion accuracy of these approaches. To address this problem, we propose an adaptive low-rank representation model for tensor completion that represents the low-rank and non-low-rank structures of a latent tensor separately in a Bayesian framework. Specifically, we reformulate the CANDECOMP/PARAFAC (CP) tensor rank and develop a sparsity-induced prior for the low-rank structure that can be used to determine the tensor rank automatically. The non-low-rank structure is then modeled using a mixture-of-Gaussians prior that is shown to be sufficiently flexible and powerful to inform the completion process for a variety of real tensor data. With these two priors, we develop a Bayesian minimum mean-squared error estimation framework for inference. The developed framework can capture the important distinctions between low-rank and non-low-rank structures, thereby enabling a more accurate model and, ultimately, more accurate completion. For various applications, the proposed model yields more accurate completion results than the state-of-the-art methods.

6.
Med Image Anal ; 59: 101570, 2020 01.
Article in English | MEDLINE | ID: mdl-31630011

ABSTRACT

Glaucoma is one of the leading causes of irreversible but preventable blindness in working-age populations. Color fundus photography (CFP) is the most cost-effective imaging modality to screen for retinal disorders. However, its application to glaucoma has been limited to the computation of a few related biomarkers, such as the vertical cup-to-disc ratio. Deep learning approaches, although widely applied for medical image analysis, have not been extensively used for glaucoma assessment due to the limited size of the available data sets. Furthermore, the lack of a standardized benchmarking strategy makes it difficult to compare existing methods in a uniform way. In order to overcome these issues, we set up the Retinal Fundus Glaucoma Challenge, REFUGE (https://refuge.grand-challenge.org), held in conjunction with MICCAI 2018. The challenge consisted of two primary tasks, namely optic disc/cup segmentation and glaucoma classification. As part of REFUGE, we have publicly released a data set of 1200 fundus images with ground truth segmentations and clinical glaucoma labels, currently the largest existing one. We have also built an evaluation framework to ease and ensure fairness in the comparison of different models, encouraging the development of novel techniques in the field. Twelve teams qualified and participated in the online challenge. This paper summarizes their methods and analyzes their corresponding results. In particular, we observed that two of the top-ranked teams outperformed two human experts in the glaucoma classification task. Furthermore, the segmentation results were in general consistent with the ground truth annotations, with complementary outcomes that can be further exploited by ensembling the results.


Subject(s)
Deep Learning , Diagnostic Techniques, Ophthalmological , Fundus Oculi , Glaucoma/diagnostic imaging , Photography , Datasets as Topic , Humans
7.
Plant J ; 98(3): 555-570, 2019 05.
Article in English | MEDLINE | ID: mdl-30604470

ABSTRACT

To optimize shoot growth and structure of cereals, we need to understand the genetic components controlling initiation and elongation. While measuring total shoot growth at high throughput using 2D imaging has progressed, recovering the 3D shoot structure of small grain cereals at a large scale is still challenging. Here, we present a method for measuring defined individual leaves of cereals, such as wheat and barley, using few images. Plant shoot modelling over time was used to measure the initiation and elongation of leaves in a bi-parental barley mapping population under low and high soil salinity. We detected quantitative trait loci (QTL) related to shoot growth per se, using both simple 2D total shoot measurements and our approach of measuring individual leaves. In addition, we detected QTL specific to leaf elongation and not to total shoot size. Of particular importance was the detection of a QTL on chromosome 3H specific to the early responses of leaf elongation to salt stress, a locus that could not be detected without the computer vision tools developed in this study.


Subject(s)
Hordeum/anatomy & histology , Hordeum/genetics , Plant Leaves/anatomy & histology , Plant Leaves/genetics , Triticum/genetics , Hordeum/growth & development , Plant Leaves/growth & development , Quantitative Trait Loci/genetics
8.
IEEE Trans Image Process ; 28(4): 1851-1865, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30307866

ABSTRACT

Total variation (TV) regularization has proven effective for a range of computer vision tasks through its preferential weighting of sharp image edges. Existing TV-based methods, however, often suffer from over-smoothing and solution bias caused by the homogeneous penalization. In this paper, we address these issues by applying inhomogeneous regularization on different image components. We formulate the inhomogeneous TV minimization problem as a convex quadratically constrained linear programming problem. Relying on this new model, we propose a matching pursuit-based total variation minimization method (MPTV), specifically for image deconvolution. The proposed MPTV method is essentially a cutting-plane method that iteratively activates a subset of nonzero image gradients and then solves a subproblem focusing on those activated gradients only. Compared with existing methods, MPTV is less sensitive to the choice of the trade-off parameter between data fitting and regularization. Moreover, the inhomogeneity of MPTV alleviates over-smoothing and ringing artifacts and improves robustness to errors in the blur kernel. Extensive experiments on different tasks demonstrate the superiority of the proposed method over the current state of the art.
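The activation step can be illustrated on its own: pick the k largest-magnitude image gradients and restrict the subproblem to those positions. A toy numpy sketch (forward differences, hypothetical `k`; the actual MPTV activation rule and subproblem solver are more involved):

```python
import numpy as np

def activate_gradients(img, k):
    """Select the k largest-magnitude image gradients, mimicking one
    matching-pursuit activation step over the TV term."""
    gx = np.diff(img, axis=1, append=img[:, -1:])  # horizontal differences
    gy = np.diff(img, axis=0, append=img[-1:, :])  # vertical differences
    mag = np.hypot(gx, gy).ravel()
    idx = np.argsort(mag)[-k:]                     # activated positions
    mask = np.zeros(mag.size, dtype=bool)
    mask[idx] = True
    return mask.reshape(img.shape)

img = np.zeros((4, 4))
img[:, 2:] = 1.0                # a single vertical edge
mask = activate_gradients(img, k=4)
print(mask.sum())               # 4
```

On this toy image the four activated gradients all sit on the edge column, which is the behavior the inhomogeneous penalty exploits: regularization effort concentrates where gradients actually live.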

9.
IEEE Trans Image Process ; 27(7): 3403-3417, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29671743

ABSTRACT

We show that it is possible to achieve high-quality domain adaptation without explicit adaptation. The nature of the classification problem means that when samples from the same class in different domains are sufficiently close, and samples from differing classes are separated by large enough margins, there is a high probability that each will be classified correctly. Inspired by this, we propose an embarrassingly simple yet effective approach to domain adaptation: only the class mean is used to learn class-specific linear projections. Learning these projections is naturally cast into a linear-discriminant-analysis-like framework, which gives an efficient, closed-form solution. Furthermore, to enable the application of this approach to unsupervised learning, an iterative validation strategy is developed to infer target labels. Extensive experiments on cross-domain visual recognition demonstrate that, even with the simplest formulation, our approach outperforms existing non-deep adaptation methods and exhibits classification performance comparable with that of modern deep adaptation methods. An analysis of potential issues affecting the practical application of the method is also presented, including robustness, convergence, and the impact of small sample sizes.
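To illustrate why class means alone can carry so much of the signal, here is a bare nearest-class-mean classifier in numpy. This is a toy sketch of the statistic the method builds on, not the paper's projection-learning step:

```python
import numpy as np

def class_means(X, y):
    """Per-class mean vectors -- the only statistic the method relies on."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(X, means):
    """Assign each row of X to the class with the nearest mean."""
    labels = sorted(means)
    M = np.stack([means[c] for c in labels])
    d = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
    return np.array([labels[i] for i in d.argmin(axis=1)])

Xs = np.array([[0.0, 0.0], [0.2, 0.0], [3.0, 3.0], [3.2, 3.0]])
ys = np.array([0, 0, 1, 1])
pred = predict(np.array([[0.1, 0.1], [3.1, 2.9]]), class_means(Xs, ys))
print(pred)  # [0 1]
```

The paper goes further by learning class-specific linear projections from these means, but the closed-form, training-free flavor is already visible here.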

10.
IEEE Trans Pattern Anal Mach Intell ; 40(6): 1367-1381, 2018 06.
Article in English | MEDLINE | ID: mdl-28574341

ABSTRACT

Much of the recent progress in Vision-to-Language problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This approach does not explicitly represent high-level semantic concepts, but rather seeks to progress directly from image features to text. In this paper we first propose a method of incorporating high-level concepts into the successful CNN-RNN approach, and show that it achieves a significant improvement on the state of the art in both image captioning and visual question answering. We further show that the same mechanism can be used to incorporate external knowledge, which is critically important for answering high-level visual questions. Specifically, we design a visual question answering model that combines an internal representation of the content of an image with information extracted from a general knowledge base to answer a broad range of image-based questions. In particular, it allows questions to be asked for which the image alone does not contain the information required to select the appropriate answer. Our final model achieves the best reported results for both image captioning and visual question answering on several of the major benchmark datasets.

11.
IEEE Trans Pattern Anal Mach Intell ; 40(6): 1352-1366, 2018 06.
Article in English | MEDLINE | ID: mdl-28574343

ABSTRACT

We propose an approach for exploiting contextual information in semantic image segmentation, and particularly investigate the use of patch-patch context and patch-background context in deep CNNs. We formulate deep structured models by combining CNNs and Conditional Random Fields (CRFs) for learning the patch-patch context between image regions. Specifically, we formulate CNN-based pairwise potential functions to capture semantic correlations between neighboring patches. Efficient piecewise training of the proposed deep structured model is then applied in order to avoid repeated expensive CRF inference during the course of backpropagation. For capturing the patch-background context, we show that a network design with traditional multi-scale image inputs and sliding pyramid pooling is very effective for improving performance. We perform a comprehensive evaluation of the proposed method and achieve new state-of-the-art performance on a number of challenging semantic segmentation datasets.

12.
Sensors (Basel) ; 16(4)2016 Apr 15.
Article in English | MEDLINE | ID: mdl-27092506

ABSTRACT

Aging populations are increasing worldwide and strategies to minimize the impact of falls on older people need to be examined. Falls in hospitals are common and current hospital technological implementations use localized sensors on beds and chairs to alert caregivers of unsupervised patient ambulations; however, such systems have high false alarm rates. We investigate the recognition of bed and chair exits in real-time using a wireless wearable sensor worn by healthy older volunteers. Fourteen healthy older participants joined in supervised trials. They wore a batteryless, lightweight and wireless sensor over their attire and performed a set of broadly scripted activities. We developed a movement monitoring approach for the recognition of bed and chair exits based on a machine learning activity predictor. We investigated the effectiveness of our approach in generating bed and chair exit alerts in two possible clinical deployments (Room 1 and Room 2). The system obtained recall results above 93% (Room 2) and 94% (Room 1) for bed and chair exits, respectively. Precision was >78% and 67%, respectively, while F-score was >84% and 77% for bed and chair exits, respectively. This system has potential for real-time monitoring but further research in the final target population of older people is necessary.


Subject(s)
Accidental Falls/prevention & control , Biosensing Techniques/methods , Monitoring, Physiologic/methods , Wireless Technology/instrumentation , Aged , Aged, 80 and over , Biosensing Techniques/instrumentation , Electric Power Supplies , Female , Hospitals , Humans , Male , Monitoring, Physiologic/instrumentation , Movement/physiology
13.
IEEE Trans Pattern Anal Mach Intell ; 37(1): 2-12, 2015 Jan.
Article in English | MEDLINE | ID: mdl-26353204

ABSTRACT

We propose a novel hybrid loss for multiclass and structured prediction problems that is a convex combination of a log loss for Conditional Random Fields (CRFs) and a multiclass hinge loss for Support Vector Machines (SVMs). We provide a sufficient condition for when the hybrid loss is Fisher consistent for classification. This condition depends on a measure of dominance between labels, specifically the gap between the probabilities of the best label and the second-best label. We also prove that Fisher consistency is necessary for parametric consistency when learning models such as CRFs. We demonstrate empirically that the hybrid loss typically performs at least as well as, and often better than, both of its constituent losses on a variety of tasks, such as human action recognition. In doing so we also provide an empirical comparison of the efficacy of probabilistic and margin-based approaches to multiclass and structured prediction.


Subject(s)
Artificial Intelligence , Models, Statistical , Pattern Recognition, Automated/methods , Support Vector Machine , Human Activities/classification , Humans , Image Processing, Computer-Assisted/methods , Video Recording
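For a single example, the hybrid loss above is a straightforward convex combination of the two constituent losses. A minimal numpy sketch (one score per class; `alpha` is the combination weight, an assumed name):

```python
import numpy as np

def hybrid_loss(scores, label, alpha):
    """Convex combination of a CRF-style log loss and a multiclass
    hinge loss for one example with per-class scores."""
    # Log loss: log-sum-exp of the scores minus the true-label score.
    log_loss = np.log(np.exp(scores).sum()) - scores[label]
    # Multiclass hinge loss with unit margin.
    margin = scores + 1.0
    margin[label] -= 1.0                 # no margin against the true label
    hinge = max(0.0, margin.max() - scores[label])
    return alpha * log_loss + (1.0 - alpha) * hinge

s = np.array([2.0, 0.0, -1.0])
print(round(hybrid_loss(s, label=0, alpha=0.5), 4))
```

With `alpha=1` this reduces to the CRF log loss and with `alpha=0` to the SVM hinge loss, so the two baselines sit at the endpoints of the combination.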
14.
IEEE Trans Image Process ; 24(6): 1839-51, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25826800

ABSTRACT

Learning-based hashing methods have attracted considerable attention due to their ability to greatly increase the scale at which existing algorithms may operate. Most of these methods are designed to generate binary codes preserving the Euclidean similarity in the original space. Manifold learning techniques, in contrast, are better able to model the intrinsic structure embedded in the original high-dimensional data. The complexities of these models, and the problems with out-of-sample data, have previously rendered them unsuitable for application to large-scale embedding, however. In this paper, how to learn compact binary embeddings on their intrinsic manifolds is considered. In order to address the above-mentioned difficulties, an efficient, inductive solution to the out-of-sample data problem, and a process by which nonparametric manifold learning may be used as the basis of a hashing method are proposed. The proposed approach thus allows the development of a range of new hashing techniques exploiting the flexibility of the wide variety of manifold learning approaches available. It is particularly shown that hashing on the basis of t-distributed stochastic neighbor embedding outperforms state-of-the-art hashing methods on large-scale benchmark data sets, and is very effective for image classification with very short code lengths. It is shown that the proposed framework can be further improved, for example, by minimizing the quantization error with learned orthogonal rotations without much computation overhead. In addition, a supervised inductive manifold hashing framework is developed by incorporating the label information, which is shown to greatly advance the semantic retrieval performance.
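The end product of any such hashing method is a set of compact binary codes compared by Hamming distance. A toy numpy sketch, using a random projection in place of the learned manifold embedding (so this illustrates the code-generation and retrieval mechanics, not the paper's t-SNE-based embedding):

```python
import numpy as np

def binary_codes(X, W):
    """Map real-valued embeddings to binary codes via sign bits.
    W stands in for a learned embedding; here it is random."""
    return (X @ W > 0).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two codes."""
    return int((a != b).sum())

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 16))        # 4-d input -> 16-bit codes
x = rng.standard_normal(4)
near, far = x + 0.01, -x                # a close neighbor and its opposite
cx, cn, cf = (binary_codes(v, W) for v in (x, near, far))
print(hamming(cx, cn), hamming(cx, cf))
```

Nearby inputs land on nearby codes while the opposite point flips every bit, which is the locality property a good embedding must preserve at scale.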

15.
IEEE Trans Image Process ; 24(8): 2382-92, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25675458

ABSTRACT

In this paper, we propose an efficient semidefinite programming (SDP) approach to worst-case linear discriminant analysis (WLDA). Compared with traditional LDA, WLDA considers the dimensionality reduction problem from the worst-case viewpoint, which is in general more robust for classification. However, the original problem of WLDA is non-convex and difficult to optimize. In this paper, we reformulate the optimization problem of WLDA into a sequence of semidefinite feasibility problems. To solve these efficiently, we design a new scalable optimization method with a quasi-Newton method and eigen-decomposition as its core components. The proposed method is orders of magnitude faster than standard interior-point SDP solvers. Experiments on a variety of classification problems demonstrate that our approach achieves better performance than standard LDA. Our method is also much faster and more scalable than WLDA based on standard interior-point SDP solvers. The computational complexity for an SDP with m constraints and matrices of size d by d is roughly reduced from O(m^3 + md^3 + m^2d^2) to O(d^3) (m > d in our case).

16.
IEEE Trans Image Process ; 23(9): 4041-4054, 2014 09.
Article in English | MEDLINE | ID: mdl-25051551

ABSTRACT

The use of high-dimensional features has become a normal practice in many computer vision applications. The large dimension of these features is a limiting factor upon the number of data points which may be effectively stored and processed, however. We address this problem by developing a novel approach to learning a compact binary encoding, which exploits both pair-wise proximity and class-label information on the training data set. Exploiting this extra information allows the development of encodings which, although compact, outperform the original high-dimensional features in terms of final classification or retrieval performance. The method is general, in that it is applicable to both non-parametric and parametric learning methods. This generality means that the embedded features are suitable for a wide variety of computer vision tasks, such as image classification and content-based image retrieval. Experimental results demonstrate that the new compact descriptor achieves an accuracy comparable to, and in some cases better than, the visual descriptor in the original space despite being significantly more compact. Moreover, any convex loss function and convex regularization penalty (e.g., the ℓp norm with p ≥ 1) can be incorporated into the framework, which provides future flexibility.

17.
IEEE Trans Neural Netw Learn Syst ; 25(4): 764-79, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24807953

ABSTRACT

We propose a novel boosting approach to multiclass classification problems, in which multiple classes are, in essence, distinguished by a set of random projection matrices. The approach uses random projections to alleviate the proliferation of binary classifiers typically required to perform multiclass classification. The result is a multiclass classifier with a single vector-valued parameter, irrespective of the number of classes involved. Two variants of this approach are proposed. The first method randomly projects the original data into new spaces, while the second randomly projects the outputs of learned weak classifiers. These methods are not only conceptually simple but also effective and easy to implement. A series of experiments on synthetic, machine learning, and visual recognition data sets demonstrate that our proposed methods compare favorably with existing multiclass boosting algorithms in terms of both convergence rate and classification accuracy.
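The "single vector-valued parameter" idea can be sketched as follows: each class gets a fixed random projection, and one shared weight vector scores the projected input. This is an illustrative numpy sketch of the first variant's structure only (the boosting procedure that learns `w` is omitted, and all names and sizes are assumptions):

```python
import numpy as np

def rp_multiclass_scores(x, w, projections):
    """Score each class with ONE shared parameter vector w by routing
    the input through a class-specific random projection."""
    return np.array([w @ (P @ x) for P in projections])

rng = np.random.default_rng(1)
n_classes, d, k = 3, 5, 4
projections = [rng.standard_normal((k, d)) for _ in range(n_classes)]
w = rng.standard_normal(k)                 # the single learned parameter
scores = rp_multiclass_scores(rng.standard_normal(d), w, projections)
print(scores.shape)  # (3,)
```

Note that adding a class only adds a fixed random matrix; the learned parameter `w` stays the same size regardless of the number of classes.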

18.
IEEE Trans Neural Netw Learn Syst ; 25(5): 1002-13, 2014 May.
Article in English | MEDLINE | ID: mdl-24808045

ABSTRACT

We present a scalable and effective classification model for training multiclass boosting on multiclass classification problems. A direct formulation of multiclass boosting, one that directly maximizes the multiclass margin, has been introduced in the past. The major problem of that approach is its high computational complexity during training, which hampers its application to real-world problems. In this paper, we propose a scalable and simple stagewise multiclass boosting method which also directly maximizes the multiclass margin. Our approach offers the following advantages: 1) it is simple and computationally efficient to train, speeding up training time by more than two orders of magnitude without sacrificing classification accuracy; and 2) like traditional AdaBoost, it is less sensitive to the choice of parameters and empirically demonstrates excellent generalization performance. Experimental results on challenging multiclass machine learning and vision tasks demonstrate that the proposed approach substantially improves the convergence rate and accuracy of the final visual detector at no additional computational cost compared to existing multiclass boosting methods.

19.
IEEE Trans Image Process ; 23(4): 1666-77, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24808338

ABSTRACT

Text in an image provides vital information for interpreting its contents, and text in a scene can aid a variety of tasks from navigation to obstacle avoidance and odometry. Despite its value, however, detecting general text in images remains a challenging research problem. Motivated by the need to consider the widely varying forms of natural text, we propose a bottom-up approach to the problem, which reflects the "characterness" of an image region. In this sense, our approach mirrors the move from saliency detection methods to measures of objectness. In order to measure characterness, we develop three novel cues that are tailored for character detection and a Bayesian method for their integration. Because text is made up of sets of characters, we then design a Markov random field model to exploit the inherent dependencies between characters. We experimentally demonstrate the effectiveness of our characterness cues as well as the advantage of Bayesian multi-cue integration. The proposed text detector outperforms state-of-the-art methods on several benchmark scene text detection data sets. We also show that our measurement of characterness is superior to state-of-the-art saliency detection models when applied to the same task.

20.
IEEE Trans Neural Netw Learn Syst ; 25(2): 394-406, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24807037

ABSTRACT

Distance metric learning is of fundamental interest in machine learning because the employed distance metric can significantly affect the performance of many learning methods. Quadratic Mahalanobis metric learning is a popular approach to the problem, but typically requires solving a semidefinite programming (SDP) problem, which is computationally expensive. The worst-case complexity of solving an SDP problem involving a matrix variable of size D×D with O(D) linear constraints is about O(D^6.5) using interior-point methods, where D is the dimension of the input data. Thus, interior-point methods can only practically solve problems with fewer than a few thousand variables. Because the number of variables is D(D+1)/2, this implies a limit on the size of problem that can practically be solved of around a few hundred dimensions. The complexity of the popular quadratic Mahalanobis metric learning approach thus limits the size of problem to which metric learning can be applied. Here, we propose a significantly more efficient and scalable approach to the metric learning problem based on the Lagrange dual formulation of the problem. The proposed formulation is much simpler to implement, and therefore allows much larger Mahalanobis metric learning problems to be solved. The time complexity of the proposed method is roughly O(D^3), which is significantly lower than that of the SDP approach. Experiments on a variety of data sets demonstrate that the proposed method achieves an accuracy comparable with the state of the art, but is applicable to significantly larger problems. We also show that the proposed method can be applied to approximately solve more general Frobenius-norm-regularized SDP problems.
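For reference, the quadratic Mahalanobis distance being learned is d_M(x, y) = sqrt((x-y)^T M (x-y)) for a positive semidefinite M. A minimal numpy sketch (M here is hand-built for illustration, not learned):

```python
import numpy as np

def mahalanobis(x, y, M):
    """Quadratic Mahalanobis distance, valid whenever M is PSD."""
    diff = x - y
    return float(np.sqrt(diff @ M @ diff))

x, y = np.array([1.0, 2.0]), np.array([4.0, 6.0])
# With M = I the metric reduces to the Euclidean distance.
print(mahalanobis(x, y, np.eye(2)))        # 5.0
# A PSD M built as L L^T re-weights feature directions.
L = np.array([[2.0, 0.0], [0.0, 1.0]])
print(mahalanobis(x, y, L @ L.T))
```

The PSD constraint on M (guaranteed by any factorization M = L L^T) is exactly what forces the SDP formulation, and hence the D(D+1)/2 variable count discussed above.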
