1.
IEEE Trans Neural Netw Learn Syst ; 29(5): 1525-1538, 2018 05.
Article in English | MEDLINE | ID: mdl-28320678

ABSTRACT

High-dimensional real-world data are often corrupted by noise and gross outliers. Principal component analysis (PCA) fails to learn the true low-dimensional subspace in such cases. For this reason, robust versions of PCA, which penalize arbitrarily large outlying entries, are preferred for dimension reduction. In this paper, we argue that it is necessary to study the presence of outliers not only in the observed data matrix but also in the orthogonal complement subspace of the authentic principal subspace. In fact, the latter can seriously skew the estimation of the principal components. A reinforced robustification of principal component pursuit is designed to identify both types of outliers and eliminate their influence on the final subspace estimation. Simulation results under different design situations clearly show the superiority of our proposed method over other popular implementations of robust PCA. This paper also showcases possible applications of our method in critically tough scenarios of face recognition and video background subtraction. Along with approximating a usable low-dimensional subspace from real-world data sets, the technique can capture semantically meaningful outliers.
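
The claim that ordinary PCA loses the true subspace under gross outliers can be illustrated with a small sketch. It is not the method of the paper: it only computes the direction of the first principal component for 2-D points in closed form and shows how one large outlier rotates it. The function name `leading_direction` and the toy data are assumptions made for illustration.

```python
import math

def leading_direction(points):
    """Angle (radians) of the first principal component of 2-D points.

    Uses the closed-form orientation of a 2x2 covariance matrix:
    theta = 0.5 * atan2(2 * s_xy, s_xx - s_yy).
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    s_xx = sum((p[0] - mean_x) ** 2 for p in points) / n
    s_yy = sum((p[1] - mean_y) ** 2 for p in points) / n
    s_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    return 0.5 * math.atan2(2.0 * s_xy, s_xx - s_yy)

# Toy data (hypothetical): eleven points lying exactly on the x-axis,
# so the true one-dimensional subspace is the x-axis (angle 0).
clean = [(float(x), 0.0) for x in range(-5, 6)]

# The same data with a single gross outlier far off the subspace.
corrupted = clean + [(0.0, 50.0)]

clean_angle = leading_direction(clean)
corrupted_angle = leading_direction(corrupted)

print("clean angle (deg):", math.degrees(clean_angle))
print("corrupted angle (deg):", math.degrees(corrupted_angle))
```

In this toy case the single outlier dominates the variance, so the estimated component turns from the x-axis to the y-axis. Robust PCA variants are designed to prevent this kind of failure.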

2.
IEEE Trans Pattern Anal Mach Intell ; 39(2): 272-286, 2017 02.
Article in English | MEDLINE | ID: mdl-27019473

ABSTRACT

Many computer vision and medical imaging problems involve learning from large-scale datasets with millions of observations and features. In this paper we propose a novel, efficient learning scheme that tightens a sparsity constraint by gradually removing variables based on a criterion and a schedule. Because the problem size keeps dropping throughout the iterations, the scheme is particularly suitable for big data learning. Our approach applies generically to the optimization of any differentiable loss function, and finds applications in regression, classification and ranking. The resultant algorithms build variable screening into estimation and are extremely simple to implement. We provide theoretical guarantees of convergence and selection consistency. In addition, one-dimensional piecewise linear response functions are used to account for nonlinearity, and a second-order prior is imposed on these functions to avoid overfitting. Experiments on real and synthetic data show that the proposed method compares very well with other state-of-the-art methods in regression, classification and ranking while being computationally very efficient and scalable.
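
The idea of removing variables during optimization according to a criterion and a schedule can be sketched as follows. This is a minimal illustration, not the authors' algorithm. It assumes a squared loss, takes coefficient magnitude as the removal criterion, and uses a simple linear schedule. The name `anneal_select`, the schedule, and the toy data are all hypothetical.

```python
from itertools import product

def anneal_select(X, y, k, n_iter=20, lr=0.5):
    """Gradient descent on squared loss while shrinking the active set to k features."""
    n, p = len(X), len(X[0])
    active = list(range(p))
    w = [0.0] * p
    for it in range(1, n_iter + 1):
        # Residuals computed from the currently active features only.
        resid = [y[i] - sum(w[j] * X[i][j] for j in active) for i in range(n)]
        # One gradient step on the active coefficients.
        for j in active:
            grad = -sum(X[i][j] * resid[i] for i in range(n)) / n
            w[j] -= lr * grad
        # Assumed schedule: drop one feature per iteration until k remain.
        keep = max(k, p - it)
        # Assumed criterion: keep the coefficients with the largest magnitude.
        active = sorted(sorted(active, key=lambda j: -abs(w[j]))[:keep])
        w = [w[j] if j in active else 0.0 for j in range(p)]
    return active, w

# Toy data (hypothetical): a full +/-1 factorial design with four features,
# where the response depends only on features 0 and 2.
X = [list(row) for row in product((-1.0, 1.0), repeat=4)]
y = [3.0 * row[0] - 2.0 * row[2] for row in X]

selected, weights = anneal_select(X, y, k=2)
print("selected features:", selected)
print("weights:", [round(v, 4) for v in weights])
```

Because removed features are never revisited, each iteration works on a smaller problem than the previous one, which is the property the abstract highlights for large datasets.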

3.
IEEE Trans Neural Netw Learn Syst ; 27(10): 1997-2008, 2016 10.
Article in English | MEDLINE | ID: mdl-26672049

ABSTRACT

Deep hierarchical representations of data have been found to provide more informative features for several machine learning applications. In addition, multilayer neural networks surprisingly tend to achieve better performance when they are subject to unsupervised pretraining. The boom in deep learning motivates researchers to identify the factors that contribute to its success. One possible reason identified is the flattening of manifold-shaped data in higher layers of neural networks. However, it is not clear how to measure the flattening of such manifold-shaped data and what amount of flattening a deep neural network can achieve. For the first time, this paper provides quantitative evidence to validate the flattening hypothesis. To achieve this, we propose a few quantities for measuring manifold entanglement under certain assumptions and conduct experiments with both synthetic and real-world data. Our experimental results validate the proposition and lead to new insights on deep learning.

4.
Neuroimage ; 55(4): 1519-27, 2011 Apr 15.
Article in English | MEDLINE | ID: mdl-21167288

ABSTRACT

The goals of this paper are to review the most popular methods of predictor selection in regression models, to explain why some fail when the number P of explanatory variables exceeds the number N of participants, and to discuss alternative statistical methods that can be employed in this case. We focus on penalized least squares methods in regression models, and discuss in detail two such methods that are well established in the statistical literature, the LASSO and Elastic Net. We introduce bootstrap enhancements of these methods, the BE-LASSO and BE-Enet, that allow the user to attach a measure of uncertainty to each variable selected. Our work is motivated by a multimodal neuroimaging dataset that consists of morphometric measures (volumes at several anatomical regions of interest), white matter integrity measures from diffusion-weighted data (fractional anisotropy, mean diffusivity, axial diffusivity and radial diffusivity) and clinical and demographic variables (age, education, alcohol and drug history). In this dataset, the number P of explanatory variables exceeds the number N of participants. We use the BE-LASSO and BE-Enet to provide the first statistical analysis that allows the assessment of neurocognitive performance from high-dimensional neuroimaging and clinical predictors, including their interactions. The major novelty of this analysis is that biomarker selection and dimension reduction are accomplished with a view towards obtaining good predictions for the outcome of interest (i.e., the neurocognitive indices), unlike principal component analysis, which is performed only on the predictors' space, independently of the outcome of interest.
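
One generic way to attach an uncertainty measure to each selected variable is to refit the LASSO on bootstrap resamples and record how often each coefficient is nonzero. The sketch below illustrates that idea only and is not necessarily the exact BE-LASSO procedure of the paper. The coordinate-descent solver, the function names, the penalty value `lam`, and the simulated data are all assumptions.

```python
import random

def soft_threshold(value, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=50):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1, no intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(w[k] * X[i][k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z
    return w

def selection_frequency(X, y, lam, n_boot=50, seed=0):
    """Fraction of bootstrap resamples in which each coefficient is nonzero."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        w = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]

# Simulated data (hypothetical): only feature 0 drives the response.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(40)]
y = [2.0 * row[0] + data_rng.gauss(0.0, 0.1) for row in X]

freq = selection_frequency(X, y, lam=0.1)
print("selection frequencies:", freq)
```

A frequency near 1 suggests a variable is selected stably across resamples, while a low frequency flags a selection that depends on the particular sample.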


Subject(s)
Brain/pathology , Cognition Disorders/etiology , Cognition Disorders/pathology , Diffusion Magnetic Resonance Imaging/methods , HIV Infections/complications , HIV Infections/pathology , Image Interpretation, Computer-Assisted/methods , Adult , Aged , Algorithms , Female , Humans , Image Enhancement/methods , Least-Squares Analysis , Male , Middle Aged , Regression Analysis , Reproducibility of Results , Sensitivity and Specificity
5.
BMC Bioinformatics ; 10: 237, 2009 Aug 04.
Article in English | MEDLINE | ID: mdl-19653895

ABSTRACT

BACKGROUND: For many gene structures it is impossible to resolve intensity data uniquely to establish abundances of splice variants. This was empirically noted by Wang et al., who called it a "degeneracy problem". The ambiguity results from an ill-posed problem where additional information is needed in order to obtain a unique answer in splice variant deconvolution. RESULTS: In this paper, we analyze the situations under which the problem occurs and perform a rigorous mathematical study which gives necessary and sufficient conditions on how many and what type of constraints are needed to resolve all ambiguity. This analysis is generally applicable to matrix models of splice variants. We explore the proposal that probe sequence information may provide sufficient additional constraints to resolve real-world instances. However, probe behavior cannot be predicted with sufficient accuracy by any existing probe sequence model, and so we present a Bayesian framework for estimating variant abundances by incorporating the prediction uncertainty from the micro-model of probe responsiveness into the macro-model of probe intensities. CONCLUSION: The matrix analysis of constraints provides a tool for detecting real-world instances in which additional constraints may be necessary to resolve splice variants. While purely mathematical constraints can be stated without error, real-world constraints may themselves be poorly resolved. Our Bayesian framework provides a generic solution to the problem of uniquely estimating transcript abundances given additional constraints that themselves may be uncertain, such as regression fit to probe sequence models. We demonstrate its efficacy through extensive simulations as well as on various biological data.
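
In a linear matrix model, where probe intensities equal a probe-by-variant matrix times the variant abundances, the abundances are uniquely determined only if that matrix has full column rank. The sketch below checks this standard linear-algebra condition on a hypothetical two-exon gene. The gene structure, the matrices, and the helper `matrix_rank` are illustrative assumptions and are not taken from the paper.

```python
from fractions import Fraction

def matrix_rank(matrix):
    """Rank by exact Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Eliminate this column from every other row.
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

# Hypothetical gene with exons A and B and three variants: A only, B only, A+B.
# Rows are probes, columns are variants; an entry of 1 means the probe
# responds to that variant.
exon_probes = [
    [1, 0, 1],  # probe on exon A
    [0, 1, 1],  # probe on exon B
]
# Adding a probe on the A-B junction, which only the A+B variant contains.
with_junction = exon_probes + [[0, 0, 1]]

n_variants = 3
print("exon probes only: rank", matrix_rank(exon_probes), "of", n_variants)
print("with junction probe: rank", matrix_rank(with_junction), "of", n_variants)
```

With exon probes alone the rank is below the number of variants, so many abundance vectors explain the same intensities. The extra junction probe supplies the missing constraint in this toy case.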


Subject(s)
Alternative Splicing/genetics , Computational Biology/methods , Algorithms , Gene Expression Profiling , Humans , Oligonucleotide Array Sequence Analysis