Results 1 - 6 of 6
1.
Neural Netw ; 53: 69-80, 2014 May.
Article in English | MEDLINE | ID: mdl-24561452

ABSTRACT

Kernel learning methods, whether Bayesian or frequentist, typically involve multiple levels of inference, with the coefficients of the kernel expansion being determined at the first level and the kernel and regularisation parameters carefully tuned at the second level, a process known as model selection. Model selection for kernel machines is commonly performed via optimisation of a suitable model selection criterion, often based on cross-validation or theoretical performance bounds. However, if there are a large number of kernel parameters, as for instance in the case of automatic relevance determination (ARD), there is a substantial risk of over-fitting the model selection criterion, resulting in poor generalisation performance. In this paper we investigate the possibility of learning the kernel, for the Least-Squares Support Vector Machine (LS-SVM) classifier, at the first level of inference, i.e. parameter optimisation. The kernel parameters and the coefficients of the kernel expansion are jointly optimised at the first level of inference, minimising a training criterion with an additional regularisation term acting on the kernel parameters. The key advantage of this approach is that the values of only two regularisation parameters need be determined in model selection, substantially alleviating the problem of over-fitting the model selection criterion. The benefits of this approach are demonstrated using a suite of synthetic and real-world binary classification benchmark problems, where kernel learning at the first level of inference is shown to be statistically superior to the conventional approach, improves on our previous work (Cawley and Talbot, 2007) and is competitive with Multiple Kernel Learning approaches, but with reduced computational expense.


Subject(s)
Support Vector Machine , Bayes Theorem , Least-Squares Analysis , Models, Theoretical
2.
Neural Netw ; 20(7): 832-41, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17600674

ABSTRACT

Mika, Rätsch, Weston, Schölkopf and Müller [Mika, S., Rätsch, G., Weston, J., Schölkopf, B., & Müller, K.-R. (1999). Fisher discriminant analysis with kernels. In Neural networks for signal processing: Vol. IX (pp. 41-48). New York: IEEE Press] introduce a non-linear formulation of Fisher's linear discriminant, based on the now familiar "kernel trick", demonstrating state-of-the-art performance on a wide range of real-world benchmark datasets. In this paper, we extend an existing analytical expression for the leave-one-out cross-validation error [Cawley, G. C., & Talbot, N. L. C. (2003b). Efficient leave-one-out cross-validation of kernel Fisher discriminant classifiers. Pattern Recognition, 36(11), 2585-2592] such that the leave-one-out error can be re-estimated following a change in the value of the regularisation parameter with a computational complexity of only O(l^2) operations, which is substantially less than the O(l^3) operations required for the basic training algorithm. This allows the regularisation parameter to be tuned at an essentially negligible computational cost. This is achieved by performing the discriminant analysis in canonical form. The proposed method is therefore a useful component of a model selection strategy for this class of kernel machines that alternates between updates of the kernel and regularisation parameters. Results obtained on real-world and synthetic benchmark datasets indicate that the proposed method is competitive with model selection based on k-fold cross-validation in terms of generalisation, whilst being considerably faster.
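The idea of re-estimating the leave-one-out error cheaply after a change in the regularisation parameter can be illustrated on a simpler relative of the method. The sketch below is not the paper's kernel Fisher discriminant procedure: it assumes plain kernel ridge regression without a bias term, and the function names (`canonical_form`, `loo_press`), the RBF kernel and the toy data are all hypothetical. A single O(l^3) eigendecomposition is computed once, after which each new value of the regularisation parameter costs only O(l^2).

```python
import numpy as np

def canonical_form(K):
    """One-off O(l^3) step: eigendecomposition K = V diag(evals) V^T."""
    evals, V = np.linalg.eigh(K)
    return evals, V

def loo_press(evals, V, y, lam):
    """Sum of squared leave-one-out residuals for one value of lam.

    Only matrix-vector products are used, so the cost is O(l^2).
    """
    shrink = evals / (evals + lam)       # eigenvalues of the hat matrix
    fitted = V @ (shrink * (V.T @ y))    # H y without forming H
    leverage = (V ** 2) @ shrink         # diagonal of H
    residuals = (y - fitted) / (1.0 - leverage)
    return float(residuals @ residuals)

# Toy data and an RBF kernel (assumptions made for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-0.5 * sq_dists)

evals, V = canonical_form(K)             # paid once
lams = 10.0 ** np.arange(-4, 3)          # candidate regularisation values
scores = [loo_press(evals, V, y, lam) for lam in lams]
best_lam = lams[int(np.argmin(scores))]
print(best_lam)
```

Because the eigendecomposition does not depend on the regularisation parameter, a grid or line search over that parameter adds little to the cost of a single fit, which is the property the abstract exploits.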


Subject(s)
Algorithms , Artificial Intelligence , Linear Models , Neural Networks, Computer , Nonlinear Dynamics , Computer Simulation , Computing Methodologies , Discriminant Analysis
3.
Bioinformatics ; 22(19): 2348-55, 2006 Oct 01.
Article in English | MEDLINE | ID: mdl-16844704

ABSTRACT

MOTIVATION: Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance. This normally involves a model selection stage, based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffreys prior. The improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms in cancer classification.
RESULTS: The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated over the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and cross-entropy of the BLogReg and SLogReg algorithms are very similar; however, the BLogReg algorithm is found to be considerably faster than the original SLogReg algorithm. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm. BLogReg also demonstrates better estimates of conditional probability than the RVM, which are of great importance in medical applications, with similar computational expense.
AVAILABILITY: A MATLAB implementation of the sparse logistic regression algorithm with Bayesian regularization (BLogReg) is available from http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/
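The marginalisation mentioned under MOTIVATION can be sketched as a short calculation. The notation is an assumption made here and does not come from the abstract: a single regularization parameter \alpha shared by N model parameters w_i, with E_W denoting the sum of their absolute values. Under those assumptions, integrating \alpha out against the Jeffreys prior leaves a penalty with no free parameter.

```latex
% Laplace prior with shared scale \alpha, and Jeffreys prior on \alpha
p(\mathbf{w} \mid \alpha) = \left(\frac{\alpha}{2}\right)^{N} \exp(-\alpha E_W),
\qquad E_W = \sum_{i=1}^{N} |w_i|,
\qquad p(\alpha) \propto \frac{1}{\alpha}

% Integrate \alpha out (a Gamma integral)
p(\mathbf{w}) = \int_{0}^{\infty} p(\mathbf{w} \mid \alpha)\, p(\alpha)\, d\alpha
\;\propto\; \frac{1}{2^{N}} \int_{0}^{\infty} \alpha^{N-1} e^{-\alpha E_W}\, d\alpha
= \frac{\Gamma(N)}{(2 E_W)^{N}}

% Resulting regularisation term, up to an additive constant
-\log p(\mathbf{w}) = N \log E_W + \mathrm{const}
```

With the penalty in this form there is nothing left to tune by cross-validation, which is consistent with the speed-up reported under RESULTS.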


Subject(s)
Biomarkers, Tumor/analysis , Diagnosis, Computer-Assisted/methods , Gene Expression Profiling/methods , Neoplasm Proteins/analysis , Neoplasms/diagnosis , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Bayes Theorem , Humans , Logistic Models , Models, Biological , Neoplasms/classification , Regression Analysis , Reproducibility of Results , Sensitivity and Specificity
4.
IEEE Trans Neural Netw ; 17(2): 471-81, 2006 Mar.
Article in English | MEDLINE | ID: mdl-16566473

ABSTRACT

Survival analysis is a branch of statistics concerned with the time elapsing before "failure," with diverse applications in medical statistics and the analysis of the reliability of electrical or mechanical components. We introduce a parametric accelerated life survival analysis model based on kernel learning methods that, at least in principle, is able to learn arbitrary dependencies between a vector of explanatory variables and the scale of the distribution of survival times. The proposed kernel survival analysis method is then used to model the growth domain of Clostridium botulinum, the food processing and storage conditions permitting the growth of this foodborne microbial pathogen, leading to the production of the neurotoxin responsible for botulism. A Bayesian training procedure, based on the evidence framework, is used for model selection and to provide a credible interval on model predictions. The kernel survival analysis models are found to be more accurate than models based on more traditional survival analysis techniques but also suggest a risk assessment of the foodborne botulism hazard would benefit from the collection of additional data.


Subject(s)
Artificial Intelligence , Clostridium botulinum/cytology , Clostridium botulinum/growth & development , Food Microbiology , Models, Biological , Survival Analysis , Bayes Theorem , Cell Proliferation , Cell Survival/physiology , Computer Simulation , Data Interpretation, Statistical , Models, Statistical , Population Growth , Survival Rate
5.
Neural Netw ; 18(5-6): 674-83, 2005.
Article in English | MEDLINE | ID: mdl-16085387

ABSTRACT

We present here a simple technique that simplifies the construction of Bayesian treatments of a variety of sparse kernel learning algorithms. An incomplete Cholesky factorisation is employed to modify the dual parameter space, such that the Gaussian prior over the dual model parameters is whitened. The regularisation term then corresponds to the usual weight-decay regulariser, allowing the Bayesian analysis to proceed via the evidence framework of MacKay. There is in addition a useful by-product associated with the incomplete Cholesky factorisation algorithm: it also identifies a subset of the training data forming an approximate basis for the entire dataset in the kernel-induced feature space, resulting in a sparse model. Bayesian treatments of the kernel ridge regression (KRR) algorithm, with both constant and heteroscedastic (input dependent) variance structures, and kernel logistic regression (KLR) are provided as illustrative examples of the proposed method, which we hope will be more widely applicable.
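The by-product described above, a subset of training points that approximately spans the data in feature space, can be seen in a generic pivoted incomplete Cholesky factorisation. The sketch below is a textbook-style version and may differ from the variant used in the paper; the function name `incomplete_cholesky`, the tolerance, the RBF kernel and the toy data are assumptions for illustration.

```python
import numpy as np

def incomplete_cholesky(K, tol=1e-3):
    """Greedy pivoted incomplete Cholesky of a PSD kernel matrix.

    Returns G (n x r) with K approximately equal to G @ G.T, and the
    list of pivot indices, i.e. the training points chosen as a basis.
    """
    n = K.shape[0]
    d = np.diag(K).astype(float).copy()   # diagonal of the residual K - G G^T
    G = np.zeros((n, n))
    pivots = []
    for j in range(n):
        i = int(np.argmax(d))              # point worst explained so far
        if d[i] <= tol:
            break                          # every residual diagonal is <= tol
        G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(d[i])
        d -= G[:, j] ** 2
        d[i] = 0.0                         # guard against round-off re-selection
        pivots.append(i)
    return G[:, :len(pivots)], pivots

# Toy data and an RBF kernel (assumptions made for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-0.5 * sq_dists)

G, pivots = incomplete_cholesky(K, tol=1e-3)
print(len(pivots), np.abs(K - G @ G.T).max())
```

The columns of `G` play the role of explicit features for the training data, so a model that is linear in `G` with an ordinary weight-decay penalty corresponds to a kernel model restricted to the selected points.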


Subject(s)
Artificial Intelligence , Bayes Theorem , Algorithms , Data Interpretation, Statistical , Linear Models , Logistic Models , Models, Statistical , Normal Distribution
6.
Neural Netw ; 17(10): 1467-75, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15541948

ABSTRACT

Leave-one-out cross-validation has been shown to give an almost unbiased estimator of the generalisation properties of statistical models, and therefore provides a sensible criterion for model selection and comparison. In this paper we show that exact leave-one-out cross-validation of sparse Least-Squares Support Vector Machines (LS-SVMs) can be implemented with a computational complexity of only O(ln^2) floating point operations, rather than the O(l^2 n^2) operations of a naïve implementation, where l is the number of training patterns and n is the number of basis vectors. As a result, leave-one-out cross-validation becomes a practical proposition for model selection in large scale applications. For clarity the exposition concentrates on sparse least-squares support vector machines in the context of non-linear regression, but is equally applicable in a pattern recognition setting.
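The gap between exact closed-form and naïve leave-one-out cross-validation can be illustrated with a simplified stand-in. The sketch below is not the sparse LS-SVM algorithm of the paper: it assumes dense kernel ridge regression without a bias term, where the leave-one-out residual is known to equal the ordinary residual divided by one minus the corresponding diagonal element of the hat matrix. The function names, kernel width and toy data are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Z, width=1.0):
    """Gaussian RBF kernel matrix between the rows of X and Z."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * width ** 2))

def loo_residuals_closed_form(K, y, lam):
    """All leave-one-out residuals from a single fit on the full data."""
    n = K.shape[0]
    H = K @ np.linalg.inv(K + lam * np.eye(n))   # hat matrix
    return (y - H @ y) / (1.0 - np.diag(H))

def loo_residuals_naive(K, y, lam):
    """Reference version: refit the model once per held-out pattern."""
    n = K.shape[0]
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        K_sub = K[np.ix_(keep, keep)]
        alpha = np.linalg.solve(K_sub + lam * np.eye(n - 1), y[keep])
        out[i] = y[i] - K[i, keep] @ alpha
    return out

# Toy regression problem (an assumption made for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(25, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=25)
K = rbf_kernel(X, X)

fast = loo_residuals_closed_form(K, y, lam=0.1)
slow = loo_residuals_naive(K, y, lam=0.1)
print(np.max(np.abs(fast - slow)))   # difference is at round-off level
```

The closed-form version needs one matrix inversion in total, whereas the reference version solves a separate linear system for every held-out pattern; the paper obtains an analogous saving for the sparse model.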


Subject(s)
Algorithms , Least-Squares Analysis , Models, Statistical , Neural Networks, Computer , Mathematics , Nonlinear Dynamics , Pattern Recognition, Automated/methods