1.
IEEE Trans Neural Netw ; 10(5): 988-99, 1999.
Article in English | MEDLINE | ID: mdl-18252602

ABSTRACT

Statistical learning theory was introduced in the late 1960s. Until the 1990s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the mid-1990s, new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory, covering both theoretical and algorithmic aspects. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization that are more general than those discussed in classical statistical paradigms, and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems. A more detailed overview of the theory (without proofs) can be found in Vapnik (1995); Vapnik (1998) gives a detailed description of the theory, including proofs.

2.
IEEE Trans Neural Netw ; 10(5): 1055-64, 1999.
Article in English | MEDLINE | ID: mdl-18252608

ABSTRACT

Traditional classification approaches generalize poorly on image classification tasks because of the high dimensionality of the feature space. This paper shows that support vector machines (SVM's) can generalize well on difficult image classification problems where the only features are high-dimensional histograms. Heavy-tailed RBF kernels of the form K(x, y) = exp(−ρ Σ_i |x_i^a − y_i^a|^b) with a ≤ 1 and b ≤ 2 are evaluated on the classification of images extracted from the Corel stock photo collection and shown to far outperform traditional polynomial or Gaussian radial basis function (RBF) kernels. Moreover, we observed that a simple remapping of the input x_i → x_i^a improves the performance of linear SVM's to such an extent that it makes them, for this problem, a valid alternative to RBF kernels.
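For concreteness, a minimal sketch (not from the paper, which predates these libraries) of the heavy-tailed kernel and the x_i → x_i^a remapping, using NumPy with scikit-learn's precomputed-kernel interface; the parameter values and the random stand-in data are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC

def heavy_tailed_kernel(X, Y, a=0.5, b=1.0, rho=0.01):
    """Gram matrix of K(x, y) = exp(-rho * sum_i |x_i^a - y_i^a|^b).

    With a <= 1 and b <= 2 this is the heavy-tailed RBF of the abstract;
    a = 1, b = 2 recovers the ordinary Gaussian RBF kernel.
    """
    Xa, Ya = X ** a, Y ** a                       # the remapping x_i -> x_i^a
    diffs = np.abs(Xa[:, None, :] - Ya[None, :, :]) ** b
    return np.exp(-rho * diffs.sum(axis=-1))

# Illustrative usage on nonnegative histogram-like features.
rng = np.random.default_rng(0)
X_train = rng.random((100, 16))                   # stand-in for color histograms
y_train = rng.integers(0, 2, 100)

svm = SVC(kernel="precomputed")
svm.fit(heavy_tailed_kernel(X_train, X_train), y_train)
```

At prediction time the Gram matrix is computed between test and training points, i.e. `svm.predict(heavy_tailed_kernel(X_test, X_train))`.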

3.
IEEE Trans Neural Netw ; 10(5): 1048-54, 1999.
Article in English | MEDLINE | ID: mdl-18252607

ABSTRACT

We study the use of support vector machines (SVM's) in classifying e-mail as spam or nonspam by comparing them to three other classification algorithms: Ripper, Rocchio, and boosted decision trees. These four algorithms were tested on two data sets: one in which the number of features was constrained to the 1000 best features, and another in which the dimensionality was over 7000. SVM's performed best when using binary features. For both data sets, boosted trees and SVM's had acceptable test performance in terms of accuracy and speed; however, SVM's required significantly less training time.
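A minimal modern sketch of the setup the abstract describes, binary word-presence features fed to a linear SVM, using scikit-learn (an assumption; the original experiments used different tooling, and the corpus below is a toy stand-in):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus; a real experiment would use labeled e-mail messages.
emails = ["win money now", "meeting agenda attached",
          "cheap money offer", "lunch tomorrow?"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = nonspam

# binary=True yields 0/1 word-presence features, the representation on which
# SVM's performed best; max_features=1000 mirrors the constrained data set.
pipe = make_pipeline(CountVectorizer(binary=True, max_features=1000),
                     LinearSVC())
pipe.fit(emails, labels)
print(pipe.predict(["free money"]))
```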

4.
IEEE Trans Neural Netw ; 10(5): 1075-89, 1999.
Article in English | MEDLINE | ID: mdl-18252610

ABSTRACT

It is well known that for a given sample size there exists a model of optimal complexity corresponding to the smallest prediction (generalization) error. Hence, any method for learning from finite samples needs some provision for complexity control. Existing implementations of complexity control include penalization (or regularization), weight decay (in neural networks), and various greedy procedures (also known as constructive, growing, or pruning methods). There are numerous proposals for determining optimal model complexity (i.e., model selection) based on various (asymptotic) analytic estimates of the prediction risk and on resampling approaches. Nonasymptotic bounds on the prediction risk based on Vapnik-Chervonenkis (VC) theory have been proposed by Vapnik. This paper describes the application of VC bounds to regression problems with the usual squared loss. An empirical study is performed for settings where the VC bounds can be rigorously applied, i.e., linear models and penalized linear models, where the VC dimension can be accurately estimated and the empirical risk can be reliably minimized. Empirical comparisons between model selection using VC bounds and classical methods are performed for various noise levels, sample sizes, target functions, and types of approximating functions. Our results demonstrate the advantages of VC-based complexity control with finite samples.
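As an illustration of model selection with a nonasymptotic VC bound, here is a sketch assuming the practical form of the regression bound popularized by Cherkassky and Mulier, R ≤ R_emp / [1 − √(p − p ln p + ln n / (2n))]_+ with p = h/n; the exact expressions evaluated in the paper may differ, so treat this form as an assumption:

```python
import numpy as np

def vc_penalty(h, n):
    """VC penalization factor for regression with squared loss, in the
    practical Cherkassky-Mulier form (an assumption, see the lead-in):
    the empirical risk is inflated by
    1 / [1 - sqrt(p - p*ln(p) + ln(n)/(2n))]_+ with p = h/n."""
    p = h / n
    inner = p - p * np.log(p) + np.log(n) / (2 * n)
    denom = 1.0 - np.sqrt(inner)
    return np.inf if denom <= 0 else 1.0 / denom

def select_degree(x, y, max_degree=10):
    """Choose the polynomial degree whose VC bound on prediction risk is smallest."""
    n = len(x)
    best_d, best_bound = 1, np.inf
    for d in range(1, max_degree + 1):
        coef = np.polyfit(x, y, d)
        emp_risk = np.mean((np.polyval(coef, x) - y) ** 2)
        bound = emp_risk * vc_penalty(d + 1, n)   # VC dimension h = d + 1
        if bound < best_bound:
            best_bound, best_d = bound, d
    return best_d

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = np.sin(np.pi * x) + 0.2 * rng.standard_normal(50)
print(select_degree(x, y))
```

The penalty grows with h/n and diverges when the bound becomes vacuous, so the selected degree trades empirical fit against model complexity exactly as the abstract describes.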

5.
Vopr Onkol ; 29(2): 8-13, 1983.
Article in Russian | MEDLINE | ID: mdl-6687646

ABSTRACT

The predictive value of risk factors for stomach cancer was studied on the basis of a comprehensive genetic-epidemiologic survey conducted to identify a high-risk group. A decision rule was constructed from an optimal combination of these factors using a multifactor method of mathematical statistics; applying it selects persons at high risk of developing stomach cancer with 80% reliability. The paper reports a significant relationship between stomach cancer development and genetic and familial factors, as well as indications of certain changes in gastrointestinal function observed before clinical manifestation of the disease. The results point to the efficacy of combined clinical, genetic, and epidemiologic studies for predicting neoplasms caused by a set of factors.
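Purely as an illustration of a multifactor decision rule of this kind (the abstract does not specify the actual method, so the model choice, feature names, and data below are all hypothetical), a logistic risk score combining binary risk factors:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical binary risk factors per person: family history of stomach
# cancer, chronic gastritis, stomach ulcer (the abstract names these factor
# families but not the exact encoding); the data are synthetic.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 3))
true_risk = 0.2 + 0.5 * X[:, 0] + 0.2 * X[:, 1]   # synthetic ground truth
y = (rng.random(200) < true_risk).astype(int)

model = LogisticRegression().fit(X, y)
# A "decision rule": flag persons whose predicted risk exceeds a cutoff.
high_risk = model.predict_proba(X)[:, 1] > 0.5
print(f"flagged {high_risk.sum()} of {len(X)} as high risk")
```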


Subject(s)
Stomach Neoplasms/epidemiology; Aged; Factor Analysis, Statistical; Female; Gastritis/epidemiology; Gastritis/genetics; Humans; Male; Middle Aged; Pedigree; Prognosis; Risk; Software; Stomach Neoplasms/genetics; Stomach Ulcer/epidemiology; Stomach Ulcer/genetics; USSR