Results 1 - 11 of 11
1.
Neural Netw ; 178: 106476, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38959596

ABSTRACT

This paper introduces a novel bounded-loss framework for SVM and SVR. Using the pinball loss as an illustration, we devise a bounded exponential quantile loss (Leq-loss) for both support vector machine classification and regression. The Leq-loss not only enhances the robustness of SVM and SVR against outliers but also, from a different perspective, improves the robustness of SVM to resampling. We construct EQSVM and EQSVR based on the Leq-loss and derive the influence functions and breakdown-point lower bounds of their estimators. We prove that the influence functions are bounded and that the breakdown-point lower bounds can reach the highest asymptotic breakdown point of 1/2. We also demonstrate the robustness of EQSVM to resampling and derive its generalization error bound based on Rademacher complexity. Because the Leq-loss is non-convex, we use the concave-convex procedure (CCCP) to transform the problem into a series of convex optimization problems, which we solve with the ClipDCD algorithm. Extensive experiments confirm the effectiveness of the proposed EQSVM and EQSVR.


Subject(s)
Algorithms, Support Vector Machine, Neural Networks (Computer)
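The abstract does not reproduce the Leq-loss formula. As a heavily hedged illustration of the mechanism it describes, the sketch below wraps the (unbounded) pinball loss in an exponential transform so the loss is capped at 1 and a gross outlier can contribute only a bounded amount; the function names and the specific form `1 - exp(-pinball/sigma)` are assumptions, not the paper's definition.

```python
import numpy as np

def pinball(u, tau=0.5):
    # Standard pinball (quantile) loss: tau*u for u >= 0, (tau-1)*u for u < 0.
    return np.where(u >= 0, tau * u, (tau - 1) * u)

def bounded_exp_quantile(u, tau=0.5, sigma=1.0):
    # Hypothetical bounded exponential quantile loss: the pinball loss
    # wrapped in 1 - exp(-./sigma), which caps the loss at 1 so a single
    # gross outlier has bounded influence. The paper's actual Leq-loss
    # may take a different form.
    return 1.0 - np.exp(-pinball(u, tau) / sigma)

u = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(pinball(u))               # grows without bound as |u| grows
print(bounded_exp_quantile(u))  # stays in [0, 1): outlier influence is capped
```

Boundedness is what makes the influence function bounded, at the price of convexity, which is why a CCCP-style solver is then needed.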
2.
J Appl Stat ; 51(8): 1590-1608, 2024.
Article in English | MEDLINE | ID: mdl-38863800

ABSTRACT

This paper consists of two parts. The first proposes an explicit robust estimation method for the regression coefficients in simple linear regression, based on the power-weighted repeated medians technique, with a tuning constant that trades off efficiency against robustness. We then investigate the lower and upper bounds of the finite-sample breakdown point of the proposed method. The second part shows that, through a linearization of the cumulative distribution function, the proposed method can be applied to obtain robust parameter estimators for the Weibull and Birnbaum-Saunders distributions commonly used in reliability and survival analysis. Numerical studies demonstrate that the proposed method performs roughly on par with ordinary least squares on clean data, while being far superior in the presence of the data contamination that occurs frequently in practice.
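The repeated medians technique underlying this paper is Siegel's classical estimator; the sketch below shows the unweighted base case on contaminated data (the paper's power weighting and tuning constant are not reproduced here).

```python
import numpy as np

def repeated_median_slope(x, y):
    # Siegel's repeated median: for each point i, take the median of the
    # pairwise slopes to every other point, then take the median of those
    # per-point medians. The paper's power-weighted variant adds a tuning
    # constant to trade efficiency against robustness; this is the plain
    # unweighted version.
    n = len(x)
    inner = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        inner[i] = np.median((y[mask] - y[i]) / (x[mask] - x[i]))
    return np.median(inner)

rng = np.random.default_rng(0)
x = np.arange(20.0)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, 20)
y[:4] += 50.0                  # contaminate 20% of the responses
b = repeated_median_slope(x, y)
a = np.median(y - b * x)       # robust intercept given the slope
print(b, a)                    # close to the true slope 2 and intercept 1
```

Because both the inner and outer aggregation are medians, a minority of contaminated points cannot carry either median away, which is the source of the high breakdown point studied in the paper.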

3.
Biometrics ; 78(4): 1592-1603, 2022 12.
Article in English | MEDLINE | ID: mdl-34437713

ABSTRACT

Biomedical research is increasingly data-rich, with studies comprising ever-growing numbers of features. The larger a study, the higher the likelihood that a substantial portion of the features are redundant and/or contaminated (contain outlying values). This poses serious challenges, which are exacerbated when sample sizes are relatively small. Effective and efficient approaches to sparse estimation in the presence of outliers are critical for these studies and have received considerable attention in the last decade. We contribute to this area by considering high-dimensional regressions contaminated by multiple mean-shift outliers affecting both the response and the design matrix. We develop a general framework and use mixed-integer programming to perform feature selection and outlier detection simultaneously, with provably optimal guarantees. We prove theoretical properties of our approach: a necessary and sufficient condition for the robustly strong oracle property, in which the number of features can increase exponentially with the sample size; the optimal estimation of parameters; and the breakdown point of the resulting estimates. Moreover, we provide computationally efficient procedures to tune the integer constraints and warm-start the algorithm. We show the superior performance of our proposal over existing heuristic methods through simulations, and use it to study the relationships between childhood obesity and the human microbiome.


Subject(s)
Pediatric Obesity, Child, Humans, Algorithms, Sample Size, Probability
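The mean-shift outlier model above augments the regression with a sparse per-observation shift. The paper solves feature selection and outlier flagging jointly and exactly by mixed-integer programming; the sketch below is only a simple alternating heuristic stand-in (greedy correlation screening plus residual thresholding, with assumed function and parameter names) that illustrates the model, not the MIP's optimality guarantees.

```python
import numpy as np

def sparse_fit_with_outlier_flags(X, y, k_feat, k_out, iters=20):
    # Heuristic stand-in for the paper's mixed-integer program: alternate
    # (a) selecting k_feat features that best fit the currently trusted
    # rows with (b) flagging the k_out rows the current fit explains
    # worst as mean-shift outliers.
    n, p = X.shape
    out = np.zeros(n, dtype=bool)
    beta = np.zeros(p)
    for _ in range(iters):
        keep = ~out
        # (a) greedy feature screening by absolute marginal correlation
        corr = np.abs(X[keep].T @ (y[keep] - y[keep].mean()))
        feats = np.argsort(corr)[-k_feat:]
        beta = np.zeros(p)
        beta[feats] = np.linalg.lstsq(X[keep][:, feats], y[keep], rcond=None)[0]
        # (b) flag the rows with the largest absolute residuals
        r = np.abs(y - X @ beta)
        out = np.zeros(n, dtype=bool)
        out[np.argsort(r)[-k_out:]] = True
    return beta, out

rng = np.random.default_rng(1)
n, p = 100, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p); beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.normal(0.0, 0.5, n)
y[:5] += 10.0                  # mean-shift outliers in the response
beta, flags = sparse_fit_with_outlier_flags(X, y, k_feat=3, k_out=5)
print(np.flatnonzero(np.abs(beta) > 1e-8), np.flatnonzero(flags))
```

Such alternations can stall in local optima, which is precisely the gap the paper's exact MIP formulation closes.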
4.
Entropy (Basel) ; 24(10)2022 Oct 21.
Article in English | MEDLINE | ID: mdl-37420523

ABSTRACT

A generalized notion of species richness is introduced. The generalization embeds the popular index of species richness on the boundary of a family of diversity indices, each of which is the number of species remaining in the community after a small proportion of individuals belonging to the rarest species is trimmed. It is established that the generalized species richness indices satisfy a weak version of the usual axioms for diversity indices, are qualitatively robust against small perturbations in the underlying distribution, and are collectively complete with respect to all information about diversity. In addition to a natural plug-in estimator of the generalized species richness, a bias-adjusted estimator is proposed, and its statistical reliability is gauged via bootstrapping. Finally, an ecological example and supporting simulation results are given.
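A minimal plug-in sketch of the trimmed index described above: discard species from the rarest upward until at most a proportion `trim` of all individuals has been removed, then count the species left. The exact trimming convention at the boundary is an assumption (the paper's definition may differ there); `trim=0` recovers ordinary species richness.

```python
import numpy as np

def generalized_richness(counts, trim=0.0):
    # Plug-in estimate of a trimmed species richness: remove rare species
    # whose cumulative share of individuals fits under `trim`, then count
    # the remaining species. trim=0 gives plain species richness.
    counts = np.sort(np.asarray(counts, dtype=float))
    p = counts / counts.sum()
    removable = np.cumsum(p) <= trim   # rarest species under the trim budget
    return int(len(counts) - removable.sum())

community = [500, 300, 150, 30, 10, 5, 3, 1, 1]   # counts per species
print(generalized_richness(community, trim=0.0))    # → 9: plain richness
print(generalized_richness(community, trim=0.02))   # → 4: rare species trimmed
```

Trimming a small fraction of individuals is what buys the qualitative robustness: a few stray singletons can no longer change the index, unlike ordinary richness.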

5.
Metron ; 79(2): 121-125, 2021.
Article in English | MEDLINE | ID: mdl-34219810

ABSTRACT

Starting with the 2020 volume, the journal Metron has decided to celebrate the centenary of its foundation with three special issues. This volume is dedicated to robust statistics. A striking feature of most applied statistical analyses is the use of methods that are well known to be sensitive to outliers or to other departures from the postulated model. Robust statistical methods reduce this sensitivity by first fitting the majority of the data and then flagging deviant data points as outliers. The six papers in this issue span a wide range of topics in robustness. This editorial first provides some facts about the history and current state of robust statistics and then summarizes the contents of each paper.

6.
J Comput Chem ; 41(7): 629-634, 2020 Mar 15.
Article in English | MEDLINE | ID: mdl-31792984

ABSTRACT

The Maeda-Morokuma group proposed the artificial force induced reaction (AFIR) method (Maeda et al., J. Comput. Chem. 2014, 35, 166 and 2018, 39, 233). We study this important method from a theoretical point of view. The proposers' treatment does not use the barrier breakdown point of the AFIR parameter, which usually lies about halfway along the reaction path between the minimum and the transition state being sought. A comparison with the theory of Newton trajectories lets us understand the method better: it allows one to follow some reaction pathways from minimum to saddle point, or vice versa. We discuss several well-known two-dimensional test surfaces on which we calculate full AFIR pathways. With special AFIR curves at hand, one can also study the behavior of the ansatz. © 2019 The Authors. Journal of Computational Chemistry published by Wiley Periodicals, Inc.
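The barrier breakdown idea can be sketched on a one-dimensional model potential (the paper works on two-dimensional surfaces; one dimension keeps the effect easy to see, and the constant-force form below is an AFIR-like simplification, not the method's actual distance-weighted force term).

```python
import numpy as np

# Model potential E(x) = x^4 - 2x^2: reactant minimum at x = -1,
# transition state at x = 0, product minimum at x = +1. Adding a constant
# artificial force alpha toward the product gives the AFIR-like function
# F(x) = E(x) - alpha*x, with F'(x) = 4x^3 - 4x - alpha. The reactant
# minimum and the barrier coalesce where F' and F'' vanish together:
# x* = -1/sqrt(3), roughly halfway between minimum and transition state
# (as noted in the abstract), at alpha* = 8/(3*sqrt(3)) ~ 1.54.

def stationary_points(alpha):
    # Real roots of F'(x) = 4x^3 - 4x - alpha = 0.
    r = np.roots([4.0, 0.0, -4.0, -alpha])
    return np.sort(r[np.abs(r.imag) < 1e-6].real)

for alpha in (0.0, 1.0, 1.53, 2.0):
    print(alpha, np.round(stationary_points(alpha), 3))
# Below alpha* there are three stationary points (minimum, barrier,
# minimum); above it the reactant minimum has merged with the barrier and
# the AFIR-like path runs downhill all the way to the product.
```

This coalescence of stationary points under a growing external force is exactly the mechanism that Newton trajectory theory describes, which is the bridge the paper builds.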

7.
Article in English | MEDLINE | ID: mdl-32190012

ABSTRACT

Evaluating the joint significance of covariates is of fundamental importance in a wide range of applications. To this end, p-values are frequently employed, produced by algorithms grounded in classical large-sample asymptotic theory. It is well known that the conventional p-values in the Gaussian linear model remain valid even when the dimensionality is a non-vanishing fraction of the sample size, but can break down when the design matrix becomes singular in higher dimensions or when the error distribution deviates from Gaussianity. A natural question is when the conventional p-values in generalized linear models become invalid in diverging dimensions. We establish that such a breakdown can occur early in nonlinear models. Our theoretical characterizations are confirmed by simulation studies.
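The kind of diagnostic used in such studies can be sketched as follows: simulate a logistic regression under the global null and check how often the classical Wald p-value rejects at the 5% level. The helper names are assumptions, and the size of any drift from 0.05 depends on the dimension-to-sample-size ratio; the code only illustrates the diagnostic, not the paper's theory.

```python
import numpy as np
from scipy import stats

def logistic_mle(X, y, iters=25):
    # Plain Newton (IRLS) fit of an unregularized logistic regression --
    # the classical MLE whose Wald p-values are in question. Clipping and
    # a tiny ridge on the Hessian only guard against numerical blow-ups.
    beta = np.zeros(X.shape[1])
    H = np.eye(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30.0, 30.0)))
        W = p * (1.0 - p)
        H = X.T @ (X * W[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    return beta, se

def null_rejection_rate(n, p, reps=50, seed=0):
    # Simulate under the global null and record how often the classical
    # Wald p-value for the first coefficient falls below 0.05; a value
    # near 0.05 means the conventional p-values are still trustworthy.
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        X = rng.normal(size=(n, p))
        y = (rng.random(n) < 0.5).astype(float)
        beta, se = logistic_mle(X, y)
        pval = 2.0 * stats.norm.sf(abs(beta[0] / se[0]))
        hits += pval < 0.05
    return hits / reps

# Low dimension versus p a sizable fraction of n: a drift away from the
# nominal 0.05 in the second case is the early breakdown the paper studies.
print(null_rejection_rate(200, 5))
print(null_rejection_rate(200, 80))
```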

8.
Ann Stat ; 46(6B): 3362-3389, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30294050

ABSTRACT

Robustness is a desirable property for many statistical techniques. As an important measure of robustness, the breakdown point has been widely used for regression problems and many other settings. Despite this development, we observe that the standard breakdown point criterion is not directly applicable to many classification problems. In this paper, we propose a new criterion, the angular breakdown point, to better quantify the robustness of different classification methods. Using this new criterion, we study the robustness of binary large margin classification techniques, although the idea applies to general classification methods. Both bounded and unbounded loss functions, with linear and kernel learning, are considered. These studies provide useful insights into the robustness of different classification methods, and numerical results further confirm our theoretical findings.

9.
Neural Netw ; 94: 173-191, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28797759

ABSTRACT

We propose a unified formulation of robust learning methods for classification and regression. The methods use the hinge loss together with outlier indicators in order to detect outliers in the observed data. To analyze the robustness property, we evaluate the breakdown point of the learning methods in situations where the outlier ratio is not necessarily small. Although minimizing the hinge loss with outlier indicators is a non-convex optimization problem, we prove that any local optimal solution of our learning algorithms has the robustness property. The theoretical findings are confirmed in numerical experiments.


Subject(s)
Neural Networks (Computer), Support Vector Machine
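A minimal sketch of the indicator idea for the classification case, under assumed names and a fixed outlier budget `k_out`: alternate gradient steps on the hinge loss over the current inliers with resetting the 0/1 indicators to flag the points with the largest hinge loss (which is the exact minimizer over the indicators for a fixed weight vector). The paper's precise formulation and algorithm may differ.

```python
import numpy as np

def robust_hinge_fit(X, y, k_out, lam=0.1, lr=0.01, rounds=10, epochs=200):
    # Hinge-loss learning with 0/1 outlier indicators eta: alternate
    # (a) subgradient descent on the hinge loss over points currently
    # marked as inliers with (b) resetting eta to flag the k_out points
    # with the largest hinge loss -- the exact minimizer over the
    # indicators given w. k_out encodes the assumed outlier budget.
    n = len(y)
    eta = np.zeros(n, dtype=bool)            # True = flagged as outlier
    w = np.zeros(X.shape[1])
    for _ in range(rounds):
        keep = ~eta
        m = keep.sum()
        for _ in range(epochs):
            active = (y[keep] * (X[keep] @ w) < 1) * y[keep]
            w -= lr * (lam * w - (X[keep] * active[:, None]).sum(0) / m)
        loss = np.maximum(0.0, 1.0 - y * (X @ w))
        eta = np.zeros(n, dtype=bool)
        eta[np.argsort(loss)[-k_out:]] = True
    return w, eta

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(+2.0, 1.0, (50, 2)), rng.normal(-2.0, 1.0, (50, 2)),
               np.full((5, 2), 4.0)])        # 5 far points with flipped labels
y = np.r_[np.ones(50), -np.ones(50), -np.ones(5)]
w, eta = robust_hinge_fit(X, y, k_out=5)
print(np.flatnonzero(eta))                   # → [100 ... 104], the planted outliers
```

Because the mislabelled points incur by far the largest hinge losses, the indicator step removes them from the fit, which is the detection behavior the abstract describes.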
10.
J Am Stat Assoc ; 108(502): 632-643, 2013 Apr 01.
Article in English | MEDLINE | ID: mdl-23913996

ABSTRACT

Robust variable selection procedures based on penalized regression have been gaining attention in the literature. They can be used to perform variable selection and are expected to yield robust estimates. However, to the best of our knowledge, the robustness of these penalized regression procedures has not been well characterized. In this paper, we propose a class of penalized robust regression estimators based on the exponential squared loss. The motivation for the new procedure is that it allows us to characterize its robustness in a way that has not been done for the existing procedures, while its performance is near optimal and superior to some recently developed methods. Specifically, under defined regularity conditions, our estimators are [Formula: see text] and possess the oracle property. Importantly, we show that our estimators can achieve the highest asymptotic breakdown point of 1/2 and that their influence functions are bounded with respect to outliers in either the response or the covariate domain. We performed simulation studies comparing our proposed method with recent alternatives, using the oracle method as the benchmark, and considered common sources of influential points. The simulations reveal that our method performs comparably to the oracle method in terms of model error and positive selection rate even in the presence of influential points. In contrast, other existing procedures have a much lower non-causal selection rate. Furthermore, we re-analyze the Boston Housing Price Dataset and the Plasma Beta-Carotene Level Dataset, commonly used examples for regression diagnostics of influential points. Our analysis reveals discrepancies between our robust method and the other penalized regression method, underscoring the importance of developing and applying robust penalized regression methods.
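An exponential squared loss of the form 1 - exp(-r²/γ) can be minimized by iteratively reweighted least squares, since each residual receives weight exp(-r²/γ) and gross outliers are effectively down-weighted to zero. The sketch below shows this unpenalized M-estimation core with an assumed fixed γ; the paper's penalty for variable selection and its data-driven tuning are omitted.

```python
import numpy as np

def exp_squared_regression(X, y, gamma=5.0, iters=50):
    # M-estimation with the exponential squared loss 1 - exp(-r^2/gamma),
    # solved by iteratively reweighted least squares: a residual r gets
    # weight exp(-r^2/gamma), so gross outliers get weight ~ 0 and lose
    # their influence. The loss is non-convex, so a reasonable start
    # (here OLS) matters; gamma trades efficiency against robustness.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = y - X @ beta
        w = np.exp(-r**2 / gamma)
        XW = X * w[:, None]
        beta = np.linalg.solve(XW.T @ X, XW.T @ y)   # weighted normal equations
    return beta

rng = np.random.default_rng(4)
n = 200
X = np.c_[np.ones(n), rng.normal(size=(n, 2))]
beta_true = np.array([1.0, 2.0, -3.0])
y = X @ beta_true + rng.normal(0.0, 0.3, n)
y[:40] += 15.0                           # 20% gross outliers in the response
print(np.round(exp_squared_regression(X, y), 2))   # near [1, 2, -3]
```

The exponentially decaying weights are what bound the influence function, while near the origin the loss behaves like a rescaled squared error, preserving efficiency on clean data.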

11.
J Am Stat Assoc ; 108(502): 644-655, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-23976805

ABSTRACT

Large- and finite-sample efficiency and resistance to outliers are the key goals of robust statistics, and although they are often not simultaneously attainable, we develop and study a linear regression estimator that comes close. Efficiency is obtained from the estimator's close connection to generalized empirical likelihood, and favorable robustness properties are obtained by constraining the associated sum of (weighted) squared residuals. We prove that the estimator attains the maximum finite-sample replacement breakdown point and full asymptotic efficiency under normal errors. Simulation evidence shows that, compared with existing robust regression estimators, the new estimator has relatively high efficiency for small sample sizes and comparable outlier resistance. The estimator is further illustrated and compared with existing methods on a real data set with purported outliers.
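The replacement breakdown point invoked above can be demonstrated on the simplest case (this sketch uses the mean and median as textbook stand-ins, not the paper's regression estimator): replace m of n points with an arbitrarily bad value and watch how far each estimate moves.

```python
import numpy as np

def replacement_effect(estimator, x, m, big=1e6):
    # Replace m of the n sample points with an arbitrarily bad value and
    # report how far the estimate moves. The finite-sample replacement
    # breakdown point is the smallest m/n that can move it without bound.
    xc = x.copy()
    xc[:m] = big
    return abs(estimator(xc) - estimator(x))

x = np.arange(1.0, 22.0)     # n = 21 well-behaved points
for m in range(0, 12):
    print(m, round(replacement_effect(np.mean, x, m), 1),
             round(replacement_effect(np.median, x, m), 1))
# The mean is carried away by a single replacement (breakdown point 1/n),
# while the median withstands up to 10 of 21 replacements and only breaks
# at 11, i.e. a breakdown point of roughly 1/2 -- the maximum attainable
# by a translation-equivariant estimator, and the regression analogue of
# what the paper proves for its estimator.
```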
