Results 1 - 17 of 17
1.
Technometrics ; 61(2): 154-164, 2019.
Article in English | MEDLINE | ID: mdl-31534281

ABSTRACT

Penalized regression methods that perform simultaneous model selection and estimation are ubiquitous in statistical modeling. The use of such methods is often unavoidable as manual inspection of all possible models quickly becomes intractable when there are more than a handful of predictors. However, automated methods usually fail to incorporate domain knowledge, exploratory analyses, or other factors that might guide a more interactive model-building approach. A hybrid approach is to use penalized regression to identify a set of candidate models and then to use interactive model-building to examine this candidate set more closely. To identify a set of candidate models, we derive point and interval estimators of the probability that each model along a solution path will minimize a given model selection criterion, for example, the Akaike or Bayesian information criterion (AIC or BIC), conditional on the observed solution path. Then models with a high probability of selection are considered for further examination. Thus, the proposed methodology attempts to strike a balance between algorithmic modeling approaches that are computationally efficient but fail to incorporate expert knowledge, and interactive modeling approaches that are labor-intensive but informed by experience, intuition, and domain knowledge. Supplementary materials for this article are available online.
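
A rough illustration of the idea of attaching selection probabilities to models on a solution path: the sketch below bootstraps the relative frequency with which each lasso support set minimizes BIC, using scikit-learn's lars_path and an ordinary least-squares BIC. It is only a stand-in for the point and interval estimators derived in the article, and the conditioning on the observed solution path is not reproduced.

```python
# Sketch: bootstrap frequencies with which each lasso support set minimizes BIC.
import numpy as np
from sklearn.linear_model import lars_path

def bic(X, y, support):
    """BIC of the least-squares fit of centred y on the centred columns in `support`."""
    n = len(y)
    resid = y.copy()
    if support:
        Xs = X[:, list(support)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
    return n * np.log(resid @ resid / n) + (len(support) + 1) * np.log(n)

def selection_probabilities(X, y, n_boot=200, seed=0):
    """Relative frequency with which each support set minimizes BIC over bootstrap paths."""
    rng = np.random.default_rng(seed)
    n, counts = len(y), {}
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        Xb = X[idx] - X[idx].mean(axis=0)   # lars_path expects centred data
        yb = y[idx] - y[idx].mean()
        _, _, coefs = lars_path(Xb, yb, method="lasso")
        supports = {tuple(np.flatnonzero(coefs[:, j])) for j in range(coefs.shape[1])}
        best = min(supports, key=lambda s: bic(Xb, yb, s))
        counts[best] = counts.get(best, 0) + 1
    return {s: c / n_boot for s, c in counts.items()}
```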

2.
J Am Stat Assoc ; 112(518): 638-649, 2017.
Article in English | MEDLINE | ID: mdl-28890584

ABSTRACT

A dynamic treatment regime is a sequence of decision rules, each of which recommends treatment based on features of patient medical history such as past treatments and outcomes. Existing methods for estimating optimal dynamic treatment regimes from data optimize the mean of a response variable. However, the mean may not always be the most appropriate summary of performance. We derive estimators of decision rules for optimizing probabilities and quantiles computed with respect to the response distribution for two-stage, binary treatment settings. This enables estimation of dynamic treatment regimes that optimize the cumulative distribution function of the response at a prespecified point or a prespecified quantile of the response distribution such as the median. The proposed methods perform favorably in simulation experiments. We illustrate our approach with data from a sequentially randomized trial where the primary outcome is remission of depression symptoms.
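
In notation commonly used for dynamic treatment regimes (ours, not necessarily the authors'), the two objectives described above can be written as follows, where Y*(d) denotes the potential response under a two-stage regime d = (d1, d2):

\[
  d^{\mathrm{opt}}_{\lambda} \in \arg\max_{d}\, \Pr\{Y^{*}(d) \ge \lambda\}
  \qquad\text{and}\qquad
  d^{\mathrm{opt}}_{\tau} \in \arg\max_{d}\, Q_{\tau}\{Y^{*}(d)\},
\]

with lambda a prespecified response threshold and Q_tau the tau-th quantile of the response distribution (e.g., tau = 0.5 for the median).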

3.
J Am Stat Assoc ; 112(520): 1587-1597, 2017.
Article in English | MEDLINE | ID: mdl-29628539

ABSTRACT

This paper develops a nonparametric shrinkage and selection estimator via the measurement error selection likelihood approach recently proposed by Stefanski, Wu, and White. The Measurement Error Kernel Regression Operator (MEKRO) has the same form as the Nadaraya-Watson kernel estimator, but optimizes a measurement error model selection likelihood to estimate the kernel bandwidths. Much like LASSO or COSSO solution paths, MEKRO results in solution paths depending on a tuning parameter that controls shrinkage and selection via a bound on the harmonic mean of the pseudo-measurement error standard deviations. We use small-sample-corrected AIC to select the tuning parameter. Large-sample properties of MEKRO are studied and small-sample properties are explored via Monte Carlo experiments and applications to data.
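
For reference, the Nadaraya-Watson form mentioned above, written with one bandwidth per predictor; this is the standard estimator, not the MEKRO selection likelihood itself:

\[
  \hat{m}(x) \;=\;
  \frac{\sum_{i=1}^{n} Y_i \prod_{j=1}^{p} K\!\left\{(x_j - X_{ij})/h_j\right\}}
       {\sum_{i=1}^{n} \prod_{j=1}^{p} K\!\left\{(x_j - X_{ij})/h_j\right\}}.
\]

Letting a bandwidth h_j grow without bound flattens the kernel in coordinate j and effectively removes predictor j from the fit, which is the sense in which bandwidth estimation can perform both shrinkage and selection.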

4.
Biometrika ; 102(2): 381-395, 2015 Jun 02.
Article in English | MEDLINE | ID: mdl-26146407

ABSTRACT

We propose an automatic structure recovery method for additive models, based on a backfitting algorithm coupled with local polynomial smoothing, in conjunction with a new kernel-based variable selection strategy. Our method produces estimates of the set of noise predictors, the sets of predictors that contribute polynomially at different degrees up to a specified degree M, and the set of predictors that contribute beyond polynomially of degree M. We prove consistency of the proposed method, and describe an extension to partially linear models. Finite-sample performance of the method is illustrated via Monte Carlo studies and a real-data example.
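
A generic backfitting loop of the kind referred to above, with a Gaussian Nadaraya-Watson smoother standing in for the local polynomial smoother; the bandwidth and iteration settings are placeholders, and the kernel-based variable selection and structure-recovery steps of the paper are not shown.

```python
# Sketch: backfitting for an additive model y = alpha + sum_j f_j(x_j) + error.
import numpy as np

def kernel_smooth(x, r, h):
    """Nadaraya-Watson smooth of partial residuals r against a single predictor x."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (w @ r) / w.sum(axis=1)

def backfit(X, y, h=0.3, n_iter=50, tol=1e-6):
    n, p = X.shape
    f = np.zeros((n, p))
    alpha = y.mean()
    for _ in range(n_iter):
        f_old = f.copy()
        for j in range(p):
            others = [k for k in range(p) if k != j]
            partial = y - alpha - f[:, others].sum(axis=1)
            f[:, j] = kernel_smooth(X[:, j], partial, h)
            f[:, j] -= f[:, j].mean()   # centre each component for identifiability
        if np.max(np.abs(f - f_old)) < tol:
            break
    return alpha, f
```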

5.
J Stat Softw ; 64(1), 2015 Feb.
Article in English | MEDLINE | ID: mdl-26900385

ABSTRACT

Chronic illness treatment strategies must adapt to the evolving health status of the patient receiving treatment. Data-driven dynamic treatment regimes can offer guidance for clinicians and intervention scientists on how to treat patients over time in order to bring about the most favorable clinical outcome on average. Methods for estimating optimal dynamic treatment regimes, such as Q-learning, typically require modeling nonsmooth, nonmonotone transformations of data. Thus, building well-fitting models can be challenging and in some cases may result in a poor estimate of the optimal treatment regime. Interactive Q-learning (IQ-learning) is an alternative to Q-learning that only requires modeling smooth, monotone transformations of the data. The R package iqLearn provides functions for implementing both the IQ-learning and Q-learning algorithms. We demonstrate how to estimate a two-stage optimal treatment policy with iqLearn using a generated data set bmiData which mimics a two-stage randomized body mass index reduction trial with binary treatments at each stage.
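
For orientation, a bare-bones version of the standard two-stage Q-learning recursion with linear working models and treatments coded ±1, written in Python; it is not the iqLearn R interface, and the IQ-learning reordering of the modeling steps is not shown. The absolute-value term in the pseudo-outcome below is the nonsmooth, nonmonotone transformation that IQ-learning is designed to avoid modeling.

```python
# Sketch: two-stage Q-learning with linear working models; treatments A1, A2 in {-1, +1}.
import numpy as np

def design(h, a):
    """Intercept, history main effects, and treatment plus treatment-by-history interactions."""
    ones = np.ones((h.shape[0], 1))
    return np.column_stack([ones, h, a[:, None] * np.column_stack([ones, h])])

def q_learning_two_stage(h1, a1, h2, a2, y):
    n, k = len(y), h2.shape[1] + 1
    # Stage 2: regress the observed outcome on history, treatment, and interactions.
    b2, *_ = np.linalg.lstsq(design(h2, a2), y, rcond=None)
    main2 = np.column_stack([np.ones((n, 1)), h2]) @ b2[:k]
    contrast2 = np.column_stack([np.ones((n, 1)), h2]) @ b2[k:]
    v2 = main2 + np.abs(contrast2)   # pseudo-outcome: value under the best stage-2 treatment
    # Stage 1: regress the pseudo-outcome the same way.
    b1, *_ = np.linalg.lstsq(design(h1, a1), v2, rcond=None)
    return b1, b2   # recommend a_t = sign of the estimated treatment contrast at each stage
```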

6.
Biometrika ; 101(4): 831-847, 2014 Oct 20.
Article in English | MEDLINE | ID: mdl-25541562

ABSTRACT

Evidence-based rules for optimal treatment allocation are key components in the quest for efficient, effective health care delivery. Q-learning, an approximate dynamic programming algorithm, is a popular method for estimating optimal sequential decision rules from data. Q-learning requires the modeling of nonsmooth, nonmonotone transformations of the data, complicating the search for adequately expressive, yet parsimonious, statistical models. The default Q-learning working model is multiple linear regression, which is not only provably misspecified under most data-generating models, but also results in nonregular regression estimators, complicating inference. We propose an alternative strategy for estimating optimal sequential decision rules for which the requisite statistical modeling does not depend on nonsmooth, nonmonotone transformed data, does not result in nonregular regression estimators, is consistent under a broader array of data-generation models than Q-learning, results in estimated sequential decision rules that have better sampling properties, and is amenable to established statistical approaches for exploratory data analysis, model building, and validation. We derive the new method, IQ-learning, via an interchange in the order of certain steps in Q-learning. In simulated experiments IQ-learning improves on Q-learning in terms of integrated mean squared error and power. The method is illustrated using data from a study of major depressive disorder.

7.
Comput Stat Data Anal ; 67: 15-24, 2013 Nov 01.
Article in English | MEDLINE | ID: mdl-24072947

ABSTRACT

In clinical studies, covariates are often measured with error due to biological fluctuations, device error, and other sources. Summary statistics and regression models based on mismeasured data will differ from the corresponding analyses based on the "true" covariate. Statistical analyses can be adjusted for measurement error; however, various methods exhibit a tradeoff between convenience and performance. Moment Adjusted Imputation (MAI) is a method for handling measurement error in a scalar latent variable that is easy to implement and performs well in a variety of settings. In practice, multiple covariates may be similarly influenced by biological fluctuations, inducing correlated multivariate measurement error. The extension of MAI to the setting of multivariate latent variables involves unique challenges. Alternative strategies are described, including a computationally feasible option that is shown to perform well.

8.
J Am Stat Assoc ; 108(502): 644-655, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-23976805

ABSTRACT

Large- and finite-sample efficiency and resistance to outliers are the key goals of robust statistics. Although often not simultaneously attainable, we develop and study a linear regression estimator that comes close. Efficiency obtains from the estimator's close connection to generalized empirical likelihood, and its favorable robustness properties are obtained by constraining the associated sum of (weighted) squared residuals. We prove maximum attainable finite-sample replacement breakdown point, and full asymptotic efficiency for normal errors. Simulation evidence shows that compared to existing robust regression estimators, the new estimator has relatively high efficiency for small sample sizes, and comparable outlier resistance. The estimator is further illustrated and compared to existing methods via application to a real data set with purported outliers.

9.
Biometrika ; 99(2): 405-421, 2012 Jun.
Article in English | MEDLINE | ID: mdl-23843665

ABSTRACT

We study estimation in quantile regression when covariates are measured with errors. Existing methods require stringent assumptions, such as spherically symmetric joint distribution of the regression and measurement error variables, or linearity of all quantile functions, which restrict model flexibility and complicate computation. In this paper, we develop a new estimation approach based on corrected scores to account for a class of covariate measurement errors in quantile regression. The proposed method is simple to implement. Its validity requires only linearity of the particular quantile function of interest, and it requires no parametric assumptions on the regression error distributions. Finite-sample results demonstrate that the proposed estimators are more efficient than the existing methods in various models considered.
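
For reference, the error-free linear quantile regression estimator that a corrected-score approach adjusts; the correction for covariate measurement error itself is not reproduced here:

\[
  \hat{\beta}(\tau) \;=\; \arg\min_{\beta}\; \sum_{i=1}^{n} \rho_{\tau}\!\left(Y_i - X_i^{\top}\beta\right),
  \qquad
  \rho_{\tau}(u) = u\,\{\tau - I(u < 0)\},
\]

so that the linear predictor estimates the tau-th conditional quantile of Y given X, which is assumed linear only for the particular tau of interest.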

10.
Comput Stat Data Anal ; 55(6): 2026-2037, 2011 Jun 01.
Article in English | MEDLINE | ID: mdl-21479118

ABSTRACT

Most variable selection techniques focus on first-order linear regression models. Often, interaction and quadratic terms are also of interest, but the number of candidate predictors grows very fast with the number of original predictors, making variable selection more difficult. Forward selection algorithms are thus developed that enforce natural hierarchies in second-order models to control the entry rate of uninformative effects and to equalize the false selection rates from first-order and second-order terms. Method performance is compared through Monte Carlo simulation and illustrated with data from a Cox regression and from a response surface experiment.
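
One way to encode a natural hierarchy is to let quadratic and interaction terms become candidates only after their parent main effects have entered, as in the sketch below; the term encoding is a hypothetical choice, and the entry-rate control and false-selection-rate calibration described in the abstract are not shown.

```python
# Sketch: candidate terms for one forward-selection step under a strong-hierarchy rule.
from itertools import combinations

def eligible_terms(selected, all_main):
    """selected: set of terms already in the model, e.g. {("main", 2), ("quad", 2)}."""
    mains = {t[1] for t in selected if t[0] == "main"}
    cands = [("main", j) for j in all_main if ("main", j) not in selected]
    cands += [("quad", j) for j in mains if ("quad", j) not in selected]
    cands += [("inter", j, k) for j, k in combinations(sorted(mains), 2)
              if ("inter", j, k) not in selected]
    return cands
```

At each step the algorithm would score only the terms returned by eligible_terms, which keeps the pool of second-order candidates from swamping the main effects.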

11.
Stat Med ; 30(14): 1722-34, 2011 Jun 30.
Article in English | MEDLINE | ID: mdl-21284016

ABSTRACT

We present a semi-parametric deconvolution estimator for the density function of a random variable X that is measured with error, a common challenge in many epidemiological studies. Traditional deconvolution estimators rely only on assumptions about the distribution of X and the error in its measurement, and ignore information available in auxiliary variables. Our method assumes the availability of a covariate vector statistically related to X by a mean-variance function regression model, where regression errors are normally distributed and independent of the measurement errors. Simulations suggest that the estimator achieves a much lower integrated squared error than the observed-data kernel density estimator when models are correctly specified and the assumption of normal regression errors is met. We illustrate the method using anthropometric measurements of newborns to estimate the density function of newborn length.


Subjects
Anthropometry/methods, Biometry/methods, Biostatistics/methods, Statistical Models, Algorithms, Analysis of Variance, Bias, Birth Weight, Body Height, Computer Simulation, Female, Humans, Newborn Infant, Linear Models, Male, Pennsylvania, Sample Size
12.
Ann Inst Stat Math ; 63(1): 81-99, 2011 Feb 01.
Article in English | MEDLINE | ID: mdl-21311734

ABSTRACT

We present a deconvolution estimator for the density function of a random variable from a set of independent replicate measurements. We assume that measurements are made with normally distributed errors having unknown and possibly heterogeneous variances. The estimator generalizes the deconvoluting kernel density estimator of Stefanski and Carroll (1990), with error variances estimated from the replicate observations. We derive expressions for the integrated mean squared error and examine its rate of convergence as n → ∞ and the number of replicates is fixed. We investigate the finite-sample performance of the estimator through a simulation study and an application to real data.
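
For reference, the classical deconvoluting kernel density estimator of Stefanski and Carroll (1990) for homoscedastic normal error with known variance; the article's estimator replaces the known variance with replicate-based estimates and allows heterogeneity, which is not shown here:

\[
  \hat{f}_X(x) \;=\; \frac{1}{nh}\sum_{j=1}^{n} K^{*}\!\left(\frac{x - W_j}{h}\right),
  \qquad
  K^{*}(t) \;=\; \frac{1}{2\pi}\int e^{-ist}\,\frac{\phi_K(s)}{\phi_U(s/h)}\,ds,
  \qquad
  \phi_U(s) = e^{-\sigma_u^{2} s^{2}/2},
\]

where the W_j = X_j + U_j are the contaminated observations, phi_K is the Fourier transform of the kernel K, and h is the bandwidth.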

13.
Am Stat ; 65(4): 213-221, 2011.
Article in English | MEDLINE | ID: mdl-22690019

ABSTRACT

P-values are useful statistical measures of evidence against a null hypothesis. In contrast to other statistical estimates, however, their sample-to-sample variability is usually not considered or estimated, and therefore not fully appreciated. Via a systematic study of log-scale p-value standard errors, bootstrap prediction bounds, and reproducibility probabilities for future replicate p-values, we show that p-values exhibit surprisingly large variability in typical data situations. In addition to providing context to discussions about the failure of statistical results to replicate, our findings shed light on the relative value of exact p-values vis-a-vis approximate p-values, and indicate that the use of *, **, and *** to denote levels .05, .01, and .001 of statistical significance in subject-matter journals is about the right level of precision for reporting p-values when judged by widely accepted rules for rounding statistical estimates.
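
A small simulation in the spirit of the abstract, showing how much a p-value varies across replicate studies at a fixed effect size; the effect size, sample size, and test below are illustrative placeholders, and the article's log-scale standard errors and bootstrap prediction bounds are not reproduced.

```python
# Sketch: sample-to-sample variability of a two-sample t-test p-value at a fixed alternative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, effect, reps = 30, 0.6, 2000
pvals = np.array([
    stats.ttest_ind(rng.normal(0.0, 1.0, n), rng.normal(effect, 1.0, n)).pvalue
    for _ in range(reps)
])
print("median p:", np.median(pvals))
print("central 90% range of p:", np.quantile(pvals, [0.05, 0.95]))
print("SD of log10(p):", np.log10(pvals).std())
```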

14.
Stat Appl Genet Mol Biol ; 8: Article 2, 2009.
Article in English | MEDLINE | ID: mdl-19222385

ABSTRACT

There is great interest in finding human genes expressed through pharmaceutical intervention, thus opening a genomic window into benefit and side-effect profiles of a drug. Human insight gained from FDA-required animal experiments has historically been limited, but in the case of gene expression measurements, proposed biological orthologies between mouse and human genes provide a foothold for animal-to-human extrapolation. We have investigated a five-component, multilevel, bivariate normal mixture model that incorporates mouse, as well as human, gene expression data. The goal is two-fold: to increase human differential gene-finding power; and to find a subclass of gene pairs for which there is a direct exploitable relationship between animal and human genes. In simulation studies, the dual-species model boasted impressive gains in differential gene-finding power over a related marginal model using only human data. Bias in parameter estimation was problematic, however, and occasionally led to failures in control of the false discovery rate. Though it was considerably more difficult to find species-extrapolative gene-pairs (than differentially expressed human genes), simulation experiments deemed it to be possible, especially when traditional FDR controls are relaxed and under hypothetical parameter configurations.


Subjects
Gene Expression Profiling, Genetic Models, Nucleic Acid Sequence Homology, Animals, Computer Simulation, Confidence Intervals, Humans, Mice, Sample Size
15.
Biometrics ; 65(3): 719-27, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19173697

ABSTRACT

Joint modeling of a primary response and a longitudinal process via shared random effects is widely used in many areas of application. Likelihood-based inference on joint models requires model specification of the random effects. Inappropriate model specification of random effects can compromise inference. We present methods to diagnose random effect model misspecification of the type that leads to biased inference on joint models. The methods are illustrated via application to simulated data, and by application to data from a study of bone mineral density in perimenopausal women and data from an HIV clinical trial.


Subjects
Biometry/methods, Clinical Trials as Topic, Statistical Data Interpretation, Epidemiologic Effect Modifiers, Endpoint Determination/methods, Longitudinal Studies, Statistical Models, Computer Simulation, Regression Analysis
16.
Biometrics ; 65(3): 692-700, 2009 Sep.
Article in English | MEDLINE | ID: mdl-18945266

ABSTRACT

A new version of the false selection rate variable selection method of Wu, Boos, and Stefanski (2007, Journal of the American Statistical Association 102, 235-243) is developed that requires no simulation. This version allows the tuning parameter in forward selection to be estimated simply by hand calculation from a summary table of output even for situations where the number of explanatory variables is larger than the sample size. Because of the computational simplicity, the method can be used in permutation tests and inside bagging loops for improved prediction. Illustration is provided in clinical trials for linear regression, logistic regression, and Cox proportional hazards regression.


Subjects
Biometry/methods, Clinical Trials as Topic, Statistical Data Interpretation, Epidemiologic Effect Modifiers, Statistical Models, Proportional Hazards Models, Regression Analysis, Computer Simulation
17.
Biometrics ; 62(3): 877-85, 2006 Sep.
Article in English | MEDLINE | ID: mdl-16984331

ABSTRACT

We develop a new statistic for testing the equality of two multivariate mean vectors. A scaled chi-squared distribution is proposed as an approximating null distribution. Because the test statistic is based on componentwise statistics, it has the advantage over Hotelling's T2 test of being applicable to the case where the dimension of an observation exceeds the number of observations. An appealing feature of the new test is its ability to handle missing data by relying on only componentwise sample moments. Monte Carlo studies indicate good power compared to Hotelling's T2 and a recently proposed test by Srivastava (2004, Technical Report, University of Toronto). The test is applied to drug discovery data.
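
One natural componentwise statistic of the kind described (our notation, not necessarily the article's exact statistic; the scaled chi-squared calibration of its null distribution is omitted):

\[
  T \;=\; \sum_{k=1}^{p} \frac{(\bar{X}_k - \bar{Y}_k)^2}{s_{X,k}^{2}/n_{X,k} + s_{Y,k}^{2}/n_{Y,k}},
\]

where each component k uses only the observations that are non-missing for that component, which is what allows the dimension p to exceed the sample sizes.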


Subjects
Biometry/methods, Multivariate Analysis, Algorithms, Statistical Data Interpretation, Drug Design, Statistical Models, Monte Carlo Method, Quantitative Structure-Activity Relationship, Sample Size