Results 1 - 13 of 13
1.
J Appl Stat ; 50(3): 744-760, 2023.
Article in English | MEDLINE | ID: mdl-36819084

ABSTRACT

Causal inference under the potential outcome framework relies on the strongly ignorable treatment assignment assumption. This assumption is usually questionable in observational studies, and unmeasured confounding is one of the fundamental challenges in causal inference. To address this, we propose a new sensitivity analysis method that evaluates the impact of an unmeasured confounder by combining ideas from doubly robust estimators, the exponential tilt method, and the super learner algorithm. Unlike existing sensitivity analysis methods that parameterize the unmeasured confounder as a latent variable in the working models, the exponential tilt method imposes no restrictions on the structure or models of the unmeasured confounders. In addition, to reduce the modeling bias of traditional parametric methods, we incorporate the super learner machine learning algorithm to perform nonparametric model estimation and the corresponding sensitivity analysis. Furthermore, most existing sensitivity analysis methods require multivariate sensitivity parameters, which makes their choice difficult and subjective in practice. In contrast, the new method has a univariate sensitivity parameter with a simple interpretation as a log-odds ratio for binary outcomes, which makes the parameter easy to choose and the method easy for practitioners to apply.
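The log-odds-ratio interpretation of a single tilt parameter can be seen in the generic exponential tilt of a Bernoulli outcome. The sketch below is a minimal illustration under stated assumptions, not the paper's doubly robust, super-learner-based estimator: it assumes the tilt multiplies the outcome distribution by $\exp(\gamma y)$ and renormalizes, and the names `tilt_bernoulli`, `logit`, and the fitted probability `p_observed` are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def tilt_bernoulli(p: float, gamma: float) -> float:
    """Exponentially tilt a Bernoulli(p) distribution by exp(gamma * y).

    The tilted success probability is p*e^gamma / (1 - p + p*e^gamma),
    so logit(tilted) == logit(p) + gamma: gamma acts as a log-odds ratio.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    tilted_one = p * math.exp(gamma)  # unnormalized mass at y = 1
    return tilted_one / ((1.0 - p) + tilted_one)  # renormalize over {0, 1}


# Hypothetical fitted outcome probability; vary gamma over a small grid.
p_observed = 0.30
for gamma in (-1.0, 0.0, 0.5, 1.0):
    p_tilted = tilt_bernoulli(p_observed, gamma)
    shift = logit(p_tilted) - logit(p_observed)
    print(f"gamma={gamma:+.1f}  tilted p={p_tilted:.3f}  log-odds shift={shift:+.3f}")
```

In this sketch, each unit of `gamma` shifts the log-odds by exactly one unit and `gamma = 0` leaves the fitted probability unchanged, which is why a single scalar is easy to interpret and to vary over a grid.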

2.
J R Stat Soc Ser A Stat Soc ; 185(3): 1424-1453, 2022 Jul.
Article in English | MEDLINE | ID: mdl-36105847

ABSTRACT

In this paper, for stationary α-mixing dependent samples, we develop a novel nonlinear modal regression for time series and establish the consistency and asymptotic properties of the proposed nonlinear modal estimator with a shrinking bandwidth $h$ under certain regularity conditions. The asymptotic distribution is shown to be identical to the one derived from independent observations, whereas the convergence rate (governed by $nh^3$, where $n$ is the sample size) is slower than that of nonlinear mean regression. We compute the proposed nonlinear modal regression estimator numerically with a modified modal expectation-maximization (MEM) algorithm combined with a Taylor expansion. Monte Carlo simulations demonstrate the good finite-sample (prediction) performance of the newly proposed model. We also construct a specified nonlinear modal regression to fit the available daily new cases and new deaths data from the COVID-19 outbreak at the state/region level in the United States, and provide forward predictions up to 130 days ahead (from 24 August 2020 to 31 December 2020). Compared with traditional nonlinear regressions, the proposed model fits the COVID-19 data better and produces more precise predictions. The prediction results indicate systematic differences in spreading distributions among states/regions: most western and eastern states carry heavier COVID-19 burdens than the Midwest. We hope that the proposed nonlinear modal regression can help policymakers act quickly to curb the spread of the infection, avoid overburdening the health system, and understand aspects of how COVID-19 develops.
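The MEM idea can be illustrated in the simpler linear, independent-data case. The following is a minimal sketch, assuming a Gaussian kernel with a fixed bandwidth `h` and a linear mean function; it omits the nonlinear model, the Taylor-expansion step, and the α-mixing setting of the paper, and the function name `modal_linear_regression` is hypothetical.

```python
import numpy as np


def modal_linear_regression(X, y, h, n_iter=200, tol=1e-8):
    """Linear modal regression fitted with a modal EM (MEM) iteration.

    Maximizes sum_i phi_h(y_i - x_i @ beta) for a Gaussian kernel phi_h
    with bandwidth h by alternating:
      E-step: weights proportional to phi_h(residual_i)
      M-step: weighted least squares with those weights
    """
    X = np.column_stack([np.ones(len(y)), X])    # add an intercept column
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.exp(-0.5 * (r / h) ** 2)          # Gaussian kernel up to a constant
        w /= w.sum()
        XtW = X.T * w                            # X^T diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

With errors whose mode is at zero but whose mean is not (for example, a 20% cluster of large positive errors), the fitted line tracks the mode, while ordinary least squares is pulled toward the mean. The result depends on `h`; the fixed value used in practice here is an assumption, not a bandwidth selection rule.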

3.
Biometrics ; 78(2): 716-729, 2022 06.
Article in English | MEDLINE | ID: mdl-33527347

ABSTRACT

Researchers often have to deal with heterogeneous populations with mixed regression relationships, increasingly so in the era of data explosion. In such problems, when there are many candidate predictors, it is of interest not only to identify the predictors that are associated with the outcome but also to distinguish the true sources of heterogeneity, that is, to identify the predictors that have different effects among the clusters and thus truly contribute to the formation of the clusters. We clarify the concept of a source of heterogeneity, accounting for potential scale differences among the clusters, and propose a regularized finite mixture effects regression that achieves heterogeneity pursuit and feature selection simultaneously. We develop an efficient algorithm and show that our approach achieves both estimation and selection consistency. Simulation studies further demonstrate the effectiveness of our method under various practical scenarios. Three applications are presented: an imaging genetics study linking genetic factors and brain neuroimaging traits in Alzheimer's disease, a public health study exploring the association between suicide risk among adolescents and the characteristics of their school districts, and a sports analytics study of how the salary levels of baseball players are associated with their performance and contractual status.


Subject(s)
Alzheimer Disease, Neuroimaging, Adolescent, Algorithms, Alzheimer Disease/genetics, Brain, Computer Simulation, Humans, Neuroimaging/methods

4.
Can J Stat ; 50(1): 267-286, 2022 Mar.
Article in English | MEDLINE | ID: mdl-38239624

ABSTRACT

In this article, we propose a novel estimator of extreme conditional quantiles in partial functional linear regression models with heavy-tailed distributions. The conventional quantile regression estimators are often unstable at the extreme tails due to data sparsity, especially for heavy-tailed distributions. We first estimate the slope function and the partially linear coefficient using a functional quantile regression based on functional principal component analysis, which is a robust alternative to the ordinary least squares regression. The extreme conditional quantiles are then estimated by using a new extrapolation technique from extreme value theory. We establish the asymptotic normality of the proposed estimator and illustrate its finite sample performance by simulation studies and an empirical analysis of diffusion tensor imaging data from a cognitive disorder study.


In this article, a new estimator of extreme conditional quantiles is developed within partial functional linear regression models with heavy-tailed distributions. It is well known that the sparsity of observations in the extreme tails of heavy-tailed distributions often makes the usual quantile regression estimators unstable. To overcome the lack of robustness of classical least squares, the authors first estimate the slope function and the partially linear coefficient of a quantile regression using an approach based on functional principal component analysis. They then estimate the extreme conditional quantiles using a new extrapolation technique from extreme value theory. In addition to establishing the asymptotic normality of the proposed estimator, the authors illustrate its good finite-sample performance through a simulation study and a practical application to diffusion tensor imaging data from a study of cognitive disorders.
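The extrapolation step can be illustrated with the classical Weissman estimator combined with the Hill estimator of the tail index, applied to an i.i.d. sample such as regression residuals. This is a substitute technique for illustration, not the new extrapolation technique proposed in the article, and it ignores the functional covariate; the function names and the choice of `k` (the number of upper order statistics) are hypothetical.

```python
import numpy as np


def hill_estimator(sample, k):
    """Hill estimator of the extreme-value index from the k largest values."""
    x = np.sort(np.asarray(sample, dtype=float))[::-1]  # descending order
    if not 0 < k < len(x) or x[k] <= 0:
        raise ValueError("need 0 < k < n and a positive threshold x_(k+1)")
    return np.mean(np.log(x[:k]) - np.log(x[k]))


def weissman_quantile(sample, k, tau):
    """Extrapolate the tau-quantile (tau close to 1) beyond the data.

    Anchors at the intermediate order statistic x[k] (level about 1 - k/n)
    and scales it by ((k / n) / (1 - tau)) ** gamma_hat.
    """
    x = np.sort(np.asarray(sample, dtype=float))[::-1]
    n = len(x)
    gamma_hat = hill_estimator(x, k)
    return x[k] * (k / (n * (1.0 - tau))) ** gamma_hat
```

The anchor `x[k]` is an intermediate order statistic that the data estimate reliably; the factor `((k / n) / (1 - tau)) ** gamma_hat` carries it into the tail, where few or no observations exist. Results are sensitive to `k`.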

5.
Sci Rep ; 10(1): 9747, 2020 06 16.
Article in English | MEDLINE | ID: mdl-32546735

ABSTRACT

Feature selection is needed in many modern scientific research problems that use high-dimensional data. A typical example is identifying gene signatures related to a certain disease from high-dimensional gene expression data. Gene expression may have grouping structures; for example, a group of co-regulated genes with similar biological functions tends to have similar expression. It is therefore preferable to take the grouping structure into account when selecting features. In this paper, we propose a Bayesian Robit regression method with Hyper-LASSO priors (abbreviated BayesHL) for feature selection in high-dimensional genomic data with grouping structure. BayesHL discards unrelated features more aggressively than LASSO, and it performs feature selection within groups automatically, without a pre-specified grouping structure. We apply BayesHL in gene expression analysis to identify subsets of genes that contribute to the 5-year survival outcome of endometrial cancer (EC) patients. Results show that BayesHL outperforms alternative methods (including LASSO, group LASSO, supervised group LASSO, penalized logistic regression, random forest, neural network, XGBoost and knockoff) in terms of predictive power, sparsity and the ability to uncover grouping structure, and provides insight into the mechanisms of multiple genetic pathways leading to differentiated EC survival outcomes.


Subject(s)
Endometrial Neoplasms/classification, Endometrial Neoplasms/genetics, Sequence Analysis, DNA/methods, Bayes Theorem, Computational Biology/methods, Female, Genomics, Humans, Logistic Models, RNA-Seq, Regression Analysis, Exome Sequencing

6.
Hydrol Process ; 32(22): 3365-3390, 2018 Oct 30.
Article in English | MEDLINE | ID: mdl-31073260

ABSTRACT

Accurate and reliable reservoir inflow forecasts are instrumental to the efficient operation of hydroelectric power systems. Natural and anthropogenic aerosols have been found to greatly influence meteorological variables such as temperature, snow water equivalent, and precipitation, which in turn affect reservoir inflow. It is therefore important to quantify the impact of aerosols on reservoir inflow and to incorporate aerosol models into future reservoir inflow forecasting models. In this paper, a comprehensive framework was developed to quantify the impact of aerosols on reservoir inflow by integrating the Weather Research and Forecasting model with Chemistry (WRF-Chem) and a dynamic regression model. The statistical dynamic regression model produces reservoir inflow forecasts based on the meteorological output variables from the WRF-Chem model. The case study was performed on Florence Lake and Lake Thomas Alva Edison of the Big Creek Hydroelectric Project in the San Joaquin Region. The simulation results show that the presence of aerosols significantly reduces annual reservoir inflow, by 4-14%. In the summer, aerosols reduce precipitation, snow water equivalent, and snowmelt, which leads to an 11-26% reduction in inflow. In the spring, aerosols increase temperature and snowmelt, which leads to a 0.6-2% increase in inflow. Aerosols thus significantly reduce inflow in the summer, when the marginal value of water is extremely high, and slightly increase inflow in the spring, when the run-off risk is high. In summary, the presence of aerosols is detrimental to the optimal utilization of hydroelectric power systems.

7.
Comput Stat Data Anal ; 111: 14-26, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28947841

ABSTRACT

Finite mixture of regression (FMR) models can be reformulated as incomplete data problems and estimated via the expectation-maximization (EM) algorithm. The main drawback is the strong parametric assumption, such as normally distributed residuals in FMR models; the estimation might be biased if the model is misspecified. To relax the parametric assumption about the component error densities, a new method is proposed that estimates the mixture regression parameters by assuming only that the components have log-concave error densities, without specifying the parametric family. Two EM-type algorithms for mixtures of regression models with log-concave error densities are proposed. Numerical studies compare the performance of our algorithms with the normal mixture EM algorithms. When the component error densities are not normal, the new methods have much smaller MSEs than the standard normal mixture EM algorithms. When the underlying component error densities are normal, the new methods perform comparably to the normal EM algorithm.
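For reference, the normal-error baseline that this approach relaxes is the standard EM algorithm for a mixture of linear regressions. The sketch below assumes normal component errors, a design matrix shared by all components, and a simple initialization that splits observations by quantiles of the ordinary least squares residuals; it does not implement the log-concave density estimation step, and the function name is hypothetical.

```python
import numpy as np


def normal_mixture_regression_em(X, y, n_components=2, n_iter=500, tol=1e-8):
    """EM for a finite mixture of linear regressions with normal errors."""
    X = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    n, p = X.shape
    K = n_components
    # Heuristic start: split cases into K groups by OLS residual quantiles.
    ols = np.linalg.lstsq(X, y, rcond=None)[0]
    r0 = y - X @ ols
    cuts = np.quantile(r0, np.linspace(0, 1, K + 1)[1:-1])
    resp = np.eye(K)[np.searchsorted(cuts, r0)]  # hard initial responsibilities
    betas = np.empty((K, p))
    sigmas = np.empty(K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # M-step: mixing weights, weighted least squares, weighted variances.
        pis = resp.mean(axis=0)
        for j in range(K):
            w = resp[:, j]
            XtW = X.T * w
            betas[j] = np.linalg.solve(XtW @ X, XtW @ y)
            resid = y - X @ betas[j]
            sigmas[j] = np.sqrt(np.sum(w * resid ** 2) / np.sum(w))
        # E-step: posterior component probabilities under normal errors.
        resid = y[:, None] - X @ betas.T  # n x K residual matrix
        dens = pis * np.exp(-0.5 * (resid / sigmas) ** 2) / (np.sqrt(2 * np.pi) * sigmas)
        total = dens.sum(axis=1)
        loglik = np.sum(np.log(total))
        resp = dens / total[:, None]
        if loglik - prev_ll < tol:  # EM increases the log-likelihood monotonically
            break
        prev_ll = loglik
    return pis, betas, sigmas, loglik
```

EM can converge to a local maximum, so in practice several initializations are usually compared by their final log-likelihood.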

8.
Can J Stat ; 45(1): 77-94, 2017 Mar.
Article in English | MEDLINE | ID: mdl-28579672

ABSTRACT

Finite mixture regression models have been widely used for modelling mixed regression relationships arising from a clustered and thus heterogeneous population. The classical normal mixture model, despite its simplicity and wide applicability, may fail in the presence of severe outliers. Using a sparse, case-specific, and scale-dependent mean-shift mixture model parameterization, we propose a robust mixture regression approach for simultaneously conducting outlier detection and robust parameter estimation. A penalized likelihood approach is adopted to induce sparsity among the mean-shift parameters so that the outliers are distinguished from the remainder of the data, and a generalized expectation-maximization (EM) algorithm is developed to perform stable and efficient computation. The proposed approach is shown to have strong connections with other robust methods, including the trimmed likelihood method and M-estimation approaches. Compared with several existing methods, the proposed methods show outstanding performance in our simulation studies.
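The mean-shift idea can be illustrated for a single regression component. This minimal sketch assumes an L1 penalty on the shift parameters, so each $\gamma_i$ update is a soft-thresholding step, and re-estimates the scale $\sigma$ by the median absolute deviation; the mixture structure, the paper's specific penalty, and its generalized EM algorithm are not reproduced, and all names are hypothetical.

```python
import numpy as np


def soft_threshold(z, lam):
    """Elementwise soft thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def mean_shift_regression(X, y, lam=2.0, n_iter=100, tol=1e-8):
    """Robust linear regression via a sparse mean-shift parameterization.

    Model: y_i = x_i @ beta + sigma * gamma_i + eps_i, with an L1 penalty
    lam * sum|gamma_i| so that only outlying cases get nonzero gamma_i.
    Alternates a soft-thresholding update for gamma with least squares
    on the shift-adjusted response for beta; sigma is re-estimated by MAD.
    """
    X = np.column_stack([np.ones(len(y)), X])    # add an intercept column
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS start
    for _ in range(n_iter):
        r = y - X @ beta
        sigma = 1.4826 * np.median(np.abs(r - np.median(r)))  # robust scale
        gamma = soft_threshold(r / sigma, lam)                # shift update
        beta_new = np.linalg.lstsq(X, y - sigma * gamma, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, gamma
```

For a fixed $\sigma$, alternating these two updates with an L1 penalty amounts to Huber-type M-estimation with threshold `lam * sigma`, one instance of the connection to M-estimation noted in the abstract; cases with nonzero `gamma` are the flagged outliers.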

9.
Stat Sin ; 26(3): 979-1000, 2016 Jul.
Article in English | MEDLINE | ID: mdl-27667908

ABSTRACT

Motivated by an empirical analysis of ecological momentary assessment (EMA) data collected in a smoking cessation study, we propose a joint modeling technique for estimating the time-varying association between two intensively measured longitudinal responses: a continuous one and a binary one. A major challenge in jointly modeling these responses is the lack of a multivariate distribution. We suggest introducing a normal latent variable underlying the binary response and factorizing the model into two components: a marginal model for the continuous response, and a conditional model for the binary response given the continuous response. We develop a two-stage estimation procedure, establish the asymptotic normality of the resulting estimators, and derive standard error formulas for the estimated coefficients. We conduct a Monte Carlo simulation study to assess the finite sample performance of our procedure. The proposed method is illustrated by an empirical analysis of smoking cessation data, in which the question of interest is the association between urge to smoke (the continuous response) and alcohol use status (the binary response), and how this association varies over time.

10.
Comput Stat Data Anal ; 101: 137-147, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27065505

ABSTRACT

Finite mixture models are useful tools and can be estimated via the EM algorithm. A main drawback is the strong parametric assumption about the component densities. In this paper, a much more flexible mixture model is considered, which assumes each component density to be log-concave. Under fairly general conditions, the log-concave maximum likelihood estimator (LCMLE) exists and is consistent. Numerical examples demonstrate that the LCMLE improves clustering results compared with the traditional MLE for parametric mixture models.

11.
J Bus Econ Stat ; 32(2): 259-270, 2014.
Article in English | MEDLINE | ID: mdl-24976675

ABSTRACT

When functional data are not homogeneous, e.g., when the dataset contains multiple classes of functional curves, traditional estimation methods may fail. In this paper, we propose a new estimation procedure for the Mixture of Gaussian Processes that incorporates both the functional and the inhomogeneous properties of the data. Our method can be viewed as a natural extension of high-dimensional normal mixtures; the key difference is that smooth structures are imposed on both the mean and covariance functions. The model is shown to be identifiable and can be estimated efficiently by combining ideas from the EM algorithm, kernel regression, and functional principal component analysis. Our methodology is justified empirically by Monte Carlo simulations and illustrated by an analysis of a supermarket dataset.

12.
J R Stat Soc Series B Stat Methodol ; 75(1): 123-138, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-23539417

ABSTRACT

This paper develops a new procedure for estimating nonparametric regression functions for clustered or longitudinal data. We propose using Cholesky decomposition and profile least squares techniques to estimate the correlation structure and the regression function simultaneously. We further prove that the proposed estimator is asymptotically as efficient as if the covariance matrix were known. A Monte Carlo simulation study is conducted to examine the finite sample performance of the proposed procedure and to compare it with existing ones. Based on our empirical studies, the newly proposed procedure works better than naive local linear regression with a working independence error structure, and the efficiency gain can be achieved in moderate-sized samples. Our numerical comparison also shows that the newly proposed procedure outperforms some existing ones. A real data application is also provided to illustrate the proposed estimation procedure.
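A common form of Cholesky-based covariance modeling for longitudinal data is the modified Cholesky decomposition, $T \Sigma T^\top = D$, with $T$ unit lower triangular and $D$ diagonal; whether the paper uses exactly this form is an assumption here. A minimal NumPy sketch, assuming a known positive definite within-cluster covariance; the profile least squares estimation of the regression function is not shown, and the function name is hypothetical.

```python
import numpy as np


def modified_cholesky(sigma):
    """Return (T, D) with T unit lower triangular and T @ sigma @ T.T == D.

    Row t of T holds minus the coefficients from regressing the t-th error
    on the earlier ones; D holds the corresponding innovation variances.
    """
    L = np.linalg.cholesky(sigma)  # sigma = L @ L.T, L lower triangular
    d = np.diag(L)
    L_unit = L / d                 # divide column j by L[j, j]: unit diagonal
    T = np.linalg.inv(L_unit)      # inverse of unit lower triangular is unit lower
    D = np.diag(d ** 2)
    return T, D


# Example: stationary AR(1) errors with unit innovation variance.
rho, m = 0.6, 4
idx = np.arange(m)
sigma = rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho ** 2)
T, D = modified_cholesky(sigma)
```

For this AR(1) covariance only the first subdiagonal of $T$ is nonzero (each entry equals $-\rho$), and the inverse covariance follows as $\Sigma^{-1} = T^\top D^{-1} T$.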

13.
J Nonparametr Stat ; 24(3): 647-663, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-23049230

ABSTRACT

A local modal estimation procedure is proposed for the regression function in a nonparametric regression model. A distinguishing characteristic of the proposed procedure is that it introduces an additional tuning parameter that is selected automatically from the observed data to achieve both robustness and efficiency of the resulting estimate. We demonstrate both theoretically and empirically that the resulting estimator is more efficient than the ordinary local polynomial regression estimator in the presence of outliers or heavy-tailed error distributions (such as the t-distribution). Furthermore, we show that the proposed procedure is asymptotically as efficient as the local polynomial regression estimator when there are no outliers and the error distribution is Gaussian. We propose an EM-type algorithm for the proposed estimation procedure. A Monte Carlo simulation study is conducted to examine the finite sample performance of the proposed method, and the simulation results confirm the theoretical findings. The proposed methodology is further illustrated via an analysis of a real data example.
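An EM-type iteration for local modal regression can be sketched at a single point $x_0$. This minimal sketch assumes Gaussian kernels for both the location weights (bandwidth `h1`) and the residual weights (bandwidth `h2`) and a local linear fit; it uses fixed bandwidths, whereas the procedure described above selects its additional tuning parameter from the data. The function name is hypothetical.

```python
import numpy as np


def local_modal_estimate(x, y, x0, h1, h2, n_iter=200, tol=1e-10):
    """Local linear modal regression estimate of m(x0) via an EM-type loop.

    Maximizes sum_i K_h1(x_i - x0) * phi_h2(y_i - a - b * (x_i - x0))
    over (a, b) and returns a, the estimate of the regression function at x0.
    """
    u = x - x0
    Z = np.column_stack([np.ones_like(u), u])
    k = np.exp(-0.5 * (u / h1) ** 2)                      # location kernel weights
    coef = np.linalg.solve((Z.T * k) @ Z, (Z.T * k) @ y)  # local linear LS start
    for _ in range(n_iter):
        r = y - Z @ coef
        w = k * np.exp(-0.5 * (r / h2) ** 2)                  # E-step: combined weights
        new = np.linalg.solve((Z.T * w) @ Z, (Z.T * w) @ y)   # M-step: weighted LS
        if np.max(np.abs(new - coef)) < tol:
            return new[0]
        coef = new
    return coef[0]
```

Each iteration reweights observations by both their closeness to $x_0$ and their closeness to the current local fit, so gross outliers receive almost no weight; as `h2` grows, the residual weights approach 1 and the estimate approaches ordinary local linear regression.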
