Results 1 - 20 of 74
1.
Stat Med ; 43(19): 3578-3594, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-38881189

ABSTRACT

In health and clinical research, medical indices (eg, BMI) are commonly used for monitoring and/or predicting health outcomes of interest. While single-index modeling can be used to construct such indices, methods to use single-index models for analyzing longitudinal data with multiple correlated binary responses are underdeveloped, although there are abundant applications with such data (eg, prediction of multiple medical conditions based on longitudinally observed disease risk factors). This article aims to fill the gap by proposing a generalized single-index model that can incorporate multiple single indices and mixed effects for describing observed longitudinal data of multiple binary responses. Compared to the existing methods focusing on constructing marginal models for each response, the proposed method can make use of the correlation information in the observed data about different responses when estimating different single indices for predicting response variables. Estimation of the proposed model is achieved by using a local linear kernel smoothing procedure, together with methods designed specifically for estimating single-index models and traditional methods for estimating generalized linear mixed models. Numerical studies show that the proposed method is effective in various cases considered. It is also demonstrated using a dataset from the English Longitudinal Study of Aging project.


Subjects
Statistical Models , Longitudinal Studies , Humans , Linear Models , Computer Simulation , Statistical Data Interpretation
2.
Biostatistics ; 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38637995

ABSTRACT

Computed tomography (CT) has been a powerful diagnostic tool since its emergence in the 1970s. Using CT data, 3D structures of human internal organs and tissues, such as blood vessels, can be reconstructed using professional software. This 3D reconstruction is crucial for surgical operations and can serve as a vivid medical teaching example. However, traditional 3D reconstruction heavily relies on manual operations, which are time-consuming, subjective, and require substantial experience. To address this problem, we develop a novel semiparametric Gaussian mixture model tailored for the 3D reconstruction of blood vessels. This model extends the classical Gaussian mixture model by enabling nonparametric variations in the component-wise parameters of interest according to voxel positions. We develop a kernel-based expectation-maximization algorithm for estimating the model parameters, accompanied by a supporting asymptotic theory. Furthermore, we propose a novel regression method for optimal bandwidth selection. The regression method outperforms the conventional cross-validation-based (CV) method in terms of computational and statistical efficiency. In application, this methodology facilitates the fully automated reconstruction of 3D blood vessel structures with remarkable accuracy.

3.
Stat Med ; 42(25): 4556-4569, 2023 Nov 10.
Article in English | MEDLINE | ID: mdl-37599209

ABSTRACT

The spatial relative risk function describes differences in the geographical distribution of two types of points, such as locations of cases and controls in an epidemiological study. It is defined as the ratio of the two underlying densities. Estimation of spatial relative risk is typically done using kernel estimates of these densities, but this procedure is often challenging in practice because of the high degree of spatial inhomogeneity in the distributions. This makes it difficult to obtain estimates of the relative risk that are stable in areas of sparse data while retaining necessary detail elsewhere, and consequently difficult to distinguish true risk hotspots from stochastic bumps in the risk function. We study shrinkage estimators of the spatial relative risk function to address these problems. In particular, we propose a new lasso-type estimator that shrinks a standard kernel estimator of the log-relative risk function towards zero. The shrinkage tuning parameter can be adjusted to help quantify the degree of evidence for the existence of risk hotspots, or selected to optimize a cross-validation criterion. The performance of the lasso estimator is encouraging both on a simulation study and on real-world examples.
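As an illustration of the quantities named in this abstract, the following is a minimal sketch of a kernel estimate of the log-relative risk (the log of the ratio of case and control densities) followed by shrinkage towards zero. The function names, the fixed-bandwidth Gaussian kernel, and the use of pointwise soft-thresholding as the shrinkage step are all assumptions made for illustration; the article's actual lasso-type estimator is not specified in the abstract and may differ.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return a function that evaluates a fixed-bandwidth Gaussian kernel
    density estimate built from a list of (x, y) points."""
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            d2 = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density


def log_relative_risk(cases, controls, bandwidth):
    """Return a function giving log(f_cases / f_controls) at a location.
    Note: far from all data both densities can underflow to 0, in which
    case math.log raises ValueError; this sketch does not guard against it."""
    f = gaussian_kde_2d(cases, bandwidth)
    g = gaussian_kde_2d(controls, bandwidth)

    def rho(x, y):
        return math.log(f(x, y)) - math.log(g(x, y))

    return rho


def soft_threshold(value, lam):
    """Shrink value towards zero by lam (hypothetical stand-in for the
    article's shrinkage step); values in [-lam, lam] become exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


# Illustrative data: cases cluster near (0, 0), controls near (1, 1).
cases = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
controls = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
rho = log_relative_risk(cases, controls, bandwidth=0.5)
shrunk_at_origin = soft_threshold(rho(0.0, 0.0), lam=0.5)
```

Larger values of `lam` set more of the estimated surface to exactly zero, which is one way to read the abstract's remark that the tuning parameter helps quantify the evidence for hotspots.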


Subjects
Research Design , Humans , Risk , Computer Simulation
4.
Stat Med ; 42(17): 2982-2998, 2023 Jul 30.
Article in English | MEDLINE | ID: mdl-37173778

ABSTRACT

In medical studies, composite indices and/or scores are routinely used for predicting medical conditions of patients. These indices are usually developed from observed data of certain disease risk factors, and it has been demonstrated in the literature that single index models can provide a powerful tool for this purpose. In practice, the observed data of disease risk factors are often longitudinal in the sense that they are collected at multiple time points for individual patients, and there are often multiple aspects of a patient's medical condition that are of our concern. However, most existing single-index models are developed for cases with independent data and a single response variable, which are inappropriate for the problem just described in which within-subject observations are usually correlated and there are multiple mutually correlated response variables involved. This paper aims to fill this methodological gap by developing a single index model for analyzing longitudinal data with multiple responses. Both theoretical and numerical justifications show that the proposed new method provides an effective solution to the related research problem. It is also demonstrated using a dataset from the English Longitudinal Study of Aging.


Subjects
Longitudinal Studies , Humans , Statistics as Topic
5.
Int J Biostat ; 2023 May 31.
Article in English | MEDLINE | ID: mdl-37257507

ABSTRACT

This paper considers a partially linear regression model relating a right-censored response variable to predictors and an extra covariate measured with error. The main problem is that both censoring and measurement error must be handled to estimate the model correctly. To this end, we propose three modified semiparametric estimators obtained from local polynomial regression, kernel smoothing, and B-spline smoothing methods, based on a kernel deconvolution approach and synthetic data transformation. Here, the kernel deconvolution technique is used to solve the measurement error problem in the model, and synthetic data transformation, a very common method in the literature, is used to incorporate the effect of censoring into the estimation procedure. The performances of the introduced estimators are compared in a detailed Monte Carlo simulation study. In addition, carotid endarterectomy data are used as a real-world example and the results are presented. According to the results, the deconvoluted local polynomial method gives better estimates than the other two methods.

6.
Biometrics ; 79(4): 3374-3387, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37042741

ABSTRACT

In many longitudinal settings, time-varying covariates may not be measured at the same time as responses and are often prone to measurement error. Naive last-observation-carried-forward methods incur estimation biases, and existing kernel-based methods suffer from slow convergence rates and large variations. To address these challenges, we propose a new functional calibration approach to efficiently learn longitudinal covariate processes based on sparse functional data with measurement error. Our approach, stemming from functional principal component analysis, calibrates the unobserved synchronized covariate values from the observed asynchronous and error-prone covariate values, and is broadly applicable to asynchronous longitudinal regression with time-invariant or time-varying coefficients. For regression with time-invariant coefficients, our estimator is asymptotically unbiased, root-n consistent, and asymptotically normal; for time-varying coefficient models, our estimator has the optimal varying coefficient model convergence rate with inflated asymptotic variance from the calibration. In both cases, our estimators present asymptotic properties superior to the existing methods. The feasibility and usability of the proposed methods are verified by simulations and an application to the Study of Women's Health Across the Nation, a large-scale multisite longitudinal study on women's health during midlife.


Subjects
Statistical Models , Female , Humans , Longitudinal Studies , Regression Analysis , Calibration , Bias
7.
Biometrics ; 79(2): 695-710, 2023 Jun.
Article in English | MEDLINE | ID: mdl-34877661

ABSTRACT

Statistical analysis of longitudinal data often involves modeling treatment effects on clinically relevant longitudinal biomarkers since an initial event (the time origin). In some studies including preventive HIV vaccine efficacy trials, some participants have biomarkers measured starting at the time origin, whereas others have biomarkers measured starting later with the time origin unknown. The semiparametric additive time-varying coefficient model is investigated where the effects of some covariates vary nonparametrically with time while the effects of others remain constant. Weighted profile least squares estimators coupled with kernel smoothing are developed. The method uses the expectation maximization approach to deal with the censored time origin. The Kaplan-Meier estimator and other failure time regression models such as the Cox model can be utilized to estimate the distribution and the conditional distribution of left censored event time related to the censored time origin. Asymptotic properties of the parametric and nonparametric estimators and consistent asymptotic variance estimators are derived. A two-stage estimation procedure for choosing weight is proposed to improve estimation efficiency. Numerical simulations are conducted to examine finite sample properties of the proposed estimators. The simulation results show that the theory and methods work well. The efficiency gain of the two-stage estimation procedure depends on the distribution of the longitudinal error processes. The method is applied to analyze data from the Merck 023/HVTN 502 Step HIV vaccine study.


Subjects
Statistical Models , Research Design , Humans , Computer Simulation , Proportional Hazards Models , Survival Analysis
8.
Scand Stat Theory Appl ; 50(1): 266-295, 2023 Mar.
Article in English | MEDLINE | ID: mdl-39076352

ABSTRACT

We model the Alzheimer's Disease-related phenotype response variables observed on irregular time points in longitudinal Genome-Wide Association Studies as sparse functional data and propose nonparametric test procedures to detect functional genotype effects while controlling the confounding effects of environmental covariates. Our new functional analysis of covariance tests are based on a seemingly unrelated kernel smoother, which takes into account the within-subject temporal correlations, and thus enjoy improved power over existing functional tests. We show that the proposed test combined with a uniformly consistent nonparametric covariance function estimator enjoys the Wilks phenomenon and is minimax most powerful. Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, where an application of the proposed test led to the discovery of new genes that may be related to Alzheimer's Disease.

9.
Stat Sin ; 32(Suppl): 547-567, 2022.
Article in English | MEDLINE | ID: mdl-36415324

ABSTRACT

Personalized treatment aims at tailoring treatments to individual characteristics. An important step is to understand how a treatment effect varies across individual characteristics, known as the conditional average treatment effect (CATE). In this study, we make robust inferences of the CATE from observational data, which becomes challenging with a multivariate confounder. To reduce the curse of dimensionality, while keeping the nonparametric advantages, we propose double dimension reductions that achieve different goals. First, we identify the central mean subspace of the CATE directly using dimension reduction in order to detect the most accurate and parsimonious structure of the CATE. Second, we use a nonparametric regression with a prior dimension reduction to impute counterfactual outcomes, which helps to improve the stability of the imputation. We establish the asymptotic properties of the proposed estimator, taking into account the two-step double dimension reduction, and propose an effective bootstrapping procedure without bootstrapping the estimated central mean subspace to make valid inferences. A simulation and applications show that the proposed estimator outperforms existing competitors.

10.
Stat Med ; 41(25): 5084-5101, 2022 Nov 10.
Article in English | MEDLINE | ID: mdl-36263919

ABSTRACT

Distributed estimation based on different sources of observations has drawn attention in modern statistical learning. In practice, because collecting data can be expensive or time-consuming in some cases, the sample size on each local site can be small, but the dimension of covariates is large and may be far larger than the sample size on each site. In this article, we focus on the distributed estimation and inference for a preconceived low-dimensional parameter vector in the high-dimensional quantile regression model with small local sample size. Specifically, we consider that the data are inherently distributed and propose two communication-efficient estimators by generalizing the decorrelated score approach to conquer the slow convergence rate of nuisance parameter estimation and adopting the smoothing technique based on multiround algorithms. The risk bounds and limiting distributions of the proposed estimators are given. The finite sample performance of the proposed estimators is studied through simulations and an application to a gene expression dataset is also presented.


Subjects
Algorithms , Communication , Humans
11.
J Econom ; 230(2): 221-239, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36017081

ABSTRACT

When predicting crop yield using both functional and multivariate predictors, the prediction performances benefit from the inclusion of the interactions between the two sets of predictors. We assume the interaction depends on a nonparametric, single-index structure of the multivariate predictor and reduce each functional predictor's dimension using functional principal component analysis (FPCA). Allowing the number of FPCA scores to diverge to infinity, we consider a sequence of semiparametric working models with a diverging number of predictors, which are FPCA scores with estimation errors. We show that the parametric component of the model is root-n consistent and asymptotically normal, the overall prediction error is dominated by the estimation of the nonparametric interaction function, and justify a CV-based procedure to select the tuning parameters.

12.
J Comput Graph Stat ; 31(2): 390-402, 2022.
Article in English | MEDLINE | ID: mdl-35685204

ABSTRACT

We propose interval censored recursive forests (ICRF), an iterative tree ensemble method for interval censored survival data. This nonparametric regression estimator addresses the splitting bias problem of existing tree-based methods and iteratively updates survival estimates in a self-consistent manner. Consistent splitting rules are developed for interval censored data, convergence is monitored using out-of-bag samples, and kernel-smoothing is applied. The ICRF is uniformly consistent and displays high prediction accuracy in both simulations and applications to avalanche and national mortality data. An R package icrf is available on CRAN and Supplementary Materials for this article are available online.

13.
Biom J ; 64(6): 1056-1074, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35523738

ABSTRACT

The receiver-operating characteristic (ROC) curve is the most popular graphical method for evaluating the classification accuracy of a diagnostic marker. In time-to-event studies, the subject's event status is time-dependent, and hence, time-dependent extensions of the ROC curve have been proposed. However, in practice, the calculation of this curve is not straightforward due to the presence of censoring that may be of different types. Existing methods focus on the more standard and simple case of right-censoring and neglect the general case of mixed interval-censored data that may involve left-, right-, and interval-censored observations. In this context, we propose and study a new time-dependent ROC curve estimator. We also consider some summary measures (area under the ROC curve and Youden index) traditionally associated with ROC as well as the Youden-based cutoff estimation method. The proposed method uses available data very efficiently. To this end, the unknown status (positive or negative) of censored subjects is estimated from the data via the estimation of the conditional survival function given the marker. For that, we investigate both model-based and nonparametric approaches. We also provide variance estimates and confidence intervals using the bootstrap. A simulation study is conducted to investigate the finite sample behavior of the proposed methods and to compare their performance with a competitor. Globally, we observed better finite sample performances for the proposed estimators. Finally, we illustrate the methods using two data sets, one from a hypobaric decompression sickness study and the other from an oral health study. The proposed methods are implemented in the R package cenROC.
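For readers unfamiliar with the Youden index mentioned in this abstract, here is a minimal sketch of Youden-based cutoff selection in the simplest setting, where every subject's status is fully observed. It does not handle censoring or time dependence, which are the article's actual contribution, and it is not the cenROC implementation; the function name and the "marker greater than cutoff means positive" convention are assumptions.

```python
def youden_cutoff(markers, status):
    """Return (best_cutoff, best_j), where J = sensitivity + specificity - 1.

    A subject is classified positive when marker > cutoff. status holds
    1 for positive and 0 for negative subjects (fully observed, no censoring).
    Ties in J are resolved in favour of the smallest cutoff.
    """
    positives = [m for m, s in zip(markers, status) if s == 1]
    negatives = [m for m, s in zip(markers, status) if s == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative subject")

    best_cutoff, best_j = None, -1.0
    for c in sorted(set(markers)):  # candidate cutoffs: observed marker values
        sensitivity = sum(m > c for m in positives) / len(positives)
        specificity = sum(m <= c for m in negatives) / len(negatives)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Perfectly separated toy data: the cutoff 3 splits the two groups.
cutoff, j = youden_cutoff([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
```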


Subjects
ROC Curve , Area Under Curve , Biomarkers , Computer Simulation , Humans
14.
Comput Methods Programs Biomed ; 217: 106694, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35278813

ABSTRACT

BACKGROUND AND OBJECTIVE: Nowadays the "low sample size, large dimension" scenario is often encountered in genetics and in the omic sciences, where the microarray data is typically formed by a large number of possibly dependent small samples. Standard methods to solve the k-sample problem in such a setting are of limited applicability due to lack of theoretical validation for large k, lengthy computational times, missing software solutions, or inability to deal with statistical dependence among the samples. This paper presents the R package Equalden.HD to overcome the referred limitations. METHODS: The package implements several tests for the null hypothesis that a large number of samples follow a common density. These methods are particularly well suited to the "low sample size, large dimension" setting. The implemented procedures allow for dependent samples. For each method Equalden.HD reports, among other things, the standardized value of the test statistic and the corresponding p-value. The package also includes two high-dimensional genetic data sets, Hedenfalk and Rat, which are used in this paper for illustration purposes. RESULTS: The usage of Equalden.HD has been illustrated through the analysis of Hedenfalk and Rat genetic data. Statistical dependence among the samples was found for both genetic data sets. The application of an appropriate k-sample test within Equalden.HD rejected the null hypothesis of inter-samples homogeneity. The methods were used to test for the within groups homogeneity in cluster analysis too, which is usually performed when the k samples are found to be significantly different. Equalden.HD helped to identify the individuals which are responsible for the lack of homogeneity of the samples. The limitations of the standard Kruskal-Wallis test for the identification of homogeneous clusters have been highlighted. 
CONCLUSIONS: The methods implemented by Equalden.HD are the only omnibus nonparametric k-sample tests that have been validated as k grows. Furthermore, the package provides suitable corrections for possibly dependent samples, which is another distinctive feature. Thus, the package opens new doors for the statistical analysis of omic data. Limitations of standard methods (e.g. Anderson-Darling and Kruskal-Wallis) and existing software solutions in the setting with a large k have been emphasized.


Subjects
Software , Animals , Cluster Analysis , Rats , Sample Size
15.
Stat Med ; 41(11): 2025-2051, 2022 May 20.
Article in English | MEDLINE | ID: mdl-35124839

ABSTRACT

Censoring often occurs in data collection. This article considers nonparametric regression when the covariate is censored under general settings. In contrast to censoring in the response variable in survival analysis, regression with censored covariates is more challenging but less studied in the literature, especially for dependent censoring. We propose to estimate the regression function using conditional hazard rates. The asymptotic normality of our proposed estimator is established. Both theoretical results and simulation studies demonstrate that the proposed method is more efficient than the estimation based on complete observations and other methods, especially when the censoring rate is high. We illustrate the usefulness of the proposed method using a data set from the Framingham Heart Study and a data set from a randomized placebo-controlled clinical trial of the drug D-penicillamine.


Subjects
Penicillamine , Computer Simulation , Humans , Penicillamine/therapeutic use , Survival Analysis
16.
J Surv Stat Methodol ; 10(1): 1-24, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35083356

ABSTRACT

Data integration combining a probability sample with another nonprobability sample is an emerging area of research in survey sampling. We consider the case when the study variable of interest is measured only in the nonprobability sample, but comparable auxiliary information is available for both data sources. We consider mass imputation for the probability sample using the nonprobability data as the training set for imputation. The parametric mass imputation is sensitive to parametric model assumptions. To develop improved and robust methods, we consider nonparametric mass imputation for data integration. In particular, we consider kernel smoothing for a low-dimensional covariate and generalized additive models for a relatively high-dimensional covariate for imputation. Asymptotic theories and variance estimation are developed. Simulation studies and real applications show the benefits of our proposed methods over parametric counterparts.
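The mass imputation idea in this abstract can be sketched as follows: fit a kernel smoother on the nonprobability sample (where the study variable is observed), predict the study variable for each unit of the probability sample from its auxiliary covariate, then take a design-weighted mean of the predictions. This is only an illustrative sketch for a single covariate. The Nadaraya-Watson smoother with a Gaussian kernel, the weighted-mean form of the final estimator, and all function and argument names are assumptions, not the article's exact estimator.

```python
import math


def nw_predict(x0, xs, ys, bandwidth):
    """Nadaraya-Watson estimate of E[y | x = x0] with a Gaussian kernel.
    If x0 is very far from every training point, all weights can underflow
    to zero and the division fails; this sketch does not guard against that."""
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def mass_imputation_mean(prob_x, prob_weights, nonprob_x, nonprob_y, bandwidth):
    """Impute y for every probability-sample unit from the nonprobability
    sample, then return the survey-weighted mean of the imputed values."""
    imputed = [nw_predict(x, nonprob_x, nonprob_y, bandwidth) for x in prob_x]
    weighted_sum = sum(w * y for w, y in zip(prob_weights, imputed))
    return weighted_sum / sum(prob_weights)


# Toy example: y is observed only in the nonprobability sample.
nonprob_x = [0.0, 1.0, 2.0, 3.0]
nonprob_y = [0.1, 1.2, 1.9, 3.1]
prob_x = [0.5, 1.5, 2.5]
prob_weights = [10.0, 20.0, 10.0]  # hypothetical design weights
estimate = mass_imputation_mean(prob_x, prob_weights, nonprob_x, nonprob_y, 0.5)
```

Because each imputed value is a weighted average of the observed responses, the estimate always lies between the smallest and largest value of `nonprob_y`.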

17.
Biometrika ; 109(1): 195-208, 2022 Mar.
Article in English | MEDLINE | ID: mdl-37790796

ABSTRACT

Single-index models have gained increased popularity in time-to-event analysis owing to their model flexibility and advantage in dimension reduction. We propose a semiparametric framework for the rate function of a recurrent event counting process by modelling its size and shape components with single-index models. With additional monotone constraints on the two link functions for the size and shape components, the proposed model possesses the desired directional interpretability of covariate effects and encompasses many commonly used models as special cases. To tackle the analytical challenges arising from leaving the two link functions unspecified, we develop a two-step rank-based estimation procedure to estimate the regression parameters with or without informative censoring. The proposed estimators are asymptotically normal, with a root-n convergence rate. To guide model selection, we develop hypothesis testing procedures for checking shape and size independence. Simulation studies and a data example on a hematopoietic stem cell transplantation study are presented to illustrate the proposed methodology.

18.
J Adolesc Health ; 70(2): 322-328, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34756642

ABSTRACT

PURPOSE: The aim of this study is to compare 18 age-variant health risk factors between African-American (AA) and Caucasian American (CA) adolescent girls by constructing longitudinal predictive curves. METHODS: A total of 2,379 girls (51% AA) from ages 9 to 10 were recruited in the National Heart, Lung, and Blood Institute Growth and Health Study. The various health indicators and dietary habits of these girls were assessed annually for 10 years. We model 2nd, 5th, 95th, and 98th percentile values of the health risk factors to compare trajectories between AA and CA adolescents by employing novel kernel smoothing regression and global tests of equality for regression curves. Health risk factors such as dietary fiber, intake of sodium, sugar, and total calories, systolic blood pressure, weight, body fat percentage, and high-density lipoprotein levels were compared. RESULTS: Trajectories of sugar, sodium, and total calories intake and systolic blood pressure, weight, body fat percentage, and high-density lipoprotein among AA girls were significantly higher than those of CA girls throughout their adolescence. CONCLUSIONS: AA girls exhibit several health risk factors that are significantly higher than those of CA adolescent girls at the 95th and 98th percentile. Interventions may be warranted for the purposes of ensuring access to health risk information as well as a greater ease of access to healthier food choices within the educational food system.


Subjects
Energy Intake , White People , Adolescent , Black or African American , Body Mass Index , Child , Energy Intake/physiology , Female , Humans , Risk Factors
19.
Biometrics ; 78(2): 586-597, 2022 Jun.
Article in English | MEDLINE | ID: mdl-33559887

ABSTRACT

The local kernel pseudo-partial likelihood is employed for estimation in a panel count model with nonparametric covariate functions. An estimator of the derivative of the nonparametric covariate function is derived first, and the nonparametric function estimator is then obtained by integrating the derivative estimator. Uniform consistency rates and pointwise asymptotic normality are obtained for the local derivative estimator under some regularity conditions. Moreover, the baseline function estimator is shown to be uniformly consistent. Demonstration of the asymptotic results strongly relies on the modern empirical theory, which generally does not require the Poisson assumption. Simulation studies also illustrate that the local derivative estimator performs well in finite samples regardless of whether the Poisson assumption holds. We also apply the proposed methodology to analyze a clinical study on childhood wheezing.


Subjects
Statistical Models , Computer Simulation
20.
Sci Total Environ ; 811: 152334, 2022 Mar 10.
Article in English | MEDLINE | ID: mdl-34921882

ABSTRACT

The quantification of the SARS-CoV-2 RNA load in wastewater has emerged as a useful tool to monitor COVID-19 outbreaks in the community. This approach was implemented in the metropolitan area of A Coruña (NW Spain), where wastewater from a treatment plant was analyzed to track the epidemic dynamics in a population of 369,098 inhabitants. The viral load detected in the wastewater and the epidemiological data from the A Coruña health system served as the main sources for developing statistical models. The regression models described here allowed us to estimate the number of infected people (R² = 0.9), including symptomatic and asymptomatic individuals. These models have helped to understand the real magnitude of the epidemic in a population at any given time and have been used as an effective early warning tool for predicting outbreaks in A Coruña municipality. The methodology of the present work could be used to develop a similar wastewater-based epidemiological model to track the evolution of the COVID-19 epidemic anywhere in the world where centralized water-based sanitation systems exist.


Subjects
COVID-19 , SARS-CoV-2 , Epidemiological Models , Humans , Viral RNA , Spain/epidemiology , Viral Load , Wastewater