Results 1 - 20 of 84
1.
Heliyon ; 10(9): e30470, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38726202

ABSTRACT

Coastal terrestrial-aquatic interfaces (TAIs) are crucial contributors to global biogeochemical cycles and carbon exchange. Soil carbon dioxide (CO2) efflux in these transition zones is, however, poorly understood due to the high spatiotemporal dynamics of TAIs, as the various sub-ecosystems in this region are compressed and expanded by the complex influences of tides, changes in river levels, climate, and land use. We focus on the Chesapeake Bay region to (i) investigate the spatial heterogeneity of the coastal ecosystem and identify spatial zones with similar environmental characteristics based on spatial data layers, including vegetation phenology, climate, land cover, diversity, topography, soil properties, and relative tidal elevation; and (ii) understand the primary driving factors affecting soil respiration within sub-ecosystems of the coastal ecosystem. Specifically, we employed hierarchical clustering analysis to identify spatial regions with distinct environmental characteristics, followed by determination of the main driving factors using Random Forest regression and SHapley Additive exPlanations. Maximum and minimum temperature are the main drivers common to all sub-ecosystems, while each region also has additional unique major drivers that differentiate it from the others. Precipitation exerts an influence on vegetated lands, while soil pH holds importance specifically in forested lands. In croplands characterized by high clay content and low sand content, bulk density plays the most significant role. Wetlands demonstrate the importance of both elevation and sand content, with clay content being more relevant in non-inundated than in inundated wetlands. The topographic wetness index contributes significantly in mixed vegetation areas, including shrub, grass, pasture, and forest. Additionally, our research reveals that dense vegetation land covers and urban/developed areas exhibit distinct soil property drivers.
Overall, our research demonstrates an efficient method of employing various open-source remote sensing and GIS datasets to understand the spatial variability and soil respiration mechanisms in coastal TAIs. There is no one-size-fits-all approach to modeling carbon fluxes released by soil respiration in coastal TAIs, and our study highlights the importance of further research and monitoring practices to improve our understanding of carbon dynamics and promote the sustainable management of coastal TAIs.
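The zonation step described above — grouping locations with similar environmental characteristics before analyzing drivers per zone — can be sketched with standard open-source tools. The sketch below uses Ward hierarchical clustering on synthetic stand-ins for the environmental layers; the data, cluster count, and variable names are purely illustrative, and the Random Forest/SHAP driver-ranking step is omitted.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(42)

# Hypothetical stand-ins for per-location environmental layers
# (e.g. temperature, precipitation, elevation), one row per location.
X = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 3)),  # e.g. upland locations
    rng.normal(4.0, 1.0, size=(100, 3)),  # e.g. tidal-wetland locations
])

# Ward hierarchical clustering to delineate environmental zones
Z = linkage(X, method="ward")
zones = fcluster(Z, t=2, criterion="maxclust")
```

In practice each row would carry the stacked layer values for one map pixel (phenology, climate, land cover, topography, soil, tidal elevation), and the number of zones would be chosen from the dendrogram rather than fixed in advance.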

2.
Entropy (Basel) ; 26(1)2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38248176

ABSTRACT

Change points indicate significant shifts in the statistical properties of data streams at particular time points. Detecting change points efficiently and effectively is essential for understanding the underlying data-generating mechanism in modern data streams with versatile parameter-varying patterns. However, locating multiple change points in noisy data is a highly challenging problem. Although the Bayesian information criterion has been proven to be an effective way of selecting multiple change points in an asymptotic sense, its finite-sample performance can be deficient. In this article, we review information criterion-based methods for multiple change point detection, including the Akaike information criterion, the Bayesian information criterion, minimum description length, and their variants, with an emphasis on their practical applications. Simulation studies are conducted to investigate the actual performance of different information criteria in detecting multiple change points under possible model mis-specification. A case study on SCADA signals from wind turbines demonstrates the practical change point detection power of different information criteria. Finally, some key challenges in the development and application of multiple change point detection are presented for future research.
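As a minimal illustration of the criterion-based approach the review covers, the sketch below selects change points in a piecewise-constant mean signal by greedy binary segmentation, scoring each candidate count with a BIC-type criterion. It is a simplified sketch, not any specific method from the article, and the penalty form shown is one common choice.

```python
import numpy as np

def bic_changepoints(x, max_k=5):
    """Greedy binary segmentation of a piecewise-constant mean signal,
    with the number of change points chosen by a BIC-type criterion."""
    n = len(x)

    def sse(seg):
        return float(np.sum((seg - seg.mean()) ** 2))

    def best_split(lo, hi):
        # best single split of x[lo:hi]: (cost after split, split index)
        best_cost, best_t = sse(x[lo:hi]), None
        for t in range(lo + 1, hi):
            c = sse(x[lo:t]) + sse(x[t:hi])
            if c < best_cost:
                best_cost, best_t = c, t
        return best_cost, best_t

    cps, segs = [], [(0, n)]
    best_bic, best_cps = np.inf, []
    for k in range(max_k + 1):
        total = sum(sse(x[lo:hi]) for lo, hi in segs)
        sigma2 = max(total / n, 1e-12)  # guard against a perfect fit
        # k change-point locations + (k + 1) segment means -> 2k + 1 params
        bic = n * np.log(sigma2) + (2 * k + 1) * np.log(n)
        if bic < best_bic:
            best_bic, best_cps = bic, sorted(cps)
        gains = []  # (SSE reduction, segment index, split point)
        for i, (lo, hi) in enumerate(segs):
            cost, t = best_split(lo, hi)
            if t is not None:
                gains.append((sse(x[lo:hi]) - cost, i, t))
        if not gains:
            break
        _, i, t = max(gains)
        lo, hi = segs.pop(i)
        segs += [(lo, t), (t, hi)]
        cps.append(t)
    return best_cps
```

Swapping the `np.log(n)` penalty for other terms gives AIC- or MDL-flavored variants; as the article notes, their relative behavior depends on the noise level and on how well the model is specified.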

3.
BMC Bioinformatics ; 24(1): 322, 2023 Aug 26.
Article in English | MEDLINE | ID: mdl-37633901

ABSTRACT

BACKGROUND: The identification of genomic regions affected by selection is one of the most important goals in population genetics. If temporal data are available, allele frequency changes at SNP positions are often used for this purpose. Here we provide a new testing approach that uses haplotype frequencies instead of allele frequencies. RESULTS: Using simulated data, we show that, compared to SNP-based tests, our approach has higher power, especially when the number of candidate haplotypes is small or moderate. To improve power when the number of haplotypes is large, we investigate methods to combine them into a moderate number of haplotype subsets. Haplotype frequencies can often be recovered with less noise than SNP frequencies, especially under pool sequencing, giving our test an additional advantage. Furthermore, spurious outlier SNPs may lead to false positives, a problem usually not encountered when working with haplotypes. Post hoc tests for the number of selected haplotypes and for differences between their selection coefficients are also provided for a better understanding of the underlying selection dynamics. An application to a real data set further illustrates the performance benefits. CONCLUSIONS: Owing to a reduced multiple-testing burden and lower noise, haplotype-based testing outperforms SNP-based tests in terms of power in most scenarios.
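For intuition, a change in haplotype frequencies between two time points can be screened with a plain chi-square contingency test, as sketched below with hypothetical counts. This is a simpler stand-in for the article's dedicated haplotype-based statistic, which additionally accounts for drift and pool-sequencing noise.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical haplotype counts from pooled sequencing at two time
# points; rows are haplotypes, columns are generations.
counts = np.array([
    [120,  60],  # haplotype A
    [ 80, 140],  # haplotype B
    [100, 100],  # haplotype C
])

# Plain chi-square test for a change in haplotype frequencies
chi2, p, dof, expected = chi2_contingency(counts)
```

With three haplotypes instead of dozens of SNPs, only one test is run — which is exactly the reduced multiple-testing burden the conclusions point to.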


Subjects
Genomics, Single Nucleotide Polymorphism, Haplotypes, Gene Frequency
4.
Foodborne Pathog Dis ; 20(9): 414-418, 2023 09.
Article in English | MEDLINE | ID: mdl-37578455

ABSTRACT

CDC and health departments investigate foodborne disease outbreaks to identify a source. To generate and test hypotheses about vehicles, investigators typically compare exposure prevalence among case-patients with the general population using a one-sample binomial test. We propose a Bayesian alternative that also accounts for uncertainty in the estimate of exposure prevalence in the reference population. We compared exposure prevalence in a 2020 outbreak of Escherichia coli O157:H7 illnesses linked to leafy greens with 2018-2019 FoodNet Population Survey estimates. We ran prospective simulations using our Bayesian approach at three time points during the investigation. The posterior probability that leafy green consumption prevalence was higher than the general population prevalence increased as additional case-patients were interviewed. Probabilities were >0.70 for multiple leafy green items 2 weeks before the exact binomial p-value was statistically significant. A Bayesian approach to assessing exposure prevalence among cases could be superior to the one-sample binomial test typically used during foodborne outbreak investigations.
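The Bayesian comparison described above can be sketched with a conjugate Beta-Binomial model: place Beta(1, 1) priors on the case and reference exposure prevalences and compute the posterior probability that the former exceeds the latter by Monte Carlo. The counts below are hypothetical, and the article's exact prior and survey-uncertainty handling may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_prob_higher(k_case, n_case, k_ref, n_ref, draws=100_000):
    """Posterior probability that the exposure prevalence among
    case-patients exceeds that in the reference population, under
    independent Beta(1, 1) priors and binomial sampling for both groups."""
    p_case = rng.beta(1 + k_case, 1 + n_case - k_case, size=draws)
    p_ref = rng.beta(1 + k_ref, 1 + n_ref - k_ref, size=draws)
    return float(np.mean(p_case > p_ref))
```

With hypothetical counts — 18 of 20 case-patients reporting leafy greens against a survey estimate of 500 of 1000 — the posterior probability is essentially 1, while 10 of 20 gives a probability near 0.5. Unlike the one-sample binomial test, the reference prevalence here carries its own posterior uncertainty.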


Subjects
Escherichia coli O157, Foodborne Diseases, Humans, Bayes Theorem, Prevalence, Foodborne Diseases/epidemiology, Disease Outbreaks
5.
Stat Methods Med Res ; 32(8): 1559-1575, 2023 08.
Article in English | MEDLINE | ID: mdl-37325816

ABSTRACT

Nonlinear mixed effects models have been widely applied to analyses of data that arise from biological, agricultural, and environmental sciences. Estimation of and inference on parameters in nonlinear mixed effects models are often based on the specification of a likelihood function. Maximizing this likelihood function can be complicated by the specification of the random effects distribution, especially in the presence of multiple random effects. The implementation of nonlinear mixed effects models can be further complicated by left-censored responses, representing measurements from bioassays where the exact quantification below a certain threshold is not possible. Motivated by the need to characterize the nonlinear human immunodeficiency virus RNA viral load trajectories after the interruption of antiretroviral therapy, we propose a smoothed simulated pseudo-maximum likelihood estimation approach to fit nonlinear mixed effects models in the presence of left-censored observations. We establish the consistency and asymptotic normality of the resulting estimators. We develop testing procedures for the correlation among random effects and for testing the distributional assumptions on random effects against a specific alternative. In contrast to the existing variants of expectation-maximization approaches, the proposed methods offer flexibility in the specification of the random effects distribution and convenience in making inference about higher-order correlation parameters. We evaluate the finite-sample performance of the proposed methods through extensive simulation studies and illustrate them on a combined dataset from six AIDS Clinical Trials Group treatment interruption studies.


Subjects
HIV Infections, Humans, Likelihood Functions, Computer Simulation, HIV Infections/drug therapy, Nonlinear Dynamics, Statistical Models
6.
Front Psychiatry ; 14: 1102811, 2023.
Article in English | MEDLINE | ID: mdl-36970281

ABSTRACT

Background: A rapidly growing body of literature has revealed the mediating role of DNA methylation in the path from childhood maltreatment to psychiatric disorders such as post-traumatic stress disorder (PTSD) in adulthood. However, the statistical methodology is challenging, and powerful mediation analyses regarding this issue are lacking. Methods: To study how childhood maltreatment induces long-lasting DNA methylation changes that in turn affect adult PTSD, we carried out a gene-based mediation analysis from a composite-null-hypothesis perspective in the Grady Trauma Project (352 participants and 16,565 genes), with childhood maltreatment as exposure, multiple DNA methylation sites as mediators, and PTSD or its relevant scores as outcome. We addressed the challenge of gene-based mediation analysis by taking its composite null hypothesis testing nature into consideration and fitting a weighted test statistic. Results: We found that childhood maltreatment substantially affected PTSD and PTSD-related scores, and that childhood maltreatment was associated with DNA methylation, which in turn played a significant role in PTSD and these scores. Furthermore, using the proposed mediation method, we identified multiple genes whose DNA methylation sites mediated the path from childhood maltreatment to PTSD-relevant scores in adulthood: 13 for the Beck Depression Inventory and 6 for the modified PTSD Symptom Scale. Conclusion: Our results can confer meaningful insight into the biological mechanism underlying the impact of early adverse experience on adult disease, and the proposed mediation methods can be applied to other similar analysis settings.

7.
Entropy (Basel) ; 25(2)2023 Jan 28.
Article in English | MEDLINE | ID: mdl-36832605

ABSTRACT

In this paper, we focus on the homogeneity test that evaluates whether two multivariate samples come from the same distribution. This problem arises naturally in various applications, and many methods are available in the literature. Several depth-based tests have been proposed for this problem, but they may not be very powerful. In light of the recent development of data depth as an important measure in quality assurance, we propose two new test statistics for the multivariate two-sample homogeneity test. The proposed test statistics have the same χ2(1) asymptotic null distribution. The generalization of the proposed tests to the multivariate multisample situation is discussed as well. Simulation studies demonstrate the superior performance of the proposed tests. The test procedure is illustrated through two real data examples.
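Data depth assigns each point a centrality score with respect to a sample; deeper points are more central. A minimal sketch using the classical Mahalanobis depth is shown below — the article's test statistics are built from depth values but use their own constructions, so this is illustration only.

```python
import numpy as np

def mahalanobis_depth(points, data):
    """Mahalanobis depth: D(x) = 1 / (1 + (x - mu)' S^-1 (x - mu)),
    computed with the sample mean and covariance of `data`.
    Values lie in (0, 1]; larger means more central."""
    mu = data.mean(axis=0)
    s_inv = np.linalg.inv(np.cov(data, rowvar=False))
    diff = np.atleast_2d(points) - mu
    d2 = np.einsum("ij,jk,ik->i", diff, s_inv, diff)
    return 1.0 / (1.0 + d2)
```

Depth-based two-sample tests compare how deep the points of one sample sit within the other: under homogeneity the two depth distributions should look alike.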

8.
J Appl Stat ; 50(3): 495-511, 2023.
Article in English | MEDLINE | ID: mdl-36819081

ABSTRACT

Network (graph) data analysis is a popular research topic in statistics and machine learning. In applications, one is frequently confronted with graph two-sample hypothesis testing, where the goal is to test for a difference between two graph populations. Several statistical tests have been devised for this purpose in the context of binary graphs. However, many practical networks are weighted, and existing procedures cannot be directly applied to weighted graphs. In this paper, we study the weighted graph two-sample hypothesis testing problem and propose a practical test statistic. We prove that the proposed test statistic converges in distribution to the standard normal distribution under the null hypothesis and analyze its power theoretically. A simulation study shows that the proposed test performs satisfactorily and substantially outperforms its existing counterpart in the binary graph case. A real data application is provided to illustrate the method.

9.
Int J Biostat ; 19(1): 1-19, 2023 05 01.
Article in English | MEDLINE | ID: mdl-35749155

ABSTRACT

It has been reported that about half of biological discoveries are irreproducible. These irreproducible discoveries have been partially attributed to poor statistical power, which is largely due to small sample sizes. However, in molecular biology and medicine, because of limited biological resources and budgets, most molecular biological experiments are conducted with small samples. The two-sample t-test controls bias through its degrees of freedom, but this also means the t-test has low power in small samples. A discovery made with low statistical power suggests poor reproducibility, so raising statistical power is not a feasible way to enhance reproducibility in small-sample experiments. An alternative is to reduce the type I error rate. To this end, a so-called tα-test was developed. Both theoretical analysis and simulation studies demonstrate that the tα-test substantially outperforms the t-test; however, the tα-test reduces to the t-test when sample sizes exceed 15. Large-scale simulation studies and real experimental data show that the tα-test significantly reduced the type I error rate compared to the t-test and the Wilcoxon test in small-sample experiments, while retaining almost the same empirical power as the t-test. The null p-value density distribution explains why the tα-test has a much lower type I error rate than the t-test. One real experimental dataset provides a typical example showing that the tα-test outperforms the t-test, and a microarray dataset shows that the tα-test had the best performance among five statistical methods. In addition, the density distribution and cumulative probability function of the tα-statistic were derived mathematically, and the theoretical and observed distributions match well.
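The small-sample setting can be explored by simulating the type I error rate of the ordinary two-sample t-test under the null, which is the baseline the tα-test is designed to improve on. The sketch below uses the standard t-test from SciPy; the tα-test itself is not implemented here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def t_test_type1_rate(n=5, alpha=0.05, reps=20_000):
    """Empirical type I error rate of the two-sample t-test when both
    samples of size n are drawn from the same N(0, 1) distribution."""
    x = rng.normal(size=(reps, n))
    y = rng.normal(size=(reps, n))
    _, p = stats.ttest_ind(x, y, axis=1)
    return float(np.mean(p < alpha))
```

At n = 5 the realized rate sits close to the nominal 0.05; the article's point is that a stricter test can push this rate lower in the small-sample regime without sacrificing much power.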


Subjects
Statistical Models, Reproducibility of Results, Computer Simulation, Likelihood Functions, Sample Size
10.
Stat Med ; 42(1): 68-88, 2023 01 15.
Article in English | MEDLINE | ID: mdl-36372072

ABSTRACT

The primary benefit of identifying a valid surrogate marker is the ability to use it in a future trial to test for a treatment effect with shorter follow-up time or less cost. However, previous work has demonstrated potential heterogeneity in the utility of a surrogate marker. When such heterogeneity exists, existing methods that use the surrogate to test for a treatment effect while ignoring this heterogeneity may lead to inaccurate conclusions about the treatment effect, particularly when the patient population in the new study has a different mix of characteristics than the study used to evaluate the utility of the surrogate marker. In this article, we develop a novel test for a treatment effect using surrogate marker information that accounts for heterogeneity in the utility of the surrogate. We compare our testing procedure to a test that uses primary outcome information (gold standard) and a test that uses surrogate marker information, but ignores heterogeneity. We demonstrate the validity of our approach and derive the asymptotic properties of our estimator and variance estimates. Simulation studies examine the finite sample properties of our testing procedure and demonstrate when our proposed approach can outperform the testing approach that ignores heterogeneity. We illustrate our methods using data from an AIDS clinical trial to test for a treatment effect using CD4 count as a surrogate marker for RNA.


Subjects
Computer Simulation, Humans, Biomarkers, CD4 Lymphocyte Count
11.
Stat Probab Lett ; 193, 2023 Feb.
Article in English | MEDLINE | ID: mdl-38584807

ABSTRACT

This work defines a new correction for the likelihood ratio test for a two-sample problem within the multivariate normal context. This correction applies to decomposable graphical models, where testing equality of distributions can be decomposed into lower dimensional problems.

12.
Front Genet ; 13: 1009428, 2022.
Article in English | MEDLINE | ID: mdl-36468009

ABSTRACT

Combining SNP p-values from GWAS summary data is a promising strategy for detecting novel genetic factors. Existing statistical methods for p-value-based SNP-set testing confront two challenges. First, the statistical power of different methods depends on unknown patterns of genetic effects that can vary drastically across SNP sets. Second, they do not identify which SNPs primarily contribute to the global association of the whole set. We propose a new signal-adaptive analysis pipeline to address these challenges using the omnibus thresholding Fisher's method (oTFisher). The oTFisher remains robustly powerful over various patterns of genetic effects. Its adaptive thresholding can be applied to estimate the important SNPs contributing to the overall significance of the given SNP set. We develop efficient calculation algorithms to control the type I error rate, which account for the linkage disequilibrium among SNPs. Extensive simulations show that the oTFisher has robustly high power and provides higher balanced accuracy in screening SNPs than the traditional Bonferroni and FDR procedures. We applied the oTFisher to study the genetic association of genes and haplotype blocks with bone density-related traits using the summary data of the Genetic Factors for Osteoporosis Consortium. The oTFisher identified more novel and literature-reported genetic factors than existing p-value combination methods. The relevant computation has been implemented in the R package TFisher to support similar data analyses.
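The oTFisher builds on Fisher's classical method for combining independent p-values: T = -2 Σ log p_i follows a chi-square distribution with 2k degrees of freedom under the global null. A sketch of the base statistic (without the omnibus thresholding or LD adjustment the article adds) follows.

```python
import numpy as np
from scipy import stats

def fisher_combine(pvals):
    """Fisher's method: T = -2 * sum(log p_i) is chi-square with
    2k degrees of freedom when the k p-values are independent and
    the global null holds."""
    pvals = np.asarray(pvals, dtype=float)
    stat = float(-2.0 * np.sum(np.log(pvals)))
    return stat, float(stats.chi2.sf(stat, df=2 * len(pvals)))
```

For example, `fisher_combine([0.02, 0.30, 0.15])` returns one global p-value for the whole SNP set; the thresholded variants in the article instead sum over only the smallest p-values, with the cutoff chosen adaptively.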

13.
Front Psychol ; 13: 980261, 2022.
Article in English | MEDLINE | ID: mdl-36533060

ABSTRACT

The identification of an empirically adequate theoretical construct requires determining whether a theoretically predicted effect is sufficiently similar to an observed effect. To this end, we propose a simple similarity measure, describe its application in different research designs, and use computer simulations to estimate the necessary sample size for a given observed effect. As our main example, we apply this measure to recent meta-analytical research on precognition. Results suggest that the evidential basis is too weak for a predicted precognition effect of d = 0.20 to be considered empirically adequate. As additional examples, we apply this measure to object-level experimental data from dissonance theory and a recent crowdsourcing hypothesis test, as well as to meta-analytical data on the correlation of personality traits and life outcomes.

14.
Prev Med ; 164: 107127, 2022 11.
Article in English | MEDLINE | ID: mdl-35787846

ABSTRACT

It is well known that the statistical analyses in health-science and medical journals are frequently misleading or even wrong. Despite many decades of reform efforts by hundreds of scientists and statisticians, attempts to fix the problem by avoiding obvious error and encouraging good practice have not altered this basic situation. Statistical teaching and reporting remain mired in damaging yet editorially enforced jargon of "significance", "confidence", and imbalanced focus on null (no-effect or "nil") hypotheses, leading to flawed attempts to simplify descriptions of results in ordinary terms. A positive development amidst all this has been the introduction of interval estimates alongside or in place of significance tests and P-values, but intervals have been beset by similar misinterpretations. Attempts to remedy this situation by calling for replacement of traditional statistics with competitors (such as pure-likelihood or Bayesian methods) have had little impact. Thus, rather than ban or replace P-values or confidence intervals, we propose to replace traditional jargon with more accurate and modest ordinary-language labels that describe these statistics as measures of compatibility between data and hypotheses or models, which have long been in use in the statistical modeling literature. Such descriptions emphasize the full range of possibilities compatible with observations. Additionally, a simple transform of the P-value called the surprisal or S-value provides a sense of how much or how little information the data supply against those possibilities. We illustrate these reforms using some examples from a highly charged topic: trials of ivermectin treatment for Covid-19.
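The surprisal transform mentioned above is a one-liner: s = -log2(p) converts a p-value into bits of information against the test model, so p = 0.05 supplies about 4.3 bits — roughly the surprise of four consecutive heads from a fair coin.

```python
import math

def s_value(p):
    """Surprisal (S-value) transform of a p-value: the number of bits
    of information the data supply against the test model, -log2(p)."""
    return -math.log2(p)
```

For instance, `s_value(0.25)` is exactly 2 bits — no more surprising than two consecutive heads — which illustrates the modest, compatibility-oriented language the authors advocate.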


Subjects
COVID-19, Humans, Statistical Data Interpretation, Bayes Theorem, COVID-19/prevention & control, Probability, Statistical Models, Confidence Intervals
15.
Stat Med ; 41(17): 3349-3364, 2022 07 30.
Article in English | MEDLINE | ID: mdl-35491388

ABSTRACT

We propose an inferential framework for fixed effects in longitudinal functional models and introduce tests for the correlation structures induced by the longitudinal sampling procedure. The framework provides a natural extension of standard longitudinal correlation models for scalar observations to functional observations. Using simulation studies, we compare fixed effects estimation under correctly and incorrectly specified correlation structures and also test the longitudinal correlation structure. Finally, we apply the proposed methods to a longitudinal functional dataset on physical activity. The computer code for the proposed method is available at https://github.com/rli20ST758/FILF.


Subjects
Exercise, Research Design, Computer Simulation, Humans, Longitudinal Studies
16.
Stat Med ; 41(13): 2417-2426, 2022 06 15.
Article in English | MEDLINE | ID: mdl-35253259

ABSTRACT

Testing a global null hypothesis that there are no significant predictors for a binary outcome of interest among a large set of biomarker measurements is an important task in biomedical studies. We seek to improve the power of such testing methods by leveraging ensemble machine learning methods. Ensemble machine learning methods such as random forest, bagging, and adaptive boosting model the relationship between the outcome and the predictor nonparametrically, while stacking combines the strength of multiple learners. We demonstrate the power of the proposed testing methods through Monte Carlo studies and show the use of the methods by applying them to the immunologic biomarkers dataset from the RV144 HIV vaccine efficacy trial.
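A global-null test of this kind can be sketched as a permutation test: compute a summary statistic of outcome-predictor association, then recompute it under random relabelings of the outcome. The sketch below uses a simple max-absolute-correlation statistic as a stand-in for the ensemble-learner-based statistics the article studies.

```python
import numpy as np

rng = np.random.default_rng(7)

def global_null_perm_test(X, y, n_perm=1000):
    """Permutation test of the global null 'no predictor is associated
    with the outcome', using the maximum absolute predictor-outcome
    correlation as the test statistic."""
    def max_abs_corr(yv):
        yc = yv - yv.mean()
        xc = X - X.mean(axis=0)
        r = (xc * yc[:, None]).sum(axis=0) / np.sqrt(
            (xc ** 2).sum(axis=0) * (yc ** 2).sum())
        return float(np.abs(r).max())

    observed = max_abs_corr(y)
    null = np.array([max_abs_corr(rng.permutation(y))
                     for _ in range(n_perm)])
    return observed, float(np.mean(null >= observed))
```

Because the null distribution is generated by permutation, the procedure stays valid whatever statistic is plugged in — including scores from random forests, boosting, or a stacked ensemble, as in the article.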


Subjects
Machine Learning, Humans
17.
Sensors (Basel) ; 22(2)2022 Jan 07.
Article in English | MEDLINE | ID: mdl-35062397

ABSTRACT

Distinguishing between wireless and wired traffic in a network middlebox is an essential ingredient for numerous applications, including security monitoring and quality-of-service (QoS) provisioning. The majority of existing approaches have exploited the larger delay statistics, such as round-trip time and inter-packet arrival time, observed in wireless traffic to infer whether traffic originates from Ethernet (i.e., wired) or Wi-Fi (i.e., wireless), based on the assumption that the capacity of the wireless link is much lower than that of the wired link. However, this underlying assumption is no longer valid given the increase of wireless data rates beyond 1 Gbps enabled by recent Wi-Fi technologies such as 802.11ac/ax. In this paper, we revisit the problem of identifying Wi-Fi traffic in network middleboxes as the wireless link capacity approaches that of the wired link. We present Weigh-in-Motion, a lightweight online detection scheme that analyzes the traffic patterns observed at the middleboxes and infers whether the traffic originates from high-speed Wi-Fi devices. To this end, we introduce the concept of an ACKBunch, which captures the unique characteristics of high-speed Wi-Fi and is used to distinguish whether the observed traffic originates from a wired or wireless device. The effectiveness of the proposed scheme is evaluated via extensive real experiments, demonstrating its capability of accurately identifying wireless traffic from/to Gigabit 802.11 devices.

18.
Multivariate Behav Res ; 57(5): 767-783, 2022.
Article in English | MEDLINE | ID: mdl-33827347

ABSTRACT

The multivariate normal linear model is one of the most widely employed models for statistical inference in applied research. Special cases include (multivariate) t testing, (M)AN(C)OVA, (multivariate) multiple regression, and repeated measures analysis. However, statistical criteria are limited for model selection problems in which models may have equality as well as order constraints on the model parameters, based on scientific expectations. This paper presents a default Bayes factor for this inference problem using fractional Bayes methodology. Group-specific fractions are used to properly control prior information. Furthermore, the fractional prior is centered on the boundary of the constrained space to properly evaluate order-constrained models. The criterion enjoys various important properties under a broad set of testing problems. The methodology is readily usable via the R package 'BFpack'. Applications from the social and medical sciences are provided to illustrate the methodology.


Subjects
Statistical Models, Motivation, Bayes Theorem, Linear Models, Multivariate Analysis
19.
Pharm Stat ; 21(1): 133-149, 2022 01.
Article in English | MEDLINE | ID: mdl-34350678

ABSTRACT

In multiregional randomized clinical trials (MRCTs), determining the regional treatment effect of a new treatment relative to an existing one is important to both the sponsor and the relevant regulatory agencies. Of particular interest is testing the null hypothesis that the treatment benefit is the same across all regions. Existing methods are mainly designed for continuous endpoints and use parametric models, which are not robust. MRCTs are known to face increased variation and heterogeneity, so a robust model for their design and analysis is desirable. We consider clinical trials with a binary primary endpoint and propose a robust semiparametric logistic model with a known parametric component and an unknown nonparametric component. The parametric component represents our prior knowledge about the model, and the nonparametric part reflects uncertainty. Compared to the classic logistic model for this problem, the proposed model has the following advantages: it is robust to model assumptions, more flexible and accurate in modeling the relationship between the response and covariates, and can yield more accurate parameter estimates. The model parameters are estimated by a profile maximum likelihood approach, and the null hypothesis of equal regional treatment differences is tested by the profile likelihood ratio statistic. Asymptotic properties of the estimates are derived. Simulation studies are conducted to evaluate the performance of the proposed model, demonstrating clear advantages over the classic logistic model. The method is then applied to the analysis of a real MRCT.


Subjects
Statistical Models, Computer Simulation, Humans, Likelihood Functions, Logistic Models, Randomized Controlled Trials as Topic
20.
Sensors (Basel) ; 23(1)2022 Dec 22.
Article in English | MEDLINE | ID: mdl-36616684

ABSTRACT

This paper addresses the problem of disentangling nonoverlapping multicomponent signals from an observation possibly contaminated by external additive noise. We aim to extract and retrieve the elementary components (also called modes) present in an observed nonstationary mixture signal. To this end, we propose a new pseudo-Bayesian algorithm to estimate the instantaneous frequency of the signal modes from their time-frequency representation. Second, a detection algorithm is developed to restrict the time region in which each signal component is active, enhancing the quality of the reconstructed signal. Finally, we deal with the presence of noise in the vicinity of the estimated instantaneous frequency by introducing a new reconstruction approach relying on nonbinary band-pass synthesis filters. We validate our methods by comparing their reconstruction performance to state-of-the-art approaches through several experiments involving both synthetic and real-world data under different experimental conditions.
