Results 1 - 20 of 66
1.
Entropy (Basel) ; 26(6)2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38920507

ABSTRACT

Many semiparametric spatial autoregressive (SSAR) models have been used to analyze spatial data in a variety of applications; however, heteroscedasticity often occurs in spatial data analysis. Therefore, the SSAR models considered in this paper allow the variance parameters to depend on the explanatory variable; these are called heterogeneous semiparametric spatial autoregressive models. To estimate the model parameters, a Bayesian estimation method is proposed for heterogeneous SSAR models based on B-spline approximations of the nonparametric function. Then, we develop an efficient Markov chain Monte Carlo sampling algorithm, based on the Gibbs sampler and the Metropolis-Hastings algorithm, that can be used to generate samples from the posterior distributions and perform posterior inference. Finally, simulation studies and a real data analysis of Boston housing data demonstrate the excellent performance of the proposed Bayesian method.
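
The combination of Gibbs and Metropolis-Hastings updates mentioned in the abstract above can be illustrated on a toy problem. The following is a minimal sketch only, not the authors' SSAR sampler: it assumes i.i.d. normal data with unknown mean `mu` and log-variance `g`, a flat prior on `mu`, and a N(0, 10^2) prior on `g`. All names (`log_post_g`, `metropolis_within_gibbs`, `step`) are hypothetical.

```python
import math
import random


def log_post_g(g, mu, y):
    """Log conditional density of g = log(sigma^2) given mu, up to a constant."""
    n = len(y)
    ss = sum((yi - mu) ** 2 for yi in y)
    log_lik = -0.5 * n * g - 0.5 * ss / math.exp(g)
    log_prior = -0.5 * (g / 10.0) ** 2  # assumed N(0, 10^2) prior on g
    return log_lik + log_prior


def metropolis_within_gibbs(y, n_iter=5000, step=0.3, seed=0):
    """Alternate an exact Gibbs draw for mu with a random-walk MH update for g."""
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    mu, g = ybar, 0.0
    draws = []
    for _ in range(n_iter):
        # Gibbs step: under a flat prior, mu | g, y ~ N(ybar, exp(g) / n).
        mu = rng.gauss(ybar, math.sqrt(math.exp(g) / n))
        # Metropolis-Hastings step: symmetric random-walk proposal for g.
        g_new = g + rng.gauss(0.0, step)
        log_ratio = log_post_g(g_new, mu, y) - log_post_g(g, mu, y)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            g = g_new
        draws.append((mu, math.exp(g)))  # store (mean, variance)
    return draws
```

The mean has a closed-form full conditional, so it is drawn directly. The log-variance does not under the assumed prior, so it is updated with an accept/reject step inside the same sweep.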

2.
Biomolecules ; 14(6)2024 May 29.
Article in English | MEDLINE | ID: mdl-38927043

ABSTRACT

DNA methylation plays an essential role in regulating gene activity, modulating disease risk, and determining treatment response. We can obtain insight into methylation patterns at a single-nucleotide level via next-generation sequencing technologies. However, complex features inherent in the data obtained via these technologies pose challenges beyond the typical big data problems. Identifying differentially methylated cytosines (dmc) or regions is one such challenge. We have developed DMCFB, an efficient dmc identification method based on Bayesian functional regression, to tackle these challenges. Using simulations, we establish that DMCFB outperforms current methods and results in better smoothing and efficient imputation. We analyzed a dataset of patients with acute promyelocytic leukemia and control samples. With DMCFB, we discovered many new dmcs and, more importantly, exhibited enhanced consistency of differential methylation within islands and their adjacent shores. Additionally, we detected differential methylation at more of the binding sites of the fused gene involved in this cancer.


Subjects
Bayes Theorem; DNA Methylation; Epigenesis, Genetic; DNA Methylation/genetics; Humans; Leukemia, Promyelocytic, Acute/genetics
3.
Spat Spatiotemporal Epidemiol ; 49: 100645, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38876555

ABSTRACT

The use of Bayesian inference Using Gibbs Sampling (BUGS) for modelling infectious diseases has grown notably over the last two decades, in parallel with advances in computing and model development. The ability of BUGS to easily implement the Markov chain Monte Carlo (MCMC) method brought Bayesian analysis into the mainstream of infectious disease modelling. However, with existing software that runs MCMC for Bayesian inference, fitting becomes challenging, especially in terms of computational complexity, when infectious disease models grow more complex with spatial and temporal components, an increasing number of parameters, and large datasets. This study investigates two alternative subscripting strategies for creating models in the Just Another Gibbs Sampler (JAGS) environment and their performance in terms of run times. Our results are useful for practitioners to ensure the efficiency and timely implementation of Bayesian spatiotemporal infectious disease modelling.


Subjects
Bayes Theorem; Markov Chains; Spatio-Temporal Analysis; Humans; Epidemiological Models; Monte Carlo Method; Software; Communicable Diseases/epidemiology
4.
Artif Intell Med ; 149: 102784, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38462284

ABSTRACT

Bayesian networks (BNs) are suitable models for studying complex interdependencies between multiple health outcomes simultaneously. However, these models violate the assumption of independent observations in the case of hierarchical data. Therefore, this study proposes two- and three-level random-intercept multilevel Bayesian network (MBN) models to study the conditional dependencies between multiple outcomes. The structure of the MBN was learned using the connected three parent set block Gibbs sampler, where each local network was included based on the Bayesian information criterion (BIC) score of a multilevel regression. These models were examined using simulated data assuming features of both multilevel models and BNs. The estimated areas under the receiver operating characteristic curve for both models were above 0.8, indicating good fit. The MBN was then applied to real child morbidity data from the 2016 Ethiopian Demographic Health Survey (EDHS). The results show complex causal dependencies between malnutrition indicators and child morbidities such as anemia, acute respiratory infection (ARI) and diarrhea. According to these results, families and health professionals should give special attention to children who suffer from malnutrition and also have one of these illnesses, as the co-occurrence of both can worsen the health of a child.


Subjects
Anemia; Malnutrition; Child; Humans; Bayes Theorem; Morbidity; ROC Curve
5.
J Anim Sci Biotechnol ; 14(1): 119, 2023 Sep 09.
Article in English | MEDLINE | ID: mdl-37684681

ABSTRACT

BACKGROUND: Many phenotypes in animal breeding are derived from incomplete measures, especially if they are challenging or expensive to measure precisely. Examples include time-dependent traits such as reproductive status, or lifespan. Incomplete measures for these traits result in phenotypes that are subject to left-, interval- and right-censoring, where phenotypes are only known to fall below an upper bound, between a lower and upper bound, or above a lower bound respectively. Here we compare three methods for deriving phenotypes from incomplete data using age at first elevation (> 1 ng/mL) in blood plasma progesterone (AGEP4), which generally coincides with onset of puberty, as an example trait. METHODS: We produced AGEP4 phenotypes from three blood samples collected at about 30-day intervals from approximately 5,000 Holstein-Friesian or Holstein-Friesian × Jersey cross-bred dairy heifers managed in 54 seasonal-calving, pasture-based herds in New Zealand. We used these actual data to simulate 7 different visit scenarios, increasing the extent of censoring by disregarding data from one or two of the three visits. Three methods for deriving phenotypes from these data were explored: 1) ordinal categorical variables which were analysed using categorical threshold analysis; 2) continuous variables, with a penalty of 31 d assigned to right-censored phenotypes; and 3) continuous variables, sampled from within a lower and upper bound using a data augmentation approach. RESULTS: Credibility intervals for heritability estimations overlapped across all methods and visit scenarios, but estimated heritabilities tended to be higher when left censoring was reduced. For sires with at least 5 daughters, the correlations between estimated breeding values (EBVs) from our three-visit scenario and each reduced data scenario varied by method, ranging from 0.65 to 0.95. 
The estimated breed effects also varied by method, but breed differences were smaller as phenotype censoring increased. CONCLUSION: Our results indicate that using some methods, phenotypes derived from one observation per offspring for a time-dependent trait such as AGEP4 may provide comparable sire rankings to three observations per offspring. This has implications for the design of large-scale phenotyping initiatives where animal breeders aim to estimate variance parameters and estimated breeding values (EBVs) for phenotypes that are challenging to measure or prohibitively expensive.

6.
Comput Stat ; 38(2): 647-674, 2023.
Article in English | MEDLINE | ID: mdl-37223721

ABSTRACT

Topic models are a useful and popular method to find latent topics of documents. However, the short and sparse texts in social media micro-blogs such as Twitter are challenging for the most commonly used Latent Dirichlet Allocation (LDA) topic model. We compare the performance of the standard LDA topic model with the Gibbs Sampler Dirichlet Multinomial Model (GSDMM) and the Gamma Poisson Mixture Model (GPM), which are specifically designed for sparse data. To compare the performance of the three models, we propose the simulation of pseudo-documents as a novel evaluation method. In a case study with short and sparse text, the models are evaluated on tweets filtered by keywords relating to the Covid-19 pandemic. We find that standard coherence scores that are often used for the evaluation of topic models perform poorly as an evaluation metric. The results of our simulation-based approach suggest that the GSDMM and GPM topic models may generate better topics than the standard LDA model.

7.
Biom J ; 65(5): e2200231, 2023 06.
Article in English | MEDLINE | ID: mdl-36908004

ABSTRACT

Several penalization approaches have been developed to identify homogeneous subgroups based on a regression model with subject-specific intercepts in subgroup analysis. These methods often apply concave penalty functions to pairwise comparisons of the intercepts, such that subjects with similar intercept values are assigned to the same group, which is very similar to the procedure of penalization approaches for variable selection. Since Bayesian methods are commonly used in variable selection, it is worth considering the corresponding approaches to subgroup analysis in the Bayesian framework. In this paper, a Bayesian hierarchical model with appropriate prior structures is developed for the pairwise differences of intercepts based on a regression model with subject-specific intercepts, which can automatically detect and identify homogeneous subgroups. A Gibbs sampling algorithm is also provided to select the hyperparameter and estimate the intercepts and coefficients of the covariates simultaneously, which is computationally efficient for pairwise comparisons compared to the time-consuming parameter estimation procedures of the penalization methods (e.g., the alternating direction method of multipliers) in the case of large sample sizes. The effectiveness and usefulness of the proposed Bayesian method are evaluated through simulation studies and an analysis of the Cleveland Heart Disease dataset.


Subjects
Algorithms; Humans; Bayes Theorem; Computer Simulation; Sample Size
8.
Environ Pollut ; 321: 121061, 2023 Mar 15.
Article in English | MEDLINE | ID: mdl-36702429

ABSTRACT

We present a methodology to identify multiple pollutant sources in the atmosphere that combines a data-driven dispersion model with Bayesian inference and uncertainty quantification. The dispersion model accounts for a realistic wind field based on the output of a multivariate dynamic linear model (DLM), estimated from time series of measured wind components. The forward problem solution, described by an adjoint transient advection-diffusion partial differential equation, is then obtained using an appropriately stabilized finite element formulation. The Bayesian inference tool accounts for uncertainty in the concentration data and automatically sets the balance between the prior and the likelihood. The source parameters are estimated by a Metropolis-within-Gibbs Markov chain Monte Carlo (MCMC) algorithm with adaptive steps. The MCMC algorithm is initialized with a maximum a posteriori estimator obtained with particle swarm optimization to accelerate convergence. Finally, the proposed methodology appears to outperform inversion techniques from previous works.


Subjects
Models, Statistical; Wind; Bayes Theorem; Algorithms; Probability; Monte Carlo Method
9.
Methods Mol Biol ; 2426: 119-129, 2023.
Article in English | MEDLINE | ID: mdl-36308687

ABSTRACT

Missing values caused by the limit of detection or quantification (LOD/LOQ) are widely observed in mass spectrometry (MS)-based omics studies and can be recognized as missing not at random (MNAR). MNAR leads to biased statistical estimates and jeopardizes downstream analyses. Although a wide range of missing value imputation methods have been developed for omics studies, only a limited number were designed appropriately for the MNAR situation. To facilitate MS-based omics studies, we introduce GSimp, a Gibbs sampler-based missing value imputation approach, to deal with left-censored missing values in MS-proteomics datasets. In this chapter, we explain MNAR and describe the usage of GSimp for MNAR in detail.


Subjects
Algorithms; Proteomics; Mass Spectrometry/methods; Limit of Detection; Data Collection
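
One ingredient of imputing left-censored values, as described in the GSimp abstract above, is drawing a replacement value that is constrained to lie below the limit of detection. The sketch below shows only that single step, using an inverse-CDF draw from a normal distribution truncated above at the LOD. It is not GSimp's implementation: the abstract does not describe GSimp's prediction model or iteration scheme, so the mean `mu` and standard deviation `sigma` are simply assumed to be given, and the function names are hypothetical.

```python
import random
from statistics import NormalDist


def draw_below_lod(mu, sigma, lod, rng):
    """Draw one value from N(mu, sigma^2) truncated to (-inf, lod]."""
    dist = NormalDist(mu, sigma)
    p_max = dist.cdf(lod)                # probability mass below the LOD
    u = rng.uniform(0.0, p_max)          # uniform over that mass
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf requires 0 < u < 1
    return min(dist.inv_cdf(u), lod)     # guard against rounding above the LOD


def impute_left_censored(values, lod, mu, sigma, seed=0):
    """Replace None entries (values below the LOD) with truncated-normal draws."""
    rng = random.Random(seed)
    return [v if v is not None else draw_below_lod(mu, sigma, lod, rng)
            for v in values]
```

For example, `impute_left_censored([5.2, None, 6.1, None], lod=4.0, mu=5.0, sigma=1.0)` keeps the two observed values and fills the two gaps with draws no larger than 4.0. In a full Gibbs scheme, a draw of this kind would be repeated as the parameters are updated.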
10.
J Appl Stat ; 49(8): 2157-2166, 2022.
Article in English | MEDLINE | ID: mdl-35813081

ABSTRACT

This paper proposes a methodology that differs from that of the Brazilian Electricity Regulatory Agency for efficiency estimation in the Brazilian electricity distribution sector. Our proposal combines robust state-space models and stochastic frontier analysis to measure operational cost efficiency in a panel data set of 60 Brazilian electricity distribution utilities. The modeling joins the main literature in energy economics with advanced econometric and statistical techniques in order to estimate the efficiencies. Moreover, the suggested model is able to deal with changes in the inefficiencies across time, whilst the Bayesian paradigm, through Markov chain Monte Carlo techniques, facilitates inference on all unknowns. The method enables a significant degree of flexibility in the resultant efficiencies and a complete picture of the distribution sector.

11.
J Appl Stat ; 49(3): 656-675, 2022.
Article in English | MEDLINE | ID: mdl-35706775

ABSTRACT

This paper describes a comprehensive survival analysis for the inverse Gaussian distribution employing Bayesian and Fiducial approaches. It focuses on making inferences on the inverse Gaussian (IG) parameters µ and λ and the average remaining time of censored units. A flexible Gibbs sampling approach applicable in the presence of censoring is discussed, and illustrations with Type II, progressive Type II, and randomly right-censored observations are included. The analyses are performed using both simulated IG data and empirical data examples. Further, bootstrap comparisons are made between the Bayesian and Fiducial estimates. It is concluded that the shape parameter (ϕ = λ/µ) of the inverse Gaussian distribution has the most impact on the two analyses, Bayesian vs. Fiducial, and so does the amount of censoring in the data, to a lesser extent. Overall, both approaches are effective in estimating IG parameters and the average remaining lifetime. The suggested Gibbs sampler allowed a great deal of flexibility in implementation for all types of censoring considered.
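
For readers unfamiliar with the parameterization named in the abstract above, the standard inverse Gaussian density in terms of µ and λ is given below as a reminder. It is the textbook form, not a formula quoted from the paper. The last line shows why ϕ = λ/µ acts as a shape parameter: the squared coefficient of variation equals 1/ϕ.

```latex
f(x;\mu,\lambda) = \sqrt{\frac{\lambda}{2\pi x^{3}}}\,
  \exp\!\left(-\frac{\lambda (x-\mu)^{2}}{2\mu^{2}x}\right), \qquad x > 0,\ \mu > 0,\ \lambda > 0,

\mathbb{E}[X] = \mu, \qquad \operatorname{Var}(X) = \frac{\mu^{3}}{\lambda},

\frac{\operatorname{Var}(X)}{\mathbb{E}[X]^{2}} = \frac{\mu}{\lambda} = \frac{1}{\phi}.
```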

12.
One Health ; 14: 100359, 2022 Jun.
Article in English | MEDLINE | ID: mdl-34977321

ABSTRACT

Echinococcus granulosus sensu lato is a globally prevalent zoonotic parasitic cestode leading to cystic echinococcosis (CE) in both humans and sheep, with both medical and financial impacts, whose reduction requires the application of a One Health approach to its control. Regarding the animal health component of this approach, the lack of accurate and practical diagnostics in livestock impedes the assessment of disease burden and the implementation and evaluation of control strategies. We use a Bayesian Latent Class Analysis (LCA) model to estimate ovine CE prevalence in sheep samples from the Río Negro province of Argentina, accounting for uncertainty in the diagnostics. We use model outputs to evaluate the performance of a novel recombinant B8/2 antigen B subunit (rEgAgB8/2) indirect enzyme-linked immunosorbent assay (ELISA) for detecting E. granulosus in sheep. Necropsy (as a partial gold standard), western blot (WB) and ELISA diagnostic data were collected from 79 sheep within two Río Negro slaughterhouses, and used to estimate individual infection status (assigned as a latent variable within the model). Using the model outputs, the performance of the novel ELISA was evaluated at the individual level using a receiver operating characteristic (ROC) curve, and at the flock level by simulating a range of sample sizes and prevalence levels within hypothetical flocks. The estimated (mean) prevalence of ovine CE was 27.5% (95% Bayesian credible interval (95% BCI): 13.8%-58.9%) within the sample population. At the individual level, the ELISA had a mean sensitivity and specificity of 55% (95% BCI: 46%-68%) and 68% (95% BCI: 63%-92%), respectively, at an optimal optical density (OD) threshold of 0.378. At the flock level, the ELISA had an 80% probability of correctly classifying infection at an optimal cut-off threshold of 0.496.
These results suggest that the novel ELISA could play a useful role as a flock-level diagnostic for CE surveillance in the region, supplementing surveillance activities in the human population and thus strengthening a One Health approach. Importantly, selection of ELISA cut-off threshold values must be tailored according to the epidemiological situation.

13.
Entropy (Basel) ; 24(10)2022 Oct 14.
Article in English | MEDLINE | ID: mdl-37420486

ABSTRACT

In the development of simplex mixed-effects models, the random effects are generally assumed to follow a normal distribution. The normality assumption may be violated in analyses of skewed and multimodal longitudinal data. In this paper, we adopt the centered Dirichlet process mixture model (CDPMM) to specify the random effects in the simplex mixed-effects models. Combining the block Gibbs sampler and the Metropolis-Hastings algorithm, we extend a Bayesian Lasso (BLasso) to simultaneously estimate unknown parameters of interest and select important covariates with nonzero effects in semiparametric simplex mixed-effects models. Several simulation studies and a real example are employed to illustrate the proposed methodologies.

14.
MethodsX ; 9: 101599, 2022.
Article in English | MEDLINE | ID: mdl-34917491

ABSTRACT

The seabird meta-population viability model (mPVA) uses a generalized approach to project abundance and quasi-extinction risk for 102 seabird species under various conservation scenarios. The mPVA is a stage-structured projection matrix that tracks abundance of multiple populations linked by dispersal, accounting for breeding island characteristics and spatial distribution. Data are derived from published studies, grey literature, and expert review (with over 500 contributions). Invasive species impacts were generalized to stage-specific vital rates by fitting a Bayesian state-space model to trend data from islands where invasive removals had occurred, while accounting for characteristics of seabird biology, breeding islands and invasive species. Survival rates were estimated using a competing hazards formulation to account for impacts of multiple threats, while also allowing for environmental and demographic stochasticity, density dependence and parameter uncertainty.
• The mPVA provides resource managers with a tool to quantitatively assess potential benefits of alternative management actions for multiple species.
• The mPVA compares projected abundance and quasi-extinction risk under current conditions (no intervention) and various conservation scenarios, including removal of invasive species from specified breeding islands, translocation or reintroduction of individuals to an island of specified location and size, and at-sea mortality amelioration via reduction in annual at-sea deaths.

15.
Int J Biostat ; 18(2): 473-485, 2022 11 01.
Article in English | MEDLINE | ID: mdl-34592069

ABSTRACT

The accelerated failure time mixture cure (AFTMC) model is widely used for survival data when a portion of patients can be cured. In this paper, a Bayesian semiparametric method is proposed to estimate the parameters and the density for both the cure probability and the survival distribution of the uncured patients in the AFTMC model. Specifically, the baseline error distribution of the uncured patients is modeled nonparametrically by a mixture of Dirichlet processes. Based on the stick-breaking formulation of the Dirichlet process and the techniques of retrospective and slice sampling, an efficient and easy-to-implement Gibbs sampler is developed for the posterior calculation. The proposed approach can be easily implemented in commonly used statistical software, and comprehensive simulation studies show that its performance is comparable to that of the fully parametric method. In addition, the proposed approach is applied to the analysis of colorectal cancer clinical trial data.


Subjects
Models, Statistical; Humans; Bayes Theorem; Retrospective Studies; Computer Simulation; Probability
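
The stick-breaking formulation of the Dirichlet process mentioned in the AFTMC abstract above builds mixture weights as w_k = v_k ∏_{j<k} (1 − v_j), with v_k drawn from Beta(1, α). The following is a minimal sketch of that construction, truncated at a fixed number of atoms for illustration. The truncation level, the function name, and the returned leftover mass are assumptions of this example. The paper's retrospective and slice sampling steps, which avoid a fixed truncation, are not shown.

```python
import random


def stick_breaking_weights(alpha, n_atoms, seed=0):
    """Return the first n_atoms stick-breaking weights and the leftover mass."""
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining
```

Smaller values of `alpha` concentrate the mass on the first few weights, while larger values spread it over many components. The returned leftover mass indicates how much probability a given truncation level ignores.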
16.
Environ Pollut ; 290: 118039, 2021 Dec 01.
Article in English | MEDLINE | ID: mdl-34467885

ABSTRACT

We address the source characterization of atmospheric releases using adaptive strategies in Bayesian inference in combination with the numerical solution of the dispersion problem by a stabilized finite element method and uncertainty quantification in the measurements. The adaptive techniques accelerate the convergence of Markov chain Monte Carlo (MCMC) algorithms, leading to accurate reconstructions of the source parameters. Such accuracy is illustrated by comparison with results from previous works. Moreover, the technique used to simulate the corresponding dispersion problem allowed us to introduce relevant meteorological information. The uncertainty quantification also improves the quality of reconstructions. Numerical examples using data from the Copenhagen experimental campaign illustrate the effectiveness of the proposed methodology. We found errors in reconstructions ranging from 0.11% to 8.67% of the size of the search region, which is similar to results found in previous works using deterministic techniques, with comparable computational time.


Subjects
Algorithms; Bayes Theorem; Markov Chains; Monte Carlo Method; Uncertainty
17.
Malar J ; 20(1): 311, 2021 Jul 10.
Article in English | MEDLINE | ID: mdl-34246273

ABSTRACT

BACKGROUND: Malaria patients can have two or more haplotypes in their blood sample, making it challenging to identify which haplotypes they carry. In addition, there are challenges in measuring the type and frequency of resistant haplotypes in populations. This study presents a novel statistical method, a Gibbs sampler algorithm, to investigate this issue. RESULTS: The performance of the algorithm is evaluated on simulated datasets consisting of patient blood samples characterized by their multiplicity of infection (MOI) and malaria genotype. The simulation used different resistance allele frequencies (RAF) at each single nucleotide polymorphism (SNP) and different limits of detection (LoD) of the SNPs and the MOI. Compared to previous related statistical approaches, the validated Gibbs sampler algorithm shows higher accuracy at high LoDs of the SNPs or the MOI and can deal with missing MOI. CONCLUSIONS: The Gibbs sampler algorithm provided robust results when faced with genotyping errors caused by LoDs and functioned well even in the absence of MOI data on individual patients.


Subjects
Algorithms; Malaria/blood; Plasmodium/genetics; Haplotypes; Humans; Markov Chains; Monte Carlo Method
18.
Magn Reson Med ; 86(5): 2766-2779, 2021 11.
Article in English | MEDLINE | ID: mdl-34170032

ABSTRACT

PURPOSE: The proposed method aims to create label maps that can be used for the segmentation of animal brain MR images without the need for a brain template. This is achieved by performing a joint deconvolution and segmentation of the brain MR images. METHODS: The method models the image statistics locally using a generalized Gaussian distribution (GGD) and couples the deconvolved image and its corresponding label map using the GGD-Potts model. Because of the complexity of the resulting Bayesian estimators of the unknown model parameters, a Gibbs sampler is used to generate samples following the desired posterior probability. RESULTS: The performance of the proposed algorithm is assessed on simulated and real MR images by segmenting enhanced marmoset brain images into their main compartments using the corresponding label maps. Quantitative assessment showed that this method gives results comparable to those obtained with the classical method of registering the volumes to a brain template. CONCLUSION: The proposed method of using labels as prior information for brain segmentation provides similar or slightly better performance compared with the classical reference method based on a dedicated template.


Subjects
Callithrix; Magnetic Resonance Imaging; Algorithms; Animals; Bayes Theorem; Brain/diagnostic imaging; Image Processing, Computer-Assisted
19.
J Comput Graph Stat ; 30(4): 889-905, 2021.
Article in English | MEDLINE | ID: mdl-37138786

ABSTRACT

The goal of this paper is to provide a way for Bayesian statisticians to incorporate subsampling directly into the Bayesian hierarchical model of their choosing without imposing additional restrictive model assumptions. We are motivated by the fact that the rise of "big data" has created difficulties for statisticians to directly apply their methods to big datasets. We introduce a "data subset model" to the popular "data model, process model, and parameter model" framework used to summarize Bayesian hierarchical models. The hyperparameters of the data subset model are specified constructively in that they are chosen such that the implied size of the subset satisfies pre-defined computational constraints. Thus, these hyperparameters effectively calibrate the statistical model to the computer itself to obtain predictions/estimations in a pre-specified amount of time. Several properties of the data subset model are provided including: propriety, partial sufficiency, and semi-parametric properties. Simulated datasets will be used to assess the consequences of subsampling, and results will be presented across different computers to show the effect of the computer on the statistical analysis. Additionally, we provide a joint analysis of a high-dimensional dataset (roughly 10 gigabytes) consisting of 2018 5-year period estimates from the US Census Bureau's Public Use Micro-Sample (PUMS).

20.
J Anim Breed Genet ; 138(1): 14-22, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32729965

ABSTRACT

This work focuses on the effects of a variable amount of genomic information on the Bayesian estimation of unknown variance components associated with single-step genomic prediction. We propose a quantitative criterion for the amount of genomic information included in the model and use it to study the relative effect of genomic data on the efficiency of sampling from the posterior distribution of the parameters of the single-step model when conducting a Bayesian analysis that estimates unknown variances. The rate of change of estimated variances depended on the amount of genomic information involved in the analysis, but did not depend on the Gibbs updating schemes applied for sampling realizations of the posterior distribution. Simulation revealed a gradual deterioration of convergence rates for the location parameters when new genomic data were gradually added to the analysis. In contrast, the convergence of variance components showed continuous improvement under the same conditions. The sampling efficiency increased proportionally to the amount of genomic information. In addition, an optimal amount of genomic information in the variance-covariance matrix that guarantees the most (computationally) efficient analysis was found to correspond to a proportion of animals genotyped ***0.8. The proposed criterion yields a characterization of the expected performance of the Gibbs sampler if the analysis is subject to adjustment of the amount of genomic data, and can be used to guide researchers on how large a proportion of animals should be genotyped in order to attain an efficient analysis.


Subjects
Genome; Genomics; Animals; Bayes Theorem; Linear Models; Monte Carlo Method