1.
Nat Commun ; 15(1): 3840, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714698

ABSTRACT

As the circadian clock regulates fundamental biological processes, disrupted clocks are often observed in patients and diseased tissues. Determining the circadian time of the patient or the tissue of focus is essential in circadian medicine and research. Here we present tauFisher, a computational pipeline that accurately predicts circadian time from a single transcriptomic sample by finding correlations between rhythmic genes within the sample. We demonstrate tauFisher's performance in adding timestamps to both bulk and single-cell transcriptomic samples collected from multiple tissue types and experimental settings. Application of tauFisher at a cell-type level in a single-cell RNAseq dataset collected from mouse dermal skin implies that greater circadian phase heterogeneity may explain the dampened rhythm of collective core clock gene expression in dermal immune cells compared to dermal fibroblasts. Given its robustness and generalizability across assay platforms, experimental setups, and tissue types, as well as its potential application in single-cell RNAseq data analysis, tauFisher is a promising tool that facilitates circadian medicine and research.


Subject(s)
Circadian Clocks , Circadian Rhythm , Single-Cell Analysis , Transcriptome , Single-Cell Analysis/methods , Animals , Mice , Circadian Rhythm/genetics , Circadian Clocks/genetics , Humans , Gene Expression Profiling/methods , Computational Biology/methods , Skin/metabolism , Software , Fibroblasts/metabolism , Sequence Analysis, RNA/methods
2.
Alzheimers Dement (Amst) ; 15(4): e12494, 2023.
Article in English | MEDLINE | ID: mdl-37908438

ABSTRACT

INTRODUCTION: To reduce demands on expert time and improve clinical efficiency, we developed a framework to evaluate whether inexpensive, accessible data could accurately classify Alzheimer's disease (AD) clinical diagnosis and predict the likelihood of progression. METHODS: We stratified relevant data into three tiers: obtainable at primary care (low-cost), mostly available at specialty visits (medium-cost), and research-only (high-cost). We trained several machine learning models, including a hierarchical model, an ensemble model, and a clustering model, to distinguish between diagnoses of cognitively unimpaired, mild cognitive impairment, and dementia due to AD. RESULTS: All models showed viable classification, but the hierarchical and ensemble models outperformed the conventional model. Classifier "error" was predictive of progression rates, and cluster membership identified subgroups with high and low risk of progression within 1.5 to 3 years. DISCUSSION: Accessible, inexpensive clinical data can be used to guide AD diagnosis and are predictive of current and future disease states. HIGHLIGHTS: Classification performance using cost-effective features was accurate and robust. Hierarchical classification outperformed conventional multinomial classification. Classification labels indicated significant changes in conversion risk at follow-up. A clustering-classification method identified subgroups at high risk of decline.

3.
bioRxiv ; 2023 Nov 06.
Article in English | MEDLINE | ID: mdl-37066246

ABSTRACT

As the circadian clock regulates fundamental biological processes, disrupted clocks are often observed in patients and diseased tissues. Determining the circadian time of the patient or the tissue of focus is essential in circadian medicine and research. Here we present tauFisher, a computational pipeline that accurately predicts circadian time from a single transcriptomic sample by finding correlations between rhythmic genes within the sample. We demonstrate tauFisher's outstanding performance in both bulk and single-cell transcriptomic data collected from multiple tissue types and experimental settings. Application of tauFisher at a cell-type level in a single-cell RNA-seq dataset collected from mouse dermal skin implies that greater circadian phase heterogeneity may explain the dampened rhythm of collective core clock gene expression in dermal immune cells compared to dermal fibroblasts. Given its robustness and generalizability across assay platforms, experimental setups, and tissue types, as well as its potential application in single-cell RNA-seq data analysis, tauFisher is a promising tool that facilitates circadian medicine and research.

4.
Proc Mach Learn Res ; 202: 34409-34430, 2023 Jul.
Article in English | MEDLINE | ID: mdl-38644959

ABSTRACT

We present a fully Bayesian autoencoder model that treats both local latent variables and global decoder parameters in a Bayesian fashion. This approach allows for flexible priors and posterior approximations while keeping the inference costs low. To achieve this, we introduce an amortized MCMC approach by utilizing an implicit stochastic network to learn sampling from the posterior over local latent variables. Furthermore, we extend the model by incorporating a Sparse Gaussian Process prior over the latent space, allowing for a fully Bayesian treatment of inducing points and kernel hyperparameters and leading to improved scalability. Additionally, we enable Deep Gaussian Process priors on the latent space and the handling of missing data. We evaluate our model on a range of experiments focusing on dynamic representation learning and generative modeling, demonstrating the strong performance of our approach in comparison to existing methods that combine Gaussian Processes and autoencoders.

5.
JAMA Netw Open ; 5(5): e2211967, 2022 05 02.
Article in English | MEDLINE | ID: mdl-35579899

ABSTRACT

Importance: Identifying the associations between severe COVID-19 and individual cardiovascular conditions in pediatric patients may inform treatment. Objective: To assess the association between previous or preexisting cardiovascular conditions and severity of COVID-19 in pediatric patients. Design, Setting, and Participants: This retrospective cohort study used data from a large, multicenter, electronic health records database in the US. The cohort included patients aged 2 months to 17 years with a laboratory-confirmed diagnosis of COVID-19 or a diagnosis code indicating infection or exposure to SARS-CoV-2 at 85 health systems between March 1, 2020, and January 31, 2021. Exposures: Diagnoses for 26 cardiovascular conditions between January 1, 2015, and December 31, 2019 (before infection with SARS-CoV-2). Main Outcomes and Measures: The main outcome was severe COVID-19, defined as need for supplemental oxygen or in-hospital death. Mixed-effects, random intercept logistic regression modeling assessed the significance and magnitude of associations between 26 cardiovascular conditions and COVID-19 severity. Multiple comparison adjustment was performed using the Benjamini-Hochberg false discovery rate procedure. Results: The study comprised 171 416 pediatric patients; the median age was 8 years (IQR, 2-14 years), and 50.28% were male. Of these patients, 17 065 (9.96%) had severe COVID-19. 
The random intercept model showed that the following cardiovascular conditions were associated with severe COVID-19: cardiac arrest (odds ratio [OR], 9.92; 95% CI, 6.93-14.20), cardiogenic shock (OR, 3.07; 95% CI, 1.90-4.96), heart surgery (OR, 3.04; 95% CI, 2.26-4.08), cardiopulmonary disease (OR, 1.91; 95% CI, 1.56-2.34), heart failure (OR, 1.82; 95% CI, 1.46-2.26), hypotension (OR, 1.57; 95% CI, 1.38-1.79), nontraumatic cerebral hemorrhage (OR, 1.54; 95% CI, 1.24-1.91), pericarditis (OR, 1.50; 95% CI, 1.17-1.94), simple biventricular defects (OR, 1.45; 95% CI, 1.29-1.62), venous embolism and thrombosis (OR, 1.39; 95% CI, 1.11-1.73), other hypertensive disorders (OR, 1.34; 95% CI, 1.09-1.63), complex biventricular defects (OR, 1.33; 95% CI, 1.14-1.54), and essential primary hypertension (OR, 1.22; 95% CI, 1.08-1.38). Furthermore, 194 of 258 patients (75.19%) with a history of cardiac arrest were younger than 12 years. Conclusions and Relevance: The findings suggest that some previous or preexisting cardiovascular conditions are associated with increased severity of COVID-19 among pediatric patients in the US and that morbidity may be increased among children younger than 12 years with previous cardiac arrest.


Subject(s)
COVID-19 , Heart Arrest , Adolescent , COVID-19/epidemiology , Child , Child, Preschool , Female , Heart Arrest/epidemiology , Hospital Mortality , Humans , Male , Retrospective Studies , SARS-CoV-2
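
The Benjamini-Hochberg false discovery rate procedure named in the abstract above can be illustrated with a minimal sketch. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions; they are not taken from the study, and the study's own software is not described in the abstract.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, for illustration only.
example = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(example)
```

A hypothesis is then called significant at false discovery rate q when its adjusted p-value is at most q.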
6.
Nat Commun ; 13(1): 787, 2022 02 08.
Article in English | MEDLINE | ID: mdl-35136052

ABSTRACT

The hippocampus is critical to the temporal organization of our experiences. Although this fundamental capacity is conserved across modalities and species, its underlying neuronal mechanisms remain unclear. Here we recorded hippocampal activity as rats remembered an extended sequence of nonspatial events unfolding over several seconds, as in daily life episodes in humans. We then developed statistical machine learning methods to analyze the ensemble activity and discovered forms of sequential organization and coding important for order memory judgments. Specifically, we found that hippocampal ensembles provide significant temporal coding throughout nonspatial event sequences, differentiate distinct types of task-critical information sequentially within events, and exhibit theta-associated reactivation of the sequential relationships among events. We also demonstrate that nonspatial event representations are sequentially organized within individual theta cycles and precess across successive cycles. These findings suggest a fundamental function of the hippocampal network is to encode, preserve, and predict the sequential order of experiences.


Subject(s)
Hippocampus/physiopathology , Memory , Acoustic Stimulation/methods , Animals , Auditory Perception , Electrodes, Implanted , Machine Learning , Male , Models, Animal , Nerve Net/physiology , Odorants , Olfactory Perception , Rats , Stereotaxic Techniques , Time Factors
7.
Neurorehabil Neural Repair ; 36(2): 131-139, 2022 02.
Article in English | MEDLINE | ID: mdl-34933635

ABSTRACT

OBJECTIVE: Patients show substantial differences in response to rehabilitation therapy after stroke. We hypothesized that specific genetic profiles might explain some of this variance and, secondarily, that genetic factors are related to cerebral atrophy post-stroke. METHODS: The phase 3 ICARE study examined response to motor rehabilitation therapies. In 216 ICARE enrollees, DNA was analyzed for presence of the BDNF val66met and the ApoE ε4 polymorphism. The relationship of polymorphism status to 12-month change in motor status (Wolf Motor Function Test, WMFT) was examined. Neuroimaging data were also evaluated (n=127). RESULTS: Subjects were 61±13 years old (mean±SD) and enrolled 43±22 days post-stroke; 19.7% were BDNF val66met carriers and 29.8% ApoE ε4 carriers. Carrier status for each polymorphism was not associated with WMFT, either at baseline or over 12 months of follow-up. Neuroimaging, acquired 5±11 days post-stroke, showed that BDNF val66met polymorphism carriers had a 1.34-greater degree of cerebral atrophy compared to non-carriers (P=.01). Post hoc analysis found that age of stroke onset was 4.6 years younger in subjects with the ApoE ε4 polymorphism (P=.02). CONCLUSION: Neither the val66met BDNF nor ApoE ε4 polymorphism explained inter-subject differences in response to rehabilitation therapy. The BDNF val66met polymorphism was associated with cerebral atrophy at baseline, echoing findings in healthy subjects, and suggesting an endophenotype. The ApoE ε4 polymorphism was associated with younger age at stroke onset, echoing findings in Alzheimer's disease and suggesting a common biology. Genetic associations provide insights useful to understanding the biology of outcomes after stroke.


Subject(s)
Endophenotypes , Outcome Assessment, Health Care , Stroke Rehabilitation , Stroke , Aged , Apolipoprotein E4/genetics , Atrophy/diagnostic imaging , Atrophy/pathology , Biomarkers , Brain-Derived Neurotrophic Factor/genetics , Female , Humans , Male , Middle Aged , Neuroimaging , Stroke/genetics , Stroke/pathology , Stroke/therapy
8.
J R Soc Interface ; 18(174): 20200729, 2021 01.
Article in English | MEDLINE | ID: mdl-33499768

ABSTRACT

The haematopoietic system has a highly regulated and complex structure in which cells are organized to successfully create and maintain new blood cells. It is known that feedback regulation is crucial to tightly control this system, but the specific mechanisms by which control is exerted are not completely understood. In this work, we aim to uncover the underlying mechanisms in haematopoiesis by conducting perturbation experiments, where animal subjects are exposed to an external agent in order to observe the system response and evolution. We have developed a novel Bayesian hierarchical framework for optimal design of perturbation experiments and proper analysis of the data collected. We use a deterministic model that accounts for feedback and feedforward regulation on cell division rates and self-renewal probabilities. A significant obstacle is that the experimental data are not longitudinal, rather each data point corresponds to a different animal. We overcome this difficulty by modelling the unobserved cellular levels as latent variables. We then use principles of Bayesian experimental design to optimally distribute time points at which the haematopoietic cells are quantified. We evaluate our approach using synthetic and real experimental data and show that an optimal design can lead to better estimates of model parameters.


Subject(s)
Hematopoiesis , Research Design , Animals , Bayes Theorem , Cell Division , Models, Biological
9.
Am Stat ; 74(3): 249-257, 2020.
Article in English | MEDLINE | ID: mdl-33041343

ABSTRACT

Although no universally accepted definition of causality exists, in practice one is often faced with the question of statistically assessing causal relationships in different settings. We present a uniform general approach to causality problems derived from the axiomatic foundations of the Bayesian statistical framework. In this approach, causality statements are viewed as hypotheses, or models, about the world and the fundamental object to be computed is the posterior distribution of the causal hypotheses, given the data and the background knowledge. Computation of the posterior, illustrated here in simple examples, may involve complex probabilistic modeling but this is no different than in any other Bayesian modeling situation. The main advantage of the approach is its connection to the axiomatic foundations of the Bayesian framework, and the general uniformity with which it can be applied to a variety of causality settings, ranging from specific to general cases, or from causes of effects to effects of causes.

10.
Stroke ; 51(11): 3361-3365, 2020 11.
Article in English | MEDLINE | ID: mdl-32942967

ABSTRACT

BACKGROUND AND PURPOSE: Clinical methods have incomplete diagnostic value for early diagnosis of acute stroke and large vessel occlusion (LVO). Electroencephalography is rapidly sensitive to brain ischemia. This study examined the diagnostic utility of electroencephalography for acute stroke/transient ischemic attack (TIA) and for LVO. METHODS: Patients (n=100) with suspected acute stroke in an emergency department underwent clinical exam then electroencephalography using a dry-electrode system. Four models classified patients, first as acute stroke/TIA or not, then as acute stroke with LVO or not: (1) clinical data, (2) electroencephalography data, (3) clinical+electroencephalography data using logistic regression, and (4) clinical+electroencephalography data using a deep learning neural network. Each model used a training set of 60 randomly selected patients, then was validated in an independent cohort of 40 new patients. RESULTS: Of 100 patients, 63 had a stroke (43 ischemic/7 hemorrhagic) or TIA (13). For classifying patients as stroke/TIA or not, the clinical data model had area under the curve=62.3, whereas clinical+electroencephalography using deep learning neural network model had area under the curve=87.8. Results were comparable for classifying patients as stroke with LVO or not. CONCLUSIONS: Adding electroencephalography data to clinical measures improves diagnosis of acute stroke/TIA and of acute stroke with LVO. Rapid acquisition of dry-lead electroencephalography is feasible in the emergency department and merits prehospital evaluation.


Subject(s)
Deep Learning , Electroencephalography/methods , Ischemic Stroke/diagnosis , Aged , Aged, 80 and over , Female , Hemorrhagic Stroke/diagnosis , Hemorrhagic Stroke/physiopathology , Humans , Ischemic Attack, Transient/diagnosis , Ischemic Attack, Transient/physiopathology , Ischemic Stroke/physiopathology , Logistic Models , Male , Middle Aged , Neural Networks, Computer , Sensitivity and Specificity , Stroke/diagnosis , Stroke/physiopathology
11.
Stat Sin ; 30(3): 1561-1582, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32774073

ABSTRACT

We propose an evolutionary state space model (E-SSM) for analyzing high dimensional brain signals whose statistical properties evolve over the course of a non-spatial memory experiment. Under E-SSM, brain signals are modeled as mixtures of components (e.g., AR(2) processes) with oscillatory activity at pre-defined frequency bands. To account for the potential non-stationarity of these components (since the brain responses could vary throughout the entire experiment), the parameters are allowed to vary over epochs. Compared with classical approaches such as independent component analysis and filtering, the proposed method accounts for the entire temporal correlation of the components and accommodates non-stationarity. For inference, we propose a novel computational algorithm based on the Kalman smoother, maximum likelihood, and blocked resampling. The E-SSM model is evaluated in simulation studies and applied to multi-epoch local field potential (LFP) signals collected during a non-spatial (olfactory) sequence memory task. The results confirm that our method captures the evolution of the power for different components across different phases in the experiment and identifies clusters of electrodes that behave similarly with respect to the decomposition of different sources. These findings suggest that the activity of different electrodes does change over the course of an experiment in practice; treating these epoch recordings as realizations of an identical process could lead to misleading results. In summary, the proposed method underscores the importance of capturing the evolution in brain responses over the study period.
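
The abstract above mentions AR(2) components with oscillatory activity at chosen frequency bands. A minimal sketch of how an AR(2) process can be tuned to oscillate near a given frequency follows. It uses the standard complex-root parametrization; the function names, the sampling rate, and the modulus value are assumptions for illustration and are not taken from the E-SSM paper.

```python
import math
import random


def ar2_coefficients(freq_hz, sampling_hz, modulus):
    """AR(2) coefficients whose spectrum peaks near freq_hz.

    The reciprocal roots of the AR polynomial are
    modulus * exp(+/- i * 2*pi*freq_hz/sampling_hz);
    a modulus strictly below 1 keeps the process stationary.
    """
    if not 0.0 < modulus < 1.0:
        raise ValueError("modulus must lie strictly between 0 and 1")
    omega = 2.0 * math.pi * freq_hz / sampling_hz
    phi1 = 2.0 * modulus * math.cos(omega)
    phi2 = -modulus ** 2
    return phi1, phi2


def simulate_ar2(phi1, phi2, n, seed=0):
    """Simulate x_t = phi1*x_{t-1} + phi2*x_{t-2} + e_t with unit-variance noise."""
    rng = random.Random(seed)
    x = [0.0, 0.0]  # zero initial conditions
    for _ in range(n):
        x.append(phi1 * x[-1] + phi2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


# Hypothetical settings: a 10 Hz component sampled at 1000 Hz.
phi1, phi2 = ar2_coefficients(freq_hz=10.0, sampling_hz=1000.0, modulus=0.95)
signal = simulate_ar2(phi1, phi2, n=500)
```

Moving the modulus closer to 1 narrows the spectral peak, which is one way to picture a component confined to a frequency band.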

12.
Blood Adv ; 4(14): 3391-3404, 2020 07 28.
Article in English | MEDLINE | ID: mdl-32722783

ABSTRACT

Diffuse large B-cell lymphoma (DLBCL) is a heterogeneous entity of B-cell lymphoma. Cell-of-origin (COO) classification of DLBCL is required in routine practice by the World Health Organization classification for biological and therapeutic insights. Genetic subtypes uncovered recently are based on distinct genetic alterations in DLBCL, which are different from the COO subtypes defined by gene expression signatures of normal B cells retained in DLBCL. We hypothesize that classifiers incorporating both genome-wide gene-expression and pathogenetic variables can improve the therapeutic significance of DLBCL classification. To develop such refined classifiers, we performed targeted RNA sequencing (RNA-Seq) with a commercially available next-generation sequencing (NGS) platform in a large cohort of 418 DLBCLs. Genetic and transcriptional data obtained by RNA-Seq in a single run were explored by state-of-the-art artificial intelligence (AI) to develop a NGS-COO classifier for COO assignment and NGS survival models for clinical outcome prediction. The NGS-COO model built through applying AI in the training set was robust, showing high concordance with COO classification by either Affymetrix GeneChip microarray or the NanoString Lymph2Cx assay in 2 validation sets. Although the NGS-COO model was not trained for clinical outcome, the activated B-cell-like compared with the germinal-center B-cell-like subtype had significantly poorer survival. The NGS survival models stratified 30% high-risk patients in the validation set with poor survival as in the training set. These results demonstrate that targeted RNA-Seq coupled with AI deep learning techniques provides reproducible, efficient, and affordable assays for clinical application. The clinical grade assays and NGS models integrating both genetic and transcriptional factors developed in this study may eventually support precision medicine in DLBCL.


Subject(s)
Artificial Intelligence , Lymphoma, Large B-Cell, Diffuse , B-Lymphocytes , Germinal Center , High-Throughput Nucleotide Sequencing , Humans , Lymphoma, Large B-Cell, Diffuse/diagnosis , Lymphoma, Large B-Cell, Diffuse/genetics
13.
Bayesian Anal ; 15(4): 1199-1228, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33868547

ABSTRACT

Modeling correlation (and covariance) matrices can be challenging due to the positive-definiteness constraint and potential high-dimensionality. Our approach is to decompose the covariance matrix into the correlation and variance matrices and propose a novel Bayesian framework based on modeling the correlations as products of unit vectors. By specifying a wide range of distributions on a sphere (e.g. the squared-Dirichlet distribution), the proposed approach induces flexible prior distributions for covariance matrices (that go beyond the commonly used inverse-Wishart prior). For modeling real-life spatio-temporal processes with complex dependence structures, we extend our method to dynamic cases and introduce unit-vector Gaussian process priors in order to capture the evolution of correlation among components of a multivariate time series. To handle the intractability of the resulting posterior, we introduce the adaptive Δ-Spherical Hamiltonian Monte Carlo. We demonstrate the validity and flexibility of our proposed framework in a simulation study of periodic processes and an analysis of rat's local field potential activity in a complex sequence memory task.
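
The idea of writing correlations as products of unit vectors can be sketched briefly. Any matrix of inner products between unit vectors has ones on the diagonal and is positive semidefinite, so the constraint holds by construction. The helper names and the example rows below are hypothetical, and this is not the authors' implementation.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def correlation_from_unit_vectors(rows):
    """Build a correlation matrix as the Gram matrix of normalized rows."""
    units = [normalize(r) for r in rows]
    return [[sum(a * b for a, b in zip(u, w)) for w in units] for u in units]


# Hypothetical unnormalized rows of a lower-triangular factor.
rows = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, -2.0, 2.0]]
corr = correlation_from_unit_vectors(rows)
```

Placing a distribution on the unit vectors then induces a prior on the correlation matrix without any explicit positive-definiteness check.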

14.
Neurophotonics ; 6(4): 045012, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31824979

ABSTRACT

There is a growing recognition regarding the importance of pial collateral flow in the protection from impending ischemic stroke both in preclinical and clinical studies. Collateral flow is also a major player in sensory stimulation-based protection from impending ischemic stroke. Doppler optical coherence tomography has been employed to image spatiotemporal patterns of collateral flow within the dorsal branches of the middle cerebral artery (MCA) as it provides a powerful tool for quantitative in vivo flow parameters imaging (velocity, flux, direction of flow, and radius of imaged branches). It was employed prior to and following dorsal permanent MCA occlusion (pMCAo) in rat models of treatment by protective sensory stimulation, untreated controls, or sham surgery controls. Unexpectedly, following pMCAo in the majority of subjects, some MCA branches continued to show anterograde blood flow patterns over time despite severing of the MCA. Further, in the presence of protective sensory stimulation, the anterograde velocity and flux were stronger and lasted longer than in retrograde flow branches, even within different branches of single subjects, but stimulated retrograde branches showed stronger flow parameters at 24 h. Our study suggests that the spatiotemporal patterns of collateral-based dorsal MCA flow are dynamic and provide a detailed description on the differential effects of protective sensory stimulation.

15.
Comput Stat ; 34(1): 281-299, 2019 Mar.
Article in English | MEDLINE | ID: mdl-31695242

ABSTRACT

Hamiltonian Monte Carlo is a widely used algorithm for sampling from posterior distributions of complex Bayesian models. It can efficiently explore high-dimensional parameter spaces guided by simulated Hamiltonian flows. However, the algorithm requires repeated gradient calculations, and these computations become increasingly burdensome as data sets scale. We present a method to substantially reduce the computation burden by using a neural network to approximate the gradient. First, we prove that the proposed method still maintains convergence to the true distribution though the approximated gradient no longer comes from a Hamiltonian system. Second, we conduct experiments on synthetic examples and real data to validate the proposed method.
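
To make the role of the gradient concrete, here is a minimal one-dimensional Hamiltonian Monte Carlo sketch. The gradient enters only through the `grad_u` argument of the leapfrog integrator, which is the slot where an approximate gradient could be substituted. The function names, the step size, and the standard normal target are assumptions for illustration; the sketch uses the exact gradient and contains no neural network.

```python
import math
import random


def leapfrog(q, p, grad_u, step, n_steps):
    """Integrate Hamiltonian dynamics with the leapfrog scheme."""
    p = p - 0.5 * step * grad_u(q)  # initial half step for momentum
    for i in range(n_steps):
        q = q + step * p  # full step for position
        if i < n_steps - 1:
            p = p - step * grad_u(q)  # full step for momentum
    p = p - 0.5 * step * grad_u(q)  # final half step for momentum
    return q, p


def hmc_sample(u, grad_u, q0, n_samples, step=0.2, n_steps=10, seed=0):
    """One-dimensional HMC; grad_u is used only inside the integrator."""
    rng = random.Random(seed)
    q, samples = q0, []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)  # resample momentum
        q_new, p_new = leapfrog(q, p, grad_u, step, n_steps)
        # The Metropolis correction uses the exact potential u.
        log_accept = (u(q) + 0.5 * p * p) - (u(q_new) + 0.5 * p_new * p_new)
        if rng.random() < math.exp(min(0.0, log_accept)):
            q = q_new
        samples.append(q)
    return samples


# Standard normal target: u(q) = q^2 / 2, so grad_u(q) = q.
samples = hmc_sample(lambda q: 0.5 * q * q, lambda q: q, q0=0.0, n_samples=2000)
```

Each leapfrog update is a shear in (q, p), so the proposal remains reversible and volume-preserving for any function of q passed as `grad_u`; with the accept/reject step computed from the exact potential, the chain still targets the intended distribution, at the cost of a lower acceptance rate when the approximation is poor.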

16.
Adv Neural Inf Process Syst ; 32: 8263-8273, 2019 Dec.
Article in English | MEDLINE | ID: mdl-33041607

ABSTRACT

Dynamic functional connectivity, as measured by the time-varying covariance of neurological signals, is believed to play an important role in many aspects of cognition. While many methods have been proposed, reliably establishing the presence and characteristics of brain connectivity is challenging due to the high dimensionality and noisiness of neuroimaging data. We present a latent factor Gaussian process model which addresses these challenges by learning a parsimonious representation of connectivity dynamics. The proposed model naturally allows for inference and visualization of connectivity dynamics. As an illustration of the scientific utility of the model, application to a data set of rat local field potential activity recorded during a complex non-spatial memory task provides evidence of stimuli differentiation.

17.
Prostate ; 78(4): 294-299, 2018 03.
Article in English | MEDLINE | ID: mdl-29315679

ABSTRACT

BACKGROUND: Distinguishing between low- and high-grade prostate cancers (PCa) is important, but biopsy may underestimate the actual grade of cancer. We have previously shown that urine/plasma-based prostate-specific biomarkers can predict high grade PCa. Our objective was to determine the accuracy of a test using cell-free RNA levels of biomarkers in predicting prostatectomy results. METHODS: This multicenter community-based prospective study was conducted using urine/blood samples collected from 306 patients. All recruited patients were treatment-naïve, without metastases, and had been biopsied, designated a Gleason Score (GS) based on biopsy, and assigned to prostatectomy prior to participation in the study. The primary outcome measure was the urine/plasma test accuracy in predicting high grade PCa on prostatectomy compared with biopsy findings. Sensitivity and specificity were calculated using standard formulas, while comparisons between groups were performed using the Wilcoxon Rank Sum, Kruskal-Wallis, Chi-Square, and Fisher's exact test. RESULTS: GS as assigned by standard 10-12 core biopsies was 3 + 3 in 90 (29.4%), 3 + 4 in 122 (39.8%), 4 + 3 in 50 (16.3%), and > 4 + 3 in 44 (14.4%) patients. The urine/plasma assay confirmed a previous validation and was highly accurate in predicting the presence of high-grade PCa (Gleason ≥3 + 4) with sensitivity between 88% and 95% as verified by prostatectomy findings. GS was upgraded after prostatectomy in 27% of patients and downgraded in 12% of patients. CONCLUSIONS: This plasma/urine biomarker test accurately predicts high grade cancer as determined by prostatectomy with a sensitivity at 92-97%, while the sensitivity of core biopsies was 78%.


Subject(s)
Biomarkers, Tumor/metabolism , Cell-Free Nucleic Acids/metabolism , Prostatic Neoplasms/pathology , Adult , Aged , Humans , Male , Middle Aged , Neoplasm Grading , Prospective Studies , Prostate/pathology , Prostate/surgery , Prostatectomy/methods , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/surgery , Real-Time Polymerase Chain Reaction , Sensitivity and Specificity
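
The "standard formulas" for sensitivity and specificity mentioned in the abstract above are the usual ratios from a two-by-two table. A minimal sketch follows; the counts are hypothetical and are not data from the study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly positive cases that the test detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of truly negative cases that the test clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, for illustration only.
sens = sensitivity(true_pos=90, false_neg=10)
spec = specificity(true_neg=40, false_pos=10)
```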
18.
Bayesian Anal ; 13(2): 485-506, 2018 Jun.
Article in English | MEDLINE | ID: mdl-37151569

ABSTRACT

Traditionally, the field of computational Bayesian statistics has been divided into two main subfields: variational methods and Markov chain Monte Carlo (MCMC). In recent years, however, several methods have been proposed based on combining variational Bayesian inference and MCMC simulation in order to improve their overall accuracy and computational efficiency. This marriage of fast evaluation and flexible approximation provides a promising means of designing scalable Bayesian inference methods. In this paper, we explore the possibility of incorporating variational approximation into a state-of-the-art MCMC method, Hamiltonian Monte Carlo (HMC), to reduce the required expensive computation involved in the sampling procedure, which is the bottleneck for many applications of HMC in big data problems. To this end, we exploit the regularity in parameter space to construct a free-form approximation of the target distribution by a fast and flexible surrogate function using an optimized additive model of proper random basis, which can also be viewed as a single-hidden layer feedforward neural network. The surrogate function provides sufficiently accurate approximation while allowing for fast computation in the sampling procedure, resulting in an efficient approximate Bayesian inference algorithm. We demonstrate the advantages of our proposed method using both synthetic and real data problems.

19.
J Stat Comput Simul ; 88(5): 982-1002, 2018.
Article in English | MEDLINE | ID: mdl-31105358

ABSTRACT

We present geodesic Lagrangian Monte Carlo, an extension of Hamiltonian Monte Carlo for sampling from posterior distributions defined on general Riemannian manifolds. We apply this new algorithm to Bayesian inference on symmetric or Hermitian positive definite matrices. To do so, we exploit the Riemannian structure induced by Cartan's canonical metric. The geodesics that correspond to this metric are available in closed form and, within the context of Lagrangian Monte Carlo, provide a principled way to travel around the space of positive definite matrices. Our method improves Bayesian inference on such matrices by allowing for a broad range of priors, so we are not limited to conjugate priors only. In the context of spectral density estimation, we use the (non-conjugate) complex reference prior as an example modeling option made available by the algorithm. Results based on simulated and real-world multivariate time series are presented in this context, and future directions are outlined.

20.
Stat Comput ; 27(6): 1473-1490, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28983154

ABSTRACT

For big data analysis, the high computational cost of Bayesian methods often limits their application in practice. In recent years, there have been many attempts to improve the computational efficiency of Bayesian inference. Here we propose an efficient and scalable computational technique for a state-of-the-art Markov chain Monte Carlo method, namely, Hamiltonian Monte Carlo. The key idea is to explore and exploit the structure and regularity in parameter space for the underlying probabilistic model to construct an effective approximation of its geometric properties. To this end, we build a surrogate function to approximate the target distribution using properly chosen random bases and an efficient optimization process. The resulting method provides a flexible, scalable, and efficient sampling algorithm, which converges to the correct target distribution. We show that by choosing the basis functions and optimization process differently, our method can be related to other approaches for the construction of surrogate functions such as generalized additive models or Gaussian process models. Experiments based on simulated and real data show that our approach leads to substantially more efficient sampling algorithms compared to existing state-of-the-art methods.
