1.
Biometrika ; 109(3): 817-835, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36105175

ABSTRACT

Factorization models express a statistical object of interest in terms of a collection of simpler objects. For example, a matrix or tensor can be expressed as a sum of rank-one components. However, in practice, it can be challenging to infer the relative impact of the different components as well as the number of components. A popular idea is to include infinitely many components having impact decreasing with the component index. This article is motivated by two limitations of existing methods: (1) lack of careful consideration of the within component sparsity structure; and (2) no accommodation for grouped variables and other non-exchangeable structures. We propose a general class of infinite factorization models that address these limitations. Theoretical support is provided, practical gains are shown in simulation studies, and an ecology application focusing on modelling bird species occurrence is discussed.

2.
Biometrika ; 108(2): 269-282, 2021 Jun.
Article in English | MEDLINE | ID: mdl-35747172

ABSTRACT

Posterior computation for high-dimensional data with many parameters can be challenging. This article focuses on a new method for approximating posterior distributions of a low- to moderate-dimensional parameter in the presence of a high-dimensional or otherwise computationally challenging nuisance parameter. The focus is on regression models and the key idea is to separate the likelihood into two components through a rotation. One component involves only the nuisance parameters, which can then be integrated out using a novel type of Gaussian approximation. We provide theory on approximation accuracy that holds for a broad class of forms of the nuisance component and priors. Applying our method to simulated and real data sets shows that it can outperform state-of-the-art posterior approximation approaches.

3.
Biometrika ; 105(2): 431-446, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29880978

ABSTRACT

There has been substantial recent interest in record linkage, where one attempts to group the records pertaining to the same entities from one or more large databases that lack unique identifiers. This can be viewed as a type of microclustering, with few observations per cluster and a very large number of clusters. We show that the problem is fundamentally hard from a theoretical perspective and, even in idealized cases, accurate entity resolution is effectively impossible unless the number of entities is small relative to the number of records and/or the separation between records from different entities is extremely large. These results suggest conservatism in interpretation of the results of record linkage, support collection of additional data to more accurately disambiguate the entities, and motivate a focus on coarser inference. For example, results from a simulation study suggest that sometimes one may obtain accurate results for population size estimation even when fine-scale entity resolution is inaccurate.

4.
Biometrika ; 104(4): 939-952, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29422695

ABSTRACT

We consider shape restricted nonparametric regression on a closed set [Formula: see text], where it is reasonable to assume the function has no more than H local extrema interior to [Formula: see text]. Following a Bayesian approach we develop a nonparametric prior over a novel class of local extremum splines. This approach is shown to be consistent when modeling any continuously differentiable function within the class considered, and is used to develop methods for testing hypotheses on the shape of the curve. Sampling algorithms are developed, and the method is applied in simulation studies and data examples where the shape of the curve is of interest.

5.
Stat Probab Lett ; 113: 41-48, 2016 Jun.
Article in English | MEDLINE | ID: mdl-31427835

ABSTRACT

In population studies, it is standard to sample data via designs in which the population is divided into strata, with the different strata assigned different probabilities of inclusion. Although there have been some proposals for including sample survey weights into Bayesian analyses, existing methods require complex models or ignore the stratified design underlying the survey weights. We propose a simple approach based on modeling the distribution of the selected sample as a mixture, with the mixture weights appropriately adjusted, while accounting for uncertainty in the adjustment. We focus for simplicity on Dirichlet process mixtures but the proposed approach can be applied more broadly. We sketch a simple Markov chain Monte Carlo algorithm for computation, and assess the approach via simulations and an application.

6.
Bioinformatics ; 31(24): 3890-6, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26323717

ABSTRACT

MOTIVATION: Both single marker and simultaneous analysis face challenges in GWAS due to the large number of markers genotyped for a small number of subjects. This large p small n problem is particularly challenging when the trait under investigation has low heritability. METHOD: In this article, we propose a two-stage approach that is a hybrid method of single and simultaneous analysis designed to improve genomic prediction of complex traits. In the first stage, we use a Bayesian independent screening method to select the most promising SNPs. In the second stage, we rely on a hierarchical model to analyze the joint impact of the selected markers. The model is designed to take into account familial dependence in the different subjects, while using local-global shrinkage priors on the marker effects. RESULTS: We evaluate the performance in simulation studies, and consider an application to animal breeding data. The illustrative data analysis reveals an encouraging result in terms of prediction performance and computational cost.


Subject(s)
Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Animals , Bayes Theorem , Breeding , Cattle , Genomics/methods , Genotype , Models, Genetic
7.
Biometrika ; 98(2): 291-306, 2011 Jun.
Article in English | MEDLINE | ID: mdl-23049129

ABSTRACT

We focus on sparse modelling of high-dimensional covariance matrices using Bayesian latent factor models. We propose a multiplicative gamma process shrinkage prior on the factor loadings which allows introduction of infinitely many factors, with the loadings increasingly shrunk towards zero as the column index increases. We use our prior on a parameter-expanded loading matrix to avoid the order dependence typical in factor analysis models and develop an efficient Gibbs sampler that scales well as data dimensionality increases. The gain in efficiency is achieved by the joint conjugacy property of the proposed prior, which allows block updating of the loadings matrix. We propose an adaptive Gibbs sampler for automatically truncating the infinite loading matrix through selection of the number of important factors. Theoretical results are provided on the support of the prior and truncation approximation bounds. A fast algorithm is proposed to produce approximate Bayes estimates. Latent factor regression methods are developed for prediction and variable selection in applications with high-dimensional correlated predictors. Operating characteristics are assessed through simulation studies, and the approach is applied to predict survival times from gene expression data.
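
The shrinkage behaviour described in this abstract can be illustrated by drawing from a prior of this type. The sketch below assumes the commonly cited hierarchical form of the multiplicative gamma process prior (column-wise precisions built as cumulative products of gamma variables, with a local gamma precision per loading), which the abstract itself does not spell out. The function name, the truncation level `k`, and the hyperparameter values `a1`, `a2` and `nu` are illustrative assumptions, not values taken from the article.

```python
import random


def sample_mgp_loadings(p, k, a1=2.0, a2=3.0, nu=3.0, seed=0):
    """Draw a p x k loading matrix from an assumed multiplicative gamma process prior.

    Assumed hierarchy (hypothetical hyperparameters):
      delta_1 ~ Gamma(a1, 1), delta_l ~ Gamma(a2, 1) for l >= 2
      tau_h   = delta_1 * ... * delta_h      (column precision)
      phi_jh  ~ Gamma(nu/2, rate nu/2)       (local precision)
      lambda_jh ~ Normal(0, 1 / (phi_jh * tau_h))
    """
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); a rate of 1 is a scale of 1.
    deltas = [rng.gammavariate(a1, 1.0)]
    deltas += [rng.gammavariate(a2, 1.0) for _ in range(k - 1)]

    # Cumulative products give one global precision per column.
    taus = []
    running = 1.0
    for delta in deltas:
        running *= delta
        taus.append(running)

    loadings = []
    for _ in range(p):
        row = []
        for h in range(k):
            # Rate nu/2 corresponds to scale 2/nu.
            phi = rng.gammavariate(nu / 2.0, 2.0 / nu)
            sd = (phi * taus[h]) ** -0.5
            row.append(rng.gauss(0.0, sd))
        loadings.append(row)
    return taus, loadings


if __name__ == "__main__":
    taus, loadings = sample_mgp_loadings(p=5, k=8)
    # Later columns tend to have larger precision, so smaller loadings.
    for h, tau in enumerate(taus, start=1):
        print(f"column {h}: precision {tau:.3f}")
```

With `a2` greater than 1 the column precisions tend to grow with the column index, which is one way to obtain loadings that are increasingly shrunk towards zero; the adaptive truncation and Gibbs sampler mentioned in the abstract are not sketched here.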

8.
Biometrika ; 98(1): 35-48, 2011 Mar.
Article in English | MEDLINE | ID: mdl-23956461

ABSTRACT

We consider geostatistical models that allow the locations at which data are collected to be informative about the outcomes. A Bayesian approach is proposed, which models the locations using a log Gaussian Cox process, while modelling the outcomes conditionally on the locations as Gaussian with a Gaussian process spatial random effect and adjustment for the location intensity process. We prove posterior propriety under an improper prior on the parameter controlling the degree of informative sampling, demonstrating that the data are informative. In addition, we show that the density of the locations and mean function of the outcome process can be estimated consistently under mild assumptions. The methods show significant evidence of informative sampling when applied to ozone data over Eastern U.S.A.

9.
Biometrics ; 60(3): 676-83, 2004 Sep.
Article in English | MEDLINE | ID: mdl-15339290

ABSTRACT

In studying rates of occurrence and progression of lesions (or tumors), it is typically not possible to obtain exact onset times for each lesion. Instead, data consist of the number of lesions that reach a detectable size between screening examinations, along with measures of the size/severity of individual lesions at each exam time. This interval-censored data structure makes it difficult to properly adjust for the onset time distribution in assessing covariate effects on rates of lesion progression. This article proposes a joint model for the multiple lesion onset and progression process, motivated by cross-sectional data from a study of uterine leiomyoma tumors. By using a joint model, one can potentially obtain more precise inferences on rates of onset, while also performing onset time-adjusted inferences on lesion severity. Following a Bayesian approach, we propose a data augmentation Markov chain Monte Carlo algorithm for posterior computation.


Subject(s)
Bayes Theorem , Neoplasms/etiology , Neoplasms/pathology , Adult , Algorithms , Biometry , Cross-Sectional Studies , Female , Humans , Leiomyomatosis/etiology , Leiomyomatosis/pathology , Markov Chains , Middle Aged , Models, Statistical , Monte Carlo Method , Stochastic Processes , Time Factors , Uterine Neoplasms/etiology , Uterine Neoplasms/pathology
10.
Hum Reprod ; 16(11): 2278-82, 2001 Nov.
Article in English | MEDLINE | ID: mdl-11679504

ABSTRACT

BACKGROUND: The TwoDay Algorithm is a simple method for identifying the fertile window. It classifies a day as fertile if cervical secretions are present on that day or were present on the day before. This approach may be an effective alternative to the ovulation and symptothermal methods for populations and programmes that find current natural family planning methods difficult to implement. METHODS: We used data on secretions from a large multinational European fecundability study to assess the relationship between the days predicted to be potentially fertile by the TwoDay Algorithm and the day-specific probabilities of pregnancy based on intercourse patterns in 434 conception cycles from the study. RESULTS: The days around ovulation that had the highest fecundability were the days most likely to be classified as fertile by the TwoDay Algorithm. In addition, intercourse on a particular day in the fertile interval was twice as likely to result in a pregnancy if cervical secretions were present on that day or the day before. CONCLUSIONS: The TwoDay Algorithm is effective, both in identifying the fertile days of the cycle and in predicting days within the fertile interval that have a high pregnancy rate. Our data provide the first direct evidence that cervical secretions are associated with higher fecundability within the fertile window.
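
The classification rule stated in the BACKGROUND section can be written down directly. This is a minimal sketch of that rule only; the function name and the input encoding (one boolean per cycle day) are assumptions, as is the choice to treat the day before the first recorded day as having no secretions.

```python
def twoday_fertile_days(secretions):
    """Flag each day as fertile under the TwoDay rule.

    secretions: list of bools, one per cycle day (True = secretions noted).
    A day is flagged fertile if secretions were present on that day
    or on the day before. The day before the first record is assumed
    to have had no secretions.
    """
    fertile = []
    for i, today in enumerate(secretions):
        yesterday = secretions[i - 1] if i > 0 else False
        fertile.append(today or yesterday)
    return fertile


if __name__ == "__main__":
    # Hypothetical observations for seven consecutive cycle days.
    observed = [False, False, True, True, False, False, False]
    print(twoday_fertile_days(observed))
    # prints [False, False, True, True, True, False, False]
```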


Subject(s)
Algorithms , Cervix Uteri/metabolism , Fertility , Body Temperature , Cervix Mucus/metabolism , Cohort Studies , Coitus , Female , Humans , Ovulation Detection , Pregnancy , Probability , Prospective Studies
11.
J Infect Dis ; 184(2): 127-35, 2001 Jul 15.
Article in English | MEDLINE | ID: mdl-11424008

ABSTRACT

Many human immunodeficiency virus (HIV)-infected persons receive prolonged treatment with DNA-reactive antiretroviral drugs. A prospective study was conducted of 26 HIV-infected men who provided samples before treatment and at multiple times after beginning treatment, to investigate effects of antiretrovirals on lymphocyte and sperm chromosomes and semen quality. Several antiretroviral regimens, all including a nucleoside component, were used. Lymphocyte metaphase analysis and sperm fluorescence in situ hybridization were used for cytogenetic studies. Semen analyses included conventional parameters (volume, concentration, viability, motility, and morphology). No significant effects on cytogenetic parameters, semen volume, or sperm concentration were detected. However, there were significant improvements in sperm motility for men with study entry CD4 cell counts >200 cells/mm³, sperm morphology for men with entry CD4 cell counts ≤200 cells/mm³, and the percentage of viable sperm in both groups. These findings suggest that nucleoside-containing antiretrovirals administered via recommended protocols do not induce chromosomal changes in lymphocytes or sperm but may produce improvements in semen quality.


Subject(s)
Anti-HIV Agents/adverse effects , Chromosome Breakage , Chromosomes/drug effects , HIV Infections/drug therapy , HIV Infections/immunology , Lymphocytes/drug effects , Metaphase/drug effects , Reverse Transcriptase Inhibitors/adverse effects , Spermatozoa/drug effects , Adult , Aneuploidy , Anti-HIV Agents/therapeutic use , CD4 Lymphocyte Count , Diploidy , Drug Therapy, Combination , Humans , In Situ Hybridization, Fluorescence , Longitudinal Studies , Lymphocytes/metabolism , Lymphocytes/pathology , Male , Middle Aged , Reverse Transcriptase Inhibitors/therapeutic use
12.
Biometrics ; 57(2): 396-403, 2001 Jun.
Article in English | MEDLINE | ID: mdl-11414562

ABSTRACT

In some cross-sectional studies of chronic disease, data consist of the age at examination, whether the disease was present at the exam, and recall of the age at first diagnosis. This article describes a flexible parametric approach for combining current status and age at first diagnosis data. We assume that the log odds of onset by a given age and of detection by a given age conditional on onset by that age are nondecreasing functions of time plus linear combinations of covariates. Piecewise linear models are used to characterize changes across time in the baseline odds. Methods are described for accommodating informatively missing current status data and inferences based on the age-specific incidence of disease prior to a landmark event (e.g., puberty, menopause). Our formulation enables straightforward maximum likelihood estimation without requiring restrictive parametric or Markov assumptions. The methods are applied to data from a study of uterine fibroids.


Subject(s)
Models, Statistical , Adult , Age Factors , Chronic Disease , Cross-Sectional Studies , Disease Progression , Female , Humans , Leiomyoma/diagnosis , Leiomyoma/physiopathology , Middle Aged , Odds Ratio , Premenopause , Probability , Time Factors , Uterine Neoplasms/diagnosis , Uterine Neoplasms/physiopathology
13.
Am J Epidemiol ; 153(12): 1222-6, 2001 Jun 15.
Article in English | MEDLINE | ID: mdl-11415958

ABSTRACT

In the past decade, there have been enormous advances in the use of Bayesian methodology for analysis of epidemiologic data, and there are now many practical advantages to the Bayesian approach. Bayesian models can easily accommodate unobserved variables such as an individual's true disease status in the presence of diagnostic error. The use of prior probability distributions represents a powerful mechanism for incorporating information from previous studies and for controlling confounding. Posterior probabilities can be used as easily interpretable alternatives to p values. Recent developments in Markov chain Monte Carlo methodology facilitate the implementation of Bayesian analyses of complex data sets containing missing observations and multidimensional outcomes. Tools are now available that allow epidemiologists to take advantage of this powerful approach to assessment of exposure-disease relations.


Subject(s)
Bayes Theorem , Humans
14.
Toxicol Lett ; 122(1): 33-44, 2001 May 31.
Article in English | MEDLINE | ID: mdl-11397555

ABSTRACT

The Tg.AC mouse carrying the v-Ha-ras structural gene is a useful model for the study of chemical carcinogens, especially those acting via non-genotoxic mechanisms. This study evaluated the efficacy of the non-toxic, water-soluble antioxidant from spinach, natural antioxidant (NAO), in reducing skin papilloma induction in female hemizygous Tg.AC mice treated dermally five times over 2.5 weeks with 2.5 µg 12-O-tetradecanoylphorbol-13-acetate (TPA). The TPA-only group was considered as a control; the other two groups received, additionally, NAO topically (2 mg) or orally (100 mg/kg), 5 days/week for 5 weeks. Papilloma counts made macroscopically during the clinical observations showed a significant decrease in multiplicity (P<0.01) in the NAO topically treated group. According to histological criteria, papilloma multiplicity was lower in both topical-NAO and oral-NAO groups, but significantly so only in the oral-NAO mice (P<0.01). The beneficial effect of NAO in the Tg.AC mouse is reported.


Subject(s)
Antioxidants/pharmacology , Papilloma/prevention & control , Skin Neoplasms/prevention & control , Administration, Cutaneous , Administration, Oral , Animals , Body Weight/drug effects , Carcinogens/adverse effects , Disease Models, Animal , Female , Genes, ras/genetics , Genotype , Mice , Mice, Transgenic , Papilloma/chemically induced , Papilloma/pathology , Plant Extracts/pharmacology , Skin Neoplasms/chemically induced , Skin Neoplasms/pathology , Spinacia oleracea/chemistry , Survival Analysis , Tetradecanoylphorbol Acetate/adverse effects
15.
Contraception ; 63(4): 211-5, 2001 Apr.
Article in English | MEDLINE | ID: mdl-11376648

ABSTRACT

Emergency post-coital contraceptives effectively reduce the risk of pregnancy, but their degree of efficacy remains uncertain. Measurement of efficacy depends on the pregnancy rate without treatment, which cannot be measured directly. We provide indirect estimates of such pregnancy rates, using data from a prospective study of 221 women who were attempting to conceive. We previously estimated the probability of pregnancy with an act of intercourse relative to ovulation. In this article, we extend these data to estimate the probability of pregnancy relative to intercourse on a given cycle day (counting from onset of previous menses). In assessing the efficacy of post-coital contraceptives, other approaches have not incorporated accurate information on the variability of ovulation. We find that the possibility of late ovulation produces a persistent risk of pregnancy even into the sixth week of the cycle. Post-coital contraceptives may be indicated even when intercourse has occurred late in the cycle.


Subject(s)
Coitus , Contraceptives, Postcoital , Female , Humans , Menstrual Cycle , Ovulation , Pregnancy , Probability , Prospective Studies , Time Factors
16.
Stat Med ; 20(6): 965-78, 2001 Mar 30.
Article in English | MEDLINE | ID: mdl-11252016

ABSTRACT

In modelling human fertility one ideally accounts for timing of intercourse relative to ovulation. Measurement error in identifying the day of ovulation can bias estimates of fecundability parameters and attenuate estimates of covariate effects. In the absence of a single perfect marker of ovulation, several error prone markers are sometimes obtained. In this paper we propose a semi-parametric mixture model that uses multiple independent markers of ovulation to account for measurement error. The model assigns each method of assessing ovulation a distinct non-parametric error distribution, and corrects bias in estimates of day-specific fecundability. We use a Monte Carlo EM algorithm for joint estimation of (i) the error distribution for the markers, (ii) the error-corrected fertility parameters, and (iii) the couple-specific random effects. We apply the methods to data from a North Carolina fertility study to assess the magnitude of error in measures of ovulation based on urinary luteinizing hormone and metabolites of ovarian hormones, and estimate the corrected day-specific probabilities of clinical pregnancy. Published in 2001 by John Wiley & Sons, Ltd.


Subject(s)
Fertility/physiology , Models, Biological , Ovulation Detection/methods , Ovulation/physiology , Algorithms , Biomarkers , Corpus Luteum/physiology , Estrogens/urine , Female , Humans , Likelihood Functions , Luteinizing Hormone/urine , Male , North Carolina , Pregnancy , Progesterone/urine
17.
Biometrics ; 57(1): 302-8, 2001 Mar.
Article in English | MEDLINE | ID: mdl-11252614

ABSTRACT

This article describes a general class of factor analytic models for the analysis of clustered multivariate data in the presence of informative missingness. We assume that there are distinct sets of cluster-level latent variables related to the primary outcomes and to the censoring process, and we account for dependency between these latent variables through a hierarchical model. A linear model is used to relate covariates and latent variables to the primary outcomes for each subunit. A generalized linear model accounts for covariate and latent variable effects on the probability of censoring for subunits within each cluster. The model accounts for correlation within clusters and within subunits through a flexible factor analytic framework that allows multiple latent variables and covariate effects on the latent variables. The structure of the model facilitates implementation of Markov chain Monte Carlo methods for posterior estimation. Data from a spermatotoxicity study are analyzed to illustrate the proposed approach.


Subject(s)
Biometry , Models, Statistical , Animals , Cluster Analysis , Data Interpretation, Statistical , In Vitro Techniques , Male , Models, Biological , Multivariate Analysis , Rats , Sperm Motility/drug effects
18.
Biostatistics ; 2(2): 131-45, 2001 Jun.
Article in English | MEDLINE | ID: mdl-12933545

ABSTRACT

Models of human fertility that incorporate information on timing of intercourse have assumed that a single ovum is released each menstrual cycle. These models are misspecified if two or more viable ova are sometimes released in a single cycle, which is known to occur in dizygotic twin pregnancies. In this paper, we propose a model for multiple ovulation in humans. We assume that the unobservable number of viable ova in each cycle follows a multinomial distribution. Successful fertilization of each ovum depends on the ability of the cycle to support a pregnancy and on the aggregate of a set of unobservable Bernoulli trials representing the fertilizing effects of intercourse on various days. Our model accommodates general covariate effects, allows for heterogeneity among couples, and accounts for a sterile subpopulation of couples. Information on early detection of pregnancy can be incorporated to estimate the probability of embryo loss. We outline a Markov chain Monte Carlo algorithm for estimation of the posterior distributions of the parameters. The methods are applied to data from a North Carolina pregnancy study, and applications to studies of assisted reproduction are described.

19.
Biometrics ; 57(4): 1067-73, 2001 Dec.
Article in English | MEDLINE | ID: mdl-11764245

ABSTRACT

Time to pregnancy studies that identify ovulation days and collect daily intercourse data can be used to estimate the day-specific probabilities of conception given intercourse on a single day relative to ovulation. In this article, a Bayesian semiparametric model is described for flexibly characterizing covariate effects and heterogeneity among couples in daily fecundability. The proposed model is characterized by the timing of the most fertile day of the cycle relative to ovulation, by the probability of conception due to intercourse on the most fertile day, and by the ratios of the daily conception probabilities for other days of the cycle relative to this peak probability. The ratios are assumed to be increasing in time to the peak and decreasing thereafter. Generalized linear mixed models are used to incorporate covariate and couple-specific effects on the peak probability and on the day-specific ratios. A Markov chain Monte Carlo algorithm is described for posterior estimation, and the methods are illustrated through application to caffeine data from a North Carolina pregnancy study.
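
The parameterization described above (a peak probability plus ratios relative to that peak, rising to the peak and falling afterwards) can be sketched as follows. This illustrates only the deterministic structure, not the Bayesian semiparametric model or its covariate and couple-specific effects; the function names and all numeric values are hypothetical.

```python
def day_specific_probabilities(peak_prob, ratios):
    """Turn a peak probability and per-day ratios into day-specific probabilities.

    ratios: dict mapping day relative to ovulation -> ratio to the peak
            probability (the most fertile day has ratio 1.0).
    """
    return {day: peak_prob * ratio for day, ratio in sorted(ratios.items())}


def is_unimodal(ratios):
    """Check that ratios are nondecreasing up to the peak and nonincreasing after."""
    values = [ratios[day] for day in sorted(ratios)]
    peak = values.index(max(values))
    rising = all(values[i] <= values[i + 1] for i in range(peak))
    falling = all(values[i] >= values[i + 1] for i in range(peak, len(values) - 1))
    return rising and falling


if __name__ == "__main__":
    # Hypothetical ratios; day 0 is the day of ovulation.
    ratios = {-5: 0.1, -4: 0.3, -3: 0.5, -2: 1.0, -1: 0.8, 0: 0.4, 1: 0.05}
    assert is_unimodal(ratios)
    for day, prob in day_specific_probabilities(0.3, ratios).items():
        print(f"day {day:+d}: {prob:.3f}")
```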


Subject(s)
Bayes Theorem , Fertility , Menstrual Cycle/physiology , Algorithms , Biometry , Caffeine/pharmacology , Female , Fertility/drug effects , Humans , Markov Chains , Models, Biological , Monte Carlo Method , Pregnancy , Time Factors
20.
Biometrics ; 56(4): 1068-75, 2000 Dec.
Article in English | MEDLINE | ID: mdl-11129462

ABSTRACT

In some types of cancer chemoprevention experiments and short-term carcinogenicity bioassays, the data consist of the number of observed tumors per animal and the times at which these tumors were first detected. In such studies, there is interest in distinguishing between treatment effects on the number of tumors induced by a known carcinogen and treatment effects on the tumor growth rate. Since animals may die before all induced tumors reach a detectable size, separation of these effects can be difficult. This paper describes a flexible parametric model for data of this type. Under our model, the tumor detection times are realizations of a delayed Poisson process that is characterized by the age-specific tumor induction rate and a random latency interval between tumor induction and detection. The model accommodates distinct treatment and animal-specific effects on the number of induced tumors (multiplicity) and the time to tumor detection (growth rate). A Gibbs sampler is developed for estimation of the posterior distributions of the parameters. The methods are illustrated through application to data from a breast cancer chemoprevention experiment.
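
The delayed Poisson process described above can be illustrated with a small simulation for a single animal. This is a simplified sketch under stated assumptions: the induction rate is held constant rather than age-specific, the latency between induction and detection is taken to be exponential, and the function and parameter names are hypothetical. It shows why the two effects are hard to separate: tumors induced shortly before death may never be detected.

```python
import random


def simulate_detected_tumors(induction_rate, mean_latency, death_time, seed=0):
    """Simulate tumor detection times for one animal.

    Tumors are induced by a Poisson process with constant rate
    (an assumption; the abstract describes an age-specific rate).
    Each tumor is detected after an exponential latency (also an
    assumption) and only if detection occurs before death.
    Returns (number induced, sorted detection times).
    """
    rng = random.Random(seed)
    detections = []
    n_induced = 0
    t = 0.0
    while True:
        # Exponential gaps between events give a Poisson process.
        t += rng.expovariate(induction_rate)
        if t >= death_time:
            break
        n_induced += 1
        detected_at = t + rng.expovariate(1.0 / mean_latency)
        if detected_at < death_time:
            detections.append(detected_at)
    return n_induced, sorted(detections)


if __name__ == "__main__":
    induced, detected = simulate_detected_tumors(
        induction_rate=0.2, mean_latency=5.0, death_time=30.0
    )
    print(f"induced: {induced}, detected before death: {len(detected)}")
```

In this sketch a treatment that lowers `induction_rate` and one that raises `mean_latency` both reduce the number of tumors detected before death, which is the identifiability problem the abstract refers to.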


Subject(s)
Anticarcinogenic Agents/therapeutic use , Drug Screening Assays, Antitumor/methods , Neoplasms, Experimental/pathology , Neoplasms, Experimental/prevention & control , Vitamin A/analogs & derivatives , 9,10-Dimethyl-1,2-benzanthracene , Animals , Biometry/methods , Canthaxanthin/therapeutic use , Diterpenes , Female , Mammary Neoplasms, Experimental/chemically induced , Mammary Neoplasms, Experimental/pathology , Mammary Neoplasms, Experimental/prevention & control , Models, Statistical , Rats , Rats, Sprague-Dawley , Retinyl Esters , Vitamin A/therapeutic use