Results 1 - 20 of 27
1.
J R Stat Soc Series B Stat Methodol ; 84(4): 1082-1104, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36419504

ABSTRACT

While many methods are available to detect structural changes in a time series, few procedures are available to quantify the uncertainty of these estimates post-detection. In this work, we fill this gap by proposing a new framework to test the null hypothesis that there is no change in mean around an estimated changepoint. We further show that it is possible to efficiently carry out this framework in the case of changepoints estimated by binary segmentation and its variants, ℓ0 segmentation, or the fused lasso. Our setup allows us to condition on much less information than existing approaches, which yields higher powered tests. We apply our proposals in a simulation study and on a dataset of chromosomal guanine-cytosine content. These approaches are freely available in the R package ChangepointInference at https://jewellsean.github.io/changepoint-inference/.

2.
Biostatistics ; 21(4): 709-726, 2020 10 01.
Article in English | MEDLINE | ID: mdl-30753436

ABSTRACT

Calcium imaging data promises to transform the field of neuroscience by making it possible to record from large populations of neurons simultaneously. However, determining the exact moment in time at which a neuron spikes, from a calcium imaging data set, amounts to a non-trivial deconvolution problem which is of critical importance for downstream analyses. While a number of formulations have been proposed for this task in the recent literature, in this article, we focus on a formulation recently proposed in Jewell and Witten (2018. Exact spike train inference via $\ell_{0}$ optimization. The Annals of Applied Statistics 12(4), 2457-2482) that can accurately estimate not just the spike rate, but also the specific times at which the neuron spikes. We develop a much faster algorithm that can be used to deconvolve a fluorescence trace of 100 000 timesteps in less than a second. Furthermore, we present a modification to this algorithm that precludes the possibility of a "negative spike". We demonstrate the performance of this algorithm for spike deconvolution on calcium imaging datasets that were recently released as part of the $\texttt{spikefinder}$ challenge (http://spikefinder.codeneuro.org/). The algorithm presented in this article was used in the Allen Institute for Brain Science's "platform paper" to decode neural activity from the Allen Brain Observatory; this is the main scientific paper in which their data resource is presented. Our $\texttt{C++}$ implementation, along with $\texttt{R}$ and $\texttt{python}$ wrappers, is publicly available. $\texttt{R}$ code is available on $\texttt{CRAN}$ and $\texttt{Github}$, and $\texttt{python}$ wrappers are available on $\texttt{Github}$; see https://github.com/jewellsean/FastLZeroSpikeInference.


Subject(s)
Calcium , Neurons , Algorithms , Brain/diagnostic imaging , Diagnostic Imaging , Humans
3.
Stat Comput ; 27(5): 1293-1305, 2017.
Article in English | MEDLINE | ID: mdl-32063685

ABSTRACT

In this paper we build on an approach proposed by Zou et al. (2014) for nonparametric changepoint detection. This approach defines the best segmentation for a data set as the one which minimises a penalised cost function, with the cost function defined in terms of minus a non-parametric log-likelihood for data within each segment. Minimising this cost function is possible using dynamic programming, but their algorithm had a computational cost that is cubic in the length of the data set. To speed up computation, Zou et al. (2014) resorted to a screening procedure which means that the estimated segmentation is no longer guaranteed to be the global minimum of the cost function. We show that the screening procedure adversely affects the accuracy of the changepoint detection method, and show how a faster dynamic programming algorithm, pruned exact linear time (PELT) (Killick et al. 2012), can be used to find the optimal segmentation with a computational cost that can be close to linear in the amount of data. PELT requires a penalty to avoid under/over-fitting the model which can have a detrimental effect on the quality of the detected changepoints. To overcome this issue we use a relatively new method, changepoints over a range of penalties (Haynes et al. 2016), which finds all of the optimal segmentations for multiple penalty values over a continuous range. We apply our method to detect changes in heart-rate during physical activity.

4.
Stat Comput ; 27(2): 519-533, 2017.
Article in English | MEDLINE | ID: mdl-32355427

ABSTRACT

Many common approaches to detecting changepoints, for example based on statistical criteria such as penalised likelihood or minimum description length, can be formulated in terms of minimising a cost over segmentations. We focus on a class of dynamic programming algorithms that can solve the resulting minimisation problem exactly, and thus find the optimal segmentation under the given statistical criteria. The standard implementations of these dynamic programming methods have a computational cost that scales at least quadratically in the length of the time-series. Recently, pruning ideas have been suggested that can speed up the dynamic programming algorithms, whilst still being guaranteed to be optimal, in that they find the true minimum of the cost function. Here we extend these pruning methods, and introduce two new algorithms for segmenting data: FPOP and SNIP. Empirical results show that FPOP is substantially faster than existing dynamic programming methods, and unlike the existing methods its computational efficiency is robust to the number of changepoints in the data. We evaluate the method for detecting copy number variations and observe that FPOP has a computational cost that is even competitive with that of binary segmentation, but can give much more accurate segmentations.
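
Illustrative sketch (not part of the indexed record): the "standard" quadratic-time dynamic programme that the abstract above refers to can be written in a few lines. The version below is only the unpruned baseline, not FPOP or SNIP. The squared-error segment cost for a change in mean, the function name `optimal_partitioning`, and the `penalty` argument are assumptions chosen for illustration.

```python
def optimal_partitioning(y, penalty):
    """Unpruned dynamic programme for penalised-cost changepoint detection.

    Assumed cost: within-segment sum of squared deviations from the segment
    mean, plus `penalty` per changepoint. Runs in O(n^2) time.
    """
    n = len(y)
    # Prefix sums give each segment cost in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def seg_cost(a, b):
        # Sum of squared deviations from the mean of y[a:b].
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    # best[t] = minimal penalised cost of segmenting y[:t];
    # best[0] = -penalty so the first segment carries no penalty.
    best = [0.0] * (n + 1)
    best[0] = -penalty
    last = [0] * (n + 1)  # start index of the final segment of y[:t]
    for t in range(1, n + 1):
        best[t], last[t] = min(
            (best[s] + seg_cost(s, t) + penalty, s) for s in range(t)
        )

    # Backtrack through `last` to recover the changepoint positions.
    changepoints = []
    t = n
    while t > 0:
        s = last[t]
        if s > 0:
            changepoints.append(s)
        t = s
    return sorted(changepoints)


print(optimal_partitioning([0.0] * 10 + [5.0] * 10, penalty=1.0))
```

Pruning methods of the kind the abstract describes restrict the candidate start indices `s` considered at each step while keeping the same minimum; that step is omitted here.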

5.
Article in English | MEDLINE | ID: mdl-27375350

ABSTRACT

Widely used models in genetics include the Wright-Fisher diffusion and its moment dual, Kingman's coalescent. Each has a multilocus extension but under neither extension is the sampling distribution available in closed-form, and their computation is extremely difficult. In this paper we derive two new multilocus population genetic models, one a diffusion and the other a coalescent process, which are much simpler than the standard models, but which capture their key properties for large recombination rates. The diffusion model is based on a central limit theorem for density dependent population processes, and we show that the sampling distribution is a linear combination of moments of Gaussian distributions and hence available in closed-form. The coalescent process is based on a probabilistic coupling of the ancestral recombination graph to a simpler genealogical process which exposes the leading dynamics of the former. We further demonstrate that when we consider the sampling distribution as an asymptotic expansion in inverse powers of the recombination parameter, the sampling distributions of the new models agree with the standard ones up to the first two orders.

6.
Biometrics ; 70(2): 457-66, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24467590

ABSTRACT

We consider inference for the reaction rates in discretely observed networks such as those found in models for systems biology, population ecology, and epidemics. Most such networks are neither slow enough nor small enough for inference via the true state-dependent Markov jump process to be feasible. Typically, inference is conducted by approximating the dynamics through an ordinary differential equation (ODE) or a stochastic differential equation (SDE). The former ignores the stochasticity in the true model and can lead to inaccurate inferences. The latter is more accurate but is harder to implement as the transition density of the SDE model is generally unknown. The linear noise approximation (LNA) arises from a first-order Taylor expansion of the approximating SDE about a deterministic solution and can be viewed as a compromise between the ODE and SDE models. It is a stochastic model, but discrete time transition probabilities for the LNA are available through the solution of a series of ordinary differential equations. We describe how a restarting LNA can be efficiently used to perform inference for a general class of reaction networks; evaluate the accuracy of such an approach; and show how and when this approach is either statistically or computationally more efficient than ODE or SDE methods. We apply the LNA to analyze Google Flu Trends data from the North and South Islands of New Zealand, and are able to obtain more accurate short-term forecasts of new flu cases than another recently proposed method, although at a greater computational cost.


Subject(s)
Biometry/methods , Models, Statistical , Computer Simulation , Ecology/statistics & numerical data , Epidemics/statistics & numerical data , Epidemiologic Methods , Gene Regulatory Networks , Humans , Influenza, Human/epidemiology , Linear Models , Stochastic Processes , Systems Biology/statistics & numerical data
7.
Stat Appl Genet Mol Biol ; 13(1): 67-82, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24323893

ABSTRACT

A central statistical goal is to choose between alternative explanatory models of data. In many modern applications, such as population genetics, it is not possible to apply standard methods based on evaluating the likelihood functions of the models, as these are numerically intractable. Approximate Bayesian computation (ABC) is a commonly used alternative for such situations. ABC simulates data x for many parameter values under each model, which is compared to the observed data x_obs. More weight is placed on models under which S(x) is close to S(x_obs), where S maps data to a vector of summary statistics. Previous work has shown the choice of S is crucial to the efficiency and accuracy of ABC. This paper provides a method to select good summary statistics for model choice. It uses a preliminary step, simulating many x values from all models and fitting regressions to this with the model as response. The resulting model weight estimators are used as S in an ABC analysis. Theoretical results are given to justify this as approximating low dimensional sufficient statistics. A substantive application is presented: choosing between competing coalescent models of demographic growth for Campylobacter jejuni in New Zealand using multi-locus sequence typing data.


Subject(s)
Computer Simulation , Models, Genetic , Algorithms , Bayes Theorem , Campylobacter jejuni/genetics , Genes, Bacterial , Likelihood Functions , Multilocus Sequence Typing
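
Illustrative sketch (not part of the indexed record): the generic ABC model-choice step described in the abstract for item 7 (simulate x under each model, keep simulations whose summary S(x) is close to S(x_obs), and weight models by how often they are kept) can be shown with plain rejection sampling. Everything concrete here is an assumption: the two toy Gaussian models, their uniform priors, the hand-picked summary (the sample mean) and the tolerance `eps`. It does not implement the paper's regression-based choice of summary statistics.

```python
import random
from collections import Counter


def abc_model_choice(x_obs, n_sims=2000, eps=0.5, seed=0):
    """Rejection ABC for choosing between two hypothetical toy models.

    Model 0: x_i ~ N(theta, 1) with theta ~ Uniform(-1, 1)
    Model 1: x_i ~ N(theta, 1) with theta ~ Uniform(2, 4)
    Summary statistic S: the sample mean (an assumption for illustration).
    """
    rng = random.Random(seed)
    priors = {0: (-1.0, 1.0), 1: (2.0, 4.0)}
    s_obs = sum(x_obs) / len(x_obs)

    accepted = Counter()
    for _ in range(n_sims):
        model = rng.choice([0, 1])           # uniform prior over models
        low, high = priors[model]
        theta = rng.uniform(low, high)       # parameter draw from the prior
        x = [rng.gauss(theta, 1.0) for _ in x_obs]
        s = sum(x) / len(x)
        if abs(s - s_obs) <= eps:            # keep simulations near S(x_obs)
            accepted[model] += 1

    total = sum(accepted.values())
    if total == 0:
        return {}                            # nothing accepted: eps too small
    # Approximate posterior model probabilities from acceptance frequencies.
    return {m: accepted[m] / total for m in priors}


weights = abc_model_choice([3.0] * 20)
print(weights)
```

With observed data centred at 3, nearly all accepted simulations should come from the second toy model, so its estimated weight should be close to one.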
8.
Microbiologyopen ; 2(4): 659-73, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23873654

ABSTRACT

A repeated cross-sectional study was conducted to determine the prevalence of Campylobacter spp. and the population structure of C. jejuni in European starlings and ducks cohabiting multiple public access sites in an urban area of New Zealand. The country's geographical isolation and relatively recent history of introduction of wild bird species, including the European starling and mallard duck, create an ideal setting to explore the impact of geographical separation on the population biology of C. jejuni, as well as potential public health implications. A total of 716 starling and 720 duck fecal samples were collected and screened for C. jejuni over a 12 month period. This study combined molecular genotyping, population genetics and epidemiological modeling and revealed: (i) higher Campylobacter spp. isolation in starlings (46%) compared with ducks (30%), but similar isolation of C. jejuni in ducks (23%) and starlings (21%), (ii) significant associations between the isolation of Campylobacter spp. and host species, sampling location and time of year using logistic regression, (iii) evidence of population differentiation, as indicated by F_ST, and host-genotype association with clonal complexes CC ST-177 and CC ST-682 associated with starlings, and clonal complexes CC ST-1034, CC ST-692, and CC ST-1332 associated with ducks, and (iv) greater genetic diversity and genotype richness in ducks compared with starlings. These findings provide evidence that host-associated genotypes, such as the starling-associated ST-177 and ST-682, represent lineages that were introduced with the host species in the 19th century. The isolation of sequence types associated with human disease in New Zealand indicates that wild ducks and starlings need to be considered as a potential public health risk, particularly in urban areas.


Subject(s)
Biodiversity , Campylobacter Infections/veterinary , Campylobacter jejuni/isolation & purification , Ducks/microbiology , Starlings/microbiology , Animals , Campylobacter Infections/epidemiology , Campylobacter Infections/microbiology , Campylobacter jejuni/classification , Campylobacter jejuni/genetics , Feces/microbiology , Genetic Variation , Humans , Molecular Epidemiology , Molecular Typing , New Zealand , Prevalence , Urban Population
9.
J Mol Evol ; 74(5-6): 273-80, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22767048

ABSTRACT

Single locus variants (SLVs) are bacterial sequence types that differ at only one of the seven canonical multilocus sequence typing (MLST) loci. Estimating the relative roles of recombination and point mutation in the generation of new alleles that lead to SLVs is helpful in understanding how organisms evolve. The relative rates of recombination and mutation for Campylobacter jejuni and Campylobacter coli were estimated at seven different housekeeping loci from publicly available MLST data. The probability of recombination generating a new allele that leads to an SLV is estimated to be roughly seven times more than that of mutation for C. jejuni, but for C. coli recombination and mutation were estimated to have a similar contribution to the generation of SLVs. The majority of nucleotide differences (98% for C. jejuni and 85% for C. coli) between strains that make up an SLV are attributable to recombination. These estimates are much larger than estimates of the relative rate of recombination to mutation calculated from more distantly related isolates using MLST data. One explanation for this is that purifying selection plays an important role in the evolution of Campylobacter. A simulation study was performed to test the performance of our method under a range of biologically realistic parameters. We found that our method performed well when the recombination tract length was longer than 3 kb. For situations in which recombination may occur with shorter tract lengths, our estimates are likely to be an underestimate of the ratio of recombination to mutation, and of the importance of recombination for creating diversity in closely related isolates. A parametric bootstrap method was applied to calculate the uncertainty of these estimates.


Subject(s)
Campylobacter coli/genetics , Campylobacter jejuni/genetics , Genetic Loci/genetics , Genetic Variation , Point Mutation/genetics , Recombination, Genetic , Alleles , Campylobacter coli/classification , Campylobacter jejuni/classification , Databases, Genetic , Multilocus Sequence Typing , Nucleotides/genetics
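
Illustrative sketch (not part of the indexed record): the definition of a single locus variant given in the abstract for item 9 translates directly into a check on two MLST allelic profiles. The function name `is_single_locus_variant`, the tuple representation of a profile, and the allele numbers in the example are hypothetical.

```python
def is_single_locus_variant(profile_a, profile_b):
    """Return True if two allelic profiles differ at exactly one locus.

    Each profile is assumed to be a sequence of allele identifiers,
    one per MLST locus (seven loci in the canonical scheme).
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    differences = sum(1 for a, b in zip(profile_a, profile_b) if a != b)
    return differences == 1


# Made-up allele numbers at seven loci, for illustration only.
st_x = (2, 1, 5, 3, 2, 1, 5)
st_y = (2, 1, 5, 3, 2, 1, 6)   # differs from st_x at the last locus only
print(is_single_locus_variant(st_x, st_y))
```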
10.
PLoS One ; 6(11): e27121, 2011.
Article in English | MEDLINE | ID: mdl-22096527

ABSTRACT

Campylobacter jejuni ST-474 is the most important human enteric pathogen in New Zealand, and yet this genotype is rarely found elsewhere in the world. Insight into the evolution of this organism was gained by a whole genome comparison of two ST-474, flaA SVR-14 isolates and other available C. jejuni isolates and genomes. The two isolates were collected from different sources, human (H22082) and retail poultry (P110b), at the same time and from the same geographical location. Solexa sequencing of each isolate resulted in ~1.659 Mb (H22082) and ~1.656 Mb (P110b) of assembled sequences within 28 (H22082) and 29 (P110b) contigs. We analysed 1502 genes for which we had sequences within both ST-474 isolates and within at least one of 11 C. jejuni reference genomes. Although 94.5% of genes were identical between the two ST-474 isolates, we identified 83 genes that differed by at least one nucleotide, including 55 genes with non-synonymous substitutions. These covered 101 kb and contained 672 point differences. We inferred that 22 (3.3%) of these differences were due to mutation and 650 (96.7%) were imported via recombination. Our analysis estimated 38 recombinant breakpoints within these 83 genes, which correspond to recombination events affecting at least 19 loci regions and gives a tract length estimate of ~2 kb. This includes a ~12 kb region displaying non-homologous recombination in one of the ST-474 genomes, with the insertion of two genes, including ykgC, a putative oxidoreductase, and a conserved hypothetical protein of unknown function. Furthermore, our analysis indicates that the source of this recombined DNA is more likely to have come from C. jejuni strains that are more closely related to ST-474. This suggests that the rates of recombination and mutation are similar in order of magnitude, but that recombination has been much more important for generating divergence between the two ST-474 isolates.


Subject(s)
Campylobacter jejuni/genetics , Genome, Bacterial/genetics , Campylobacter jejuni/classification , Homologous Recombination/genetics , Recombination, Genetic/genetics
11.
Mol Biol Evol ; 26(2): 385-97, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19008526

ABSTRACT

Responsible for the majority of bacterial gastroenteritis in the developed world, Campylobacter jejuni is a pervasive pathogen of humans and animals, but its evolution is obscure. In this paper, we exploit contemporary genetic diversity and empirical evidence to piece together the evolutionary history of C. jejuni and quantify its evolutionary potential. Our combined population genetics-phylogenetics approach reveals a surprising picture. Campylobacter jejuni is a rapidly evolving species, subject to intense purifying selection that purges 60% of novel variation, but possessing a massive evolutionary potential. The low mutation rate is offset by a large effective population size so that a mutation at any site can occur somewhere in the population within the space of a week. Recombination has a fundamental role, generating diversity at twice the rate of de novo mutation, and facilitating gene flow between C. jejuni and its sister species Campylobacter coli. We attempt to calibrate the rate of molecular evolution in C. jejuni based solely on within-species variation. The rates we obtain are up to 1,000 times faster than conventional estimates, placing the C. jejuni-C. coli split at the time of the Neolithic revolution. We weigh the plausibility of such recent bacterial evolution against alternative explanations and discuss the evidence required to settle the issue.


Subject(s)
Campylobacter jejuni/genetics , Evolution, Molecular , Campylobacter Infections/microbiology , Campylobacter coli/genetics , Campylobacter jejuni/classification , England , Genetic Drift , Genetic Speciation , Humans , Mutation , Recombination, Genetic , Selection, Genetic
12.
PLoS Genet ; 4(9): e1000203, 2008 Sep 26.
Article in English | MEDLINE | ID: mdl-18818764

ABSTRACT

Campylobacter jejuni is the leading cause of bacterial gastro-enteritis in the developed world. It is thought to infect 2-3 million people a year in the US alone, at a cost to the economy in excess of US $4 billion. C. jejuni is a widespread zoonotic pathogen that is carried by animals farmed for meat and poultry. A connection with contaminated food is recognized, but C. jejuni is also commonly found in wild animals and water sources. Phylogenetic studies have suggested that genotypes pathogenic to humans bear greatest resemblance to non-livestock isolates. Moreover, seasonal variation in campylobacteriosis bears the hallmarks of water-borne disease, and certain outbreaks have been attributed to contamination of drinking water. As a result, the relative importance of these reservoirs to human disease is controversial. We use multilocus sequence typing to genotype 1,231 cases of C. jejuni isolated from patients in Lancashire, England. By modeling the DNA sequence evolution and zoonotic transmission of C. jejuni between host species and the environment, we assign human cases probabilistically to source populations. Our novel population genetics approach reveals that the vast majority (97%) of sporadic disease can be attributed to animals farmed for meat and poultry. Chicken and cattle are the principal sources of C. jejuni pathogenic to humans, whereas wild animal and environmental sources are responsible for just 3% of disease. Our results imply that the primary transmission route is through the food chain, and suggest that incidence could be dramatically reduced by enhanced on-farm biosecurity or preventing food-borne transmission.


Subject(s)
Animals, Wild/microbiology , Campylobacter Infections/transmission , Campylobacter jejuni/isolation & purification , Meat/microbiology , Water Microbiology , Animals , Bacterial Typing Techniques , Biodiversity , Birds , Campylobacter Infections/epidemiology , Campylobacter Infections/microbiology , Campylobacter jejuni/classification , Campylobacter jejuni/genetics , Cattle , Chickens , Disease Reservoirs/microbiology , England/epidemiology , Humans , Rabbits , Sheep , Swine
13.
Genetics ; 177(1): 427-34, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17660571

ABSTRACT

We look at how to choose genetic distance so as to maximize the power of detecting spatial structure. We answer this question through analyzing two population genetic models that allow for a spatially structured population in a continuous habitat. These models, like most that incorporate spatial structure, can be characterized by a separation of timescales: the history of the sample can be split into a scattering and a collecting phase, and it is only during the scattering phase that the spatial locations of the sample affect the coalescence times. Our results suggest that the optimal choice of genetic distance is based upon splitting a DNA sequence into segments and counting the number of segments at which two sequences differ. The size of these segments depends on the length of the scattering phase for the population genetic model.


Subject(s)
Campylobacter jejuni/genetics , Chromosomes, Bacterial/genetics , Genetics, Population , Genome, Bacterial/genetics , Models, Genetic , Campylobacter jejuni/classification , Data Interpretation, Statistical , Evolution, Molecular , Geography
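
Illustrative sketch (not part of the indexed record): the genetic distance suggested in the abstract for item 13 (split a DNA sequence into segments and count the segments at which two sequences differ) can be computed as follows. The function name `segment_distance`, the fixed segment length, and the example sequences are assumptions; the abstract states only that the appropriate segment size depends on the length of the scattering phase.

```python
def segment_distance(seq_a, seq_b, segment_length):
    """Count the segments in which two aligned sequences differ.

    The sequences are cut into consecutive blocks of `segment_length`
    sites; a block contributes 1 if the sequences differ anywhere in it.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    if segment_length < 1:
        raise ValueError("segment_length must be positive")
    count = 0
    for start in range(0, len(seq_a), segment_length):
        end = start + segment_length
        if seq_a[start:end] != seq_b[start:end]:
            count += 1
    return count


# Three site differences fall in two segments of length 4.
print(segment_distance("AAAACCCCGGGG", "AAATCCCCGGTT", 4))
```

With a segment length of 1 this reduces to the ordinary count of differing sites.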
14.
Genetics ; 177(1): 347-58, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17565950

ABSTRACT

We consider inference for demographic models and parameters based upon postprocessing the output of an MCMC method that generates samples of genealogical trees (from the posterior distribution for a specific prior distribution of the genealogy). This approach has the advantage of taking account of the uncertainty in the inference for the tree when making inferences about the demographic model and can be computationally efficient in terms of reanalyzing data under a wide variety of models. We consider a (simulation-consistent) estimate of the likelihood for variable population size models, which uses importance sampling, and propose two new approximate likelihoods, one for migration models and one for continuous spatial models.


Subject(s)
Evolution, Molecular , Genes/physiology , Genetics, Population , Models, Genetic , Models, Statistical , Pedigree , Algorithms , Animals , Bayes Theorem , DNA/genetics , Data Interpretation, Statistical , Emigration and Immigration , Genetic Variation , Humans , Likelihood Functions , Markov Chains , Monte Carlo Method , Software
15.
Nat Genet ; 39(5): 645-9, 2007 May.
Article in English | MEDLINE | ID: mdl-17401363

ABSTRACT

Recently, common variants on human chromosome 8q24 were found to be associated with prostate cancer risk. While conducting a genome-wide association study in the Cancer Genetic Markers of Susceptibility project with 550,000 SNPs in a nested case-control study (1,172 cases and 1,157 controls of European origin), we identified a new association at 8q24 with an independent effect on prostate cancer susceptibility. The most significant signal is 70 kb centromeric to the previously reported SNP, rs1447295, but shows little evidence of linkage disequilibrium with it. A combined analysis with four additional studies (total: 4,296 cases and 4,299 controls) confirms association with prostate cancer for rs6983267 in the centromeric locus (P = 9.42 × 10^-13; heterozygote odds ratio (OR): 1.26, 95% confidence interval (c.i.): 1.13-1.41; homozygote OR: 1.58, 95% c.i.: 1.40-1.78). Each SNP remained significant in a joint analysis after adjusting for the other (rs1447295 P = 1.41 × 10^-11; rs6983267 P = 6.62 × 10^-10). These observations, combined with compelling evidence for a recombination hotspot between the two markers, indicate the presence of at least two independent loci within 8q24 that contribute to prostate cancer in men of European ancestry. We estimate that the population attributable risk of the new locus, marked by rs6983267, is higher than the locus marked by rs1447295 (21% versus 9%).


Subject(s)
Chromosomes, Human, Pair 8/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation , Prostatic Neoplasms/genetics , Black or African American , Base Sequence , Ethnicity/genetics , Gene Frequency , Genomics/methods , Genotype , Haplotypes/genetics , Humans , Male , Molecular Sequence Data , Odds Ratio , Polymorphism, Single Nucleotide , Risk Factors , United States , White People
16.
Bioinformatics ; 22(24): 3061-6, 2006 Dec 15.
Article in English | MEDLINE | ID: mdl-17060358

ABSTRACT

MOTIVATION: There is much local variation in recombination rates across the human genome, with the majority of recombination occurring in recombination hotspots: short regions of approximately 2 kb in length that have much higher recombination rates than neighbouring regions. Knowledge of this local variation is important, e.g. in the design and analysis of association studies for disease genes. Population genetic data, such as that generated by the HapMap project, can be used to infer the location of these hotspots. We present a new, efficient and powerful method for detecting recombination hotspots from population data. RESULTS: We compare our method with four current methods for detecting hotspots. It is orders of magnitude quicker, and has greater power, than two related approaches. It appears to be more powerful than HotspotFisher, though less accurate at inferring the precise positions of the hotspot. It was also more powerful than LDhot in some situations: particularly for weaker hotspots (10-40 times the background rate) when SNP density is lower (< 1/kb). AVAILABILITY: Program, data sets, and full details of results are available at: http://www.maths.lancs.ac.uk/~fearnhea/Hotspot.


Subject(s)
Algorithms , Chromosome Mapping/methods , Databases, Genetic , Genetics, Population , Recombination, Genetic/genetics , Sequence Analysis, DNA/methods , Software , Base Sequence , Molecular Sequence Data , Polymorphism, Genetic
17.
Genetics ; 174(3): 1397-406, 2006 Nov.
Article in English | MEDLINE | ID: mdl-16951070

ABSTRACT

We show how the idea of monotone coupling from the past can produce simple algorithms for simulating samples at a nonneutral locus under a range of demographic models. We specifically consider a biallelic locus and either a general variable population size model or a general migration model for population subdivision. We investigate the effect of demography on the efficacy of selection and the effect of selection on genetic divergence between populations.


Subject(s)
Computer Simulation , Models, Genetic , Algorithms , Alleles , Animals , Genetic Variation , Markov Chains , Population/genetics , Population Density , Selection, Genetic
18.
Theor Popul Biol ; 70(3): 376-86, 2006 Nov.
Article in English | MEDLINE | ID: mdl-16563450

ABSTRACT

We consider population genetics models where selection acts at a set of unlinked loci. It is known that if the fitness of an individual is multiplicative across loci, then these loci are independent. We consider general selection models, but assume parent-independent mutation at each locus. For such a model, the joint stationary distribution of allele frequencies is proportional to the stationary distribution under neutrality multiplied by a known function of the mean fitness of the population. We further show how knowledge of this stationary distribution enables direct simulation of the genealogy of a sample at a single locus. For a specific selection model appropriate for complex disease genes, we use simulation to determine what features of the genealogy differ between our general selection model and a multiplicative model.


Subject(s)
Gene Frequency/genetics , Genetic Linkage/genetics , Models, Genetic , Selection, Genetic , Analysis of Variance , Genetic Drift , Genetics, Population , Genotype , Haplotypes/genetics , Mutation/genetics , Pedigree , Population Dynamics , Time Factors
19.
Am J Hum Genet ; 77(5): 781-94, 2005 Nov.
Article in English | MEDLINE | ID: mdl-16252238

ABSTRACT

We introduce a new method for detection of recombination hotspots from population genetic data. This method is based on (a) defining an (approximate) penalized likelihood for how recombination rate varies with physical position and (b) maximizing this penalized likelihood over possible sets of recombination hotspots. Simulation results suggest that this is a more powerful method for detection of hotspots than are existing methods. We apply the method to data from 89 genes sequenced in African American and European American populations. We find many genes with multiple hotspots, and some hotspots show evidence of being population-specific. Our results suggest that hotspots are randomly positioned within genes and could be as frequent as one per 30 kb.


Subject(s)
Genetic Variation , Genome, Human , Black People , Computer Simulation , Genetics, Population , Humans , Likelihood Functions , Models, Genetic , Polymorphism, Genetic , Recombination, Genetic , Sequence Analysis, DNA/methods , White People
20.
Genetics ; 171(4): 2073-84, 2005 Dec.
Article in English | MEDLINE | ID: mdl-16085703

ABSTRACT

We develop a method for maximum-likelihood estimation of coalescence times in genealogical trees, based on population genetics data. For this purpose, a Viterbi-type algorithm is constructed to maximize the joint likelihood of the coalescence times. Marginal confidence intervals for the coalescence times based on the profile likelihoods are also computed. Our method of finding MLEs and calculating confidence intervals appears to be more accurate than alternative numerical maximization methods, and maximum-likelihood inference appears to be more accurate than other existing model-free approaches to estimating coalescence times. We demonstrate the method on two different data sets: human Y chromosome DNA data and fungus DNA data.


Subject(s)
Algorithms , Chromosomes, Human, Y/genetics , Classification/methods , Evolution, Molecular , Models, Genetic , Phylogeny , Ascomycota/genetics , Computer Simulation , Genetics, Population , Humans , Likelihood Functions , Male , Time Factors