Results 1 - 20 of 103
1.
Theor Popul Biol ; 156: 1-4, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38184209

ABSTRACT

Consider the problem of estimating the branch lengths in a symmetric 2-state substitution model with a known topology and a general, clock-like or star-shaped tree with three leaves. We show that the maximum likelihood estimates are analytically tractable and can be obtained from pairwise sequence comparisons. Furthermore, we demonstrate that this property does not generalize to larger state spaces, more complex models or larger trees. Our arguments are based on an enumeration of the free parameters of the model and the dimension of the minimal sufficient data vector. Our interest in this problem arose from discussions with our former colleague Freddy Bugge Christiansen.


Subjects
Molecular Evolution, Genetic Models, Likelihood Functions, Phylogeny
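For the three-leaf case, the pairwise route can be sketched as follows. This is a minimal illustration, not the paper's derivation. It assumes branch lengths are measured in expected substitutions per site, so two sequences separated by a path of length d differ at a site with probability (1 - exp(-2d))/2. It also assumes the observed proportions lie inside the model's parameter space (every p_ij below 1/2 and no negative branch length). Under those assumptions, each pairwise proportion is inverted to a path length d_ij = t_i + t_j, and the three additive equations are solved for t_1, t_2, t_3. The function names are hypothetical.

```python
import math


def expected_p_diff(d: float) -> float:
    """Probability that two sequences separated by path length d differ at a site."""
    return (1.0 - math.exp(-2.0 * d)) / 2.0


def path_length(p_diff: float) -> float:
    """Invert expected_p_diff: path length d from a proportion of differing sites."""
    if not 0.0 <= p_diff < 0.5:
        raise ValueError("proportion of differing sites must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * p_diff)


def three_leaf_branch_lengths(p12: float, p13: float, p23: float) -> tuple[float, float, float]:
    """Branch lengths (t1, t2, t3) from the three pairwise proportions.

    Uses d_ij = t_i + t_j. A negative result signals data outside the model,
    where this closed form is no longer expected to give the MLE.
    """
    d12, d13, d23 = path_length(p12), path_length(p13), path_length(p23)
    t1 = (d12 + d13 - d23) / 2.0
    t2 = (d12 + d23 - d13) / 2.0
    t3 = (d13 + d23 - d12) / 2.0
    return t1, t2, t3


# Round trip: generate pairwise proportions from known branch lengths.
true_t = (0.1, 0.2, 0.3)
p12 = expected_p_diff(true_t[0] + true_t[1])
p13 = expected_p_diff(true_t[0] + true_t[2])
p23 = expected_p_diff(true_t[1] + true_t[2])
print(three_leaf_branch_lengths(p12, p13, p23))  # approximately (0.1, 0.2, 0.3)
```

With three leaves there are as many free branch lengths as free site-pattern frequencies, which is the kind of counting argument the abstract refers to. For larger trees or state spaces that balance breaks, and this shortcut is not expected to carry over.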
2.
Genetics ; 225(2)2023 Oct 04.
Article in English | MEDLINE | ID: mdl-37611212

ABSTRACT

Principal component analysis (PCA) is commonly used in genetics to infer and visualize population structure and admixture between populations. PCA is often interpreted in a way similar to inferred admixture proportions, where it is assumed that individuals belong to one of several possible populations or are admixed between these populations. We propose a new method to assess the statistical fit of PCA (interpreted as a model spanned by the top principal components) and show that violations of the PCA assumptions affect the fit. Our method uses the chosen top principal components to predict the genotypes. By assessing the covariance (and the correlation) of the residuals (the differences between observed and predicted genotypes), we are able to detect violations of the model assumptions. Based on simulations and genome-wide human data, we show that our assessment of fit can be used to guide the interpretation of the data and to pinpoint individuals that are not well represented by the chosen principal components. Our method works equally well for other similar models, such as the admixture model, in which the mean of the data is represented by a linear matrix decomposition.

3.
Mol Ecol Resour ; 23(7): 1604-1619, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37400991

ABSTRACT

The genomes of recently admixed individuals or hybrids have characteristic genetic patterns that can be used to learn about their recent admixture history. One of these is the pattern of interancestry heterozygosity, which can be inferred from SNP data, from either called genotypes or genotype likelihoods, without the need for information on genomic location. This makes such patterns applicable to a wide range of data that are often used in evolutionary and conservation genomic studies, such as low-depth sequencing data mapped to scaffolds and reduced representation sequencing. Here we implement maximum likelihood estimation of interancestry heterozygosity patterns using two complementary models. We further develop apoh (Admixture Pedigrees of Hybrids), a software tool that uses estimates of paired ancestry proportions to detect recently admixed individuals or hybrids and to suggest possible admixture pedigrees. It also calculates several hybrid indices that make it easier to identify and rank possible admixture pedigrees that could give rise to the estimated patterns. We implemented apoh both as a command line tool and as a graphical user interface that allows the user to automatically and interactively explore, rank and visualize compatible recent admixture pedigrees, and calculate the different summary indices. We validate the performance of the method using admixed family trios from the 1000 Genomes Project. In addition, we show its applicability to identifying recent hybrids from RAD-seq data of Grant's gazelle (Nanger granti and Nanger petersii) and whole-genome low-depth data of waterbuck (Kobus ellipsiprymnus), which shows complex admixture of up to four populations.


Subjects
Population Genetics, Genome, Humans, Pedigree, Genome/genetics, Genotype, Software
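The idea of paired ancestry proportions can be illustrated with a small expected-value calculation. This is a toy sketch built on stated assumptions, not apoh's model or API. It assumes unrelated parents, takes each parent's admixture proportions as given, and treats the ancestries of the two transmitted haplotypes at a random locus as independent draws from the respective parents' proportions. It returns the expected fraction of the genome carrying each unordered ancestry pair, and the interancestry heterozygosity as the total over mixed pairs. All names are hypothetical.

```python
from itertools import combinations_with_replacement

Pair = tuple[str, str]


def expected_paired_ancestry(q_mother: dict[str, float],
                             q_father: dict[str, float]) -> dict[Pair, float]:
    """Expected unordered paired ancestry proportions in the offspring.

    Assumes the ancestry of each transmitted haplotype at a random locus is
    drawn from that parent's admixture proportions, independently of the
    other parent.
    """
    pops = sorted(set(q_mother) | set(q_father))
    paired: dict[Pair, float] = {}
    for a, b in combinations_with_replacement(pops, 2):
        if a == b:
            paired[(a, b)] = q_mother.get(a, 0.0) * q_father.get(a, 0.0)
        else:
            # Either parent may have transmitted the ancestry-a haplotype.
            paired[(a, b)] = (q_mother.get(a, 0.0) * q_father.get(b, 0.0)
                              + q_mother.get(b, 0.0) * q_father.get(a, 0.0))
    return paired


def interancestry_heterozygosity(paired: dict[Pair, float]) -> float:
    """Share of the genome where the two haplotypes have different ancestries."""
    return sum(v for (a, b), v in paired.items() if a != b)


f1 = expected_paired_ancestry({"A": 1.0}, {"B": 1.0})
backcross = expected_paired_ancestry({"A": 0.5, "B": 0.5}, {"A": 1.0})
print(interancestry_heterozygosity(f1), interancestry_heterozygosity(backcross))  # 1.0 0.5
```

An F1 and a first-generation backcross both have ancestry proportions that could also arise in other pedigrees, but their paired ancestry profiles differ. That difference is why paired proportions carry pedigree information that plain admixture proportions do not.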
4.
J R Soc Interface ; 20(203): 20220877, 2023 06.
Article in English | MEDLINE | ID: mdl-37340782

ABSTRACT

With a view towards artificial cells, molecular communication systems, molecular multiagent systems and federated learning, we propose a novel reaction network scheme (termed the Baum-Welch (BW) reaction network) that learns parameters for hidden Markov models (HMMs). All variables, including inputs and outputs, are encoded by separate species. Each reaction in the scheme changes only one molecule of one species to one molecule of another. The reverse change is also accessible but via a different set of enzymes, in a design reminiscent of futile cycles in biochemical pathways. We show that every positive fixed point of the BW algorithm for HMMs is a fixed point of the reaction network scheme, and vice versa. Furthermore, we prove that the 'expectation' step and the 'maximization' step of the reaction network separately converge exponentially fast and compute the same values as the E-step and the M-step of the BW algorithm. We simulate example sequences and show that our reaction network learns the same parameters for the HMM as the BW algorithm, and that the log-likelihood increases continuously along the trajectory of the reaction network.


Subjects
Algorithms, Markov Chains
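For reference, the discrete Baum-Welch iteration that the reaction network is compared against can be written in a few lines of pure Python. This is the ordinary EM algorithm for a discrete HMM, not the reaction-network scheme. It uses scaled forward-backward recursions to avoid underflow, and the function names, toy sequence and starting parameters are hypothetical. The loop at the end tracks the log-likelihood, which EM never decreases. This is the discrete counterpart of the increase the abstract reports along the network's trajectory.

```python
import math


def forward_backward(obs, pi, A, B):
    """Scaled forward-backward recursions.

    Returns (alpha, beta, scales): alpha[t][i] = P(z_t = i | x_1..x_t), beta is
    rescaled by the same factors, and sum(log(scales)) is the log-likelihood.
    """
    n, T = len(pi), len(obs)
    alpha, scales = [], []
    for t in range(T):
        if t == 0:
            row = [pi[i] * B[i][obs[0]] for i in range(n)]
        else:
            row = [sum(alpha[t - 1][j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                   for i in range(n)]
        c = sum(row)
        scales.append(c)
        alpha.append([x / c for x in row])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   / scales[t + 1] for i in range(n)]
    return alpha, beta, scales


def baum_welch_step(obs, pi, A, B):
    """One E-step plus M-step; returns new (pi, A, B) and the current log-likelihood."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scales = forward_backward(obs, pi, A, B)
    # E-step: state posteriors (gamma) and expected transition counts.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    trans = [[0.0] * n for _ in range(n)]
    for t in range(T - 1):
        for i in range(n):
            for j in range(n):
                trans[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                * beta[t + 1][j] / scales[t + 1])
    # M-step: normalise the expected counts.
    new_pi = gamma[0][:]
    new_A = [[x / sum(row) for x in row] for row in trans]
    new_B = []
    for i in range(n):
        total = sum(gamma[t][i] for t in range(T))
        new_B.append([sum(gamma[t][i] for t in range(T) if obs[t] == k) / total
                      for k in range(m)])
    return new_pi, new_A, new_B, sum(math.log(c) for c in scales)


obs = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1]
params = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.1, 0.9]])
logliks = []
for _ in range(5):
    *params, ll = baum_welch_step(obs, *params)
    logliks.append(ll)
print(logliks)  # non-decreasing, as EM guarantees
```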
5.
Genetics ; 222(4)2022 11 30.
Article in English | MEDLINE | ID: mdl-36173322

ABSTRACT

The site frequency spectrum is an important summary statistic in population genetics, used for inference on demographic history and selection. However, estimating the site frequency spectrum from called genotypes introduces bias when working with low-coverage sequencing data. Methods exist for addressing this issue but sometimes suffer from two problems. First, they can have very high computational demands, to the point that it may not be possible to run estimation for genome-scale data. Second, existing methods are prone to overfitting, especially for multidimensional site frequency spectrum estimation. In this article, we present a stochastic expectation-maximization algorithm for inferring the site frequency spectrum from NGS data that addresses these challenges. We show that this algorithm greatly reduces runtime and enables estimation with constant, trivial RAM usage. Furthermore, the algorithm reduces overfitting and thereby improves downstream inference. An implementation is available at github.com/malthesr/winsfs.


Subjects
Algorithms, Population Genetics, Genotype, Genome, Bias, High-Throughput Nucleotide Sequencing/methods
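The core update can be sketched as a block-wise (stochastic) EM over per-site likelihoods. This is an illustrative sketch, not the winsfs implementation. The input format (one list of likelihoods P(data | j derived alleles) per site), the block size, and the choice to average the most recent `window` block estimates are assumptions made here for illustration. Each block runs one E-step under the current spectrum, so the estimate is updated many times per pass over the data instead of once.

```python
def block_em_sfs(site_likelihoods, block_size, window, epochs=1):
    """Block-wise EM estimate of a one-dimensional SFS (illustrative sketch).

    site_likelihoods[s][j] is the likelihood of the data at site s given j
    derived alleles. The spectrum must stay positive wherever a site's
    likelihood is nonzero, otherwise the normalising sum below is zero.
    """
    n_bins = len(site_likelihoods[0])
    sfs = [1.0 / n_bins] * n_bins          # uniform starting spectrum
    block_estimates = []
    for _ in range(epochs):
        for start in range(0, len(site_likelihoods), block_size):
            block = site_likelihoods[start:start + block_size]
            posterior_sum = [0.0] * n_bins
            for lik in block:
                # E-step: posterior over derived-allele counts at this site.
                weights = [p * l for p, l in zip(sfs, lik)]
                total = sum(weights)
                for j, w in enumerate(weights):
                    posterior_sum[j] += w / total
            # M-step on this block alone gives a block estimate ...
            block_estimates.append([x / len(block) for x in posterior_sum])
            # ... and the running estimate averages the most recent ones.
            recent = block_estimates[-window:]
            sfs = [sum(est[j] for est in recent) / len(recent) for j in range(n_bins)]
    return sfs


# With error-free (one-hot) likelihoods, one block recovers the empirical spectrum.
exact = [[1.0, 0.0, 0.0]] * 3 + [[0.0, 1.0, 0.0]] * 2 + [[0.0, 0.0, 1.0]]
print(block_em_sfs(exact, block_size=6, window=1))
```

Averaging over a window of block estimates smooths the noise that comes from updating on small blocks. This is one plausible reading of how a stochastic EM can limit overfitting, stated here as an assumption.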
7.
Math Biosci Eng ; 19(3): 2720-2749, 2022 01 11.
Article in English | MEDLINE | ID: mdl-35240803

ABSTRACT

We consider stochastic reaction networks modeled by continuous-time Markov chains. Such reaction networks often contain many reactions, potentially occurring at different time scales, and have unknown parameters (kinetic rates, total amounts). This makes their analysis complex. We examine stochastic reaction networks with non-interacting species that often appear in examples of interest (e.g. in the two-substrate Michaelis-Menten mechanism). Non-interacting species typically appear as intermediate (or transient) chemical complexes that are depleted at a fast rate. We embed the Markov process of the reaction network into a one-parameter family under a two time-scale approach, such that molecules of non-interacting species are degraded fast. We derive simplified reaction networks in which the non-interacting species are eliminated and that approximate the scaled Markov process in the limit as the parameter becomes small. We then derive sufficient conditions for such reductions based on the reaction network structure, for both homogeneous and time-varying stochastic settings, and study examples and properties of the reduction.


Subjects
Markov Chains, Theoretical Models, Kinetics, Stochastic Processes
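The continuous-time Markov chain underlying such a stochastic reaction network can be simulated exactly with Gillespie's direct method. The sketch below is generic. The Michaelis-Menten rate constants and copy numbers are made-up values, the falling-factorial propensity is one common convention, and nothing here performs the paper's time-scale reduction. In the example, ES is the kind of transient intermediate that such a reduction would eliminate.

```python
import math
import random

# Each reaction: (reactant stoichiometry, product stoichiometry, rate constant).
MICHAELIS_MENTEN = [
    ({"E": 1, "S": 1}, {"ES": 1}, 1.0),     # binding
    ({"ES": 1}, {"E": 1, "S": 1}, 0.5),     # unbinding
    ({"ES": 1}, {"E": 1, "P": 1}, 2.0),     # catalysis
]


def propensity(state, reactants, rate):
    """Stochastic mass-action propensity: rate times falling factorials of copy numbers."""
    a = rate
    for species, nu in reactants.items():
        a *= math.perm(state[species], nu)   # 0 if too few molecules
    return a


def gillespie(state, reactions, t_end, rng):
    """Exact simulation of the continuous-time Markov chain up to time t_end."""
    t, state = 0.0, dict(state)
    path = [(t, dict(state))]
    while True:
        props = [propensity(state, reac, k) for reac, _, k in reactions]
        total = sum(props)
        if total == 0:
            break                            # absorbing state: nothing can fire
        t += rng.expovariate(total)          # exponential waiting time
        if t > t_end:
            break
        reac, prod, _ = rng.choices(reactions, weights=props)[0]
        for species, nu in reac.items():
            state[species] -= nu
        for species, nu in prod.items():
            state[species] = state.get(species, 0) + nu
        path.append((t, dict(state)))
    return path


path = gillespie({"E": 10, "S": 50, "ES": 0, "P": 0}, MICHAELIS_MENTEN, 5.0, random.Random(1))
print(path[-1])
```

The totals E + ES and S + ES + P are conserved along every trajectory. These are the "total amounts" the abstract lists among the unknown parameters.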
8.
Theor Popul Biol ; 142: 1-11, 2021 12.
Article in English | MEDLINE | ID: mdl-34563554

ABSTRACT

A coalescent model of a sample of size n is derived from a birth-death process that originates at a random time in the past from a single founder individual. Over time, the descendants of the founder evolve into a population of large (infinite) size from which a sample of size n is taken. The parameters and time of the birth-death process are scaled in N0, the size of the present-day population, while letting N0→∞, similarly to how the standard Kingman coalescent process arises from the Wright-Fisher model. The model is named the Limit Birth-Death (LBD) coalescent model. Simulations from the LBD coalescent model with sample size n are computationally slow compared to standard coalescent models. Therefore, we suggest different approximations to the LBD coalescent model that assume the population size is a deterministic function of time rather than a stochastic process. Furthermore, we introduce a hybrid LBD coalescent model that combines the exactness of the LBD coalescent model with the speed of the approximations.


Subjects
Population Genetics, Genetic Models, Population Density, Sample Size, Stochastic Processes
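The approximations replace the stochastic population size with a deterministic function of time. The sketch below shows what that buys computationally, but it is a generic illustration and not one of the paper's LBD approximations. It simulates coalescence times when the population size, looking backwards, shrinks exponentially as N(t) = N0 exp(-g t). The exponential form, the time scaling (coalescent units of the present-day size N0) and the function name are assumptions made for illustration. Each waiting time comes from inverting the integrated coalescence rate in closed form.

```python
import math
import random


def coalescence_times(n, growth, rng):
    """Coalescence times for n lineages under N(t) = N0 * exp(-growth * t).

    Time t runs backwards from the present in coalescent units of N0, so k
    lineages coalesce at rate k(k-1)/2 * exp(growth * t). Requires
    growth >= 0; growth = 0 gives the standard Kingman coalescent.
    """
    if growth < 0:
        raise ValueError("this sketch assumes a non-negative growth rate")
    t, times = 0.0, []
    for k in range(n, 1, -1):
        c = k * (k - 1) / 2
        e = rng.expovariate(1.0)        # unit-rate exponential "hazard budget"
        if growth == 0:
            t += e / c
        else:
            # Solve c * exp(g t) * (exp(g tau) - 1) / g = e for the waiting time tau.
            t += math.log1p(growth * e * math.exp(-growth * t) / c) / growth
        times.append(t)
    return times


print(coalescence_times(5, 0.0, random.Random(3)))
print(coalescence_times(5, 2.0, random.Random(3)))
```

Because the integrated rate can be inverted exactly, each event costs one random draw. A stochastic population size would instead have to be simulated alongside the genealogy.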
9.
Math Biosci ; 320: 108295, 2020 02.
Article in English | MEDLINE | ID: mdl-31843554

ABSTRACT

We consider the question of whether a chemical reaction network preserves the number and stability of its positive steady states upon inclusion of inflow and outflow reactions. Often a model of a reaction network is presented without inflows and outflows, while in fact some of the species might be degraded or leaked to the environment, or be synthesized or transported into the system. We provide a sufficient and easy-to-check criterion based on the stoichiometry of the reaction network alone and discuss examples from systems biology.


Subjects
Biochemical Phenomena, Metabolic Networks and Pathways, Biological Models, Chemical Models, Systems Biology, Humans
10.
Theor Popul Biol ; 125: 56-66, 2019 02.
Article in English | MEDLINE | ID: mdl-30562538

ABSTRACT

We provide a general mathematical framework, based on the theory of graphical models, to study admixture graphs. Admixture graphs describe the ancestral relationships between past and present populations, allowing for population merges and migration events by means of gene flow. We give various mathematical properties of admixture graphs, with particular focus on properties of the so-called F-statistics. The Wright-Fisher model is also studied, and a general expression for the loss of heterozygosity is derived.


Subjects
Genetic Drift, Population Genetics, Stochastic Processes, Population Genetics/statistics & numerical data, Heterozygote, Humans, Theoretical Models
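F-statistics are simple functions of allele frequencies. The sketch below computes them using the standard textbook definitions averaged over loci, with population allele frequencies and no finite-sample bias correction. It is not the paper's graphical-model framework. It also includes the classical one-population loss of heterozygosity, H_t = H_0 (1 - 1/(2N))^t, rather than the paper's general expression. The identities f3(C; A, B) = [f2(C, A) + f2(C, B) - f2(A, B)]/2 and f4(A, B; C, D) = [f2(A, D) + f2(B, C) - f2(A, C) - f2(B, D)]/2 follow by expanding the squares.

```python
def f2(a, b):
    """f2(A, B): mean squared allele-frequency difference across loci."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def f3(c, a, b):
    """f3(C; A, B): mean of (c - a)(c - b); a negative value can indicate that C is admixed."""
    return sum((z - x) * (z - y) for z, x, y in zip(c, a, b)) / len(c)


def f4(a, b, c, d):
    """f4(A, B; C, D): mean of (a - b)(c - d) across loci."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)


def heterozygosity_after(h0, n_diploid, generations):
    """Expected heterozygosity in an ideal Wright-Fisher population of N diploids."""
    return h0 * (1.0 - 1.0 / (2.0 * n_diploid)) ** generations


# Hypothetical allele frequencies at four loci in four populations.
A = [0.1, 0.5, 0.9, 0.3]
B = [0.2, 0.4, 0.7, 0.6]
C = [0.8, 0.1, 0.5, 0.5]
D = [0.3, 0.3, 0.2, 0.9]
print(f4(A, B, C, D), (f2(A, D) + f2(B, C) - f2(A, C) - f2(B, D)) / 2)
print(heterozygosity_after(0.5, 50, 10))
```

Writing f3 and f4 in terms of f2 is what lets them be read as sums of branch-length-like quantities on an admixture graph.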
11.
Math Biosci ; 301: 68-82, 2018 07.
Article in English | MEDLINE | ID: mdl-29601834

ABSTRACT

We introduce a unifying and generalizing framework for complex and detailed balanced steady states in chemical reaction network theory. To this end, we generalize the graph commonly used to represent a reaction network. Specifically, we introduce a graph, called a reaction graph, that has one edge for each reaction but potentially multiple nodes for each complex. A special class of steady states, called node balanced steady states, is naturally associated with such a reaction graph. We show that complex and detailed balanced steady states are special cases of node balanced steady states, obtained by choosing appropriate reaction graphs. Further, we show that node balanced steady states have properties analogous to those of complex balanced steady states, such as uniqueness and asymptotic stability in each stoichiometric compatibility class. Moreover, we associate with each reaction graph an integer, called the deficiency, that gives the number of independent relations among the reaction rate constants that must be satisfied for a positive node balanced steady state to exist. The set of reaction graphs (modulo isomorphism) is equipped with a partial order that has the complex balanced reaction graph as its minimal element. We relate this order to the deficiency and to the set of reaction rate constants for which a positive node balanced steady state exists.


Subjects
Chemical Models, Biochemical Phenomena, Kinetics, Mathematical Concepts
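For orientation, the classical deficiency of the usual complex graph can be computed as n - l - s, where n is the number of complexes, l the number of linkage classes, and s the rank of the stoichiometric subspace. This is the standard quantity from chemical reaction network theory, not the reaction-graph deficiency defined in the paper. The input format (reactions as pairs of dicts mapping species to stoichiometric coefficients) and the function names are assumptions for this sketch.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of an integer/rational matrix by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def complex_key(cplx):
    """Hashable representation of a complex such as {"E": 1, "S": 1}."""
    return tuple(sorted(cplx.items()))


def deficiency(reactions, species):
    """Classical deficiency n - l - s of a network of (source, product) complexes."""
    complexes = {complex_key(c) for pair in reactions for c in pair}
    parent = {c: c for c in complexes}

    def find(c):
        # Union-find root lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for src, prod in reactions:
        parent[find(complex_key(src))] = find(complex_key(prod))
    linkage_classes = len({find(c) for c in complexes})
    vectors = [[prod.get(s, 0) - src.get(s, 0) for s in species] for src, prod in reactions]
    return len(complexes) - linkage_classes - matrix_rank(vectors)


mm = [({"E": 1, "S": 1}, {"ES": 1}), ({"ES": 1}, {"E": 1, "S": 1}), ({"ES": 1}, {"E": 1, "P": 1})]
print(deficiency(mm, ["E", "S", "ES", "P"]))  # 0
```

The Michaelis-Menten network has three complexes, one linkage class and a two-dimensional stoichiometric subspace, giving deficiency zero.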
12.
Theor Popul Biol ; 122: 36-45, 2018 07.
Article in English | MEDLINE | ID: mdl-29452133

ABSTRACT

In many areas of genetics it is relevant to consider a population of individuals founded by a single individual in the past. One model for such a scenario is the conditioned reconstructed process with Bernoulli sampling, which describes the evolution of a population of individuals that originates from a single individual. Several aspects of this reconstructed process are studied, in particular the Markov structure of the process. It is shown that at any given time in the past, the conditioned reconstructed process behaves as the original conditioned reconstructed process after a suitable time-dependent change of the sampling probability. Additionally, it is discussed how mutations accumulate in a sample of particles. It is shown that random sampling of particles at the present time has the effect of making the mutation rate look time-dependent. Conditions are given under which this sampling effect is negligible. A possible extension of the reconstructed process that allows for multiple founding particles is discussed.


Subjects
Binomial Distribution, Population Genetics, Genetic Models, Probability, Algorithms, Birth Rate, Genealogy and Heraldry, Humans, Markov Chains, Mortality, Mutation
13.
G3 (Bethesda) ; 8(2): 551-566, 2018 02 02.
Article in English | MEDLINE | ID: mdl-29196497

ABSTRACT

The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, calling genotypes accurately is not always possible; therefore, the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction to combat the problems of sequencing errors, and show a way to correct for introgression from an external population that is not part of the supposed genetic relationship, and how this leads to an estimate of the admixture rate. We prove that the D-statistic is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixture. The power gain is most pronounced for low and medium sequencing depth (1-10×), and performance is as good as with perfectly called genotypes at a sequencing depth of 2×. We show the reliability of error correction in scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to estimate the admixture rates.


Subjects
Gene Flow, Population Genetics/statistics & numerical data, Human Genome/genetics, Whole Genome Sequencing/methods, Algorithms, Gene Frequency, Population Genetics/methods, Genotype, Human Migration, Humans, Genetic Models, Statistical Models, Single Nucleotide Polymorphism
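As a rough illustration of using all reads instead of a single sampled base, the frequency-based form of D can be computed from read counts pooled per population. This sketch assumes biallelic sites with known ancestral (A) and derived (B) alleles, H4 as the outgroup, and naive allele frequencies from pooled read counts. It omits the paper's error correction, its weighting of individuals and its introgression correction, and the function name is hypothetical.

```python
def d_statistic(sites):
    """Frequency-based D(H1, H2; H3, H4) from pooled read counts.

    Each site is a 4-tuple of (derived_reads, total_reads) for H1, H2, H3 and
    the outgroup H4. Sites with no reads in some population are skipped.
    """
    numerator = denominator = 0.0
    for site in sites:
        if any(total == 0 for _, total in site):
            continue
        p1, p2, p3, p4 = (derived / total for derived, total in site)
        abba = (1 - p1) * p2 * p3 * (1 - p4)
        baba = p1 * (1 - p2) * p3 * (1 - p4)
        numerator += abba - baba
        denominator += abba + baba
    return numerator / denominator   # ZeroDivisionError if no informative sites


# Hypothetical read counts: H2 and H3 share derived alleles more often than H1 and H3.
sites = [((1, 8), (5, 9), (6, 7), (0, 6)), ((2, 5), (2, 6), (4, 4), (1, 10))]
print(d_statistic(sites))  # positive: excess of ABBA over BABA
```

Under this sign convention, a positive D indicates an excess of derived alleles shared between H2 and H3. Significance is usually judged by normalising D with a standard error, which this sketch does not compute.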
14.
PLoS Comput Biol ; 13(10): e1005751, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28972969

ABSTRACT

Mathematical modelling has become an established tool for studying the dynamics of biological systems. Current applications range from building models that reproduce quantitative data to identifying systems with predefined qualitative features, such as switching behaviour, bistability or oscillations. Mathematically, the latter question amounts to identifying parameter values associated with a given qualitative feature. We introduce a procedure to partition the parameter space of a parameterized system of ordinary differential equations into regions for which the system has a unique equilibrium or multiple equilibria. The procedure is based on the computation of the Brouwer degree, and it creates a multivariate polynomial with parameter-dependent coefficients. The signs of the coefficients determine parameter regions with and without multistationarity. A particular strength of the procedure is that it avoids numerical analysis and parameter sampling. The procedure consists of a number of steps, each of which can be addressed either algorithmically, using various available software, or manually. We demonstrate our procedure on several models of gene transcription and cell signalling, and show that in many cases we obtain a complete partitioning of the parameter space with respect to multistationarity.


Subjects
Algorithms, Statistical Data Interpretation, Biological Models, Statistical Models, Multivariate Analysis, Computer Simulation
15.
Bull Math Biol ; 79(7): 1662-1686, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28620882

ABSTRACT

Known graphical conditions for the generic and global convergence to equilibria of the dynamical system arising from a reaction network are shown to be invariant under the so-called successive removal of intermediates, a systematic procedure to simplify the network, making the graphical conditions considerably easier to check.


Subjects
Genetic Variation, Theoretical Models, Humans
16.
Genome Biol ; 18(1): 38, 2017 02 21.
Article in English | MEDLINE | ID: mdl-28222791

ABSTRACT

The study of epigenetic heterogeneity at the level of individual cells and in whole populations is the key to understanding cellular differentiation, organismal development, and the evolution of cancer. We develop a statistical method, epiG, to infer and differentiate between different epi-allelic haplotypes, annotated with CpG methylation status and DNA polymorphisms, from whole-genome bisulfite sequencing data, and nucleosome occupancy from NOMe-seq data. We demonstrate the capabilities of the method by inferring allele-specific methylation and nucleosome occupancy in cell lines, and colon and tumor samples, and by benchmarking the method against independent experimental data.


Subjects
DNA Methylation, Epigenomics/methods, High-Throughput Nucleotide Sequencing, DNA Sequence Analysis, Software, Alleles, CpG Islands, Gene Expression Profiling, Genotype, Nucleosomes/metabolism, Single Nucleotide Polymorphism, Protein Binding, Reproducibility of Results
17.
J Math Biol ; 74(4): 887-932, 2017 03.
Article in English | MEDLINE | ID: mdl-27480320

ABSTRACT

For dynamical systems arising from chemical reaction networks, persistence is the property that each species concentration remains positively bounded away from zero, provided that all species concentrations are positive initially. We describe two graphical procedures for simplifying reaction networks without breaking known necessary or sufficient conditions for persistence, by iteratively removing so-called intermediates and catalysts from the network. The procedures are easy to apply and, in many cases, lead to highly simplified network structures, such as monomolecular networks. For specific classes of reaction networks, we show that these conditions for persistence are equivalent to one another. Furthermore, they can also be characterized by easily checkable strong connectivity properties of a related graph. In particular, this is the case for (conservative) monomolecular networks, as well as cascades of a large class of post-translational modification systems (of which the MAPK cascade and the n-site futile cycle are prominent examples). Since one of the aforementioned sufficient conditions for persistence precludes the existence of boundary steady states, our method also provides a graphical tool to check for their absence.


Subjects
Biochemical Phenomena/physiology, Analytical Chemistry Techniques/methods, MAP Kinase Signaling System/physiology, Post-Translational Protein Processing/physiology
18.
J Math Biol ; 74(1-2): 195-237, 2017 01.
Article in English | MEDLINE | ID: mdl-27221101

ABSTRACT

The quasi-steady state approximation and time-scale separation are commonly applied methods to simplify models of biochemical reaction networks based on ordinary differential equations (ODEs). The concentrations of the "fast" species are effectively assumed to be at steady state with respect to the "slow" species. Under this assumption, the steady state equations can be used to eliminate the "fast" variables, and a new ODE system with only the slow species can be obtained. We interpret a reduced system obtained by time-scale separation as the ODE system arising from a unique reaction network, by identifying a set of reactions and the corresponding rate functions. The procedure is graphically based and can easily be worked out by hand for small networks. For larger networks, we provide a pseudo-algorithm. We study properties of the reduced network, its kinetics and conservation laws, and show that the kinetics of the reduced network fulfil realistic assumptions, provided the original network does. We illustrate our results using biological examples such as substrate mechanisms, post-translational modification systems and networks with intermediate (transient) steps.


Subjects
Biochemical Phenomena/physiology, Biological Models, Algorithms, Kinetics, Post-Translational Protein Processing/physiology
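The classical instance is the Michaelis-Menten mechanism E + S <-> C -> E + P. The numerical sketch below integrates the full mass-action ODEs alongside the quasi-steady-state reduction; the rate constants, enzyme total and step size are arbitrary choices for illustration. In the reduction, C is eliminated and the remaining dynamics can be read as a single reaction S -> P with rate function Vmax S/(Km + S), where Vmax = k2 E_tot and Km = (k_-1 + k2)/k1. With a small total enzyme amount, the two substrate trajectories stay close. The sketch only illustrates the kind of reduced network the abstract describes; it does not implement the paper's graphical procedure.

```python
def rk4(f, y0, t_end, dt):
    """Fixed-step fourth-order Runge-Kutta for an autonomous ODE y' = f(y)."""
    y = list(y0)
    for _ in range(int(round(t_end / dt))):
        s1 = f(y)
        s2 = f([yi + dt / 2 * si for yi, si in zip(y, s1)])
        s3 = f([yi + dt / 2 * si for yi, si in zip(y, s2)])
        s4 = f([yi + dt * si for yi, si in zip(y, s3)])
        y = [yi + dt / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, s1, s2, s3, s4)]
    return y


def full_model(k1, km1, k2, e_tot):
    """Mass-action ODEs for E + S <-> C -> E + P, state y = (S, C), E = e_tot - C."""
    def rhs(y):
        s, c = y
        e = e_tot - c
        return [-k1 * e * s + km1 * c, k1 * e * s - (km1 + k2) * c]
    return rhs


def reduced_model(k1, km1, k2, e_tot):
    """Quasi-steady-state reduction: S -> P at rate Vmax * S / (Km + S)."""
    v_max, k_m = k2 * e_tot, (km1 + k2) / k1

    def rhs(y):
        return [-v_max * y[0] / (k_m + y[0])]
    return rhs


rates = dict(k1=1.0, km1=1.0, k2=1.0, e_tot=0.01)
s_full, c_full = rk4(full_model(**rates), [1.0, 0.0], t_end=100.0, dt=0.01)
(s_reduced,) = rk4(reduced_model(**rates), [1.0], t_end=100.0, dt=0.01)
print(s_full, s_reduced)
```

The agreement relies on the enzyme being scarce relative to S0 + Km. Raising `e_tot` towards that scale makes the gap between the two trajectories grow.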
19.
J R Soc Interface ; 13(123)2016 10.
Article in English | MEDLINE | ID: mdl-27733693

ABSTRACT

Bistability, and more generally multistability, is a key system dynamics feature enabling decision-making and memory in cells. Deciphering the molecular determinants of multistability is thus crucial for a better understanding of cellular pathways and their (re)engineering in synthetic biology. Here, we show that a key motif found predominantly in eukaryotic signalling systems, namely a futile signalling cycle, can display bistability when featuring a two-state kinase. We provide necessary and sufficient mathematical conditions on the kinetic parameters of this motif that guarantee the existence of multiple steady states. These conditions foster the intuition that bistability arises as a consequence of competition between the two states of the kinase. Extending from this result, we find that increasing the number of kinase states linearly translates into an increase in the number of steady states in the system. These findings reveal, to our knowledge, a new mechanism for the generation of bistability and multistability in cellular signalling systems. Furthermore, the futile cycle featuring a two-state kinase is among the smallest bistable signalling motifs. We show that multi-state kinases and the described competition-based motif are part of several natural signalling systems and thereby could enable them to implement complex information processing through multistability. These results indicate that multi-state kinases in signalling systems are readily exploited by natural evolution and could equally be used by synthetic approaches for the generation of multistable information processing systems at the cellular level.


Subjects
Biological Models, Protein Kinases/metabolism, Signal Transduction/physiology, Animals, Humans
20.
Stat Appl Genet Mol Biol ; 15(4): 349-61, 2016 08 01.
Article in English | MEDLINE | ID: mdl-27269897

ABSTRACT

In many areas of science it is customary to perform many tests, potentially millions, simultaneously. To gain statistical power, it is common to group tests based on a priori criteria such as predefined regions or sliding windows. However, it is not straightforward to choose grouping criteria, and the results might depend on the chosen criteria. Methods that summarize, or aggregate, test statistics or p-values without relying on a priori criteria are therefore desirable. We present a simple method to aggregate a sequence of stochastic variables, such as test statistics or p-values, into fewer variables without assuming a priori defined groups. We provide different ways to evaluate the significance of the aggregated variables based on theoretical considerations and resampling techniques, and show that under certain assumptions the family-wise error rate (FWER) is controlled in the strong sense. The validity of the method is demonstrated using simulations and real data analyses. Our method may be a useful supplement to standard procedures relying on evaluation of test statistics individually. Moreover, by being agnostic and not relying on predefined selected regions, it might be a practical alternative to conventionally used methods of aggregation of p-values over regions. The method is implemented in Python and freely available online (through GitHub, see the Supplementary information).


Subjects
Theoretical Models, Software, Algorithms, Computer Simulation, Statistical Data Interpretation, Internet, Reproducibility of Results