Results 1 - 20 of 103
1.
Theor Popul Biol ; 156: 1-4, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38184209

ABSTRACT

Consider the problem of estimating the branch lengths in a symmetric 2-state substitution model with a known topology and a general, clock-like or star-shaped tree with three leaves. We show that the maximum likelihood estimates are analytically tractable and can be obtained from pairwise sequence comparisons. Furthermore, we demonstrate that this property does not generalize to larger state spaces, more complex models or larger trees. Our arguments are based on an enumeration of the free parameters of the model and the dimension of the minimal sufficient data vector. Our interest in this problem arose from discussions with our former colleague Freddy Bugge Christiansen.
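To illustrate why pairwise comparisons can suffice in this case, here is a minimal sketch (not code from the paper). It assumes the symmetric 2-state model is parameterized so that two leaves at tree distance t_i + t_j differ with probability p_ij = (1 - exp(-2(t_i + t_j)))/2. It also assumes the maximum lies in the interior, meaning every p_ij < 1/2 and no estimated branch length is negative. With three leaves, the site patterns, up to swapping the two states, are determined by the three pairwise mismatch proportions, and the model has three branch lengths. Matching them is then a direct inversion. The function names are hypothetical.

```python
import math

def pairwise_distance(p: float) -> float:
    """Tree distance t_i + t_j implied by a mismatch proportion p (assumes p < 1/2)."""
    if not 0.0 <= p < 0.5:
        raise ValueError("mismatch proportion must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * p)

def star_tree_branch_lengths(p12: float, p13: float, p23: float) -> tuple[float, float, float]:
    """Branch lengths (t1, t2, t3) of a three-leaf tree from pairwise mismatch proportions.

    Each distance d_ij estimates t_i + t_j, so the three linear equations
    are solved directly.
    """
    d12, d13, d23 = (pairwise_distance(p) for p in (p12, p13, p23))
    t1 = (d12 + d13 - d23) / 2
    t2 = (d12 + d23 - d13) / 2
    t3 = (d13 + d23 - d12) / 2
    return t1, t2, t3
```

Feeding in mismatch proportions generated from known branch lengths returns those lengths up to rounding. A negative output would signal a boundary maximum, which this sketch does not handle.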


Subject(s)
Evolution, Molecular , Models, Genetic , Likelihood Functions , Phylogeny
2.
Genetics ; 225(2)2023 Oct 04.
Article in English | MEDLINE | ID: mdl-37611212

ABSTRACT

Principal component analysis (PCA) is commonly used in genetics to infer and visualize population structure and admixture between populations. PCA is often interpreted in a way similar to inferred admixture proportions, where it is assumed that individuals belong to one of several possible populations or are admixed between these populations. We propose a new method to assess the statistical fit of PCA (interpreted as a model spanned by the top principal components) and show that violations of the PCA assumptions affect the fit. Our method uses the chosen top principal components to predict the genotypes. By assessing the covariance (and the correlation) of the residuals (the differences between observed and predicted genotypes), we are able to detect violations of the model assumptions. Based on simulations and genome-wide human data, we show that our assessment of fit can be used to guide the interpretation of the data and to pinpoint individuals that are not well represented by the chosen principal components. Our method works equally well for other similar models, such as the admixture model, where the mean of the data is represented by a linear matrix decomposition.
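The core idea of predicting genotypes from the top principal components and then inspecting the residuals can be sketched as follows. This is a simplified illustration, not the authors' estimator, which may include normalizations and corrections not shown here. It assumes an individuals × SNPs genotype matrix coded 0/1/2, centres each SNP, reconstructs the centred matrix from a rank-k truncated SVD, and correlates the residuals between individuals. Function names are hypothetical.

```python
import numpy as np

def pca_residuals(genotypes: np.ndarray, k: int) -> np.ndarray:
    """Observed minus predicted genotypes using the top-k principal components.

    genotypes: (individuals x SNPs) matrix. Columns are centred by their
    SNP means, and the prediction is the rank-k truncated SVD of the
    centred matrix.
    """
    centred = genotypes - genotypes.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    predicted = (u[:, :k] * s[:k]) @ vt[:k, :]
    return centred - predicted

def residual_correlation(residuals: np.ndarray) -> np.ndarray:
    """Individual-by-individual correlation matrix of the residuals."""
    return np.corrcoef(residuals)
```

Under a well-fitting model, off-diagonal residual correlations should be close to zero. Blocks of individuals with clearly correlated residuals are the kind of signal the abstract describes as a violation of the model assumptions.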

3.
Mol Ecol Resour ; 23(7): 1604-1619, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37400991

ABSTRACT

The genomes of recently admixed individuals or hybrids have characteristic genetic patterns that can be used to learn about their recent admixture history. Among these are patterns of interancestry heterozygosity, which can be inferred from SNP data, from either called genotypes or genotype likelihoods, without the need for information on genomic location. This makes them applicable to a wide range of data that are often used in evolutionary and conservation genomic studies, such as low-depth sequencing data mapped to scaffolds and reduced representation sequencing. Here we implement maximum likelihood estimation of interancestry heterozygosity patterns using two complementary models. We furthermore develop apoh (Admixture Pedigrees of Hybrids), software that uses estimates of paired ancestry proportions to detect recently admixed individuals or hybrids and to suggest possible admixture pedigrees. It also calculates several hybrid indices that make it easier to identify and rank possible admixture pedigrees that could give rise to the estimated patterns. We implemented apoh both as a command-line tool and as a graphical user interface that allows the user to automatically and interactively explore, rank and visualize compatible recent admixture pedigrees, and calculate the different summary indices. We validate the performance of the method using admixed family trios from the 1000 Genomes Project. In addition, we show its applicability to identifying recent hybrids from RAD-seq data of Grant's gazelle (Nanger granti and Nanger petersii) and low-depth whole-genome data of waterbuck (Kobus ellipsiprymnus), which shows complex admixture of up to four populations.
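The following sketch makes "paired ancestry proportions" concrete. It is not apoh's implementation. It computes the expected fraction of loci at which an offspring carries each unordered pair of ancestries. The simplifying assumptions are that each parent transmits ancestry k at a locus with probability equal to that parent's admixture proportion q[k], and that the two parents transmit independently. The function name is hypothetical.

```python
from itertools import combinations_with_replacement

def expected_paired_ancestry(q_mother, q_father):
    """Expected fraction of loci with each unordered ancestry pair (k, l), k <= l.

    q_mother, q_father: admixture proportions of the two parents (each sums to 1).
    """
    n = len(q_mother)
    result = {}
    for k, l in combinations_with_replacement(range(n), 2):
        if k == l:
            # Both parents must transmit ancestry k.
            result[(k, l)] = q_mother[k] * q_father[k]
        else:
            # Either parent can contribute either ancestry of the pair.
            result[(k, l)] = q_mother[k] * q_father[l] + q_mother[l] * q_father[k]
    return result
```

For example, an F1 between two unadmixed populations, `expected_paired_ancestry([1.0, 0.0], [0.0, 1.0])`, is heterozygous for ancestry at every locus. A backcross of that F1 to population 0 has half its loci with pair (0, 0) and half with pair (0, 1). Distinct pedigrees leave distinct signatures of this kind, even when they share the same overall admixture proportions.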


Subject(s)
Genetics, Population , Genome , Humans , Pedigree , Genome/genetics , Genotype , Software
4.
J R Soc Interface ; 20(203): 20220877, 2023 06.
Article in English | MEDLINE | ID: mdl-37340782

ABSTRACT

With a view towards artificial cells, molecular communication systems, molecular multiagent systems and federated learning, we propose a novel reaction network scheme (termed the Baum-Welch (BW) reaction network) that learns parameters for hidden Markov models (HMMs). All variables including inputs and outputs are encoded by separate species. Each reaction in the scheme changes only one molecule of one species to one molecule of another. The reverse change is also accessible but via a different set of enzymes, in a design reminiscent of futile cycles in biochemical pathways. We show that every positive fixed point of the BW algorithm for HMMs is a fixed point of the reaction network scheme, and vice versa. Furthermore, we prove that the 'expectation' step and the 'maximization' step of the reaction network separately converge exponentially fast and compute the same values as the E-step and the M-step of the BW algorithm. We simulate example sequences, and show that our reaction network learns the same parameters for the HMM as the BW algorithm, and that the log-likelihood increases continuously along the trajectory of the reaction network.
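For reference, here is a minimal NumPy sketch of the classical Baum-Welch (EM) algorithm that the reaction network is designed to reproduce. It handles a single discrete observation sequence. This is a textbook version with per-step scaling, not the reaction network scheme, and the function names are illustrative.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Scaled forward-backward pass.

    pi: initial distribution (S,), A: transitions (S, S), B: emissions (S, V),
    obs: integer observations of length T. Log-likelihood = sum(log(scale)).
    """
    T, S = len(obs), len(pi)
    alpha, beta, scale = np.zeros((T, S)), np.zeros((T, S)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / scale[t + 1]
    return alpha, beta, scale

def baum_welch_step(pi, A, B, obs):
    """One E-step and M-step. Returns new (pi, A, B) and the log-likelihood of the old parameters."""
    obs = np.asarray(obs)
    alpha, beta, scale = forward_backward(pi, A, B, obs)
    gamma = alpha * beta  # E-step: P(state_t = i | obs)
    # xi[t, i, j] = P(state_t = i, state_{t+1} = j | obs)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / scale[1:, None, None]
    # M-step: normalised expected counts
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for v in range(B.shape[1]):
        new_B[:, v] = gamma[obs == v].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, np.log(scale).sum()
```

Iterating `baum_welch_step` gives a log-likelihood that never decreases from one step to the next. This is the discrete-step analogue of the continuous increase along the reaction-network trajectory that the abstract reports.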


Subject(s)
Algorithms , Markov Chains
5.
Genetics ; 222(4)2022 11 30.
Article in English | MEDLINE | ID: mdl-36173322

ABSTRACT

The site frequency spectrum is an important summary statistic in population genetics used for inference on demographic history and selection. However, estimation of the site frequency spectrum from called genotypes introduces bias when working with low-coverage sequencing data. Methods exist for addressing this issue but sometimes suffer from two problems. First, they can have very high computational demands, to the point that it may not be possible to run estimation for genome-scale data. Second, existing methods are prone to overfitting, especially for multidimensional site frequency spectrum estimation. In this article, we present a stochastic expectation-maximization algorithm for inferring the site frequency spectrum from NGS data that addresses these challenges. We show that this algorithm greatly reduces runtime and enables estimation with constant, trivial RAM usage. Furthermore, the algorithm reduces overfitting and thereby improves downstream inference. An implementation is available at github.com/malthesr/winsfs.
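The stochastic variant builds on the standard EM update for a one-dimensional SFS, sketched below. The sketch assumes per-site likelihoods `site_lik[s, j]` = P(data at site s | j derived alleles) for j = 0..2n are already available, for example computed from genotype likelihoods. The windowed function is a simplified illustration of the block-wise idea and not the winsfs implementation. Each update uses one block of sites, and the current estimate is the average of the most recent block estimates. The block size, window length and number of passes are arbitrary assumptions.

```python
import numpy as np

def em_step(sfs, site_lik):
    """One EM update of a 1D SFS from per-site likelihoods of shape (sites, K)."""
    post = site_lik * sfs                      # unnormalised posterior per site
    post /= post.sum(axis=1, keepdims=True)    # normalise each site
    return post.mean(axis=0)                   # new SFS: average posterior over sites

def windowed_stochastic_em(site_lik, block_size=1000, window=10, epochs=2, seed=0):
    """Block-wise EM: each update uses one block of sites, and the current
    estimate is the mean of the last `window` block estimates."""
    rng = np.random.default_rng(seed)
    n_sites, k = site_lik.shape
    sfs = np.full(k, 1.0 / k)                  # start from a flat spectrum
    history = []
    for _ in range(epochs):
        order = rng.permutation(n_sites)       # shuffle sites each pass
        for start in range(0, n_sites, block_size):
            block = site_lik[order[start:start + block_size]]
            history.append(em_step(sfs, block))
            history = history[-window:]
            sfs = np.mean(history, axis=0)
    return sfs
```

Only one block of site likelihoods is needed per update. This is why block-wise schemes of this kind can keep memory use bounded, whereas a full EM step touches every site.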


Subject(s)
Algorithms , Genetics, Population , Genotype , Genome , Bias , High-Throughput Nucleotide Sequencing/methods
7.
Math Biosci Eng ; 19(3): 2720-2749, 2022 01 11.
Article in English | MEDLINE | ID: mdl-35240803

ABSTRACT

We consider stochastic reaction networks modeled by continuous-time Markov chains. Such reaction networks often contain many reactions, potentially occurring at different time scales, and have unknown parameters (kinetic rates, total amounts). This makes their analysis complex. We examine stochastic reaction networks with non-interacting species that often appear in examples of interest (e.g. in the two-substrate Michaelis-Menten mechanism). Non-interacting species typically appear as intermediate (or transient) chemical complexes that are depleted at a fast rate. We embed the Markov process of the reaction network into a one-parameter family under a two time-scale approach, such that molecules of non-interacting species are degraded fast. We derive simplified reaction networks in which the non-interacting species are eliminated and which approximate the scaled Markov process in the limit as the parameter becomes small. Then, we derive sufficient conditions for such reductions based on the reaction network structure for both homogeneous and time-varying stochastic settings, and study examples and properties of the reduction.
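Networks modelled as continuous-time Markov chains are commonly explored numerically with Gillespie's stochastic simulation algorithm. The sketch below assumes mass-action propensities for reactions with at most one copy of each reactant. It uses the one-substrate Michaelis-Menten network E + S ⇌ C → E + P as an illustrative example, with C in the role of the fast-depleted intermediate. The rate constants are arbitrary, and the sketch simulates the full network rather than the reduced one.

```python
import numpy as np

# Species order: E, S, C, P. Each reaction: (reactant counts, net change, rate constant).
REACTIONS = [
    (np.array([1, 1, 0, 0]), np.array([-1, -1, 1, 0]), 0.01),  # E + S -> C
    (np.array([0, 0, 1, 0]), np.array([1, 1, -1, 0]), 0.1),    # C -> E + S
    (np.array([0, 0, 1, 0]), np.array([1, 0, -1, 1]), 0.5),    # C -> E + P
]

def propensity(x, reactants, k):
    """Mass-action propensity, valid when each reactant appears at most once."""
    return k * np.prod(np.where(reactants > 0, x, 1))

def gillespie(x0, t_max, seed=0):
    """Exact stochastic simulation of the CTMC up to time t_max."""
    rng = np.random.default_rng(seed)
    t, x = 0.0, np.array(x0, dtype=int)
    times, states = [t], [x.copy()]
    while True:
        a = np.array([propensity(x, r, k) for r, _, k in REACTIONS], dtype=float)
        a_total = a.sum()
        if a_total == 0:
            break                                  # no reaction can fire
        t += rng.exponential(1.0 / a_total)        # waiting time to next event
        if t > t_max:
            break
        j = rng.choice(len(REACTIONS), p=a / a_total)  # which reaction fires
        x = x + REACTIONS[j][1]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)
```

Along any simulated path, E + C and S + C + P stay constant. This provides a quick sanity check when comparing the full network against a reduced one.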


Subject(s)
Markov Chains , Models, Theoretical , Kinetics , Stochastic Processes
8.
Theor Popul Biol ; 142: 1-11, 2021 12.
Article in English | MEDLINE | ID: mdl-34563554

ABSTRACT

A coalescent model of a sample of size n is derived from a birth-death process that originates at a random time in the past from a single founder individual. Over time, the descendants of the founder evolve into a population of large (infinite) size from which a sample of size n is taken. The parameters and time of the birth-death process are scaled with N0, the size of the present-day population, while letting N0→∞, similarly to how the standard Kingman coalescent process arises from the Wright-Fisher model. The model is named the Limit Birth-Death (LBD) coalescent model. Simulations from the LBD coalescent model with sample size n are computationally slow compared to standard coalescent models. Therefore, we suggest different approximations to the LBD coalescent model assuming the population size is a deterministic function of time rather than a stochastic process. Furthermore, we introduce a hybrid LBD coalescent model, which combines the exactness of the LBD coalescent model with the speed of the approximations.
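The approximations replace the random population trajectory with a deterministic function of time. A generic illustration of that idea, not the LBD approximations themselves, is a coalescent in which the population size, looking backwards, shrinks as N(t) = N0·exp(-r·t). The exponential form, the time units and the parameter values are assumptions. With k lineages, each pair coalesces at rate 1/N(t), so the total rate is C(k,2)/N(t). Waiting times are drawn by inverting the integrated rate against an Exp(1) variable.

```python
import math
import random

def coalescence_times(n, n0, r, seed=None):
    """Times (backwards from the present) at which lineages merge, from n down to 1.

    Assumes pairwise coalescence rate 1/N(t) with N(t) = n0 * exp(-r * t);
    r = 0 gives the constant-size coalescent.
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    for k in range(n, 1, -1):
        pairs = k * (k - 1) / 2
        e = rng.expovariate(1.0)           # Exp(1) draw on the integrated-rate scale
        if r == 0:
            t += e * n0 / pairs
        else:
            # Solve pairs / (n0 * r) * (exp(r * t_new) - exp(r * t)) = e for t_new.
            t = math.log(math.exp(r * t) + e * r * n0 / pairs) / r
        times.append(t)
    return times
```

Because the population is smaller in the past when r > 0, coalescence happens sooner than in the constant-size case. The same random seed therefore gives merge times that are no later than those with r = 0.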


Subject(s)
Genetics, Population , Models, Genetic , Population Density , Sample Size , Stochastic Processes
9.
Math Biosci ; 320: 108295, 2020 02.
Article in English | MEDLINE | ID: mdl-31843554

ABSTRACT

We consider the question of whether a chemical reaction network preserves the number and stability of its positive steady states upon inclusion of inflow and outflow reactions. Often a model of a reaction network is presented without inflows and outflows, while in fact some of the species might be degraded or leaked to the environment, or be synthesized or transported into the system. We provide a sufficient and easy-to-check criterion based on the stoichiometry of the reaction network alone and discuss examples from systems biology.


Subject(s)
Biochemical Phenomena , Metabolic Networks and Pathways , Models, Biological , Models, Chemical , Systems Biology , Humans
10.
Theor Popul Biol ; 125: 56-66, 2019 02.
Article in English | MEDLINE | ID: mdl-30562538

ABSTRACT

We provide a general mathematical framework based on the theory of graphical models to study admixture graphs. Admixture graphs are used to describe the ancestral relationships between past and present populations, allowing for population merges and migration events, by means of gene flow. We give various mathematical properties of admixture graphs with particular focus on properties of the so-called F-statistics. The Wright-Fisher model is also studied, and a general expression for the loss of heterozygosity is derived.
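The sketch below shows the usual plug-in definitions of the F-statistics, for readers less familiar with them. Each is an average over SNPs of products of allele-frequency differences between populations. It is an illustration only and ignores the finite-sample bias corrections normally applied to f2 and f3 in practice. The function names are illustrative.

```python
import numpy as np

def f2(a, b):
    """f2(A, B): mean squared allele-frequency difference over SNPs."""
    return np.mean((a - b) ** 2)

def f3(c, a, b):
    """f3(C; A, B): mean of (c - a)(c - b); a clearly negative value suggests C is admixed."""
    return np.mean((c - a) * (c - b))

def f4(a, b, c, d):
    """f4(A, B; C, D): mean product of the frequency differences a - b and c - d."""
    return np.mean((a - b) * (c - d))
```

For these plug-in versions, f3 and f4 are exact linear combinations of f2 values. For example, f4(A, B; C, D) = (f2(A, D) + f2(B, C) - f2(A, C) - f2(B, D)) / 2, and f3(C; A, B) = (f2(C, A) + f2(C, B) - f2(A, B)) / 2.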


Subject(s)
Gene Flow , Genetics, Population , Stochastic Processes , Genetics, Population/statistics & numerical data , Heterozygote , Humans , Models, Theoretical
11.
Math Biosci ; 301: 68-82, 2018 07.
Article in English | MEDLINE | ID: mdl-29601834

ABSTRACT

We introduce a unifying and generalizing framework for complex and detailed balanced steady states in chemical reaction network theory. To this end, we generalize the graph commonly used to represent a reaction network. Specifically, we introduce a graph, called a reaction graph, that has one edge for each reaction but potentially multiple nodes for each complex. A special class of steady states, called node balanced steady states, is naturally associated with such a reaction graph. We show that complex and detailed balanced steady states are special cases of node balanced steady states by choosing appropriate reaction graphs. Further, we show that node balanced steady states have properties analogous to complex balanced steady states, such as uniqueness and asymptotic stability in each stoichiometric compatibility class. Moreover, we associate an integer, called the deficiency, to a reaction graph that gives the number of independent relations in the reaction rate constants that need to be satisfied for a positive node balanced steady state to exist. The set of reaction graphs (modulo isomorphism) is equipped with a partial order that has the complex balanced reaction graph as minimal element. We relate this order to the deficiency and to the set of reaction rate constants for which a positive node balanced steady state exists.
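The paper generalizes the deficiency to reaction graphs. For orientation, here is a sketch of the classical deficiency n - l - s of a reaction network, where n is the number of complexes, l the number of linkage classes and s the rank of the stoichiometric matrix. It does not compute the reaction-graph deficiency introduced in the abstract. Encoding each complex as a dict from species name to coefficient is an assumption made for illustration.

```python
import numpy as np

def deficiency(reactions, species):
    """Classical deficiency n - l - s of a reaction network.

    reactions: list of (reactant_complex, product_complex), each complex a
    dict mapping species name -> stoichiometric coefficient ({} is the zero complex).
    """
    def key(cplx):
        return tuple(sorted(cplx.items()))

    index = {}
    for r, p in reactions:
        for c in (r, p):
            index.setdefault(key(c), len(index))

    # Union-find over complexes to count linkage classes.
    parent = list(range(len(index)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for r, p in reactions:
        parent[find(index[key(r)])] = find(index[key(p)])
    linkage_classes = len({find(i) for i in range(len(index))})

    # Stoichiometric matrix: one column per reaction (product minus reactant).
    row = {s: i for i, s in enumerate(species)}
    N = np.zeros((len(species), len(reactions)))
    for j, (r, p) in enumerate(reactions):
        for s, c in p.items():
            N[row[s], j] += c
        for s, c in r.items():
            N[row[s], j] -= c
    return len(index) - linkage_classes - np.linalg.matrix_rank(N)
```

For the Michaelis-Menten network E + S ⇌ C → E + P, this gives 3 complexes, 1 linkage class and rank 2, so the deficiency is 0.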


Subject(s)
Models, Chemical , Biochemical Phenomena , Kinetics , Mathematical Concepts
12.
Theor Popul Biol ; 122: 36-45, 2018 07.
Article in English | MEDLINE | ID: mdl-29452133

ABSTRACT

In many areas of genetics it is relevant to consider a population of individuals that is founded by a single individual in the past. One model for such a scenario is the conditioned reconstructed process with Bernoulli sampling that describes the evolution of a population of individuals that originates from a single individual. Several aspects of this reconstructed process are studied, in particular the Markov structure of the process. It is shown that at any given time in the past, the conditioned reconstructed process behaves as the original conditioned reconstructed process after a suitable time-dependent change of the sampling probability. Additionally, it is discussed how mutations accumulate in a sample of particles. It is shown that random sampling of particles at the present time has the effect of making the mutation rate look time-dependent. Conditions are given under which this sampling effect is negligible. A possible extension of the reconstructed process that allows for multiple founding particles is discussed.


Subject(s)
Binomial Distribution , Genetics, Population , Models, Genetic , Probability , Algorithms , Birth Rate , Genealogy and Heraldry , Humans , Markov Chains , Mortality , Mutation
13.
G3 (Bethesda) ; 8(2): 551-566, 2018 02 02.
Article in English | MEDLINE | ID: mdl-29196497

ABSTRACT

The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, calling genotypes accurately is not always possible; therefore, the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction to combat the problems of sequencing errors, and show a way to correct for introgression from an external population that is not part of the supposed genetic relationship, and how this leads to an estimate of the admixture rate. We prove that the D-statistic is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixture. The power gain is most pronounced for low and medium sequencing depth (1-10×), and performance is as good as with perfectly called genotypes at a sequencing depth of 2×. We show the reliability of error correction in scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to estimate the admixture rates.
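For context, the classical ABBA-BABA form of the D-statistic can be written in terms of per-site derived-allele frequencies p1..p4 of populations H1..H4, with H4 as the outgroup. The sketch below is only that standard frequency-based estimator. It does not include the multi-individual read-based estimator, the error correction or the introgression correction described in the abstract.

```python
import numpy as np

def d_statistic(p1, p2, p3, p4):
    """Frequency-based D(H1, H2; H3, H4) from per-site derived-allele frequencies."""
    abba = (1 - p1) * p2 * p3 * (1 - p4)   # H2 and H3 share the derived allele
    baba = p1 * (1 - p2) * p3 * (1 - p4)   # H1 and H3 share the derived allele
    return (abba.sum() - baba.sum()) / (abba.sum() + baba.sum())
```

With single sampled alleles (frequencies 0 or 1), this reduces to (nABBA - nBABA) / (nABBA + nBABA). A positive value indicates an excess of allele sharing between H2 and H3, and a value near 0 is expected when H1 and H2 are symmetric with respect to H3.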


Subject(s)
Gene Flow , Genetics, Population/statistics & numerical data , Genome, Human/genetics , Whole Genome Sequencing/methods , Algorithms , Gene Frequency , Genetics, Population/methods , Genotype , Human Migration , Humans , Models, Genetic , Models, Statistical , Polymorphism, Single Nucleotide
14.
PLoS Comput Biol ; 13(10): e1005751, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28972969

ABSTRACT

Mathematical modelling has become an established tool for studying the dynamics of biological systems. Current applications range from building models that reproduce quantitative data to identifying systems with predefined qualitative features, such as switching behaviour, bistability or oscillations. Mathematically, the latter question amounts to identifying parameter values associated with a given qualitative feature. We introduce a procedure to partition the parameter space of a parameterized system of ordinary differential equations into regions for which the system has a unique equilibrium or multiple equilibria. The procedure is based on the computation of the Brouwer degree, and it creates a multivariate polynomial with parameter-dependent coefficients. The signs of the coefficients determine parameter regions with and without multistationarity. A particular strength of the procedure is the avoidance of numerical analysis and parameter sampling. The procedure consists of a number of steps. Each of these steps might be addressed algorithmically using various computer programs and available software, or manually. We demonstrate our procedure on several models of gene transcription and cell signalling, and show that in many cases we obtain a complete partitioning of the parameter space with respect to multistationarity.


Subject(s)
Algorithms , Data Interpretation, Statistical , Models, Biological , Models, Statistical , Multivariate Analysis , Computer Simulation
15.
Bull Math Biol ; 79(7): 1662-1686, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28620882

ABSTRACT

Known graphical conditions for the generic and global convergence to equilibria of the dynamical system arising from a reaction network are shown to be invariant under the so-called successive removal of intermediates, a systematic procedure to simplify the network, making the graphical conditions considerably easier to check.


Subject(s)
Genetic Variation , Models, Theoretical , Humans
16.
Genome Biol ; 18(1): 38, 2017 02 21.
Article in English | MEDLINE | ID: mdl-28222791

ABSTRACT

The study of epigenetic heterogeneity at the level of individual cells and in whole populations is the key to understanding cellular differentiation, organismal development, and the evolution of cancer. We develop a statistical method, epiG, to infer and differentiate between different epi-allelic haplotypes, annotated with CpG methylation status and DNA polymorphisms, from whole-genome bisulfite sequencing data, and nucleosome occupancy from NOMe-seq data. We demonstrate the capabilities of the method by inferring allele-specific methylation and nucleosome occupancy in cell lines, and colon and tumor samples, and by benchmarking the method against independent experimental data.


Subject(s)
DNA Methylation , Epigenomics/methods , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software , Alleles , CpG Islands , Gene Expression Profiling , Genotype , Nucleosomes/metabolism , Polymorphism, Single Nucleotide , Protein Binding , Reproducibility of Results
17.
J Math Biol ; 74(4): 887-932, 2017 03.
Article in English | MEDLINE | ID: mdl-27480320

ABSTRACT

For dynamical systems arising from chemical reaction networks, persistence is the property that each species concentration remains positively bounded away from zero, as long as species concentrations were all positive in the beginning. We describe two graphical procedures for simplifying reaction networks without breaking known necessary or sufficient conditions for persistence, by iteratively removing so-called intermediates and catalysts from the network. The procedures are easy to apply and, in many cases, lead to highly simplified network structures, such as monomolecular networks. For specific classes of reaction networks, we show that these conditions for persistence are equivalent to one another. Furthermore, they can also be characterized by easily checkable strong connectivity properties of a related graph. In particular, this is the case for (conservative) monomolecular networks, as well as cascades of a large class of post-translational modification systems (of which the MAPK cascade and the n-site futile cycle are prominent examples). Since one of the aforementioned sufficient conditions for persistence precludes the existence of boundary steady states, our method also provides a graphical tool to check for that.
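The abstract states that, for some classes such as monomolecular networks, the persistence conditions reduce to strong connectivity properties of a related graph. The precise graph and criterion are given in the paper. As a generic building block only, here is a strong-connectivity test for a directed graph. An example input would be species as nodes and monomolecular reactions X → Y as edges, an encoding assumed for illustration. A graph is strongly connected exactly when every node is reachable from one fixed node both in the graph and in its reverse.

```python
from collections import defaultdict

def _reachable(start, adj):
    """Set of nodes reachable from start by iterative depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_strongly_connected(nodes, edges):
    """True if every node can reach every other node along directed edges."""
    nodes = list(nodes)
    if not nodes:
        return True
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
    full = set(nodes)
    return full <= _reachable(nodes[0], fwd) and full <= _reachable(nodes[0], rev)
```

For instance, the cycle A → B → C → A passes, while the chain A → B → C does not.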


Subject(s)
Biochemical Phenomena/physiology , Chemistry Techniques, Analytical/methods , MAP Kinase Signaling System/physiology , Protein Processing, Post-Translational/physiology
18.
J Math Biol ; 74(1-2): 195-237, 2017 01.
Article in English | MEDLINE | ID: mdl-27221101

ABSTRACT

The quasi-steady state approximation and time-scale separation are commonly applied methods to simplify models of biochemical reaction networks based on ordinary differential equations (ODEs). The concentrations of the "fast" species are assumed effectively to be at steady state with respect to the "slow" species. Under this assumption the steady state equations can be used to eliminate the "fast" variables and a new ODE system with only the slow species can be obtained. We interpret a reduced system obtained by time-scale separation as the ODE system arising from a unique reaction network, by identification of a set of reactions and the corresponding rate functions. The procedure is graphically based and can easily be worked out by hand for small networks. For larger networks, we provide a pseudo-algorithm. We study properties of the reduced network, its kinetics and conservation laws, and show that the kinetics of the reduced network fulfil realistic assumptions, provided the original network does. We illustrate our results using biological examples such as substrate mechanisms, post-translational modification systems and networks with intermediate (transient) steps.
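As a worked illustration of time-scale separation, consider the textbook Michaelis-Menten case rather than the general procedure of the paper. Take E + S ⇌ C → E + P with mass-action rate constants k_1, k_{-1}, k_2, treat the intermediate C as fast, and use the conservation law E + C = E_tot:

```latex
\begin{aligned}
\frac{dC}{dt} &= k_1 E S - (k_{-1} + k_2)\, C = 0,
  \qquad E = E_{\mathrm{tot}} - C \\
\Rightarrow\; C &= \frac{E_{\mathrm{tot}}\, S}{K_M + S},
  \qquad K_M = \frac{k_{-1} + k_2}{k_1} \\
\frac{dP}{dt} = -\frac{dS}{dt} &= k_2\, C = \frac{k_2\, E_{\mathrm{tot}}\, S}{K_M + S}
\end{aligned}
```

Read as a reaction network, the reduced system is the single reaction S → P with the non-mass-action rate function k_2 E_tot S/(K_M + S). This is the kind of identification of reactions and rate functions that the abstract describes.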


Subject(s)
Biochemical Phenomena/physiology , Models, Biological , Algorithms , Kinetics , Protein Processing, Post-Translational/physiology
19.
J R Soc Interface ; 13(123)2016 10.
Article in English | MEDLINE | ID: mdl-27733693

ABSTRACT

Bistability, and more generally multistability, is a key system dynamics feature enabling decision-making and memory in cells. Deciphering the molecular determinants of multistability is thus crucial for a better understanding of cellular pathways and their (re)engineering in synthetic biology. Here, we show that a key motif found predominantly in eukaryotic signalling systems, namely a futile signalling cycle, can display bistability when featuring a two-state kinase. We provide necessary and sufficient mathematical conditions on the kinetic parameters of this motif that guarantee the existence of multiple steady states. These conditions foster the intuition that bistability arises as a consequence of competition between the two states of the kinase. Extending from this result, we find that increasing the number of kinase states linearly translates into an increase in the number of steady states in the system. These findings reveal, to our knowledge, a new mechanism for the generation of bistability and multistability in cellular signalling systems. Further, the futile cycle featuring a two-state kinase is among the smallest bistable signalling motifs. We show that multi-state kinases and the described competition-based motif are part of several natural signalling systems and thereby could enable them to implement complex information processing through multistability. These results indicate that multi-state kinases in signalling systems are readily exploited by natural evolution and could equally be used by synthetic approaches for the generation of multistable information processing systems at the cellular level.


Subject(s)
Models, Biological , Protein Kinases/metabolism , Signal Transduction/physiology , Animals , Humans
20.
Stat Appl Genet Mol Biol ; 15(4): 349-61, 2016 08 01.
Article in English | MEDLINE | ID: mdl-27269897

ABSTRACT

In many areas of science it is customary to perform many, potentially millions of, tests simultaneously. To gain statistical power it is common to group tests based on a priori criteria such as predefined regions or by sliding windows. However, it is not straightforward to choose grouping criteria and the results might depend on the chosen criteria. Methods that summarize, or aggregate, test statistics or p-values, without relying on a priori criteria, are therefore desirable. We present a simple method to aggregate a sequence of stochastic variables, such as test statistics or p-values, into fewer variables without assuming a priori defined groups. We provide different ways to evaluate the significance of the aggregated variables based on theoretical considerations and resampling techniques, and show that under certain assumptions the FWER is controlled in the strong sense. Validity of the method was demonstrated using simulations and real data analyses. Our method may be a useful supplement to standard procedures relying on evaluation of test statistics individually. Moreover, by being agnostic and not relying on predefined selected regions, it might be a practical alternative to conventionally used methods of aggregation of p-values over regions. The method is implemented in Python and freely available online (through GitHub, see the Supplementary information).


Subject(s)
Models, Theoretical , Software , Algorithms , Computer Simulation , Data Interpretation, Statistical , Internet , Reproducibility of Results