Results 1 - 9 of 9
1.
Mol Biol Evol ; 41(6)2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38693911

ABSTRACT

Modeling the rate at which adaptive phenotypes appear in a population is a key to predicting evolutionary processes. Given random mutations, should this rate be modeled by a simple Poisson process, or is a more complex dynamics needed? Here we use analytic calculations and simulations of evolving populations on explicit genotype-phenotype maps to show that the introduction of novel phenotypes can be "bursty" or overdispersed. In other words, a novel phenotype either appears multiple times in quick succession or not at all for many generations. These bursts are fundamentally caused by statistical fluctuations and other structure in the map from genotypes to phenotypes. Their strength depends on population parameters, being highest for "monomorphic" populations with low mutation rates. They can also be enhanced by additional inhomogeneities in the mapping from genotypes to phenotypes. We mainly investigate the effect of bursts using the well-studied genotype-phenotype map for RNA secondary structure, but find similar behavior in a lattice protein model and in Richard Dawkins's biomorphs model of morphological development. Bursts can profoundly affect adaptive dynamics. Most notably, they imply that fitness differences play a smaller role in determining which phenotype fixes than would be the case for a Poisson process without bursts.
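The overdispersion described in the abstract can be illustrated with a toy simulation (this is not the paper's genotype-phenotype model; rates and burst sizes below are arbitrary choices). The index of dispersion (variance over mean of arrival counts) is ~1 for a Poisson process and well above 1 when arrivals come in bursts:

```python
import random

def dispersion(counts):
    # Index of dispersion: variance / mean. A Poisson process gives ~1;
    # an overdispersed ("bursty") process gives values well above 1.
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / len(counts)
    return var / mean

random.seed(0)

# Poisson-like arrivals: each of 100 generations has one independent,
# small chance of introducing the novel phenotype.
poisson_counts = [sum(random.random() < 0.05 for _ in range(100))
                  for _ in range(2000)]

def burst_count(windows=100, p_event=0.01, p_stop=0.2):
    # Bursty arrivals: rarer trigger events, but each one contributes a
    # geometric burst of introductions in quick succession.
    total = 0
    for _ in range(windows):
        if random.random() < p_event:
            size = 1
            while random.random() > p_stop:
                size += 1
            total += size
    return total

bursty_counts = [burst_count() for _ in range(2000)]
```

Both processes have comparable mean counts, but only the bursty one is strongly overdispersed.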


Subject(s)
Models, Genetic , Phenotype , Genotype , Computer Simulation , Adaptation, Physiological/genetics , Evolution, Molecular , Mutation , Biological Evolution , Poisson Distribution , RNA/genetics , Adaptation, Biological/genetics
2.
Sci Rep ; 13(1): 4053, 2023 03 11.
Article in English | MEDLINE | ID: mdl-36906642

ABSTRACT

Electronic health records (EHRs) are used in hospitals to store diagnoses, clinician notes, examinations, lab results, and interventions for each patient. Grouping patients into distinct subsets, for example, via clustering, may enable the discovery of unknown disease patterns or comorbidities, which could eventually lead to better treatment through personalized medicine. Patient data derived from EHRs is heterogeneous and temporally irregular. Therefore, traditional machine learning methods like PCA are ill-suited for analysis of EHR-derived patient data. We propose to address these issues with a new methodology based on training a gated recurrent unit (GRU) autoencoder directly on health record data. Our method learns a low-dimensional feature space by training on patient data time series, where the time of each data point is expressed explicitly. We use positional encodings for time, allowing our model to better handle the temporal irregularity of the data. We apply our method to data from the Medical Information Mart for Intensive Care (MIMIC-III). Using our data-derived feature space, we can cluster patients into groups representing major classes of disease patterns. Additionally, we show that our feature space exhibits a rich substructure at multiple scales.
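The positional-encoding idea can be sketched as follows: a standard sinusoidal encoding evaluated at a real-valued timestamp, so irregularly spaced measurements each get a smooth, comparable time embedding. The dimension and period here are illustrative defaults, not the paper's settings:

```python
import math

def time_encoding(t, dim=8, max_period=10000.0):
    # Sinusoidal encoding at a real-valued timestamp t. Adjacent (sin, cos)
    # pairs share a frequency; frequencies decay geometrically with index.
    enc = []
    for i in range(dim // 2):
        freq = 1.0 / (max_period ** (2 * i / dim))
        enc.append(math.sin(t * freq))
        enc.append(math.cos(t * freq))
    return enc
```

Each measurement's feature vector can then be concatenated with `time_encoding(t)` before being fed to the GRU, removing the need for fixed-interval resampling.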


Subject(s)
Electronic Health Records , Machine Learning , Humans , Time Factors , Comorbidity , Intensive Care Units
3.
Liver Int ; 42(3): 640-650, 2022 03.
Article in English | MEDLINE | ID: mdl-35007409

ABSTRACT

BACKGROUND & AIMS: Decompensation is a hallmark of disease progression in cirrhotic patients. Early detection of a phase transition from compensated cirrhosis to decompensation would enable targeted therapeutic interventions, potentially extending life expectancy. This study aims to (a) identify the predictors of decompensation in a large, multicentric cohort of patients with compensated cirrhosis, (b) build a reliable prognostic score for decompensation and (c) evaluate the score in independent cohorts. METHODS: Decompensation was identified in electronic health records data from 6049 cirrhosis patients in the IBM Explorys database training cohort by diagnostic codes for variceal bleeding, encephalopathy, ascites, hepato-renal syndrome and/or jaundice. We identified predictors of clinical decompensation and developed a prognostic score using Cox regression analysis. The score was evaluated using the IBM Explorys database validation cohort (N = 17662), the Penn Medicine BioBank (N = 1326) and the UK Biobank (N = 317). RESULTS: The new Early Prediction of Decompensation (EPOD) score uses platelet count, albumin, and bilirubin concentration. It predicts decompensation during a 3-year follow-up in the three validation cohorts with AUROCs of 0.69, 0.69 and 0.77, respectively, and outperforms the well-known MELD and Child-Pugh scores in predicting decompensation. Furthermore, the EPOD score predicted the 3-year probability of decompensation. CONCLUSIONS: The EPOD score provides a prediction tool for the risk of decompensation in patients with cirrhosis that outperforms well-known cirrhosis scores. Since EPOD is based on only three blood parameters, it provides maximal clinical feasibility at minimal cost.
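A Cox-regression score of this kind can be sketched as below. The coefficients, reference values, and baseline survival are hypothetical placeholders for illustration; the abstract does not report the published EPOD weights:

```python
import math

# Hypothetical coefficients and reference covariate values -- NOT the
# published EPOD parameters, which the abstract does not report.
BETA = {"platelets": -0.004, "albumin": -0.5, "bilirubin": 0.3}
REFERENCE = {"platelets": 250.0, "albumin": 40.0, "bilirubin": 1.0}

def linear_predictor(platelets, albumin, bilirubin):
    # Cox linear predictor, centered at the reference covariate values.
    x = {"platelets": platelets, "albumin": albumin, "bilirubin": bilirubin}
    return sum(BETA[k] * (x[k] - REFERENCE[k]) for k in BETA)

def decompensation_risk(platelets, albumin, bilirubin, baseline_survival=0.9):
    # Cox proportional hazards: S(t | x) = S0(t) ** exp(lp); risk = 1 - S.
    lp = linear_predictor(platelets, albumin, bilirubin)
    return 1.0 - baseline_survival ** math.exp(lp)
```

Low platelets, low albumin, and high bilirubin each push the linear predictor up, raising the predicted 3-year risk.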


Subject(s)
Esophageal and Gastric Varices , Ascites/etiology , Esophageal and Gastric Varices/diagnosis , Esophageal and Gastric Varices/etiology , Gastrointestinal Hemorrhage , Humans , Liver Cirrhosis/complications , Liver Cirrhosis/diagnosis , Liver Cirrhosis/drug therapy , Prognosis , Retrospective Studies , Severity of Illness Index
4.
J Chem Inf Model ; 59(11): 4893-4905, 2019 11 25.
Article in English | MEDLINE | ID: mdl-31714067

ABSTRACT

Oral administration of drug products is a strict requirement in many medical indications. Therefore, bioavailability prediction models are of high importance for prioritization of compound candidates in the drug discovery process. However, oral exposure and bioavailability are difficult to predict, as they are the result of various highly complex factors and/or processes influenced by the physicochemical properties of a compound, such as solubility, lipophilicity, or charge state, as well as by interactions with the organism, for instance, metabolism or membrane permeation. In this study, we assess whether it is possible to predict intravenous (iv) or oral drug exposure and oral bioavailability in rats. As input parameters, we use (i) six experimentally determined in vitro and physicochemical endpoints, namely, membrane permeation, free fraction, metabolic stability, solubility, pKa value, and lipophilicity; (ii) the outputs of six in silico absorption, distribution, metabolism, and excretion models trained on the same endpoints, or (iii) the chemical structure encoded as fingerprints or simplified molecular input line entry system strings. The underlying data set for the models is an unprecedented collection of almost 1900 data points with high-quality in vivo experiments performed in rats. We find that drug exposure after iv administration can be predicted similarly well using hybrid models with in vitro- or in silico-predicted endpoints as inputs, with fold change errors (FCE) of 2.28 and 2.08, respectively. The FCEs for exposure after oral administration are higher, and here, the prediction from in vitro inputs performs significantly better in comparison to in silico-based models with FCEs of 3.49 and 2.40, respectively, most probably reflecting the higher complexity of oral bioavailability. Simplifying the prediction task to a binary alert for low oral bioavailability, based only on chemical structure, we achieve accuracy and precision close to 70%.
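The fold change error (FCE) reported above can be computed as a geometric mean fold error, a common definition for exposure-prediction accuracy; this is an assumption, since the abstract does not define FCE explicitly:

```python
import math

def fold_change_error(predicted, observed):
    # Geometric mean fold error: exp(mean |ln(pred/obs)|). An FCE of 2.0
    # means predictions are, on average, within 2-fold of the observed value.
    logs = [abs(math.log(p / o)) for p, o in zip(predicted, observed)]
    return math.exp(sum(logs) / len(logs))
```

Note that over- and under-prediction by the same factor contribute equally, which is why fold errors rather than absolute errors are used for exposure data spanning orders of magnitude.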


Subject(s)
Drug Discovery/methods , Hepatocytes/metabolism , Pharmaceutical Preparations/metabolism , Administration, Oral , Animals , Biological Availability , Caco-2 Cells , Computer Simulation , Humans , Machine Learning , Male , Models, Biological , Permeability , Pharmaceutical Preparations/chemistry , Rats , Rats, Wistar , Serum Albumin/metabolism , Solubility
5.
Bioinformatics ; 34(13): i494-i501, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29949983

ABSTRACT

Motivation: Mathematical models have become standard tools for the investigation of cellular processes and the unraveling of signal processing mechanisms. The parameters of these models are usually derived from the available data using optimization and sampling methods. However, the efficiency of these methods is limited by the properties of the mathematical model, e.g. non-identifiabilities, and the resulting posterior distribution. In particular, multi-modal distributions with long valleys or pronounced tails are difficult to optimize and sample. Thus, the development or improvement of optimization and sampling methods is subject to ongoing research. Results: We suggest a region-based adaptive parallel tempering algorithm which adapts to the problem-specific posterior distributions, i.e. modes and valleys. The algorithm combines several established algorithms to overcome their individual shortcomings and to improve sampling efficiency. We assessed its properties for established benchmark problems and two ordinary differential equation models of biochemical reaction networks. The proposed algorithm outperformed state-of-the-art methods in terms of calculation efficiency and mixing. Since the algorithm does not rely on a specific problem structure, but adapts to the posterior distribution, it is suitable for a variety of model classes. Availability and implementation: The code, written in MATLAB, is available both as Supplementary Material and in a Git repository. Supplementary information: Supplementary data are available at Bioinformatics online.
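A minimal, non-adaptive parallel tempering sketch (in Python rather than the paper's MATLAB, and omitting the region-based adaptivity that is the paper's contribution) shows why tempering helps with multi-modal posteriors: hot chains cross barriers easily, and swaps carry those moves down to the target chain:

```python
import math
import random

def parallel_tempering(log_post, betas, n_steps=4000, step=1.0, seed=1):
    # One Metropolis walker per inverse temperature beta, plus neighbour
    # swaps. The adaptive scheme from the paper is deliberately omitted.
    rng = random.Random(seed)
    x = [0.0] * len(betas)
    samples = []
    for _ in range(n_steps):
        for i, b in enumerate(betas):
            prop = x[i] + rng.gauss(0.0, step)
            if math.log(rng.random() + 1e-300) < b * (log_post(prop) - log_post(x[i])):
                x[i] = prop
        i = rng.randrange(len(betas) - 1)  # propose one neighbour swap
        delta = (betas[i] - betas[i + 1]) * (log_post(x[i + 1]) - log_post(x[i]))
        if math.log(rng.random() + 1e-300) < delta:
            x[i], x[i + 1] = x[i + 1], x[i]
        samples.append(x[0])  # record the beta = 1 (target) chain
    return samples

def log_post(v):
    # Bimodal target with well-separated modes at -4 and +4 (log-sum-exp
    # form for numerical stability far from the modes).
    a, b = -(v - 4.0) ** 2, -(v + 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

samples = parallel_tempering(log_post, betas=[1.0, 0.4, 0.15])
```

A single Metropolis chain at beta = 1 would almost never cross the barrier between the two modes; with the temperature ladder, the target chain visits both.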


Subject(s)
Computational Biology/methods , Metabolic Networks and Pathways , Models, Theoretical , Software , Algorithms , Bayes Theorem , Models, Biological
6.
PLoS Comput Biol ; 12(3): e1004773, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26937652

ABSTRACT

Mutational neighbourhoods in genotype-phenotype (GP) maps are widely believed to be more likely to share characteristics than expected from random chance. Such genetic correlations should strongly influence evolutionary dynamics. We explore and quantify these intuitions by comparing three GP maps-a model for RNA secondary structure, the HP model for protein tertiary structure, and the Polyomino model for protein quaternary structure-to a simple random null model that maintains the number of genotypes mapping to each phenotype, but assigns genotypes randomly. The mutational neighbourhood of a genotype in these GP maps is much more likely to contain genotypes mapping to the same phenotype than in the random null model. Such neutral correlations can be quantified by the robustness to mutations, which can be many orders of magnitude larger than that of the null model, and crucially, above the critical threshold for the formation of large neutral networks of mutationally connected genotypes which enhance the capacity for the exploration of phenotypic novelty. Thus neutral correlations increase evolvability. We also study non-neutral correlations: Compared to the null model, i) If a particular (non-neutral) phenotype is found once in the 1-mutation neighbourhood of a genotype, then the chance of finding that phenotype multiple times in this neighbourhood is larger than expected; ii) If two genotypes are connected by a single neutral mutation, then their respective non-neutral 1-mutation neighbourhoods are more likely to be similar; iii) If a genotype maps to a folding or self-assembling phenotype, then its non-neutral neighbours are less likely to be a potentially deleterious non-folding or non-assembling phenotype. 
Non-neutral correlations of type i) and ii) reduce the rate at which new phenotypes can be found by neutral exploration, and so may diminish evolvability, while non-neutral correlations of type iii) may instead facilitate evolutionary exploration and so increase evolvability.
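The comparison of mutational robustness against a genotype-count-preserving null model can be illustrated on a toy GP map (a majority-rule map over binary genotypes, which is an illustrative stand-in, not the RNA, HP, or Polyomino maps of the paper):

```python
import itertools
import random

L = 10
genotypes = list(itertools.product([0, 1], repeat=L))

# Toy GP map: the phenotype is the majority bit (ties map to 0).
pheno = {g: (1 if sum(g) > L // 2 else 0) for g in genotypes}

def robustness(pheno_map, p):
    # Fraction of point-mutation neighbours sharing the genotype's
    # phenotype, averaged over all genotypes mapping to phenotype p.
    fracs = []
    for g, ph in pheno_map.items():
        if ph != p:
            continue
        same = sum(pheno_map[g[:i] + (1 - g[i],) + g[i + 1:]] == ph
                   for i in range(L))
        fracs.append(same / L)
    return sum(fracs) / len(fracs)

# Null model: same number of genotypes per phenotype, assigned at random.
random.seed(0)
shuffled = list(pheno.values())
random.shuffle(shuffled)
null_map = dict(zip(genotypes, shuffled))
```

The structured map's robustness substantially exceeds the null model's, where robustness is simply the phenotype's frequency, mirroring the neutral correlations the abstract describes.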


Subject(s)
Evolution, Molecular , Genetics, Population , Models, Genetic , Models, Statistical , Mutation/genetics , Proteome/genetics , Animals , Computer Simulation , Genotype , Humans
7.
Interface Focus ; 5(6): 20150053, 2015 Dec 06.
Article in English | MEDLINE | ID: mdl-26640651

ABSTRACT

The prevalence of neutral mutations implies that biological systems typically have many more genotypes than phenotypes. But, can the way that genotypes are distributed over phenotypes determine evolutionary outcomes? Answering such questions is difficult, in part because the number of genotypes can be hyper-astronomically large. By solving the genotype-phenotype (GP) map for RNA secondary structure (SS) for systems up to length L = 126 nucleotides (where the set of all possible RNA strands would weigh more than the mass of the visible universe), we show that the GP map strongly constrains the evolution of non-coding RNA (ncRNA). Simple random sampling over genotypes predicts the distribution of properties such as the mutational robustness or the number of stems per SS found in naturally occurring ncRNA with surprising accuracy. Because we ignore natural selection, this strikingly close correspondence with the mapping suggests that structures allowing for functionality are easily discovered, despite the enormous size of the genetic spaces. The mapping is extremely biased: the majority of genotypes map to an exponentially small portion of the morphospace of all biophysically possible structures. Such strong constraints provide a non-adaptive explanation for the convergent evolution of structures such as the hammerhead ribozyme. These results present a particularly clear example of bias in the arrival of variation strongly shaping evolutionary outcomes and may be relevant to Mayr's distinction between proximate and ultimate causes in evolutionary biology.

8.
PLoS One ; 9(2): e86635, 2014.
Article in English | MEDLINE | ID: mdl-24505262

ABSTRACT

Genotype-phenotype (GP) maps specify how the random mutations that change genotypes generate variation by altering phenotypes, which, in turn, can trigger selection. Many GP maps share the following general properties: 1) The total number of genotypes N(G) is much larger than the number of selectable phenotypes; 2) Neutral exploration changes the variation that is accessible to the population; 3) The distribution of phenotype frequencies F(p)=N(p)/N(G), with N(p) the number of genotypes mapping onto phenotype p, is highly biased: the majority of genotypes map to only a small minority of the phenotypes. Here we explore how these properties affect the evolutionary dynamics of haploid Wright-Fisher models that are coupled to a random GP map or to a more complex RNA sequence to secondary structure map. For both maps the probability of a mutation leading to a phenotype p scales to first order as F(p), although for the RNA map there are further correlations as well. By using mean-field theory, supported by computer simulations, we show that the discovery time T(p) of a phenotype p similarly scales to first order as 1/F(p) for a wide range of population sizes and mutation rates in both the monomorphic and polymorphic regimes. These differences in the rate at which variation arises can vary over many orders of magnitude. Phenotypic variation with a larger F(p) is therefore much more likely to arise than variation with a small F(p). We show, using the RNA model, that frequent phenotypes (with larger F(p)) can fix in a population even when alternative, but less frequent, phenotypes with much higher fitness are potentially accessible. In other words, if the fittest never 'arrive' on the timescales of evolutionary change, then they can't fix. We call this highly non-ergodic effect the 'arrival of the frequent'.
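The first-order scaling T(p) ~ 1/F(p) has a simple sketch: if each mutation independently produces phenotype p with probability F(p), the discovery time is geometric with mean 1/F(p). This toy version ignores the further correlations the abstract notes for the RNA map:

```python
import random

def discovery_time(F, rng):
    # Each mutation hits phenotype p with probability F = F(p), so the
    # waiting time until first discovery is geometric with mean 1/F.
    t = 1
    while rng.random() >= F:
        t += 1
    return t

def mean_time(F, n=5000, seed=42):
    rng = random.Random(seed)
    return sum(discovery_time(F, rng) for _ in range(n)) / n
```

A phenotype ten times rarer in the map takes, on average, ten times longer to arrive, which is the crux of the 'arrival of the frequent' effect.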


Subject(s)
Gene-Environment Interaction , Genotype , Models, Genetic , Phenotype
9.
Proc Biol Sci ; 279(1734): 1777-83, 2012 May 07.
Article in English | MEDLINE | ID: mdl-22158953

ABSTRACT

In evolution, the effects of a single deleterious mutation can sometimes be compensated for by a second mutation which recovers the original phenotype. Such epistatic interactions have implications for the structure of genome space--namely, that networks of genomes encoding the same phenotype may not be connected by single mutational moves. We use the folding of RNA sequences into secondary structures as a model genotype-phenotype map and explore the neutral spaces corresponding to networks of genotypes with the same phenotype. In most of these networks, we find that it is not possible to connect all genotypes to one another by single point mutations. Instead, a network for a phenotypic structure with n bonds typically fragments into at least 2^n neutral components, often of similar size. While components of the same network generate the same phenotype, they show important variations in their properties, most strikingly in their evolvability and mutational robustness. This heterogeneity implies contingency in the evolutionary process.
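The fragmentation mechanism is visible in the smallest possible case, a single base pair (n = 1 bond): the six pairing dinucleotides split into two neutral components that no single point mutation can bridge, because changing one partner of a Watson-Crick pair alone breaks the pairing:

```python
from itertools import product

BASES = "AUGC"
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}  # canonical + wobble pairs

# Genotypes: all dinucleotides whose two bases can pair (one "bond").
genotypes = [x + y for x, y in product(BASES, repeat=2) if x + y in PAIRS]

def neighbours(g):
    # Single point mutations that stay on the neutral network (still pair).
    for i in range(2):
        for b in BASES:
            if b != g[i]:
                m = g[:i] + b + g[i + 1:]
                if m in PAIRS:
                    yield m

def components(nodes):
    # Connected components of the neutral network under point mutations.
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            g = stack.pop()
            if g in comp:
                continue
            comp.add(g)
            stack.extend(n for n in neighbours(g) if n not in comp)
        seen |= comp
        comps.append(comp)
    return comps

comps = components(genotypes)
```

The two components ({AU, GU, GC} and {UA, UG, CG}) are linked only by the wobble pairs within each component; crossing between them requires a simultaneous double mutation, consistent with the at-least-2^n fragmentation for n bonds.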


Subject(s)
Epistasis, Genetic , Evolution, Molecular , Genome , RNA/genetics , Computational Biology , Genotype , Mutation , Phenotype , RNA/chemistry