Results 1 - 8 of 8
1.
PLoS Comput Biol ; 18(8): e1010394, 2022 08.
Article in English | MEDLINE | ID: mdl-35984845

ABSTRACT

When two influenza viruses co-infect the same cell, they can exchange genome segments in a process known as reassortment. Reassortment is an important source of genetic diversity and is known to have been involved in the emergence of most pandemic influenza strains. However, because reassortment events are difficult to identify from viral sequence data, little is known about their role in the evolution of seasonal influenza viruses. Here we introduce TreeKnit, a method that infers ancestral reassortment graphs (ARGs) from two segment trees. It is based on topological differences between the trees and proceeds greedily, finding regions that are compatible in both trees. Using simulated genealogies with reassortments, we show that TreeKnit performs well in a wide range of settings and is as accurate as a more principled Bayesian method, while being orders of magnitude faster. Finally, we show that the inferred ARG can be used to better resolve segment trees and to construct more informative visualizations of reassortments.


Subjects
Influenza, Human; Orthomyxoviridae; Bayes Theorem; Genome, Viral/genetics; Humans; Orthomyxoviridae/genetics; Phylogeny; Reassortant Viruses/genetics
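TreeKnit starts from topological differences between two segment trees. The Python sketch below only illustrates what such differences are: it lists the clades (sets of descendant strains) present in one tree but not in the other. It is not the TreeKnit algorithm, which greedily searches for regions compatible in both trees; the nested-tuple tree encoding and all function names are hypothetical.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical toy encoding: a leaf is a strain name, an internal node is a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]
Clade = FrozenSet[str]


def leaves(tree: Tree) -> Clade:
    """Names of all strains below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> Set[Clade]:
    """Leaf set of every internal node, i.e. every clade of the tree."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clades(child)
    return found


def clade_differences(tree_a: Tree, tree_b: Tree) -> Tuple[Set[Clade], Set[Clade]]:
    """Clades found only in tree_a, and clades found only in tree_b."""
    ca, cb = clades(tree_a), clades(tree_b)
    return ca - cb, cb - ca


# Example: HA and NA trees for the same four strains disagree on how the strains pair up.
ha_tree = (("A", "B"), ("C", "D"))
na_tree = (("A", "C"), ("B", "D"))
only_ha, only_na = clade_differences(ha_tree, na_tree)
print(sorted(sorted(c) for c in only_ha))  # [['A', 'B'], ['C', 'D']]
print(sorted(sorted(c) for c in only_na))  # [['A', 'C'], ['B', 'D']]
```

Clades that appear in only one tree mark places where the two segment genealogies disagree, for example because of a reassortment; with real data, poorly resolved branches can produce such differences too.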
2.
Phys Rev E ; 104(2-1): 024407, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34525554

ABSTRACT

Boltzmann machines (BMs) are widely used as generative models. For example, pairwise Potts models (PMs), which are instances of the BM class, provide accurate statistical models of families of evolutionarily related protein sequences. Their parameters are the local fields, which describe site-specific patterns of amino acid conservation, and the two-site couplings, which mirror the coevolution between pairs of sites. This coevolution reflects structural and functional constraints acting on protein sequences during evolution. The most conservative choice to describe the coevolution signal is to include all possible two-site couplings in the PM. This choice, typical of what is known as Direct Coupling Analysis, has been successful in predicting residue contacts in the three-dimensional structure and mutational effects, and in generating new functional sequences. However, the resulting PM suffers from significant overfitting: many couplings are small, noisy, and hardly interpretable, and the PM is close to a critical point, meaning that it is highly sensitive to small parameter perturbations. In this work, we introduce a general parameter-reduction procedure for BMs, via a controlled iterative decimation of the less statistically significant couplings, identified by an information-based criterion that selects either weak or statistically unsupported couplings. For several protein families, our procedure allows one to remove more than 90% of the PM couplings while preserving the predictive and generative properties of the original dense PM, and the resulting model is far from criticality, hence more robust to noise.
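To make the parameters concrete, the sketch below writes out the Potts energy with fields h_i(a) and couplings J_ij(a, b), and prunes the weakest couplings. Assumptions: the dictionary layout and function names are hypothetical, and the pruning rule (smallest Frobenius norm of each q x q coupling block, applied in one shot with no refitting) is a simple magnitude criterion swapped in for the paper's information-based criterion and controlled iterative procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]
Couplings = Dict[Pair, List[List[float]]]  # J[(i, j)][a][b] for sites i < j, states a, b


def potts_energy(seq: Sequence[int], h: List[List[float]], J: Couplings) -> float:
    """E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j); P(s) is proportional to exp(-E(s))."""
    energy = -sum(h[i][a] for i, a in enumerate(seq))
    for (i, j), block in J.items():
        energy -= block[seq[i]][seq[j]]
    return energy


def frobenius_norm(block: List[List[float]]) -> float:
    """Overall strength of one q x q coupling block."""
    return math.sqrt(sum(x * x for row in block for x in row))


def decimate(J: Couplings, fraction: float) -> Couplings:
    """Return a copy of J without the given fraction of weakest pair couplings.

    Simplification: ranks blocks by Frobenius norm and prunes in one shot, with no
    refitting of the remaining parameters afterwards.
    """
    n_remove = int(fraction * len(J))
    weakest_first = sorted(J, key=lambda pair: frobenius_norm(J[pair]))
    removed = set(weakest_first[:n_remove])
    return {pair: block for pair, block in J.items() if pair not in removed}
```

A pruned model keeps the same energy function; couplings that were removed simply contribute zero.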

3.
Mol Biol Evol ; 38(7): 2767-2777, 2021 06 25.
Article in English | MEDLINE | ID: mdl-33749787

ABSTRACT

Seasonal influenza viruses repeatedly infect humans in part because they rapidly change their antigenic properties and evade host immune responses, necessitating frequent updates of the vaccine composition. Accurate predictions of strains circulating in the future could therefore improve the vaccine match. Here, we studied the predictability of frequency dynamics and fixation of amino acid substitutions. Current frequency was the strongest predictor of eventual fixation, as expected in neutral evolution. Other properties, such as occurrence in previously characterized epitopes or a high Local Branching Index (LBI), had little predictive power. Parallel evolution was found to be moderately predictive of fixation. Although the LBI had little power to predict frequency dynamics, it was still successful at picking strains representative of future populations. This success is due to a tendency of the LBI to be high for consensus-like sequences, which are closer to the future population than the average sequence. Simulations of models of adapting populations, in contrast, show clear signals of predictability. This indicates that the evolution of influenza HA and NA, while driven by strong selection pressure to change, is poorly described by common models of directional selection such as traveling fitness waves.


Subjects
Evolution, Molecular; Hemagglutinin Glycoproteins, Influenza Virus/genetics; Influenza A Virus, H1N1 Subtype/genetics; Influenza A Virus, H3N2 Subtype/genetics; Neuraminidase/genetics; Adaptation, Biological/genetics; Amino Acid Substitution; Influenza A Virus, H1N1 Subtype/enzymology; Influenza A Virus, H3N2 Subtype/enzymology; Models, Genetic
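The Local Branching Index is usually defined as the total tree length surrounding a node, with each piece of branch at tree distance d from the node weighted by exp(-d / tau). A minimal sketch under that definition, assuming a tree given as a children dictionary plus per-node branch lengths (a hypothetical encoding), computes it for every node with one pass up the tree and one pass down; the time scale tau is a free parameter and the function name is hypothetical.

```python
import math
from typing import Dict, List


def local_branching_index(children: Dict[str, List[str]],
                          branch_length: Dict[str, float],
                          root: str, tau: float) -> Dict[str, float]:
    """LBI of every node: surrounding tree length, weighted by exp(-distance / tau).

    children maps an internal node to its child nodes; branch_length maps every
    non-root node to the length of the branch leading to it from its parent.
    """
    def branch_integral(length: float) -> float:
        # Integral of exp(-x / tau) for x from 0 to length (one branch, measured from its near end).
        return tau * (1.0 - math.exp(-length / tau))

    up: Dict[str, float] = {}    # a node's subtree plus its parent branch, as seen from the parent
    down: Dict[str, float] = {}  # everything outside a node's subtree, as seen from the node

    def collect_up(node: str) -> None:
        kids = children.get(node, [])
        for kid in kids:
            collect_up(kid)
        if node != root:
            decay = math.exp(-branch_length[node] / tau)
            up[node] = branch_integral(branch_length[node]) + decay * sum(up[k] for k in kids)

    def push_down(node: str) -> None:
        kids = children.get(node, [])
        for kid in kids:
            siblings = sum(up[k] for k in kids if k != kid)
            decay = math.exp(-branch_length[kid] / tau)
            down[kid] = branch_integral(branch_length[kid]) + decay * (down[node] + siblings)
            push_down(kid)

    down[root] = 0.0
    collect_up(root)
    push_down(root)
    return {node: down[node] + sum(up[k] for k in children.get(node, [])) for node in down}
```

Nodes in densely branching parts of the tree get high values. The recursion is fine for toy trees; a large phylogeny would need an iterative traversal.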
4.
Elife ; 9, 2020 09 02.
Article in English | MEDLINE | ID: mdl-32876050

ABSTRACT

Seasonal influenza virus A/H3N2 is a major cause of death globally. Vaccination remains the most effective preventive measure. Rapid mutation of hemagglutinin allows viruses to escape adaptive immunity. This antigenic drift necessitates regular vaccine updates. Effective vaccine strains need to represent H3N2 populations circulating one year after strain selection. Experts select strains based on experimental measurements of antigenic drift and on model predictions from hemagglutinin sequences. We developed a novel influenza forecasting framework that integrates phenotypic measures of antigenic drift and functional constraint with previously published sequence-only fitness estimates. Forecasts informed by phenotypic measures of antigenic drift consistently outperformed previous sequence-only estimates, while sequence-only estimates of functional constraint surpassed more comprehensive experimentally informed estimates. Importantly, the best models integrated estimates of both functional constraint and either antigenic drift phenotypes or recent population growth.


Vaccination is the best protection against seasonal flu. It teaches the immune system what the flu virus looks like, preparing it to fight off an infection. But the flu virus changes its molecular appearance every year, escaping the immune defences learnt the year before. So, every year, the vaccine needs updating. Since it takes almost a year to design and make a new flu vaccine, researchers need to be able to predict what flu viruses will look like in the future. Currently, this prediction relies on experiments that assess the molecular appearance of flu viruses, a complex and slow approach.

One alternative is to examine the virus's genetic code. Mathematical models try to predict which genetic changes might alter the appearance of a flu virus, saving the cost of performing specialised experiments. Recent research has shown that these models can make good predictions, but including experimental measures of the virus's appearance could improve them even further. This could help the model work out which genetic changes are likely to be beneficial to the virus, and which are not.

To find out whether experimental data improves model predictions, Huddleston et al. designed a new forecasting tool which used 25 years of historical data from past flu seasons. Each forecast predicted what the virus population might look like the next year using the previous year's genetic code, experimental data, or both. Huddleston et al. then compared the predictions with the historical data to find the most useful data types. This showed that the best predictions combined changes in the virus's genetic code with experimental measures of its appearance.

This new forecasting tool is open source, allowing teams across the world to start using it to improve their predictions straight away. Seasonal flu infects between 5 and 15% of the world's population every year, causing between a quarter of a million and half a million deaths. Better predictions could lead to better flu vaccines and fewer illnesses and deaths.


Subjects
Genotype; Influenza A Virus, H3N2 Subtype/genetics; Influenza, Human/virology; Phenotype; Forecasting; Humans; Seasons
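Sequence-based fitness forecasts of this kind typically project each strain's frequency forward with exponential growth, x_i(t + Δt) ∝ x_i(t) exp(f_i), where the fitness f_i is a weighted sum of per-strain predictors. The sketch below assumes exactly that form; the predictor values, coefficients, and function name are hypothetical, and the fitted models in the paper may differ in detail (for example in how Δt enters).

```python
import math
from typing import List, Sequence


def project_frequencies(freqs: Sequence[float],
                        predictors: Sequence[Sequence[float]],
                        coefficients: Sequence[float]) -> List[float]:
    """Project strain frequencies forward: x_i * exp(f_i), renormalized to sum to 1.

    predictors[i] holds strain i's predictor values (hypothetical examples: an
    antigenic-drift score, a functional-constraint score); f_i is their weighted sum.
    """
    fitness = [sum(b * p for b, p in zip(coefficients, row)) for row in predictors]
    weights = [x * math.exp(f) for x, f in zip(freqs, fitness)]
    total = sum(weights)
    return [w / total for w in weights]
```

In a framework like the one described, the coefficients would be fitted so that projected populations match those observed one year later; that fitting step is not shown.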
5.
Science ; 369(6502): 440-445, 2020 07 24.
Article in English | MEDLINE | ID: mdl-32703877

ABSTRACT

The rational design of enzymes is an important goal for both fundamental and practical reasons. Here, we describe a process to learn the constraints for specifying proteins purely from evolutionary sequence data, design and build libraries of synthetic genes, and test them for activity in vivo using a quantitative complementation assay. For chorismate mutase, a key enzyme in the biosynthesis of aromatic amino acids, we demonstrate the design of natural-like catalytic function with substantial sequence diversity. Further optimization focuses the generative model toward function in a specific genomic context. The data show that sequence-based statistical models suffice to specify proteins and provide access to an enormous space of functional sequences. This result provides a foundation for a general process for evolution-based design of artificial proteins.


Subjects
Chorismate Mutase; Evolution, Molecular; Models, Genetic; Models, Statistical; Amino Acid Sequence; Chorismate Mutase/chemistry; Chorismate Mutase/genetics; Escherichia coli Proteins/chemistry; Escherichia coli Proteins/genetics
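The abstract does not detail how new sequences are drawn from the learned model. One standard way to sample from a Potts model with fields h_i(a) and couplings J_ij(a, b) is single-site Metropolis Monte Carlo, sketched below; the parameter layout, function name, and step counts are hypothetical, and the design pipeline in the paper may sample differently.

```python
import math
import random
from typing import Dict, List, Tuple

Couplings = Dict[Tuple[int, int], List[List[float]]]  # J[(i, j)][a][b] for sites i < j


def metropolis_sample(h: List[List[float]], J: Couplings, n_steps: int,
                      seed: int = 0) -> List[int]:
    """Single-site Metropolis sampling from P(s) proportional to exp(-E(s)),
    with E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j)."""
    rng = random.Random(seed)
    n_sites, q = len(h), len(h[0])
    seq = [rng.randrange(q) for _ in range(n_sites)]

    def local_field(i: int, a: int) -> float:
        # All terms of -E(s) that involve site i when it carries state a.
        total = h[i][a]
        for (k, m), block in J.items():
            if k == i:
                total += block[a][seq[m]]
            elif m == i:
                total += block[seq[k]][a]
        return total

    for _ in range(n_steps):
        i = rng.randrange(n_sites)
        proposal = rng.randrange(q)  # symmetric proposal: uniform over all q states
        delta = local_field(i, proposal) - local_field(i, seq[i])  # equals E_old - E_new
        if delta >= 0 or rng.random() < math.exp(delta):
            seq[i] = proposal
    return seq
```

For real protein families q is 21 (20 amino acids plus a gap symbol) and many sweeps separate retained samples; burn-in and thinning are omitted here.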
6.
Mol Biol Evol ; 35(4): 1018-1027, 2018 04 01.
Article in English | MEDLINE | ID: mdl-29351669

ABSTRACT

Global coevolutionary models of homologous protein families, as constructed by direct coupling analysis (DCA), have recently gained popularity, in particular due to their capacity to accurately predict residue-residue contacts from sequence information alone, and thereby to facilitate tertiary and quaternary protein structure prediction. More recently, they have also been used to predict fitness effects of amino-acid substitutions in proteins, and to predict evolutionarily conserved protein-protein interactions. These models are based on two currently unjustified hypotheses: 1) correlations in the amino-acid usage of different positions result collectively from networks of direct couplings; and 2) pairwise couplings are sufficient to capture the amino-acid variability. Here, we propose a highly precise inference scheme based on Boltzmann-machine learning, which allows us to systematically address these hypotheses. We show how correlations are built up in a highly collective way by a large number of coupling paths, which are based on the proteins' three-dimensional structure. We further find that pairwise coevolutionary models capture the collective residue variability across homologous proteins even for quantities which are not imposed by the inference procedure, like three-residue correlations, the clustered structure of protein families in sequence space, or the sequence distances between homologs. These findings strongly suggest that pairwise coevolutionary models are actually sufficient to accurately capture the residue variability in homologous protein families.


Subjects
Biological Coevolution; Models, Genetic; Proteins/genetics; Multigene Family; Sequence Homology, Amino Acid
7.
Biol Aujourdhui ; 211(3): 239-244, 2017.
Article in French | MEDLINE | ID: mdl-29412135

ABSTRACT

Thanks to next-generation sequencing, the number of sequenced genomes is growing rapidly, providing, in particular, ample examples of sequence variability between homologous proteins. This article discusses data-driven probabilistic sequence models, which can extract a wide range of information from sequence data alone, including (i) structural features such as residue-residue contacts formed in the folded protein, (ii) protein-protein interaction interfaces, and (iii) phenotypic effects of amino-acid substitutions in proteins.


Subjects
Computational Biology/methods; Genetic Variation; Proteins/chemistry; Proteins/physiology; Sequence Analysis, Protein/methods; Sequence Homology, Amino Acid; Animals; Binding Sites/genetics; Evolution, Molecular; Humans; Multigene Family; Protein Binding/genetics; Protein Interaction Domains and Motifs/genetics; Structure-Activity Relationship
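Contact prediction from such models commonly ranks residue pairs by coupling strength. The sketch below shows one widespread recipe, the Frobenius norm of each coupling block followed by the average product correction (APC). The article does not specify this scoring; the sketch assumes the couplings are already gauge-fixed (gauge fixing is not shown) and that at least one coupling is nonzero. The dictionary layout and function name are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def contact_scores(J: Dict[Pair, List[List[float]]], length: int) -> Dict[Pair, float]:
    """APC-corrected coupling strength for every residue pair i < j.

    F_ij is the Frobenius norm of the q x q block J_ij (pairs absent from J count as 0).
    The APC subtracts F_i * F_j / F_all, where F_i is the mean of row i and F_all the
    mean over all pairs, to damp background signal shared by entire rows.
    """
    F = [[0.0] * length for _ in range(length)]
    for (i, j), block in J.items():
        F[i][j] = F[j][i] = math.sqrt(sum(x * x for row in block for x in row))
    row_mean = [sum(F[i][k] for k in range(length) if k != i) / (length - 1)
                for i in range(length)]
    overall_mean = sum(row_mean) / length
    return {(i, j): F[i][j] - row_mean[i] * row_mean[j] / overall_mean
            for i in range(length) for j in range(i + 1, length)}
```

The highest-scoring pairs are the predicted contacts; in practice only pairs well separated along the sequence are usually considered.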
8.
Sci Rep ; 6: 37812, 2016 11 25.
Article in English | MEDLINE | ID: mdl-27886273

ABSTRACT

The inverse Ising problem and its generalizations to Potts and continuous spin models have recently attracted much attention thanks to their successful applications in the statistical modeling of biological data. In the standard setting, the parameters of an Ising model (couplings and fields) are inferred using a sample of equilibrium configurations drawn from the Boltzmann distribution. However, in the context of biological applications, quantitative information for a limited number of microscopic spin configurations has recently become available. In this paper, we extend the usual setting of the inverse Ising model by developing an integrative approach that combines the equilibrium sample with (possibly noisy) measurements of the energy performed for a number of arbitrary configurations. Using simulated data, we show that our integrative approach outperforms standard inference based only on the equilibrium sample or only on the energy measurements, and that it can also correct errors in noisy energy measurements. As a biological proof-of-concept application, we show that mutational fitness landscapes in proteins can be better described when combining evolutionary sequence data with complementary structural information about mutant sequences.


Subjects
Models, Theoretical; Algorithms; Computer Simulation
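The integrative idea can be written as a single objective: the mean log-likelihood of the equilibrium sample minus a penalty on mismatches between measured and model energies. The sketch below assumes a squared-error penalty with weight lam, ±1 spins, and exact enumeration over all 2^n configurations (so it only suits a handful of spins); these choices and all names are hypothetical rather than the paper's exact formulation.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Config = Sequence[int]  # spins are -1 or +1


def energy(s: Config, h: List[float], J: Matrix) -> float:
    """E(s) = -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j (only i < j entries of J are used)."""
    n = len(s)
    pair_terms = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return -sum(h[i] * s[i] for i in range(n)) - pair_terms


def model_moments(h: List[float], J: Matrix) -> Tuple[List[float], Matrix]:
    """Exact <s_i> and <s_i s_j> under P(s) proportional to exp(-E(s)), enumerating all states."""
    n = len(h)
    states = list(itertools.product((-1, 1), repeat=n))
    weights = [math.exp(-energy(s, h, J)) for s in states]
    z = sum(weights)
    m = [sum(w * s[i] for w, s in zip(weights, states)) / z for i in range(n)]
    c = [[sum(w * s[i] * s[j] for w, s in zip(weights, states)) / z for j in range(n)]
         for i in range(n)]
    return m, c


def integrative_fit(samples: Sequence[Config], measured: Sequence[Tuple[Config, float]],
                    lam: float, lr: float = 0.05,
                    n_iter: int = 3000) -> Tuple[List[float], Matrix]:
    """Maximize mean log-likelihood(samples) - (lam / 2) * sum_k (E_k - E(config_k))**2."""
    n, size = len(samples[0]), len(samples)
    f1 = [sum(s[i] for s in samples) / size for i in range(n)]
    f2 = [[sum(s[i] * s[j] for s in samples) / size for j in range(n)] for i in range(n)]
    h = [0.0] * n
    J = [[0.0] * n for _ in range(n)]
    for _ in range(n_iter):
        p1, p2 = model_moments(h, J)
        # Likelihood part of the gradient: data moments minus model moments.
        gh = [f1[i] - p1[i] for i in range(n)]
        gJ = [[f2[i][j] - p2[i][j] for j in range(n)] for i in range(n)]
        # Penalty part: dE/dh_i = -s_i and dE/dJ_ij = -s_i s_j at each measured configuration.
        for config, e_measured in measured:
            residual = e_measured - energy(config, h, J)
            for i in range(n):
                gh[i] -= lam * residual * config[i]
                for j in range(i + 1, n):
                    gJ[i][j] -= lam * residual * config[i] * config[j]
        for i in range(n):
            h[i] += lr * gh[i]
            for j in range(i + 1, n):
                J[i][j] += lr * gJ[i][j]
    return h, J
```

Setting lam = 0 recovers plain likelihood inference from the sample; increasing lam pulls the energies of the measured configurations toward their measured values.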