Results 1 - 17 of 17
1.
Nat Commun ; 15(1): 597, 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-38238291

ABSTRACT

The revolution brought about by AlphaFold2 opens promising perspectives to unravel the complexity of protein-protein interaction networks. The analysis of interaction networks obtained from proteomics experiments does not systematically provide the delimitations of the interaction regions. This is of particular concern in the case of interactions mediated by intrinsically disordered regions, in which the interaction site is generally small. Using a dataset of protein-peptide complexes involving intrinsically disordered regions that are non-redundant with the structures used in AlphaFold2 training, we show that when using the full sequences of the proteins, AlphaFold2-Multimer only achieves 40% success rate in identifying the correct site and structure of the interface. By delineating the interaction region into fragments of decreasing size and combining different strategies for integrating evolutionary information, we manage to raise this success rate up to 90%. We obtain similar success rates using a much larger dataset of protein complexes taken from the ELM database. Beyond the correct identification of the interaction site, our study also explores specificity issues. We show the advantages and limitations of using the AlphaFold2 confidence score to discriminate between alternative binding partners, a task that can be particularly challenging in the case of small interaction motifs.


Subject(s)
Intrinsically Disordered Proteins , Proteins , Proteins/metabolism , Protein Interaction Maps , Biological Evolution , Intrinsically Disordered Proteins/metabolism , Protein Binding
2.
J Struct Biol ; 215(3): 107997, 2023 09.
Article in English | MEDLINE | ID: mdl-37453591

ABSTRACT

Alternative splicing of repeats in proteins provides a mechanism for rewiring and fine-tuning protein interaction networks. In this work, we developed a robust and versatile method, ASPRING, to identify alternatively spliced protein repeats from gene annotations. ASPRING leverages evolutionarily meaningful alternative splicing-aware hierarchical graphs to provide maps between protein repeat sequences and 3D structures. We re-think the definition of repeats by explicitly accounting for transcript diversity across several genes/species. Using a stringent sequence-based similarity criterion, we detected over 5,000 evolutionarily conserved repeats by screening virtually all human protein-coding genes and their orthologs across a dozen species. Through a joint analysis of their sequences and structures, we extracted specificity-determining sequence signatures and assessed their implication in experimentally resolved and modelled protein interactions. Our findings demonstrate the widespread alternative usage of protein repeats in modulating protein interactions and open avenues for targeting repeat-mediated interactions.


Subject(s)
Alternative Splicing , Proteins , Humans , Alternative Splicing/genetics , Proteins/genetics
3.
Methods Mol Biol ; 2627: 83-100, 2023.
Article in English | MEDLINE | ID: mdl-36959443

ABSTRACT

Homology modeling is the most common technique to build structural models of a target protein based on the structure of proteins with high sequence identity and available high-resolution structures. This technique is based on the idea that protein structure shows fewer changes than sequence through evolution. While in this scenario single mutations would minimally perturb the structure, experimental evidence shows otherwise: proteins with high conformational diversity impose a limit on the paradigm of comparative modeling, as the same protein sequence can adopt dissimilar three-dimensional structures. These cases present challenges for modeling; at first glance, they may seem to be easy cases, but they have a complexity that is not evident at the sequence level. In this chapter, we address the following questions: Why should we care about conformational diversity? How can conformational diversity be considered in a practical way when doing template-based modeling?


Subject(s)
Molecular Dynamics Simulation , Proteins , Proteins/genetics , Proteins/chemistry , Amino Acid Sequence , Structural Homology, Protein , Protein Conformation
4.
Bioinformatics ; 38(10): 2742-2748, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561203

ABSTRACT

MOTIVATION: After the outstanding breakthrough of AlphaFold in predicting protein 3D models, new questions appeared and remain unanswered. The ensemble nature of proteins, for example, challenges the structural prediction methods because the models should represent a set of conformers instead of single structures. The evolutionary and structural features captured by effective deep learning techniques may unveil the information to generate several diverse conformations from a single sequence. Here, we address the performance of AlphaFold2 predictions obtained through ColabFold under this ensemble paradigm. RESULTS: Using a curated collection of apo-holo pairs of conformers, we found that AlphaFold2 predicts the holo form of a protein in ∼70% of the cases, being unable to reproduce the observed conformational diversity with the same error for both conformers. More importantly, we found that AlphaFold2's performance worsens with the increasing conformational diversity of the studied protein. This impairment is related to the heterogeneity in the degree of conformational diversity found between different members of the homologous family of the protein under study. Finally, we found that main-chain flexibility associated with apo-holo pairs of conformers negatively correlates with the predicted local model quality score plDDT, indicating that plDDT values in a single 3D model could be used to infer local conformational changes linked to ligand binding transitions. AVAILABILITY AND IMPLEMENTATION: Data and code used in this manuscript are publicly available at https://gitlab.com/sbgunq/publications/af2confdiv-oct2021. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Proteins , Protein Binding , Protein Conformation , Proteins/chemistry
5.
Bioinformatics ; 38(9): 2615-2616, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35188186

ABSTRACT

SUMMARY: ASES is a versatile tool for assessing the impact of alternative splicing (AS), initiation and termination of transcription on protein diversity in evolution. It identifies exon and transcript orthogroups from a set of input genes/species for comparative transcriptomics analyses. It computes an evolutionary splicing graph, where the nodes are exon orthogroups, allowing for a direct evaluation of AS conservation. It also reconstructs a transcripts' phylogenetic forest to date the appearance of specific transcripts and explore the events that have shaped them. ASES web server features a highly interactive interface enabling the synchronous selection of events, exons or transcripts in the different outputs, and the visualization and retrieval of the corresponding amino acid sequences, for subsequent 3D structure prediction. AVAILABILITY AND IMPLEMENTATION: http://www.lcqb.upmc.fr/Ases. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Alternative Splicing , Proteins , Phylogeny , Exons , Proteins/chemistry , RNA Splicing
6.
Genome Res ; 31(8): 1462-1473, 2021 08.
Article in English | MEDLINE | ID: mdl-34266979

ABSTRACT

Understanding how protein function has evolved and diversified is of great importance for human genetics and medicine. Here, we tackle the problem of describing the whole transcript variability observed in several species by generalizing the definition of splicing graph. We provide a practical solution to construct parsimonious evolutionary splicing graphs where each node is a minimal transcript building block defined across species. We show a clear link between the functional relevance, tissue regulation, and conservation of alternative transcripts on a set of 50 genes. By scaling up to the whole human protein-coding genome, we identify a few thousand genes where alternative splicing modulates the number and composition of pseudorepeats. We have implemented our approach in ThorAxe, an efficient, versatile, robust, and freely available computational tool.


Subject(s)
Alternative Splicing , RNA Splicing , Genome, Human , Humans
7.
Virus Evol ; 6(1): veaa006, 2020 Jan.
Article in English | MEDLINE | ID: mdl-32158552

ABSTRACT

The study of mutational landscapes of viral proteins is fundamental for the understanding of the mechanisms of cross-resistance to drugs and the design of effective therapeutic strategies based on several drugs. Antiviral therapy with nucleos(t)ide analogues targeting the hepatitis B virus (HBV) polymerase protein (Pol) can inhibit disease progression by suppressing HBV replication, which makes HBV an important case study. In HBV, treatment may fail due to the emergence of drug-resistant mutants. Primary and compensatory mutations have been associated with lamivudine resistance, whereas more complex mutational patterns are responsible for resistance to other HBV antiviral drugs. So far, all known drug-resistance mutations are located in one of the four Pol domains, called reverse transcriptase. We demonstrate that sequence covariation identifies drug-resistance mutations in viral sequences. A new algorithmic strategy, BIS2TreeAnalyzer, is designed to apply the coevolution analysis method BIS2, successfully used in the past on small sets of conserved sequences, to large sets of evolutionarily related sequences. When applied to HBV, BIS2TreeAnalyzer highlights diversified viral solutions by discovering thirty-seven positions coevolving with residues known to be associated with drug resistance and located on the four Pol domains. These results suggest a sequential mechanism of emergence for some mutational patterns. They reveal complex combinations of positions involved in HBV drug resistance and contribute new information to the landscape of HBV evolutionary solutions. The computational approach is general and can be applied to other viral sequences when compensatory mutations are presumed.

8.
J Mol Biol ; 432(7): 2121-2140, 2020 03 27.
Article in English | MEDLINE | ID: mdl-32067951

ABSTRACT

Alternative splicing and alternative initiation/termination transcription sites have the potential to greatly expand the proteome in eukaryotes by producing several transcript isoforms from the same gene. Although these mechanisms are well described at the genomic level, little is known about their contribution to protein evolution and their impact at the protein structure level. Here, we address both issues by reconstructing the evolutionary history of transcripts and by modeling the tertiary structures of the corresponding protein isoforms. We reconstruct phylogenetic forests relating 60 protein-coding transcripts from the c-Jun N-terminal kinase (JNK) family observed in seven species. We identify two alternative splicing events of ancient origin and show that they induce subtle changes in the protein's structural dynamics. We highlight a previously uncharacterized transcript whose predicted structure seems stable in solution. We further demonstrate that orphan transcripts, for which no phylogeny could be reconstructed, display peculiar sequence and structural properties. Our approach is implemented in PhyloSofS (Phylogenies of Splicing Isoforms Structures), a fully automated computational tool freely available at https://github.com/PhyloSofS-Team/PhyloSofS.


Subject(s)
Computational Biology/methods , Evolution, Molecular , MAP Kinase Kinase 4/genetics , MAP Kinase Kinase 4/metabolism , Protein Conformation , Proteome/analysis , Transcriptome , Alternative Splicing , Animals , Humans , MAP Kinase Kinase 4/chemistry , MAP Kinase Kinase 4/classification , Phylogeny , Protein Isoforms , Transcription, Genetic
9.
Hum Mutat ; 40(4): 413-425, 2019 04.
Article in English | MEDLINE | ID: mdl-30629309

ABSTRACT

Malignant tumors originate from somatic mutations and other genomic and epigenomic alterations, which lead to loss of control of the cellular circuitry. These alterations present patterns of co-occurrence and mutual exclusivity that can influence prognosis and modify response to drugs, highlighting the need for multitargeted therapies. Studies in this area have generally focused on particular malignancies and considered whole genes instead of specific mutations, ignoring the fact that different alterations in the same gene can have widely different effects. Here, we present a comprehensive analysis of co-dependencies of individual somatic mutations in the whole spectrum of human tumors. Combining multitesting with conditional and expected mutational probabilities, we have discovered rules governing the codependencies of driver and nondriver mutations. We also uncovered pairs and networks of comutations and exclusions, some of them restricted to certain cancer types and others widespread. These pairs and networks are not only of basic but also of clinical interest, and can be of help in the selection of multitargeted antitumor therapies. In this respect, recurrent driver comutations suggest combinations of drugs that might be effective in the clinical setting, while recurrent exclusions indicate combinations unlikely to be useful.


Subject(s)
Biomarkers, Tumor , Computational Biology , Neoplasms/etiology , Neoplasms/therapy , Chromosome Mapping , Computational Biology/methods , Disease Susceptibility , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Molecular Targeted Therapy , Mutation , Quantitative Trait Loci
10.
Brief Bioinform ; 20(1): 356-359, 2019 01 18.
Article in English | MEDLINE | ID: mdl-28981583

ABSTRACT

Major scientific challenges that are beyond the capability of individuals need to be addressed by multi-disciplinary and multi-institutional consortia. Examples of these endeavours include the Human Genome Project, and more recently, the Structural Genomics (SG) initiative. The SG initiative pursues the expansion of structural coverage to include at least one structural representative for each protein family to derive the remaining structures using homology modelling. However, biological function is inherently connected with protein dynamics that can be studied by knowing different structures of the same protein. This ensemble of structures provides snapshots of protein conformational diversity under native conditions. Thus, sequence redundancy in the Protein Data Bank (PDB) (i.e. crystallization of the same protein under different conditions) is an essential input contributing to experimentally based studies of protein dynamics and providing insights into protein function. In this work, we show that sequence redundancy, a key concept for exploring protein dynamics, is highly biased and fundamentally incomplete in the PDB. Additionally, our results show that dynamical behaviour of proteins cannot be inferred using homologous proteins. Minor to moderate changes in sequence can produce great differences in dynamical behaviour. Nonetheless, the structural and dynamical incompleteness of the PDB are apparently unrelated concepts in SG. While the first could be reversed by promoting the extension of the structural coverage, we would like to emphasize that further focused efforts will be needed to amend the incompleteness of the PDB in terms of dynamical information content, essential to fully understand protein function.


Subject(s)
Databases, Protein/statistics & numerical data , Computational Biology/methods , Computational Biology/statistics & numerical data , Crystallography, X-Ray , Genomics/statistics & numerical data , Humans , Molecular Dynamics Simulation , Protein Conformation , Proteins/chemistry , Proteins/genetics , Proteomics/statistics & numerical data , Sequence Homology, Amino Acid , Structural Homology, Protein
11.
Methods Mol Biol ; 1851: 353-365, 2019.
Article in English | MEDLINE | ID: mdl-30298408

ABSTRACT

The native state of proteins is composed of conformers in dynamical equilibrium. In this chapter, different issues related to conformational diversity are explored using a curated and experimentally based database called CoDNaS (Conformational Diversity in the Native State). This database is a collection of redundant structures for the same sequence. CoDNaS estimates the degree of conformational diversity using different global and local structural similarity measures. It allows the user to explore how structural differences among conformers change as a function of several structural features providing further biological information. This chapter explores the measurement of conformational diversity and its relationship with sequence divergence. Also, it discusses how proteins with high conformational diversity could affect homology modeling techniques.


Subject(s)
Proteins/chemistry , Databases, Protein , Evolution, Molecular , Molecular Dynamics Simulation , Protein Conformation
12.
Mol Phylogenet Evol ; 127: 859-866, 2018 10.
Article in English | MEDLINE | ID: mdl-29953938

ABSTRACT

The analysis of evolutionary information in a protein family, such as conservation and covariation, is often linked to its structural information. Multiple sequence alignments of distant homologous sequences are used to measure evolutionary variables. Although high structural differences between proteins can be expected in such divergent alignments, most works linking evolutionary and structural information use a single structure ignoring the structural variability within protein families. The goal of this work is to elucidate the relevance of structural divergence when sequence-based measures are integrated with structural information. We found that inter-residue contacts and solvent accessibility undergo large variations in protein families. Our results show that high covariation scores tend to reveal residue contacts that are conserved in the family, instead of protein or conformer specific contacts. We also found that residue accessible surface area shows a high variability between structures of the same family. As a consequence, the mean relative solvent accessibility of multiple structures correlates better with the conservation pattern than the relative solvent accessibility of a single structure. We conclude that the use of comprehensive structural information allows a more accurate interpretation of the information computed from sequence alignments. Therefore, considering structural divergence would lead to a better understanding of protein function, dynamics, and evolution.


Subject(s)
Evolution, Molecular , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , Amino Acids/genetics , Area Under Curve , Conserved Sequence/genetics , Phylogeny , Protein Domains , Protein Kinases/chemistry , Sequence Alignment , Solvents , Statistics, Nonparametric
13.
Protein Sci ; 26(12): 2438-2444, 2017 Dec.
Article in English | MEDLINE | ID: mdl-28980349

ABSTRACT

Protein-protein interactions are essential to all aspects of life. Specific interactions result from evolutionary pressure at the interacting interfaces of partner proteins. However, evolutionary pressure is not homogeneous within the interface: for instance, each residue does not contribute equally to the binding energy of the complex. To understand functional differences between residues within the interface, we analyzed their properties in the core and rim regions. Here, we characterized protein interfaces with two evolutionary measures, conservation and coevolution, using a comprehensive dataset of 896 protein complexes. These scores can detect different selection pressures at a given position in a multiple sequence alignment. We also analyzed how the number of interactions in which a residue is involved influences those evolutionary signals. We found that the coevolutionary signal is higher in the interface core than in the interface rim region. Additionally, the difference in coevolution between core and rim regions is comparable to the known difference in conservation between those regions. Considering proteins with multiple interactions, we found that conservation and coevolution increase with the number of different interfaces in which a residue is involved, suggesting that more constraints (i.e., a residue that must satisfy a greater number of interactions) allow fewer sequence changes at those positions, resulting in higher conservation and coevolution values. These findings shed light on the evolution of protein interfaces and provide information useful for identifying protein interfaces and predicting protein-protein interactions.


Subject(s)
Binding Sites , Evolution, Molecular , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Amino Acid Sequence , Computational Biology , Conserved Sequence , Databases, Protein , Models, Molecular , Protein Binding , Sequence Alignment
14.
Protein Sci ; 26(11): 2195-2206, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28815769

ABSTRACT

A key concept in template-based modeling (TBM) is the high correlation between sequence and structural divergence, with the practical consequence that homologous proteins that are similar at the sequence level will also be similar at the structural level. However, conformational diversity of the native state will reduce the correlation between structural and sequence divergence, because structural variation can appear without sequence diversity. In this work, we explore the impact that conformational diversity has on the relationship between structural and sequence divergence. We find that the extent of conformational diversity can be as high as the maximum structural divergence among families. Also, as expected, conformational diversity impairs the well-established correlation between sequence and structural divergence, which is noisier than previously suggested. However, we found that this noise can be resolved using a priori information coming from the structure-function relationship. We show that protein families with low conformational diversity show a well-correlated relationship between sequence and structural divergence, which is severely reduced in proteins with larger conformational diversity. This lack of correlation could impair TBM results in highly dynamical proteins. Finally, we also find that the presence of order/disorder can provide useful prior information for better TBM performance.


Subject(s)
Models, Molecular , Proteins/chemistry , Sequence Homology, Amino Acid , Structural Homology, Protein , Amino Acid Sequence , Databases, Protein , Sequence Alignment
15.
PLoS Comput Biol ; 13(2): e1005398, 2017 02.
Article in English | MEDLINE | ID: mdl-28192432

ABSTRACT

Protein motions are a key feature to understand biological function. Recently, a large-scale analysis of protein conformational diversity showed a positively skewed distribution with a peak at 0.5 Å C-alpha root-mean-square-deviation (RMSD). To understand this distribution in terms of structure-function relationships, we studied a well curated and large dataset of ~5,000 proteins with experimentally determined conformational diversity. We searched for global behaviour patterns studying how structure-based features change among the available conformer population for each protein. This procedure allowed us to describe the RMSD distribution in terms of three main protein classes sharing given properties. The largest of these protein subsets (~60%), which we call "rigid" (average RMSD = 0.83 Å), has no disordered regions, shows low conformational diversity, the largest tunnels and smaller and buried cavities. The two additional subsets contain disordered regions, but with differential sequence composition and behaviour. Partially disordered proteins have on average 67% of their conformers with disordered regions, average RMSD = 1.1 Å, the highest number of hinges and the longest disordered regions. In contrast, malleable proteins have on average only 25% of disordered conformers and average RMSD = 1.3 Å, flexible cavities affected in size by the presence of disordered regions and show the highest diversity of cognate ligands. Proteins in each set are mostly non-homologous to each other and share neither a given fold class nor functional similarity, but do share features derived from their conformer population. These shared features could represent conformational mechanisms related to biological functions.


Subject(s)
Models, Chemical , Models, Statistical , Molecular Dynamics Simulation , Protein Conformation , Proteins/chemistry , Proteins/ultrastructure , Structure-Activity Relationship
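The C-alpha RMSD values quoted in the abstract above can be made concrete with a short sketch. This is not the authors' code: it assumes each conformer is given as a list of (x, y, z) C-alpha coordinates, that all conformers have the same residues in the same order, and that they have already been optimally superposed (a superposition step, for example the Kabsch algorithm, is not shown). The function names `rmsd` and `max_pairwise_rmsd` are hypothetical, and taking the maximum pairwise RMSD as a single summary of a protein's conformational diversity is one possible convention assumed here for illustration.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumption: the two conformers are already superposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("conformers must have the same non-zero number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def max_pairwise_rmsd(conformers):
    """Largest RMSD over all pairs of conformers (needs at least two)."""
    return max(rmsd(a, b) for a, b in combinations(conformers, 2))


# Toy coordinates (illustrative only, not real protein data).
conf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conf_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
conf_c = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
```

With real data, the coordinates would come from experimentally determined structures of the same sequence, and the pairwise values could then be compared against thresholds such as the class averages reported in the abstract.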
16.
Protein Sci ; 25(6): 1138-46, 2016 06.
Article in English | MEDLINE | ID: mdl-27038125

ABSTRACT

Structural differences between conformers sustain protein biological function. Here, using a large dataset of 745 intrinsically disordered proteins, we studied how ordered-disordered transitions modulate structural differences between conformers as derived from crystallographic data. We found that almost 50% of the proteins studied show no transitions and have low conformational diversity, while the rest show transitions and a higher conformational diversity. In this last subset, 60% of the proteins become more ordered after ligand binding, while 40% become more disordered. As protein conformational diversity is inherently connected with protein function, our analysis suggests differences in structure-function relationships related to order-disorder transitions.


Subject(s)
Databases, Protein , Intrinsically Disordered Proteins/chemistry , Intrinsically Disordered Proteins/genetics , Protein Conformation
17.
Curr Opin Struct Biol ; 32: 58-65, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25749052

ABSTRACT

Proteins' native structure is an ensemble of conformers in equilibrium, including all their respective functional states and intermediates. First the induced-fit theory and later the pre-equilibrium theory described how structural changes are required to explain the allosteric and cooperative behaviours in proteins, which are key to protein function. The conformational ensemble concept has become a key tool in explaining an endless list of essential protein properties such as function, enzyme and antibody promiscuity, signal transduction, protein-protein recognition, origin of diseases, origin of new protein functions, evolutionary rate and order-disorder transitions, among others. Conformational diversity is encoded by the amino acid sequence, and such a signature can be detected through evolutionary measures such as evolutionary rate, conservation and coevolution.


Subject(s)
Proteins/chemistry , Amino Acid Sequence , Animals , Evolution, Molecular , Humans , Models, Molecular , Protein Conformation , Proteins/genetics