Results 1 - 17 of 17
1.
Nat Commun ; 14(1): 919, 2023 02 17.
Article in English | MEDLINE | ID: mdl-36808136

ABSTRACT

Cohort-wide sequencing studies have revealed that the largest category of variants is those deemed 'rare', even for the subset located in coding regions (99% of known coding variants are seen in less than 1% of the population). Associative methods give some understanding of how rare genetic variants influence disease and organism-level phenotypes. But here we show that additional discoveries can be made through a knowledge-based approach using protein domains and ontologies (function and phenotype) that considers all coding variants regardless of allele frequency. We describe an ab initio, genetics-first method making molecular knowledge-based interpretations for exome-wide non-synonymous variants for phenotypes at the organism and cellular level. By using this reverse approach, we identify plausible genetic causes for developmental disorders that have eluded other established methods and present molecular hypotheses for the causal genetics of 40 phenotypes generated from a direct-to-consumer genotype cohort. This system offers a chance to extract further discovery from genetic data after standard tools have been applied.


Subject(s)
Exome , Genetic Predisposition to Disease , Humans , Phenotype , Genotype , Gene Frequency
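The 1% figure quoted in the abstract above can be illustrated with a minimal sketch that separates 'rare' from 'common' variants by frequency. The field name `frequency`, the variant identifiers and the toy values are assumptions made for illustration only; this is not the method described in the paper, which considers all coding variants regardless of frequency.

```python
from typing import Dict, List, Tuple

# Hypothetical variant record: an identifier plus the fraction of the
# population in which the variant is seen (assumed field name "frequency").
Variant = Dict[str, object]


def split_by_frequency(
    variants: List[Variant], rare_threshold: float = 0.01
) -> Tuple[List[Variant], List[Variant]]:
    """Partition variants into (rare, common) using a frequency cut-off.

    A variant counts as rare when its frequency is strictly below the
    threshold (1% by default, matching the figure quoted in the abstract).
    """
    rare = [v for v in variants if v["frequency"] < rare_threshold]
    common = [v for v in variants if v["frequency"] >= rare_threshold]
    return rare, common


# Toy data, invented for illustration.
toy_variants = [
    {"id": "var_a", "frequency": 0.0004},
    {"id": "var_b", "frequency": 0.25},
    {"id": "var_c", "frequency": 0.009},
]
rare_variants, common_variants = split_by_frequency(toy_variants)
```

A frequency-blind analysis of the kind the abstract describes would simply skip this partition and pass every variant on to the interpretation step.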
2.
Nucleic Acids Res ; 47(D1): D490-D494, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30445555

ABSTRACT

Here, we present a major update to the SUPERFAMILY database and the webserver. We describe the addition of a new SUPERFAMILY 2.0 profile HMM library containing a total of 27 623 HMMs. The database now includes Superfamily domain annotations for millions of protein sequences taken from the Universal Protein Resource Knowledgebase (UniProtKB) and the National Center for Biotechnology Information (NCBI). This addition constitutes about 51 and 45 million distinct protein sequences obtained from UniProtKB and NCBI respectively. Currently, the database contains annotations for 63 244 and 102 151 complete genomes taken from UniProtKB and NCBI respectively. The current sequence collection and genome update is the biggest so far in the history of SUPERFAMILY updates. In order to deal with the massive wealth of information, here we introduce a new SUPERFAMILY 2.0 webserver (http://supfam.org). Currently, the webserver mainly focuses on the search, retrieval and display of Superfamily annotation for the entire sequence and genome collection in the database.


Subject(s)
Databases, Protein , Protein Domains , Proteome/chemistry , Genome , Internet , Markov Chains , Protein Domains/genetics , Sequence Analysis, Protein
3.
Plant Physiol ; 173(2): 1371-1390, 2017 02.
Article in English | MEDLINE | ID: mdl-27909045

ABSTRACT

Of the three classes of enzymes involved in ubiquitination, ubiquitin-conjugating enzymes (E2) have been often incorrectly considered to play merely an auxiliary role in the process, and few E2 enzymes have been investigated in plants. To reveal the role of E2 in plant innate immunity, we identified and cloned 40 tomato genes encoding ubiquitin E2 proteins. Thioester assays indicated that the majority of the genes encode enzymatically active E2. Phylogenetic analysis classified the 40 tomato E2 enzymes into 13 groups, of which members of group III were found to interact and act specifically with AvrPtoB, a Pseudomonas syringae pv tomato effector that uses its ubiquitin ligase (E3) activity to suppress host immunity. Knocking down the expression of group III E2 genes in Nicotiana benthamiana diminished the AvrPtoB-promoted degradation of the Fen kinase and the AvrPtoB suppression of host immunity-associated programmed cell death. Importantly, silencing group III E2 genes also resulted in reduced pattern-triggered immunity (PTI). By contrast, programmed cell death induced by several effector-triggered immunity elicitors was not affected on group III-silenced plants. Functional characterization suggested redundancy among group III members for their role in the suppression of plant immunity by AvrPtoB and in PTI and identified UBIQUITIN-CONJUGATING11 (UBC11), UBC28, UBC29, UBC39, and UBC40 as playing a more significant role in PTI than other group III members. Our work builds a foundation for the further characterization of E2s in plant immunity and reveals that AvrPtoB has evolved a strategy for suppressing host immunity that is difficult for the plant to thwart.


Subject(s)
Plant Immunity/physiology , Plant Proteins/immunology , Solanum lycopersicum/genetics , Ubiquitin-Conjugating Enzymes/immunology , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Cell Death , Gene Silencing , Genome, Plant , Host-Pathogen Interactions/immunology , Solanum lycopersicum/cytology , Solanum lycopersicum/immunology , Solanum lycopersicum/microbiology , Phylogeny , Plant Proteins/genetics , Plant Proteins/metabolism , Plants, Genetically Modified , Protein Serine-Threonine Kinases/genetics , Protein Serine-Threonine Kinases/metabolism , Pseudomonas syringae/pathogenicity , Nicotiana/genetics , Nicotiana/metabolism , Ubiquitin-Conjugating Enzymes/genetics , Ubiquitin-Conjugating Enzymes/metabolism , Ubiquitination
4.
Mol Cell ; 63(4): 579-592, 2016 08 18.
Article in English | MEDLINE | ID: mdl-27540857

ABSTRACT

Gene fusions are common cancer-causing mutations, but the molecular principles by which fusion protein products affect interaction networks and cause disease are not well understood. Here, we perform an integrative analysis of the structural, interactomic, and regulatory properties of thousands of putative fusion proteins. We demonstrate that genes that form fusions (i.e., parent genes) tend to be highly connected hub genes, whose protein products are enriched in structured and disordered interaction-mediating features. Fusion often results in the loss of these parental features and the depletion of regulatory sites such as post-translational modifications. Fusion products disproportionately connect proteins that did not previously interact in the protein interaction network. In this manner, fusion products can escape cellular regulation and constitutively rewire protein interaction networks. We suggest that the deregulation of central, interaction-prone proteins may represent a widespread mechanism by which fusion proteins alter the topology of cellular signaling pathways and promote cancer.


Subject(s)
Gene Fusion , Neoplasm Proteins/genetics , Neoplasm Proteins/metabolism , Neoplasms/genetics , Neoplasms/metabolism , Protein Interaction Maps , Computational Biology , Databases, Protein , Humans , Protein Interaction Mapping , Protein Processing, Post-Translational , Signal Transduction , Transcription Factors/genetics , Transcription Factors/metabolism , Ubiquitination
5.
Genome Biol Evol ; 8(7): 2118-32, 2016 07 14.
Article in English | MEDLINE | ID: mdl-27358427

ABSTRACT

To progress our understanding of molecular evolution from a collection of well-studied genes toward the level of the cell, we must consider whole systems. Here, we reveal the evolution of an important intracellular signaling system. The calcium-signaling toolkit is made up of different multidomain proteins that have undergone duplication, recombination, sequence divergence, and selection. The picture of evolution, considering the repertoire of proteins in the toolkit of both extant organisms and ancestors, is radically different from that of other systems. In eukaryotes, the repertoire increased in both abundance and diversity at a far greater rate than general genomic expansion. We describe how calcium-based intracellular signaling evolution differs not only in rate but in nature, and how this correlates with the disparity of plants and animals.


Subject(s)
Calcium Signaling/genetics , Calcium-Binding Proteins/genetics , Evolution, Molecular , Animals , Calcium-Binding Proteins/chemistry , Calcium-Binding Proteins/metabolism , Eukaryota/genetics
6.
Protein Sci ; 25(5): 1030-6, 2016 May.
Article in English | MEDLINE | ID: mdl-26941008

ABSTRACT

We have identified that the collagen helix has the potential to be disruptive to analyses of intrinsically disordered proteins. The collagen helix is an extended fibrous structure that is both promiscuous and repetitive. Whilst its sequence is predicted to be disordered, this type of protein structure is not typically considered as intrinsic disorder. Here, we show that collagen-encoding proteins skew the distribution of exon lengths in genes. We find that previous results, demonstrating that exons encoding disordered regions are more likely to be symmetric, are due to the abundance of the collagen helix. Other related results, showing increased levels of alternative splicing in disorder-encoding exons, still hold after considering collagen-containing proteins. Aside from analyses of exons, we find that the set of proteins that contain collagen significantly alters the amino acid composition of regions predicted as disordered. We conclude that research in this area should be conducted in the light of the collagen helix.


Subject(s)
Alternative Splicing , Collagen/chemistry , Collagen/genetics , Exons , Amino Acid Sequence , Genome, Human , Humans , Intrinsically Disordered Proteins/chemistry , Intrinsically Disordered Proteins/genetics , Protein Conformation , Protein Structure, Secondary
7.
Nat Genet ; 48(3): 331-5, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26780608

ABSTRACT

Transdifferentiation, the process of converting from one cell type to another without going through a pluripotent state, has great promise for regenerative medicine. The identification of key transcription factors for reprogramming is currently limited by the cost of exhaustive experimental testing of plausible sets of factors, an approach that is inefficient and unscalable. Here we present a predictive system (Mogrify) that combines gene expression data with regulatory network information to predict the reprogramming factors necessary to induce cell conversion. We have applied Mogrify to 173 human cell types and 134 tissues, defining an atlas of cellular reprogramming. Mogrify correctly predicts the transcription factors used in known transdifferentiations. Furthermore, we validated two new transdifferentiations predicted by Mogrify. We provide a practical and efficient mechanism for systematically implementing novel cell conversions, facilitating the generalization of reprogramming of human cells. Predictions are made available to help rapidly further the field of cell conversion.


Subject(s)
Cell Differentiation/genetics , Cell Transdifferentiation/genetics , Cellular Reprogramming/genetics , Gene Regulatory Networks , Fibroblasts , Humans , Induced Pluripotent Stem Cells , Regenerative Medicine , Transcription Factors/biosynthesis , Transcription Factors/genetics
8.
Proc Natl Acad Sci U S A ; 112(38): 11893-8, 2015 Sep 22.
Article in English | MEDLINE | ID: mdl-26324906

ABSTRACT

The most diverse marine ecosystems, coral reefs, depend upon a functional symbiosis between a cnidarian animal host (the coral) and intracellular photosynthetic dinoflagellate algae. The molecular and cellular mechanisms underlying this endosymbiosis are not well understood, in part because of the difficulties of experimental work with corals. The small sea anemone Aiptasia provides a tractable laboratory model for investigating these mechanisms. Here we report on the assembly and analysis of the Aiptasia genome, which will provide a foundation for future studies and has revealed several features that may be key to understanding the evolution and function of the endosymbiosis. These features include genomic rearrangements and taxonomically restricted genes that may be functionally related to the symbiosis, aspects of host dependence on alga-derived nutrients, a novel and expanded cnidarian-specific family of putative pattern-recognition receptors that might be involved in the animal-algal interactions, and extensive lineage-specific horizontal gene transfer. Extensive integration of genes of prokaryotic origin, including genes for antimicrobial peptides, presumably reflects an intimate association of the animal-algal pair also with its prokaryotic microbiome.


Subject(s)
Anthozoa/physiology , Genome/genetics , Sea Anemones/genetics , Symbiosis/genetics , Animals , Chromosomes/genetics , Evolution, Molecular , Gene Expression Profiling , Gene Transfer, Horizontal/genetics , Genome Size , Microbial Interactions/genetics , Models, Biological , Molecular Sequence Annotation , Phylogeny , Repetitive Sequences, Nucleic Acid/genetics , Synteny/genetics
9.
Nucleic Acids Res ; 43(10): 4814-22, 2015 May 26.
Article in English | MEDLINE | ID: mdl-25934802

ABSTRACT

We have discovered that positions of splice junctions in genes are constrained by the tolerance for disorder-promoting amino acids in the translated protein region. It is known that efficient splicing requires nucleotide bias at the splice junction; the preferred usage produces a distribution of amino acids that is disorder-promoting. We observe that efficiency of splicing, as seen in the amino-acid distribution, is not compromised to accommodate globular structure. Thus we infer that it is the positions of splice junctions in the gene that must be under constraint by the local protein environment. Examining exonic splicing enhancers found near the splice junction in the gene reveals that these (short DNA motifs) are more prevalent in exons that encode disordered protein regions than exons encoding structured regions. Thus we also conclude that local protein features constrain efficient splicing more in structure than in disorder.


Subject(s)
Intrinsically Disordered Proteins/genetics , RNA Splice Sites , Amino Acids/analysis , Animals , Eukaryota/genetics , Exons , Nucleotide Motifs , Nucleotides/analysis
10.
Biochimie ; 119: 269-77, 2015 Dec.
Article in English | MEDLINE | ID: mdl-25980317

ABSTRACT

To help evaluate how protein function impacts on genome evolution, we introduce a new concept of 'architecture plasticity potential' - the capacity to form distinct domain architectures - either for an individual domain or, more generally, for a set of domains grouped by shared function. We devise a scoring metric to measure the plasticity potential for these domain sets, and evaluate how function has changed over time for different species. Applying this metric to a phylogenetic tree of eukaryotic genomes, we find that the involvement of each function is not random but highly selective. For certain lineages there is strong bias for evolution to involve domains related to certain functions. In general, eukaryotic genomes, particularly those of animals, expand complex functional activities such as signalling and regulation, but at the cost of reducing metabolic processes. We also observe differential evolution of transcriptional regulation and a unique evolutionary role of channel regulators; crucially this is only observable in terms of the architecture plasticity potential. Our findings provide a new layer of information to understand the significance of function in eukaryotic genome evolution. A web search tool, available at http://supfam.org/Pevo, offers a wide spectrum of options for exploring functional importance in eukaryotic genome evolution.


Subject(s)
Eukaryota/genetics , Evolution, Molecular , Genome , Genomics/methods , Models, Genetic , Proteome/chemistry , Animals , Cell Lineage , Cell Plasticity , Databases, Genetic , Databases, Protein , Eukaryota/cytology , Eukaryota/metabolism , Humans , Internet , Phylogeny , Protein Structure, Tertiary , Proteome/genetics , Proteome/metabolism , Search Engine , Structural Homology, Protein
11.
Nucleic Acids Res ; 43(Database issue): D227-33, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25414345

ABSTRACT

We present updates to the SUPERFAMILY 1.75 (http://supfam.org) online resource and protein sequence collection. The hidden Markov model library that provides sequence homology to SCOP structural domains remains unchanged at version 1.75. In the last 4 years SUPERFAMILY has more than doubled its holding of curated complete proteomes over all cellular life, from 1400 proteomes reported previously in 2010 up to 3258 at present. Outside of the main sequence collection, SUPERFAMILY continues to provide domain annotation for sequences provided by other resources such as: UniProt, Ensembl, PDB, much of JGI Phytozome and selected subcollections of NCBI RefSeq. Despite this growth in data volume, SUPERFAMILY now provides users with an expanded and daily updated phylogenetic tree of life (sTOL). This tree is built with genomic-scale domain annotation data as before, but constantly updated when new species are introduced to the sequence library. Our Gene Ontology and other functional and phenotypic annotations previously reported have stood up to critical assessment by the function prediction community. We have now introduced these data in an integrated manner online at the level of an individual sequence, and--in the case of whole genomes--with enrichment analysis against a taxonomically defined background.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Gene Ontology , Molecular Sequence Annotation , Phylogeny , Proteins/classification , Proteins/genetics , Proteome/chemistry , Sequence Analysis, Protein
12.
Nucleic Acids Res ; 43(Database issue): D382-6, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25348407

ABSTRACT

Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have improved and extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a resource new to Genome3D. We have analysed and updated Genome3D's SCOP/CATH mapping. Finally, we have improved the superposition tools, which now give users a more powerful interface for investigating similarities and differences between structural models.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Protein Structure, Tertiary , Algorithms , Genomics , Internet , Models, Molecular , Protein Structure, Tertiary/genetics , Sequence Analysis, Protein
13.
Environ Microbiol ; 17(1): 4-9, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25339269

ABSTRACT

We present the Proteome Quality Index (PQI; http://pqi-list.org), a much-needed resource for users of bacterial and eukaryotic proteomes. Completely sequenced genomes for which there is an available set of protein sequences (the proteome) are given a one- to five-star rating supported by 11 different metrics of quality. The database indexes over 3000 proteomes at the time of writing and is provided via a website for browsing, filtering and downloading. Previous to this work, there was no systematic way to account for the large variability in quality of the thousands of proteomes, and this is likely to have profoundly influenced the outcome of many published studies, in particular large-scale comparative analyses. The lack of a measure of proteome quality is likely due to the difficulty in producing one, a problem that we have approached by integrating multiple metrics. The continued development and improvement of the index will require the contribution of additional metrics by us and by others; the PQI provides a useful point of reference for the scientific community, but it is only the first step towards a 'standard' for the field.


Subject(s)
Databases, Protein , Proteome/standards , Genome , Internet
14.
Curr Opin Struct Biol ; 27: 129-37, 2014 Aug.
Article in English | MEDLINE | ID: mdl-25198166

ABSTRACT

The seven-transmembrane (7TM) helix fold of G-protein coupled receptors (GPCRs) has been adapted for a wide variety of physiologically important signaling functions. Here, we discuss the diversity in the structured and disordered regions of GPCRs based on the recently published crystal structures and sequence analysis of all human GPCRs. A comparison of the structures of rhodopsin-like receptors (class A), secretin-like receptors (class B), metabotropic receptors (class C) and frizzled receptors (class F) shows that the relative arrangement of the transmembrane helices is conserved across all four GPCR classes although individual receptors can be activated by ligand binding at varying positions within and around the transmembrane helical bundle. A systematic analysis of GPCR sequences reveals the presence of disordered segments in the cytoplasmic side, abundant post-translational modification sites, evidence for alternative splicing and several putative linear peptide motifs that have the potential to mediate interactions with cytosolic proteins. While the structured regions permit the receptor to bind diverse ligands, the disordered regions appear to have an underappreciated role in modulating downstream signaling in response to the cellular state. An integrated paradigm combining the knowledge of structured and disordered regions is imperative for gaining a holistic understanding of the GPCR (un)structure-function relationship.


Subject(s)
Receptors, G-Protein-Coupled/chemistry , Animals , Cell Membrane/chemistry , Cell Membrane/metabolism , Humans , Receptors, G-Protein-Coupled/metabolism
15.
Mol Biol Evol ; 31(6): 1364-74, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24692656

ABSTRACT

Humans are composed of hundreds of cell types. As the genomic DNA of each somatic cell is identical, cell type is determined by what is expressed and when. Until recently, little has been reported about the determinants of human cell identity, particularly from the joint perspective of gene evolution and expression. Here, we chart the evolutionary past of all documented human cell types via the collective histories of proteins, the principal product of gene expression. FANTOM5 data provide cell-type-specific digital expression of human protein-coding genes and the SUPERFAMILY resource is used to provide protein domain annotation. The evolutionary epoch in which each protein was created is inferred by comparison with domain annotation of all other completely sequenced genomes. Studying the distribution across epochs of genes expressed in each cell type reveals insights into human cellular evolution in terms of protein innovation. For each cell type, its history of protein innovation is charted based on the genes it expresses. Combining the histories of all cell types enables us to create a timeline of cell evolution. This timeline identifies the possibility that our common ancestor Coelomata (cavity-forming animals) provided the innovation required for the innate immune system, whereas cells which now form the human brain have followed a trajectory of continually accumulating novel proteins since Opisthokonta (boundary of animals and fungi). We conclude that exaptation of existing domain architectures into new contexts is the dominant source of cell-type-specific domain architectures.


Subject(s)
Evolution, Molecular , Phylogeny , Proteins/chemistry , Proteins/genetics , Eukaryotic Cells , Humans , Immunity, Innate , Protein Structure, Tertiary , Sequence Analysis, Protein , Transcriptome
16.
Sci Rep ; 3: 2015, 2013.
Article in English | MEDLINE | ID: mdl-23778980

ABSTRACT

We report a daily-updated sequenced/species Tree Of Life (sTOL) as a reference for the increasing number of cellular organisms with their genomes sequenced. The sTOL builds on a likelihood-based weight calibration algorithm to consolidate NCBI taxonomy information in concert with unbiased sampling of molecular characters from whole genomes of all sequenced organisms. Via quantifying the extent of agreement between taxonomic and molecular data, we observe there are many potential improvements that can be made to the status quo classification, particularly in the Fungi kingdom; we also see that the current state of many animal genomes is rather poor. To augment the use of sTOL in providing evolutionary contexts, we integrate an ontology infrastructure and demonstrate its utility for evolutionary understanding on: nuclear receptors, stem cells and eukaryotic genomes. The sTOL (http://supfam.org/SUPERFAMILY/sTOL) provides a binary tree of (sequenced) life, and contributes to an analytical platform linking genome evolution, function and phenotype.


Subject(s)
Databases, Genetic , Genome , Genomics , Phylogeny , Animals , Computational Biology/methods , Databases, Genetic/standards , Genomics/methods , Genomics/standards , Internet
17.
Nucleic Acids Res ; 41(Database issue): D508-16, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203878

ABSTRACT

We present the Database of Disordered Protein Prediction (D(2)P(2)), available at http://d2p2.pro (including website source code). A battery of disorder predictors and their variants, VL-XT, VSL2b, PrDOS, PV2, Espritz and IUPred, were run on all protein sequences from 1765 complete proteomes (to be updated as more genomes are completed). Integrated with these results are all of the predicted (mostly structured) SCOP domains using the SUPERFAMILY predictor. These disorder/structure annotations together enable comparison of the disorder predictors with each other and examination of the overlap between disordered predictions and SCOP domains on a large scale. D(2)P(2) will increase our understanding of the interplay between disorder and structure, the genomic distribution of disorder, and its evolutionary history. The parsed data are made available in a unified format for download as flat files or SQL tables either by genome, by predictor, or for the complete set. An interactive website provides a graphical view of each protein annotated with the SCOP domains and disordered regions from all predictors overlaid (or shown as a consensus). There are statistics and tools for browsing and comparing genomes and their disorder within the context of their position on the tree of life.


Subject(s)
Databases, Protein , Protein Conformation , Genome , Internet , Protein Structure, Tertiary , Proteins/chemistry , Proteins/genetics , Sequence Analysis, Protein
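The abstract above mentions overlaying disordered regions from several predictors or showing them as a consensus. One plausible way to derive such a consensus is a per-residue vote, sketched below. The function name, the `min_fraction` parameter, the predictor labels and the toy regions are all assumptions for illustration; the sketch is not taken from the D(2)P(2) source code and may differ from how that resource computes its consensus.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive residue range


def consensus_disorder(
    length: int,
    predictions: Dict[str, List[Interval]],
    min_fraction: float = 0.5,
) -> List[Interval]:
    """Return regions where at least min_fraction of predictors call disorder."""
    if not predictions:
        return []

    # Count, for every residue, how many predictors call it disordered.
    votes = [0] * length
    for regions in predictions.values():
        called = [False] * length
        for start, end in regions:
            for pos in range(max(start, 1), min(end, length) + 1):
                called[pos - 1] = True
        for i, flag in enumerate(called):
            if flag:
                votes[i] += 1

    # Merge consecutive agreeing residues into intervals.
    threshold = min_fraction * len(predictions)
    consensus: List[Interval] = []
    run_start = None
    for i in range(length):
        agreed = votes[i] >= threshold
        if agreed and run_start is None:
            run_start = i + 1
        elif not agreed and run_start is not None:
            consensus.append((run_start, i))
            run_start = None
    if run_start is not None:
        consensus.append((run_start, length))
    return consensus


# Toy predictions for a 10-residue protein (hypothetical predictor labels).
toy_predictions = {
    "predictor_a": [(1, 5)],
    "predictor_b": [(3, 8)],
    "predictor_c": [(4, 4), (9, 10)],
}
majority_regions = consensus_disorder(10, toy_predictions, min_fraction=0.5)
unanimous_regions = consensus_disorder(10, toy_predictions, min_fraction=1.0)
```

Raising `min_fraction` trades coverage for agreement: at 1.0 only residues called by every predictor survive, while lower values keep regions supported by a subset.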