Results 1 - 18 of 18
1.
Biol Direct ; 17(1): 7, 2022 03 21.
Article in English | MEDLINE | ID: mdl-35313954

ABSTRACT

BACKGROUND: Bacteria and archaea produce an enormous diversity of modified peptides that are involved in various forms of inter-microbial conflicts or communication. A vast class of such peptides are Ribosomally synthesized, Post-translationally modified Peptides (RiPPs), and a major group of RiPPs are graspetides, so named after ATP-grasp ligases that catalyze the formation of lactam and lactone linkages in these peptides. The diversity of graspetides, the multiple proteins encoded in the respective Biosynthetic Gene Clusters (BGCs) and their evolution have not been studied in full detail. In this work, we attempt a comprehensive analysis of the graspetide-encoding BGCs and report a variety of novel graspetide groups as well as ancillary proteins implicated in graspetide biosynthesis and expression. RESULTS: We compiled a comprehensive, manually curated set of graspetides that includes 174 families, 115 of them new, with distinct patterns of amino acids implicated in macrocyclization and further modification, roughly tripling the known graspetide diversity. We derived signature motifs for the leader regions of graspetide precursors that could be used to facilitate graspetide prediction. Graspetide biosynthetic gene clusters and specific precursors were identified in bacterial divisions not previously known to encode RiPPs, in particular, the parasitic and symbiotic bacteria of the Candidate Phyla Radiation. We identified Bacteroides-specific BGCs that include a remarkable diversity of graspetides encoded in the same loci, which are predicted to be modified by the same ATP-grasp ligase. We studied in detail the evolution of the recently characterized chryseoviridin BGCs and showed that both duplication and horizontal gene exchange contribute to the diversification of graspetides during evolution. CONCLUSIONS: We demonstrate a previously unsuspected diversity of graspetide sequences, even among those associated with closely related ATP-grasp enzymes. Several previously unnoticed families of proteins associated with graspetide biosynthetic gene clusters are identified. The results of this work substantially expand the known diversity of RiPPs and can be harnessed to further advance approaches for their identification.


Subject(s)
Multigene Family , Peptides , Adenosine Triphosphate/chemistry , Adenosine Triphosphate/metabolism , Bacteria/genetics , Peptides/chemistry , Phylogeny , Protein Processing, Post-Translational
2.
J Am Chem Soc ; 143(21): 8056-8068, 2021 06 02.
Article in English | MEDLINE | ID: mdl-34028251

ABSTRACT

Among the ribosomally synthesized and post-translationally modified peptide (RiPP) natural products, "graspetides" (formerly known as microviridins) contain macrocyclic esters and amides that are formed by ATP-grasp ligase tailoring enzymes using the side chains of Asp/Glu as acceptors and Thr/Ser/Lys as donors. Graspetides exhibit diverse patterns of macrocyclization and connectivities, exemplified by microviridins, which have a caged tricyclic core, and thuringin and plesiocin, which feature a "hairpin topology" with cross-strand ω-ester bonds. Here, we characterize chryseoviridin, a new type of multicore RiPP encoded by Chryseobacterium gregarium DS19109 (Phylum Bacteroidetes) and solve a 2.44 Å resolution crystal structure of a quaternary complex consisting of the ATP-grasp ligase CdnC bound to ADP, a conserved leader peptide and a peptide substrate. HRMS/MS analyses show that chryseoviridin contains four consecutive five- or six-residue macrocycles ending with a microviridin-like core. The crystal structure captures respective subunits of the CdnC homodimer in the apo or substrate-bound state revealing a large conformational change in the B-domain upon substrate binding. A docked model of ATP places the γ-phosphate group within 2.8 Å of the Asp acceptor residue. The orientation of the bound substrate is consistent with a model in which macrocyclization occurs in the N- to C-terminal direction for core peptides containing multiple Thr/Ser-to-Asp macrocycles. Using systematically varied sequences, we validate this model and identify two- or three-amino acid templating elements that flank the macrolactone and are required for enzyme activity in vitro. This work reveals the structural basis for ω-ester bond formation in RiPP biosynthesis.


Subject(s)
Adenosine Triphosphate/metabolism , Biological Products/metabolism , Ligases/metabolism , Peptides/metabolism , Adenosine Triphosphate/chemistry , Amides/chemistry , Amides/metabolism , Biological Products/chemistry , Esters/chemistry , Esters/metabolism , Ligases/chemistry , Macrocyclic Compounds/chemistry , Macrocyclic Compounds/metabolism , Molecular Conformation , Peptides/chemistry , Protein Processing, Post-Translational
3.
Nature ; 593(7860): 553-557, 2021 05.
Article in English | MEDLINE | ID: mdl-33911286

ABSTRACT

Asgard is a recently discovered superphylum of archaea that appears to include the closest archaeal relatives of eukaryotes [1-5]. Debate continues as to whether the archaeal ancestor of eukaryotes belongs within the Asgard superphylum or whether this ancestor is a sister group to all other archaea (that is, a two-domain versus a three-domain tree of life) [6-8]. Here we present a comparative analysis of 162 complete or nearly complete genomes of Asgard archaea, including 75 metagenome-assembled genomes that, to our knowledge, have not previously been reported. Our results substantially expand the phylogenetic diversity of Asgard and lead us to propose six additional phyla that include a deep branch that we have provisionally named Wukongarchaeota. Our phylogenomic analysis does not resolve unequivocally the evolutionary relationship between eukaryotes and Asgard archaea, but instead, depending on the choice of species and conserved genes used to build the phylogeny, supports either the origin of eukaryotes from within Asgard (as a sister group to the expanded Heimdallarchaeota-Wukongarchaeota branch) or a deeper branch for the eukaryote ancestor within archaea. Our comprehensive protein domain analysis using the 162 Asgard genomes results in a major expansion of the set of eukaryotic signature proteins. The Asgard eukaryotic signature proteins show variable phyletic distributions and domain architectures, which is suggestive of dynamic evolution through horizontal gene transfer, gene loss, gene duplication and domain shuffling. The phylogenomics of the Asgard archaea points to the accumulation of the components of the mobile archaeal 'eukaryome' in the archaeal ancestor of eukaryotes (within or outside Asgard) through extensive horizontal gene transfer.


Subject(s)
Archaea/classification , Genome, Archaeal , Phylogeny , Biological Evolution , Eukaryota , Metagenomics
4.
PLoS Curr ; 2: RRN1200, 2010 Dec 03.
Article in English | MEDLINE | ID: mdl-21152078

ABSTRACT

Severity of seasonal influenza A epidemics is related to the antigenic novelty of the predominant viral strains circulating each year. Support for a strong correlation between epidemic severity and antigenic drift comes from infectious challenge experiments on vaccinated animals and human volunteers, field studies of vaccine efficacy, prospective studies of subjects with laboratory-confirmed prior infections, and analysis of the connection between drift and severity from surveillance data. We show that, given data on the antigenic and sequence novelty of the hemagglutinin protein of clinical isolates of H3N2 virus from a season along with the corresponding data from prior seasons, we can accurately predict the influenza severity for that season. This model therefore provides a framework for making projections of the severity of the upcoming season using assumptions based on viral isolates collected in the current season. Our results based on two independent data sets from the US and Hong Kong suggest that seasonal severity is largely determined by the novelty of the hemagglutinin protein although other factors, including mutations in other influenza genes, co-circulating pathogens and weather conditions, might also play a role. These results should be helpful for the control of seasonal influenza and have implications for improvement of influenza surveillance.

5.
PLoS Curr ; 1: RRN1001, 2009 Aug 18.
Article in English | MEDLINE | ID: mdl-20025194

ABSTRACT

The hemagglutinin protein of influenza virus bears several sites of N-linked asparagine glycosylation. The number and location of these sites varies with strain and substrain. The human H3 hemagglutinin has gained several glycosylation sites on the antigenically important globular head since its introduction to humans, presumably due to selection. Although there is abundant evidence that glycosylation can affect antigenic and functional properties of the protein, direct evidence for selection is lacking. We have analyzed gain and loss of glycosylation sites on the side branches of a large phylogenetic tree of H3 HA1 sequences (branches off of the main, long-term line of descent). Side branches contrast with the main line of descent: losses of glycosylation sites are not uncommon, and they outnumber gains. Although other explanations are possible, this observation is consistent with weak selection for glycosylation sites or a more complicated pattern of selection. Furthermore, terminal and internal branches differ with respect to rates of gain and loss of glycosylation sites. This pattern would not be expected under selective neutrality, but is easily explained by weak selection or selection that changes with the immune state of the host population. Thus, it provides evidence that selection acts on the glycosylation state of hemagglutinin.
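The analysis above tracks gains and losses of N-linked glycosylation sites between related sequences. Such sites are conventionally predicted from the sequon N-X-S/T, where X is any residue except proline. A minimal Python sketch, assuming that sequon rule alone (it ignores structural context and whether a site is actually occupied) and assuming the two sequences are already aligned without gaps; the function names and toy sequences are hypothetical and not taken from the study:

```python
import re

# N-X-[S/T] with X != P; the zero-width lookahead also catches overlapping sequons.
SEQUON = re.compile(r"(?=N[^P][ST])")

def sequon_positions(seq: str) -> set[int]:
    """Return 0-based positions of Asn residues that start an N-X-S/T sequon."""
    return {m.start() for m in SEQUON.finditer(seq.upper())}

def gains_and_losses(parent: str, child: str) -> tuple[set[int], set[int]]:
    """Compare two ungapped, equal-length aligned sequences (an assumption)."""
    if len(parent) != len(child):
        raise ValueError("sequences must be aligned and of equal length")
    p, c = sequon_positions(parent), sequon_positions(child)
    return c - p, p - c  # (sites gained in child, sites lost from parent)

# Hypothetical toy sequences, not real HA1 data.
parent = "MKNGTAANPSLLNQT"
child = "MKNGTAANASLLKQT"
gained, lost = gains_and_losses(parent, child)
print(sorted(gained), sorted(lost))  # [7] [12]
```

In this toy pair, the site at position 12 is lost by an N→K change, and a new site appears at position 7 once the proline following that Asn is replaced.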

6.
Methods Mol Biol ; 484: 465-90, 2008.
Article in English | MEDLINE | ID: mdl-18592196

ABSTRACT

Genome sequencing projects have resulted in a rapid accumulation of predicted protein sequences. With experimentally verified information on protein function lagging far behind, computational methods are used for functional annotation of proteins. Here we describe a number of protocols for protein sequence and structure analysis that can be used to infer function of uncharacterized proteins. These protocols rely on publicly available computational resources and tools and can be utilized by anyone with Internet access.


Subject(s)
Amino Acid Sequence , Proteins , Sequence Homology, Amino Acid , Animals , Databases, Protein , Humans , Models, Molecular , Molecular Sequence Data , Protein Conformation , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Sequence Alignment
7.
Methods Enzymol ; 422: 47-74, 2007.
Article in English | MEDLINE | ID: mdl-17628134

ABSTRACT

The availability of complete genome sequences of diverse bacteria and archaea makes comparative sequence analysis a powerful tool for analyzing signal transduction systems encoded in these genomes. However, most signal transduction proteins consist of two or more individual protein domains, which significantly complicates their functional annotation and makes automated annotation of these proteins in the course of large-scale genome sequencing projects particularly unreliable. This chapter describes certain common-sense protocols for sequence analysis of two-component histidine kinases and response regulators, as well as other components of the prokaryotic signal transduction machinery: Ser/Thr/Tyr protein kinases and protein phosphatases, adenylate and diguanylate cyclases, and c-di-GMP phosphodiesterases. These protocols rely on publicly available computational tools and databases and can be utilized by anyone with Internet access.


Subject(s)
Bacteria/genetics , Protein Kinases/genetics , Signal Transduction/physiology , Bacteria/enzymology , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Binding Sites , Conserved Sequence , Databases, Protein , Histidine Kinase , Protein Kinases/chemistry , Protein Kinases/metabolism , Sequence Alignment , Sequence Analysis, Protein , Sequence Homology, Amino Acid
8.
Nucleic Acids Res ; 35(Database issue): D224-8, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17202162

ABSTRACT

InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). The InterProScan search tool is now also available via a web service at http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html.


Subject(s)
Databases, Protein , Internet , Protein Structure, Tertiary , Proteins/chemistry , Proteins/classification , Proteins/physiology , Sequence Analysis, Protein , Systems Integration , User-Computer Interface
9.
Evol Bioinform Online ; 2: 197-209, 2007 Feb 10.
Article in English | MEDLINE | ID: mdl-19455212

ABSTRACT

The PIRSF protein classification system (http://pir.georgetown.edu/pirsf/) reflects evolutionary relationships of full-length proteins and domains. The primary PIRSF classification unit is the homeomorphic family, whose members are both homologous (evolved from a common ancestor) and homeomorphic (sharing full-length sequence similarity and a common domain architecture). PIRSF families are curated systematically based on literature review and integrative sequence and functional analysis, including sequence and structure similarity, domain architecture, functional association, genome context, and phyletic pattern. The results of classification and expert annotation are summarized in PIRSF family reports with graphical viewers for taxonomic distribution, domain architecture, family hierarchy, and multiple alignment and phylogenetic tree. The PIRSF system provides a comprehensive resource for bioinformatics analysis and comparative studies of protein function and evolution. Domain or fold-based searches allow identification of evolutionarily related protein families sharing domains or structural folds. Functional convergence and functional divergence are revealed by the relationships between protein classification and curated family functions. The taxonomic distribution allows the identification of lineage-specific or broadly conserved protein families and can reveal horizontal gene transfer. Here we demonstrate, with illustrative examples, how to use the web-based PIRSF system as a tool for functional and evolutionary studies of protein families.

10.
Nucleic Acids Res ; 33(Database issue): D201-5, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608177

ABSTRACT

InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).


Subject(s)
Databases, Protein , Proteins/chemistry , Proteins/classification , Sequence Analysis, Protein , Databases, Protein/trends , Humans , Protein Structure, Tertiary , Sequence Alignment , Systems Integration
11.
Comput Biol Chem ; 28(1): 87-96, 2004 Feb.
Article in English | MEDLINE | ID: mdl-15022647

ABSTRACT

Increasingly, scientists have begun to tackle gene functions and other complex regulatory processes by studying organisms at global scales for various levels of biological organization, ranging from genomes to metabolomes and physiomes. Meanwhile, new bioinformatics methods have been developed for inferring protein function using associative analysis of functional properties to complement the traditional sequence homology-based methods. Fully exploiting the value of high-throughput systems biology data and facilitating protein functional studies require bioinformatics infrastructures that support both data integration and associative analysis. The iProClass database, designed to serve as a framework for data integration in a distributed networking environment, provides comprehensive descriptions of all proteins, with rich links to over 50 databases of protein family, function, pathway, interaction, modification, structure, genome, ontology, literature, and taxonomy. In particular, the database is organized with PIRSF family classification and maps to other family, function, and structure classification schemes. Coupled with the underlying taxonomic information for complete genomes, the iProClass system (http://pir.georgetown.edu/iproclass/) supports associative studies of protein family, domain, function, and structure. A case study of the phosphoglycerate mutases illustrates a systematic approach for protein family and phylogenetic analysis. Such studies may serve as a basis for further analysis of protein functional evolution, and its relationship to the co-evolution of metabolic pathways, cellular networks, and organisms.


Subject(s)
Databases, Factual , Genome, Human , Proteins/metabolism , Amino Acid Sequence , Computational Biology , Humans , Molecular Biology/methods , Molecular Sequence Data , Phosphoglycerate Mutase/chemistry , Phosphoglycerate Mutase/genetics , Phosphoglycerate Mutase/metabolism , Phylogeny , Proteins/chemistry , Proteins/genetics
12.
Genome Biol ; 5(2): R7, 2004.
Article in English | MEDLINE | ID: mdl-14759257

ABSTRACT

BACKGROUND: Sequencing the genomes of multiple, taxonomically diverse eukaryotes enables in-depth comparative-genomic analysis which is expected to help in reconstructing ancestral eukaryotic genomes and major events in eukaryotic evolution and in making functional predictions for currently uncharacterized conserved genes. RESULTS: We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs (eukaryotic orthologous groups or KOGs) from seven eukaryotic genomes: Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Encephalitozoon cuniculi. Conservation of KOGs through the phyletic range of eukaryotes strongly correlates with their functions and with the effect of gene knockout on the organism's viability. The approximately 40% of KOGs that are represented in six or seven species are enriched in proteins responsible for housekeeping functions, particularly translation and RNA processing. These conserved KOGs are often essential for survival and might approximate the minimal set of essential eukaryotic genes. The 131 single-member, pan-eukaryotic KOGs we identified were examined in detail. For around 20 that remained uncharacterized, functions were predicted by in-depth sequence analysis and examination of genomic context. Nearly all these proteins are subunits of known or predicted multiprotein complexes, in agreement with the balance hypothesis of evolution of gene copy number. Other KOGs show a variety of phyletic patterns, which points to major contributions of lineage-specific gene loss and the 'invention' of genes new to eukaryotic evolution. Examination of the sets of KOGs lost in individual lineages reveals co-elimination of functionally connected genes. Parsimonious scenarios of eukaryotic genome evolution and gene sets for ancestral eukaryotic forms were reconstructed. The gene set of the last common ancestor of the crown group consists of 3,413 KOGs and largely includes proteins involved in genome replication and expression, and central metabolism. Only 44% of the KOGs, mostly from the reconstructed gene set of the last common ancestor of the crown group, have detectable homologs in prokaryotes; the remainder apparently evolved via duplication with divergence and invention of new genes. CONCLUSIONS: The KOG analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes. The results provide quantitative support for major trends of eukaryotic evolution noticed previously at the qualitative level and a basis for detailed reconstruction of evolution of eukaryotic genomes and biology of ancestral forms.


Subject(s)
Eukaryotic Cells/classification , Genome , Phylogeny , Proteins/classification , Animals , Caenorhabditis elegans/genetics , Evolution, Molecular , Gene Deletion , Humans , Prokaryotic Cells/classification , Protein Structure, Tertiary , Proteins/genetics , Proteins/physiology , Sequence Analysis, Protein , Yeasts/genetics
13.
Nucleic Acids Res ; 32(Database issue): D112-4, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681371

ABSTRACT

The Protein Information Resource (PIR) is an integrated public resource of protein informatics. To facilitate the sensible propagation and standardization of protein annotation and the systematic detection of annotation errors, PIR has extended its superfamily concept and developed the SuperFamily (PIRSF) classification system. Based on the evolutionary relationships of whole proteins, this classification system allows annotation of both specific biological and generic biochemical functions. The system adopts a network structure for protein classification from superfamily to subfamily levels. Protein family members are homologous (sharing common ancestry) and homeomorphic (sharing full-length sequence similarity with common domain architecture). The PIRSF database consists of two data sets, preliminary clusters and curated families. The curated families include family name, protein membership, parent-child relationship, domain architecture, and optional description and bibliography. PIRSF is accessible from the website at http://pir.georgetown.edu/pirsf/ for report retrieval and sequence classification. The report presents family annotation, membership statistics, cross-references to other databases, graphical display of domain architecture, and links to multiple sequence alignments and phylogenetic trees for curated families. PIRSF can be utilized to analyze phylogenetic profiles, to reveal functional convergence and divergence, and to identify interesting relationships between homeomorphic families, domains and structural classes.


Subject(s)
Computational Biology , Databases, Protein , Proteins/chemistry , Proteins/classification , Amino Acid Motifs , Animals , Evolution, Molecular , Humans , Information Storage and Retrieval , Internet , Protein Structure, Tertiary
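One of the PIRSF membership criteria described above, a common domain architecture, can be illustrated by grouping proteins on their ordered domain lists. A minimal sketch, assuming domain assignments are already available (for example, from a domain database search); it does not test homology or full-length sequence similarity, which the PIRSF definition also requires. The protein IDs are hypothetical, and the Pfam-style domain labels are used only for illustration:

```python
from collections import defaultdict

def group_by_architecture(proteins: dict[str, list[str]]) -> dict[tuple[str, ...], list[str]]:
    """Group protein IDs by their N- to C-terminal ordered list of domains."""
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for protein_id, domains in proteins.items():
        # A tuple keeps domain order, so A-B and B-A are different architectures.
        groups[tuple(domains)].append(protein_id)
    return dict(groups)

# Hypothetical toy input: protein ID -> ordered domain labels.
proteins = {
    "protA": ["HisKA", "HATPase_c"],
    "protB": ["HisKA", "HATPase_c"],
    "protC": ["Response_reg", "HisKA", "HATPase_c"],
    "protD": ["HATPase_c", "HisKA"],
}

groups = group_by_architecture(proteins)
for arch, members in groups.items():
    print(" + ".join(arch), "->", members)
```

Here protD ends up in its own group because its domain order differs, even though its domain content matches protA and protB.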
14.
BMC Bioinformatics ; 4: 41, 2003 Sep 11.
Article in English | MEDLINE | ID: mdl-12969510

ABSTRACT

BACKGROUND: The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. RESULTS: We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or approximately 54% of the 110,655 analyzed eukaryotic gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included in the KOGs; the addition of new eukaryotic genomes is expected to result in a substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of approximately 20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (approximately 1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes. CONCLUSION: The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.


Subject(s)
Databases, Protein/trends , Eukaryotic Cells , Proteins/classification , Proteins/genetics , Animals , Databases, Nucleic Acid/trends , Eukaryotic Cells/chemistry , Eukaryotic Cells/physiology , Evolution, Molecular , Humans , National Institutes of Health (U.S.) , Proteins/physiology , Terminology as Topic , United States
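A phyletic pattern records in which genomes a cluster of orthologs is present, and the "conserved core" above is the fraction of clusters present in every analyzed species. A minimal sketch, assuming each cluster is given as a set of species codes; the lowercase three-letter codes for the seven KOG genomes serve only as labels, and the cluster names and memberships are invented toy data:

```python
# Seven KOG genomes in a fixed order; the order defines the pattern string.
SPECIES = ["ath", "cel", "dme", "hsa", "sce", "spo", "ecu"]

def phyletic_pattern(present: set[str], species: list[str] = SPECIES) -> str:
    """Binary presence/absence string over the ordered species list."""
    return "".join("1" if s in present else "0" for s in species)

def fraction_in_at_least(clusters: dict[str, set[str]], k: int,
                         species: list[str] = SPECIES) -> float:
    """Fraction of clusters represented in at least k of the listed species."""
    wanted = set(species)
    hits = sum(1 for present in clusters.values() if len(present & wanted) >= k)
    return hits / len(clusters)

# Hypothetical toy clusters, not real KOG data.
clusters = {
    "cluster1": set(SPECIES),            # present in all seven genomes
    "cluster2": set(SPECIES) - {"ecu"},  # missing only from E. cuniculi
    "cluster3": {"cel", "dme", "hsa"},   # animals only
    "cluster4": {"sce", "spo"},          # yeasts only
}

print(phyletic_pattern(clusters["cluster2"]))        # 1111110
print(fraction_in_at_least(clusters, len(SPECIES)))  # 0.25 (present in all species)
print(fraction_in_at_least(clusters, 6))             # 0.5
```

Counting clusters with k = 6 mirrors the "six or seven species" threshold used in the related KOG analysis.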
15.
Nucleic Acids Res ; 31(1): 383-7, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520028

ABSTRACT

The Conserved Domain Database (CDD) is now indexed as a separate database within the Entrez system and linked to other Entrez databases such as MEDLINE®. This allows users to search for domain types by name, for example, or to view the domain architecture of any protein in Entrez's sequence database. CDD can be accessed on the World Wide Web at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd. Users may also employ the CD-Search service to identify conserved domains in new sequences, at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. CD-Search results, and pre-computed links from Entrez's protein database, are calculated using the RPS-BLAST algorithm and Position Specific Score Matrices (PSSMs) derived from CDD alignments. CD-Searches are also run by default for protein-protein queries submitted to BLAST® at http://www.ncbi.nlm.nih.gov/BLAST. CDD mirrors the publicly available domain alignment collections SMART and PFAM, and now also contains alignment models curated at NCBI. Structure information is used to identify the core substructure likely to be present in all family members, and to produce sequence alignments consistent with structure conservation. This alignment model allows NCBI curators to annotate 'columns' corresponding to functional sites conserved among family members.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Amino Acid Sequence , Animals , Conserved Sequence , Information Storage and Retrieval , Models, Molecular , Sequence Alignment
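CD-Search scores query sequences against PSSMs derived from domain alignments. The core idea, summing position-specific scores along the query, can be sketched as an ungapped sliding-window scan. This is a deliberate simplification, not RPS-BLAST: real PSSMs hold log-odds scores for all 20 amino acids, and RPS-BLAST computes local alignments with significance statistics. The toy matrix and the default score for residues missing from a column are assumptions:

```python
# Toy PSSM: each column maps residue -> score; residues absent get `default`.
PSSM = [
    {"G": 3, "A": 1},
    {"K": 4, "R": 2},
    {"S": 3, "T": 3},
]

def window_score(window: str, pssm: list[dict[str, int]], default: int = -1) -> int:
    """Sum the column scores for each residue of an ungapped window."""
    return sum(col.get(res, default) for res, col in zip(window, pssm))

def best_window(query: str, pssm: list[dict[str, int]],
                default: int = -1) -> tuple[int, int]:
    """Return (0-based start, score) of the highest-scoring ungapped window."""
    width = len(pssm)
    if len(query) < width:
        raise ValueError("query is shorter than the PSSM")
    scores = [window_score(query[s:s + width], pssm, default)
              for s in range(len(query) - width + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)  # first max on ties
    return best, scores[best]

print(best_window("MAGKSLT", PSSM))  # (2, 10): the window "GKS"
```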
16.
Nucleic Acids Res ; 31(1): 474-7, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520055

ABSTRACT

Three-dimensional structures are now known within most protein families and it is likely, when searching a sequence database, that one will identify a homolog of known structure. The goal of Entrez's 3D-structure database is to make structure information and the functional annotation it can provide easily accessible to molecular biologists. To this end, Entrez's search engine provides several powerful features: (i) links between databases, for example between a protein's sequence and structure; (ii) pre-computed sequence and structure neighbors; and (iii) structure and sequence/structure alignment visualization. Here, we focus on a new feature of Entrez's Molecular Modeling Database (MMDB): graphical summaries of the biological annotation available for each 3D structure, based on the results of automated comparative analysis. MMDB is available at: http://www.ncbi.nlm.nih.gov/Entrez/structure.html.


Subject(s)
Databases, Protein , Models, Molecular , Structural Homology, Protein , Animals , Computer Graphics , Imaging, Three-Dimensional , Protein Structure, Tertiary , Proteins/chemistry
17.
J Bacteriol ; 185(1): 285-94, 2003 Jan.
Article in English | MEDLINE | ID: mdl-12486065

ABSTRACT

Transmembrane receptors in microorganisms, such as sensory histidine kinases and methyl-accepting chemotaxis proteins, are molecular devices for monitoring environmental changes. We report here that sensory domain sharing is widespread among different classes of transmembrane receptors. We have identified two novel conserved extracellular sensory domains, named CHASE2 and CHASE3, that are found in at least four classes of transmembrane receptors: histidine kinases, adenylate cyclases, predicted diguanylate cyclases, and either serine/threonine protein kinases (CHASE2) or methyl-accepting chemotaxis proteins (CHASE3). Three other extracellular sensory domains were shared by at least two different classes of transmembrane receptors: histidine kinases and either diguanylate cyclases, adenylate cyclases, or phosphodiesterases. These observations suggest that microorganisms use similar conserved domains to sense similar environmental signals and transmit this information via different signal transduction pathways to different regulatory circuits: transcriptional regulation (histidine kinases), chemotaxis (methyl-accepting proteins), catabolite repression (adenylate cyclases), and modulation of enzyme activity (diguanylate cyclases and phosphodiesterases). The variety of signaling pathways using the CHASE-type domains indicates that these domains sense some critically important extracellular signals.


Subject(s)
Archaea/chemistry , Bacteria/chemistry , Receptors, Cell Surface/chemistry , Signal Transduction , Adenylyl Cyclases/chemistry , Adenylyl Cyclases/genetics , Amino Acid Sequence , Archaea/genetics , Archaea/metabolism , Archaeal Proteins/metabolism , Bacteria/genetics , Bacteria/metabolism , Bacterial Proteins/metabolism , Chemotaxis , Computational Biology , Databases, Genetic , Guanylate Cyclase/chemistry , Guanylate Cyclase/genetics , Histidine Kinase , Molecular Sequence Data , Protein Kinases/chemistry , Protein Kinases/genetics , Receptors, Cell Surface/genetics , Sequence Alignment
18.
Nucleic Acids Res ; 30(11): 2453-9, 2002 Jun 01.
Article in English | MEDLINE | ID: mdl-12034833

ABSTRACT

Sequence analysis of bacterial genomes revealed a novel DNA-binding domain. This domain is found in several response regulators of the two-component signal transduction system, such as Pseudomonas aeruginosa AlgR, which is involved in the regulation of alginate biosynthesis and in the pathogenesis of cystic fibrosis, and Clostridium perfringens VirR, a regulator of virulence factors, as well as in several regulators of bacteriocin biosynthesis, previously unified in the AgrA/ComE family. Most of the transcriptional regulators that contain this DNA-binding domain are involved in biosynthesis of extracellular polysaccharides, fimbriation, expression of exoproteins, including toxins, and quorum sensing. We refer to it as the LytTR ('litter') domain, after Bacillus subtilis LytT and Staphylococcus aureus LytR response regulators, involved in regulation of cell autolysis. In addition to response regulators, the LytTR domain is found in combination with MHYT, PAS and other sensor domains.


Subject(s)
Bacterial Proteins/metabolism , Conserved Sequence/genetics , DNA/metabolism , Trans-Activators , Transcription Factors/chemistry , Transcription Factors/metabolism , Amino Acid Sequence , Bacterial Proteins/chemistry , Binding Sites , Clostridium perfringens/chemistry , Clostridium perfringens/genetics , Computational Biology , DNA/genetics , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , Databases, Protein , Gene Expression Regulation, Bacterial , Helix-Turn-Helix Motifs , Molecular Sequence Data , Phylogeny , Protein Binding , Protein Structure, Tertiary , Pseudomonas aeruginosa/chemistry , Pseudomonas aeruginosa/genetics , Sequence Alignment , Signal Transduction , Virulence/genetics