Results 1 - 11 of 11
1.
Database (Oxford) ; 2020, 2020 Jan 01.
Article in English | MEDLINE | ID: mdl-32761142

ABSTRACT

The National Center for Biotechnology Information (NCBI) Taxonomy includes organism names and classifications for every sequence in the nucleotide and protein sequence databases of the International Nucleotide Sequence Database Collaboration. Since the last review of this resource in 2012, it has undergone several improvements. Most notable is the shift from a single SQL database to a series of linked databases tied to a framework of data called NameBank. This means that relations among data elements can be adjusted in more detail, resulting in expanded annotation of synonyms, the ability to flag names with specific nomenclatural properties, enhanced tracking of publications tied to names and improved annotation of scientific authorities and types. Additionally, practices utilized by NCBI Taxonomy curators specific to major taxonomic groups are described, terms peculiar to NCBI Taxonomy are explained, external resources are acknowledged and updates to tools and other resources are documented. Database URL: https://www.ncbi.nlm.nih.gov/taxonomy.


Subject(s)
Classification , Database Management Systems , Databases, Genetic , Animals , Bacteria/genetics , Humans , National Library of Medicine (U.S.) , Plants/genetics , United States , Viruses/genetics
2.
Zootaxa ; 4706(3): zootaxa.4706.3.1, 2019 Dec 10.
Article in English | MEDLINE | ID: mdl-32230528

ABSTRACT

We compared the species names in the Reptile Database, a dedicated taxonomy database, with those in the NCBI taxonomy database, which provides the taxonomic backbone for the GenBank sequence database. About 67% of the known ~11,000 reptile species are represented with at least one DNA sequence and a binary species name in GenBank. However, a common problem arises through the submission of preliminary species names (such as "Pelomedusa sp. A CK-2014") to GenBank and thus the NCBI taxonomy. These names cannot be assigned to any accepted species names and thus create a disconnect between DNA sequences and species. While these names of unknown taxonomic meaning sometimes get updated, often they remain in GenBank which now contains sequences from ~1,300 such "putative" reptile species tagged by informal names (~15% of its reptile names). We estimate that NCBI/GenBank probably contain tens of thousands of such "disconnected" entries. We encourage sequence submitters to update informal species names after they have been published, otherwise the disconnect will cause increasing confusion and possibly misleading taxonomic conclusions.


Subject(s)
Databases, Genetic , Databases, Nucleic Acid , Reptiles/genetics , Animals , DNA
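
The informal-name problem described in the abstract above can be illustrated with a small heuristic check. This is only a minimal sketch: the function `is_informal`, the pattern `LATIN_NAME`, and the qualifier set `OPEN_NOMENCLATURE` are hypothetical names introduced here, and the rules are an assumption for illustration, not the procedure used by the authors or by NCBI Taxonomy curators.

```python
import re

# Assumed shape of a formal name: capitalized genus plus one or two
# lowercase epithets (binomial or trinomial). Real nomenclature has
# more cases than this heuristic covers.
LATIN_NAME = re.compile(r"^[A-Z][a-z]+( [a-z]+(-[a-z]+)?){1,2}$")

# Assumed list of open-nomenclature qualifiers that mark a name as provisional.
OPEN_NOMENCLATURE = {"sp", "spp", "cf", "aff", "nr"}


def is_informal(name: str) -> bool:
    """Return True when a name looks like a preliminary, informal label."""
    cleaned = name.strip()
    tokens = {token.rstrip(".").lower() for token in cleaned.split()}
    if tokens & OPEN_NOMENCLATURE:
        return True
    return LATIN_NAME.match(cleaned) is None


names = ["Pelomedusa sp. A CK-2014", "Pelomedusa subrufa"]
flagged = [name for name in names if is_informal(name)]
```

Here `flagged` would hold only the provisional label quoted in the abstract; a submitter could run a check of this kind over their own records to find names that still need updating after publication.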
3.
Nucleic Acids Res ; 33(12): 3875-96, 2005.
Article in English | MEDLINE | ID: mdl-16027112

ABSTRACT

We report an in-depth computational study of the protein sequences and structures of the superfamily of archaeo-eukaryotic primases (AEPs). This analysis greatly expands the range of diversity of the AEPs and reveals the unique active site shared by all members of this superfamily. In particular, it is shown that eukaryotic nucleo-cytoplasmic large DNA viruses, including poxviruses, asfarviruses, iridoviruses, phycodnaviruses and the mimivirus, encode AEPs of a distinct family, which also includes the herpesvirus primases whose relationship to AEPs has not been recognized previously. Many eukaryotic genomes, including chordates and plants, encode previously uncharacterized homologs of these predicted viral primases, which might be involved in novel DNA repair pathways. At a deeper level of evolutionary connections, structural comparisons indicate that AEPs, the nucleases involved in the initiation of rolling circle replication in plasmids and viruses, and origin-binding domains of papilloma and polyoma viruses evolved from a common ancestral protein that might have been involved in a protein-priming mechanism of initiation of DNA replication. Contextual analysis of multidomain protein architectures and gene neighborhoods in prokaryotes and viruses reveals remarkable parallels between AEPs and the unrelated DnaG-type primases, in particular, tight associations with the same repertoire of helicases. These observations point to a functional equivalence of the two classes of primases, which seem to have repeatedly displaced each other in various extrachromosomal replicons.


Subject(s)
DNA Primase/chemistry , DNA Primase/classification , Evolution, Molecular , Amino Acid Sequence , Archaea/enzymology , Bacteria/enzymology , Catalytic Domain , Computational Biology , DNA Helicases/chemistry , DNA Primase/genetics , DNA Replication , DNA Viruses/enzymology , DNA-Directed DNA Polymerase/chemistry , Eukaryotic Cells/enzymology , Molecular Sequence Data , Operon , Phylogeny , Protein Structure, Tertiary , Sequence Alignment , Sequence Analysis, Protein , Viral Proteins
4.
J Mol Biol ; 343(1): 1-28, 2004 Oct 08.
Article in English | MEDLINE | ID: mdl-15381417

ABSTRACT

Using sequence profile analysis and sequence-based structure predictions, we define a previously unrecognized, widespread class of P-loop NTPases. The signal transduction ATPases with numerous domains (STAND) class includes the AP-ATPases (animal apoptosis regulators CED4/Apaf-1, plant disease resistance proteins, and bacterial AfsR-like transcription regulators) and NACHT NTPases (e.g. NAIP, TLP1, Het-E-1) that have been studied extensively in the context of apoptosis, pathogen response in animals and plants, and transcriptional regulation in bacteria. We show that, in addition to these well-characterized protein families, the STAND class includes several other groups of (predicted) NTPase domains from diverse signaling and transcription regulatory proteins from bacteria and eukaryotes, and three Archaea-specific families. We identified the STAND domain in several biologically well-characterized proteins that have not been suspected to have NTPase activity, including soluble adenylyl cyclases, nephrocystin 3 (implicated in polycystic kidney disease), and Rolling pebble (a regulator of muscle development); these findings are expected to facilitate elucidation of the functions of these proteins. The STAND class belongs to the additional strand, catalytic E division of P-loop NTPases together with the AAA+ ATPases, RecA/helicase-related ATPases, ABC-ATPases, and VirD4/PilT-like ATPases. The STAND proteins are distinguished from other P-loop NTPases by the presence of unique sequence motifs associated with the N-terminal helix and the core strand-4, as well as a C-terminal helical bundle that is fused to the NTPase domain. This helical module contains a signature GxP motif in the loop between the two distal helices. With the exception of the archaeal families, almost all STAND NTPases are multidomain proteins containing three or more domains. 
In addition to the NTPase domain, these proteins typically contain DNA-binding or protein-binding domains, superstructure-forming repeats, such as WD40 and TPR, and enzymatic domains involved in signal transduction, including adenylate cyclases and kinases. By analogy to the AAA+ ATPases, it can be predicted that STAND NTPases use the C-terminal helical bundle as a "lever" to transmit the conformational changes brought about by NTP hydrolysis to effector domains. STAND NTPases represent a novel paradigm in signal transduction, whereby adaptor, regulatory switch, scaffolding, and, in some cases, signal-generating moieties are combined into a single polypeptide. The STAND class consists of 14 distinct families, and the evolutionary history of most of these families is riddled with dramatic instances of lineage-specific expansion and apparent horizontal gene transfer. The STAND NTPases are most abundant in developmentally and organizationally complex prokaryotes and eukaryotes. Transfer of genes for STAND NTPases from bacteria to eukaryotes on several occasions might have played a significant role in the evolution of eukaryotic signaling systems.


Subject(s)
Apoptosis , Gene Transfer, Horizontal , Nucleoside-Triphosphatase/genetics , Phylogeny , Plants/enzymology , Adenosine Triphosphatases/chemistry , Adenosine Triphosphatases/classification , Adenosine Triphosphatases/genetics , Adenosine Triphosphatases/physiology , Animals , Evolution, Molecular , Gene Expression Regulation, Enzymologic , Humans , Nucleoside-Triphosphatase/chemistry , Protein Structure, Tertiary/genetics , Sequence Homology, Amino Acid
6.
Genome Biol ; 5(5): R30, 2004.
Article in English | MEDLINE | ID: mdl-15128444

ABSTRACT

BACKGROUND: Recent sequence-structure studies on P-loop-fold NTPases have substantially advanced the existing understanding of their evolution and functional diversity. These studies provide a framework for characterization of novel lineages within this fold and prediction of their functional properties. RESULTS: Using sequence profile searches and homology-based structure prediction, we have identified a previously uncharacterized family of P-loop NTPases, which includes the neuronal membrane protein and receptor tyrosine kinase substrate Kidins220/ARMS, which is conserved in animals, the F-plasmid PifA protein involved in phage T7 exclusion, and several uncharacterized bacterial proteins. We refer to these (predicted) NTPases as the KAP family, after Kidins220/ARMS and PifA. The KAP family NTPases are sporadically distributed across a wide phylogenetic range in bacteria but among the eukaryotes are represented only in animals. Many of the prokaryotic KAP NTPases are encoded in plasmids and tend to undergo disruption to form pseudogenes. A unique feature of all eukaryotic and certain bacterial KAP NTPases is the presence of two or four transmembrane helices inserted into the P-loop NTPase domain. These transmembrane helices anchor KAP NTPases in the membrane such that the P-loop domain is located on the intracellular side. We show that the KAP family belongs to the same major division of the P-loop NTPase fold with the AAA+, ABC, RecA-like, VirD4-like, PilT-like, and AP/NACHT-like NTPase classes. In addition to the KAP family, we identified another small family of predicted bacterial NTPases, with two transmembrane helices inserted into the P-loop domain. This family is not specifically related to the KAP NTPases, suggesting independent acquisition of the transmembrane helices. 
CONCLUSIONS: We predict that KAP family NTPases function principally in the NTP-dependent dynamics of protein complexes, especially those associated with the intracellular surface of cell membranes. Animal KAP NTPases, including Kidins220/ARMS, are likely to function as NTP-dependent regulators of the assembly of membrane-associated signaling complexes involved in neurite growth and development. One possible function of the prokaryotic KAP NTPases might be in the exclusion of selfish replicons, such as viruses, from the host cells. Phylogenetic analysis and phyletic patterns suggest that the common ancestor of the animals acquired a KAP NTPase via lateral transfer from bacteria. However, an earlier transfer into eukaryotes followed by multiple losses in several eukaryotic lineages cannot be ruled out.


Subject(s)
Catalytic Domain/genetics , Membrane Proteins/chemistry , Multigene Family/genetics , Nucleoside-Triphosphatase/genetics , Peptides/genetics , Phylogeny , Adenosine Triphosphatases/classification , Adenosine Triphosphatases/genetics , Adenosine Triphosphatases/physiology , Animals , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Caenorhabditis elegans Proteins/chemistry , Caenorhabditis elegans Proteins/genetics , Databases, Protein , Drosophila Proteins/chemistry , Drosophila Proteins/genetics , Evolution, Molecular , Insect Proteins/chemistry , Insect Proteins/genetics , Membrane Proteins/genetics , Mutagenesis, Insertional/genetics , Nucleoside-Triphosphatase/chemistry , Predictive Value of Tests , Protein Structure, Tertiary/genetics , Sequence Homology, Amino Acid , Zebrafish Proteins/chemistry , Zebrafish Proteins/genetics
7.
J Struct Biol ; 146(1-2): 11-31, 2004.
Article in English | MEDLINE | ID: mdl-15037234

ABSTRACT

The AAA+ ATPases are enzymes containing a P-loop NTPase domain, and function as molecular chaperones, ATPase subunits of proteases, helicases or nucleic-acid-stimulated ATPases. All available sequences and structures of AAA+ protein domains were compared with the aim of identifying the definitive sequence and structure features of these domains and inferring the principal events in their evolution. An evolutionary classification of the AAA+ class was developed using standard phylogenetic methods, analysis of shared sequence and structural signatures, and similarity-based clustering. This analysis resulted in the identification of 26 major families within the AAA+ ATPase class. We also describe the position of the AAA+ ATPases with respect to the RecA/F1, helicase superfamilies I/II, PilT, and ABC classes of P-loop NTPases. The AAA+ class appears to have undergone an early radiation into the clamp-loader, DnaA/Orc/Cdc6, classic AAA, and "pre-sensor 1 beta-hairpin" (PS1BH) clades. Within the PS1BH clade, chelatases, MoxR, YifB, McrB, Dynein-midasin, NtrC, and MCMs form a monophyletic assembly defined by a distinct insert in helix-2 of the conserved ATPase core, and additional helical segment between the core ATPase domain and the C-terminal alpha-helical bundle. At least 6 distinct AAA+ proteins, which represent the different major clades, are traceable to the last universal common ancestor (LUCA) of extant cellular life. Additionally, superfamily III helicases, which belong to the PS1BH assemblage, were probably present at this stage in virus-like "selfish" replicons. The next major radiation, at the base of the two prokaryotic kingdoms, bacteria and archaea, gave rise to several distinct chaperones, ATPase subunits of proteases, DNA helicases, and transcription factors. The third major radiation, at the outset of eukaryotic evolution, contributed to the origin of several eukaryote-specific adaptations related to nuclear and cytoskeletal functions. 
The new relationships and previously undetected domains reported here might provide new leads for investigating the biology of AAA+ ATPases.


Subject(s)
Adenosine Triphosphatases/classification , Adenosine Triphosphatases/genetics , Amino Acid Sequence , Evolution, Molecular , Classification , Computational Biology , Phylogeny , Protein Conformation , Sequence Homology , Structural Homology, Protein
8.
J Mol Biol ; 333(4): 781-815, 2003 Oct 31.
Article in English | MEDLINE | ID: mdl-14568537

ABSTRACT

Sequences and structures of all P-loop-fold proteins were compared with the aim of reconstructing the principal events in the evolution of P-loop-containing kinases. It is shown that kinases and some related proteins comprise a monophyletic assemblage within the P-loop NTPase fold. An evolutionary classification of these proteins was developed using standard phylogenetic methods, analysis of shared sequence and structural signatures, and similarity-based clustering. This analysis resulted in the identification of approximately 40 distinct protein families within the P-loop kinase class. Most of these enzymes phosphorylate nucleosides and nucleotides, as well as sugars, coenzyme precursors, adenosine 5'-phosphosulfate and polynucleotides. In addition, the class includes sulfotransferases, amide bond ligases, pyrimidine and dihydrofolate reductases, and several other families of enzymes that have acquired new catalytic capabilities distinct from the ancestral kinase reaction. Our reconstruction of the early history of the P-loop NTPase fold includes the initial split into the common ancestor of the kinase and the GTPase classes, and the common ancestor of ATPases. This was followed by the divergence of the kinases, which primarily phosphorylated nucleoside monophosphates (NMP), but could have had broader specificity. We provide evidence for the presence of at least two to four distinct P-loop kinases, including distinct forms specific for dNMP and rNMP, and related enzymes in the last universal common ancestor of all extant life forms. Subsequent evolution of kinases seems to have been dominated by the emergence of new bacterial and, to a lesser extent, archaeal families. Some of these enzymes retained their kinase activity but evolved new substrate specificities, whereas others acquired new activities, such as sulfate transfer and reduction. 
Eukaryotes appear to have acquired most of their kinases via horizontal gene transfer from Bacteria, partly from the mitochondrial and chloroplast endosymbionts and partly at later stages of evolution. A distinct superfamily of kinases, which we designated DxTN after its sequence signature, appears to have evolved in selfish replicons, such as bacteriophages, and was subsequently widely recruited by eukaryotes for multiple functions related to nucleic acid processing and general metabolism. In the course of this analysis, several previously undetected groups of predicted kinases were identified, including widespread archaeo-eukaryotic and archaeal families. The results could serve as a framework for systematic experimental characterization of new biochemical and biological functions of kinases.


Subject(s)
Phosphotransferases/classification , Phosphotransferases/genetics , Amino Acid Sequence , Animals , Conserved Sequence , Evolution, Molecular , Humans , Models, Molecular , Molecular Sequence Data , Phosphotransferases/chemistry , Phosphotransferases/metabolism , Phylogeny , Protein Folding , Protein Structure, Secondary , Sequence Alignment
9.
Mol Biol Evol ; 19(10): 1782-91, 2002 Oct.
Article in English | MEDLINE | ID: mdl-12270904

ABSTRACT

Diplomonads, such as Giardia, and their close relatives retortamonads have been proposed as early-branching eukaryotes that diverged before the acquisition-retention of mitochondria, and they have become key organisms in attempts to understand the evolution of eukaryotic cells. In this phylogenetic study we focus on a series of eukaryotes suggested to be relatives of diplomonads on morphological grounds, the "excavate taxa". Phylogenies of small subunit ribosomal RNA (SSU rRNA) genes, alpha-tubulin, beta-tubulin, and combined alpha- + beta-tubulin all scatter the various excavate taxa across the diversity of eukaryotes. But all phylogenies place the excavate taxon Carpediemonas as the closest relative of diplomonads (and, where data are available, retortamonads). This novel relationship is recovered across phylogenetic methods and across various taxon-deletion experiments. Statistical support is strongest under maximum-likelihood (ML) (when among-site rate variation is modeled) and when the most divergent diplomonad sequences are excluded, suggesting a true relationship rather than an artifact of long-branch attraction. When all diplomonads are excluded, our ML SSU rRNA tree actually places retortamonads and Carpediemonas away from the base of the eukaryotes. The branches separating excavate taxa are mostly not well supported (especially in analyses of SSU rRNA data). Statistical tests of the SSU rRNA data, including an "expected likelihood weights" approach, do not reject trees where excavate taxa are constrained to be a clade (with or without parabasalids and Euglenozoa). Although diplomonads and retortamonads lack any mitochondria-like organelle, Carpediemonas contains double membrane-bounded structures physically resembling hydrogenosomes. The phylogenetic position of Carpediemonas suggests that it will be valuable in interpreting the evolutionary significance of many molecular and cellular peculiarities of diplomonads.


Subject(s)
Diplomonadida/classification , Diplomonadida/genetics , Evolution, Molecular , Giardia/classification , Giardia/genetics , Animals , Base Sequence , Genes, Protozoan , Molecular Sequence Data , Phylogeny , RNA, Protozoan/genetics , RNA, Ribosomal/genetics , Tubulin/genetics
10.
J Mol Biol ; 317(1): 41-72, 2002 Mar 15.
Article in English | MEDLINE | ID: mdl-11916378

ABSTRACT

Sequences and available structures were compared for all the widely distributed representatives of the P-loop GTPases and GTPase-related proteins with the aim of constructing an evolutionary classification for this superclass of proteins and reconstructing the principal events in their evolution. The GTPase superclass can be divided into two large classes, each of which has a unique set of sequence and structural signatures (synapomorphies). The first class, designated TRAFAC (after translation factors) includes enzymes involved in translation (initiation, elongation, and release factors), signal transduction (in particular, the extended Ras-like family), cell motility, and intracellular transport. The second class, designated SIMIBI (after signal recognition particle, MinD, and BioD), consists of signal recognition particle (SRP) GTPases, the assemblage of MinD-like ATPases, which are involved in protein localization, chromosome partitioning, and membrane transport, and a group of metabolic enzymes with kinase or related phosphate transferase activity. These two classes together contain over 20 distinct families that are further subdivided into 57 subfamilies (ancient lineages) on the basis of conserved sequence motifs, shared structural features, and domain architectures. Ten subfamilies show a universal phyletic distribution compatible with presence in the last universal common ancestor of the extant life forms (LUCA). These include four translation factors, two OBG-like GTPases, the YawG/YlqF-like GTPases (these two subfamilies also consist of predicted translation factors), the two signal-recognition-associated GTPases, and the MRP subfamily of MinD-like ATPases. The distribution of nucleotide specificity among the proteins of the GTPase superclass indicates that the common ancestor of the entire superclass was a GTPase and that a secondary switch to ATPase activity has occurred on several independent occasions during evolution. 
The functions of most GTPases that are traceable to LUCA are associated with translation. However, in contrast to other superclasses of P-loop NTPases (RecA-F1/F0, AAA+, helicases, ABC), GTPases do not participate in NTP-dependent nucleic acid unwinding and reorganizing activities. Hence, we hypothesize that the ancestral GTPase was an enzyme with a generic regulatory role in translation, with subsequent diversification resulting in acquisition of diverse functions in transport, protein trafficking, and signaling. In addition to the classification of previously known families of GTPases and related ATPases, we introduce several previously undetected families and describe new functional predictions.


Subject(s)
Adenosine Triphosphatases/chemistry , Adenosine Triphosphatases/classification , Evolution, Molecular , GTP Phosphohydrolases/chemistry , GTP Phosphohydrolases/classification , Amino Acid Sequence , Animals , Computational Biology , Conserved Sequence , GTP Phosphohydrolase-Linked Elongation Factors/chemistry , GTP Phosphohydrolase-Linked Elongation Factors/classification , Heterotrimeric GTP-Binding Proteins/chemistry , Heterotrimeric GTP-Binding Proteins/classification , Humans , Kinesins/chemistry , Kinesins/classification , Models, Molecular , Molecular Sequence Data , Monomeric GTP-Binding Proteins/chemistry , Monomeric GTP-Binding Proteins/classification , Multigene Family/genetics , Myosins/chemistry , Myosins/classification , Phylogeny , Protein Conformation , Sequence Alignment , Signal Recognition Particle/chemistry
11.
Nucleic Acids Res ; 30(1): 13-6, 2002 Jan 01.
Article in English | MEDLINE | ID: mdl-11752242

ABSTRACT

In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI's web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, Human-Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov.


Subject(s)
Biotechnology , Databases, Genetic , Amino Acid Sequence , Animals , Base Sequence , Chromosome Aberrations , Chromosomes , Conserved Sequence , Gene Expression Profiling , Genome , Genome, Human , Humans , Information Storage and Retrieval , National Library of Medicine (U.S.) , Polymorphism, Single Nucleotide , Protein Structure, Tertiary , RNA, Messenger/genetics , Sequence Homology , United States