Search | VHL Regional Portal

1.

GenBank.

Sayers, Eric W; Cavanaugh, Mark; Clark, Karen; Ostell, James; Pruitt, Kim D; Karsch-Mizrachi, Ilene.

Nucleic Acids Res ; 48(D1): D84-D86, 2020 01 08.

Article in English | MEDLINE | ID: mdl-31665464

ABSTRACT

GenBank® (www.ncbi.nlm.nih.gov/genbank/) is a comprehensive, public database that contains over 6.25 trillion base pairs from over 1.6 billion nucleotide sequences for 450 000 formally described species. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. Recent updates include a new version of Genome Workbench that supports GenBank submissions, new submission wizards for viral genomes, enhancements to BankIt and improved handling of taxonomy for sequences from pathogens.

Subject(s)

Computational Biology/methods , Databases, Nucleic Acid , Genomics/methods , Software , Molecular Sequence Annotation , National Institutes of Health (U.S.) , United States , Web Browser

2.

Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Beck, Jeff; Brister, J Rodney; Bolton, Evan E; Canese, Kathi; Comeau, Donald C; Funk, Kathryn; Ketter, Anne; Kim, Sunghwan; Kimchi, Avi; Kitts, Paul A; Kuznetsov, Anatoliy; Lathrop, Stacy; Lu, Zhiyong; McGarvey, Kelly; Madden, Thomas L; Murphy, Terence D; O'Leary, Nuala; Phan, Lon; Schneider, Valerie A; Thibaud-Nissen, Françoise; Trawick, Bart W; Pruitt, Kim D; Ostell, James.

Nucleic Acids Res ; 48(D1): D9-D16, 2020 01 08.

Article in English | MEDLINE | ID: mdl-31602479

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Custom implementations of the BLAST program provide sequence-based searching of many specialized datasets. New resources released in the past year include a new PubMed interface, a sequence database search and a gene orthologs page. Additional resources that were updated in the past year include PMC, Bookshelf, My Bibliography, Assembly, RefSeq, viral genomes, the prokaryotic genome annotation pipeline, Genome Workbench, dbSNP, BLAST, Primer-BLAST, IgBLAST and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

Subject(s)

Computational Biology/methods , Computational Biology/organization & administration , Databases, Genetic , National Library of Medicine (U.S.) , Databases, Nucleic Acid , Genomics/methods , Humans , PubMed , United States , Web Browser
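
The abstract of this record names the E-utilities as the programming interface for the Entrez system. As a minimal sketch of what a call might look like, the Python below builds an ESearch request URL without sending it. The helper name `build_esearch_url`, the default `retmax` value and the example query are illustrative assumptions, not taken from the article.

```python
from urllib.parse import urlencode

# Base URL of the public E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an ESearch URL that queries one Entrez database.

    Hypothetical helper for illustration: it only assembles the URL,
    so it can be inspected or tested without network access.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Illustrative query: PubMed citations with "GenBank" in the title.
    print(build_esearch_url("pubmed", "GenBank[Title]"))
```

The resulting URL can be passed to any HTTP client. Keeping URL construction separate from fetching makes the query easy to log and review before it is sent.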

3.

Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Agarwala, Richa; Bolton, Evan E; Brister, J Rodney; Canese, Kathi; Clark, Karen; Connor, Ryan; Fiorini, Nicolas; Funk, Kathryn; Hefferon, Timothy; Holmes, J Bradley; Kim, Sunghwan; Kimchi, Avi; Kitts, Paul A; Lathrop, Stacy; Lu, Zhiyong; Madden, Thomas L; Marchler-Bauer, Aron; Phan, Lon; Schneider, Valerie A; Schoch, Conrad L; Pruitt, Kim D; Ostell, James.

Nucleic Acids Res ; 47(D1): D23-D28, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30395293

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 38 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. New resources released in the past year include PubMed Labs and a new sequence database search. Resources that were updated in the past year include PubMed, PMC, Bookshelf, genome data viewer, Assembly, prokaryotic genomes, Genome, BioProject, dbSNP, dbVar, BLAST databases, igBLAST, iCn3D and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

Subject(s)

Biotechnology/organization & administration , Databases, Genetic , Animals , Biotechnology/methods , Databases, Chemical , Humans , Software , United States/epidemiology , Web Browser

4.

GenBank.

Sayers, Eric W; Cavanaugh, Mark; Clark, Karen; Ostell, James; Pruitt, Kim D; Karsch-Mizrachi, Ilene.

Nucleic Acids Res ; 47(D1): D94-D99, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30365038

ABSTRACT

GenBank® (www.ncbi.nlm.nih.gov/genbank/) is a comprehensive database that contains publicly available nucleotide sequences for 420 000 formally described species. Most GenBank submissions are made using BankIt, the NCBI Submission Portal, or the tool tbl2asn, and are obtained from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Nucleotide database, which links to related information such as taxonomy, genomes, protein sequences and structures, and biomedical journal literature in PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. Recent updates include an expansion of sequence identifier formats to accommodate expected database growth, submission wizards for ribosomal RNA, and the transfer of Expressed Sequence Tag (EST) and Genome Survey Sequence (GSS) data into the Nucleotide database.

Subject(s)

Databases, Nucleic Acid , Web Browser , Computational Biology/methods , Databases, Nucleic Acid/trends , Genomics/methods , Humans , Information Storage and Retrieval , Software Design

5.

Best Match: New relevance search for PubMed.

Fiorini, Nicolas; Canese, Kathi; Starchenko, Grisha; Kireev, Evgeny; Kim, Won; Miller, Vadim; Osipov, Maxim; Kholodov, Michael; Ismagilov, Rafis; Mohan, Sunil; Ostell, James; Lu, Zhiyong.

PLoS Biol ; 16(8): e2005343, 2018 08.

Article in English | MEDLINE | ID: mdl-30153250

ABSTRACT

PubMed is a free search engine for biomedical literature accessed by millions of users from around the world each day. With the rapid growth of biomedical literature (about two articles are added every minute on average), finding and retrieving the most relevant papers for a given query is increasingly challenging. We present Best Match, a new relevance search algorithm for PubMed that leverages the intelligence of our users and cutting-edge machine-learning technology as an alternative to the traditional date sort order. The Best Match algorithm is trained with past user searches with dozens of relevance-ranking signals (factors), the most important being the past usage of an article, publication date, relevance score, and type of article. This new algorithm demonstrates state-of-the-art retrieval performance in benchmarking experiments as well as an improved user experience in real-world testing (over 20% increase in user click-through rate). Since its deployment in June 2017, we have observed a significant increase (60%) in PubMed searches with relevance sort order: it now assists millions of PubMed searches each week. In this work, we hope to increase the awareness and transparency of this new relevance sort option for PubMed users, enabling them to retrieve information more effectively.

Subject(s)

Data Mining/methods , Information Storage and Retrieval/methods , Algorithms , Humans , MEDLINE , Machine Learning , PubMed , Publications , Search Engine

6.

GenBank.

Benson, Dennis A; Cavanaugh, Mark; Clark, Karen; Karsch-Mizrachi, Ilene; Ostell, James; Pruitt, Kim D; Sayers, Eric W.

Nucleic Acids Res ; 46(D1): D41-D47, 2018 01 04.

Article in English | MEDLINE | ID: mdl-29140468

ABSTRACT

GenBank® (www.ncbi.nlm.nih.gov/genbank/) is a comprehensive database that contains publicly available nucleotide sequences for 400 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun and environmental sampling projects. Most submissions are made using BankIt, the National Center for Biotechnology Information (NCBI) Submission Portal, or the tool tbl2asn. GenBank staff assign accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the NCBI Nucleotide database, which links to related information such as taxonomy, genomes, protein sequences and structures, and biomedical journal literature in PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. Recent updates include changes to sequence identifiers, submission wizards for 16S and Influenza sequences, and an Identical Protein Groups resource.

Subject(s)

Databases, Nucleic Acid , Animals , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Databases, Nucleic Acid/trends , Europe , Genomics , Humans , Information Dissemination , Information Storage and Retrieval , Internet , Japan , National Library of Medicine (U.S.) , Orthomyxoviridae/genetics , Proteomics , RNA, Ribosomal/genetics , Sequence Alignment , United States

7.

GenBank.

Benson, Dennis A; Cavanaugh, Mark; Clark, Karen; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W.

Nucleic Acids Res ; 45(D1): D37-D42, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27899564

ABSTRACT

GenBank® (www.ncbi.nlm.nih.gov/genbank/) is a comprehensive database that contains publicly available nucleotide sequences for 370 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or the NCBI Submission Portal. GenBank staff assign accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Nucleotide database, which links to related information such as taxonomy, genomes, protein sequences and structures, and biomedical journal literature in PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. Recent updates include changes to policies regarding sequence identifiers, an improved 16S submission wizard, targeted loci studies, the ability to submit methylation and BioNano mapping files, and a database of anti-microbial resistance genes.

Subject(s)

Databases, Nucleic Acid , Sequence Analysis, DNA , Animals , DNA Methylation , Genome, Bacterial , Genomics , Humans , RNA, Ribosomal, 16S/genetics , beta-Lactamases/genetics

8.

NCBI prokaryotic genome annotation pipeline.

Tatusova, Tatiana; DiCuccio, Michael; Badretdin, Azat; Chetvernin, Vyacheslav; Nawrocki, Eric P; Zaslavsky, Leonid; Lomsadze, Alexandre; Pruitt, Kim D; Borodovsky, Mark; Ostell, James.

Nucleic Acids Res ; 44(14): 6614-24, 2016 08 19.

Article in English | MEDLINE | ID: mdl-27342282

ABSTRACT

Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, the new NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, while it relies more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/.

Subject(s)

Genome, Bacterial , Molecular Sequence Annotation , Prokaryotic Cells/metabolism , Bacteria/genetics , Bacterial Proteins/chemistry , Databases, Nucleic Acid , Genes, Bacterial

9.

GenBank.

Clark, Karen; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W.

Nucleic Acids Res ; 44(D1): D67-72, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26590407

ABSTRACT

GenBank® (www.ncbi.nlm.nih.gov/genbank/) is a comprehensive database that contains publicly available nucleotide sequences for over 340 000 formally described species. Recent developments include a new starting page for submitters, a shift toward using accession.version identifiers rather than GI numbers, a wizard for submitting 16S rRNA sequences, and an Identical Protein Report to address growing issues of data redundancy. GenBank organizes the sequence data received from individual laboratories and large-scale sequencing projects into 18 divisions, and GenBank staff assign unique accession.version identifiers upon data receipt. Most submitters use the web-based BankIt or standalone Sequin programs. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the nuccore, nucest, and nucgss databases of the Entrez retrieval system, which integrates these records with a variety of other data including taxonomy nodes, genomes, protein structures, and biomedical journal literature in PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP.

Subject(s)

Databases, Nucleic Acid , Sequence Analysis, DNA , Proteins/genetics , RNA, Ribosomal, 16S/genetics
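
The abstract of this record mentions a shift toward accession.version identifiers. As a rough sketch, assuming such an identifier is an accession followed by a dot and an integer version, the Python below splits one into its two parts. The function name and the example identifier `AB000001.1` are hypothetical and serve only as illustration.

```python
from typing import NamedTuple


class AccessionVersion(NamedTuple):
    accession: str
    version: int


def parse_accession_version(identifier: str) -> AccessionVersion:
    """Split an 'accession.version' string at its last dot.

    Assumption for this sketch: the version is a plain ASCII integer
    and the accession part is non-empty.
    """
    accession, sep, version = identifier.strip().rpartition(".")
    if not sep or not accession or not (version.isascii() and version.isdigit()):
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return AccessionVersion(accession, int(version))


if __name__ == "__main__":
    # Hypothetical identifier used only as an example.
    print(parse_accession_version("AB000001.1"))
```

Splitting at the last dot rather than the first keeps the sketch tolerant of accessions that might themselves contain a dot, although whether any do is not stated in the abstract.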

10.

GenBank.

Benson, Dennis A; Clark, Karen; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W.

Nucleic Acids Res ; 43(Database issue): D30-5, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25414350

ABSTRACT

GenBank® (http://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive database that contains publicly available nucleotide sequences for over 300 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assign accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP.

Subject(s)

Databases, Nucleic Acid , Bacteria/classification , Genomics , Internet , Sequence Analysis, DNA , Sequence Analysis, Protein

11.

Toward richer metadata for microbial sequences: replacing strain-level NCBI taxonomy taxids with BioProject, BioSample and Assembly records.

Federhen, Scott; Clark, Karen; Barrett, Tanya; Parkinson, Helen; Ostell, James; Kodama, Yuichi; Mashima, Jun; Nakamura, Yasukazu; Cochrane, Guy; Karsch-Mizrachi, Ilene.

Stand Genomic Sci ; 9(3): 1275-7, 2014 Jun 15.

Article in English | MEDLINE | ID: mdl-25197497

ABSTRACT

Microbial genome sequence submissions to the International Nucleotide Sequence Database Collaboration (INSDC) have been annotated with organism names that include the strain identifier. Each of these strain-level names has been assigned a unique 'taxid' in the NCBI Taxonomy Database. With the significant growth in genome sequencing, it is not possible to continue with the curation of strain-level taxids. In January 2014, NCBI will cease assigning strain-level taxids. Instead, submitters are encouraged to provide strain information and rich metadata with their submission to the sequence database, BioProject and BioSample.

12.

RefSeq: an update on mammalian reference sequences.

Pruitt, Kim D; Brown, Garth R; Hiatt, Susan M; Thibaud-Nissen, Françoise; Astashyn, Alexander; Ermolaeva, Olga; Farrell, Catherine M; Hart, Jennifer; Landrum, Melissa J; McGarvey, Kelly M; Murphy, Michael R; O'Leary, Nuala A; Pujar, Shashikant; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Shkeda, Andrei; Sun, Hanzhen; Tamez, Pamela; Tully, Raymond E; Wallin, Craig; Webb, David; Weber, Janet; Wu, Wendy; DiCuccio, Michael; Kitts, Paul; Maglott, Donna R; Murphy, Terence D; Ostell, James M.

Nucleic Acids Res ; 42(Database issue): D756-63, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24259432

ABSTRACT

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI's eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI's eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.

Subject(s)

Databases, Genetic , Genomics , Mammals/genetics , Animals , Eukaryota/genetics , Exons , Genome , Genomics/standards , Humans , Internet , Molecular Sequence Annotation , Proteins/chemistry , Proteins/genetics , RNA/chemistry , Reference Standards

13.

Current status and new features of the Consensus Coding Sequence database.

Farrell, Catherine M; O'Leary, Nuala A; Harte, Rachel A; Loveland, Jane E; Wilming, Laurens G; Wallin, Craig; Diekhans, Mark; Barrell, Daniel; Searle, Stephen M J; Aken, Bronwen; Hiatt, Susan M; Frankish, Adam; Suner, Marie-Marthe; Rajput, Bhanu; Steward, Charles A; Brown, Garth R; Bennett, Ruth; Murphy, Michael; Wu, Wendy; Kay, Mike P; Hart, Jennifer; Rajan, Jeena; Weber, Janet; Snow, Catherine; Riddick, Lillian D; Hunt, Toby; Webb, David; Thomas, Mark; Tamez, Pamela; Rangwala, Sanjida H; McGarvey, Kelly M; Pujar, Shashikant; Shkeda, Andrei; Mudge, Jonathan M; Gonzalez, Jose M; Gilbert, James G R; Trevanion, Stephen J; Baertsch, Robert; Harrow, Jennifer L; Hubbard, Tim; Ostell, James M; Haussler, David; Pruitt, Kim D.

Nucleic Acids Res ; 42(Database issue): D865-72, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24217909

ABSTRACT

The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.

Subject(s)

Databases, Genetic , Proteins/genetics , Animals , Exons , Genomics , Humans , Internet , Mice , Molecular Sequence Annotation , Sequence Analysis

14.

GenBank.

Benson, Dennis A; Clark, Karen; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W.

Nucleic Acids Res ; 42(Database issue): D32-7, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24217914

ABSTRACT

GenBank is a comprehensive database that contains publicly available nucleotide sequences for over 280,000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assign accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the National Center for Biotechnology Information (NCBI) Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI home page: www.ncbi.nlm.nih.gov.

Subject(s)

Databases, Nucleic Acid , Sequence Analysis, DNA , Bacteria/classification , Bacteria/genetics , High-Throughput Nucleotide Sequencing , Internet , Molecular Sequence Annotation

15.

IgBLAST: an immunoglobulin variable domain sequence analysis tool.

Ye, Jian; Ma, Ning; Madden, Thomas L; Ostell, James M.

Nucleic Acids Res ; 41(Web Server issue): W34-40, 2013 Jul.

Article in English | MEDLINE | ID: mdl-23671333

ABSTRACT

The variable domain of an immunoglobulin (IG) sequence is encoded by multiple genes, including the variable (V) gene, the diversity (D) gene and the joining (J) gene. Analysis of IG sequences typically requires identification of each gene, as well as a comparison of sequence variations in the context of defined regions. General purpose tools, such as the BLAST program, have only limited use for such tasks, as the rearranged nature of an IG sequence and the variable length of each gene requires multiple rounds of BLAST searches for a single IG sequence. Additionally, manual assembly of different genes is difficult and error-prone. To address these issues and to facilitate other common tasks in analysing IG sequences, we have developed the sequence analysis tool IgBLAST (http://www.ncbi.nlm.nih.gov/igblast/). With this tool, users can view the matches to the germline V, D and J genes, details at rearrangement junctions, the delineation of IG V domain framework regions and complementarity determining regions. IgBLAST has the capability to analyse nucleotide and protein sequences and can process sequences in batches. Furthermore, IgBLAST allows searches against the germline gene databases and other sequence databases simultaneously to minimize the chance of missing possibly the best matching germline V gene.

Subject(s)

Immunoglobulin Variable Region/genetics , Sequence Alignment/methods , Software , Humans , Immunoglobulin Variable Region/chemistry , Internet , Sequence Analysis, DNA , Sequence Analysis, Protein , V(D)J Recombination

16.

The NIH genetic testing registry: a new, centralized database of genetic tests to enable access to comprehensive information and improve transparency.

Rubinstein, Wendy S; Maglott, Donna R; Lee, Jennifer M; Kattman, Brandi L; Malheiro, Adriana J; Ovetsky, Michael; Hem, Vichet; Gorelenkov, Viatcheslav; Song, Guangfeng; Wallin, Craig; Husain, Nora; Chitipiralla, Shanmuga; Katz, Kenneth S; Hoffman, Douglas; Jang, Wonhee; Johnson, Mark; Karmanov, Fedor; Ukrainchik, Alexander; Denisenko, Mikhail; Fomous, Cathy; Hudson, Kathy; Ostell, James M.

Nucleic Acids Res ; 41(Database issue): D925-35, 2013 Jan.

Article in English | MEDLINE | ID: mdl-23193275

ABSTRACT

The National Institutes of Health Genetic Testing Registry (GTR; available online at http://www.ncbi.nlm.nih.gov/gtr/) maintains comprehensive information about testing offered worldwide for disorders with a genetic basis. Information is voluntarily submitted by test providers. The database provides details of each test (e.g. its purpose, target populations, methods, what it measures, analytical validity, clinical validity, clinical utility, ordering information) and laboratory (e.g. location, contact information, certifications and licenses). Each test is assigned a stable identifier of the format GTR000000000, which is versioned when the submitter updates information. Data submitted by test providers are integrated with basic information maintained in National Center for Biotechnology Information's databases and presented on the web and through FTP (ftp.ncbi.nih.gov/pub/GTR/_README.html).

Subject(s)

Databases, Genetic , Genetic Testing , Registries , Genes , Genetic Variation , Humans , Internet , Phenotype
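
The abstract of this record states that each test receives a stable identifier of the format GTR000000000, which is versioned when the submitter updates information. A minimal sketch of checking that format follows. It assumes the version is written as a `.N` suffix, which the abstract does not spell out, and the function name and example identifiers are hypothetical.

```python
import re
from typing import Optional, Tuple

# "GTR" followed by nine digits, as stated in the abstract.
# The optional ".N" version suffix is an assumption of this sketch.
_GTR_ID = re.compile(r"(GTR\d{9})(?:\.(\d+))?")


def split_gtr_id(identifier: str) -> Tuple[str, Optional[int]]:
    """Return (stable_id, version); version is None when absent."""
    match = _GTR_ID.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not a GTR identifier: {identifier!r}")
    base, version = match.groups()
    return base, (int(version) if version is not None else None)


if __name__ == "__main__":
    # Hypothetical identifiers used only as examples.
    print(split_gtr_id("GTR000000000"))
    print(split_gtr_id("GTR000000001.2"))
```

Returning the unversioned part separately reflects the idea that the identifier stays stable while the version changes with each update.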

17.

GenBank.

Benson, Dennis A; Cavanaugh, Mark; Clark, Karen; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W.

Nucleic Acids Res ; 41(Database issue): D36-42, 2013 Jan.

Article in English | MEDLINE | ID: mdl-23193287

ABSTRACT

GenBank® (http://www.ncbi.nlm.nih.gov) is a comprehensive database that contains publicly available nucleotide sequences for almost 260 000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assigns accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI home page: www.ncbi.nlm.nih.gov.

Subject(s)

Base Sequence , Databases, Nucleic Acid , Genomics , High-Throughput Nucleotide Sequencing , Internet , Molecular Sequence Annotation , Sequence Analysis, DNA

18.

BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata.

Barrett, Tanya; Clark, Karen; Gevorgyan, Robert; Gorelenkov, Vyacheslav; Gribov, Eugene; Karsch-Mizrachi, Ilene; Kimelman, Michael; Pruitt, Kim D; Resenchuk, Sergei; Tatusova, Tatiana; Yaschenko, Eugene; Ostell, James.

Nucleic Acids Res ; 40(Database issue): D57-63, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22139929

ABSTRACT

As the volume and complexity of data sets archived at NCBI grow rapidly, so does the need to gather and organize the associated metadata. Although metadata has been collected for some archival databases, previously, there was no centralized approach at NCBI for collecting this information and using it across databases. The BioProject database was recently established to facilitate organization and classification of project data submitted to NCBI, EBI and DDBJ databases. It captures descriptive information about research projects that result in high volume submissions to archival databases, ties together related data across multiple archives and serves as a central portal by which to inform users of data availability. Concomitantly, the BioSample database is being developed to capture descriptive information about the biological samples investigated in projects. BioProject and BioSample records link to corresponding data stored in archival repositories. Submissions are supported by a web-based Submission Portal that guides users through a series of forms for input of rich metadata describing their projects and samples. Together, these databases offer improved ways for users to query, locate, integrate and interpret the masses of data held in NCBI's archival repositories. The BioProject and BioSample databases are available at http://www.ncbi.nlm.nih.gov/bioproject and http://www.ncbi.nlm.nih.gov/biosample, respectively.

Subject(s)

Databases, Genetic , Genomics , Internet , Systems Integration , Transcriptome , User-Computer Interface

19.

Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; Dicuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Krasnov, Sergey; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Karsch-Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian.

Nucleic Acids Res ; 40(Database issue): D13-25, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22140104

ABSTRACT

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

Subject(s)

Databases as Topic , Databases, Genetic , Databases, Protein , Gene Expression , Genomics , Internet , Models, Molecular , National Library of Medicine (U.S.) , Periodicals as Topic , PubMed , Sequence Alignment , Sequence Analysis, DNA , Sequence Analysis, Protein , Sequence Analysis, RNA , Small Molecule Libraries , United States

20.

GenBank.

Benson, Dennis A; Karsch-Mizrachi, Ilene; Clark, Karen; Lipman, David J; Ostell, James; Sayers, Eric W.

Nucleic Acids Res ; 40(Database issue): D48-53, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22144687

ABSTRACT

GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 250,000 formally described species. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI home page: www.ncbi.nlm.nih.gov.

Subject(s)

Databases, Nucleic Acid , Sequence Analysis, DNA , Genomics , High-Throughput Nucleotide Sequencing , Internet , Molecular Sequence Annotation , Sequence Analysis, RNA , User-Computer Interface
