Results 1 - 11 of 11
1.
Sci Data ; 11(1): 732, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38969627

ABSTRACT

To explore complex biological questions, it is often necessary to access various data types from public data repositories. As the volume and complexity of biological sequence data grow, public repositories face significant challenges in ensuring that the data is easily discoverable and usable by the biological research community. To address these challenges, the National Center for Biotechnology Information (NCBI) has created NCBI Datasets. This resource provides straightforward, comprehensive, and scalable access to biological sequences, annotations, and metadata for a wide range of taxa. Following the FAIR (Findable, Accessible, Interoperable, and Reusable) data management principles, NCBI Datasets offers user-friendly web interfaces, command-line tools, and documented APIs, empowering researchers to access NCBI data seamlessly. The data is delivered as packages of sequences and metadata, thus facilitating improved data retrieval, sharing, and usability in research. Moreover, this data delivery method fosters effective data attribution and promotes its further reuse. This paper outlines the current scope of data accessible through NCBI Datasets and explains various options for exploring and downloading the data.
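The abstract mentions documented APIs for scripted access. The following is a minimal sketch of how a script might request an assembly report. It assumes the NCBI Datasets v2 REST base URL `https://api.ncbi.nlm.nih.gov/datasets/v2`, a `genome/accession/{accession}/dataset_report` endpoint, and a JSON response that lists records under a `reports` key. All three are assumptions to check against the current API reference. The helper names are hypothetical, and `GCF_000001405.40` (the human GRCh38.p14 RefSeq assembly) is only an example accession.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the NCBI Datasets v2 REST API (verify against current docs).
BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2"


def genome_report_url(accession: str) -> str:
    """Build the (assumed) dataset_report URL for one assembly accession."""
    # Percent-quote the accession so unexpected characters cannot break the path.
    return f"{BASE_URL}/genome/accession/{urllib.parse.quote(accession)}/dataset_report"


def report_accessions(payload: dict) -> list[str]:
    """Pull assembly accessions out of a parsed report (assumes a 'reports' list)."""
    return [r["accession"] for r in payload.get("reports", []) if "accession" in r]


def fetch_report(accession: str, timeout: float = 30.0) -> dict:
    """Download and parse the JSON report; this call needs network access."""
    req = urllib.request.Request(
        genome_report_url(accession), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds the URL; call fetch_report() to actually query the service.
    print(genome_report_url("GCF_000001405.40"))
```

Keeping URL construction and response parsing separate from the network call lets each piece be checked offline.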


Subject(s)
Metadata , Databases, Genetic , United States , Information Storage and Retrieval
2.
Nucleic Acids Res ; 46(D1): D221-D228, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29126148

ABSTRACT

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.
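The central CCDS criterion is that a coding region qualifies only when both annotation sets place exactly the same CDS structure on the same assembly. That criterion can be sketched as a set intersection. This is an illustrative simplification, not the CCDS pipeline. The tuple layout, the sample coordinates, the `assign_ids` helper, and the `CCDS<n>.1` label format are all hypothetical, and the real process adds quality assurance checks and curator review.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model of a CDS: (chromosome, strand, ordered (start, end) exon spans).
CDS = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def identical_cds(ncbi: Set[CDS], ensembl: Set[CDS]) -> List[CDS]:
    """Return CDS structures annotated identically by both pipelines, in stable order."""
    return sorted(ncbi & ensembl)


def assign_ids(matches: List[CDS], prefix: str = "CCDS", start: int = 1) -> Dict[CDS, str]:
    """Label each matched CDS as PREFIX<n>.1 (illustrative tracked identifier)."""
    return {cds: f"{prefix}{n}.1" for n, cds in enumerate(matches, start=start)}


ncbi_set = {
    ("chr1", "+", ((100, 200), (300, 450))),
    ("chr2", "-", ((500, 620),)),
}
ensembl_set = {
    ("chr1", "+", ((100, 200), (300, 450))),  # identical to an NCBI model
    ("chr2", "-", ((500, 650),)),             # differs at one exon boundary
}

if __name__ == "__main__":
    for cds, ccds_id in assign_ids(identical_cds(ncbi_set, ensembl_set)).items():
        print(ccds_id, cds)
```

Only the chr1 model is labeled, because a single differing exon boundary is enough to exclude the chr2 pair.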


Subject(s)
Consensus Sequence , Databases, Genetic , Open Reading Frames , Animals , Data Curation/methods , Data Curation/standards , Databases, Genetic/standards , Guidelines as Topic , Humans , Mice , Molecular Sequence Annotation , National Library of Medicine (U.S.) , United States , User-Computer Interface
3.
Nucleic Acids Res ; 44(D1): D733-45, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26553804

ABSTRACT

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project combines the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) with computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
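RefSeq accessions encode the record type in a two-letter prefix. For example, NM_ marks an mRNA record and XM_ marks a computationally predicted model, while WP_ marks a non-redundant prokaryotic protein. Scripts that consume RefSeq data often need to distinguish these types. Below is a minimal parser sketch. The prefix table covers only common prefixes and is not exhaustive, and `parse_refseq` is a hypothetical helper name. The RefSeq documentation gives the full list.

```python
import re
from typing import NamedTuple, Optional

# Common RefSeq accession prefixes (not exhaustive).
PREFIXES = {
    "NC": "genomic (complete molecule)",
    "NG": "genomic region (e.g. RefSeqGene)",
    "NT": "genomic contig or scaffold",
    "NW": "genomic contig or scaffold (WGS-based)",
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "XM": "mRNA (predicted model)",
    "XR": "non-coding RNA (predicted model)",
    "NP": "protein",
    "XP": "protein (predicted model)",
    "WP": "protein (non-redundant, prokaryotic)",
}

# Two-letter prefix, underscore, digits, optional ".version" suffix.
ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


class RefSeqAccession(NamedTuple):
    prefix: str
    molecule: str
    is_model: bool
    version: Optional[int]


def parse_refseq(accession: str) -> RefSeqAccession:
    """Parse an accession[.version] string; raise ValueError if unrecognised."""
    m = ACCESSION_RE.match(accession.strip())
    if not m or m.group(1) not in PREFIXES:
        raise ValueError(f"not a recognised RefSeq accession: {accession!r}")
    prefix, _digits, version = m.groups()
    return RefSeqAccession(
        prefix=prefix,
        molecule=PREFIXES[prefix],
        is_model=prefix.startswith("X"),  # X* prefixes mark computational models
        version=int(version) if version else None,
    )
```

Prefixes outside the table raise an error rather than being guessed. This includes the WGS-style NZ_ accessions, whose identifiers contain letters after the underscore.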


Subject(s)
Databases, Genetic , Genomics , Animals , Cattle , Gene Expression Profiling , Genome, Fungal , Genome, Human , Genome, Microbial , Genome, Plant , Genome, Viral , Genomics/standards , Humans , Invertebrates/genetics , Mice , Molecular Sequence Annotation , Nematoda/genetics , Phylogeny , RNA, Long Noncoding/genetics , Rats , Reference Standards , Sequence Analysis, Protein , Sequence Analysis, RNA , Vertebrates/genetics
4.
Mamm Genome ; 26(9-10): 379-90, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26215545

ABSTRACT

Complete and accurate annotation of the mouse genome is critical to the advancement of research conducted on this important model organism. The National Center for Biotechnology Information (NCBI) develops and maintains many useful resources to assist the mouse research community. In particular, the reference sequence (RefSeq) database provides high-quality annotation of multiple mouse genome assemblies using a combinatorial approach that leverages computation, manual curation, and collaboration. Implementation of this conservative and rigorous approach, which focuses on representation of only full-length and non-redundant data, produces high-quality annotation products. RefSeq records explicitly link sequences to current knowledge in a timely manner, updating public records regularly and rapidly in response to nomenclature updates, addition of new relevant publications, collaborator discussion, and user feedback. Whole genome re-annotation is also conducted at least every 12-18 months, and often more frequently in response to assembly updates or availability of informative data. This article highlights key features and advantages of RefSeq genome annotation products and presents an overview of NCBI processes to generate these data. Further discussion of NCBI's resources highlights useful features and the best methods for accessing our data.


Subject(s)
Amino Acid Sequence/genetics , Databases, Genetic , Databases, Nucleic Acid , Genome , Animals , Internet , Mice
5.
Nucleic Acids Res ; 42(Database issue): D756-63, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24259432

ABSTRACT

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI's eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI's eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.


Subject(s)
Databases, Genetic , Genomics , Mammals/genetics , Animals , Eukaryota/genetics , Exons , Genome , Genomics/standards , Humans , Internet , Molecular Sequence Annotation , Proteins/chemistry , Proteins/genetics , RNA/chemistry , Reference Standards
6.
Nucleic Acids Res ; 42(Database issue): D865-72, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24217909

ABSTRACT

The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.


Subject(s)
Databases, Genetic , Proteins/genetics , Animals , Exons , Genomics , Humans , Internet , Mice , Molecular Sequence Annotation , Sequence Analysis
7.
Dev Comp Immunol ; 33(7): 806-10, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19428481

ABSTRACT

Injection of non-specific dsRNA initiates a broad-spectrum innate antiviral immune response in the Pacific white shrimp, Litopenaeus vannamei; however, the receptor involved in recognizing this by-product of viral infection remains unknown. In vertebrates, dsRNA sensing is mediated by a class of Toll-like receptors (TLRs) and results in activation of the interferon system. Because a TLR (lToll) was recently characterized in L. vannamei, we investigated its potential role in dsRNA recognition. We showed that injection of non-specific RNA duplexes did not modify lToll gene expression. A reverse genetic approach was therefore implemented to study its role in vivo. Compared with controls, silencing of lToll neither impaired the ability of non-specific dsRNA to trigger protection from white spot syndrome virus nor increased shrimp susceptibility to viral infection. In contrast, gene-specific dsRNA injected to silence lToll expression itself activated an antiviral response. These data strongly suggest that shrimp lToll plays no role in dsRNA-induced antiviral immunity.


Subject(s)
Penaeidae/immunology , Penaeidae/virology , RNA, Double-Stranded/immunology , Toll-Like Receptors/immunology , White spot syndrome virus 1/immunology , Animals , Immunity, Innate , Penaeidae/genetics , RNA, Double-Stranded/metabolism , Toll-Like Receptors/genetics , Toll-Like Receptors/metabolism
8.
Dev Comp Immunol ; 33(5): 668-73, 2009 May.
Article in English | MEDLINE | ID: mdl-19100764

ABSTRACT

Crustin antimicrobial peptides, identified in crustaceans, are hypothesized to have both antimicrobial and protease inhibitor activity based on their primary structure and in vitro assays. In this study, a reverse genetic approach was used to test the hypothesis that crustins are antimicrobial in vivo in response to bacterial and fungal challenge. Injection of double-stranded RNA specific to a 120-bp region of LvABP1, one of the most prominent crustin isoforms, significantly reduced the expression of both crustin mRNA and protein within the hemocytes. To test the role of crustins in the shrimp immune response, RNAi was first used to suppress crustin expression, and animals were subsequently injected with low pathogenic doses of either Vibrio penaeicida or Fusarium oxysporum. Crustin-depleted animals infected with V. penaeicida showed a significant increase in mortality compared with controls, whereas no significant change in shrimp mortality was observed following infection with F. oxysporum.
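Mortality comparisons of this kind start from per-animal survival data. The abstract does not name the statistical method used. As a generic illustration only, the Kaplan-Meier estimator below converts per-animal observations (day of death, or day of censoring for survivors) into a survival curve that could be plotted separately for a knockdown group and a control group. The input format and the example values in the usage note are hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], died: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate.

    times[i] is the day animal i died or was censored (alive at the end of
    observation); died[i] is True for a death and False for censoring.
    Returns (time, survival probability) steps at each time with a death.
    """
    if len(times) != len(died):
        raise ValueError("times and died must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    for t in sorted(set(times)):  # distinct observation times, ascending
        deaths = sum(1 for ti, d in zip(times, died) if ti == t and d)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Conditional probability of surviving past t, given at risk at t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censorings at t leave the risk set
    return curve
```

For example, `kaplan_meier([1, 2, 2, 3, 5, 5], [True, True, False, True, False, False])` gives survival steps of 5/6, 2/3 and 4/9 on days 1, 2 and 3. Formally comparing two such curves requires a test such as the log-rank test, which this sketch does not include.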


Subject(s)
Antimicrobial Cationic Peptides/immunology , Penaeidae/immunology , Penaeidae/microbiology , Animals , Antimicrobial Cationic Peptides/antagonists & inhibitors , Antimicrobial Cationic Peptides/metabolism , Fusarium/immunology , Gene Knockdown Techniques , Penaeidae/metabolism , RNA Interference/immunology , Vibrio/immunology
9.
Mol Immunol ; 45(7): 1916-25, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18078996

ABSTRACT

Antimicrobial peptides are an essential component of the innate immune system of most organisms. Expressed sequence tag analysis from various shrimp (Litopenaeus vannamei) tissues revealed transcripts corresponding to two distinct sequences (LvALF1 and LvALF2) with strong sequence similarity to anti-lipopolysaccharide factor (ALF), an antimicrobial peptide originally isolated from the horseshoe crab Limulus polyphemus. Full-length clones contained a 528 bp transcript with a predicted open reading frame coding for 120 amino acids in LvALF1, and a 623 bp transcript with a predicted open reading frame coding for 93 amino acids in LvALF2. A reverse genetic approach was implemented to study the in vivo role of LvALF1 in protecting shrimp from bacterial, fungal and viral infections. Injection of double-stranded RNA (dsRNA) corresponding to the LvALF1 message resulted in a significant reduction of LvALF1 mRNA transcript abundance as determined by qPCR. Following knockdown, shrimp were challenged with low pathogenic doses of Vibrio penaeicida, Fusarium oxysporum or white spot syndrome virus (WSSV), and the resulting mortality curves were compared with controls. A significant increase in mortality in the LvALF1 knockdown shrimp was observed in the V. penaeicida and F. oxysporum infections when compared to controls, showing that this gene has a role in protecting shrimp from both bacterial and fungal infections. In contrast, LvALF1 dsRNA activated the sequence-independent innate antiviral immune response, giving increased protection from WSSV infection.
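The knockdown here was quantified by qPCR. The abstract does not say how transcript abundance was normalized. One widely used approach is the comparative Ct (2^-ΔΔCt) method, which assumes roughly 100% amplification efficiency for both the target and the reference gene. The sketch below applies this method to placeholder Ct values. The function name, the reference gene, and all numbers are hypothetical, and the sketch does not describe the authors' procedure.

```python
from statistics import mean
from typing import Sequence


def relative_expression(
    target_treated: Sequence[float],
    ref_treated: Sequence[float],
    target_control: Sequence[float],
    ref_control: Sequence[float],
) -> float:
    """Fold change of a target transcript by the 2^-ddCt method.

    Each argument holds replicate Ct values. Assumes ~100% PCR efficiency
    for both the target and the reference (normalising) gene.
    """
    d_ct_treated = mean(target_treated) - mean(ref_treated)  # normalise to reference
    d_ct_control = mean(target_control) - mean(ref_control)
    dd_ct = d_ct_treated - d_ct_control  # treated group relative to control group
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Placeholder Ct values: target shifts ~3 cycles later after dsRNA injection,
    # while the reference gene is unchanged.
    fold = relative_expression([27.0, 27.2], [18.0, 18.1], [24.0, 24.2], [18.0, 18.1])
    print(f"fold change {fold:.3f}, knockdown {100 * (1 - fold):.1f}%")
```

A shift of three cycles corresponds to a fold change of 2^-3 = 0.125, or about an 87.5% reduction. Statistical significance would still require a test across biological replicates.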


Subject(s)
Antimicrobial Cationic Peptides/immunology , Bacterial Infections/veterinary , Immunity/immunology , Invertebrate Hormones/immunology , Mycoses/veterinary , Penaeidae/immunology , Amino Acid Sequence , Animals , Bacterial Infections/immunology , Biological Assay , Gene Expression Profiling , Gene Silencing/drug effects , Immunity/drug effects , Invertebrate Hormones/chemistry , Invertebrate Hormones/genetics , Invertebrate Hormones/metabolism , Molecular Sequence Data , Mycoses/immunology , Penaeidae/drug effects , Penaeidae/microbiology , Penaeidae/virology , Phylogeny , RNA, Double-Stranded/administration & dosage , RNA, Double-Stranded/pharmacology , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sequence Homology, Amino Acid , Survival Rate , White spot syndrome virus 1/drug effects
10.
Gene ; 371(1): 75-83, 2006 Apr 12.
Article in English | MEDLINE | ID: mdl-16488092

ABSTRACT

Penaeidins are a family of shrimp antimicrobial peptides that have a unique molecular structure consisting of a highly conserved leader peptide followed by an N-terminal proline-rich domain and a C-terminal cysteine-rich domain. Three distinct classes of penaeidins, named PEN2, PEN3, and PEN4, are expressed in the hemocytes of the Pacific white shrimp, Litopenaeus vannamei. Multiple isoforms, generated by substitutions and deletions within the proline- and cysteine-rich domains, have been reported at the mRNA level for all three classes of penaeidins, suggesting that this is a highly diverse gene family; however, the genetic mechanisms by which sequence variability in the penaeidin gene family is generated are unknown. The present study examines the genomic sources for both class and isoform diversity in the penaeidin family. We show that each penaeidin class is encoded by a unique gene and that isoform diversity is generated by polymorphism within each penaeidin gene locus. Furthermore, the genomic regions upstream of each penaeidin gene were partially characterized and found to drive transcription.


Subject(s)
Gene Expression Regulation/genetics , Genome/genetics , Penaeidae/genetics , Peptides/genetics , Quantitative Trait Loci/genetics , Animals , Anti-Infective Agents/metabolism , Hemocytes/metabolism , Mutation , Penaeidae/metabolism , Peptides/metabolism , Polymorphism, Genetic , Protein Isoforms/genetics , Protein Isoforms/metabolism , Protein Sorting Signals/genetics , Protein Structure, Tertiary/genetics , Transcription, Genetic/genetics
11.
Integr Comp Biol ; 46(6): 931-9, 2006 Dec.
Article in English | MEDLINE | ID: mdl-21672797

ABSTRACT

Multiple small-scale transcriptome studies have been undertaken for various members of the Penaeidae. Penaeid shrimp are important both as members of diverse ecosystems around the world and as commercial commodities. Within this family, the most important species is the Pacific whiteleg shrimp, Litopenaeus vannamei, as it is the primary shrimp used in aquaculture worldwide. The sequencing and analysis of 13 656 expressed sequence tags (ESTs) from this species is presented. ESTs were derived from multiple tissue-specific cDNA libraries, with an emphasis on tissues with predicted immune function. Assembly of the sequences into non-overlapping clusters yielded 7466 putative unigenes (1981 contigs and 5485 singletons). Multiple approaches were taken to assign putative function to each transcript, including sequence homology searches with BLASTX (Basic Local Alignment Search Tool: translated query versus protein database) against the National Center for Biotechnology Information's (NCBI) GenBank database and Gene Ontology annotation; even so, a significant portion of the shrimp ESTs (62%) had no homology with known proteins in the public databases. The sequences and complete annotation of all ESTs are available at www.marinegenomics.org, a publicly accessible database. In addition to providing the basic resources for microarray construction, transcript profiling, and novel gene discovery, this study constitutes the largest combined analysis of ESTs from any shrimp species and is a prelude to an even larger effort aimed at identifying and depleting highly redundant genes from shrimp cDNA libraries, toward the goal of sequencing 100 000 shrimp ESTs.
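The clustering figures reported here are internally consistent, and they also indicate how redundant the EST collection is. The second line below assumes that each EST falls into exactly one cluster, either a contig or a singleton. The abstract does not state this explicitly.

$$
\text{unigenes} = \text{contigs} + \text{singletons} = 1981 + 5485 = 7466,
\qquad
\frac{7466}{13\,656} \approx 0.547
$$

$$
\text{ESTs in contigs} = 13\,656 - 5485 = 8171,
\qquad
\frac{8171}{1981} \approx 4.1 \ \text{ESTs per contig}
$$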
