Results 1 - 11 of 11
1.
Nucleic Acids Res ; 48(D1): D835-D844, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31777943

ABSTRACT

ClinVar is a freely available, public archive of human genetic variants and interpretations of their relationships to diseases and other conditions, maintained at the National Institutes of Health (NIH). Submitted interpretations of variants are aggregated and made available on the ClinVar website (https://www.ncbi.nlm.nih.gov/clinvar/), and as downloadable files via FTP and through programmatic tools such as NCBI's E-utilities. The default view on the ClinVar website, the Variation page, was recently redesigned. The new layout includes several new sections that make it easier to find submitted data as well as summary data such as all diseases and citations reported for the variant. The new design also better represents more complex data such as haplotypes and genotypes, as well as variants that are in ClinVar as part of a haplotype or genotype but have no interpretation for the single variant. ClinVar's variant-centric XML had its production release in April 2019. The ClinVar website and E-utilities both have been updated to support the VCV (variation in ClinVar) accession numbers found in the variant-centric XML file. ClinVar's search engine has been fine-tuned for improved retrieval of search results.


Subject(s)
Databases, Genetic , Disease/genetics , Genetic Variation/genetics , Genome, Human , Genomics , Haplotypes , Humans , Internet , National Library of Medicine (U.S.) , Search Engine , United States
2.
Nucleic Acids Res ; 46(D1): D1062-D1067, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29165669

ABSTRACT

ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/) is a freely available, public archive of human genetic variants and interpretations of their significance to disease, maintained at the National Institutes of Health. Interpretations of the clinical significance of variants are submitted by clinical testing laboratories, research laboratories, expert panels and other groups. ClinVar aggregates data by variant-disease pairs, and by variant (or set of variants). Data aggregated by variant are accessible on the website, in an improved set of variant call format files and as a new comprehensive XML report. ClinVar recently started accepting submissions that are focused primarily on providing phenotypic information for individuals who have had genetic testing. Submissions may come from clinical providers providing their own interpretation of the variant ('provider interpretation') or from groups such as patient registries that primarily provide phenotypic information from patients ('phenotyping only'). ClinVar continues to make improvements to its search and retrieval functions. Several new fields are now indexed for more precise searching, and filters allow the user to narrow down a large set of search results.


Subject(s)
Databases, Nucleic Acid , Disease/genetics , Genetic Variation , Humans , Phenotype
3.
Nucleic Acids Res ; 43(Database issue): D36-42, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25355515

ABSTRACT

The National Center for Biotechnology Information's (NCBI) Gene database (www.ncbi.nlm.nih.gov/gene) integrates gene-specific information from multiple data sources. NCBI Reference Sequence (RefSeq) genomes for viruses, prokaryotes and eukaryotes are the primary foundation for Gene records in that they form the critical association between sequence and a tracked gene upon which additional functional and descriptive content is anchored. Additional content is integrated based on the genomic location and RefSeq transcript and protein sequence data. The content of a Gene record represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI. Records in Gene are assigned unique, tracked integers as identifiers. The content (citations, nomenclature, genomic location, gene products and their attributes, phenotypes, sequences, interactions, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities and Entrez Direct) and for bulk transfer by FTP.


Subject(s)
Databases, Genetic , Genes , Genetic Variation , Genomics , Internet , National Library of Medicine (U.S.) , Phenotype , United States
4.
Nucleic Acids Res ; 42(Database issue): D756-63, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24259432

ABSTRACT

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI's eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI's eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.


Subject(s)
Databases, Genetic , Genomics , Mammals/genetics , Animals , Eukaryota/genetics , Exons , Genome , Genomics/standards , Humans , Internet , Molecular Sequence Annotation , Proteins/chemistry , Proteins/genetics , RNA/chemistry , Reference Standards
5.
Nucleic Acids Res ; 42(Database issue): D865-72, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24217909

ABSTRACT

The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.


Subject(s)
Databases, Genetic , Proteins/genetics , Animals , Exons , Genomics , Humans , Internet , Mice , Molecular Sequence Annotation , Sequence Analysis
6.
Nucleic Acids Res ; 40(Database issue): D130-5, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22121212

ABSTRACT

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of genomic, transcript and protein sequence records. These records are selected and curated from public sequence archives and represent a significant reduction in redundancy compared to the volume of data archived by the International Nucleotide Sequence Database Collaboration. The database includes over 16,000 organisms, 2.4 × 10(6) genomic records, 13 × 10(6) proteins and 2 × 10(6) RNA records spanning prokaryotes, eukaryotes and viruses (RefSeq release 49, September 2011). The RefSeq database is maintained by a combined approach of automated analyses, collaboration and manual curation to generate an up-to-date representation of the sequence, its features, names and cross-links to related sources of information. We report here on recent growth, the status of curating the human RefSeq data set, more extensive feature annotation and current policy for eukaryotic genome annotation via the NCBI annotation pipeline. More information about the resource is available online (see http://www.ncbi.nlm.nih.gov/RefSeq/).


Subject(s)
Databases, Genetic , Molecular Sequence Annotation , Sequence Analysis/standards , Genomics/standards , Humans , Reference Standards , Sequence Analysis, DNA/standards , Sequence Analysis, Protein/standards , Sequence Analysis, RNA/standards
7.
Genetics ; 172(3): 1915-26, 2006 Mar.
Article in English | MEDLINE | ID: mdl-16387885

ABSTRACT

Genetic association studies are rapidly becoming the experimental approach of choice to dissect complex traits, including tolerance to drought stress, which is the most common cause of mortality and yield losses in forest trees. Optimization of association mapping requires knowledge of the patterns of nucleotide diversity and linkage disequilibrium and the selection of suitable polymorphisms for genotyping. Moreover, standard neutrality tests applied to DNA sequence variation data can be used to select candidate genes or amino acid sites that are putatively under selection for association mapping. In this article, we study the pattern of polymorphism of 18 candidate genes for drought-stress response in Pinus taeda L., an important tree crop. Data analyses based on a set of 21 putatively neutral nuclear microsatellites did not show population genetic structure or genomewide departures from neutrality. Candidate genes had moderate average nucleotide diversity at silent sites (pi(sil) = 0.00853), varying 100-fold among single genes. The level of within-gene LD was low, with an average pairwise r2 of 0.30, decaying rapidly from approximately 0.50 to approximately 0.20 at 800 bp. No apparent LD among genes was found. A selective sweep may have occurred at the early-response-to-drought-3 (erd3) gene, although population expansion can also explain our results and evidence for selection was not conclusive. One other gene, ccoaomt-1, a methylating enzyme involved in lignification, showed dimorphism (i.e., two highly divergent haplotype lineages at equal frequency), which is commonly associated with the long-term action of balancing selection. Finally, a set of haplotype-tagging SNPs (htSNPs) was selected. Using htSNPs, a reduction of genotyping effort of approximately 30-40%, while sampling most common allelic variants, can be gained in our ongoing association studies for drought tolerance in pine.


Subject(s)
Dehydration/genetics , Genes, Plant , Genetic Variation , Pinus taeda/genetics , Polymorphism, Single Nucleotide , Stress, Physiological/genetics , Base Sequence , Droughts , Haplotypes , Linkage Disequilibrium , Molecular Sequence Data
8.
Proc Natl Acad Sci U S A ; 101(42): 15255-60, 2004 Oct 19.
Article in English | MEDLINE | ID: mdl-15477602

ABSTRACT

Outbreeding species with large, stable population sizes, such as widely distributed conifers, are expected to harbor relatively more DNA sequence polymorphism. Under the neutral theory of molecular evolution, the expected heterozygosity is a function of the product 4N(e)mu, where N(e) is the effective population size and mu is the per-generation mutation rate, and the genomic scale of linkage disequilibrium is determined by 4N(e)r, where r is the per-generation recombination rate between adjacent sites. These parameters were estimated in the long-lived, outcrossing gymnosperm loblolly pine (Pinus taeda L.) from a survey of single nucleotide polymorphisms across approximately 18 kb of DNA distributed among 19 loci from a common set of 32 haploid genomes. Estimates of 4N(e)mu at silent and nonsynonymous sites were 0.00658 and 0.00108, respectively, and both were statistically heterogeneous among loci. By Tajima's D statistic, the site frequency spectrum of no locus was observed to deviate from that predicted by neutral theory. Substantial recombination in the history of the sampled alleles was observed and linkage disequilibrium declined within several kilobases. The composite likelihood estimate of 4N(e)r based on all two-site sample configurations equaled 0.00175. When geological dating, an assumed generation time (25 years), and an estimated divergence from Pinus pinaster Ait. are used, the effective population size of loblolly pine should be 5.6 x 10(5). The emerging narrow range of estimated silent site heterozygosities (relative to the vast range of population sizes) for humans, Drosophila, maize, and pine parallels the paradox described earlier for allozyme polymorphism and challenges simple equilibrium models of molecular evolution.


Subject(s)
Pinus taeda/genetics , Animals , DNA, Plant/genetics , Evolution, Molecular , Genetic Variation , Genetics, Population , Heterozygote , Humans , Linkage Disequilibrium , Models, Genetic , Molecular Sequence Data , Polymorphism, Single Nucleotide , Recombination, Genetic , Species Specificity
9.
Genetics ; 168(1): 447-61, 2004 Sep.
Article in English | MEDLINE | ID: mdl-15454556

ABSTRACT

A comparative genetic map was constructed between two important genera of the family Pinaceae. Ten homologous linkage groups in loblolly pine (Pinus taeda L.) and Douglas fir (Pseudotsuga menziesii [Mirb.] Franco) were identified using orthologous expressed sequence tag polymorphism (ESTP) and restriction fragment length polymorphism (RFLP) markers. The comparative mapping revealed extensive synteny and colinearity between genomes of the Pinaceae, consistent with the hypothesis of conservative chromosomal evolution in this important plant family. This study reports the first comparative map in forest trees at the family taxonomic level and establishes a framework for comparative genomics in Pinaceae.


Subject(s)
Chromosome Mapping , Pinaceae/genetics , Polymorphism, Genetic , Base Sequence , Evolution, Molecular , Expressed Sequence Tags , Genetic Markers , Genomics/methods , Molecular Sequence Data , Polymorphism, Restriction Fragment Length , Sequence Analysis, DNA , Synteny/genetics
10.
Genetics ; 164(4): 1537-46, 2003 Aug.
Article in English | MEDLINE | ID: mdl-12930758

ABSTRACT

A long-term series of experiments to map QTL influencing wood property traits in loblolly pine has been completed. These experiments were designed to identify and subsequently verify QTL in multiple genetic backgrounds, environments, and growing seasons. Verification of QTL is necessary to substantiate a biological basis for observed marker-trait associations, to provide precise estimates of the magnitude of QTL effects, and to predict QTL expression at a given age or in a particular environment. Verification was based on the repeated detection of QTL among populations, as well as among multiple growing seasons for each population. Temporal stability of QTL was moderate, with approximately half being detected in multiple seasons. Fewer QTL were common to different populations, but the results are nonetheless encouraging for restricted applications of marker-assisted selection. QTL from larger populations accounted for less phenotypic variation than QTL detected in smaller populations, emphasizing the need for experiments employing much larger families. Additionally, 18 candidate genes related to lignin biosynthesis and cell wall structure were mapped genetically. Several candidate genes colocated with wood property QTL; however, these relationships must be verified in future experiments.


Subject(s)
Chromosome Mapping , Lignin/genetics , Pinus taeda/genetics , Quantitative Trait Loci , Wood , Crosses, Genetic , Genes, Plant , Genetic Linkage , Genetic Markers , Genetic Variation , Lignin/biosynthesis , Pinus taeda/growth & development , Seasons , Selection, Genetic
11.
Plant Biotechnol J ; 1(4): 253-8, 2003 Jul.
Article in English | MEDLINE | ID: mdl-17163902

ABSTRACT

Evidence for the molecular basis of a null allele of cinnamyl alcohol dehydrogenase (CAD) has been discovered in the loblolly pine (Pinus taeda L.) clone 7-56. The mutation is a two-base pair adenosine insertion located in exon 5 that causes a frame-shift which is predicted to result in premature termination of the protein. For routine detection of the mutation, a diagnostic assay was developed utilizing Template-directed Dye-terminator Incorporation and Fluorescence Polarization detection (FP-TDI). Loblolly pine is the most important commercial tree species in the USA, being harvested for pulp and solid wood products. Chemical pulping could be increased in efficiency by selecting for trees having a two-base pair adenosine insertion, by use of the rapid diagnostic assay developed in this study.
