Results 1 - 11 of 11
1.
Nucleic Acids Res ; 48(D1): D835-D844, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31777943

ABSTRACT

ClinVar is a freely available, public archive of human genetic variants and interpretations of their relationships to diseases and other conditions, maintained at the National Institutes of Health (NIH). Submitted interpretations of variants are aggregated and made available on the ClinVar website (https://www.ncbi.nlm.nih.gov/clinvar/), and as downloadable files via FTP and through programmatic tools such as NCBI's E-utilities. The default view on the ClinVar website, the Variation page, was recently redesigned. The new layout includes several new sections that make it easier to find submitted data as well as summary data such as all diseases and citations reported for the variant. The new design also better represents more complex data such as haplotypes and genotypes, as well as variants that are in ClinVar as part of a haplotype or genotype but have no interpretation for the single variant. ClinVar's variant-centric XML had its production release in April 2019. The ClinVar website and E-utilities both have been updated to support the VCV (variation in ClinVar) accession numbers found in the variant-centric XML file. ClinVar's search engine has been fine-tuned for improved retrieval of search results.


Subject(s)
Databases, Genetic , Disease/genetics , Genetic Variation/genetics , Genome, Human , Genomics , Haplotypes , Humans , Internet , National Library of Medicine (U.S.) , Search Engine , United States
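The abstract notes that ClinVar data can also be retrieved programmatically through NCBI's E-utilities. Below is a minimal Python sketch. It assumes the standard ESearch endpoint and its `db`, `term`, `retmode` and `retmax` parameters. The query `BRCA1[gene]` and the helper names `clinvar_esearch_url` and `fetch_clinvar_ids` are illustrative and do not come from the article. NCBI's usage policies (rate limits, API keys) should be checked before sending requests to the live service.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public ESearch endpoint of NCBI's E-utilities.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def clinvar_esearch_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that searches the ClinVar database (db=clinvar)."""
    params = {"db": "clinvar", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def fetch_clinvar_ids(term: str, retmax: int = 20) -> list:
    """Run the search over the network and return the matching ClinVar UIDs.

    Hypothetical helper: it needs network access and assumes the JSON
    response lists its hits under esearchresult -> idlist.
    """
    with urlopen(clinvar_esearch_url(term, retmax)) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


if __name__ == "__main__":
    # Only builds and prints the URL; no request is sent.
    print(clinvar_esearch_url("BRCA1[gene]"))
```

The search returns only record identifiers. Retrieving the records themselves would take a follow-up E-utilities call such as ESummary or EFetch.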
2.
Bioinformatics ; 34(1): 80-87, 2018 01 01.
Article in English | MEDLINE | ID: mdl-28968638

ABSTRACT

Motivation: Despite significant efforts in expert curation, the clinical relevance of most of the 154 million dbSNP reference variants (RS) remains unknown. However, a wealth of knowledge about the biological function and disease impact of variants is buried in unstructured literature data. Previous studies have attempted to harvest and unlock such information with text-mining techniques but are of limited use because their mutation extraction results are not standardized or integrated with curated data. Results: We propose an automatic method to extract and normalize variant mentions to unique identifiers (dbSNP RSIDs). Our method, in benchmarking results, demonstrates a high F-measure of ∼90% and compares favorably with the state of the art. Next, we applied our approach to the entire PubMed and validated the results by verifying that each extracted variant-gene pair matched the dbSNP annotation based on mapped genomic position, and by analyzing variants curated in ClinVar. We then determined which text-mined variants and genes constituted novel discoveries. Our analysis reveals 41 889 RS numbers (associated with 9151 genes) not found in ClinVar. Moreover, we obtained a rich set worth further review: 12 462 rare variants (MAF ≤ 0.01) in 3849 genes which are presumed to be deleterious and not frequently found in the general population. To our knowledge, this is the first large-scale study to analyze and integrate text-mined variant data with curated knowledge in existing databases. Our results suggest that databases can be significantly enriched by text mining and that the combined information can greatly assist human efforts in evaluating/prioritizing variants in genomic research. Availability and implementation: The tmVar 2.0 source code and corpus are freely available at https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/tmvar/. Contact: zhiyong.lu@nih.gov.


Subject(s)
Data Mining/methods , Mutation , Polymorphism, Genetic , Precision Medicine/methods , Software , Data Curation , Databases, Factual , Genetic Predisposition to Disease , Genomics/methods , Humans , Phenotype , PubMed , Publications
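The ∼90% benchmark figure above is an F-measure, the harmonic mean of precision and recall. The Python sketch below computes it from counts of true positives, false positives and false negatives. The counts in the example are made up for illustration and are not the tmVar 2.0 benchmark numbers.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the usual F-measure (F1)."""
    # Precision: share of extracted mentions that are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of true mentions that were extracted.
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + b2) * precision * recall / denominator


if __name__ == "__main__":
    # Hypothetical counts: 90 correct extractions, 10 spurious, 10 missed.
    print(round(f_measure(90, 10, 10), 3))  # 0.9
```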
3.
Genome Res ; 19(10): 1722-31, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19717792

ABSTRACT

While most Ascomycetes tend to associate principally with plants, the dimorphic fungi Coccidioides immitis and Coccidioides posadasii are primary pathogens of immunocompetent mammals, including humans. Infection results from environmental exposure to Coccidioides, which is believed to grow as a soil saprophyte in arid deserts. To investigate hypotheses about the life history and evolution of Coccidioides, the genomes of several Onygenales, including C. immitis and C. posadasii; a close, nonpathogenic relative, Uncinocarpus reesii; and a more diverged pathogenic fungus, Histoplasma capsulatum, were sequenced and compared with those of 13 more distantly related Ascomycetes. This analysis identified increases and decreases in gene family size associated with a host/substrate shift from plants to animals in the Onygenales. In addition, comparison among Onygenales genomes revealed evolutionary changes in Coccidioides that may underlie its infectious phenotype, the identification of which may facilitate improved treatment and prevention of coccidioidomycosis. Overall, the results suggest that Coccidioides species are not soil saprophytes, but that they have evolved to remain associated with their dead animal hosts in soil, and that Coccidioides metabolism genes, membrane-related proteins, and putatively antigenic compounds have evolved in response to interaction with an animal host.


Subject(s)
Coccidioides/genetics , Genome, Fungal , Mitosporic Fungi/genetics , Animals , Genetic Speciation , Genomics/methods , Histoplasma/genetics , Humans , Molecular Sequence Data , Onygenales/genetics , Phylogeny , Selection, Genetic , Sequence Analysis, DNA , Synteny
4.
PLoS Genet ; 4(4): e1000046, 2008 Apr 11.
Article in English | MEDLINE | ID: mdl-18404212

ABSTRACT

We present the genome sequences of a new clinical isolate of the important human pathogen, Aspergillus fumigatus, A1163, and two closely related but rarely pathogenic species, Neosartorya fischeri NRRL181 and Aspergillus clavatus NRRL1. Comparative genomic analysis of A1163 with the recently sequenced A. fumigatus isolate Af293 has identified core, variable and up to 2% unique genes in each genome. While the core genes are 99.8% identical at the nucleotide level, identity for variable genes can be as low as 40%. The most divergent loci appear to contain heterokaryon incompatibility (het) genes associated with fungal programmed cell death such as developmental regulator rosA. Cross-species comparison has revealed that 8.5%, 13.5% and 12.6%, respectively, of A. fumigatus, N. fischeri and A. clavatus genes are species-specific. These genes are significantly smaller in size than core genes, contain fewer exons and exhibit a subtelomeric bias. Most of them cluster together in 13 chromosomal islands, which are enriched for pseudogenes, transposons and other repetitive elements. At least 20% of A. fumigatus-specific genes appear to be functional and involved in carbohydrate and chitin catabolism, transport, detoxification, secondary metabolism and other functions that may facilitate the adaptation to heterogeneous environments such as soil or a mammalian host. Contrary to what was suggested previously, their origin cannot be attributed to horizontal gene transfer (HGT), but instead is likely to involve duplication, diversification and differential gene loss (DDL). The role of duplication in the origin of lineage-specific genes is further underlined by the discovery of genomic islands that seem to function as designated "gene dumps" and, perhaps, simultaneously, as "gene factories".


Subject(s)
Aspergillus fumigatus/genetics , Genomic Islands , Allergens/genetics , Aspergillus/classification , Aspergillus/genetics , Aspergillus/physiology , Aspergillus fumigatus/classification , Aspergillus fumigatus/pathogenicity , Aspergillus fumigatus/physiology , Chromosomes, Fungal/genetics , Eurotiales/classification , Eurotiales/genetics , Eurotiales/physiology , Evolution, Molecular , Fungal Proteins/genetics , Fungal Proteins/immunology , Genome, Fungal , Humans , Phylogeny , Species Specificity , Virulence/genetics
5.
Plant Cell ; 18(6): 1348-59, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16632643

ABSTRACT

We sequenced 2.2 Mb representing triplicated genome segments of Brassica oleracea, which are each paralogous with one another and homologous with a segmentally duplicated region of the Arabidopsis thaliana genome. Sequence annotation identified 177 conserved collinear genes in the B. oleracea genome segments. Analysis of synonymous base substitution rates indicated that the triplicated Brassica genome segments diverged from a common ancestor soon after divergence of the Arabidopsis and Brassica lineages. This conclusion was corroborated by phylogenetic analysis of protein families. Using A. thaliana as an outgroup, 35% of the genes inferred to be present when genome triplication occurred in the Brassica lineage have been lost, most likely via a deletion mechanism, in an interspersed pattern. Genes encoding proteins involved in signal transduction or transcription were not found to be significantly more extensively retained than those encoding proteins classified with other functions, but putative proteins predicted in the A. thaliana genome were underrepresented in B. oleracea. We identified one example of gene loss from the Arabidopsis lineage. We found evidence for the frequent insertion of gene fragments of nuclear genomic origin and identified four apparently intact genes in noncollinear positions in the B. oleracea and A. thaliana genomes.


Subject(s)
Arabidopsis/genetics , Brassica/genetics , Genes, Plant/genetics , Genomics , Polyploidy , Conserved Sequence/genetics , Contig Mapping , DNA Transposable Elements/genetics , Gene Deletion , Gene Duplication , Genome, Plant , Oligonucleotide Array Sequence Analysis , Phylogeny , Sequence Alignment
6.
Genome Res ; 15(9): 1284-91, 2005 Sep.
Article in English | MEDLINE | ID: mdl-16109971

ABSTRACT

Rice (Oryza sativa L.) chromosome 3 is evolutionarily conserved across the cultivated cereals and shares large blocks of synteny with maize and sorghum, which diverged from rice more than 50 million years ago. To begin to completely understand this chromosome, we sequenced, finished, and annotated 36.1 Mb (approximately 97%) from O. sativa subsp. japonica cv Nipponbare. Annotation features of the chromosome include 5915 genes, of which 913 are related to transposable elements. A putative function could be assigned to 3064 genes, with another 757 genes annotated as expressed, leaving 2094 that encode hypothetical proteins. Similarity searches against the proteome of Arabidopsis thaliana revealed putative homologs for 67% of the chromosome 3 proteins. Further searches of a nonredundant amino acid database, the Pfam domain database, plant Expressed Sequence Tags, and genomic assemblies from sorghum and maize revealed only 853 nontransposable element related proteins from chromosome 3 that lacked similarity to other known sequences. Interestingly, 426 of these have a paralog within the rice genome. A comparative physical map of the wild progenitor species, Oryza nivara, with japonica chromosome 3 revealed a high degree of sequence identity and synteny between these two species, which diverged approximately 10,000 years ago. Although no major rearrangements were detected, the deduced size of the O. nivara chromosome 3 was 21% smaller than that of japonica. Synteny between rice and other cereals using an integrated maize physical map and wheat genetic map was strikingly high, further supporting the use of rice and, in particular, chromosome 3, as a model for comparative studies among the cereals.


Subject(s)
Chromosomes, Plant/genetics , Oryza/genetics , Poaceae/genetics , Arabidopsis/genetics , Chromosome Mapping , Chromosomes, Artificial, Bacterial/genetics , Genes, Plant , Minisatellite Repeats , Molecular Sequence Data , Oryza/classification , Physical Chromosome Mapping , Poaceae/classification , Proteome , Species Specificity , Zea mays/classification , Zea mays/genetics
7.
Plant Physiol ; 138(1): 18-26, 2005 May.
Article in English | MEDLINE | ID: mdl-15888674

ABSTRACT

We have developed a rice (Oryza sativa) genome annotation database (Osa1) that provides structural and functional annotation for this emerging model species. Using the sequence of O. sativa subsp. japonica cv Nipponbare from the International Rice Genome Sequencing Project, pseudomolecules, or virtual contigs, of the 12 rice chromosomes were constructed. Our most recent release, version 3, represents our third build of the pseudomolecules and is composed of 98% finished sequence. Genes were identified using a series of computational methods developed for Arabidopsis (Arabidopsis thaliana) that were modified for use with the rice genome. In release 3 of our annotation, we identified 57,915 genes, of which 14,196 are related to transposable elements. Of these 43,719 non-transposable element-related genes, 18,545 (42.4%) were annotated with a putative function, 5,777 (13.2%) were annotated as encoding an expressed protein with no known function, and the remaining 19,397 (44.4%) were annotated as encoding a hypothetical protein. Multiple splice forms (5,873) were detected for 2,538 genes, resulting in a total of 61,250 gene models in the rice genome. We incorporated experimental evidence into 18,252 gene models to improve the quality of the structural annotation. A series of functional data types has been annotated for the rice genome that includes alignment with genetic markers, assignment of gene ontologies, identification of flanking sequence tags, alignment with homologs from related species, and syntenic mapping with other cereal species. All structural and functional annotation data are available through interactive search and display windows as well as through download of flat files. To integrate the data with other genome projects, the annotation data are available through a Distributed Annotation System and a Genome Browser. All data can be obtained through the project Web pages at http://rice.tigr.org.


Subject(s)
Databases, Genetic , Genes, Plant , Genome, Plant , Oryza/genetics , Computational Biology
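The gene counts in the Osa1 abstract are internally consistent, and a few lines of Python can confirm this. The final check rests on an assumption that the abstract does not state: the 5,873 splice forms replace the single models of the 2,538 alternatively spliced genes. Under that reading, the counts reproduce the reported total of 61,250 gene models.

```python
# Counts reported in the Osa1 release 3 abstract.
total_genes = 57_915
te_related = 14_196
putative_function = 18_545
expressed_unknown = 5_777
hypothetical = 19_397
splice_forms = 5_873
spliced_genes = 2_538

# Non-transposable-element genes, split into three annotation classes.
non_te = total_genes - te_related
assert non_te == 43_719
assert putative_function + expressed_unknown + hypothetical == non_te

# Share of each class among non-TE genes, rounded to one decimal as in the abstract.
shares = [round(100 * n / non_te, 1)
          for n in (putative_function, expressed_unknown, hypothetical)]
print(shares)  # [42.4, 13.2, 44.4]

# Assumption: each spliced gene contributes all of its splice forms instead of one model.
gene_models = total_genes - spliced_genes + splice_forms
print(gene_models)  # 61250
```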
8.
BMC Biol ; 3: 7, 2005 Mar 22.
Article in English | MEDLINE | ID: mdl-15784138

ABSTRACT

BACKGROUND: Since the initial publication of its complete genome sequence, Arabidopsis thaliana has become more important than ever as a model for plant research. However, the initial genome annotation was submitted by multiple centers using inconsistent methods, making the data difficult to use for many applications. RESULTS: Over the course of three years, TIGR has completed its effort to standardize the structural and functional annotation of the Arabidopsis genome. Using both manual and automated methods, Arabidopsis gene structures were refined and gene products were renamed and assigned to Gene Ontology categories. We present an overview of the methods employed, tools developed, and protocols followed, summarizing the contents of each data release with special emphasis on our final annotation release (version 5). CONCLUSION: Over the entire period, several thousand new genes and pseudogenes were added to the annotation. Approximately one third of the originally annotated gene models were significantly refined yielding improved gene structure annotations, and every protein-coding gene was manually inspected and classified using Gene Ontology terms.


Subject(s)
Arabidopsis/classification , Arabidopsis/genetics , Computational Biology/methods , Genome, Plant/genetics , Sequence Analysis, Protein/methods , Writing , Alternative Splicing/genetics , Computational Biology/standards , Models, Genetic , Plant Proteins/classification , Plant Proteins/genetics
9.
Science ; 307(5713): 1321-4, 2005 Feb 25.
Article in English | MEDLINE | ID: mdl-15653466

ABSTRACT

Cryptococcus neoformans is a basidiomycetous yeast ubiquitous in the environment, a model for fungal pathogenesis, and an opportunistic human pathogen of global importance. We have sequenced its approximately 20-megabase genome, which contains approximately 6500 intron-rich gene structures and encodes a transcriptome abundant in alternatively spliced and antisense messages. The genome is rich in transposons, many of which cluster at candidate centromeric regions. The presence of these transposons may drive karyotype instability and phenotypic variation. C. neoformans encodes unique genes that may contribute to its unusual virulence properties, and comparison of two phenotypically distinct strains reveals variation in gene content in addition to sequence polymorphisms between the genomes.


Subject(s)
Cryptococcus neoformans/genetics , Genome, Fungal , Alternative Splicing , Cell Wall/metabolism , Chromosomes, Fungal/genetics , Computational Biology , Cryptococcus neoformans/pathogenicity , Cryptococcus neoformans/physiology , DNA Transposable Elements , Fungal Proteins/metabolism , Gene Library , Genes, Fungal , Humans , Introns , Molecular Sequence Data , Phenotype , Polymorphism, Genetic , Polymorphism, Single Nucleotide , Polysaccharides/metabolism , RNA, Antisense , Sequence Analysis, DNA , Transcription, Genetic , Virulence , Virulence Factors/metabolism
10.
Nucleic Acids Res ; 31(19): 5654-66, 2003 Oct 01.
Article in English | MEDLINE | ID: mdl-14500829

ABSTRACT

The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full-length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and >1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the approximately 27 000 annotated protein-coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.


Subject(s)
Arabidopsis/genetics , Genome, Plant , RNA, Plant/analysis , Sequence Alignment/methods , Software , Algorithms , Alternative Splicing , Arabidopsis/metabolism , DNA, Complementary/analysis , Expressed Sequence Tags , Introns , Plant Proteins/genetics , RNA, Plant/chemistry , Transcription, Genetic , Untranslated Regions
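PASA starts from clusters of overlapping transcript alignments. The Python sketch below shows only that grouping step, as a sweep over genomic intervals. It is not the PASA algorithm itself, which also requires compatible splice boundaries before merging alignments into maximal assemblies. For simplicity, the sketch ignores strand and chromosome, and the transcript IDs and coordinates are hypothetical.

```python
from typing import List, Optional, Tuple

# (transcript id, genomic start, genomic end), inclusive coordinates on one strand.
Alignment = Tuple[str, int, int]


def cluster_overlapping(alignments: List[Alignment]) -> List[List[str]]:
    """Group alignments whose genomic spans overlap, directly or transitively."""
    clusters: List[List[str]] = []
    current: List[str] = []
    current_end: Optional[int] = None
    # Sweep left to right; an alignment joins the open cluster if it starts
    # at or before the rightmost end seen so far in that cluster.
    for tid, start, end in sorted(alignments, key=lambda a: (a[1], a[2])):
        if current_end is not None and start <= current_end:
            current.append(tid)
            current_end = max(current_end, end)
        else:
            if current:
                clusters.append(current)
            current = [tid]
            current_end = end
    if current:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    hits = [("est1", 100, 500), ("cdna1", 450, 900),
            ("est2", 1200, 1500), ("est3", 880, 1000)]
    print(cluster_overlapping(hits))  # [['est1', 'cdna1', 'est3'], ['est2']]
```

Sorting by start coordinate keeps the sweep to a single pass after the sort. Each resulting cluster is the set of alignments that a splice-aware assembler would then try to merge.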