Results 1 - 9 of 9
1.
Nature ; 437(7058): 551-5, 2005 Sep 22.
Article in English | MEDLINE | ID: mdl-16177791

ABSTRACT

Chromosome 18 appears to have the lowest gene density of any human chromosome and is one of only three chromosomes for which trisomic individuals survive to term. There are also a number of genetic disorders stemming from chromosome 18 trisomy and aneuploidy. Here we report the finished sequence and gene annotation of human chromosome 18, which will allow a better understanding of the normal and disease biology of this chromosome. Despite the low density of protein-coding genes on chromosome 18, we find that the proportion of non-protein-coding sequences evolutionarily conserved among mammals is close to the genome-wide average. Extending this analysis to the entire human genome, we find that the density of conserved non-protein-coding sequences is largely uncorrelated with gene density. This has important implications for the nature and roles of non-protein-coding sequence elements.


Subject(s)
Chromosomes, Human, Pair 18/genetics , DNA/genetics , Aneuploidy , Animals , Conserved Sequence/genetics , CpG Islands/genetics , Exons/genetics , Expressed Sequence Tags , Genes/genetics , Genome, Human , Humans , Introns/genetics , Molecular Sequence Data , Sequence Analysis, DNA , Synteny
2.
Genomics ; 86(2): 242-51, 2005 Aug.
Article in English | MEDLINE | ID: mdl-15922553

ABSTRACT

The ARID is an ancient DNA-binding domain that is conserved throughout the evolution of higher eukaryotes. The ARID consensus sequence spans about 100 amino acid residues, and structural studies identify the major groove contact site as a modified helix-turn-helix motif. ARID-containing proteins exhibit a range of cellular functions, including participation in chromatin remodeling, and regulation of gene expression during cell growth, differentiation, and development. A subset of ARID family proteins binds DNA specifically at AT-rich sites; the remainder bind DNA nonspecifically. Orthologs to each of the seven distinct subfamilies of mammalian ARID-containing proteins are found in insect genomes, indicating the minimum age for the organization of these higher metazoan subfamilies. Many of these ancestral genes were duplicated and fixed over time to yield the 15 ARID-containing genes that are found in the human, mouse, and dog genomes. This paper describes a nomenclature, recommended by the Mouse Genomic Nomenclature Committee (MGNC) and accepted by the Human Genome Organization (HUGO) Gene Nomenclature Committee, for these mammalian ARID-containing genes that reflects this evolutionary history.


Subject(s)
DNA-Binding Proteins/chemistry , DNA-Binding Proteins/classification , DNA-Binding Proteins/genetics , Amino Acid Sequence , Animals , Binding Sites , Biological Evolution , DNA/chemistry , Evolution, Molecular , Genome , Humans , International Cooperation , Mice , Models, Molecular , Molecular Sequence Data , Phylogeny , Protein Conformation , Protein Structure, Secondary , Protein Structure, Tertiary , Sequence Homology, Amino Acid , Terminology as Topic
3.
BMC Bioinformatics ; 6: 149, 2005 Jun 16.
Article in English | MEDLINE | ID: mdl-15958172

ABSTRACT

BACKGROUND: Massive text mining of the biological literature holds great promise of relating disparate information and discovering new knowledge. However, disambiguation of gene symbols is a major bottleneck. RESULTS: We developed a simple thesaurus-based disambiguation algorithm that can operate with very little training data. The thesaurus comprises the information from five human genetic databases and MeSH. The extent of the homonym problem for human gene symbols is shown to be substantial (33% of the genes in our combined thesaurus had one or more ambiguous symbols), not only because one symbol can refer to multiple genes, but also because a gene symbol can have many non-gene meanings. A test set of 52,529 Medline abstracts, containing 690 ambiguous human gene symbols taken from OMIM, was automatically generated. Overall accuracy of the disambiguation algorithm was up to 92.7% on the test set. CONCLUSION: The ambiguity of human gene symbols is substantial, not only because one symbol may denote multiple genes but particularly because many symbols have other, non-gene meanings. The proposed disambiguation approach resolves most ambiguities in our test set with high accuracy, including the important gene/not a gene decisions. The algorithm is fast and scalable, enabling gene-symbol disambiguation in massive text mining applications.


Subject(s)
Algorithms , Genes , Information Storage and Retrieval/methods , Terminology as Topic , Vocabulary, Controlled , Databases, Genetic , Humans , Symbolism
4.
Nat Rev Genet ; 5(12): 889-99, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15573121

ABSTRACT

The major histocompatibility complex (MHC) is the most important region in the vertebrate genome with respect to infection and autoimmunity, and is crucial in adaptive and innate immunity. Decades of biomedical research have revealed many MHC genes that are duplicated, polymorphic and associated with more diseases than any other region of the human genome. The recent completion of several large-scale studies offers the opportunity to assimilate the latest data into an integrated gene map of the extended human MHC. Here, we present this map and review its content in relation to paralogy, polymorphism, immune function and disease.


Subject(s)
Genome, Human , Major Histocompatibility Complex , Autoimmune Diseases/genetics , Chromosome Mapping , Chromosomes, Human, Pair 6 , Humans , Immunity , Multigene Family , Polymorphism, Genetic , RNA, Transfer/genetics
5.
Pharmacogenetics ; 14(1): 1-18, 2004 Jan.
Article in English | MEDLINE | ID: mdl-15128046

ABSTRACT

OBJECTIVES: Completion of both the mouse and human genome sequences in the private and public sectors has prompted comparison between the two species at multiple levels. This review summarizes the cytochrome P450 (CYP) gene superfamily. For the first time, we have the ability to compare complete sets of CYP genes from two mammals. Use of the mouse as a model mammal, and as a surrogate for human biology, assumes reasonable similarity between the two. It is therefore of interest to catalog the genetic similarities and differences, and to clarify the limits of extrapolation from mouse to human. METHODS: Data-mining methods have been used to find all the mouse and human CYP sequences; this includes 102 putatively functional genes and 88 pseudogenes in the mouse, and 57 putatively functional genes and 58 pseudogenes in the human. Comparison is made between all these genes, especially the seven main CYP gene clusters. RESULTS AND CONCLUSIONS: The seven CYP clusters are greatly expanded in the mouse with 72 functional genes versus only 27 in the human, while many pseudogenes are present; presumably this phenomenon will be seen in many other gene superfamily clusters. Complete identification of all pseudogene sequences is likely to be clinically important, because some of these highly similar exons can interfere with PCR-based genotyping assays. A naming procedure for each of four categories of CYP pseudogenes is proposed, and we encourage various gene nomenclature committees to consider seriously the adoption and application of this pseudogene nomenclature system.


Subject(s)
Alternative Splicing , Cytochrome P-450 Enzyme System/genetics , Pseudogenes , Terminology as Topic , Animals , Humans , Mice , Multigene Family
6.
Nucleic Acids Res ; 32(Database issue): D255-7, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681406

ABSTRACT

Genew, the Human Gene Nomenclature Database (http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/searchgenes.pl), is the only resource that provides data for all human genes that have approved symbols. It is managed by the HUGO Gene Nomenclature Committee (HGNC) as a confidential database, containing over 22 000 records, 75% of which are represented online by a publicly searchable text file. Since 2002, there have been significant improvements to the Genew search engine. Additionally we have increased our capacity to analyse confidential sequence data, which has enabled us to manage the large numbers of gene symbol requests that we receive from the chromosome sequencing consortia.


Subject(s)
Databases, Genetic , Genes , Terminology as Topic , Animals , Computational Biology , Humans , Information Storage and Retrieval , Internet , User-Computer Interface
7.
Hum Genomics ; 1(1): 66-71, 2003 Nov.
Article in English | MEDLINE | ID: mdl-15601535

ABSTRACT

Why is agreeing on one particular name for each gene important? As one genome after another becomes sequenced, it is imperative to consider the complexity of genes, genetic architecture, gene expression, gene-gene and gene-product interactions and evolutionary relatedness across species. To agree on a particular gene name not only makes one's own research easier, but will also be helpful to the present generation, as well as future generations, of graduate students and postdoctoral fellows who are about to enter genomics research.


Subject(s)
Genome, Human , Terminology as Topic , Databases, Factual , Evolution, Molecular , Humans , Internet , Molecular Sequence Data
9.
Nucleic Acids Res ; 30(1): 169-71, 2002 Jan 01.
Article in English | MEDLINE | ID: mdl-11752283

ABSTRACT

Genew, the Human Gene Nomenclature Database, is the only resource that provides data for all human genes which have approved symbols. It is managed by the HUGO Gene Nomenclature Committee (HGNC) as a confidential database, containing over 16 000 records, 80% of which are represented on the Web by searchable text files. The data in Genew are highly curated by HGNC editors and gene records can be searched on the Web by symbol or name to directly retrieve information on gene symbol, gene name, cytogenetic location, OMIM number and PubMed ID. Data are integrated with other human gene databases, e.g. GDB, LocusLink and SWISS-PROT, and approved gene symbols are carefully co-ordinated with the Mouse Genome Database (MGD). Approved gene symbols are available for querying and browsing at http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/searchgenes.pl.


Subject(s)
Databases, Genetic , Terminology as Topic , Confidentiality , Data Collection , Database Management Systems , Genes , Humans , Information Storage and Retrieval , Internet