Results 1 - 20 of 21
1.
BMC Bioinformatics ; 12 Suppl 8: S4, 2011 Oct 03.
Article in English | MEDLINE | ID: mdl-22151968

ABSTRACT

BACKGROUND: The BioCreative challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. The biocurator community, as an active user of biomedical literature, provides a diverse and engaged end user group for text mining tools. Earlier BioCreative challenges involved many text mining teams in developing basic capabilities relevant to biological curation, but they did not address the issues of system usage, insertion into the workflow and adoption by curators. Thus in BioCreative III (BC-III), the InterActive Task (IAT) was introduced to address the utility and usability of text mining tools for real-life biocuration tasks. To support the aims of the IAT in BC-III, involvement of both developers and end users was solicited, and the development of a user interface to address the tasks interactively was requested. RESULTS: A User Advisory Group (UAG) actively participated in the IAT design and assessment. The task focused on gene normalization (identifying gene mentions in the article and linking these genes to standard database identifiers), gene ranking based on the overall importance of each gene mentioned in the article, and gene-oriented document retrieval (identifying full text papers relevant to a selected gene). Six systems participated and all processed and displayed the same set of articles. The articles were selected based on content known to be problematic for curation, such as ambiguity of gene names, coverage of multiple genes and species, or introduction of a new gene name. Members of the UAG curated three articles for training and assessment purposes, and each member was assigned a system to review. A questionnaire related to the interface usability and task performance (as measured by precision and recall) was answered after systems were used to curate articles. 
Although the limited number of articles analyzed and users involved in the IAT experiment precluded rigorous quantitative analysis of the results, a qualitative analysis provided valuable insight into some of the problems encountered by users when using the systems. The overall assessment indicates that the system usability features appealed to most users, but the system performance was suboptimal (mainly due to low accuracy in gene normalization). Some of the issues included failure of species identification and gene name ambiguity in the gene normalization task leading to an extensive list of gene identifiers to review, which, in some cases, did not contain the relevant genes. The document retrieval suffered from the same shortfalls. The UAG favored achieving high performance (measured by precision and recall), but strongly recommended the addition of features that facilitate the identification of the correct gene and its identifier, such as contextual information to assist in disambiguation. DISCUSSION: The IAT was an informative exercise that advanced the dialog between curators and developers and increased the appreciation of challenges faced by each group. A major conclusion was that the intended users should be actively involved in every phase of software development, and this will be strongly encouraged in future tasks. The IAT Task provides the first steps toward the definition of metrics and functional requirements that are necessary for designing a formal evaluation of interactive curation systems in the BioCreative IV challenge.
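
The abstract measures task performance by precision and recall. As a minimal sketch of how those two figures could be computed for a gene normalization result, the example below compares a system's gene identifiers for one article against a curator's list. The function name and the identifiers are hypothetical and invented for illustration; they are not taken from the BioCreative III data.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of gene identifiers."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of the system's identifiers that the curator accepted.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of the curator's identifiers that the system found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical identifiers, invented for illustration only.
gold_ids = {"GENE:1", "GENE:2", "GENE:3", "GENE:4"}
system_ids = {"GENE:1", "GENE:2", "GENE:9", "GENE:10", "GENE:11"}

p, r = precision_recall(system_ids, gold_ids)
print(f"precision={p:.2f} recall={r:.2f}")
```

A long list of candidate identifiers that misses the relevant genes, as described in the abstract, would lower both numbers: the extra candidates reduce precision and the missing genes reduce recall.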


Subject(s)
Data Mining/methods , Genes , Animals , Computational Biology/methods , Periodicals as Topic , Plants/genetics , Plants/metabolism
2.
Mamm Genome ; 21(9-10): 427-41, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20931200

ABSTRACT

Mammalian carboxylesterase (CES or Ces) genes encode enzymes that participate in xenobiotic, drug, and lipid metabolism in the body and are members of at least five gene families. Tandem duplications have added more genes for some families, particularly for mouse and rat genomes, which has caused confusion in naming rodent Ces genes. This article describes a new nomenclature system for human, mouse, and rat carboxylesterase genes that identifies homolog gene families and allocates a unique name for each gene. The guidelines of human, mouse, and rat gene nomenclature committees were followed and "CES" (human) and "Ces" (mouse and rat) root symbols were used followed by the family number (e.g., human CES1). Where multiple genes were identified for a family or where a clash occurred with an existing gene name, a letter was added (e.g., human CES4A; mouse and rat Ces1a) that reflected gene relatedness among rodent species (e.g., mouse and rat Ces1a). Pseudogenes were named by adding "P" and a number to the human gene name (e.g., human CES1P1) or by using a new letter followed by ps for mouse and rat Ces pseudogenes (e.g., Ces2d-ps). Gene transcript isoforms were named by adding the GenBank accession ID to the gene symbol (e.g., human CES1_AB119995 or mouse Ces1e_BC019208). This nomenclature improves our understanding of human, mouse, and rat CES/Ces gene families and facilitates research into the structure, function, and evolution of these gene families. It also serves as a model for naming CES genes from other mammalian species.
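
The naming rules in this abstract can be sketched as a small symbol builder. This is an illustration that covers only the rules and examples quoted above; the function name and its parameters are hypothetical, and the full guidelines in the article may contain cases this sketch does not handle.

```python
def ces_symbol(species, family, letter="", pseudogene_number=None,
               rodent_pseudogene=False, accession=None):
    """Assemble a CES/Ces symbol from the parts described in the abstract."""
    if species not in ("human", "mouse", "rat"):
        raise ValueError(f"unsupported species: {species}")
    human = species == "human"
    # Root symbol: "CES" for human, "Ces" for mouse and rat, then family number.
    symbol = ("CES" if human else "Ces") + str(family)
    # Optional letter for multiple genes in a family or a name clash.
    symbol += letter.upper() if human else letter.lower()
    if human and pseudogene_number is not None:
        # Human pseudogenes: "P" plus a number, e.g. CES1P1.
        symbol += f"P{pseudogene_number}"
    elif not human and rodent_pseudogene:
        # Mouse and rat pseudogenes: letter followed by "ps", e.g. Ces2d-ps.
        symbol += "-ps"
    if accession:
        # Transcript isoforms: GenBank accession appended after an underscore.
        symbol += f"_{accession}"
    return symbol


print(ces_symbol("human", 4, "a"))                          # CES4A
print(ces_symbol("mouse", 2, "d", rodent_pseudogene=True))  # Ces2d-ps
print(ces_symbol("mouse", 1, "e", accession="BC019208"))    # Ces1e_BC019208
```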


Subject(s)
Carboxylesterase/genetics , Genes , Pseudogenes , Terminology as Topic , Amino Acid Sequence , Animals , Humans , Mice , Multigene Family , Protein Isoforms/genetics , Rats , Sequence Homology
5.
Genomics ; 90(2): 285-9, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17543498

ABSTRACT

An essential component of microtubules, alpha-tubulin is also a multigene family in many species. An orthology-based nomenclature for this gene family has previously been difficult to assign due to incomplete genome builds and the high degree of sequence similarity between members of this family. Using the current genome builds, sequence analysis of human, mouse, and rat alpha-tubulin genes has enabled an updated nomenclature to be generated. This revised nomenclature provides a unified language for the discussion of these genes in mammalian species; it has been approved by the gene nomenclature committees of the three species and is supported by researchers in the field.


Subject(s)
Mice/genetics , Multigene Family , Rats/genetics , Terminology as Topic , Tubulin/genetics , Animals , DNA, Complementary/metabolism , Humans , Phylogeny
8.
J Cell Biol ; 174(2): 169-74, 2006 Jul 17.
Article in English | MEDLINE | ID: mdl-16831889

ABSTRACT

Keratins are intermediate filament-forming proteins that provide mechanical support and fulfill a variety of additional functions in epithelial cells. In 1982, a nomenclature was devised to name the keratin proteins that were known at that point. The systematic sequencing of the human genome in recent years uncovered the existence of several novel keratin genes and their encoded proteins. Their naming could not be adequately handled in the context of the original system. We propose a new consensus nomenclature for keratin genes and proteins that relies upon and extends the 1982 system and adheres to the guidelines issued by the Human and Mouse Genome Nomenclature Committees. This revised nomenclature accommodates functional genes and pseudogenes, and although designed specifically for the full complement of human keratins, it offers the flexibility needed to incorporate additional keratins from other mammalian species.


Subject(s)
Keratins/classification , Terminology as Topic , Animals , Humans , Keratins/chemistry , Keratins/genetics , Mammals , Pseudogenes/genetics
9.
Biol Chem ; 387(6): 637-41, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16800724

ABSTRACT

The human kallikrein locus on chromosome 19q13.3-13.4 contains kallikrein 1--the tissue kallikrein--and 14 related serine proteases. Recent investigations into their function and evolution have indicated that the present nomenclature for these proteins is inadequate or insufficient. Here we present a new nomenclature in which proteins without proven kininogenase activity are denoted kallikrein-related peptidase. Names are also given to the unique rodent proteins that are closely related to kallikrein 1.


Subject(s)
Serine Endopeptidases , Terminology as Topic , Tissue Kallikreins , Chromosomes, Human, Pair 19 , Humans , Sequence Homology , Serine Endopeptidases/genetics , Tissue Kallikreins/genetics
11.
J Lipid Res ; 46(9): 2029-32, 2005 Sep.
Article in English | MEDLINE | ID: mdl-16103133

ABSTRACT

Acyl-CoA thioesterases, also known as acyl-CoA hydrolases, are a group of enzymes that hydrolyze CoA esters such as acyl-CoAs (saturated, unsaturated, branched-chain), bile acid-CoAs, CoA esters of prostaglandins, etc., to the corresponding free acid and CoA. However, there is significant confusion regarding the nomenclature of these genes. In agreement with the HUGO Gene Nomenclature Committee and the Mouse Genomic Nomenclature Committee, a revised nomenclature for mammalian acyl-CoA thioesterases/hydrolases has been suggested for the 12 member family. The family root symbol is ACOT, with human genes named ACOT1-ACOT12, and rat and mouse genes named Acot1-Acot12. Several of the ACOT genes are the result of splicing events, and these splice variants are cataloged.


Subject(s)
Palmitoyl-CoA Hydrolase , Terminology as Topic , Alternative Splicing , Animals , Humans , Mice , Multigene Family , Palmitoyl-CoA Hydrolase/genetics , Rats
12.
Genomics ; 86(2): 242-51, 2005 Aug.
Article in English | MEDLINE | ID: mdl-15922553

ABSTRACT

The ARID is an ancient DNA-binding domain that is conserved throughout the evolution of higher eukaryotes. The ARID consensus sequence spans about 100 amino acid residues, and structural studies identify the major groove contact site as a modified helix-turn-helix motif. ARID-containing proteins exhibit a range of cellular functions, including participation in chromatin remodeling, and regulation of gene expression during cell growth, differentiation, and development. A subset of ARID family proteins binds DNA specifically at AT-rich sites; the remainder bind DNA nonspecifically. Orthologs to each of the seven distinct subfamilies of mammalian ARID-containing proteins are found in insect genomes, indicating the minimum age for the organization of these higher metazoan subfamilies. Many of these ancestral genes were duplicated and fixed over time to yield the 15 ARID-containing genes that are found in the human, mouse, and dog genomes. This paper describes a nomenclature, recommended by the Mouse Genomic Nomenclature Committee (MGNC) and accepted by the Human Genome Organization (HUGO) Gene Nomenclature Committee, for these mammalian ARID-containing genes that reflects this evolutionary history.


Subject(s)
DNA-Binding Proteins/chemistry , DNA-Binding Proteins/classification , DNA-Binding Proteins/genetics , Amino Acid Sequence , Animals , Binding Sites , Biological Evolution , DNA/chemistry , Evolution, Molecular , Genome , Humans , International Cooperation , Mice , Models, Molecular , Molecular Sequence Data , Phylogeny , Protein Conformation , Protein Structure, Secondary , Protein Structure, Tertiary , Sequence Homology, Amino Acid , Terminology as Topic
13.
J Lipid Res ; 45(10): 1958-61, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15292367

ABSTRACT

By consensus, the acyl-CoA synthetase (ACS) community, with the advice of the human and mouse genome nomenclature committees, has revised the nomenclature for the mammalian long-chain acyl-CoA synthetases. ACS is the family root name, and the human and mouse genes for the long-chain ACSs are termed ACSL1,3-6 and Acsl1,3-6, respectively. Splice variants of ACSL3, -4, -5, and -6 are cataloged. Suggestions for naming other family members and for the nonmammalian acyl-CoA synthetases are made.


Subject(s)
Acyl Coenzyme A , Coenzyme A Ligases/genetics , Terminology as Topic , Animals , Genes , Humans , Mice , Multigene Family
14.
Pharmacogenetics ; 14(1): 1-18, 2004 Jan.
Article in English | MEDLINE | ID: mdl-15128046

ABSTRACT

OBJECTIVES: Completion of both the mouse and human genome sequences in the private and public sectors has prompted comparison between the two species at multiple levels. This review summarizes the cytochrome P450 (CYP) gene superfamily. For the first time, we have the ability to compare complete sets of CYP genes from two mammals. Use of the mouse as a model mammal, and as a surrogate for human biology, assumes reasonable similarity between the two. It is therefore of interest to catalog the genetic similarities and differences, and to clarify the limits of extrapolation from mouse to human. METHODS: Data-mining methods have been used to find all the mouse and human CYP sequences; this includes 102 putatively functional genes and 88 pseudogenes in the mouse, and 57 putatively functional genes and 58 pseudogenes in the human. Comparison is made between all these genes, especially the seven main CYP gene clusters. RESULTS AND CONCLUSIONS: The seven CYP clusters are greatly expanded in the mouse with 72 functional genes versus only 27 in the human, while many pseudogenes are present; presumably this phenomenon will be seen in many other gene superfamily clusters. Complete identification of all pseudogene sequences is likely to be clinically important, because some of these highly similar exons can interfere with PCR-based genotyping assays. A naming procedure for each of four categories of CYP pseudogenes is proposed, and we encourage various gene nomenclature committees to consider seriously the adoption and application of this pseudogene nomenclature system.
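
The counts reported in this abstract can be tallied with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the dictionary layout and function names are assumptions made for illustration.

```python
# Counts as quoted in the abstract.
cyp_counts = {
    "mouse": {"functional": 102, "pseudogenes": 88, "cluster_functional": 72},
    "human": {"functional": 57, "pseudogenes": 58, "cluster_functional": 27},
}


def total_sequences(species):
    """Functional genes plus pseudogenes for one species."""
    counts = cyp_counts[species]
    return counts["functional"] + counts["pseudogenes"]


def cluster_expansion():
    """Ratio of mouse to human functional genes in the seven CYP clusters."""
    return (cyp_counts["mouse"]["cluster_functional"]
            / cyp_counts["human"]["cluster_functional"])


for name in cyp_counts:
    print(name, total_sequences(name))
print(f"mouse/human cluster ratio: {cluster_expansion():.1f}")
```

On these figures the mouse has 190 CYP sequences in total against 115 in the human, and the seven clusters hold roughly 2.7 times as many functional genes in the mouse.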


Subject(s)
Alternative Splicing , Cytochrome P-450 Enzyme System/genetics , Pseudogenes , Terminology as Topic , Animals , Humans , Mice , Multigene Family
15.
Genome Res ; 13(6B): 1505-19, 2003 Jun.
Article in English | MEDLINE | ID: mdl-12819150

ABSTRACT

The Mouse Genome Sequencing Consortium and the RIKEN Genome Exploration Research group have generated large sets of sequence data representing the mouse genome and transcriptome, respectively. These data provide a valuable foundation for genomic research. The challenges for the informatics community are how to integrate these data with the ever-expanding knowledge about the roles of genes and gene products in biological processes, and how to provide useful views to the scientific community. Public resources, such as the National Center for Biotechnology Information (NCBI; http://www.ncbi.nih.gov), and model organism databases, such as the Mouse Genome Informatics database (MGI; http://www.informatics.jax.org), maintain the primary data and provide connections between sequence and biology. In this paper, we describe how the partnership of MGI and NCBI LocusLink contributes to the integration of sequence and biology, especially in the context of the large-scale genome and transcriptome data now available for the laboratory mouse. In particular, we describe the methods and results of integration of 60,770 FANTOM2 mouse cDNAs with gene records in the databases of MGI and LocusLink.


Subject(s)
Base Sequence/genetics , Computational Biology/methods , Animals , Base Sequence/physiology , Computational Biology/statistics & numerical data , Computer Graphics/statistics & numerical data , Computer Graphics/trends , DNA, Complementary/genetics , DNA, Complementary/physiology , Databases, Genetic/statistics & numerical data , Databases, Genetic/trends , Genes/genetics , Genes/physiology , Genome , Internet/statistics & numerical data , Internet/trends , Mice
16.
Genomics ; 81(6): 618-22, 2003 Jun.
Article in English | MEDLINE | ID: mdl-12782131

ABSTRACT

Nucleic acid helicases are characterized by the presence of the helicase domain containing eight motifs. The sequence of the helicase domain is used to classify helicases into families. To identify members of the DEAD and DEAH families of human RNA helicases, we used the helicase domain sequences to search the nonredundant peptide sequence database. We report the identification of 36 and 14 members of the DEAD and DEAH families of putative RNA helicases, including several novel genes. The gene symbol DDX had been used previously for both DEAD- and DEAH-box families. We have now adopted DDX and DHX symbols to denote DEAD- and DEAH-box families, respectively. Members of human DDX and DHX families of putative RNA helicases play roles in differentiation and carcinogenesis.


Subject(s)
Multigene Family , RNA Helicases/genetics , Amino Acid Motifs , Amino Acid Sequence , DEAD-box RNA Helicases , Databases, Protein , Humans , Membrane Proteins/genetics , Neoplasm Proteins/genetics , RNA Helicases/classification
17.
Curr Protoc Hum Genet ; Appendix 1: Appendix 1C, 2003 Feb.
Article in English | MEDLINE | ID: mdl-18428336

ABSTRACT

Standard genetic nomenclature is necessary to help researchers, clinicians, and the public to access data on their genes of interest, and to communicate in a globally understood language of approved gene symbols. In both human and mouse, one unique symbol (acronym/abbreviation) and one name are assigned for each gene. Co-ordination between human and mouse gene nomenclature is a successful endeavor, due in part to the historical interaction between the two nomenclature committee groups. This interaction grew out of the Human Gene Mapping (HGM) Workshops. This appendix discusses development and organization of gene nomenclature, how to find a gene and how to name a new gene.


Subject(s)
Genes , Animals , Humans , Mice
18.
Mol Biol Cell ; 13(12): 4111-3, 2002 Dec.
Article in English | MEDLINE | ID: mdl-12475938

ABSTRACT

There are 10 known mammalian septin genes, some of which produce multiple splice variants. The current nomenclature for the genes and gene products is very confusing, with several different names having been given to the same gene product and distinct names given to splice variants of the same gene. Moreover, some names are based on those of yeast or Drosophila septins that are not the closest homologues. Therefore, we suggest that the mammalian septin field adopt a common nomenclature system, based on that adopted by the Mouse Genomic Nomenclature Committee and accepted by the Human Genome Organization Gene Nomenclature Committee. The human and mouse septin genes will be named SEPT1-SEPT10 and Sept1-Sept10, respectively. Splice variants will be designated by an underscore followed by a lowercase "v" and a number, e.g., SEPT4_v1.
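
The proposed symbol format (SEPT1-SEPT10 for human, Sept1-Sept10 for mouse, with splice variants written as an underscore, a lowercase "v" and a number) is regular enough to check with a short pattern. The sketch below is one possible reading of the rules as stated in this abstract; the function name and the returned dictionary are hypothetical.

```python
import re

# Root in the human (SEPT) or mouse (Sept) form, a gene number,
# and an optional splice-variant suffix such as "_v1".
_SEPTIN = re.compile(r"(SEPT|Sept)(\d+)(?:_v(\d+))?")


def parse_septin(symbol):
    """Split a septin symbol into species, gene number and variant, or return None."""
    match = _SEPTIN.fullmatch(symbol)
    if match is None:
        return None
    root, gene, variant = match.groups()
    gene = int(gene)
    # The abstract names ten genes, numbered 1 to 10.
    if not 1 <= gene <= 10:
        return None
    return {
        "species": "human" if root == "SEPT" else "mouse",
        "gene": gene,
        "variant": int(variant) if variant is not None else None,
    }


print(parse_septin("SEPT4_v1"))
print(parse_septin("Sept10"))
print(parse_septin("sept4"))
```

The capitalization of the root carries the species, so a lowercase form such as "sept4" is rejected rather than guessed at.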


Subject(s)
GTP Phosphohydrolases/classification , Terminology as Topic , Alternative Splicing , Animals , Cytoskeletal Proteins , Fungal Proteins/genetics , GTP Phosphohydrolases/genetics , GTP-Binding Proteins/genetics , Humans , Phylogeny , Protein Structure, Tertiary , Septins
19.
Genomics ; 80(5): 487-98, 2002 Nov.
Article in English | MEDLINE | ID: mdl-12408966

ABSTRACT

The multigene family encoding the five classes of replication-dependent histones has been identified from the human and mouse genome sequence. The large cluster of histone genes, HIST1, on human chromosome 6 (6p21-p22) contains 55 histone genes, and Hist1 on mouse chromosome 13 contains 51 histone genes. There are two smaller clusters on human chromosome 1: HIST2 (at 1q21), which contains six genes, and HIST3 (at 1q42), which contains three histone genes. Orthologous Hist2 and Hist3 clusters are present on mouse chromosomes 3 and 11, respectively. The organization of the human and mouse histone genes in the HIST1 cluster is essentially identical. All of the histone H1 genes are in HIST1, which is spread over about 2 Mb. There are two large gaps (>250 kb each) within this cluster where there are no histone genes, but many other genes. Each of the histone genes encodes an mRNA that ends in a stem-loop followed by a purine-rich region that is complementary to the 5' end of U7 snRNA. In addition to the histone genes on these clusters, only two other genes containing the stem-loop sequence were identified, a histone H4 gene on human chromosome 12 (mouse chromosome 6) and the previously described H2a.X gene located on human chromosome 11. Each of the 14 histone H4 genes encodes the same protein, and there are only three histone H3 proteins encoded by the 12 histone H3 genes in each species. In contrast, both the mouse and human H2a and H2b proteins consist of at least 10 non-allelic variants, making the complexity of the histone protein complement significantly greater than previously thought.


Subject(s)
Chromosomes, Human, Pair 6 , Histones/genetics , Multigene Family , Terminology as Topic , Amino Acid Sequence , Animals , Chromosome Mapping , Chromosomes, Human, Pair 6/genetics , Histones/chemistry , Humans , Mice , Molecular Sequence Data , Phylogeny , RNA, Messenger/chemistry , RNA, Messenger/genetics