PubSearch and PubFetch: a simple management system for semiautomated retrieval and annotation of biological information from the literature.

Yoo, Danny; Xu, Iris; Berardini, Tanya Z; Rhee, Seung Yon; Narayanasamy, Vijay; Twigger, Simon.

Curr Protoc Bioinformatics ; Chapter 9: Unit9.7, 2006 Mar.

Article in English | MEDLINE | ID: mdl-18428773

ABSTRACT

For most systems in biology, a large body of literature exists that describes the complexity of the system based on experimental results. Manual review of this literature to extract targeted information into biological databases is difficult and time-consuming. To address this problem, we developed PubSearch and PubFetch, which store literature, keyword, and gene information in a relational database, index the literature with keywords and gene names, and provide a Web user interface for annotating the genes from experimental data found in the associated literature. A set of protocols is provided in this unit for installing, populating, running, and using PubSearch and PubFetch. In addition, we provide support protocols for performing controlled vocabulary annotations. Intended users of PubSearch and PubFetch are database curators and biology researchers interested in tracking the literature and capturing information about genes of interest in a more effective way than with conventional spreadsheets and lab notebooks.
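The idea of storing articles and terms in a relational database and indexing one against the other can be illustrated with a minimal sketch. This is not PubSearch's actual schema or code: the table names (`article`, `term`, `hit`), the functions `build_index` and `articles_for`, and the sample data are all hypothetical, and the matching is a simple case-insensitive whole-word search chosen for brevity.

```python
import re
import sqlite3


def build_index(articles, terms):
    """Load articles and terms into an in-memory database and index them.

    articles: list of (article_id, abstract_text)
    terms:    list of (term_id, name, kind), kind being "gene" or "keyword"
    All table and column names here are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE article (id INTEGER PRIMARY KEY, abstract TEXT NOT NULL);
        CREATE TABLE term (
            id INTEGER PRIMARY KEY, name TEXT NOT NULL, kind TEXT NOT NULL
        );
        CREATE TABLE hit (
            article_id INTEGER REFERENCES article(id),
            term_id INTEGER REFERENCES term(id),
            PRIMARY KEY (article_id, term_id)
        );
        """
    )
    conn.executemany("INSERT INTO article (id, abstract) VALUES (?, ?)", articles)
    conn.executemany("INSERT INTO term (id, name, kind) VALUES (?, ?, ?)", terms)
    # Record one hit per (article, term) pair whose name occurs as a whole word.
    for article_id, abstract in articles:
        for term_id, name, _kind in terms:
            if re.search(rf"\b{re.escape(name)}\b", abstract, re.IGNORECASE):
                conn.execute(
                    "INSERT INTO hit (article_id, term_id) VALUES (?, ?)",
                    (article_id, term_id),
                )
    conn.commit()
    return conn


def articles_for(conn, name):
    """Return the sorted ids of articles indexed under the given term name."""
    rows = conn.execute(
        "SELECT h.article_id FROM hit h JOIN term t ON t.id = h.term_id "
        "WHERE t.name = ? ORDER BY h.article_id",
        (name,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Illustrative data only.
    sample_articles = [
        (1, "AGAMOUS regulates flower development."),
        (2, "Root growth depends on auxin transport."),
    ]
    sample_terms = [
        (1, "AGAMOUS", "gene"),
        (2, "auxin transport", "keyword"),
        (3, "flower development", "keyword"),
    ]
    db = build_index(sample_articles, sample_terms)
    print(articles_for(db, "AGAMOUS"))
```

Keeping the article-to-term links in their own table means a curator-facing query ("which papers mention this gene?") is a single join, which is the kind of lookup the abstract describes.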

Subject(s)

Biology/methods, Database Management Systems, Information Storage and Retrieval/methods, Natural Language Processing, Periodicals as Topic, PubMed, User-Computer Interface, Abstracting and Indexing/methods, Artificial Intelligence, Computational Biology/methods, Vocabulary, Controlled

PatMatch: a program for finding patterns in peptide and nucleotide sequences.

Yan, Thomas; Yoo, Danny; Berardini, Tanya Z; Mueller, Lukas A; Weems, Dan C; Weng, Shuai; Cherry, J Michael; Rhee, Seung Y.

Nucleic Acids Res ; 33(Web Server issue): W262-6, 2005 Jul 01.

Article in English | MEDLINE | ID: mdl-15980466

ABSTRACT

Here, we present PatMatch, an efficient, web-based pattern-matching program that enables searches for short nucleotide or peptide sequences such as cis-elements in nucleotide sequences or small domains and motifs in protein sequences. The program can be used to find matches to a user-specified sequence pattern that can be described using ambiguous sequence codes and a powerful and flexible pattern syntax based on regular expressions. A recent upgrade has improved performance and now supports both mismatches and wildcards in a single pattern. This enhancement has been achieved by replacing the previous searching algorithm, scan_for_matches [D'Souza et al. (1997), Trends in Genetics, 13, 497-498], with nondeterministic-reverse grep (NR-grep), a general pattern matching tool that allows for approximate string matching [Navarro (2001), Software Practice and Experience, 31, 1265-1312]. We have tailored NR-grep to be used for DNA and protein searches with PatMatch. The stand-alone version of the software can be adapted for use with any sequence dataset and is available for download at The Arabidopsis Information Resource (TAIR) at ftp://ftp.arabidopsis.org/home/tair/Software/Patmatch/. The PatMatch server is available on the web at http://www.arabidopsis.org/cgi-bin/patmatch/nph-patmatch.pl for searching Arabidopsis thaliana sequences.
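To make the notion of a pattern written with ambiguous sequence codes concrete, the sketch below translates a nucleotide pattern that uses the standard IUPAC ambiguity codes into a regular expression and reports every match. It is not PatMatch's implementation: it uses Python's `re` module instead of NR-grep, it handles exact matches only (no mismatches), and the function names `iupac_to_regex` and `find_matches` are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC nucleotide pattern into a compiled regex."""
    try:
        body = "".join(IUPAC_DNA[symbol] for symbol in pattern.upper())
    except KeyError as exc:
        raise ValueError(f"unsupported symbol: {exc.args[0]!r}") from None
    # Wrapping the pattern in a zero-width lookahead lets overlapping
    # matches be reported, since the scan advances one base at a time.
    return re.compile(f"(?=({body}))")


def find_matches(pattern, sequence):
    """Return (0-based start, matched text) for every hit, overlaps included."""
    regex = iupac_to_regex(pattern)
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    # A six-base pattern with two fully ambiguous positions (N = any base).
    for start, text in find_matches("CANNTG", "ttCACGTGaaCATATGcc"):
        print(start, text)
```

Supporting mismatches as well as wildcards, as the abstract describes, requires approximate string matching, which plain regular expressions do not provide; that is the gap NR-grep fills in PatMatch.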

Subject(s)

Peptides/chemistry, Sequence Analysis, DNA/methods, Sequence Analysis, Protein/methods, Software, Arabidopsis/genetics, Arabidopsis Proteins/chemistry, DNA, Plant/chemistry, Internet, User-Computer Interface

Functional annotation of the Arabidopsis genome using controlled vocabularies.

Berardini, Tanya Z; Mundodi, Suparna; Reiser, Leonore; Huala, Eva; Garcia-Hernandez, Margarita; Zhang, Peifen; Mueller, Lukas A; Yoon, Jungwoon; Doyle, Aisling; Lander, Gabriel; Moseyko, Nick; Yoo, Danny; Xu, Iris; Zoeckler, Brandon; Montoya, Mary; Miller, Neil; Weems, Dan; Rhee, Seung Y.

Plant Physiol ; 135(2): 745-55, 2004 Jun.

Article in English | MEDLINE | ID: mdl-15173566

ABSTRACT

Controlled vocabularies are increasingly used by databases to describe genes and gene products because they facilitate identification of similar genes within an organism or among different organisms. One of The Arabidopsis Information Resource's goals is to associate all Arabidopsis genes with terms developed by the Gene Ontology Consortium that describe the molecular function, biological process, and subcellular location of a gene product. We have also developed terms describing Arabidopsis anatomy and developmental stages and use these to annotate published gene expression data. As of March 2004, we used computational and manual annotation methods to make 85,666 annotations representing 26,624 unique loci. We focus on associating genes to controlled vocabulary terms based on experimental data from the literature and use The Arabidopsis Information Resource-developed PubSearch software to facilitate this process. Each annotation is tagged with a combination of evidence codes, evidence descriptions, and references that provide a robust means to assess data quality. Annotation of all Arabidopsis genes will allow quantitative comparisons between sets of genes derived from sources such as microarray experiments. The Arabidopsis annotation data will also facilitate annotation of newly sequenced plant genomes by using sequence similarity to transfer annotations to homologous genes. In addition, complete and up-to-date annotations will make unknown genes easy to identify and target for experimentation. Here, we describe the process of Arabidopsis functional annotation using a variety of data sources and illustrate several ways in which this information can be accessed and used to infer knowledge about Arabidopsis and other plant species.

Subject(s)

Arabidopsis/genetics, Genome, Plant, Vocabulary, Controlled, Arabidopsis/growth & development, Databases, Factual, Gene Expression Regulation, Developmental, Gene Expression Regulation, Plant
