GOSLING: a rule-based protein annotator using BLAST and GO.

Jones, Craig E; Schwerdt, Julian; Bretag, Tessa Arwen; Baumann, Ute; Brown, Alfred L.

Bioinformatics; 24(22): 2628-9, 2008 Nov 15.

Article in English | MEDLINE | ID: mdl-18796479

ABSTRACT

UNLABELLED: GOSLING is a web-based protein function annotator that uses a decision tree-derived rule set to quickly predict Gene Ontology terms for a protein. Each term prediction is assigned a score that indicates the accuracy of the prediction. Due to its speed and accuracy, GOSLING is ideally suited for high-throughput annotation tasks. AVAILABILITY: https://www.sapac.edu.au/gosling

Subject(s)

Computational Biology; Databases, Nucleic Acid; Proteins/genetics; Proteins/metabolism; Algorithms; Internet

Estimating the annotation error rate of curated GO database sequence annotations.

Jones, Craig E; Brown, Alfred L; Baumann, Ute.

BMC Bioinformatics; 8: 170, 2007 May 22.

Article in English | MEDLINE | ID: mdl-17519041

ABSTRACT

BACKGROUND: Annotations that describe the function of sequences are enormously important to researchers during laboratory investigations and when making computational inferences. However, there has been little investigation into the data quality of sequence function annotations. Here we have developed a new method of estimating the error rate of curated sequence annotations, and applied this to the Gene Ontology (GO) sequence database (GOSeqLite). This method involved artificially adding errors to sequence annotations at known rates, and used regression to model the impact on the precision of annotations based on BLAST matched sequences. RESULTS: We estimated the error rate of curated GO sequence annotations in the GOSeqLite database (March 2006) at between 28% and 30%. Annotations made without use of sequence similarity based methods (non-ISS) had an estimated error rate of between 13% and 18%. Annotations made with the use of sequence similarity methodology (ISS) had an estimated error rate of 49%. CONCLUSION: While the overall error rate is reasonably low, it would be prudent to treat all ISS annotations with caution. Electronic annotators that use ISS annotations as the basis of predictions are likely to have higher false prediction rates, and for this reason designers of these systems should consider avoiding ISS annotations where possible. Electronic annotators that use ISS annotations to make predictions should be viewed sceptically. We recommend that curators thoroughly review ISS annotations before accepting them as valid. Overall, users of curated sequence annotations from the GO database should feel assured that they are using a comparatively high quality source of information.
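
The idea of adding errors at known rates and then regressing precision on the added rate can be illustrated with a toy calculation. The sketch below is not the authors' model: it assumes, purely for illustration, that precision would be 1.0 with error-free annotations and falls linearly as the fraction of correct annotations falls, so that precision = (1 - base_error) * (1 - added_rate). Under that assumption the intercept of a fitted line gives 1 - base_error. The function names and the synthetic numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_base_error(added_rates, precisions):
    """Estimate the pre-existing error rate from the fitted intercept.

    Toy assumption (not taken from the paper): precision at zero added
    error equals the fraction of annotations that were already correct.
    """
    _slope, intercept = fit_line(added_rates, precisions)
    return 1.0 - intercept


# Synthetic, noise-free data generated from the toy model with an
# arbitrary base error rate; these are not results from the paper.
true_base_error = 0.2
added = [0.0, 0.1, 0.2, 0.3, 0.4]
observed = [(1 - true_base_error) * (1 - a) for a in added]

estimate = estimate_base_error(added, observed)
print(round(estimate, 3))
```

With real data the precision values would be measured from BLAST-matched annotations at each added error rate, and the relationship between precision and error would need to be checked rather than assumed linear.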

Subject(s)

Data Interpretation, Statistical; Databases, Genetic; Sequence Analysis, DNA/methods; Base Sequence; Information Storage and Retrieval/methods; Molecular Sequence Data; Reproducibility of Results; Sensitivity and Specificity

Automated methods of predicting the function of biological sequences using GO and BLAST.

Jones, Craig E; Baumann, Ute; Brown, Alfred L.

BMC Bioinformatics; 6: 272, 2005 Nov 15.

Article in English | MEDLINE | ID: mdl-16288652

ABSTRACT

BACKGROUND: With the exponential increase in genomic sequence data there is a need to develop automated approaches to deducing the biological functions of novel sequences with high accuracy. Our aim is to demonstrate how accuracy benchmarking can be used in a decision-making process evaluating competing designs of biological function predictors. We utilise the Gene Ontology, GO, a directed acyclic graph of functional terms, to annotate sequences with functional information describing their biological context. Initially we examine the effect on accuracy scores of increasing the allowed distance between predicted terms and a test set of curator-assigned terms. Next we evaluate several annotator methods using accuracy benchmarking. Given an unannotated sequence, we use the Basic Local Alignment Search Tool, BLAST, to find similar sequences that have already been assigned GO terms by curators. A number of methods were developed that utilise terms associated with the best five matching sequences. These methods were compared against a benchmark method of simply using terms associated with the best BLAST-matched sequence (best BLAST approach). RESULTS: The precision and recall of estimates increase rapidly as the amount of distance permitted between a predicted term and a correct term assignment increases. Accuracy benchmarking allows a comparison of annotation methods. A covering graph approach performs poorly, except where the term assignment rate is high. A term distance concordance approach has a similar accuracy to the best BLAST approach, demonstrating lower precision but higher recall. However, a discriminant function method has higher precision and recall than the best BLAST approach and other methods shown here. CONCLUSION: Allowing term predictions to be counted correct if closely related to a correct term decreases the reliability of the accuracy score. As such we recommend using accuracy measures that require exact matching of predicted terms with curator-assigned terms. Furthermore, we conclude that competing designs of BLAST-based GO term annotators can be effectively compared using an accuracy benchmarking approach. The most accurate annotation method was developed using data mining techniques. As such we recommend that designers of term annotators utilise accuracy benchmarking and data mining to ensure newly developed annotators are of high quality.
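
The "best BLAST approach" and exact-match accuracy scoring described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the hit format (subject ID plus bit score), the function names, and the example sequences are assumptions, and the GO identifiers are used only as sample labels.

```python
def best_blast_terms(hits, annotations):
    """Return the GO terms of the highest-scoring BLAST hit.

    hits: list of (subject_id, bit_score) pairs (assumed format).
    annotations: dict mapping subject_id to a set of GO term IDs.
    """
    if not hits:
        return set()
    best_subject, _score = max(hits, key=lambda hit: hit[1])
    return set(annotations.get(best_subject, set()))


def precision_recall(predicted, curated):
    """Exact-match precision and recall pooled over a test set.

    A predicted term counts as correct only if the identical term
    was assigned to the same sequence by a curator.
    """
    correct = sum(len(predicted.get(seq, set()) & terms)
                  for seq, terms in curated.items())
    n_predicted = sum(len(predicted.get(seq, set())) for seq in curated)
    n_curated = sum(len(terms) for terms in curated.values())
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_curated if n_curated else 0.0
    return precision, recall


# Hypothetical reference annotations and BLAST hits for two queries.
annotations = {
    "subj1": {"GO:0003677"},
    "subj2": {"GO:0016301", "GO:0005524"},
    "subj3": {"GO:0005524"},
}
hits_by_query = {
    "seqA": [("subj1", 310.5), ("subj3", 120.0)],
    "seqB": [("subj3", 95.0), ("subj2", 250.0)],
}
curated = {
    "seqA": {"GO:0003677", "GO:0005524"},
    "seqB": {"GO:0016301"},
}

predicted = {query: best_blast_terms(hits, annotations)
             for query, hits in hits_by_query.items()}
precision, recall = precision_recall(predicted, curated)
print(predicted, precision, recall)
```

Because matching is exact, a prediction of a parent or sibling term earns no credit, which is the stricter scoring the abstract recommends.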

Subject(s)

Sequence Analysis/methods; Software; Benchmarking; Chi-Square Distribution; Data Collection; Databases, Genetic; Discriminant Analysis; Information Storage and Retrieval/methods; Sequence Alignment

Effectiveness of off-line and web-based promotion of health information web sites.

Jones, Craig E; Pinnock, Carole B.

Telemed J E Health; 8(4): 349-54, 2002.

Article in English | MEDLINE | ID: mdl-12626103

ABSTRACT

The relative effectiveness of off-line and web-based promotional activities in increasing the use of health information web sites by target audiences was compared. Visitor sessions were classified according to their method of arrival at the site (referral) as external web site, search engine, or "no referrer" (i.e., visitor arriving at the site by inputting the URL or using bookmarks). The number of Australian visitor sessions correlated with "no referrer" referrals but not with web site or search-engine referrals. Results showed that the targeted consumer group is more likely to access the web site as a result of off-line promotional activities. The properties of target audiences likely to influence the effectiveness of off-line versus on-line promotional strategies include the size of the Internet-using population of the target audience, their proficiency in the use of the Internet, and the increase in effectiveness of off-line promotional activities when applied to locally defined target audiences.

Subject(s)

Health Education/statistics & numerical data; Information Services/statistics & numerical data; Internet/statistics & numerical data; Marketing of Health Services/methods; Australia; Health Education/methods; Humans; Internet/trends; Male; Prostatic Neoplasms
