Search | VHL Regional Portal

1.

StORF-Reporter: finding genes between genes.

Dimonaco, Nicholas J; Clare, Amanda; Kenobi, Kim; Aubrey, Wayne; Creevey, Christopher J.

Nucleic Acids Res ; 51(21): 11504-11517, 2023 Nov 27.

Article in English | MEDLINE | ID: mdl-37897345

ABSTRACT

Large regions of prokaryotic genomes are currently without any annotation, in part due to well-established limitations of annotation tools. For example, it is routine for genes using alternative start codons to be misreported or completely omitted. Therefore, we present StORF-Reporter, a tool that takes an annotated genome and returns regions that may contain missing CDS genes from unannotated regions. StORF-Reporter consists of two parts. The first begins with the extraction of unannotated regions from an annotated genome. Next, Stop-ORFs (StORFs) are identified in these unannotated regions. StORFs are open reading frames that are delimited by stop codons and thus can capture those genes most often missing in genome annotations. We show this methodology recovers genes missing from canonical genome annotations. We inspect the results of the genomes of model organisms, the pangenome of Escherichia coli, and a set of 5109 prokaryotic genomes of 247 genera from the Ensembl Bacteria database. StORF-Reporter extended the core, soft-core and accessory gene collections, identified novel gene families and extended families into additional genera. The high levels of sequence conservation observed between genera suggest that many of these StORFs are likely to be functional genes that should now be considered for inclusion in canonical annotations.

Subject(s)

Escherichia coli , Genome, Bacterial , Open Reading Frames/genetics , Databases, Factual , Escherichia coli/genetics , Molecular Sequence Annotation
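
The stop-to-stop definition in the StORF-Reporter abstract above can be illustrated with a short sketch. This is not StORF-Reporter's own code: the function name `find_storfs`, the `min_len` parameter and its default are hypothetical, and only the three forward reading frames of a single unannotated region are scanned, using the standard stop codons TAA, TAG and TGA.

```python
# Minimal sketch of stop-to-stop ORF (StORF) detection.
# Assumptions (not from the paper): forward strand only, standard stop
# codons, and an illustrative minimum length filter.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_storfs(region, min_len=30):
    """Return (start, end, sequence) tuples for stop-delimited ORFs.

    start is the index of the first base of the upstream stop codon and
    end is one past the last base of the downstream stop codon.
    """
    region = region.upper()
    storfs = []
    for frame in range(3):
        prev_stop = None  # position of the last stop codon seen in this frame
        for i in range(frame, len(region) - 2, 3):
            if region[i:i + 3] in STOP_CODONS:
                if prev_stop is not None and (i + 3 - prev_stop) >= min_len:
                    storfs.append((prev_stop, i + 3, region[prev_stop:i + 3]))
                prev_stop = i
    return sorted(storfs)
```

Because no start codon is required, a gene that uses an alternative start codon would still fall inside a reported span. A fuller version would also scan the reverse complement and filter overlapping candidates.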

2.

No one tool to rule them all: prokaryotic gene prediction tool annotations are highly dependent on the organism of study.

Dimonaco, Nicholas J; Aubrey, Wayne; Kenobi, Kim; Clare, Amanda; Creevey, Christopher J.

Bioinformatics ; 38(5): 1198-1207, 2022 Feb 07.

Article in English | MEDLINE | ID: mdl-34875010

ABSTRACT

MOTIVATION: The biases in CoDing Sequence (CDS) prediction tools, which have been based on historic genomic annotations from model organisms, impact our understanding of novel genomes and metagenomes. This hinders the discovery of new genomic information as it results in predictions being biased towards existing knowledge. To date, users have lacked a systematic and replicable approach to identify the strengths and weaknesses of any CDS prediction tool and allow them to choose the right tool for their analysis. RESULTS: We present an evaluation framework (ORForise) based on a comprehensive set of 12 primary and 60 secondary metrics that facilitate the assessment of the performance of CDS prediction tools. This makes it possible to identify which performs better for specific use-cases. We use this to assess 15 ab initio- and model-based tools representing those most widely used (historically and currently) to generate the knowledge in genomic databases. We find that the performance of any tool is dependent on the genome being analysed, and no individual tool ranked as the most accurate across all genomes or metrics analysed. Even the top-ranked tools produced conflicting gene collections, which could not be resolved by aggregation. The ORForise evaluation framework provides users with a replicable, data-led approach to make informed tool choices for novel genome annotations and for refining historical annotations. AVAILABILITY AND IMPLEMENTATION: Code and datasets for reproduction and customisation are available at https://github.com/NickJD/ORForise. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genomics , Software , Prokaryotic Cells , Molecular Sequence Annotation , Metagenome

3.

On the complexity of haplotyping a microbial community.

Nicholls, Samuel M; Aubrey, Wayne; De Grave, Kurt; Schietgat, Leander; Creevey, Christopher J; Clare, Amanda.

Bioinformatics ; 37(10): 1360-1366, 2021 Jun 16.

Article in English | MEDLINE | ID: mdl-33444437

ABSTRACT

MOTIVATION: Population-level genetic variation enables competitiveness and niche specialization in microbial communities. Despite the difficulty in culturing many microbes from an environment, we can still study these communities by isolating and sequencing DNA directly from an environment (metagenomics). Recovering the genomic sequences of all isoforms of a given gene across all organisms in a metagenomic sample would aid evolutionary and ecological insights into microbial ecosystems with potential benefits for medicine and biotechnology. A significant obstacle to this goal arises from the lack of a computationally tractable solution that can recover these sequences from sequenced read fragments. This poses a problem analogous to reconstructing the two sequences that make up the genome of a diploid organism (i.e. haplotypes) but for an unknown number of individuals and haplotypes. RESULTS: The problem of single individual haplotyping was first formalized by Lancia et al. in 2001. Now, nearly two decades later, we discuss the complexity of 'haplotyping' metagenomic samples, with a new formalization of Lancia et al.'s data structure that allows us to effectively extend the single individual haplotype problem to microbial communities. This work describes and formalizes the problem of recovering genes (and other genomic subsequences) from all individuals within a complex community sample, which we term the metagenomic individual haplotyping problem. We also provide software implementations for a pairwise single nucleotide variant (SNV) co-occurrence matrix and greedy graph traversal algorithm. AVAILABILITY AND IMPLEMENTATION: Our reference implementation of the described pairwise SNV matrix (Hansel) and greedy haplotype path traversal algorithm (Gretel) is open source, MIT licensed and freely available online at github.com/samstudio8/hansel and github.com/samstudio8/gretel, respectively.
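
The two components named in the abstract above, a pairwise SNV co-occurrence matrix and a greedy traversal over it, can be sketched as follows. This is a simplified illustration and not the Hansel or Gretel implementation: the read representation (a dict mapping SNV position to observed allele), the function names, and the rule of looking back only one position are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(reads):
    """Count how often each pair of (position, allele) observations shares a read."""
    counts = Counter()
    for read in reads:
        observations = sorted(read.items())  # order by SNV position
        for a, b in combinations(observations, 2):
            counts[(a, b)] += 1
    return counts


def greedy_haplotype(reads, positions):
    """Choose one allele per SNV position (positions must be ascending).

    Each step picks the allele seen most often on the same read as the
    allele chosen at the previous position.
    """
    counts = cooccurrence_counts(reads)
    first = Counter(r[positions[0]] for r in reads if positions[0] in r)
    path = [first.most_common(1)[0][0]]
    for prev_pos, pos in zip(positions, positions[1:]):
        prev = (prev_pos, path[-1])
        support = Counter()
        for (a, b), n in counts.items():
            if a == prev and b[0] == pos:
                support[b[1]] += n
        if not support:
            break  # no read links the two positions
        path.append(support.most_common(1)[0][0])
    return "".join(path)


# Toy data: six reads covering three SNV positions.
reads = [
    {10: "A", 25: "C"}, {10: "A", 25: "C"}, {10: "G", 25: "T"},
    {25: "C", 40: "G"}, {25: "C", 40: "G"}, {25: "T", 40: "A"},
]
haplotype = greedy_haplotype(reads, [10, 25, 40])
```

In this toy data the best-supported path is A at position 10, C at 25 and G at 40. Recovering further haplotypes would require down-weighting the evidence already used and repeating the traversal, which this sketch leaves out.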

4.

A Tool for Multiple Targeted Genome Deletions that Is Precise, Scar-Free, and Suitable for Automation.

Aubrey, Wayne; Riley, Michael C; Young, Michael; King, Ross D; Oliver, Stephen G; Clare, Amanda.

PLoS One ; 10(12): e0142494, 2015.

Article in English | MEDLINE | ID: mdl-26630677

ABSTRACT

Many advances in synthetic biology require the removal of a large number of genomic elements from a genome. Most existing deletion methods leave behind markers, and as there are a limited number of markers, such methods can only be applied a fixed number of times. Deletion methods that recycle markers generally are either imprecise (remove untargeted sequences), or leave scar sequences which can cause genome instability and rearrangements. No existing marker recycling method is automation-friendly. We have developed a novel openly available deletion tool that consists of: 1) a method for deleting genomic elements that can be repeatedly used without limit, is precise, scar-free, and suitable for automation; and 2) software to design the method's primers. Our tool is sequence agnostic and could be used to delete large numbers of coding sequences, promoter regions, transcription factor binding sites, terminators, etc in a single genome. We have validated our tool on the deletion of non-essential open reading frames (ORFs) from S. cerevisiae. The tool is applicable to arbitrary genomes, and we provide primer sequences for the deletion of: 90% of the ORFs from the S. cerevisiae genome, 88% of the ORFs from S. pombe genome, and 85% of the ORFs from the L. lactis genome.

Subject(s)

Cicatrix/genetics , Computational Biology/methods , Genome, Fungal , Open Reading Frames/genetics , Saccharomyces cerevisiae/genetics , Schizosaccharomyces/genetics , Sequence Deletion , Automation , Genomics/methods , Polymerase Chain Reaction , Software

5.

Cheaper faster drug development validated by the repositioning of drugs against neglected tropical diseases.

Williams, Kevin; Bilsland, Elizabeth; Sparkes, Andrew; Aubrey, Wayne; Young, Michael; Soldatova, Larisa N; De Grave, Kurt; Ramon, Jan; de Clare, Michaela; Sirawaraporn, Worachart; Oliver, Stephen G; King, Ross D.

J R Soc Interface ; 12(104): 20141289, 2015 Mar 06.

Article in English | MEDLINE | ID: mdl-25652463

ABSTRACT

There is an urgent need to make drug discovery cheaper and faster. This will enable the development of treatments for diseases currently neglected for economic reasons, such as tropical and orphan diseases, and generally increase the supply of new drugs. Here, we report the Robot Scientist 'Eve' designed to make drug discovery more economical. A Robot Scientist is a laboratory automation system that uses artificial intelligence (AI) techniques to discover scientific knowledge through cycles of experimentation. Eve integrates and automates library-screening, hit-confirmation, and lead generation through cycles of quantitative structure activity relationship learning and testing. Using econometric modelling we demonstrate that the use of AI to select compounds economically outperforms standard drug screening. For further efficiency Eve uses a standardized form of assay to compute Boolean functions of compound properties. These assays can be quickly and cheaply engineered using synthetic biology, enabling more targets to be assayed for a given budget. Eve has repositioned several drugs against specific targets in parasites that cause tropical diseases. One validated discovery is that the anti-cancer compound TNP-470 is a potent inhibitor of dihydrofolate reductase from the malaria-causing parasite Plasmodium vivax.

Subject(s)

Drug Design , Drug Repositioning , Rare Diseases/drug therapy , Technology, Pharmaceutical/trends , Algorithms , Antineoplastic Agents/therapeutic use , Automation , Drug Evaluation, Preclinical , Humans , Malaria, Vivax/drug therapy , Models, Statistical , Plasmodium vivax/drug effects , Quantitative Structure-Activity Relationship , Regression Analysis , Reproducibility of Results , Software , Tropical Medicine

6.

PD5: a general purpose library for primer design software.

Riley, Michael C; Aubrey, Wayne; Young, Michael; Clare, Amanda.

PLoS One ; 8(11): e80156, 2013.

Article in English | MEDLINE | ID: mdl-24278254

ABSTRACT

BACKGROUND: Complex PCR applications for large genome-scale projects require fast, reliable and often highly sophisticated primer design software applications. Presently, such applications use pipelining methods to utilise many third party applications and this involves file parsing, interfacing and data conversion, which is slow and prone to error. A fully integrated suite of software tools for primer design would considerably improve the development time, the processing speed, and the reliability of bespoke primer design software applications. RESULTS: The PD5 software library is an open-source collection of classes and utilities, providing a complete collection of software building blocks for primer design and analysis. It is written in object-oriented C++ with an emphasis on classes suitable for efficient and rapid development of bespoke primer design programs. The modular design of the software library simplifies the development of specific applications and also integration with existing third party software where necessary. We demonstrate several applications created using this software library that have already proved to be effective, but we view the project as a dynamic environment for building primer design software and it is open for future development by the bioinformatics community. Therefore, the PD5 software library is published under the terms of the GNU General Public License, which guarantee access to source-code and allow redistribution and modification. CONCLUSIONS: The PD5 software library is downloadable from Google Code and the accompanying Wiki includes instructions and examples: http://code.google.com/p/primer-design.

Subject(s)

DNA Primers , Polymerase Chain Reaction , Software , Programming Languages

7.

Yeast-based automated high-throughput screens to identify anti-parasitic lead compounds.

Bilsland, Elizabeth; Sparkes, Andrew; Williams, Kevin; Moss, Harry J; de Clare, Michaela; Pir, Pinar; Rowland, Jem; Aubrey, Wayne; Pateman, Ron; Young, Mike; Carrington, Mark; King, Ross D; Oliver, Stephen G.

Open Biol ; 3(2): 120158, 2013 Feb 27.

Article in English | MEDLINE | ID: mdl-23446112

ABSTRACT

We have developed a robust, fully automated anti-parasitic drug-screening method that selects compounds specifically targeting parasite enzymes and not their host counterparts, thus allowing the early elimination of compounds with potential side effects. Our yeast system permits multiple parasite targets to be assayed in parallel owing to the strains' expression of different fluorescent proteins. A strain expressing the human target is included in the multiplexed screen to exclude compounds that do not discriminate between host and parasite enzymes. This form of assay has the advantages of using known targets and not requiring the in vitro culture of parasites. We performed automated screens for inhibitors of parasite dihydrofolate reductases, N-myristoyltransferases and phosphoglycerate kinases, finding specific inhibitors of parasite targets. We found that our 'hits' have significant structural similarities to compounds with in vitro anti-parasitic activity, validating our screens and suggesting targets for hits identified in parasite-based assays. Finally, we demonstrate a 60 per cent success rate for our hit compounds in killing or severely inhibiting the growth of Trypanosoma brucei, the causative agent of African sleeping sickness.

Subject(s)

Antiparasitic Agents/pharmacology , Lead/chemistry , Small Molecule Libraries/chemistry , Trypanosomiasis, African/drug therapy , Antiparasitic Agents/chemistry , Drug Discovery , High-Throughput Screening Assays , Humans , Lead/pharmacology , Trypanosoma brucei brucei/drug effects , Trypanosomiasis, African/pathology , Yeasts/drug effects

8.

Towards Robot Scientists for autonomous scientific discovery.

Sparkes, Andrew; Aubrey, Wayne; Byrne, Emma; Clare, Amanda; Khan, Muhammed N; Liakata, Maria; Markham, Magdalena; Rowland, Jem; Soldatova, Larisa N; Whelan, Kenneth E; Young, Michael; King, Ross D.

Autom Exp ; 2: 1, 2010 Jan 04.

Article in English | MEDLINE | ID: mdl-20119518

ABSTRACT

We review the main components of autonomous scientific discovery, and how they lead to the concept of a Robot Scientist. This is a system which uses techniques from artificial intelligence to automate all aspects of the scientific discovery process: it generates hypotheses from a computer model of the domain, designs experiments to test these hypotheses, runs the physical experiments using robotic systems, analyses and interprets the resulting data, and repeats the cycle. We describe our two prototype Robot Scientists: Adam and Eve. Adam has recently proven the potential of such systems by identifying twelve genes responsible for catalysing specific reactions in the metabolic pathways of the yeast Saccharomyces cerevisiae. This work has been formally recorded in great detail using logic. We argue that the reporting of science needs to become fully formalised and that Robot Scientists can help achieve this. This will make scientific information more reproducible and reusable, and promote the integration of computers in scientific reasoning. We believe the greater automation of both the physical and intellectual aspects of scientific investigations to be essential to the future of science. Greater automation improves the accuracy and reliability of experiments, increases the pace of discovery and, in common with conventional laboratory automation, removes tedious and repetitive tasks from the human scientist.

9.

Make way for robot scientists.

King, Ross D; Rowland, Jem; Oliver, Stephen G; Young, Michael; Aubrey, Wayne; Byrne, Emma; Liakata, Maria; Markham, Magdalena; Pir, Pinar; Soldatova, Larisa N; Sparkes, Andrew; Whelan, Kenneth E; Clare, Amanda.

Science ; 325(5943): 945, 2009 Aug 21.

Article in English | MEDLINE | ID: mdl-19696334

Subject(s)

Artificial Intelligence , Robotics , Science , Automation

10.

The automation of science.

King, Ross D; Rowland, Jem; Oliver, Stephen G; Young, Michael; Aubrey, Wayne; Byrne, Emma; Liakata, Maria; Markham, Magdalena; Pir, Pinar; Soldatova, Larisa N; Sparkes, Andrew; Whelan, Kenneth E; Clare, Amanda.

Science ; 324(5923): 85-9, 2009 Apr 03.

Article in English | MEDLINE | ID: mdl-19342587

ABSTRACT

The basis of science is the hypothetico-deductive method and the recording of experiments in sufficient detail to enable reproducibility. We report the development of Robot Scientist "Adam," which advances the automation of both. Adam has autonomously generated functional genomics hypotheses about the yeast Saccharomyces cerevisiae and experimentally tested these hypotheses by using laboratory automation. We have confirmed Adam's conclusions through manual experiments. To describe Adam's research, we have developed an ontology and logical language. The resulting formalization involves over 10,000 different research units in a nested treelike structure, 10 levels deep, that relates the 6.6 million biomass measurements to their logical description. This formalization describes how a machine contributed to scientific knowledge.

Subject(s)

Artificial Intelligence , Automation , Computational Biology , Enzymes/genetics , Genes, Fungal , Saccharomyces cerevisiae/genetics , Computers , Genomics , Programming Languages , Robotics , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae/metabolism , Software

11.

The EXACT description of biomedical protocols.

Soldatova, Larisa N; Aubrey, Wayne; King, Ross D; Clare, Amanda.

Bioinformatics ; 24(13): i295-303, 2008 Jul 01.

Article in English | MEDLINE | ID: mdl-18586727

ABSTRACT

MOTIVATION: Many published manuscripts contain experiment protocols which are poorly described or deficient in information. This means that the published results are very hard or impossible to repeat. This problem is being made worse by the increasing complexity of high-throughput/automated methods. There is therefore a growing need to represent experiment protocols in an efficient and unambiguous way. RESULTS: We have developed the Experiment ACTions (EXACT) ontology as the basis of a method of representing biological laboratory protocols. We provide example protocols that have been formalized using EXACT, and demonstrate the advantages and opportunities created by using this formalization. We argue that the use of EXACT will result in the publication of protocols with increased clarity and usefulness to the scientific community. AVAILABILITY: The ontology, examples and code can be downloaded from http://www.aber.ac.uk/compsci/Research/bio/dss/EXACT/.

Subject(s)

Database Management Systems , Databases, Factual , Documentation/methods , Information Storage and Retrieval/methods , Internet , Research/classification , Research/standards
