Search | VHL Regional Portal

Structuring research methods and data with the research object model: genomics workflows as a case study.

Hettne, Kristina M; Dharuri, Harish; Zhao, Jun; Wolstencroft, Katherine; Belhajjame, Khalid; Soiland-Reyes, Stian; Mina, Eleni; Thompson, Mark; Cruickshank, Don; Verdes-Montenegro, Lourdes; Garrido, Julian; de Roure, David; Corcho, Oscar; Klyne, Graham; van Schouwen, Reinout; 't Hoen, Peter A C; Bechhofer, Sean; Goble, Carole; Roos, Marco.

J Biomed Semantics ; 5(1): 41, 2014.

Article in English | MEDLINE | ID: mdl-25276335

ABSTRACT

BACKGROUND: One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide necessary meta-data for a scientist to understand and recreate the results of an experiment. To support this we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study where we analysed human metabolite variation by workflows. RESULTS: We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as "which particular data was input to a particular workflow to test a particular hypothesis?", and "which particular conclusions were drawn from a particular workflow?". CONCLUSIONS: Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment, allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well. AVAILABILITY: The Research Object is available at http://www.myexperiment.org/packs/428 The Wf4Ever Research Object Model is available at http://wf4ever.github.io/ro.

PAV ontology: provenance, authoring and versioning.

Ciccarese, Paolo; Soiland-Reyes, Stian; Belhajjame, Khalid; Gray, Alasdair Jg; Goble, Carole; Clark, Tim.

J Biomed Semantics ; 4(1): 37, 2013 Nov 22.

Article in English | MEDLINE | ID: mdl-24267948

ABSTRACT

BACKGROUND: Provenance is a critical ingredient for establishing trust of published scientific content. This is true whether we are considering a data set, a computational workflow, a peer-reviewed publication or a simple scientific claim with supportive evidence. Existing vocabularies such as Dublin Core Terms (DC Terms) and the W3C Provenance Ontology (PROV-O) are domain-independent and general-purpose and they allow and encourage for extensions to cover more specific needs. In particular, to track authoring and versioning information of web resources, PROV-O provides a basic methodology but not any specific classes and properties for identifying or distinguishing between the various roles assumed by agents manipulating digital artifacts, such as author, contributor and curator. RESULTS: We present the Provenance, Authoring and Versioning ontology (PAV, namespace http://purl.org/pav/): a lightweight ontology for capturing "just enough" descriptions essential for tracking the provenance, authoring and versioning of web resources. We argue that such descriptions are essential for digital scientific content. PAV distinguishes between contributors, authors and curators of content and creators of representations in addition to the provenance of originating resources that have been accessed, transformed and consumed. We explore five projects (and communities) that have adopted PAV illustrating their usage through concrete examples. Moreover, we present mappings that show how PAV extends the W3C PROV-O ontology to support broader interoperability. METHOD: The initial design of the PAV ontology was driven by requirements from the AlzSWAN project with further requirements incorporated later from other projects detailed in this paper. The authors strived to keep PAV lightweight and compact by including only those terms that have demonstrated to be pragmatically useful in existing applications, and by recommending terms from existing ontologies when plausible. DISCUSSION: We analyze and compare PAV with related approaches, namely Provenance Vocabulary (PRV), DC Terms and BIBFRAME. We identify similarities and analyze differences between those vocabularies and PAV, outlining strengths and weaknesses of our proposed model. We specify SKOS mappings that align PAV with DC Terms. We conclude the paper with general remarks on the applicability of PAV.

The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud.

Wolstencroft, Katherine; Haines, Robert; Fellows, Donal; Williams, Alan; Withers, David; Owen, Stuart; Soiland-Reyes, Stian; Dunlop, Ian; Nenadic, Aleksandra; Fisher, Paul; Bhagat, Jiten; Belhajjame, Khalid; Bacall, Finn; Hardisty, Alex; Nieva de la Hidalga, Abraham; Balcazar Vargas, Maria P; Sufi, Shoaib; Goble, Carole.

Nucleic Acids Res ; 41(Web Server issue): W557-61, 2013 Jul.

Article in English | MEDLINE | ID: mdl-23640334

ABSTRACT

The Taverna workflow tool suite (http://www.taverna.org.uk) is designed to combine distributed Web Services and/or local tools into complex analysis pipelines. These pipelines can be executed on local desktop machines or through larger infrastructure (such as supercomputers, Grids or cloud environments), using the Taverna Server. In bioinformatics, Taverna workflows are typically used in the areas of high-throughput omics analyses (for example, proteomics or transcriptomics), or for evidence gathering methods involving text mining or data mining. Through Taverna, scientists have access to several thousand different tools and resources that are freely available from a large range of life science institutions. Once constructed, the workflows are reusable, executable bioinformatics protocols that can be shared, reused and repurposed. A repository of public workflows is available at http://www.myexperiment.org. This article provides an update to the Taverna tool suite, highlighting new features and developments in the workbench and the Taverna Server.

Subject(s)

Computational Biology , Software , Data Mining , Gene Expression Profiling , Internet , Phylogeny , Proteomics , Search Engine , Workflow

Modeling and managing experimental data using FuGE.

Jones, Andrew R; Lister, Allyson L; Hermida, Leandro; Wilkinson, Peter; Eisenacher, Martin; Belhajjame, Khalid; Gibson, Frank; Lord, Phil; Pocock, Matthew; Rosenfelder, Heiko; Santoyo-Lopez, Javier; Wipat, Anil; Paton, Norman W.

OMICS ; 13(3): 239-51, 2009 Jun.

Article in English | MEDLINE | ID: mdl-19441879

ABSTRACT

The Functional Genomics Experiment data model (FuGE) has been developed to increase the consistency and efficiency of experimental data modeling in the life sciences, and it has been adopted by a number of high-profile standardization organizations. FuGE can be used: (1) directly, whereby generic modeling constructs are used to represent concepts from specific experimental activities; or (2) as a framework within which method-specific models can be developed. FuGE is both rich and flexible, providing a considerable number of modeling constructs, which can be used in a range of different ways. However, such richness and flexibility also mean that modelers and application developers have choices to make when applying FuGE in a given context. This paper captures emerging best practice in the use of FuGE in the light of the experience of several groups by: (1) proposing guidelines for the use and extension of the FuGE data model; (2) presenting design patterns that reflect recurring requirements in experimental data modeling; and (3) describing a community software tool kit (STK) that supports application development using FuGE. We anticipate that these guidelines will encourage consistent usage of FuGE, and as such, will contribute to the development of convergent data standards in omics research.

Subject(s)

Computational Biology/methods , Genomics/methods , Models, Theoretical , Computer Simulation , Flow Cytometry/instrumentation , Flow Cytometry/methods , Reproducibility of Results , Software , User-Computer Interface

A toolkit for capturing and sharing FuGE experiments.

Belhajjame, Khalid; Jones, Andrew R; Paton, Norman W.

Bioinformatics ; 24(22): 2647-9, 2008 Nov 15.

Article in English | MEDLINE | ID: mdl-18801749

ABSTRACT

MOTIVATION: The Functional Genomics Experiment Object Model (FuGE) supports modelling of experimental processes either directly or through extensions that specialize FuGE for use in specific contexts. FuGE applications commonly include components that capture, store and search experiment descriptions, where the requirements of different applications have much in common. RESULTS: We describe a toolkit that supports data capture, storage and web-based search of FuGE experiment models; the toolkit can be used directly on FuGE compliant models or configured for use with FuGE extensions. The toolkit is illustrated using a FuGE extension standardized by the proteomics standards initiative, namely GelML. AVAILABILITY: The toolkit and a demonstration are available at http://code.google.com/p/fugetoolkit

Subject(s)

Computational Biology , Genomics/methods , Models, Genetic , Software , Internet

ISPIDER Central: an integrated database web-server for proteomics.

Siepen, Jennifer A; Belhajjame, Khalid; Selley, Julian N; Embury, Suzanne M; Paton, Norman W; Goble, Carole A; Oliver, Stephen G; Stevens, Robert; Zamboulis, Lucas; Martin, Nigel; Poulovassillis, Alexandra; Jones, Philip; Côté, Richard; Hermjakob, Henning; Pentony, Melissa M; Jones, David T; Orengo, Christine A; Hubbard, Simon J.

Nucleic Acids Res ; 36(Web Server issue): W485-90, 2008 Jul 01.

Article in English | MEDLINE | ID: mdl-18440977

ABSTRACT

Despite the growing volumes of proteomic data, integration of the underlying results remains problematic owing to differences in formats, data captured, protein accessions and services available from the individual repositories. To address this, we present the ISPIDER Central Proteomic Database search (http://www.ispider.manchester.ac.uk/cgi-bin/ProteomicSearch.pl), an integration service offering novel search capabilities over leading, mature, proteomic repositories including PRoteomics IDEntifications database (PRIDE), PepSeeker, PeptideAtlas and the Global Proteome Machine. It enables users to search for proteins and peptides that have been characterised in mass spectrometry-based proteomics experiments from different groups, stored in different databases, and view the collated results with specialist viewers/clients. In order to overcome limitations imposed by the great variability in protein accessions used by individual laboratories, the European Bioinformatics Institute's Protein Identifier Cross-Reference (PICR) service is used to resolve accessions from different sequence repositories. Custom-built clients allow users to view peptide/protein identifications in different contexts from multiple experiments and repositories, as well as integration with the Dasty2 client supporting any annotations available from Distributed Annotation System servers. Further information on the protein hits may also be added via external web services able to take a protein as input. This web server offers the first truly integrated access to proteomics repositories and provides a unique service to biologists interested in mass spectrometry-based proteomics.

Subject(s)

Databases, Protein , Proteomics , Software , Computer Graphics , Internet , Mass Spectrometry , Systems Integration

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL