Results 1 - 16 of 16
1.
PLoS Comput Biol ; 16(7): e1007976, 2020 07.
Article in English | MEDLINE | ID: mdl-32702016

ABSTRACT

ELIXIR is a pan-European intergovernmental organisation for life science that aims to coordinate bioinformatics resources in a single infrastructure across Europe; bioinformatics training is central to its strategy, which aims to develop a training community that spans all ELIXIR member states. In an evidence-based approach for strengthening bioinformatics training programmes across Europe, the ELIXIR Training Platform, led by the ELIXIR EXCELERATE Quality and Impact Assessment Subtask in collaboration with the ELIXIR Training Coordinators Group, has implemented an assessment strategy to measure quality and impact of its entire training portfolio. Here, we present ELIXIR's framework for assessing training quality and impact, which includes the following: specifying assessment aims, determining what data to collect in order to address these aims, and our strategy for centralised data collection to allow for ELIXIR-wide analyses. In addition, we present an overview of the ELIXIR training data collected over the past 4 years. We highlight the importance of a coordinated and consistent data collection approach and the relevance of defining specific metrics and answer scales for consortium-wide analyses as well as for comparison of data across iterations of the same course.


Subject(s)
Computational Biology/education , Quality Control , Algorithms , Biomedical Research , Computational Biology/standards , Curriculum , Data Collection , Databases, Factual , Education, Continuing , Europe , Program Evaluation , Reproducibility of Results , Research Personnel , Software , User-Computer Interface
2.
F1000Res ; 6, 2017.
Article in English | MEDLINE | ID: mdl-28781745

ABSTRACT

Quality training in computational skills for life scientists is essential to allow them to deliver robust, reproducible and cutting-edge research. A pan-European bioinformatics programme, ELIXIR, has adopted a well-established and progressive programme of computational lab and data skills training from Software and Data Carpentry, aimed at increasing the number of skilled life scientists and building a sustainable training community in this field. This article describes the Pilot action, which introduced the Carpentry training model to the ELIXIR community.

3.
Gigascience ; 5: 26, 2016 06 07.
Article in English | MEDLINE | ID: mdl-27267963

ABSTRACT

With ever-increasing amounts of data being produced by next-generation sequencing (NGS) experiments, the requirements placed on supporting e-infrastructures have grown. In this work, we provide recommendations based on the collective experiences from participants in the EU COST Action SeqAhead for the tasks of data preprocessing, upstream processing, data delivery, and downstream analysis, as well as long-term storage and archiving. We cover demands on computational and storage resources, networks, software stacks, automation of analysis, education, and also discuss emerging trends in the field. E-infrastructures for NGS require substantial effort to set up and maintain over time, and with sequencing technologies and best practices for data analysis evolving rapidly it is important to prioritize both processing capacity and e-infrastructure flexibility when making strategic decisions to support the data analysis demands of tomorrow. Due to increasingly demanding technical requirements we recommend that e-infrastructure development and maintenance be handled by a professional service unit, be it internal or external to the organization, and emphasis should be placed on collaboration between researchers and IT professionals.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Computational Biology/methods , Humans , Information Storage and Retrieval , Internet , Software
4.
Bioinform Biol Insights ; 9: 125-8, 2015.
Article in English | MEDLINE | ID: mdl-26401099

ABSTRACT

Virtualization is becoming increasingly important in bioscience, enabling assembly and provisioning of complete computer setups, including operating system, data, software, and services packaged as virtual machine images (VMIs). We present an open catalog of VMIs for the life sciences, where scientists can share information about images and optionally upload them to a server equipped with a large file system and fast Internet connection. Other scientists can then search for and download images that can be run on the local computer or in a cloud computing environment, providing easy access to bioinformatics environments. We also describe applications where VMIs aid life science research, including distributing tools and data, supporting reproducible analysis, and facilitating education. BioImg.org is freely available at: https://bioimg.org.

5.
Biol Direct ; 10: 43, 2015 Aug 19.
Article in English | MEDLINE | ID: mdl-26282399

ABSTRACT

High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out data management and analysis tasks on large scale. Workflow systems can be useful to simplify construction of analysis pipelines that automate tasks, support reproducibility and provide measures for fault-tolerance. However, workflow systems can incur significant development and administration overhead so bioinformatics pipelines are often still built without them. We present the experiences with workflows and workflow systems within the bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead. The organizations are working on similar problems, but we have addressed them with different strategies and solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our experiences we define a set of recommendations for future systems to enable efficient yet simple bioinformatics workflow construction and execution.


Subject(s)
Computational Biology/methods , Electronic Data Processing/methods , Workflow , High-Throughput Nucleotide Sequencing , Reproducibility of Results
6.
Bioinformatics ; 31(1): 140-2, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25189782

ABSTRACT

SUMMARY: Rapid technological advances have led to an explosion of biomedical data in recent years. The pace of change has inspired new collaborative approaches for sharing materials and resources to help train life scientists both in the use of cutting-edge bioinformatics tools and databases and in how to analyse and interpret large datasets. A prototype platform for sharing such training resources was recently created by the Bioinformatics Training Network (BTN). Building on this work, we have created a centralized portal for sharing training materials and courses, including a catalogue of trainers and course organizers, and an announcement service for training events. For course organizers, the portal provides opportunities to promote their training events; for trainers, the portal offers an environment for sharing materials, for gaining visibility for their work and promoting their skills; for trainees, it offers a convenient one-stop shop for finding suitable training resources and identifying relevant training events and activities locally and worldwide. AVAILABILITY AND IMPLEMENTATION: http://mygoblet.org/training-portal.


Subject(s)
Computational Biology/education , Curriculum , Database Management Systems , Research Personnel/education , Teaching , Humans , Programming Languages , Software Design
7.
Bioinformatics ; 30(1): 119-20, 2014 Jan 01.
Article in English | MEDLINE | ID: mdl-24149054

ABSTRACT

SUMMARY: Hadoop MapReduce-based approaches have become increasingly popular due to their scalability in processing large sequencing datasets. However, as these methods typically require in-depth expertise in Hadoop and Java, they are still out of reach of many bioinformaticians. To solve this problem, we have created SeqPig, a library and a collection of tools to manipulate, analyze and query sequencing datasets in a scalable and simple manner. SeqPig scripts use the Hadoop-based distributed scripting engine Apache Pig, which automatically parallelizes and distributes data processing tasks. We demonstrate SeqPig's scalability over many computing nodes and illustrate its use with example scripts. AVAILABILITY AND IMPLEMENTATION: Available under the open source MIT license at http://sourceforge.net/projects/seqpig/


Subject(s)
High-Throughput Screening Assays/methods , Software Design
8.
Bioinformatics ; 29(15): 1919-21, 2013 Aug 01.
Article in English | MEDLINE | ID: mdl-23742982

ABSTRACT

SUMMARY: We present iAnn, an open source community-driven platform for dissemination of life science events, such as courses, conferences and workshops. iAnn allows automatic visualisation and integration of customised event reports. A central repository lies at the core of the platform: curators add submitted events, and these are subsequently accessed via web services. Thus, once an iAnn widget is incorporated into a website, it permanently shows timely relevant information as if it were native to the remote site. At the same time, announcements submitted to the repository are automatically disseminated to all portals that query the system. To facilitate the visualization of announcements, iAnn provides powerful filtering options and views, integrated in Google Maps and Google Calendar. All iAnn widgets are freely available. AVAILABILITY: http://iann.pro/iannviewer CONTACT: manuel.corpas@tgac.ac.uk.


Subject(s)
Biological Science Disciplines , Software , Anniversaries and Special Events , Congresses as Topic , Internet
9.
Bioinformatics ; 28(6): 876-7, 2012 Mar 15.
Article in English | MEDLINE | ID: mdl-22302568

ABSTRACT

Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can directly operate on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large-scale distributed processing. In this article we demonstrate the use of Hadoop-BAM by building a coverage summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability, and one should avoid moving data in and out of Hadoop between analysis steps.
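
Illustrative note: the map and reduce functions mentioned in the abstract can be pictured with a small, single-machine sketch of a coverage summary. This is not the Hadoop-BAM API. The record layout (chromosome, start, end), the bin size and all function names below are assumptions made for illustration only, and plain Python stands in for Hadoop.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical simplified alignment record: (chromosome, 0-based start, exclusive end).
# A real BAM record carries far more fields; only the covered interval matters here.
Alignment = Tuple[str, int, int]
BinKey = Tuple[str, int]

BIN_SIZE = 100  # assumed summary resolution in bases


def map_alignment(aln: Alignment, bin_size: int = BIN_SIZE) -> Iterator[Tuple[BinKey, int]]:
    """Map step: emit ((chromosome, bin), covered bases) for every bin the read overlaps."""
    chrom, start, end = aln
    pos = start
    while pos < end:
        bin_index = pos // bin_size
        bin_end = (bin_index + 1) * bin_size
        stop = min(end, bin_end)
        yield (chrom, bin_index), stop - pos
        pos = stop


def reduce_coverage(pairs: Iterable[Tuple[BinKey, int]]) -> Dict[BinKey, int]:
    """Reduce step: sum the covered bases emitted for each (chromosome, bin) key."""
    totals: Dict[BinKey, int] = defaultdict(int)
    for key, covered in pairs:
        totals[key] += covered
    return dict(totals)


def summarize(alignments: Iterable[Alignment], bin_size: int = BIN_SIZE) -> Dict[BinKey, float]:
    """Chain map and reduce, then convert covered bases to mean depth per bin."""
    pairs = (pair for aln in alignments for pair in map_alignment(aln, bin_size))
    return {key: total / bin_size for key, total in reduce_coverage(pairs).items()}


if __name__ == "__main__":
    reads = [("chr1", 0, 50), ("chr1", 50, 150), ("chr1", 90, 110)]
    for key, depth in sorted(summarize(reads).items()):
        print(key, depth)
```

Because the map step only looks at one record and the reduce step only sums values per key, both can in principle run in parallel over file splits, which is the property the abstract relies on.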


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Genome , User-Computer Interface
10.
Nucleic Acids Res ; 40(1): e1, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22009681

ABSTRACT

We developed a computational procedure for optimizing the binding site detections in a given ChIP-seq experiment by maximizing their reproducibility under bootstrap sampling. We demonstrate how the procedure can improve the detection accuracies beyond those obtained with the default settings of popular peak calling software, or inform the user whether the peak detection results are compromised, circumventing the need for arbitrary re-iterative peak calling under varying parameter settings. The generic, open-source implementation is easily extendable to accommodate additional features and to promote its widespread application in future ChIP-seq studies. The peakROTS R-package and user guide are freely available at http://www.nic.funet.fi/pub/sci/molbio/peakROTS.
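
Illustrative note: the idea of choosing peak-calling settings by their reproducibility under bootstrap sampling can be sketched as follows. This is a deliberately simplified stand-in, not the peakROTS implementation: the toy peak caller, the Jaccard overlap used as the reproducibility score, and every name and default value are assumptions, and the abstract does not specify the statistic the package actually optimizes.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple


def call_peaks(reads: Sequence[int], bin_size: int, min_count: int) -> Set[int]:
    """Toy peak caller (assumption): a bin is a peak if it holds at least min_count reads."""
    counts = Counter(pos // bin_size for pos in reads)
    return {b for b, n in counts.items() if n >= min_count}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Overlap of two peak sets; two empty sets score 0 so that calling nothing is not rewarded."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def bootstrap_reproducibility(reads: Sequence[int], bin_size: int, min_count: int,
                              n_boot: int, rng: random.Random) -> float:
    """Mean overlap between peak sets called on pairs of bootstrap resamples of the reads."""
    total = 0.0
    for _ in range(n_boot):
        sample_a = rng.choices(reads, k=len(reads))
        sample_b = rng.choices(reads, k=len(reads))
        total += jaccard(call_peaks(sample_a, bin_size, min_count),
                         call_peaks(sample_b, bin_size, min_count))
    return total / n_boot


def optimize_min_count(reads: Sequence[int], candidates: List[int], bin_size: int = 100,
                       n_boot: int = 20, seed: int = 0) -> Tuple[int, Dict[int, float]]:
    """Return the candidate setting with the highest bootstrap reproducibility, plus all scores."""
    rng = random.Random(seed)
    scores = {c: bootstrap_reproducibility(reads, bin_size, c, n_boot, rng) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores
```

A low best score across all candidates would correspond to the case the abstract describes, where the user is informed that the detections are compromised regardless of the setting.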


Subject(s)
Chromatin Immunoprecipitation/methods , Transcription Factors/analysis , Animals , Binding Sites , High-Throughput Nucleotide Sequencing , Humans , Mice , Sequence Analysis, DNA , Software
11.
BMC Genomics ; 12: 507, 2011 Oct 14.
Article in English | MEDLINE | ID: mdl-21999641

ABSTRACT

BACKGROUND: The growth of high-throughput technologies such as microarrays and next generation sequencing has been accompanied by active research in data analysis methodology, producing new analysis methods at a rapid pace. While most of the newly developed methods are freely available, their use requires substantial computational skills. In order to enable non-programming biologists to benefit from the method development in a timely manner, we have created the Chipster software. RESULTS: Chipster (http://chipster.csc.fi/) brings a powerful collection of data analysis methods within the reach of bioscientists via its intuitive graphical user interface. Users can analyze and integrate different data types such as gene expression, miRNA and aCGH. The analysis functionality is complemented with rich interactive visualizations, allowing users to select datapoints and create new gene lists based on these selections. Importantly, users can save the performed analysis steps as reusable, automatic workflows, which can also be shared with other users. Being a versatile and easily extendable platform, Chipster can be used for microarray, proteomics and sequencing data. In this article we describe its comprehensive collection of analysis and visualization tools for microarray data using three case studies. CONCLUSIONS: Chipster is a user-friendly analysis software for high-throughput data. Its intuitive graphical user interface enables biologists to access a powerful collection of data analysis and integration tools, and to visualize data interactively. Users can collaborate by sharing analysis sessions and workflows. Chipster is open source, and the server installation package is freely available.


Subject(s)
Microarray Analysis/methods , Software , Algorithms , Databases, Genetic , Gene Expression Regulation , MicroRNAs/analysis , User-Computer Interface
12.
PLoS One ; 6(6): e21495, 2011.
Article in English | MEDLINE | ID: mdl-21731767

ABSTRACT

MicroRNAs (miRNAs) are small regulatory molecules that cause post-transcriptional gene silencing. Although some miRNAs are known to have region-specific expression patterns in the adult brain, the functional consequences of the region-specificity to the gene regulatory networks of the brain nuclei are not clear. Therefore, we studied miRNA expression patterns by miRNA-Seq and microarrays in two brain regions, frontal cortex (FCx) and hippocampus (HP), which have separate biological functions. We identified 354 miRNAs from FCx and 408 from HP using miRNA-Seq, and 245 from FCx and 238 from HP with microarrays. Several miRNA families and clusters were differentially expressed between FCx and HP, including the miR-8 family, miR-182|miR-96|miR-183 cluster, and miR-212|miR-312 cluster overexpressed in FCx and miR-34 family overexpressed in HP. To visualize the clusters, we developed support for viewing genomic alignments of miRNA-Seq reads in the Chipster genome browser. We carried out pathway analysis of the predicted target genes of differentially expressed miRNA families and clusters to assess their putative biological functions. Interestingly, several miRNAs from the same family/cluster were predicted to regulate specific biological pathways. We have developed a miRNA-Seq approach with a bioinformatic analysis workflow that is suitable for studying miRNA expression patterns from specific brain nuclei. FCx and HP were shown to have distinct miRNA expression patterns which were reflected in the predicted gene regulatory pathways. This methodology can be applied for the identification of brain region-specific and phenotype-specific miRNA-mRNA-regulatory networks from the adult and developing rodent brain.


Subject(s)
Frontal Lobe/metabolism , Gene Expression Profiling , Hippocampus/metabolism , MicroRNAs/genetics , Signal Transduction/genetics , Animals , Cluster Analysis , Computational Biology , Gene Expression Regulation , Genome/genetics , Mice , MicroRNAs/metabolism , Oligonucleotide Array Sequence Analysis , Organ Specificity/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sequence Analysis, RNA
13.
Brief Bioinform ; 10(5): 547-55, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19549804

ABSTRACT

Identification of reliable molecular markers that show differential expression between distinct groups of samples has remained a fundamental research problem in many large-scale profiling studies, such as those based on DNA microarray or mass-spectrometry technologies. Despite the availability of a wide spectrum of statistical procedures, the users of the high-throughput platforms are still facing the crucial challenge of deciding which test statistic is best adapted to the intrinsic properties of their own datasets. To meet this challenge, we recently introduced an adaptive procedure, named ROTS (Reproducibility-Optimized Test Statistic), which learns an optimal statistic directly from the given data, and whose relative benefits have previously been shown in comparison with state-of-the-art procedures for detecting differential expression. Using gene expression microarray and mass-spectrometry (MS)-based protein expression datasets as case studies, we illustrate here the practical usage and advantages of ROTS toward detecting reliable marker lists in clinical transcriptomic and proteomic studies. In a public leukemia microarray dataset, the procedure could improve the sensitivity of the gene marker lists detected with high specificity. When applied to a recent LC-MS dataset, involving plasma samples from severe burn patients, the procedure could identify several peptide markers that remained undetected in the conventional analysis, thus demonstrating the effectiveness of ROTS also for global quantitative proteomic studies. To promote its widespread usage, we have made freely available efficient implementations of ROTS, which are easily accessible either as a stand-alone R-package or as integrated in the open-source data analysis software Chipster.


Subject(s)
Biomarkers/metabolism , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Proteomics/methods , Software , Humans , Models, Statistical , Reproducibility of Results , Sensitivity and Specificity
14.
J Appl Crystallogr ; 42(Pt 3): 376-384, 2009 Jun 01.
Article in English | MEDLINE | ID: mdl-22477769

ABSTRACT

Structural biology, homology modelling and rational drug design require accurate three-dimensional macromolecular coordinates. However, the coordinates in the Protein Data Bank (PDB) have not all been obtained using the latest experimental and computational methods. In this study a method is presented for automated re-refinement of existing structure models in the PDB. A large-scale benchmark with 16 807 PDB entries showed that they can be improved in terms of fit to the deposited experimental X-ray data as well as in terms of geometric quality. The re-refinement protocol uses TLS models to describe concerted atom movement. The resulting structure models are made available through the PDB_REDO databank (http://www.cmbi.ru.nl/pdb_redo/). Grid computing techniques were used to overcome the computational requirements of this endeavour.

15.
Brief Bioinform ; 9(6): 493-505, 2008 Nov.
Article in English | MEDLINE | ID: mdl-18621748

ABSTRACT

Programmatic access to data and tools through the web using so-called web services has an important role to play in bioinformatics. In this article, we discuss the most popular approaches based on SOAP/WS-I and REST and describe our experiences, as a cross section of the community, with providing and using web services in the context of biological sequence analysis. We briefly review the main technological approaches as well as best practice hints that are useful for both users and developers. Finally, syntactic and semantic data integration issues with multiple web services are discussed.


Subject(s)
Computational Biology , Databases, Genetic , Information Storage and Retrieval/methods , Internet/statistics & numerical data , Sequence Analysis/methods , Database Management Systems , Humans , Systems Integration , User-Computer Interface
16.
Nucleic Acids Res ; 36(Database issue): D276-80, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17986464

ABSTRACT

Sequence similarity/database searching is a cornerstone of molecular biology. PairsDB is a database intended to make exploring protein sequences and their similarity relationships quick and easy. Behind PairsDB is a comprehensive collection of protein sequences and BLAST and PSI-BLAST alignments between them. Instead of running BLAST or PSI-BLAST individually on each request, results are retrieved instantaneously from a database of pre-computed alignments. Filtering options allow you to find a set of sequences satisfying a set of criteria; for example, all human proteins with solved structure and without transmembrane segments. PairsDB is continually updated and covers all sequences in UniProt. The data is stored in a MySQL relational database. Data files will be made available for download at ftp://nic.funet.fi/pub/sci/molbio. PairsDB can also be accessed interactively at http://pairsdb.csc.fi. PairsDB data is a valuable platform on which to build various downstream automated analysis pipelines. For example, the graph of all-against-all similarity relationships is the starting point for clustering protein families, delineating domains, improving alignment accuracy by consistency measures, and defining orthologous genes. Moreover, query-anchored stacked sequence alignments, profiles and consensus sequences are useful in studies of sequence conservation patterns for clues about possible functional sites.
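
Illustrative note: one simple way to go from a graph of pairwise similarity relationships to candidate protein families is single-linkage clustering, sketched below with a union-find structure. This is an assumption-laden example rather than a PairsDB pipeline: the (query, hit, E-value) tuple layout, the E-value cutoff and the function name are hypothetical, and the abstract does not state which clustering method is used downstream.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout for a pre-computed alignment: (query id, hit id, E-value).
Pair = Tuple[str, str, float]


def cluster_families(pairs: Iterable[Pair], max_evalue: float = 1e-5) -> List[List[str]]:
    """Single-linkage clustering: sequences joined by any edge at or below max_evalue share a cluster."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Register unseen sequences as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, hit, evalue in pairs:
        root_q, root_h = find(query), find(hit)  # both ends are registered even if not merged
        if evalue <= max_evalue and root_q != root_h:
            parent[root_q] = root_h

    clusters: Dict[str, List[str]] = {}
    for seq in parent:
        clusters.setdefault(find(seq), []).append(seq)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    example = [("A", "B", 1e-30), ("B", "C", 1e-10), ("D", "E", 1e-3), ("E", "F", 1e-50)]
    print(cluster_families(example))
```

Single linkage is the simplest choice and tends to chain multi-domain proteins into one large cluster, which is one reason the abstract lists domain delineation as a separate downstream task.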


Subject(s)
Databases, Protein , Sequence Alignment , Sequence Analysis, Protein , Amino Acid Sequence , Conserved Sequence , Humans , Internet , User-Computer Interface