Results 1 - 20 of 31
1.
PLoS Comput Biol ; 16(7): e1007976, 2020 07.
Article in English | MEDLINE | ID: mdl-32702016

ABSTRACT

ELIXIR is a pan-European intergovernmental organisation for life science that aims to coordinate bioinformatics resources in a single infrastructure across Europe; bioinformatics training is central to its strategy, which aims to develop a training community that spans all ELIXIR member states. In an evidence-based approach for strengthening bioinformatics training programmes across Europe, the ELIXIR Training Platform, led by the ELIXIR EXCELERATE Quality and Impact Assessment Subtask in collaboration with the ELIXIR Training Coordinators Group, has implemented an assessment strategy to measure quality and impact of its entire training portfolio. Here, we present ELIXIR's framework for assessing training quality and impact, which includes the following: specifying assessment aims, determining what data to collect in order to address these aims, and our strategy for centralised data collection to allow for ELIXIR-wide analyses. In addition, we present an overview of the ELIXIR training data collected over the past 4 years. We highlight the importance of a coordinated and consistent data collection approach and the relevance of defining specific metrics and answer scales for consortium-wide analyses as well as for comparison of data across iterations of the same course.


Subject(s)
Computational Biology/education , Quality Control , Algorithms , Biomedical Research , Computational Biology/standards , Curriculum , Data Collection , Databases, Factual , Education, Continuing , Europe , Program Evaluation , Reproducibility of Results , Research Personnel , Software , User-Computer Interface
2.
PLoS Comput Biol ; 16(5): e1007854, 2020 05.
Article in English | MEDLINE | ID: mdl-32437350

ABSTRACT

Everything we do today is becoming more and more reliant on the use of computers. The field of biology is no exception; but most biologists receive little or no formal preparation for the increasingly computational aspects of their discipline. In consequence, informal training courses are often needed to plug the gaps; and the demand for such training is growing worldwide. To meet this demand, some training programs are being expanded, and new ones are being developed. Key to both scenarios is the creation of new course materials. Rather than starting from scratch, however, it's sometimes possible to repurpose materials that already exist. Yet finding suitable materials online can be difficult: They're often widely scattered across the internet or hidden in their home institutions, with no systematic way to find them. This is a common problem for all digital objects. The scientific community has attempted to address this issue by developing a set of rules (which have been called the Findable, Accessible, Interoperable and Reusable [FAIR] principles) to make such objects more findable and reusable. Here, we show how to apply these rules to help make training materials easier to find, (re)use, and adapt, for the benefit of all.


Subject(s)
Computer-Assisted Instruction/standards , Guidelines as Topic , Biology/education , Computational Biology , Humans , Information Storage and Retrieval
3.
Article in English | MEDLINE | ID: mdl-30059314

ABSTRACT

RNA-Sequencing and de novo assembly have enabled the analysis of species with non-available reference transcriptomes, although intrinsic features (biological and technical) induce errors in the reconstruction. A strategy to resolve these errors consists of varying assembling process parameters to generate multiple reconstructions. However, the best assembly selection remains a challenge. Quantitative metrics for quality assessment have been inconsistent when compared with pertinent references. In this paper, a criterion for supporting assembly selection based on mapping DNA microarray hybridized probes to assembly sets is proposed. Mouse and fruit fly RNA-Seq datasets were assembled with standard de novo procedures. Quality assessment was estimated using quantitative metrics and the proposed criterion. The assembly that best mapped to the available reference transcriptomes of these model species provided the highest quality assembly. The hybridized probes identified the best assemblies, whereas quantitative metrics remained inconsistent. For example, subtle probe mapping difference of 0.25 percent, but statistically significant (ANOVA, p < 0.05), enabled the assembly selection that led to identify 3,719 more contigs and led to 1,049 further mapped contigs to the mouse reference transcriptome. The microarray data availability for non-model species makes the proposed criterion suitable for quality assessment of multiple de novo assembly strategies.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, RNA/methods , Transcriptome/genetics , Animals , Brain Chemistry/genetics , Computational Biology , Drosophila melanogaster , Female , Male , Mice , Mice, Inbred C57BL , RNA, Messenger/analysis , RNA, Messenger/genetics , Sequence Alignment
4.
F1000Res ; 8, 2019.
Article in English | MEDLINE | ID: mdl-31824649

ABSTRACT

Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) are now recognised as major determinants in cellular regulation. This white paper presents a roadmap for future e-infrastructure developments in the field of IDP research within the ELIXIR framework. The goal of these developments is to drive the creation of high-quality tools and resources to support the identification, analysis and functional characterisation of IDPs. The roadmap is the result of a workshop titled "An intrinsically disordered protein user community proposal for ELIXIR" held at the University of Padua. The workshop, and further consultation with the members of the wider IDP community, identified the key priority areas for the roadmap including the development of standards for data annotation, storage and dissemination; integration of IDP data into the ELIXIR Core Data Resources; and the creation of benchmarking criteria for IDP-related software. Here, we discuss these areas of priority, how they can be implemented in cooperation with the ELIXIR platforms, and their connections to existing ELIXIR Communities and international consortia. The article provides a preliminary blueprint for an IDP Community in ELIXIR and is an appeal to identify and involve new stakeholders.


Subject(s)
Intrinsically Disordered Proteins/metabolism
5.
Brief Bioinform ; 20(2): 405-415, 2019 03 22.
Article in English | MEDLINE | ID: mdl-29028883

ABSTRACT

Demand for training life scientists in bioinformatics methods, tools and resources and computational approaches is urgent and growing. To meet this demand, new trainers must be prepared with effective teaching practices for delivering short hands-on training sessions, a specific type of education that is not typically part of professional preparation of life scientists in many countries. A new Train-the-Trainer (TtT) programme was created by adapting existing models, using input from experienced trainers and experts in bioinformatics, and from educational and cognitive sciences. This programme was piloted across Europe from May 2016 to January 2017. Preparation included drafting the training materials, organizing sessions to pilot them and studying this paradigm for its potential to support the development and delivery of future bioinformatics training by participants. Seven pilot TtT sessions were carried out, and this manuscript describes the results of the pilot year. Lessons learned include (i) support is required for logistics, so that new instructors can focus on their teaching; (ii) institutions must provide incentives to include training opportunities for those who want/need to become new or better instructors; (iii) formal evaluation of the TtT materials is now a priority; (iv) a strategy is needed to recruit, train and certify new instructor trainers (faculty); and (v) future evaluations must assess utility. Additionally, defining a flexible but rigorous and reliable process of TtT 'certification' may incentivize participants and will be considered in future.


Subject(s)
Biological Science Disciplines/education , Biomedical Research , Computational Biology/education , Data Curation/methods , Education, Continuing , Curriculum , Feasibility Studies , Humans , Pilot Projects
6.
Nat Methods ; 15(11): 984, 2018 Nov.
Article in English | MEDLINE | ID: mdl-30287931

ABSTRACT

This paper was originally published under standard Nature America Inc. copyright. As of the date of this correction, the Resource is available online as an open-access paper with a CC-BY license. No other part of the paper has been changed.

7.
PLoS Comput Biol ; 14(2): e1005772, 2018 02.
Article in English | MEDLINE | ID: mdl-29390004

ABSTRACT

Bioinformatics is recognized as part of the essential knowledge base of numerous career paths in biomedical research and healthcare. However, there is little agreement in the field over what that knowledge entails or how best to provide it. These disagreements are compounded by the wide range of populations in need of bioinformatics training, with divergent prior backgrounds and intended application areas. The Curriculum Task Force of the International Society for Computational Biology (ISCB) Education Committee has sought to provide a framework for training needs and curricula in terms of a set of bioinformatics core competencies that cut across many user personas and training programs. The initial competencies developed based on surveys of employers and training programs have since been refined through a multiyear process of community engagement. This report describes the current status of the competencies and presents a series of use cases illustrating how they are being applied in diverse training contexts. These use cases are intended to demonstrate how others can make use of the competencies and engage in the process of their continuing refinement and application. The report concludes with a consideration of remaining challenges and future plans.


Subject(s)
Computational Biology/education , Curriculum , Education, Graduate , Systems Biology/education , Advisory Committees , Africa , Algorithms , Genetic Predisposition to Disease , Illinois , New South Wales , Ohio , Pennsylvania , Software , Surveys and Questionnaires , United Kingdom , Universities
8.
F1000Res ; 6, 2017.
Article in English | MEDLINE | ID: mdl-28928938

ABSTRACT

One of the main goals of the ELIXIR-EXCELERATE project from the European Union's Horizon 2020 programme is to support a pan-European training programme to increase bioinformatics capacity and competency across ELIXIR Nodes. To this end, a Train-the-Trainer (TtT) programme has been developed by the TtT subtask of EXCELERATE's Training Platform, to try to expose bioinformatics instructors to aspects of pedagogy and evidence-based learning principles, to help them better design, develop and deliver high-quality training in future. As a first step towards such a programme, an ELIXIR-EXCELERATE TtT (EE-TtT) pilot was developed, drawing on existing 'instructor training' models, using input both from experienced instructors and from experts in bioinformatics, the cognitive sciences and educational psychology. This manuscript describes the process of defining the pilot programme, illustrates its goals, structure and contents, and discusses its outcomes. From Jan 2016 to Jan 2017, we carried out seven pilot EE-TtT courses (training more than sixty new instructors), collaboratively drafted the training materials, and started establishing a network of trainers and instructors within the ELIXIR community. The EE-TtT pilot represents an essential step towards the development of a sustainable and scalable ELIXIR TtT programme. Indeed, the lessons learned from the pilot, the experience gained, the materials developed, and the analysis of the feedback collected throughout the seven pilot courses have both positioned us to consolidate the programme in the coming years, and contributed to the development of an enthusiastic and expanding ELIXIR community of instructors and trainers.

9.
Nat Methods ; 14(8): 775-781, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28775673

ABSTRACT

Access to primary research data is vital for the advancement of science. To extend the data types supported by community repositories, we built a prototype Image Data Resource (IDR) that collects and integrates imaging data acquired across many different imaging modalities. IDR links data from several imaging modalities, including high-content screening, super-resolution and time-lapse microscopy, digital pathology, public genetic or chemical databases, and cell and tissue phenotypes expressed using controlled ontologies. Using this integration, IDR facilitates the analysis of gene networks and reveals functional interactions that are inaccessible to individual studies. To enable re-analysis, we also established a computational resource based on Jupyter notebooks that allows remote access to the entire IDR. IDR is also an open source platform that others can use to publish their own image data. Thus IDR provides both a novel on-line resource and a software infrastructure that promotes and extends publication and re-analysis of scientific image data.


Subject(s)
Database Management Systems , Databases, Factual , Image Interpretation, Computer-Assisted/methods , Information Dissemination/methods , Software , User-Computer Interface , Algorithms , Publishing , Systems Integration
10.
PLoS Comput Biol ; 12(6): e1004937, 2016 06.
Article in English | MEDLINE | ID: mdl-27309738

ABSTRACT

The advancement of high-throughput sequencing (HTS) technologies and the rapid development of numerous analysis algorithms and pipelines in this field has resulted in an unprecedentedly high demand for training scientists in HTS data analysis. Embarking on developing new training materials is challenging for many reasons. Trainers often do not have prior experience in preparing or delivering such materials and struggle to keep them up to date. A repository of curated HTS training materials would support trainers in materials preparation, reduce the duplication of effort by increasing the usage of existing materials, and allow for the sharing of teaching experience among the HTS trainers' community. To achieve this, we have developed a strategy for materials' curation and dissemination. Standards for describing training materials have been proposed and applied to the curation of existing materials. A Git repository has been set up for sharing annotated materials that can now be reused, modified, or incorporated into new courses. This repository uses Git; hence, it is decentralized and self-managed by the community and can be forked/built-upon by all users. The repository is accessible at http://bioinformatics.upsc.se/htmr.


Subject(s)
Computational Biology/education , High-Throughput Nucleotide Sequencing/statistics & numerical data , Algorithms , Data Interpretation, Statistical , Education , Humans , Teaching
11.
J Biomed Semantics ; 7: 28, 2016.
Article in English | MEDLINE | ID: mdl-27195102

ABSTRACT

BACKGROUND: Phenotypic data derived from high content screening is currently annotated using free-text, thus preventing the integration of independent datasets, including those generated in different biological domains, such as cell lines, mouse and human tissues. DESCRIPTION: We present the Cellular Microscopy Phenotype Ontology (CMPO), a species neutral ontology for describing phenotypic observations relating to the whole cell, cellular components, cellular processes and cell populations. CMPO is compatible with related ontology efforts, allowing for future cross-species integration of phenotypic data. CMPO was developed following a curator-driven approach where phenotype data were annotated by expert biologists following the Entity-Quality (EQ) pattern. These EQs were subsequently transformed into new CMPO terms following an established post composition process. CONCLUSION: CMPO is currently being utilized to annotate phenotypes associated with high content screening datasets stored in several image repositories including the Image Data Repository (IDR), MitoSys project database and the Cellular Phenotype Database to facilitate data browsing and discoverability.


Subject(s)
Biological Ontologies , Cells/cytology , Microscopy , Phenotype , Single-Cell Analysis
13.
Brief Bioinform ; 17(5): 819-30, 2016 09.
Article in English | MEDLINE | ID: mdl-26420780

ABSTRACT

Phenotypes have gained increased notoriety in the clinical and biological domain owing to their application in numerous areas such as the discovery of disease genes and drug targets, phylogenetics and pharmacogenomics. Phenotypes, defined as observable characteristics of organisms, can be seen as one of the bridges that lead to a translation of experimental findings into clinical applications and thereby support 'bench to bedside' efforts. However, to build this translational bridge, a common and universal understanding of phenotypes is required that goes beyond domain-specific definitions. To achieve this ambitious goal, a digital revolution is ongoing that enables the encoding of data in computer-readable formats and the data storage in specialized repositories, ready for integration, enabling translational research. While phenome research is an ongoing endeavor, the true potential hidden in the currently available data still needs to be unlocked, offering exciting opportunities for the forthcoming years. Here, we provide insights into the state-of-the-art in digital phenotyping, by means of representing, acquiring and analyzing phenotype data. In addition, we provide visions of this field for future research work that could enable better applications of phenotype data.


Subject(s)
Phenotype , Humans , Information Storage and Retrieval , Research Design , Translational Research, Biomedical
14.
Methods ; 96: 27-32, 2016 Mar 01.
Article in English | MEDLINE | ID: mdl-26476368

ABSTRACT

High content screening (HCS) experiments create a classic data management challenge: multiple, large sets of heterogeneous structured and unstructured data that must be integrated and linked to produce a set of "final" results. These different data include images, reagents, protocols, analytic output, and phenotypes, all of which must be stored, linked and made accessible for users, scientists, collaborators and where appropriate the wider community. The OME Consortium has built several open source tools for managing, linking and sharing these different types of data. The OME Data Model is a metadata specification that supports the image data and metadata recorded in HCS experiments. Bio-Formats is a Java library that reads recorded image data and metadata and includes support for several HCS screening systems. OMERO is an enterprise data management application that integrates image data, experimental and analytic metadata and makes them accessible for visualization, mining, sharing and downstream analysis. We discuss how Bio-Formats and OMERO handle these different data types, and how they can be used to integrate, link and share HCS experiments in facilities and public data repositories. OME specifications and software are open source and are available at https://www.openmicroscopy.org.


Subject(s)
Computational Biology/statistics & numerical data , Data Mining/statistics & numerical data , High-Throughput Screening Assays/statistics & numerical data , Information Storage and Retrieval/statistics & numerical data , Software , Computational Biology/methods , Datasets as Topic , High-Throughput Screening Assays/methods , Humans , Information Dissemination , Information Storage and Retrieval/methods , Internet
15.
Bioinformatics ; 31(16): 2736-40, 2015 Aug 15.
Article in English | MEDLINE | ID: mdl-25861964

ABSTRACT

MOTIVATION: The Cellular Phenotype Database (CPD) is a repository for data derived from high-throughput systems microscopy studies. The aims of this resource are: (i) to provide easy access to cellular phenotype and molecular localization data for the broader research community; (ii) to facilitate integration of independent phenotypic studies by means of data aggregation techniques, including use of an ontology and (iii) to facilitate development of analytical methods in this field. RESULTS: In this article we present CPD, its data structure and user interface, propose a minimal set of information describing RNA interference experiments, and suggest a generic schema for management and aggregation of outputs from phenotypic or molecular localization experiments. The database has a flexible structure for management of data from heterogeneous sources of systems microscopy experimental outputs generated by a variety of protocols and technologies and can be queried by gene, reagent, gene attribute, study keywords, phenotype or ontology terms. AVAILABILITY AND IMPLEMENTATION: CPD is developed as part of the Systems Microscopy Network of Excellence and is accessible at http://www.ebi.ac.uk/fg/sym. CONTACT: jes@ebi.ac.uk or ugis@ebi.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Cells/cytology , Databases as Topic , Microscopy/methods , Phenotype , Statistics as Topic , User-Computer Interface
16.
J Clin Invest ; 125(4): 1648-64, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25774502

ABSTRACT

Tumor cell migration is a key process for cancer cell dissemination and metastasis that is controlled by signal-mediated cytoskeletal and cell matrix adhesion remodeling. Using a phagokinetic track assay with migratory H1299 cells, we performed an siRNA screen of almost 1,500 genes encoding kinases/phosphatases and adhesome- and migration-related proteins to identify genes that affect tumor cell migration speed and persistence. Thirty candidate genes that altered cell migration were validated in live tumor cell migration assays. Eight were associated with metastasis-free survival in breast cancer patients, with integrin β3-binding protein (ITGB3BP), MAP3K8, NIMA-related kinase (NEK2), and SHC-transforming protein 1 (SHC1) being the most predictive. Examination of genes that modulate migration indicated that SRPK1, encoding the splicing factor kinase SRSF protein kinase 1, is relevant to breast cancer outcomes, as it was highly expressed in basal breast cancer. Furthermore, high SRPK1 expression correlated with poor breast cancer disease outcome and preferential metastasis to the lungs and brain. In 2 independent murine models of breast tumor metastasis, stable shRNA-based SRPK1 knockdown suppressed metastasis to distant organs, including lung, liver, and spleen, and inhibited focal adhesion reorganization. Our study provides comprehensive information on the molecular determinants of tumor cell migration and suggests that SRPK1 has potential as a drug target for limiting breast cancer metastasis.


Subject(s)
Breast Neoplasms/genetics , Neoplasm Metastasis/genetics , Neoplasm Proteins/physiology , Protein Serine-Threonine Kinases/physiology , Animals , Bone Neoplasms/secondary , Carcinoma, Non-Small-Cell Lung/pathology , Cell Adhesion , Cell Movement/genetics , Cell Polarity , Female , Focal Adhesions/physiology , Gene Expression Regulation, Neoplastic , Genetic Association Studies , Humans , Kaplan-Meier Estimate , Lung Neoplasms/pathology , Lung Neoplasms/secondary , Mice , NF-kappa B/metabolism , Neoplasm Proteins/genetics , Nuclear Proteins/physiology , Organ Specificity , Prognosis , Protein Serine-Threonine Kinases/deficiency , Protein Serine-Threonine Kinases/genetics , RNA Interference , RNA, Small Interfering/genetics
17.
Nucleic Acids Res ; 43(Database issue): D1113-6, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25361974

ABSTRACT

The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is an international functional genomics database at the European Bioinformatics Institute (EMBL-EBI) recommended by most journals as a repository for data supporting peer-reviewed publications. It contains data from over 7000 public sequencing and 42,000 array-based studies comprising over 1.5 million assays in total. The proportion of sequencing-based submissions has grown significantly over the last few years and has doubled in the last 18 months, whilst the rate of microarray submissions is growing slightly. All data in ArrayExpress are available in the MAGE-TAB format, which allows robust linking to data analysis and visualization tools and standardized analysis. The main development over the last two years has been the release of a new data submission tool Annotare, which has reduced the average submission time almost 3-fold. In the near future, Annotare will become the only submission route into ArrayExpress, alongside MAGE-TAB format-based pipelines. ArrayExpress is a stable and highly accessed resource. Our future tasks include automation of data flows and further integration with other EMBL-EBI resources for the representation of multi-omics data.
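
Editorial illustration (not part of the abstract): MAGE-TAB is a tab-delimited format, so its sample table can be read with generic tools. The sketch below is a minimal example under stated assumptions: the inline table, its sample and assay names, and the `read_sdrf` helper are hypothetical, and the column headers are only meant to resemble those of a typical sample-and-data-relationship file. It is not ArrayExpress code and does not reflect any specific submission.

```python
import csv
import io

# Hypothetical, hand-written table in the tab-delimited style of a MAGE-TAB
# sample sheet; the values are invented for illustration only.
SDRF_TEXT = (
    "Source Name\tCharacteristics[organism]\tAssay Name\n"
    "sample 1\tMus musculus\tassay 1\n"
    "sample 2\tDrosophila melanogaster\tassay 2\n"
)


def read_sdrf(text):
    """Parse tab-delimited text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


rows = read_sdrf(SDRF_TEXT)
# Collect one column across all rows, e.g. to group assays by organism.
organisms = [row["Characteristics[organism]"] for row in rows]
```

Keeping each row as a dictionary keyed by header means downstream tools can select columns by name rather than by position, which is one way a tabular format supports linking to analysis software.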


Subject(s)
Databases, Genetic , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Genomics , High-Throughput Nucleotide Sequencing , Internet , Software
18.
Nucleic Acids Res ; 42(Database issue): D926-32, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24304889

ABSTRACT

Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of 'baseline' expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful 'contrasts', i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets, as well as genes and a more powerful search interface designed to maximize the biological value provided to the user.
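
Editorial illustration (not part of the abstract): a 'contrast' as described above compares two sets of biological replicates. One common way to summarise such a comparison is a log2 fold change of mean expression; the sketch below shows that calculation only. The gene name, the expression values, the pseudocount and the `log2_fold_change` helper are all assumptions made for illustration, and the abstract does not say that Expression Atlas computes contrasts this way.

```python
import math
from statistics import mean


def log2_fold_change(test_replicates, reference_replicates, pseudocount=1.0):
    """Log2 ratio of mean expression in the test set versus the reference set.

    The pseudocount (an assumption of this sketch) avoids division by zero
    when a gene is not detected in one of the two sets.
    """
    test_mean = mean(test_replicates)
    reference_mean = mean(reference_replicates)
    return math.log2((test_mean + pseudocount) / (reference_mean + pseudocount))


# Hypothetical contrast: three treated replicates versus three untreated ones.
contrast = {
    "gene": "GENE_A",
    "test": [30.0, 32.0, 31.0],
    "reference": [7.0, 6.0, 8.0],
}
lfc = log2_fold_change(contrast["test"], contrast["reference"])
```

With these invented values the means are 31 and 7, so the ratio after adding the pseudocount is 32/8 and the log2 fold change is 2. A real analysis would also need a statistical test across replicates, which this sketch omits.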


Subject(s)
Databases, Genetic , Gene Expression Profiling , Genomics , Humans , Internet , Oligonucleotide Array Sequence Analysis , Proteins/genetics , Proteins/metabolism , RNA Isoforms/metabolism , Sequence Analysis, RNA
19.
Brief Bioinform ; 14(5): 538-47, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23543353

ABSTRACT

High-throughput technologies are widely used in the field of functional genomics and used in an increasing number of applications. For many 'wet lab' scientists, the analysis of the large amount of data generated by such technologies is a major bottleneck that can only be overcome through very specialized training in advanced data analysis methodologies and the use of dedicated bioinformatics software tools. In this article, we wish to discuss the challenges related to delivering training in the analysis of high-throughput sequencing data and how we addressed these challenges in the hands-on training courses that we have developed at the European Bioinformatics Institute.


Subject(s)
Computational Biology/education , Genomics/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Academies and Institutes , Curriculum , Data Interpretation, Statistical , Europe , Faculty , Humans , Software , Teaching
20.
Nucleic Acids Res ; 41(Database issue): D987-90, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23193272

ABSTRACT

The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is one of three international functional genomics public data repositories, alongside the Gene Expression Omnibus at NCBI and the DDBJ Omics Archive, supporting peer-reviewed publications. It accepts data generated by sequencing or array-based technologies and currently contains data from almost a million assays, from over 30 000 experiments. The proportion of sequencing-based submissions has grown significantly over the last 2 years and has reached, in 2012, 15% of all new data. All data are available from ArrayExpress in MAGE-TAB format, which allows robust linking to data analysis and visualization tools, including Bioconductor and GenomeSpace. Additionally, R objects, for microarray data, and binary alignment format files, for sequencing data, have been generated for a significant proportion of ArrayExpress data.


Subject(s)
Databases, Genetic , Genomics , Microarray Analysis , Databases, Genetic/statistics & numerical data , Databases, Genetic/trends , High-Throughput Nucleotide Sequencing , Internet , Software , User-Computer Interface