Search | VHL Regional Portal

1.

Fast and accurate genome-wide predictions and structural modeling of protein-protein interactions using Galaxy.

Guerler, Aysam; Baker, Dannon; van den Beek, Marius; Gruening, Bjoern; Bouvier, Dave; Coraor, Nate; Shank, Stephen D; Zehr, Jordan D; Schatz, Michael C; Nekrutenko, Anton.

BMC Bioinformatics ; 24(1): 263, 2023 Jun 23.

Article in English | MEDLINE | ID: mdl-37353753

ABSTRACT

BACKGROUND: Protein-protein interactions play a crucial role in almost all cellular processes. Identifying interacting proteins reveals insight into living organisms and yields novel drug targets for disease treatment. Here, we present a publicly available, automated pipeline to predict genome-wide protein-protein interactions and produce high-quality multimeric structural models. RESULTS: Application of our method to the Human and Yeast genomes yield protein-protein interaction networks similar in quality to common experimental methods. We identified and modeled Human proteins likely to interact with the papain-like protease of SARS-CoV2's non-structural protein 3. We also produced models of SARS-CoV2's spike protein (S) interacting with myelin-oligodendrocyte glycoprotein receptor and dipeptidyl peptidase-4. CONCLUSIONS: The presented method is capable of confidently identifying interactions while providing high-quality multimeric structural models for experimental validation. The interactome modeling pipeline is available at usegalaxy.org and usegalaxy.eu.

Subject(s)

COVID-19 , Protein Interaction Mapping , Humans , RNA, Viral/metabolism , SARS-CoV-2 , Saccharomyces cerevisiae/metabolism

2.

Ready-to-use public infrastructure for global SARS-CoV-2 monitoring.

Maier, Wolfgang; Bray, Simon; van den Beek, Marius; Bouvier, Dave; Coraor, Nathan; Miladi, Milad; Singh, Babita; De Argila, Jordi Rambla; Baker, Dannon; Roach, Nathan; Gladman, Simon; Coppens, Frederik; Martin, Darren P; Lonie, Andrew; Grüning, Björn; Kosakovsky Pond, Sergei L; Nekrutenko, Anton.

Nat Biotechnol ; 39(10): 1178-1179, 2021 10.

Article in English | MEDLINE | ID: mdl-34588690

Subject(s)

COVID-19/epidemiology , Databases, Factual , Pandemics , SARS-CoV-2/pathogenicity , COVID-19/virology , Genome, Viral/genetics , Humans

3.

Freely accessible ready to use global infrastructure for SARS-CoV-2 monitoring.

Maier, Wolfgang; Bray, Simon; van den Beek, Marius; Bouvier, Dave; Coraor, Nathaniel; Miladi, Milad; Singh, Babita; De Argila, Jordi Rambla; Baker, Dannon; Roach, Nathan; Gladman, Simon; Coppens, Frederik; Martin, Darren P; Lonie, Andrew; Grüning, Björn; Kosakovsky Pond, Sergei L; Nekrutenko, Anton.

bioRxiv ; 2021 Mar 25.

Article in English | MEDLINE | ID: mdl-33791701

ABSTRACT

The COVID-19 pandemic is the first global health crisis to occur in the age of big genomic data.Although data generation capacity is well established and sufficiently standardized, analytical capacity is not. To establish analytical capacity it is necessary to pull together global computational resources and deliver the best open source tools and analysis workflows within a ready to use, universally accessible resource. Such a resource should not be controlled by a single research group, institution, or country. Instead it should be maintained by a community of users and developers who ensure that the system remains operational and populated with current tools. A community is also essential for facilitating the types of discourse needed to establish best analytical practices. Bringing together public computational research infrastructure from the USA, Europe, and Australia, we developed a distributed data analysis platform that accomplishes these goals. It is immediately accessible to anyone in the world and is designed for the analysis of rapidly growing collections of deep sequencing datasets. We demonstrate its utility by detecting allelic variants in high-quality existing SARS-CoV-2 sequencing datasets and by continuous reanalysis of COG-UK data. All workflows, data, and documentation is available at https://covid19.galaxyproject.org .

4.

No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics.

Baker, Dannon; van den Beek, Marius; Blankenberg, Daniel; Bouvier, Dave; Chilton, John; Coraor, Nate; Coppens, Frederik; Eguinoa, Ignacio; Gladman, Simon; Grüning, Björn; Keener, Nicholas; Larivière, Delphine; Lonie, Andrew; Kosakovsky Pond, Sergei; Maier, Wolfgang; Nekrutenko, Anton; Taylor, James; Weaver, Steven.

PLoS Pathog ; 16(8): e1008643, 2020 08.

Article in English | MEDLINE | ID: mdl-32790776

ABSTRACT

The current state of much of the Wuhan pneumonia virus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) research shows a regrettable lack of data sharing and considerable analytical obfuscation. This impedes global research cooperation, which is essential for tackling public health emergencies and requires unimpeded access to data, analysis tools, and computational infrastructure. Here, we show that community efforts in developing open analytical software tools over the past 10 years, combined with national investments into scientific computational infrastructure, can overcome these deficiencies and provide an accessible platform for tackling global health emergencies in an open and transparent manner. Specifically, we use all SARS-CoV-2 genomic data available in the public domain so far to (1) underscore the importance of access to raw data and (2) demonstrate that existing community efforts in curation and deployment of biomedical software can reliably support rapid, reproducible research during global health crises. All our analyses are fully documented at https://github.com/galaxyproject/SARS-CoV-2.

Subject(s)

Betacoronavirus/pathogenicity , Coronavirus Infections/virology , Pneumonia, Viral/virology , Public Health , Severe Acute Respiratory Syndrome/virology , COVID-19 , Data Analysis , Humans , Pandemics , SARS-CoV-2

5.

Community-Driven Data Analysis Training for Biology.

Batut, Bérénice; Hiltemann, Saskia; Bagnacani, Andrea; Baker, Dannon; Bhardwaj, Vivek; Blank, Clemens; Bretaudeau, Anthony; Brillet-Guéguen, Loraine; Cech, Martin; Chilton, John; Clements, Dave; Doppelt-Azeroual, Olivia; Erxleben, Anika; Freeberg, Mallory Ann; Gladman, Simon; Hoogstrate, Youri; Hotz, Hans-Rudolf; Houwaart, Torsten; Jagtap, Pratik; Larivière, Delphine; Le Corguillé, Gildas; Manke, Thomas; Mareuil, Fabien; Ramírez, Fidel; Ryan, Devon; Sigloch, Florian Christoph; Soranzo, Nicola; Wolff, Joachim; Videm, Pavankumar; Wolfien, Markus; Wubuli, Aisanjiang; Yusuf, Dilmurat; Taylor, James; Backofen, Rolf; Nekrutenko, Anton; Grüning, Björn.

Cell Syst ; 6(6): 752-758.e1, 2018 06 27.

Article in English | MEDLINE | ID: mdl-29953864

ABSTRACT

The primary problem with the explosion of biomedical datasets is not the data, not computational resources, and not the required storage space, but the general lack of trained and skilled researchers to manipulate and analyze these data. Eliminating this problem requires development of comprehensive educational resources. Here we present a community-driven framework that enables modern, interactive teaching of data analytics in life sciences and facilitates the development of training materials. The key feature of our system is that it is not a static but a continuously improved collection of tutorials. By coupling tutorials with a web-based analysis framework, biomedical researchers can learn by performing computation themselves through a web browser without the need to install software or search for example datasets. Our ultimate goal is to expand the breadth of training materials to include fundamental statistical and data science topics and to precipitate a complete re-engineering of undergraduate and graduate curricula in life sciences. This project is accessible at https://training.galaxyproject.org.

Subject(s)

Computational Biology/education , Computational Biology/methods , Research Personnel/education , Curriculum , Data Analysis , Education, Distance/methods , Education, Distance/trends , Humans , Software

6.

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update.

Afgan, Enis; Baker, Dannon; Batut, Bérénice; van den Beek, Marius; Bouvier, Dave; Cech, Martin; Chilton, John; Clements, Dave; Coraor, Nate; Grüning, Björn A; Guerler, Aysam; Hillman-Jackson, Jennifer; Hiltemann, Saskia; Jalili, Vahid; Rasche, Helena; Soranzo, Nicola; Goecks, Jeremy; Taylor, James; Nekrutenko, Anton; Blankenberg, Daniel.

Nucleic Acids Res ; 46(W1): W537-W544, 2018 07 02.

Article in English | MEDLINE | ID: mdl-29790989

ABSTRACT

Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.

Subject(s)

Genomics/statistics & numerical data , Metabolomics/statistics & numerical data , Molecular Imaging/statistics & numerical data , Proteomics/statistics & numerical data , User-Computer Interface , Datasets as Topic , Humans , Information Dissemination , International Cooperation , Internet , Reproducibility of Results

7.

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update.

Afgan, Enis; Baker, Dannon; van den Beek, Marius; Blankenberg, Daniel; Bouvier, Dave; Cech, Martin; Chilton, John; Clements, Dave; Coraor, Nate; Eberhard, Carl; Grüning, Björn; Guerler, Aysam; Hillman-Jackson, Jennifer; Von Kuster, Greg; Rasche, Eric; Soranzo, Nicola; Turaga, Nitesh; Taylor, James; Nekrutenko, Anton; Goecks, Jeremy.

Nucleic Acids Res ; 44(W1): W3-W10, 2016 07 08.

Article in English | MEDLINE | ID: mdl-27137889

ABSTRACT

High-throughput data production technologies, particularly 'next-generation' DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non experts. Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report we highlight recently added features enabling biomedical analyses on a large scale.

Subject(s)

Computational Biology/statistics & numerical data , Datasets as Topic/statistics & numerical data , User-Computer Interface , Biomedical Research , Computational Biology/methods , Databases, Genetic , Humans , Internet , Reproducibility of Results

8.

Dissemination of scientific software with Galaxy ToolShed.

Blankenberg, Daniel; Von Kuster, Gregory; Bouvier, Emil; Baker, Dannon; Afgan, Enis; Stoler, Nicholas; Taylor, James; Nekrutenko, Anton.

Genome Biol ; 15(2): 403, 2014 Feb 20.

Article in English | MEDLINE | ID: mdl-25001293

ABSTRACT

The proliferation of web-based integrative analysis frameworks has enabled users to perform complex analyses directly through the web. Unfortunately, it also revoked the freedom to easily select the most appropriate tools. To address this, we have developed Galaxy ToolShed.

Subject(s)

Computational Biology , Internet , Software , Science

9.

A reference model for deploying applications in virtualized environments.

Afgan, Enis; Baker, Dannon; Nekrutenko, Anton; Taylor, James.

Concurr Comput ; 24(12): 1349-1361, 2012 Aug 25.

Article in English | MEDLINE | ID: mdl-33907528

ABSTRACT

Modern scientific research has been revolutionized by the availability of powerful and flexible computational infrastructure. Virtualization has made it possible to acquire computational resources on demand. Establishing and enabling use of these environments is essential, but their widespread adoption will only succeed if they are transparently usable. Requiring changes to applications being deployed or requiring users to change how they utilize those applications represent barriers to the infrastructure acceptance. The problem lies in the process of deploying applications so that they can take advantage of the elasticity of the environment and deliver it transparently to users. Here, we describe a reference model for deploying applications into virtualized environments. The model is rooted in the low-level components common to a range of virtualized environments and it describes how to compose those otherwise dispersed components into a coherent unit. Use of the model enables applications to be deployed into the new environment without any modifications, it imposes minimal overhead on management of the infrastructure required to run the application, and yields a set of higher-level services as a byproduct of the component organization and the underlying infrastructure. We provide a fully functional sample application deployment and implement a framework for managing the overall application deployment.

10.

Harnessing cloud computing with Galaxy Cloud.

Afgan, Enis; Baker, Dannon; Coraor, Nate; Goto, Hiroki; Paul, Ian M; Makova, Kateryna D; Nekrutenko, Anton; Taylor, James.

Nat Biotechnol ; 29(11): 972-4, 2011 Nov 08.

Article in English | MEDLINE | ID: mdl-22068528

Subject(s)

Computer Storage Devices/trends , Internet/standards , Sequence Analysis, DNA/methods , Software/trends , Computer Storage Devices/economics , DNA, Mitochondrial/genetics , Humans

11.

Galaxy CloudMan: delivering cloud compute clusters.

Afgan, Enis; Baker, Dannon; Coraor, Nate; Chapman, Brad; Nekrutenko, Anton; Taylor, James.

BMC Bioinformatics ; 11 Suppl 12: S4, 2010 Dec 21.

Article in English | MEDLINE | ID: mdl-21210983

ABSTRACT

BACKGROUND: Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is "cloud computing", which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate "as is" use by experimental biologists. RESULTS: We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon's EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. CONCLUSIONS: The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge.

Subject(s)

Computational Biology/methods , Software , Cluster Analysis , Internet

ABSTRACT

Subject(s)

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL