Search | VHL Regional Portal

1.

Fast and accurate genome-wide predictions and structural modeling of protein-protein interactions using Galaxy.

Guerler, Aysam; Baker, Dannon; van den Beek, Marius; Gruening, Bjoern; Bouvier, Dave; Coraor, Nate; Shank, Stephen D; Zehr, Jordan D; Schatz, Michael C; Nekrutenko, Anton.

BMC Bioinformatics ; 24(1): 263, 2023 Jun 23.

Article in English | MEDLINE | ID: mdl-37353753

ABSTRACT

BACKGROUND: Protein-protein interactions play a crucial role in almost all cellular processes. Identifying interacting proteins reveals insight into living organisms and yields novel drug targets for disease treatment. Here, we present a publicly available, automated pipeline to predict genome-wide protein-protein interactions and produce high-quality multimeric structural models. RESULTS: Application of our method to the Human and Yeast genomes yields protein-protein interaction networks similar in quality to common experimental methods. We identified and modeled Human proteins likely to interact with the papain-like protease of SARS-CoV2's non-structural protein 3. We also produced models of SARS-CoV2's spike protein (S) interacting with myelin-oligodendrocyte glycoprotein receptor and dipeptidyl peptidase-4. CONCLUSIONS: The presented method is capable of confidently identifying interactions while providing high-quality multimeric structural models for experimental validation. The interactome modeling pipeline is available at usegalaxy.org and usegalaxy.eu.

Subject(s)

COVID-19 , Protein Interaction Mapping , Humans , RNA, Viral/metabolism , SARS-CoV-2 , Saccharomyces cerevisiae/metabolism

2.

Integrating Multimeric Threading With High-throughput Experiments for Structural Interactome of Escherichia coli.

Gong, Weikang; Guerler, Aysam; Zhang, Chengxin; Warner, Elisa; Li, Chunhua; Zhang, Yang.

J Mol Biol ; 433(10): 166944, 2021 May 14.

Article in English | MEDLINE | ID: mdl-33741411

ABSTRACT

Genome-wide protein-protein interaction (PPI) determination remains a significant unsolved problem in structural biology. The difficulty is twofold, since high-throughput experiments (HTEs) often have a relatively high false-positive rate in assigning PPIs, and PPI quaternary structures are more difficult to solve than tertiary structures using traditional structural biology techniques. We proposed a uniform pipeline, Threpp, to address both problems. Starting from a pair of monomer sequences, Threpp first threads both sequences through a complex structure library, where the alignment score is combined with HTE data using a naïve Bayesian classifier model to predict the likelihood that the two chains interact with each other. Next, quaternary complex structures of the identified PPIs are constructed by reassembling monomeric alignments with dimeric threading frameworks through interface-specific structural alignments. The pipeline was applied to the Escherichia coli genome and created 35,125 confident PPIs, which is 4.5-fold higher than HTE alone. Graphic analyses of the PPI networks show a scale-free cluster size distribution, consistent with previous studies, which was found critical to the robustness of genome evolution and the centrality of functionally important proteins that are essential to E. coli survival. Furthermore, complex structure models were constructed for all predicted E. coli PPIs based on the quaternary threading alignments, where 6771 of them were found to have a high confidence score that corresponds to the correct fold of the complexes with a TM-score >0.5, and 39 showed a close consistency with the later released experimental structures with an average TM-score = 0.73. These results demonstrated the significant usefulness of threading-based homologous modeling in both genome-wide PPI network detection and complex structural construction.
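
The naïve Bayesian combination of evidence mentioned in this abstract can be illustrated with a minimal sketch. It is not the Threpp implementation: the function name `interaction_probability`, the prior, and the likelihood ratios are hypothetical values chosen only to show how independent evidence sources (for example, a threading alignment score and an HTE observation) multiply into posterior odds under the naive independence assumption.

```python
from math import prod


def interaction_probability(prior, likelihood_ratios):
    """Combine independent evidence with a naive Bayes rule.

    posterior odds = prior odds * product of per-feature likelihood ratios,
    where each ratio is P(evidence | interacting) / P(evidence | not interacting).
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers, for illustration only (not taken from the paper).
prior = 0.001          # assumed prior probability that a random pair interacts
lr_threading = 50.0    # assumed likelihood ratio for the threading score bin
lr_hte = 20.0          # assumed likelihood ratio for a positive HTE result

p = interaction_probability(prior, [lr_threading, lr_hte])
print(f"posterior probability of interaction: {p:.3f}")
```

With no evidence (an empty list of ratios) the function returns the prior unchanged, which is a quick sanity check on the odds arithmetic.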

Subject(s)

Escherichia coli Proteins/genetics , Escherichia coli/genetics , HSP70 Heat-Shock Proteins/genetics , Phosphotransferases/genetics , Proteome/genetics , Transcription Factors/genetics , Bayes Theorem , Cluster Analysis , Escherichia coli/metabolism , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/metabolism , Gene Expression Regulation, Bacterial , Genome, Bacterial , HSP70 Heat-Shock Proteins/chemistry , HSP70 Heat-Shock Proteins/metabolism , Phosphotransferases/chemistry , Phosphotransferases/metabolism , Protein Folding , Protein Interaction Mapping , Protein Interaction Maps/genetics , Protein Structure, Quaternary , Proteome/chemistry , Proteome/metabolism , Signal Transduction , Transcription Factors/chemistry , Transcription Factors/metabolism

3.

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update.

Afgan, Enis; Baker, Dannon; Batut, Bérénice; van den Beek, Marius; Bouvier, Dave; Cech, Martin; Chilton, John; Clements, Dave; Coraor, Nate; Grüning, Björn A; Guerler, Aysam; Hillman-Jackson, Jennifer; Hiltemann, Saskia; Jalili, Vahid; Rasche, Helena; Soranzo, Nicola; Goecks, Jeremy; Taylor, James; Nekrutenko, Anton; Blankenberg, Daniel.

Nucleic Acids Res ; 46(W1): W537-W544, 2018 Jul 02.

Article in English | MEDLINE | ID: mdl-29790989

ABSTRACT

Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.

Subject(s)

Genomics/statistics & numerical data , Metabolomics/statistics & numerical data , Molecular Imaging/statistics & numerical data , Proteomics/statistics & numerical data , User-Computer Interface , Datasets as Topic , Humans , Information Dissemination , International Cooperation , Internet , Reproducibility of Results

4.

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update.

Afgan, Enis; Baker, Dannon; van den Beek, Marius; Blankenberg, Daniel; Bouvier, Dave; Cech, Martin; Chilton, John; Clements, Dave; Coraor, Nate; Eberhard, Carl; Grüning, Björn; Guerler, Aysam; Hillman-Jackson, Jennifer; Von Kuster, Greg; Rasche, Eric; Soranzo, Nicola; Turaga, Nitesh; Taylor, James; Nekrutenko, Anton; Goecks, Jeremy.

Nucleic Acids Res ; 44(W1): W3-W10, 2016 Jul 08.

Article in English | MEDLINE | ID: mdl-27137889

ABSTRACT

High-throughput data production technologies, particularly 'next-generation' DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non experts. Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report we highlight recently added features enabling biomedical analyses on a large scale.

Subject(s)

Computational Biology/statistics & numerical data , Datasets as Topic/statistics & numerical data , User-Computer Interface , Biomedical Research , Computational Biology/methods , Databases, Genetic , Humans , Internet , Reproducibility of Results

5.

A Network of Splice Isoforms for the Mouse.

Li, Hong-Dong; Menon, Rajasree; Eksi, Ridvan; Guerler, Aysam; Zhang, Yang; Omenn, Gilbert S; Guan, Yuanfang.

Sci Rep ; 6: 24507, 2016 Apr 15.

Article in English | MEDLINE | ID: mdl-27079421

ABSTRACT

The laboratory mouse is the primary mammalian species used for studying alternative splicing events. Recent studies have generated computational models to predict functions for splice isoforms in the mouse. However, the functional relationship network, describing the probability of splice isoforms participating in the same biological process or pathway, has not yet been studied in the mouse. Here we describe a rich genome-wide resource of mouse networks at the isoform level, which was generated using a unique framework that was originally developed to infer isoform functions. This network was built through integrating heterogeneous genomic and protein data, including RNA-seq, exon array, protein docking and pseudo-amino acid composition. Through simulation and cross-validation studies, we demonstrated the accuracy of the algorithm in predicting isoform-level functional relationships. We showed that this network enables the users to reveal functional differences of the isoforms of the same gene, as illustrated by literature evidence with Anxa6 (annexin a6) as an example. We expect this work will become a useful resource for the mouse genetics community to understand gene functions. The network is publicly available at: http://guanlab.ccmb.med.umich.edu/isoformnetwork.

Subject(s)

Alternative Splicing , Gene Regulatory Networks , RNA Isoforms , Algorithms , Animals , Computational Biology/methods , Computer Simulation , Genomics/methods , Machine Learning , Mice , Protein Interaction Mapping , Reproducibility of Results

6.

Mapping monomeric threading to protein-protein structure prediction.

Guerler, Aysam; Govindarajoo, Brandon; Zhang, Yang.

J Chem Inf Model ; 53(3): 717-25, 2013 Mar 25.

Article in English | MEDLINE | ID: mdl-23413988

ABSTRACT

The key step of template-based protein-protein structure prediction is the recognition of complexes from experimental structure libraries that have similar quaternary fold. Maintaining two monomer and dimer structure libraries is, however, laborious, and inappropriate library construction can degrade template recognition coverage. We propose a novel strategy, SPRING, to identify complexes by mapping monomeric threading alignments to protein-protein interactions based on the original oligomer entries in the PDB, which does not rely on library construction and increases the efficiency and quality of complex template recognition. SPRING was tested on 1838 nonhomologous protein complexes and recognized correct quaternary template structures with a TM-score >0.5 in 1115 cases after excluding homologous proteins. The average TM-score of the first model is 60% and 17% higher than that by HHsearch and COTH, respectively, while the number of targets with an interface RMSD <2.5 Å by SPRING is 134% and 167% higher than these competing methods. SPRING was also benchmarked against ZDOCK on 77 docking benchmark proteins. Although the relative performance of SPRING and ZDOCK depends on the level of homology filters, a combination of the two methods can result in a significantly higher model quality than ZDOCK at all homology thresholds. These data demonstrate a new efficient approach to quaternary structure recognition that is ready to use for genome-scale modeling of protein-protein interactions due to the high speed and accuracy.

Subject(s)

Protein Conformation , Proteins/chemistry , Algorithms , Benchmarking , Databases, Protein , Dimerization , Models, Molecular , Sequence Alignment , Software

7.

GIS: a comprehensive source for protein structure similarities.

Guerler, Aysam; Knapp, Ernst-Walter.

Nucleic Acids Res ; 38(Web Server issue): W46-52, 2010 Jul.

Article in English | MEDLINE | ID: mdl-20460464

ABSTRACT

A web service for analysis of protein structures that are sequentially or non-sequentially similar was generated. Recently, the non-sequential structure alignment algorithm GANGSTA+ was introduced. GANGSTA+ can detect non-sequential structural analogs for proteins stated to possess novel folds. Since GANGSTA+ ignores the polypeptide chain connectivity of secondary structure elements (i.e. alpha-helices and beta-strands), it is able to detect structural similarities also between proteins whose sequences were reshuffled during evolution. GANGSTA+ was applied in an all-against-all comparison on the ASTRAL40 database (SCOP version 1.75), which consists of >10,000 protein domains, yielding about 55 × 10^6 possible protein structure alignments. Here, we provide the resulting protein structure alignments as a public web-based service, named GANGSTA+ Internet Services (GIS). Users can also browse the ASTRAL40 database of protein structures with GANGSTA+ relative to an externally given protein structure, using different constraints to select specific results. GIS allows us to analyze protein structure families according to the SCOP classification scheme. Additionally, users can upload their own protein structures for pairwise protein structure comparison, alignment against all protein structures of the ASTRAL40 database (SCOP version 1.75) or symmetry analysis. GIS is publicly available at http://agknapp.chemie.fu-berlin.de/gplus.

Subject(s)

Software , Structural Homology, Protein , Algorithms , Computer Graphics , Databases, Protein , Internet , Protein Structure, Tertiary

8.

Strategies of non-sequential protein structure alignments.

Guerler, Aysam; Knapp, Ernst-Walter.

Genome Inform ; 22: 21-9, 2010 Jan.

Article in English | MEDLINE | ID: mdl-20238416

ABSTRACT

Due to the large number of available protein structure alignment algorithms, a lot of effort has been made to define robust measures to evaluate their performances and the quality of generated alignments. Most quality measures involve the number of aligned residues and the RMSD. In this work, we analyze how these two properties are influenced by different residue assignment strategies as employed in common non-sequential structure alignment algorithms. Therefore, we implemented different residue assignment strategies into our non-sequential structure alignment algorithm GANGSTA+. We compared the resulting numbers of aligned residues and RMSDs for each residue assignment strategy and different alignment algorithms on a benchmark set of circular-permuted protein pairs. Unfortunately, differences in the residue assignment strategies are often ignored when comparing the performances of different algorithms. However, our results clearly show that this may strongly bias the observations. Bringing residue assignment strategies in line can explain observed performance differences between entirely different alignment algorithms. Our results suggest that performance comparison of non-sequential protein structure alignment algorithms should be based on the same residue assignment strategy.

Subject(s)

Algorithms , Computational Biology , Proteins/chemistry , Sequence Alignment/methods , Databases, Factual , Humans , Software

9.

Circular permuted proteins in the universe of protein folds.

Schmidt-Goenner, Tobias; Guerler, Aysam; Kolbeck, Bjoern; Knapp, Ernst Walter.

Proteins ; 78(7): 1618-30, 2010 May 15.

Article in English | MEDLINE | ID: mdl-20112421

ABSTRACT

Finding and identifying circular permuted protein pairs (CPP) is one of the harder tasks for structure alignment programs, because of the different location of the break in the polypeptide chain connectivity. The protein structure alignment tool GANGSTA+ was used to search for CPPs in a database of nearly 10,000 protein structures. It also allows determination of the statistical significance of the occurrence of circular permutations in the protein universe. The number of detected CPPs was found to be higher than expected, raising questions about the evolutionary processes leading to CPPs. The GANGSTA+ protein structure alignment tool is available online via the web server at http://gangsta.chemie.fu-berlin.de. On the same webpage the complete database of similar protein structure pairs based on the ASTRAL40 set of protein domains is provided and one can select CPPs specifically.

Subject(s)

Models, Chemical , Protein Folding , Proteins/chemistry , Models, Molecular , Protein Conformation , Sequence Alignment , Software

10.

Symmetric structures in the universe of protein folds.

Guerler, Aysam; Wang, Connie; Knapp, Ernst-Walter.

J Chem Inf Model ; 49(9): 2147-51, 2009 Sep.

Article in English | MEDLINE | ID: mdl-19728738

ABSTRACT

Insights in structural biology can be gained by analyzing protein architectures and characterizing their structural similarities. Current computational approaches enable a comparison of a variety of structural and physicochemical properties in protein space. Here we describe the automated detection of rotational symmetries within a representative set of nearly 10,000 nonhomologous protein structures. To find structural symmetries in proteins, equivalent pairs of secondary structure elements (SSE), i.e., alpha-helices and beta-strands, are initially assigned. Thereby, we also allow SSE pairs to be assigned in reverse sequential order. The results highlight that the generation of symmetric, i.e., repetitive, protein structures is one of nature's major strategies to explore the universe of possible protein folds. This way structurally separated 'islands' of protein folds with a significant amount of symmetry were identified. The complete results of the present study are available at http://agknapp.chemie.fu-berlin.de/gplus, where symmetry analysis of new protein structures can also be performed.

Subject(s)

Protein Folding , Proteins/chemistry , Databases, Protein , Models, Molecular , Protein Conformation , Proteins/metabolism , Rotation

11.

Novel protein folds and their nonsequential structural analogs.

Guerler, Aysam; Knapp, Ernst-Walter.

Protein Sci ; 17(8): 1374-82, 2008 Aug.

Article in English | MEDLINE | ID: mdl-18583523

ABSTRACT

Newly determined protein structures are classified as belonging to a new fold if the structures are sufficiently dissimilar from all previously known protein structures. To analyze structural similarities of proteins, structure alignment tools are used. We demonstrate that the usage of nonsequential structure alignment tools, which neglect the polypeptide chain connectivity, can yield structure alignments with significant similarities between proteins of known three-dimensional structure and newly determined protein structures that possess a new fold. The recently introduced protein structure alignment tool, GANGSTA, is specialized to perform nonsequential alignments with proper assignment of the secondary structure types by focusing on helices and strands only. In the new version, GANGSTA+, the underlying algorithms were completely redesigned, yielding enhanced quality of structure alignments, offering alignment against a larger database of protein structures, and being more efficient. We applied DaliLite, TM-align, and GANGSTA+ to three protein crystal structures considered to be novel folds. Applying GANGSTA+ to these novel folds, we find proteins in the ASTRAL40 database that possess significant structural similarities, although the alignments are nonsequential and in some cases involve secondary structure elements aligned in reverse orientation. A web server is available at http://agknapp.chemie.fu-berlin.de/gplus for pairwise alignment, visualization, and database comparison.

Subject(s)

Algorithms , Sequence Analysis, Protein/methods , Computational Biology/methods , Databases, Protein , Models, Molecular , Protein Folding , Protein Structure, Secondary , Structural Homology, Protein

12.

Superimpose: a 3D structural superposition server.

Bauer, Raphael A; Bourne, Philip E; Formella, Arno; Frömmel, Cornelius; Gille, Christoph; Goede, Andrean; Guerler, Aysam; Hoppe, Andreas; Knapp, Ernst-Walter; Pöschel, Thorsten; Wittig, Burghardt; Ziegler, Valentin; Preissner, Robert.

Nucleic Acids Res ; 36(Web Server issue): W47-54, 2008 Jul 01.

Article in English | MEDLINE | ID: mdl-18492720

ABSTRACT

The Superimposé webserver performs structural similarity searches with a preference towards 3D structure-based methods. Similarities can be detected between small molecules (e.g. drugs), parts of large structures (e.g. binding sites of proteins) and entire proteins. For this purpose, a number of algorithms were implemented and various databases are provided. Superimposé assists the user regarding the selection of a suitable combination of algorithm and database. After the computation on our server infrastructure, a visual assessment of the results is provided. The structure-based in silico screening for similar drug-like compounds enables the detection of scaffold-hoppers with putatively similar effects. The possibility to find similar binding sites can be of special interest in the functional analysis of proteins. The search for structurally similar proteins allows the detection of similar folds with different backbone topology. The Superimposé server is available at: http://bioinformatics.charite.de/superimpose.

Subject(s)

Molecular Conformation , Software , Structural Homology, Protein , Algorithms , Binding Sites , Databases, Factual , Internet , Models, Molecular , Pharmaceutical Preparations/chemistry , Proteins/chemistry

13.

Sampling geometries of protein-protein complexes.

Guerler, Aysam; Lorenzen, Stephan; Krull, Florian; Knapp, Ernst-Walter.

Genome Inform ; 20: 260-9, 2008.

Article in English | MEDLINE | ID: mdl-19425140

ABSTRACT

Protein-protein docking is a major task in structural biology. In general, the geometries of protein pairs are sampled by generating docked conformations, analyzing them with scoring functions and selecting appropriate geometries for further refinement. Here, we present an algorithm in real space to sample geometries of protein pairs. To this end, we first determine uniformly distributed points on the surfaces of the two protein structures to be docked and additionally define a set of uniformly distributed rotations. Then, the sampling method generates structures of protein pairs as follows: (i) We rotate one protein of the protein pair according to a selected rotation and (ii) translate it along a line connecting two surface points belonging to different proteins such that these surface points coincide. The resulting protein pair geometries are then analyzed and selected using a scoring function that considers residues and atom pairs. We applied this approach to a set of 22 enzyme-inhibitor complexes and demonstrate that a discretisation of the rigid-body search in real space provides an efficient and robust sampling scheme. Our method generates decoy sets with a considerable fraction of near-native geometries for all considered enzyme-inhibitor complexes.
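
The two-step placement described in this abstract (rotate one protein, then translate it so that a chosen pair of surface points coincides) can be sketched as follows. This is an illustrative simplification, not the authors' code: the names `rotate_z` and `place` are hypothetical, the rotation is restricted to the z-axis for brevity where the paper uses a set of uniformly distributed rotations, and the coordinates are made-up toy values.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def place(coords_b, surface_point_a, surface_point_b, angle):
    """Rotate protein B, then translate it so its surface point lands on A's.

    coords_b        -- atom coordinates of the mobile protein B
    surface_point_a -- surface point on the fixed protein A
    surface_point_b -- surface point on B (given in B's original frame)
    angle           -- rotation angle (simplified: about the z-axis only)
    """
    rotated = [rotate_z(p, angle) for p in coords_b]
    anchor = rotate_z(surface_point_b, angle)
    # Translation vector that moves the rotated surface point of B onto A's.
    shift = tuple(a - b for a, b in zip(surface_point_a, anchor))
    return [tuple(c + d for c, d in zip(p, shift)) for p in rotated]


# Toy example: two "atoms" of B; B's surface point is its first atom.
coords_b = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
moved = place(coords_b, (5.0, 5.0, 5.0), (1.0, 0.0, 0.0), math.pi / 2)
print(moved)
```

Because the move is a rigid-body transformation, distances within protein B are unchanged; a full sampler would loop this step over all selected rotations and surface-point pairs and pass each resulting geometry to a scoring function.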

Subject(s)

Proteins/chemistry , Proteins/metabolism , Algorithms , Conserved Sequence , Databases, Protein , Homeostasis , Ligands , Models, Molecular , Protein Binding , Protein Conformation , Receptors, Cell Surface/chemistry , Receptors, Cell Surface/metabolism , Serine Endopeptidases/chemistry , Serine Endopeptidases/metabolism , Serine Proteinase Inhibitors/chemistry , Serine Proteinase Inhibitors/pharmacology , Surface Properties , Thermodynamics
