1.
ArXiv ; 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38947934

ABSTRACT

We introduce ProteinWorkshop, a comprehensive benchmark suite for representation learning on protein structures with Geometric Graph Neural Networks. We consider large-scale pre-training and downstream tasks on both experimental and predicted structures to enable the systematic evaluation of the quality of the learned structural representations and their usefulness in capturing functional relationships for downstream tasks. We find that: (1) large-scale pretraining on AlphaFold structures and auxiliary tasks consistently improve the performance of both rotation-invariant and equivariant GNNs, and (2) more expressive equivariant GNNs benefit from pretraining to a greater extent compared to invariant models. We aim to establish a common ground for the machine learning and computational biology communities to rigorously compare and advance protein structure representation learning. Our open-source codebase reduces the barrier to entry for working with large protein structure datasets by providing: (1) storage-efficient dataloaders for large-scale structural databases including AlphaFoldDB and ESM Atlas, as well as (2) utilities for constructing new tasks from the entire PDB. ProteinWorkshop is available at: github.com/a-r-j/ProteinWorkshop.

2.
ArXiv ; 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38947930

ABSTRACT

We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon SE(3) flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling. We formulate RNA structures as a set of rigid-body frames and associated loss functions which account for larger, more conformationally flexible RNA backbones (13 atoms per nucleotide) vs. proteins (4 atoms per residue). Toward tackling the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations. Additionally, we define a suite of evaluation metrics to measure whether the generated RNA structures are globally self-consistent (via inverse folding followed by forward folding) and locally recover RNA-specific structural descriptors. The most performant version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass our validity criteria as measured by a self-consistency TM-score >= 0.45, at which two RNAs have the same global fold. Open-source code: https://github.com/rish-16/rna-backbone-design.
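The validity criterion described in this abstract can be illustrated with a minimal sketch. It assumes a self-consistency TM-score has already been computed for each generated backbone; the function name and the example scores are hypothetical and are not taken from the RNA-FrameFlow codebase.

```python
from typing import Iterable


def validity_fraction(sc_tm_scores: Iterable[float], threshold: float = 0.45) -> float:
    """Return the fraction of samples whose self-consistency TM-score
    is at or above the threshold (0.45 in the abstract)."""
    scores = list(sc_tm_scores)
    if not scores:
        raise ValueError("at least one score is required")
    # A sample counts as valid when its score meets the threshold.
    passed = sum(1 for s in scores if s >= threshold)
    return passed / len(scores)


# Hypothetical scores for four generated backbones.
example_scores = [0.50, 0.45, 0.30, 0.20]
fraction = validity_fraction(example_scores)
```

With the hypothetical scores above, two of the four samples meet the threshold, so `fraction` is 0.5.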

3.
bioRxiv ; 2024 May 25.
Article in English | MEDLINE | ID: mdl-38826198

ABSTRACT

Computational RNA design tasks are often posed as inverse problems, where sequences are designed based on adopting a single desired secondary structure without considering 3D geometry and conformational diversity. We introduce gRNAde, a geometric RNA design pipeline operating on 3D RNA backbones to design sequences that explicitly account for structure and dynamics. Under the hood, gRNAde is a multi-state Graph Neural Network that generates candidate RNA sequences conditioned on one or more 3D backbone structures where the identities of the bases are unknown. On a single-state fixed backbone re-design benchmark of 14 RNA structures from the PDB identified by Das et al. [2010], gRNAde obtains higher native sequence recovery rates (56% on average) compared to Rosetta (45% on average), taking under a second to produce designs compared to the reported hours for Rosetta. We further demonstrate the utility of gRNAde on a new benchmark of multi-state design for structurally flexible RNAs, as well as zero-shot ranking of mutational fitness landscapes in a retrospective analysis of a recent RNA polymerase ribozyme structure.

4.
ArXiv ; 2024 May 25.
Article in English | MEDLINE | ID: mdl-38827456

ABSTRACT

Computational RNA design tasks are often posed as inverse problems, where sequences are designed based on adopting a single desired secondary structure without considering 3D geometry and conformational diversity. We introduce gRNAde, a geometric RNA design pipeline operating on 3D RNA backbones to design sequences that explicitly account for structure and dynamics. Under the hood, gRNAde is a multi-state Graph Neural Network that generates candidate RNA sequences conditioned on one or more 3D backbone structures where the identities of the bases are unknown. On a single-state fixed backbone re-design benchmark of 14 RNA structures from the PDB identified by Das et al. [2010], gRNAde obtains higher native sequence recovery rates (56% on average) compared to Rosetta (45% on average), taking under a second to produce designs compared to the reported hours for Rosetta. We further demonstrate the utility of gRNAde on a new benchmark of multi-state design for structurally flexible RNAs, as well as zero-shot ranking of mutational fitness landscapes in a retrospective analysis of a recent RNA polymerase ribozyme structure. Open source code: https://github.com/chaitjo/geometric-rna-design.
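Native sequence recovery, the metric reported in this abstract, is commonly computed as the fraction of positions at which the designed base matches the native base. The sketch below assumes that definition and equal-length sequences; the function name and the example sequences are hypothetical and are not taken from the gRNAde codebase.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Return the fraction of positions where the designed sequence
    matches the native sequence."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    # Count position-wise matches between the two sequences.
    matches = sum(1 for n, d in zip(native, designed) if n == d)
    return matches / len(native)


# Hypothetical six-nucleotide example: only the last base differs.
recovery = sequence_recovery("GGAUCC", "GGAUCA")
```

Averaging this value over a set of benchmark structures would give a mean recovery rate comparable in kind to the percentages quoted above.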

5.
Chem Sci ; 13(45): 13541-13551, 2022 Nov 23.
Article in English | MEDLINE | ID: mdl-36507171

ABSTRACT

Photoswitchable molecules display two or more isomeric forms that may be accessed using light. Separating the electronic absorption bands of these isomers is key to selectively addressing a specific isomer and achieving high photostationary states, whilst overall red-shifting of the absorption bands serves to limit material damage due to UV exposure and increases penetration depth in photopharmacological applications. Engineering these properties into a system through synthetic design, however, remains a challenge. Here, we present a data-driven discovery pipeline for molecular photoswitches underpinned by dataset curation and multitask learning with Gaussian processes. In the prediction of electronic transition wavelengths, we demonstrate that a multioutput Gaussian process (MOGP) trained using labels from four photoswitch transition wavelengths yields the strongest predictive performance relative to single-task models as well as operationally outperforming time-dependent density functional theory (TD-DFT) in terms of the wall-clock time for prediction. We validate our proposed approach experimentally by screening a library of commercially available photoswitchable molecules. Through this screen, we identified several motifs that displayed separated electronic absorption bands of their isomers, exhibited red-shifted absorptions, and are suited for information transfer and photopharmacological applications. Our curated dataset, code, as well as all models are made available at https://github.com/Ryan-Rhys/The-Photoswitch-Dataset.

6.
Comput Struct Biotechnol J ; 19: 3938-3953, 2021.
Article in English | MEDLINE | ID: mdl-34234921

ABSTRACT

Viruses often encode proteins that mimic host proteins in order to facilitate infection. Little work has been done to understand the potential mimicry of the SARS-CoV-2, SARS-CoV, and MERS-CoV spike proteins, particularly the receptor-binding motifs, which could be important in determining tropism and druggability of the virus. Peptide and epitope motifs have been detected on coronavirus spike proteins using sequence homology approaches; however, comparing the three-dimensional shape of the protein has been shown as more informative in predicting mimicry than sequence-based comparisons. Here, we use structural bioinformatics software to characterize potential mimicry of the three coronavirus spike protein receptor-binding motifs. We utilize sequence-independent alignment tools to compare structurally known protein models with the receptor-binding motifs and verify potential mimicked interactions with protein docking simulations. Both human and non-human proteins were returned for all three receptor-binding motifs. For example, all three were similar to several proteins containing EGF-like domains: some of which are endogenous to humans, such as thrombomodulin, and others exogenous, such as Plasmodium falciparum MSP-1. Similarity to human proteins may reveal which pathways the spike protein is co-opting, while analogous non-human proteins may indicate shared host interaction partners and overlapping antibody cross-reactivity. These findings can help guide experimental efforts to further understand potential interactions between human and coronavirus proteins.

7.
Methods Mol Biol ; 2361: 263-288, 2021.
Article in English | MEDLINE | ID: mdl-34236667

ABSTRACT

Protein-protein interactions (PPIs) are central to cellular functions. Experimental methods for predicting PPIs are well developed but are time and resource expensive and suffer from high false-positive error rates at scale. Computational prediction of PPIs is highly desirable for a mechanistic understanding of cellular processes and offers the potential to identify highly selective drug targets. In this chapter, details of developing a deep learning approach to predicting which residues in a protein are involved in forming a PPI (a task known as PPI site prediction) are outlined. The key decisions to be made in defining a supervised machine learning project in this domain are here highlighted. Alternative training regimes for deep learning models to address shortcomings in existing approaches and provide starting points for further research are discussed. This chapter is written to serve as a companion to developing deep learning approaches to protein-protein interaction site prediction, and an introduction to developing geometric deep learning projects operating on protein structure graphs.


Subject(s)
Deep Learning , Protein Interaction Mapping , Proteins , Supervised Machine Learning
8.
Brief Bioinform ; 22(6), 2021 11 05.
Article in English | MEDLINE | ID: mdl-34013350

ABSTRACT

Graph machine learning (GML) is receiving growing interest within the pharmaceutical and biotechnology industries for its ability to model biomolecular structures and the functional relationships between them, and to integrate multi-omic datasets, amongst other data types. Herein, we present a multidisciplinary academic-industrial review of the topic within the context of drug discovery and development. After introducing key terms and modelling approaches, we move chronologically through the drug development pipeline to identify and summarize work incorporating: target identification, design of small molecules and biologics, and drug repurposing. Whilst the field is still emerging, key milestones, including repurposed drugs entering in vivo studies, suggest GML will become a modelling framework of choice within biomedical machine learning.


Subject(s)
Computer Graphics , Drug Development/methods , Drug Discovery/methods , Machine Learning , Models, Molecular , Molecular Structure , Algorithms , Drug Repositioning , Neural Networks, Computer
9.
Brief Bioinform ; 22(2): 769-780, 2021 03 22.
Article in English | MEDLINE | ID: mdl-33416848

ABSTRACT

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a rapidly growing infectious disease, widely spread with high mortality rates. Since the release of the SARS-CoV-2 genome sequence in March 2020, there has been an international focus on developing target-based drug discovery, which also requires knowledge of the 3D structure of the proteome. Where there are no experimentally solved structures, our group has created 3D models with coverage of 97.5% and characterized them using state-of-the-art computational approaches. Models of protomers and oligomers, together with predictions of substrate and allosteric binding sites, protein-ligand docking, SARS-CoV-2 protein interactions with human proteins, impacts of mutations, and mapped solved experimental structures are freely available for download. These are implemented in SARS CoV-2 3D, a comprehensive and user-friendly database, available at https://sars3d.com/. This provides essential information for drug discovery, both to evaluate targets and design new potential therapeutics.


Subject(s)
Antiviral Agents/pharmacology , COVID-19/virology , Databases, Protein , Drug Delivery Systems , Proteome , SARS-CoV-2/drug effects , Humans , SARS-CoV-2/isolation & purification
10.
Curr Biol ; 30(16): 3183-3199.e6, 2020 08 17.
Article in English | MEDLINE | ID: mdl-32619485

ABSTRACT

Nervous systems contain sensory neurons, local neurons, projection neurons, and motor neurons. To understand how these building blocks form whole circuits, we must distil these broad classes into neuronal cell types and describe their network connectivity. Using an electron micrograph dataset for an entire Drosophila melanogaster brain, we reconstruct the first complete inventory of olfactory projections connecting the antennal lobe, the insect analog of the mammalian olfactory bulb, to higher-order brain regions in an adult animal brain. We then connect this inventory to extant data in the literature, providing synaptic-resolution "holotypes" both for heavily investigated and previously unknown cell types. Projection neurons are approximately twice as numerous as reported by light level studies; cell types are stereotyped, but not identical, in cell and synapse numbers between brain hemispheres. The lateral horn, the insect analog of the mammalian cortical amygdala, is the main target for this olfactory information and has been shown to guide innate behavior. Here, we find new connectivity motifs, including axo-axonic connectivity between projection neurons, feedback, and lateral inhibition of these axons by a large population of neurons, and the convergence of different inputs, including non-olfactory inputs and memory-related feedback onto third-order olfactory neurons. These features are less prominent in the mushroom body calyx, the insect analog of the mammalian piriform cortex and a center for associative memory. Our work provides a complete neuroanatomical platform for future studies of the adult Drosophila olfactory system.


Subject(s)
Connectome , Drosophila melanogaster/physiology , Interneurons/metabolism , Mushroom Bodies/metabolism , Neurons/metabolism , Olfactory Pathways , Synapses/physiology , Animals , Female , Interneurons/cytology , Mushroom Bodies/cytology , Neurons/cytology , Smell
11.
Elife ; 8, 2019 05 21.
Article in English | MEDLINE | ID: mdl-31112127

ABSTRACT

Most sensory systems are organized into parallel neuronal pathways that process distinct aspects of incoming stimuli. In the insect olfactory system, second order projection neurons target both the mushroom body, required for learning, and the lateral horn (LH), proposed to mediate innate olfactory behavior. Mushroom body neurons form a sparse olfactory population code, which is not stereotyped across animals. In contrast, odor coding in the LH remains poorly understood. We combine genetic driver lines, anatomical and functional criteria to show that the Drosophila LH has ~1400 neurons and >165 cell types. Genetically labeled LHNs have stereotyped odor responses across animals and on average respond to three times more odors than single projection neurons. LHNs are better odor categorizers than projection neurons, likely due to stereotyped pooling of related inputs. Our results reveal some of the principles by which a higher processing area can extract innate behavioral significance from sensory stimuli.


Subject(s)
Drosophila , Olfactory Cortex/anatomy & histology , Olfactory Cortex/physiology , Olfactory Perception , Animals
12.
PLoS Biol ; 15(10): e2003026, 2017 Oct.
Article in English | MEDLINE | ID: mdl-29049280

ABSTRACT

Here, we present the use of ethoscopes, which are machines for high-throughput analysis of behavior in Drosophila and other animals. Ethoscopes provide a software and hardware solution that is reproducible and easily scalable. They perform, in real-time, tracking and profiling of behavior by using a supervised machine learning algorithm, are able to deliver behaviorally triggered stimuli to flies in a feedback-loop mode, and are highly customizable and open source. Ethoscopes can be built easily by using 3D printing technology and rely on Raspberry Pi microcomputers and Arduino boards to provide affordable and flexible hardware. All software and construction specifications are available at http://lab.gilest.ro/ethoscope.


Subject(s)
Behavior, Animal/physiology , Drosophila melanogaster/physiology , Ethology/instrumentation , Algorithms , Animals , Ethology/methods , Machine Learning , Microcomputers , Printing, Three-Dimensional , Reproducibility of Results , Software