1.
PLoS Comput Biol ; 20(5): e1012061, 2024 May.
Article in English | MEDLINE | ID: mdl-38701099

ABSTRACT

Optimizing proteins for particular traits holds great promise for industrial and pharmaceutical purposes. Machine Learning is increasingly applied in this field to predict properties of proteins, thereby guiding the experimental optimization process. A natural question is: How much progress are we making with such predictions, and how important is the choice of regressor and representation? In this paper, we demonstrate that different assessment criteria for regressor performance can lead to dramatically different conclusions, depending on the choice of metric, and how one defines generalization. We highlight the fundamental issues of sample bias in typical regression scenarios and how this can lead to misleading conclusions about regressor performance. Finally, we make the case for the importance of calibrated uncertainty in this domain.
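
As a minimal illustration of what "calibrated uncertainty" can mean for a regressor, the sketch below checks how often the true values fall inside the predicted intervals. The helper name `interval_coverage`, the Gaussian-style interval (mean ± z·std), and the toy numbers are all assumptions made for illustration; none of them come from the paper.

```python
def interval_coverage(y_true, y_mean, y_std, z=1.96):
    """Fraction of targets inside mean +/- z*std (hypothetical helper)."""
    if not (len(y_true) == len(y_mean) == len(y_std)):
        raise ValueError("inputs must have equal length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    # Count the targets that land inside their own predicted interval.
    hits = sum(
        abs(y - m) <= z * s
        for y, m, s in zip(y_true, y_mean, y_std)
    )
    return hits / len(y_true)


# Toy data (invented): two predictions are close, two are confidently wrong.
y_true = [1.0, 2.0, 3.0, 4.0]
y_mean = [1.1, 2.5, 2.9, 6.0]
y_std = [0.1, 0.2, 0.1, 0.5]
coverage = interval_coverage(y_true, y_mean, y_std)
print(coverage)
```

Under Gaussian errors, a well-calibrated model would be expected to give a coverage near 0.95 for z = 1.96; a much lower value, as in this toy case, would suggest overconfident uncertainty estimates.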


Subject(s)
Computational Biology , Machine Learning , Protein Engineering , Protein Engineering/methods , Regression Analysis , Computational Biology/methods , Proteins/chemistry , Algorithms
2.
Res Sq ; 2024 Feb 02.
Article in English | MEDLINE | ID: mdl-38352328

ABSTRACT

Sub-cellular diffusion in living systems reflects cellular processes and interactions. Recent advances in optical microscopy allow the tracking of this nanoscale diffusion of individual objects with an unprecedented level of precision. However, the agnostic and automated extraction of functional information from the diffusion of molecules and organelles within the sub-cellular environment is labor-intensive and poses a significant challenge. Here we introduce DeepSPT, a deep learning framework to interpret the diffusional 2D or 3D temporal behavior of objects in a rapid and efficient manner, agnostically. Demonstrating its versatility, we have applied DeepSPT to automated mapping of the early events of viral infections, identifying distinct types of endosomal organelles, and clathrin-coated pits and vesicles with up to 95% accuracy and within seconds instead of weeks. The fact that DeepSPT effectively extracts biological information from diffusion alone illustrates that besides structure, motion encodes function at the molecular and subcellular level.

3.
bioRxiv ; 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-38014323

ABSTRACT

Sub-cellular diffusion in living systems reflects cellular processes and interactions. Recent advances in optical microscopy allow the tracking of this nanoscale diffusion of individual objects with an unprecedented level of precision. However, the agnostic and automated extraction of functional information from the diffusion of molecules and organelles within the sub-cellular environment is labor-intensive and poses a significant challenge. Here we introduce DeepSPT, a deep learning framework to interpret the diffusional 2D or 3D temporal behavior of objects in a rapid and efficient manner, agnostically. Demonstrating its versatility, we have applied DeepSPT to automated mapping of the early events of viral infections, identifying distinct types of endosomal organelles, and clathrin-coated pits and vesicles with up to 95% accuracy and within seconds instead of weeks. The fact that DeepSPT effectively extracts biological information from diffusion alone indicates that besides structure, motion encodes function at the molecular and subcellular level.

4.
Protein Sci ; 32(9): e4733, 2023 09.
Article in English | MEDLINE | ID: mdl-37463013

ABSTRACT

Intrinsically disordered proteins (IDPs) are often multifunctional and frequently posttranslationally modified. Deleted in split hand/split foot 1 (Dss1-Sem1 in budding yeast) is a highly multifunctional IDP associated with a range of protein complexes. However, it remains unknown if the different functions relate to different modified states. In this work, we show that Schizosaccharomyces pombe Dss1 is a substrate for casein kinase 2 in vitro, and we identify three phosphorylated threonines in its linker region separating two known disordered ubiquitin-binding motifs. Phosphorylations of the threonines had no effect on ubiquitin-binding but caused a slight destabilization of the C-terminal α-helix and mediated a direct interaction with the forkhead-associated (FHA) domain of the RING-FHA E3-ubiquitin ligase defective in mitosis 1 (Dma1). The phosphorylation sites are not conserved and are absent in human Dss1. Sequence analyses revealed that the Txx(E/D) motif, which is important for phosphorylation and Dma1 binding, is not linked to certain branches of the evolutionary tree. Instead, we find that the motif appears randomly, supporting the mechanism of ex nihilo evolution of novel motifs. In support of this, other threonine-based motifs, although frequent, are nonconserved in the linker, pointing to additional functions connected to this region. We suggest that Dss1 acts as an adaptor protein that docks to Dma1 via the phosphorylated FHA-binding motifs, while the C-terminal α-helix is free to bind mitotic septins, thereby stabilizing the complex. The presence of Txx(D/E) motifs in the disordered regions of certain septin subunits may be of further relevance to the formation and stabilization of these complexes.
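
A sequence scan for the Txx(E/D) motif described above could be sketched with a regular expression as follows. The function name `find_txx_ed` and the toy sequence are illustrative assumptions; the example sequence is invented and is not Dss1.

```python
import re

# Threonine, any two residues, then Glu or Asp.
# The lookahead keeps overlapping matches, which a plain pattern would skip.
TXX_ED = re.compile(r"(?=(T..[ED]))")


def find_txx_ed(sequence):
    """Return (1-based position of the threonine, matched 4-mer) pairs."""
    return [(m.start() + 1, m.group(1)) for m in TXX_ED.finditer(sequence.upper())]


# Invented toy sequence, used only to exercise the scan.
hits = find_txx_ed("MSTAAEKTTGDPL")
print(hits)  # [(3, 'TAAE'), (8, 'TTGD')]
```

A scan like this only reports sequence matches; whether a given site is actually phosphorylated or binds an FHA domain has to be established experimentally.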


Subject(s)
Cell Cycle Proteins , Schizosaccharomyces pombe Proteins , Schizosaccharomyces , Ubiquitin-Protein Ligases , Humans , Cell Cycle Proteins/genetics , Cell Cycle Proteins/metabolism , Phosphorylation , Protein Binding , Schizosaccharomyces/genetics , Schizosaccharomyces/metabolism , Schizosaccharomyces pombe Proteins/genetics , Schizosaccharomyces pombe Proteins/metabolism , Ubiquitin-Protein Ligases/genetics , Ubiquitin-Protein Ligases/metabolism
5.
Cell Mol Life Sci ; 80(6): 143, 2023 May 09.
Article in English | MEDLINE | ID: mdl-37160462

ABSTRACT

In terms of its relative frequency, lysine is a common amino acid in the human proteome. However, by bioinformatics we find hundreds of proteins that contain long and evolutionarily conserved stretches completely devoid of lysine residues. These so-called lysine deserts show a high prevalence in intrinsically disordered proteins with known or predicted functions within the ubiquitin-proteasome system (UPS), including many E3 ubiquitin-protein ligases and UBL domain proteasome substrate shuttles, such as BAG6, RAD23A, UBQLN1 and UBQLN2. We show that introduction of lysine residues into the deserts leads to a striking increase in ubiquitylation of some of these proteins. In the case of BAG6, we show that ubiquitylation is catalyzed by the E3 RNF126, while RAD23A is ubiquitylated by E6AP. Despite the elevated ubiquitylation, mutant RAD23A appears stable, but displays a partial loss of function phenotype in fission yeast. In the case of UBQLN1 and BAG6, introducing lysine leads to a reduced abundance due to proteasomal degradation of the proteins. For UBQLN1 we show that arginine residues within the lysine-depleted region are critical for its ability to form cytosolic speckles/inclusions. We propose that selective pressure to avoid lysine residues may be a common evolutionary mechanism to prevent unwarranted ubiquitylation and/or perhaps other lysine post-translational modifications. This may be particularly relevant for UPS components as they closely and frequently encounter the ubiquitylation machinery and are thus more susceptible to nonspecific ubiquitylation.
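
One simple way to look for a candidate lysine desert is to find the longest lysine-free stretch in a sequence, sketched below. The function name and the toy sequence are assumptions for illustration; the abstract does not state the length or conservation thresholds the authors used, so none are applied here.

```python
def longest_lysine_free_stretch(sequence):
    """Return (start, length) of the longest run without 'K'.

    `start` is 1-based; (0, 0) means the sequence has no lysine-free residue.
    """
    best_start, best_len = 0, 0
    run_start = 0
    seq = sequence.upper()
    # A sentinel 'K' appended at the end closes the final run.
    for i, residue in enumerate(seq + "K"):
        if residue == "K":
            run_len = i - run_start
            if run_len > best_len:
                best_start, best_len = run_start + 1, run_len
            run_start = i + 1
    return best_start, best_len


# Invented toy sequence: the longest lysine-free run is "AAAA" at position 3.
print(longest_lysine_free_stretch("MKAAAAKGG"))  # (3, 4)
```

In a proteome-wide screen, the run length would typically be compared against sequence length and against orthologs to judge conservation; that step is left out of this sketch.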


Subject(s)
Proteasome Endopeptidase Complex , Schizosaccharomyces , Humans , Ubiquitin , Lysine , Cytoplasm , Ubiquitination , Schizosaccharomyces/genetics , Molecular Chaperones , Autophagy-Related Proteins , Adaptor Proteins, Signal Transducing , Ubiquitin-Protein Ligases
6.
Elife ; 12, 2023 05 15.
Article in English | MEDLINE | ID: mdl-37184062

ABSTRACT

Predicting the thermodynamic stability of proteins is a common and widely used step in protein engineering, and when elucidating the molecular mechanisms behind evolution and disease. Here, we present RaSP, a method for making rapid and accurate predictions of changes in protein stability by leveraging deep learning representations. RaSP performs on par with biophysics-based methods and enables saturation mutagenesis stability predictions in less than a second per residue. We use RaSP to calculate ~230 million stability changes for nearly all single amino acid changes in the human proteome, and examine variants observed in the human population. We find that variants that are common in the population are substantially depleted for severe destabilization, and that there are substantial differences between benign and pathogenic variants, highlighting the role of protein stability in genetic diseases. RaSP is freely available, including via a Web interface, and enables large-scale analyses of stability in experimental and predicted protein structures.


Subject(s)
Deep Learning , Humans , Proteins/metabolism , Mutagenesis , Amino Acids/genetics , Protein Stability , Computational Biology/methods
7.
Cell Mol Life Sci ; 79(9): 484, 2022 Aug 16.
Article in English | MEDLINE | ID: mdl-35974206

ABSTRACT

Ubiquitin is a small, globular protein that is conjugated to other proteins as a posttranslational event. A palette of small, folded domains recognizes and binds ubiquitin to translate and effectuate this posttranslational signal. Recent computational studies have suggested that protein regions can recognize ubiquitin via a process of folding upon binding. Using peptide binding arrays, bioinformatics, and NMR spectroscopy, we have uncovered a disordered ubiquitin-binding motif that likely remains disordered when bound and thus expands the palette of ubiquitin-binding proteins. We term this motif Disordered Ubiquitin-Binding Motif (DisUBM) and find it to be present in many proteins with known or predicted functions in degradation and transcription. We decompose the determinants of the motif showing it to rely on features of aromatic and negatively charged residues, and less so on distinct sequence positions in line with its disordered nature. We show that the affinity of the motif is low and moldable by the surrounding disordered chain, allowing for an enhanced interaction surface with ubiquitin, whereby the affinity increases approximately tenfold. Further affinity optimization using peptide arrays pushed the affinity into the low micromolar range, but compromised context dependence. Finally, we find that DisUBMs can emerge from unbiased screening of randomized peptide libraries, featuring in de novo cyclic peptides selected to bind ubiquitin chains. We suggest that naturally occurring DisUBMs can recognize ubiquitin as a posttranslational signal to act as affinity enhancers in IDPs that bind to folded and ubiquitylated binding partners.


Subject(s)
Intrinsically Disordered Proteins , Proteins , Amino Acid Sequence , Intrinsically Disordered Proteins/chemistry , Peptides/metabolism , Protein Binding , Proteins/metabolism , Ubiquitin/metabolism
8.
Nat Commun ; 13(1): 1914, 2022 04 08.
Article in English | MEDLINE | ID: mdl-35395843

ABSTRACT

How we choose to represent our data has a fundamental impact on our ability to subsequently extract information from them. Machine learning promises to automatically determine efficient representations from large unstructured datasets, such as those arising in biology. However, empirical evidence suggests that seemingly minor changes to these machine learning models yield drastically different data representations that result in different biological interpretations of data. This begs the question of what even constitutes the most meaningful representation. Here, we approach this question for representations of protein sequences, which have received considerable attention in the recent literature. We explore two key contexts in which representations naturally arise: transfer learning and interpretable learning. In the first context, we demonstrate that several contemporary practices yield suboptimal performance, and in the latter we demonstrate that taking representation geometry into account significantly improves interpretability and lets the models reveal biological information that is otherwise obscured.


Subject(s)
Machine Learning , Amino Acid Sequence
9.
Proc Natl Acad Sci U S A ; 118(31), 2021 08 03.
Article in English | MEDLINE | ID: mdl-34321355

ABSTRACT

Single-particle tracking (SPT) is a key tool for quantitative analysis of dynamic biological processes and has provided unprecedented insights into a wide range of systems such as receptor localization, enzyme propulsion, bacteria motility, and drug nanocarrier delivery. The inherently complex diffusion in such biological systems can vary drastically both in time and across systems, consequently imposing considerable analytical challenges, and currently requires an a priori knowledge of the system. Here we introduce a method for SPT data analysis, processing, and classification, which we term "diffusional fingerprinting." This method allows for dissecting the features that underlie diffusional behavior and establishing molecular identity, regardless of the underlying diffusion type. The method operates by isolating 17 descriptive features for each observed motion trajectory and generating a diffusional map of all features for each type of particle. Precise classification of the diffusing particle identity is then obtained by training a simple logistic regression model. A linear discriminant analysis generates a feature ranking that outputs the main differences among diffusional features, providing key mechanistic insights. Fingerprinting operates by both training on and predicting experimental data, without the need for pretraining on simulated data. We found this approach to work across a wide range of simulated and experimentally diverse systems, such as tracked lipases on fat substrates, transcription factors diffusing in cells, and nanoparticles diffusing in mucus. This flexibility ultimately supports diffusional fingerprinting's utility as a universal paradigm for SPT diffusional analysis and prediction.
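
To give a feel for what a per-trajectory descriptive feature can look like, the sketch below computes a time-averaged mean squared displacement (MSD) and the log-log slope of MSD against lag for a 2D track. This is an assumption-laden illustration: the abstract does not list the 17 features, so whether these two quantities are among them is not claimed here, and the function names and toy track are invented.

```python
import math


def msd(track, lag):
    """Time-averaged mean squared displacement of a 2D track at one lag."""
    n = len(track) - lag
    if lag < 1 or n < 1:
        raise ValueError("lag must be >= 1 and shorter than the track")
    total = 0.0
    # Pair every point with the point `lag` steps later.
    for (x0, y0), (x1, y1) in zip(track, track[lag:]):
        total += (x1 - x0) ** 2 + (y1 - y0) ** 2
    return total / n


def anomalous_exponent(track, max_lag=4):
    """Least-squares slope of log(MSD) versus log(lag).

    Raises ValueError from math.log if the track never moves (MSD of zero).
    """
    xs = [math.log(lag) for lag in range(1, max_lag + 1)]
    ys = [math.log(msd(track, lag)) for lag in range(1, max_lag + 1)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Invented toy track: constant-velocity motion along x, so MSD grows as lag squared.
straight = [(float(i), 0.0) for i in range(10)]
alpha = anomalous_exponent(straight)
print(round(alpha, 6))
```

Features of this kind, computed per trajectory, could then be stacked into a feature vector and passed to a classifier such as the logistic regression model mentioned in the abstract.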


Subject(s)
Machine Learning , Single Molecule Imaging/methods , Computer Simulation , Diffusion , Image Interpretation, Computer-Assisted , Movement , Particle Size
10.
Sci Rep ; 11(1): 3246, 2021 02 05.
Article in English | MEDLINE | ID: mdl-33547335

ABSTRACT

Patients with severe COVID-19 have overwhelmed healthcare systems worldwide. We hypothesized that machine learning (ML) models could be used to predict risks at different stages of management and thereby provide insights into drivers and prognostic markers of disease progression and death. From a cohort of approx. 2.6 million citizens in Denmark, SARS-CoV-2 PCR tests were performed on subjects suspected of having COVID-19; 3944 cases had at least one positive test and were subjected to further analysis. SARS-CoV-2 positive cases from the United Kingdom Biobank were used for external validation. The ML models predicted the risk of death with a Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) of 0.906 at diagnosis, 0.818 at hospital admission and 0.721 at Intensive Care Unit (ICU) admission. Similar metrics were achieved for predicted risks of hospital and ICU admission and use of mechanical ventilation. Common risk factors included age, body mass index and hypertension, although the top risk features shifted towards markers of shock and organ dysfunction in ICU patients. The external validation indicated fair predictive performance for mortality prediction, but suboptimal performance for predicting ICU admission. ML may be used to identify drivers of progression to more severe disease and for prognostication in patients with COVID-19. We provide access to an online risk calculator based on these findings.
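
For readers unfamiliar with the ROC-AUC metric reported above, it can be read as the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; the function name and the toy labels and scores are illustrative assumptions and have no connection to the study data.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Tied scores count as half a win. Labels are 1 (event) or 0 (no event).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented toy data: three of the four positive/negative pairs are ranked correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

The pairwise form is quadratic in the number of cases and is meant only to make the definition concrete; for cohorts of realistic size a rank-based implementation would be the usual choice.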


Subject(s)
COVID-19/diagnosis , COVID-19/mortality , Computer Simulation , Machine Learning , Age Factors , Aged , Aged, 80 and over , Body Mass Index , COVID-19/complications , COVID-19/physiopathology , Comorbidity , Critical Care , Female , Hospitalization , Humans , Hypertension/complications , Intensive Care Units , Male , Middle Aged , Prognosis , Prospective Studies , ROC Curve , Respiration, Artificial , Risk Factors , Sex Factors
11.
Cell Commun Signal ; 18(1): 132, 2020 08 24.
Article in English | MEDLINE | ID: mdl-32831102

ABSTRACT

BACKGROUND: Class 1 cytokine receptors (C1CRs) are single-pass transmembrane proteins responsible for transmitting signals between the outside and the inside of cells. Remarkably, they orchestrate key biological processes such as proliferation, differentiation, immunity and growth through long disordered intracellular domains (ICDs), but without having intrinsic kinase activity. Despite these key roles, their characteristics remain rudimentarily understood. METHODS: The current paper asks the question of why disorder has evolved to govern signaling of C1CRs by reviewing the literature in combination with new sequence and biophysical analyses of chain properties across the family. RESULTS: We uncover that the C1CR-ICDs are fully disordered and brimming with SLiMs. Many of these short linear motifs (SLiMs) are overlapping, jointly signifying a complex regulation of interactions, including network rewiring by isoforms. The C1CR-ICDs have unique properties that distinguish them from most IDPs and we forward the perception that the C1CR-ICDs are far from simple strings with constitutively bound kinases. Rather, they carry both organizational and operational features left uncovered within their disorder, including mechanisms and complexities of regulatory functions. CONCLUSIONS: Critically, the understanding of the fascinating ability of these long, completely disordered chains to orchestrate complex cellular signaling pathways is still in its infancy, and we urge a perceptional shift away from the current simplistic view towards uncovering their full functionalities and potential. Video abstract.


Subject(s)
Intrinsically Disordered Proteins/chemistry , Intrinsically Disordered Proteins/metabolism , Receptors, Cytokine/chemistry , Receptors, Cytokine/metabolism , Signal Transduction , Amino Acid Motifs , Amino Acid Sequence , Humans , Protein Conformation , Protein Isoforms/chemistry , Protein Isoforms/metabolism
12.
Protein Sci ; 29(1): 169-183, 2020 01.
Article in English | MEDLINE | ID: mdl-31642121

ABSTRACT

Protein domains constitute regions of distinct structural properties and molecular functions that are retained when removed from the rest of the protein. However, due to the lack of tertiary structure, the identification of domains has been largely neglected for long (>50 residues) intrinsically disordered regions. Here we present a sequence-based approach to assess and visualize domain organization in long intrinsically disordered regions based on compositional sequence biases. An online tool to find putative intrinsically disordered domains (IDDomainSpotter) in any protein sequence or sequence alignment using any particular sequence trait is available at http://www.bio.ku.dk/sbinlab/IDDomainSpotter. Using this tool, we have identified a putative domain enriched in hydrophilic and disorder-promoting residues (Pro, Ser, and Thr) and depleted in positive charges (Arg and Lys) bordering the folded DNA-binding domains of several transcription factors (p53, GCR, NAC46, MYB28, and MYB29). This domain, from two different MYB transcription factors, was characterized biophysically to determine its properties. Our analyses show the domain to be extended, dynamic and highly disordered. It connects the DNA-binding domain to other disordered domains and is present and conserved in several transcription factors from different families and domains of life. This example illustrates the potential of IDDomainSpotter to predict, from sequence alone, putative domains of functional interest in otherwise uncharacterized disordered proteins.
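
The idea of profiling a sequence by compositional bias can be sketched with a sliding window that scores enrichment in one residue set against depletion in another. This is not the IDDomainSpotter algorithm, whose details are not given in the abstract; the function name `window_bias`, the scoring formula, the window size, and the toy sequence are all assumptions for illustration.

```python
def window_bias(sequence, enriched, depleted, window=5):
    """Per-window score: (fraction in `enriched`) - (fraction in `depleted`)."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        plus = sum(res in enriched for res in chunk)
        minus = sum(res in depleted for res in chunk)
        scores.append((plus - minus) / window)
    return scores


# Invented toy sequence: a Pro/Ser/Thr-rich start followed by a Lys/Arg-rich end.
profile = window_bias("PSTPSKRKRK", enriched="PST", depleted="RK", window=5)
print(profile)  # [1.0, 0.6, 0.2, -0.2, -0.6, -1.0]
```

Plotting such a profile along the sequence makes it possible to spot regions where the score stays consistently high or low, which is the kind of signal a putative domain boundary would be read from.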


Subject(s)
Arabidopsis Proteins/chemistry , Arabidopsis Proteins/genetics , Arabidopsis/chemistry , Arabidopsis/genetics , Transcription Factors/chemistry , Transcription Factors/genetics , Amino Acid Sequence , Arabidopsis/metabolism , Arabidopsis Proteins/metabolism , Bias , Binding Sites , Histone Acetyltransferases , Humans , Models, Molecular , Protein Binding , Protein Domains , Protein Unfolding , Scattering, Small Angle , Transcription Factors/metabolism , X-Ray Diffraction
13.
J Biomol NMR ; 73(12): 713-725, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31598803

ABSTRACT

Phosphorylation is one of the main regulators of cellular signaling typically occurring in flexible parts of folded proteins and in intrinsically disordered regions. It can have distinct effects on the chemical environment as well as on the structural properties near the modification site. Secondary chemical shift analysis is the main NMR method for detection of transiently formed secondary structure in intrinsically disordered proteins (IDPs) and the reliability of the analysis depends on an appropriate choice of random coil model. Random coil chemical shifts and sequence correction factors were previously determined for an Ac-QQXQQ-NH2-peptide series with X being any of the 20 common amino acids. However, a matching dataset on the phosphorylated states has so far only been incompletely determined or determined only at a single pH value. Here we extend the database by the addition of the random coil chemical shifts of the phosphorylated states of serine, threonine and tyrosine measured over a range of pH values covering the pKas of the phosphates and at several temperatures (www.bio.ku.dk/sbinlab/randomcoil). The combined results allow for accurate random coil chemical shift determination of phosphorylated regions at any pH and temperature, minimizing systematic biases of the secondary chemical shifts. Comparison of chemical shifts using random coil sets with and without inclusion of the phosphoryl group revealed under/over estimations of helicity of up to 33%. The expanded set of random coil values will improve the reliability in detection and quantification of transient secondary structure in phosphorylation-modified IDPs.


Subject(s)
Amino Acids/metabolism , Intrinsically Disordered Proteins/chemistry , Nuclear Magnetic Resonance, Biomolecular/methods , Hydrogen-Ion Concentration , Phosphorylation , Protein Structure, Secondary , Serine/metabolism , Temperature , Threonine/metabolism , Tyrosine/metabolism
14.
Cell Mol Life Sci ; 76(24): 4923-4943, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31134302

ABSTRACT

Proliferating cell nuclear antigen (PCNA) is a cellular hub in DNA metabolism and a potential drug target. Its binding partners carry a short linear motif (SLiM) known as the PCNA-interacting protein-box (PIP-box), but sequence-divergent motifs have been reported to bind to the same binding pocket. To investigate how PCNA accommodates motif diversity, we assembled a set of 77 experimentally confirmed PCNA-binding proteins and analyzed features underlying their binding affinity. Combining NMR spectroscopy, affinity measurements and computational analyses, we corroborate that most PCNA-binding motifs reside in intrinsically disordered regions, that structure preformation is unrelated to affinity, and that the sequence-patterns that encode binding affinity extend substantially beyond the boundaries of the PIP-box. Our systematic multidisciplinary approach expands current views on PCNA interactions and reveals that the PIP-box affinity can be modulated over four orders of magnitude by positive charges in the flanking regions. Including the flanking regions as part of the motif is expected to have broad implications, particularly for interpretation of disease-causing mutations and drug-design, targeting DNA-replication and -repair.


Subject(s)
Amino Acid Motifs/genetics , DNA-Binding Proteins/chemistry , DNA/chemistry , Proliferating Cell Nuclear Antigen/chemistry , DNA/genetics , DNA Repair/genetics , DNA Replication/genetics , DNA-Binding Proteins/genetics , Humans , Intrinsically Disordered Proteins/chemistry , Intrinsically Disordered Proteins/genetics , Magnetic Resonance Spectroscopy , Proliferating Cell Nuclear Antigen/genetics , Protein Conformation
15.
RNA ; 25(2): 219-231, 2019 02.
Article in English | MEDLINE | ID: mdl-30420522

ABSTRACT

RNA molecules are highly dynamic systems characterized by a complex interplay between sequence, structure, dynamics, and function. Molecular simulations can potentially provide powerful insights into the nature of these relationships. The analysis of structures and molecular trajectories of nucleic acids can be nontrivial because it requires processing very high-dimensional data that are not easy to visualize and interpret. Here we introduce Barnaba, a Python library aimed at facilitating the analysis of nucleic acid structures and molecular simulations. The software consists of a variety of analysis tools that allow the user to (i) calculate distances between three-dimensional structures using different metrics, (ii) back-calculate experimental data from three-dimensional structures, (iii) perform cluster analysis and dimensionality reductions, (iv) search three-dimensional motifs in PDB structures and trajectories, and (v) construct elastic network models for nucleic acids and nucleic acids-protein complexes. In addition, Barnaba makes it possible to calculate torsion angles, pucker conformations, and to detect base-pairing/base-stacking interactions. Barnaba produces graphics that conveniently visualize both extended secondary structure and dynamics for a set of molecular conformations. The software is available as a command-line tool as well as a library, and supports a variety of file formats such as PDB, dcd, and xtc files. Source code, documentation, and examples are freely available at https://github.com/srnas/barnaba under GNU GPLv3 license.


Subject(s)
Computational Biology/methods , Nucleic Acid Conformation , RNA/ultrastructure , Software , Base Pairing/genetics , Databases, Protein , Models, Molecular
16.
J Phys Chem B ; 122(49): 11174-11185, 2018 12 13.
Article in English | MEDLINE | ID: mdl-30141937

ABSTRACT

Energy landscape theory suggests that native interactions are a major determinant of the folding mechanism of a protein. Thus, structure-based (Go̅) models have, aided by coarse-graining techniques, shown great success in capturing the mechanisms of protein folding and conformational changes. In certain cases, however, non-native interactions and atomic details are also essential to describe the protein dynamics, prompting the development of a variety of structure-based models that include non-native interactions, and differentiate between different types of attractive potentials. Here, we describe an all-protein-atom hybrid model, termed ProfasiGo, that integrates an implicit solvent all-atom physics-based model (called Profasi) and a structure-based Go̅ potential and its implementation in two software packages (PHAISTOS and ProFASi) that are developed for Monte Carlo sampling of protein molecules. We apply the ProfasiGo model to study the folding free energy landscapes of four topologically similar proteins, one of which can be folded by the simplified potential Profasi and two that have been folded by explicit solvent, all-atom molecular dynamics simulations with the CHARMM22* force field. Our results reveal that the hybrid ProfasiGo model is able to capture many of the details present in the physics-based potentials while retaining the advantages of Go̅ models for sampling and guiding to the native state. We expect that the model will be widely applicable to the study of the folding of more complex proteins or to the study of conformational dynamics and integration with experimental data.


Subject(s)
Homeodomain Proteins/chemistry , Monte Carlo Method , Protein Folding , Algorithms , Molecular Dynamics Simulation , Protein Domains , Thermodynamics
17.
J Phys Chem B ; 122(3): 1195-1204, 2018 01 25.
Article in English | MEDLINE | ID: mdl-29260565

ABSTRACT

Hybrid simulation procedures which combine molecular dynamics with Monte Carlo are attracting increasing attention as tools for improving the sampling efficiency in molecular simulations. In particular, encouraging results have been reported for nonequilibrium candidate protocols, in which a Monte Carlo move is applied gradually, and interleaved with a process that equilibrates the remaining degrees of freedom. Although initial studies have uncovered a substantial potential of the method, its practical applicability for sampling structural transitions in macromolecules remains incompletely understood. Here, we address this issue by systematically investigating the efficiency of the nonequilibrium candidate Monte Carlo on the sampling of rotameric distributions of two peptide systems at atomistic resolution both in vacuum and explicit solvent. The studied systems allow us to directly probe the efficiency with which a single or a few slow degrees of freedom can be driven between well-separated free-energy minima and to explore the sensitivity of the method toward the involved free parameters. In line with results on other systems, our study suggests that order-of-magnitude gains can be obtained in certain scenarios but also identifies challenges that arise when applying the procedure in explicit solvent.

18.
Biophys J ; 110(11): 2342-2348, 2016 06 07.
Article in English | MEDLINE | ID: mdl-27276252

ABSTRACT

Bactofilins constitute a recently discovered class of bacterial proteins that form cytoskeletal filaments. They share a highly conserved domain (DUF583) of which the structure remains unknown, in part due to the large size and noncrystalline nature of the filaments. Here, we describe the atomic structure of a bactofilin domain from Caulobacter crescentus. To determine the structure, we developed an approach that combines a biophysical model for proteins with recently obtained solid-state NMR spectroscopy data and amino acid contacts predicted from a detailed analysis of the evolutionary history of bactofilins. Our structure reveals a triangular β-helical (solenoid) conformation with conserved residues forming the tightly packed core and polar residues lining the surface. The repetitive structure explains the presence of internal repeats as well as strongly conserved positions, and is reminiscent of other fibrillar proteins. Our work provides a structural basis for future studies of bactofilin biology and for designing molecules that target them, as well as a starting point for determining the organization of the entire bactofilin filament. Finally, our approach presents new avenues for determining structures that are difficult to obtain by traditional means.


Subject(s)
Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Cytoskeleton/chemistry , Cytoskeleton/genetics , Amino Acid Sequence , Caulobacter crescentus , Computer Simulation , Models, Molecular , Monte Carlo Method , Nuclear Magnetic Resonance, Biomolecular , Protein Structure, Secondary , Surface Properties
19.
Proc Natl Acad Sci U S A ; 113(12): 3227-32, 2016 Mar 22.
Article in English | MEDLINE | ID: mdl-26957604

ABSTRACT

Formation of correct disulfide bonds in the endoplasmic reticulum is a crucial step for folding proteins destined for secretion. Protein disulfide isomerases (PDIs) play a central role in this process. We report a previously unidentified, hypervariable family of PDIs that represents the most diverse gene family of oxidoreductases described in a single genus to date. These enzymes are highly expressed specifically in the venom glands of predatory cone snails, animals that synthesize a remarkably diverse set of cysteine-rich peptide toxins (conotoxins). Enzymes in this PDI family, termed conotoxin-specific PDIs, significantly and differentially accelerate the kinetics of disulfide-bond formation of several conotoxins. Our results are consistent with a unique biological scenario associated with protein folding: The diversification of a family of foldases can be correlated with the rapid evolution of an unprecedented diversity of disulfide-rich structural domains expressed by venomous marine snails in the superfamily Conoidea.


Subject(s)
Mollusk Venoms/chemistry , Peptides/chemistry , Protein Disulfide-Isomerases/genetics , Amino Acid Sequence , Animals , Conus Snail , Molecular Sequence Data , Protein Disulfide-Isomerases/chemistry , Protein Folding , Sequence Homology, Amino Acid
20.
PeerJ ; 4: e1725, 2016.
Article in English | MEDLINE | ID: mdl-26966660

ABSTRACT

The ubiquitin-proteasome system targets misfolded proteins for degradation. Since the accumulation of such proteins is potentially harmful for the cell, their prompt removal is important. E3 ubiquitin-protein ligases mediate substrate ubiquitination by bringing together the substrate with an E2 ubiquitin-conjugating enzyme, which transfers ubiquitin to the substrate. For misfolded proteins, substrate recognition is generally delegated to molecular chaperones that subsequently interact with specific E3 ligases. An important exception is San1, a yeast E3 ligase. San1 harbors extensive regions of intrinsic disorder, which provide both conformational flexibility and sites for direct recognition of misfolded targets of vastly different conformations. So far, no mammalian ortholog of San1 is known, nor is it clear whether other E3 ligases utilize disordered regions for substrate recognition. Here, we conduct a bioinformatics analysis to examine >600 human and S. cerevisiae E3 ligases to identify enzymes that are similar to San1 in terms of function and/or mechanism of substrate recognition. An initial sequence-based database search was found to detect candidates primarily based on the homology of their ordered regions, and did not capture the unique disorder patterns that encode the functional mechanism of San1. However, by searching specifically for key features of the San1 sequence, such as long regions of intrinsic disorder embedded with short stretches predicted to be suitable for substrate interaction, we identified several E3 ligases with these characteristics. Our initial analysis revealed that another remarkable trait of San1 is shared with several candidate E3 ligases: long stretches of complete lysine suppression, which in San1 limits auto-ubiquitination. We encode these characteristic features into a San1 similarity-score, and present a set of proteins that are plausible candidates as San1 counterparts in humans. 
In conclusion, our work indicates that San1 is not a unique case, and that several other yeast and human E3 ligases have sequence properties that may allow them to recognize substrates by a similar mechanism as San1.
