Results 1 - 20 of 28
1.
Neuron ; 112(7): 1117-1132.e9, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38266647

ABSTRACT

Mitochondria account for essential cellular pathways, from ATP production to nucleotide metabolism, and their deficits lead to neurological disorders and contribute to the onset of age-related diseases. Direct neuronal reprogramming aims at replacing neurons lost in such conditions, but very little is known about the impact of mitochondrial dysfunction on the direct reprogramming of human cells. Here, we explore the effects of mitochondrial dysfunction on the neuronal reprogramming of induced pluripotent stem cell (iPSC)-derived astrocytes carrying mutations in the NDUFS4 gene, important for Complex I and associated with Leigh syndrome. This led to the identification of the unfolded protein response as a major hurdle in the direct neuronal conversion of not only astrocytes and fibroblasts from patients but also control human astrocytes and fibroblasts. Its transient inhibition potently improves reprogramming by influencing the mitochondria-endoplasmic-reticulum-stress-mediated pathways. Taken together, disease modeling using patient cells unraveled novel general hurdles and ways to overcome these in human astrocyte-to-neuron reprogramming.


Subject(s)
Induced Pluripotent Stem Cells , Mitochondrial Diseases , Humans , Neurons/physiology , Mitochondria/metabolism , Induced Pluripotent Stem Cells/metabolism , Unfolded Protein Response , Astrocytes/metabolism , Mitochondrial Diseases/metabolism , Cellular Reprogramming , Electron Transport Complex I/genetics , Electron Transport Complex I/metabolism
2.
Life Sci Alliance ; 6(7)2023 07.
Article in English | MEDLINE | ID: mdl-37116939

ABSTRACT

H4 lysine 20 dimethylation (H4K20me2) is the most abundant histone modification in vertebrate chromatin. It arises from sequential methylation of unmodified histone H4 proteins by the mono-methylating enzyme PR-SET7/KMT5A, followed by conversion to the dimethylated state by SUV4-20H (KMT5B/C) enzymes. We have blocked the deposition of this mark by depleting Xenopus embryos of SUV4-20H1/H2 methyltransferases. In the larval epidermis, this results in a severe loss of cilia in multiciliated cells (MCC), a key component of mucociliary epithelia. MCC precursor cells are correctly specified, amplify centrioles, but ultimately fail in ciliogenesis because of the perturbation of cytoplasmic processes. Genome-wide transcriptome profiling reveals that SUV4-20H1/H2-depleted ectodermal explants preferentially down-regulate the expression of several hundred ciliogenic genes. Further analysis demonstrated that knockdown of SUV4-20H1 alone is sufficient to generate the MCC phenotype and that its catalytic activity is needed for axoneme formation. Overexpression of the H4K20me1-specific histone demethylase PHF8/KDM7B also rescues the ciliogenic defect in a significant manner. Taken together, this indicates that the conversion of H4K20me1 to H4K20me2 by SUV4-20H1 is critical for the formation of cilia tufts.


Subject(s)
Chromatin , Histones , Animals , Cell Differentiation/genetics , Histone Methyltransferases/genetics , Histone Methyltransferases/metabolism , Histones/metabolism , Xenopus laevis/genetics
3.
Proteomics ; 23(9): e2200179, 2023 05.
Article in English | MEDLINE | ID: mdl-36571325

ABSTRACT

Data-independent acquisition (DIA) of tandem mass spectrometry spectra has emerged as a promising technology to improve coverage and quantification of proteins in complex mixtures. The success of DIA experiments is dependent on the quality of spectral libraries used for database searching. Frequently, these libraries need to be generated by labor- and time-intensive data-dependent acquisition (DDA) experiments. Recently, several algorithms have been published that allow the generation of theoretical libraries by an efficient prediction of retention time and intensity of the fragment ions. Sequential windowed acquisition of all theoretical fragment ion spectra mass spectrometry (SWATH-MS) is a DIA method that can be applied at an unprecedented speed, but the fragmentation spectra suffer from a lower quality than data acquired on Orbitrap instruments. To reliably generate theoretical libraries that can be used in SWATH experiments, we developed deep-learning for SWATH analysis (dpSWATH), to improve the sensitivity and specificity of data generated by Q-TOF mass spectrometers. The theoretical library built by dpSWATH allowed us to increase the identification rate of proteins compared to traditional or library-free methods. Based on our analysis, we conclude that dpSWATH is a better prediction framework for SWATH-MS measurements than other algorithms based on Orbitrap data.


Subject(s)
Deep Learning , Tandem Mass Spectrometry/methods , Proteins , Algorithms , Databases, Factual
4.
ACS Omega ; 7(50): 46131-46145, 2022 Dec 20.
Article in English | MEDLINE | ID: mdl-36570227

ABSTRACT

Uncharacterized proteins have been underutilized as targets for the development of novel therapeutics for difficult-to-treat bacterial infections. To facilitate the exploration of these proteins, 2819 predicted, uncharacterized proteins (19.1% of the total) from reference strains of multidrug-resistant Acinetobacter baumannii, Klebsiella pneumoniae, and Pseudomonas aeruginosa species were organized using an unsupervised k-means machine learning algorithm. Classification using normalized values for protein length, pI, hydrophobicity, degree of conservation, structural disorder, and %AT of the coding gene rendered six natural clusters. Cluster proteins showed different trends regarding operon membership, expression, presence of unknown function domains, and interactomic relevance. Clusters 2, 4, and 5 were enriched with highly disordered proteins, nonworkable membrane proteins, and likely spurious proteins, respectively. Clusters 1, 3, and 6 showed closer distances to known antigens, antibiotic targets, and virulence factors. Up to 21.8% of proteins in these clusters were structurally covered by modeling, which allowed assessment of druggability and discontinuous B-cell epitopes. Five proteins (4 in Cluster 1) were potential druggable targets for antibiotherapy. Eighteen proteins (11 in Cluster 6) were strong B-cell and T-cell immunogen candidates for vaccine development. In conclusion, we provide a feature-based schema to fractionate the functional dark proteome of critical pathogens for fundamental and biomedical purposes.
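
Editorial illustration (not part of the record): the abstract above describes k-means clustering on normalized protein features. The following is a minimal sketch of that general technique under stated assumptions, not the authors' pipeline. The function names, the toy feature rows (length, pI, hydrophobicity), the z-score normalization, and the choice of k = 2 are all hypothetical; the study used six features and found six clusters.

```python
import random
from statistics import mean, pstdev


def zscore_columns(rows):
    """Scale every feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stats = [(mean(c), pstdev(c) or 1.0) for c in columns]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for step in range(iterations):
        new_labels = [nearest(p, centroids) for p in points]
        if step > 0 and new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data: [length, pI, hydrophobicity] per protein.
toy_proteins = [
    [120, 5.1, -0.4], [130, 5.3, -0.5], [125, 5.0, -0.3],
    [900, 9.8, 0.6], [950, 9.5, 0.7], [920, 9.9, 0.5],
]
labels, _ = kmeans(zscore_columns(toy_proteins), k=2)
```

Normalizing first matters here because raw protein length would otherwise dominate the Euclidean distance over features such as pI.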

5.
Science ; 376(6599): eabf9088, 2022 06 17.
Article in English | MEDLINE | ID: mdl-35709258

ABSTRACT

The centrosome provides an intracellular anchor for the cytoskeleton, regulating cell division, cell migration, and cilia formation. We used spatial proteomics to elucidate protein interaction networks at the centrosome of human induced pluripotent stem cell-derived neural stem cells (NSCs) and neurons. Centrosome-associated proteins were largely cell type-specific, with protein hubs involved in RNA dynamics. Analysis of neurodevelopmental disease cohorts identified a significant overrepresentation of NSC centrosome proteins with variants in patients with periventricular heterotopia (PH). Expressing the PH-associated mutant pre-mRNA-processing factor 6 (PRPF6) reproduced the periventricular misplacement in the developing mouse brain, highlighting missplicing of transcripts of a microtubule-associated kinase with centrosomal location as essential for the phenotype. Collectively, cell type-specific centrosome interactomes explain how genetic variants in ubiquitous proteins may convey brain-specific phenotypes.


Subject(s)
Centrosome , Neural Stem Cells , Neurogenesis , Neurons , Periventricular Nodular Heterotopia , Protein Interaction Maps , Alternative Splicing , Animals , Brain/abnormalities , Centrosome/metabolism , Humans , Induced Pluripotent Stem Cells , Mice , Microtubules/metabolism , Neurons/metabolism , Periventricular Nodular Heterotopia/metabolism , Proteome/metabolism , RNA Splicing Factors/metabolism , Transcription Factors/metabolism
6.
EMBO J ; 40(21): e107532, 2021 11 02.
Article in English | MEDLINE | ID: mdl-34549820

ABSTRACT

Astrocytes regulate brain-wide functions and also show region-specific differences, but little is known about how general and region-specific functions are aligned at the single-cell level. To explore this, we isolated adult mouse diencephalic astrocytes by ACSA-2-mediated magnetic-activated cell sorting (MACS). Single-cell RNA-seq revealed 7 gene expression clusters of astrocytes, with 4 forming a supercluster. Within the supercluster, cells differed by gene expression related to ion homeostasis or metabolism, with the former sharing gene expression with other regions and the latter being restricted to specific regions. All clusters showed expression of proliferation-related genes, and proliferation of diencephalic astrocytes was confirmed by immunostaining. Clonal analysis demonstrated a low level of astrogenesis in the adult diencephalon, but not in cerebral cortex grey matter. This led to the identification of Smad4 as a key regulator of diencephalic astrocyte in vivo proliferation and in vitro neurosphere formation. Thus, astrocytes show diverse gene expression states related to distinct functions, with some subsets being more widespread while others are more regionally restricted. However, all share low-level proliferation, revealing the novel concept of adult astrogenesis in the diencephalon.


Subject(s)
Astrocytes/metabolism , Cell Lineage/genetics , Diencephalon/metabolism , Gene Expression Regulation, Developmental , Neurogenesis/genetics , Smad4 Protein/genetics , Animals , Astrocytes/classification , Astrocytes/cytology , Cell Cycle/genetics , Cell Differentiation , Cell Proliferation , Cerebral Cortex/cytology , Cerebral Cortex/growth & development , Cerebral Cortex/metabolism , Diencephalon/cytology , Diencephalon/growth & development , Gene Ontology , Gene Regulatory Networks , Gray Matter/cytology , Gray Matter/growth & development , Gray Matter/metabolism , Metabolic Networks and Pathways , Mice , Mice, Inbred C57BL , Mice, Transgenic , Molecular Sequence Annotation , Multigene Family , Signal Transduction , Smad4 Protein/metabolism
7.
J Proteome Res ; 20(7): 3749-3757, 2021 07 02.
Article in English | MEDLINE | ID: mdl-34137619

ABSTRACT

Trypsin is one of the most important and widely used proteolytic enzymes in mass spectrometry (MS)-based proteomic research. It exclusively cleaves peptide bonds at the C-terminus of lysine and arginine. However, the cleavage is also affected by several factors, including specific surrounding amino acids, resulting in frequent incomplete proteolysis and subsequent issues in peptide identification and quantification. Accurate annotation of missed cleavages is crucial to database searching in MS analysis. Here, we present deep-learning predicting missed cleavages (dpMC), a novel algorithm for the prediction of missed trypsin cleavage sites. This algorithm provides a very high accuracy for predicting missed cleavages, with areas under the curve (AUCs) of cross-validation and holdout testing above 0.99, along with a mean F1 score and Matthews correlation coefficient (MCC) of 0.9677 and 0.9349, respectively. We tested our algorithm on data sets from different species and different experimental conditions, and it outperforms other currently available prediction methods. In addition, the method also provides a better insight into the detailed rules of trypsin cleavages coupled with propensity and motif analysis. Moreover, our method can be integrated into database searching in the MS analysis to identify and quantify mass spectra effectively and efficiently.


Subject(s)
Deep Learning , Proteomics , Mass Spectrometry , Peptides , Trypsin
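
Editorial illustration (not part of the record): the dpMC abstract above relies on the rule that trypsin cleaves C-terminal to lysine (K) and arginine (R), and on the notion of a missed cleavage. The sketch below only enumerates peptides under that simple rule with a fixed allowance of missed cleavages; it is not dpMC, which predicts which sites are missed. The function name, the `max_missed` parameter, and the example sequence are hypothetical.

```python
def tryptic_peptides(sequence, max_missed=1):
    """Return (peptide, missed_cleavages) pairs for an in-silico digest.

    Assumes the simplified rule from the abstract: cleave after every
    K or R. Effects of neighboring residues are deliberately ignored.
    """
    # Step 1: split into fully cleaved fragments.
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder

    # Step 2: join up to max_missed + 1 adjacent fragments.
    peptides = []
    for i in range(len(fragments)):
        for missed in range(max_missed + 1):
            if i + missed < len(fragments):
                peptides.append(("".join(fragments[i:i + missed + 1]), missed))
    return peptides


# Hypothetical sequence with three cleavage sites (K, R, K).
example = tryptic_peptides("MAGKLLRSEK", max_missed=1)
```

A database search engine that allows missed cleavages has to consider every peptide in such a list, which is why predicting the sites that are actually missed can shrink the search space.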
8.
Nucleic Acids Res ; 47(17): 9069-9086, 2019 09 26.
Article in English | MEDLINE | ID: mdl-31350899

ABSTRACT

Pioneer transcription factors (PTF) can recognize their binding sites on nucleosomal DNA and trigger chromatin opening for recruitment of other non-pioneer transcription factors. However, critical properties of PTFs are still poorly understood, such as how these transcription factors selectively recognize cell type-specific binding sites and under which conditions they can initiate chromatin remodelling. Here we show that early endoderm binding sites of the paradigm PTF Foxa2 are epigenetically primed by low levels of active chromatin modifications in embryonic stem cells (ESC). Priming of these binding sites is supported by preferential recruitment of Foxa2 to endoderm binding sites compared to lineage-inappropriate binding sites, when ectopically expressed in ESCs. We further show that binding of Foxa2 is required for chromatin opening during endoderm differentiation. However, increased chromatin accessibility was only detected on binding sites which are synergistically bound with other endoderm transcription factors. Thus, our data suggest that binding site selection of PTFs is directed by the chromatin environment and that chromatin opening requires collaboration of PTFs with additional transcription factors.


Subject(s)
Chromatin/metabolism , Hepatocyte Nuclear Factor 3-beta/metabolism , Mouse Embryonic Stem Cells/metabolism , Animals , Binding Sites/genetics , Cell Differentiation/genetics , Chromatin Assembly and Disassembly/genetics , Endoderm/cytology , GATA4 Transcription Factor/genetics , GATA4 Transcription Factor/metabolism , Gene Expression Regulation, Developmental/genetics , Hepatocyte Nuclear Factor 3-beta/genetics , Histone Code , Histones/metabolism , Mice , Mice, Knockout , Models, Genetic , Mouse Embryonic Stem Cells/cytology , Signal Transduction
9.
Nucleic Acids Res ; 46(17): 8772-8787, 2018 09 28.
Article in English | MEDLINE | ID: mdl-30165493

ABSTRACT

With the availability of deep RNA sequencing, model organisms such as Xenopus offer an outstanding opportunity to investigate the genetic basis of vertebrate organ formation from its embryonic beginnings. Here we investigate dynamics of the RNA landscape during formation of the Xenopus tropicalis larval epidermis. Differentiation of non-neural ectoderm starts at gastrulation and takes about one day to produce a functional mucociliary epithelium, highly related to the one in human airways. To obtain RNA expression data, uncontaminated by non-epidermal tissues of the embryo, we use prospective ectodermal explants called Animal Caps (ACs), which differentiate autonomously into a ciliated epidermis. Their global transcriptome is investigated at three key timepoints, with a cumulative sequencing depth of ∼10^8 reads per developmental stage. This database is provided as an online web tool to the scientific community. In this paper, we report on global changes in gene expression, an unanticipated diversity of mRNA splicing isoforms, expression patterns of repetitive DNA elements, and the complexity of circular RNAs during this process. Computationally we derive transcription factor hubs from this data set, which may help in the future to define novel genetic drivers of epidermal differentiation in vertebrates.


Subject(s)
Amphibian Proteins/genetics , Epidermis/metabolism , Gene Expression Regulation, Developmental , RNA, Messenger/genetics , Transcriptome , Xenopus laevis/genetics , Alternative Splicing , Amphibian Proteins/metabolism , Animals , Cilia/genetics , Cilia/metabolism , Databases, Genetic , Ectoderm/growth & development , Ectoderm/metabolism , Embryo, Nonmammalian , Epidermis/growth & development , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing , Larva/genetics , Larva/growth & development , Larva/metabolism , Morphogenesis/genetics , RNA/genetics , RNA/metabolism , RNA, Circular , RNA, Messenger/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism , Xenopus laevis/growth & development , Xenopus laevis/metabolism
10.
RNA ; 24(9): 1195-1213, 2018 09.
Article in English | MEDLINE | ID: mdl-29914874

ABSTRACT

Long noncoding RNAs (lncRNAs), which are longer than 200 nucleotides but often unstable, contribute a substantial and diverse portion to pervasive noncoding transcriptomes. Most lncRNAs are poorly annotated and understood, although several play important roles in gene regulation and diseases. Here we systematically uncover and analyze lncRNAs in Schizosaccharomyces pombe. Based on RNA-seq data from twelve RNA-processing mutants and nine physiological conditions, we identify 5775 novel lncRNAs, nearly 4× the previously annotated lncRNAs. The expression of most lncRNAs becomes strongly induced under the genetic and physiological perturbations, most notably during late meiosis. Most lncRNAs are cryptic and suppressed by three RNA-processing pathways: the nuclear exosome, cytoplasmic exonuclease, and RNAi. Double-mutant analyses reveal substantial coordination and redundancy among these pathways. We classify lncRNAs by their dominant pathway into cryptic unstable transcripts (CUTs), Xrn1-sensitive unstable transcripts (XUTs), and Dicer-sensitive unstable transcripts (DUTs). XUTs and DUTs are enriched for antisense lncRNAs, while CUTs are often bidirectional and actively translated. The cytoplasmic exonuclease, along with RNAi, dampens the expression of thousands of lncRNAs and mRNAs that become induced during meiosis. Antisense lncRNA expression mostly negatively correlates with sense mRNA expression in the physiological, but not the genetic conditions. Intergenic and bidirectional lncRNAs emerge from nucleosome-depleted regions, upstream of positioned nucleosomes. Our results highlight both similarities and differences to lncRNA regulation in budding yeast. This broad survey of the lncRNA repertoire and characteristics in S. pombe, and the interwoven regulatory pathways that target lncRNAs, provides a rich framework for their further functional analyses.


Subject(s)
Exonucleases/metabolism , Exosomes/metabolism , RNA, Long Noncoding/genetics , Schizosaccharomyces/genetics , Sequence Analysis, RNA/methods , Cell Nucleus/metabolism , Cytoplasm/enzymology , Fungal Proteins/metabolism , Gene Expression Profiling/methods , Gene Expression Regulation, Fungal , Meiosis , Molecular Sequence Annotation , Mutation , RNA Interference , RNA Stability , RNA, Fungal/genetics , RNA, Long Noncoding/chemistry , Schizosaccharomyces/chemistry , Schizosaccharomyces/enzymology
11.
PLoS One ; 12(2): e0171798, 2017.
Article in English | MEDLINE | ID: mdl-28207793

ABSTRACT

Hybrid incompatibility between Drosophila melanogaster and D. simulans is caused by a lethal interaction of the proteins encoded by the Hmr and Lhr genes. In D. melanogaster the loss of HMR results in mitotic defects, an increase in transcription of transposable elements and a deregulation of heterochromatic genes. To better understand the molecular mechanisms that mediate HMR's function, we measured genome-wide localization of HMR in D. melanogaster tissue culture cells by chromatin immunoprecipitation. Interestingly, we find HMR localizing to genomic insulator sites that can be classified into two groups. One group belongs to gypsy insulators and another one borders HP1a bound regions at active genes. The transcription of the latter group genes is strongly affected in larvae and ovaries of Hmr mutant flies. Our data suggest a novel link between HMR and insulator proteins, a finding that implicates a potential role for genome organization in the formation of species.


Subject(s)
Drosophila Proteins/physiology , Drosophila/genetics , Genetic Speciation , Genome, Insect , Animals , Biodiversity , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Hybridization, Genetic
12.
Nature ; 537(7619): 244-248, 2016 09 08.
Article in English | MEDLINE | ID: mdl-27580037

ABSTRACT

The rules defining which small fraction of related DNA sequences can be selectively bound by a transcription factor are poorly understood. One of the most challenging tasks in DNA recognition is posed by dosage compensation systems that require the distinction between sex chromosomes and autosomes. In Drosophila melanogaster, the male-specific lethal dosage compensation complex (MSL-DCC) doubles the level of transcription from the single male X chromosome, but the nature of this selectivity is not known. Previous efforts to identify X-chromosome-specific target sequences were unsuccessful as the identified MSL recognition elements lacked discriminative power. Therefore, additional determinants such as co-factors, chromatin features, RNA and chromosome conformation have been proposed to refine targeting further. Here, using an in vitro genome-wide DNA binding assay, we show that recognition of the X chromosome is an intrinsic feature of the MSL-DCC. MSL2, the male-specific organizer of the complex, uses two distinct DNA interaction surfaces (the CXC and proline/basic-residue-rich domains) to identify complex DNA elements on the X chromosome. Specificity is provided by the CXC domain, which binds a novel motif defined by DNA sequence and shape. This motif characterizes a subclass of MSL2-binding sites, which we name PionX (pioneering sites on the X) as they appeared early during the recent evolution of an X chromosome in D. miranda and are the first chromosomal sites to be bound during de novo MSL-DCC assembly. Our data provide the first, to our knowledge, documented molecular mechanism through which the dosage compensation machinery distinguishes the X chromosome from an autosome. They highlight fundamental principles in the recognition of complex DNA elements by protein that will have a strong impact on many aspects of chromosome biology.


Subject(s)
Dosage Compensation, Genetic/genetics , Drosophila melanogaster/genetics , Multiprotein Complexes/metabolism , Regulatory Sequences, Nucleic Acid/genetics , X Chromosome/genetics , Amino Acid Motifs , Animals , Base Sequence , Binding Sites , DNA-Binding Proteins/metabolism , Drosophila Proteins/metabolism , Evolution, Molecular , Female , Genome, Insect/genetics , Male , Multiprotein Complexes/chemistry , Nuclear Proteins/metabolism , Nucleic Acid Conformation , Nucleotide Motifs , Protein Domains , Protein Subunits/chemistry , Protein Subunits/metabolism , Substrate Specificity , Transcription Factors/metabolism , X Chromosome/metabolism
13.
Methods Mol Biol ; 1415: 341-70, 2016.
Article in English | MEDLINE | ID: mdl-27115641

ABSTRACT

Obtaining diffracting quality crystals remains a major challenge in protein structure research. We summarize and compare methods for selecting the best protein targets for crystallization, construct optimization and crystallization condition design. Target selection methods are divided into algorithms predicting the chance of successful progression through all stages of structural determination (from cloning to solving the structure) and those focusing only on the crystallization step. We tried to highlight pros and cons of different approaches examining the following aspects: data size, redundancy and representativeness, overfitting during model construction, and results evaluation. In summary, although in recent years progress was made and several sequence properties were reported to be relevant for crystallization, the successful prediction of protein crystallization behavior and selection of corresponding crystallization conditions continue to challenge structural researchers.


Subject(s)
Genomics/methods , Proteins/chemistry , Algorithms , Crystallization , Crystallography, X-Ray , Databases, Protein , Magnetic Resonance Spectroscopy , Microscopy, Electron , Proteomics
14.
EMBO J ; 35(1): 24-45, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26516211

ABSTRACT

Cell fate specification relies on the action of critical transcription factors that become available at distinct stages of embryonic development. One such factor is NeuroD1, which is essential for eliciting the neuronal development program and possesses the ability to reprogram other cell types into neurons. Given this capacity, it is important to understand its targets and the mechanism underlying neuronal specification. Here, we show that NeuroD1 directly binds regulatory elements of neuronal genes that are developmentally silenced by epigenetic mechanisms. This targeting is sufficient to initiate events that confer transcriptional competence, including reprogramming of transcription factor landscape, conversion of heterochromatin to euchromatin, and increased chromatin accessibility, indicating potential pioneer factor ability of NeuroD1. The transcriptional induction of neuronal fate genes is maintained via epigenetic memory despite a transient NeuroD1 induction during neurogenesis. NeuroD1 also induces genes involved in the epithelial-to-mesenchymal transition, thereby promoting neuronal migration. Our study not only reveals the NeuroD1-dependent gene regulatory program driving neurogenesis but also increases our understanding of how cell fate specification during development involves a concerted action of transcription factors and epigenetic mechanisms.


Subject(s)
Basic Helix-Loop-Helix Transcription Factors/metabolism , Cell Differentiation , Chromatin/metabolism , Gene Expression Regulation, Developmental , Neurons/physiology , Transcription Factors/metabolism , Animals , Cell Line , Epigenesis, Genetic , Gene Regulatory Networks , Mice
15.
Nucleic Acids Res ; 42(Database issue): D396-400, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24214996

ABSTRACT

Knowledge about non-interacting proteins (NIPs) is important for training the algorithms to predict protein-protein interactions (PPIs) and for assessing the false positive rates of PPI detection efforts. We present the second version of Negatome, a database of proteins and protein domains that are unlikely to engage in physical interactions (available online at http://mips.helmholtz-muenchen.de/proj/ppi/negatome). Negatome is derived by manual curation of literature and by analyzing three-dimensional structures of protein complexes. The main methodological innovation in Negatome 2.0 is the utilization of an advanced text mining procedure to guide the manual annotation process. Potential non-interactions were identified by a modified version of Excerbt, a text mining tool based on semantic sentence analysis. Manual verification shows that nearly half of the text mining results with the highest confidence values correspond to NIP pairs. Compared to the first version, the contents of the database have grown by over 300%.


Subject(s)
Databases, Protein , Protein Interaction Domains and Motifs , Protein Interaction Mapping , Data Mining , Internet , Molecular Sequence Annotation , Protein Conformation
16.
FEBS J ; 279(12): 2192-200, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22536855

ABSTRACT

Many fields of science and industry depend on efficient production of active protein using heterologous expression in Escherichia coli. The solubility of proteins upon expression is dependent on their amino acid sequence. Prediction of solubility from sequence is therefore highly valuable. We present a novel machine-learning-based model called PROSO II which makes use of new classification methods and growth in experimental data to improve coverage and accuracy of solubility predictions. The classification algorithm is organized as a two-layered structure in which the output of a primary Parzen window model for sequence similarity and a logistic regression classifier of amino acid k-mer composition serve as input for a second-level logistic regression classifier. Compared with previously published research, our model is trained on five times more data than used by any other method before (82 000 proteins). When tested on a separate holdout set not used at any point of method development, our server attained the best results in comparison with other currently available methods: accuracy 75.4%, Matthews correlation coefficient 0.39, sensitivity 0.731, specificity 0.759, gain (soluble) 2.263. In summary, due to utilization of cutting-edge machine learning technologies combined with the largest currently available experimental data set, the PROSO II server constitutes a substantial improvement in protein solubility predictions. PROSO II is available at http://mips.helmholtz-muenchen.de/prosoII.


Subject(s)
Artificial Intelligence , Proteins/chemistry , Proteins/classification , Solubility
17.
Methods Mol Biol ; 609: 385-400, 2010.
Article in English | MEDLINE | ID: mdl-20221931

ABSTRACT

Obtaining well-diffracting crystals remains a major challenge in protein structure research. In this chapter, we review currently available computational methods to estimate the crystallization potential of a protein, to optimize amino acid sequences toward improved crystallization likelihood, and to design optimal crystal screen conditions.


Subject(s)
Computational Biology , Data Mining , Databases, Protein , Proteins/chemistry , Algorithms , Amino Acid Sequence , Animals , Crystallization , Humans , Protein Conformation , Proteins/genetics , Sequence Analysis, Protein
18.
Nucleic Acids Res ; 38(Database issue): D540-4, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19920129

ABSTRACT

The Negatome is a collection of protein and domain pairs that are unlikely to be engaged in direct physical interactions. The database currently contains experimentally supported non-interacting protein pairs derived from two distinct sources: by manual curation of literature and by analyzing protein complexes with known 3D structure. More stringent lists of non-interacting pairs were derived from these two datasets by excluding interactions detected by high-throughput approaches. Additionally, non-interacting protein domains have been derived from the stringent manual and structural data, respectively. The Negatome is much less biased toward functionally dissimilar proteins than the negative data derived by randomly selecting proteins from different cellular locations. It can be used to evaluate protein and domain interactions from new experiments and improve the training of interaction prediction algorithms. The Negatome database is available at http://mips.helmholtz-muenchen.de/proj/ppi/negatome.


Subject(s)
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Protein Interaction Mapping , Proteins/chemistry , Algorithms , Animals , Computational Biology/trends , Databases, Protein , Genome, Fungal , Humans , Information Storage and Retrieval/methods , Internet , Protein Structure, Tertiary , Saccharomyces cerevisiae/metabolism , Software
20.
BMC Genomics ; 9: 629, 2008 Dec 23.
Article in English | MEDLINE | ID: mdl-19108706

ABSTRACT

BACKGROUND: We have recently released a comprehensive, manually curated database of mammalian protein complexes called CORUM. Combining CORUM with other resources, we assembled a dataset of over 2700 mammalian complexes. The availability of a rich information resource allows us to search for organizational properties concerning these complexes. RESULTS: As the complexity of a protein complex in terms of the number of unique subunits increases, we observed that the number of such complexes and the mean non-synonymous to synonymous substitution ratio of associated genes tend to decrease. Similarly, as the number of different complexes a given protein participates in increases, the number of such proteins and the substitution ratio of the associated gene also tend to decrease. These observations provide evidence relating natural selection and the organization of mammalian complexes. We also observed greater homogeneity in terms of predicted protein isoelectric points, secondary structure and substitution ratio in annotated versus randomly generated complexes. A large proportion of the protein content and interactions in the complexes could be predicted from known binary protein-protein and domain-domain interactions. In particular, we found that large proteins interact preferentially with much smaller proteins. CONCLUSION: We observed similar trends in yeast and other data. Our results support the existence of conserved relations associated with the mammalian protein complexes.


Subject(s)
Databases, Protein , Evolution, Molecular , Multiprotein Complexes/analysis , Protein Interaction Mapping , Animals , Computational Biology/methods , Linear Models , Mammals , Models, Molecular , Protein Structure, Secondary , Proteomics/methods , Sequence Analysis, Protein