Results 1 - 20 of 28
1.
Neuron ; 112(7): 1117-1132.e9, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38266647

ABSTRACT

Mitochondria account for essential cellular pathways, from ATP production to nucleotide metabolism, and their deficits lead to neurological disorders and contribute to the onset of age-related diseases. Direct neuronal reprogramming aims at replacing neurons lost in such conditions, but very little is known about the impact of mitochondrial dysfunction on the direct reprogramming of human cells. Here, we explore the effects of mitochondrial dysfunction on the neuronal reprogramming of induced pluripotent stem cell (iPSC)-derived astrocytes carrying mutations in the NDUFS4 gene, important for Complex I and associated with Leigh syndrome. This led to the identification of the unfolded protein response as a major hurdle in the direct neuronal conversion of not only astrocytes and fibroblasts from patients but also control human astrocytes and fibroblasts. Its transient inhibition potently improves reprogramming by influencing the mitochondria-endoplasmic-reticulum-stress-mediated pathways. Taken together, disease modeling using patient cells unraveled novel general hurdles and ways to overcome these in human astrocyte-to-neuron reprogramming.


Subjects
Induced Pluripotent Stem Cells; Mitochondrial Diseases; Humans; Neurons/physiology; Mitochondria/metabolism; Induced Pluripotent Stem Cells/metabolism; Unfolded Protein Response; Astrocytes/metabolism; Mitochondrial Diseases/metabolism; Cellular Reprogramming; Electron Transport Complex I/genetics; Electron Transport Complex I/metabolism
2.
Life Sci Alliance ; 6(7)2023 07.
Article in English | MEDLINE | ID: mdl-37116939

ABSTRACT

H4 lysine 20 dimethylation (H4K20me2) is the most abundant histone modification in vertebrate chromatin. It arises from sequential methylation of unmodified histone H4 proteins by the mono-methylating enzyme PR-SET7/KMT5A, followed by conversion to the dimethylated state by SUV4-20H (KMT5B/C) enzymes. We have blocked the deposition of this mark by depleting Xenopus embryos of SUV4-20H1/H2 methyltransferases. In the larval epidermis, this results in a severe loss of cilia in multiciliated cells (MCC), a key component of mucociliary epithelia. MCC precursor cells are correctly specified, amplify centrioles, but ultimately fail in ciliogenesis because of the perturbation of cytoplasmic processes. Genome-wide transcriptome profiling reveals that SUV4-20H1/H2-depleted ectodermal explants preferentially down-regulate the expression of several hundred ciliogenic genes. Further analysis demonstrated that knockdown of SUV4-20H1 alone is sufficient to generate the MCC phenotype and that its catalytic activity is needed for axoneme formation. Overexpression of the H4K20me1-specific histone demethylase PHF8/KDM7B also rescues the ciliogenic defect in a significant manner. Taken together, this indicates that the conversion of H4K20me1 to H4K20me2 by SUV4-20H1 is critical for the formation of cilia tufts.


Subjects
Chromatin; Histones; Animals; Cell Differentiation/genetics; Histone Methyltransferases/genetics; Histone Methyltransferases/metabolism; Histones/metabolism; Xenopus laevis/genetics
3.
Proteomics ; 23(9): e2200179, 2023 05.
Article in English | MEDLINE | ID: mdl-36571325

ABSTRACT

Data-independent acquisition (DIA) of tandem mass spectrometry spectra has emerged as a promising technology to improve coverage and quantification of proteins in complex mixtures. The success of DIA experiments depends on the quality of the spectral libraries used for database searching. Frequently, these libraries need to be generated by labor- and time-intensive data-dependent acquisition (DDA) experiments. Recently, several algorithms have been published that allow the generation of theoretical libraries by efficiently predicting the retention time and intensity of fragment ions. Sequential windowed acquisition of all theoretical fragment ion spectra mass spectrometry (SWATH-MS) is a DIA method that can be applied at an unprecedented speed, but its fragmentation spectra are of lower quality than data acquired on Orbitrap instruments. To reliably generate theoretical libraries that can be used in SWATH experiments, we developed deep-learning for SWATH analysis (dpSWATH) to improve the sensitivity and specificity of data generated by Q-TOF mass spectrometers. The theoretical library built by dpSWATH allowed us to increase the identification rate of proteins compared to traditional or library-free methods. Based on our analysis, we conclude that dpSWATH is a better prediction framework for SWATH-MS measurements than other algorithms based on Orbitrap data.


Subjects
Deep Learning; Tandem Mass Spectrometry/methods; Proteins; Algorithms; Databases, Factual
4.
ACS Omega ; 7(50): 46131-46145, 2022 Dec 20.
Article in English | MEDLINE | ID: mdl-36570227

ABSTRACT

Uncharacterized proteins have been underutilized as targets for the development of novel therapeutics for difficult-to-treat bacterial infections. To facilitate the exploration of these proteins, 2819 predicted, uncharacterized proteins (19.1% of the total) from reference strains of multidrug-resistant Acinetobacter baumannii, Klebsiella pneumoniae, and Pseudomonas aeruginosa species were organized using an unsupervised k-means machine learning algorithm. Classification using normalized values for protein length, pI, hydrophobicity, degree of conservation, structural disorder, and %AT of the coding gene rendered six natural clusters. Cluster proteins showed different trends regarding operon membership, expression, presence of unknown function domains, and interactomic relevance. Clusters 2, 4, and 5 were enriched with highly disordered proteins, nonworkable membrane proteins, and likely spurious proteins, respectively. Clusters 1, 3, and 6 showed closer distances to known antigens, antibiotic targets, and virulence factors. Up to 21.8% of proteins in these clusters were structurally covered by modeling, which allowed assessment of druggability and discontinuous B-cell epitopes. Five proteins (4 in Cluster 1) were potential druggable targets for antibiotherapy. Eighteen proteins (11 in Cluster 6) were strong B-cell and T-cell immunogen candidates for vaccine development. In conclusion, we provide a feature-based schema to fractionate the functional dark proteome of critical pathogens for fundamental and biomedical purposes.
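
The clustering step described in the abstract above (k-means on normalized protein features) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `zscore_columns` and `kmeans`, the choice of z-score normalization, the three toy features (length, pI, hydrophobicity), and all numeric values are assumptions made only for illustration.

```python
import random
from statistics import mean, pstdev


def zscore_columns(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids


# Hypothetical feature rows: [length, pI, hydrophobicity]
proteins = [
    [120, 5.1, -0.40],
    [135, 5.4, -0.35],
    [128, 4.9, -0.45],
    [410, 9.2, 0.80],
    [395, 9.6, 0.75],
    [420, 9.0, 0.90],
]
labels, centroids = kmeans(zscore_columns(proteins), k=2)
print(labels)
```

Normalizing first matters because raw protein length would otherwise dominate the distance; the abstract reports six clusters, whereas the toy data here only supports two.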

5.
Science ; 376(6599): eabf9088, 2022 06 17.
Article in English | MEDLINE | ID: mdl-35709258

ABSTRACT

The centrosome provides an intracellular anchor for the cytoskeleton, regulating cell division, cell migration, and cilia formation. We used spatial proteomics to elucidate protein interaction networks at the centrosome of human induced pluripotent stem cell-derived neural stem cells (NSCs) and neurons. Centrosome-associated proteins were largely cell type-specific, with protein hubs involved in RNA dynamics. Analysis of neurodevelopmental disease cohorts identified a significant overrepresentation of NSC centrosome proteins with variants in patients with periventricular heterotopia (PH). Expressing the PH-associated mutant pre-mRNA-processing factor 6 (PRPF6) reproduced the periventricular misplacement in the developing mouse brain, highlighting missplicing of transcripts of a microtubule-associated kinase with centrosomal location as essential for the phenotype. Collectively, cell type-specific centrosome interactomes explain how genetic variants in ubiquitous proteins may convey brain-specific phenotypes.


Subjects
Centrosome; Neural Stem Cells; Neurogenesis; Neurons; Periventricular Nodular Heterotopia; Protein Interaction Maps; Alternative Splicing; Animals; Brain/abnormalities; Centrosome/metabolism; Humans; Induced Pluripotent Stem Cells; Mice; Microtubules/metabolism; Neurons/metabolism; Periventricular Nodular Heterotopia/metabolism; Proteome/metabolism; RNA Splicing Factors/metabolism; Transcription Factors/metabolism
6.
EMBO J ; 40(21): e107532, 2021 11 02.
Article in English | MEDLINE | ID: mdl-34549820

ABSTRACT

Astrocytes regulate brain-wide functions and also show region-specific differences, but little is known about how general and region-specific functions are aligned at the single-cell level. To explore this, we isolated adult mouse diencephalic astrocytes by ACSA-2-mediated magnetic-activated cell sorting (MACS). Single-cell RNA-seq revealed 7 gene expression clusters of astrocytes, with 4 forming a supercluster. Within the supercluster, cells differed by gene expression related to ion homeostasis or metabolism, with the former sharing gene expression with other regions and the latter being restricted to specific regions. All clusters showed expression of proliferation-related genes, and proliferation of diencephalic astrocytes was confirmed by immunostaining. Clonal analysis demonstrated a low level of astrogenesis in the adult diencephalon, but not in cerebral cortex grey matter. This led to the identification of Smad4 as a key regulator of diencephalic astrocyte proliferation in vivo and neurosphere formation in vitro. Thus, astrocytes show diverse gene expression states related to distinct functions, with some subsets being more widespread while others are more regionally restricted. However, all share low-level proliferation, revealing the novel concept of adult astrogenesis in the diencephalon.


Subjects
Astrocytes/metabolism; Cell Lineage/genetics; Diencephalon/metabolism; Gene Expression Regulation, Developmental; Neurogenesis/genetics; Smad4 Protein/genetics; Animals; Astrocytes/classification; Astrocytes/cytology; Cell Cycle/genetics; Cell Differentiation; Cell Proliferation; Cerebral Cortex/cytology; Cerebral Cortex/growth & development; Cerebral Cortex/metabolism; Diencephalon/cytology; Diencephalon/growth & development; Gene Ontology; Gene Regulatory Networks; Gray Matter/cytology; Gray Matter/growth & development; Gray Matter/metabolism; Metabolic Networks and Pathways; Mice; Mice, Inbred C57BL; Mice, Transgenic; Molecular Sequence Annotation; Multigene Family; Signal Transduction; Smad4 Protein/metabolism
7.
J Proteome Res ; 20(7): 3749-3757, 2021 07 02.
Article in English | MEDLINE | ID: mdl-34137619

ABSTRACT

Trypsin is one of the most important and widely used proteolytic enzymes in mass spectrometry (MS)-based proteomic research. It exclusively cleaves peptide bonds at the C-terminus of lysine and arginine. However, the cleavage is also affected by several factors, including specific surrounding amino acids, resulting in frequent incomplete proteolysis and subsequent issues in peptide identification and quantification. Accurate annotation of missed cleavages is crucial to database searching in MS analysis. Here, we present deep-learning predicting missed cleavages (dpMC), a novel algorithm for the prediction of missed trypsin cleavage sites. This algorithm predicts missed cleavages with very high accuracy, with areas under the curve (AUCs) of cross-validation and holdout testing above 0.99, along with a mean F1 score and Matthews correlation coefficient (MCC) of 0.9677 and 0.9349, respectively. We tested our algorithm on data sets from different species and different experimental conditions, and it outperforms other currently available prediction methods. In addition, the method provides better insight into the detailed rules of trypsin cleavage when coupled with propensity and motif analysis. Moreover, our method can be integrated into database searching in MS analysis to identify and quantify mass spectra effectively and efficiently.


Subjects
Deep Learning; Proteomics; Mass Spectrometry; Peptides; Trypsin
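
The cleavage rule stated in the abstract above (trypsin cuts C-terminal to lysine and arginine) can be sketched as a naive in silico digestion that also enumerates peptides with missed cleavages. This is not dpMC, which is a deep-learning predictor; the function names `cleavage_sites` and `digest`, the `max_missed` parameter, and the toy sequence are hypothetical and serve only to show what a "missed cleavage" means in database searching.

```python
def cleavage_sites(seq):
    """Positions where the chain may be cut: right after each K or R.

    The last residue is skipped because cutting after it yields nothing.
    """
    return [i + 1 for i, aa in enumerate(seq[:-1]) if aa in "KR"]


def digest(seq, max_missed=0):
    """Return peptides from cutting after K/R, allowing skipped sites.

    A peptide with n missed cleavages spans n + 1 consecutive fragments.
    """
    bounds = [0] + cleavage_sites(seq) + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 2 + max_missed, len(bounds))
        for j in range(i + 1, last):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


print(digest("MAKGGRLL"))                # ['MAK', 'GGR', 'LL']
print(digest("MAKGGRLL", max_missed=1))  # adds 'MAKGGR' and 'GGRLL'
```

A search engine that assumes complete digestion would only consider the first list; a predictor of missed sites would, under this framing, help decide which of the longer peptides are worth adding.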
8.
Nucleic Acids Res ; 47(17): 9069-9086, 2019 09 26.
Article in English | MEDLINE | ID: mdl-31350899

ABSTRACT

Pioneer transcription factors (PTF) can recognize their binding sites on nucleosomal DNA and trigger chromatin opening for recruitment of other non-pioneer transcription factors. However, critical properties of PTFs are still poorly understood, such as how these transcription factors selectively recognize cell type-specific binding sites and under which conditions they can initiate chromatin remodelling. Here we show that early endoderm binding sites of the paradigm PTF Foxa2 are epigenetically primed by low levels of active chromatin modifications in embryonic stem cells (ESC). Priming of these binding sites is supported by preferential recruitment of Foxa2 to endoderm binding sites compared to lineage-inappropriate binding sites, when ectopically expressed in ESCs. We further show that binding of Foxa2 is required for chromatin opening during endoderm differentiation. However, increased chromatin accessibility was only detected on binding sites which are synergistically bound with other endoderm transcription factors. Thus, our data suggest that binding site selection of PTFs is directed by the chromatin environment and that chromatin opening requires collaboration of PTFs with additional transcription factors.


Subjects
Chromatin/metabolism; Hepatocyte Nuclear Factor 3-beta/metabolism; Mouse Embryonic Stem Cells/metabolism; Animals; Binding Sites/genetics; Cell Differentiation/genetics; Chromatin Assembly and Disassembly/genetics; Endoderm/cytology; GATA4 Transcription Factor/genetics; GATA4 Transcription Factor/metabolism; Gene Expression Regulation, Developmental/genetics; Hepatocyte Nuclear Factor 3-beta/genetics; Histone Code; Histones/metabolism; Mice; Mice, Knockout; Models, Genetic; Mouse Embryonic Stem Cells/cytology; Signal Transduction
9.
Nucleic Acids Res ; 46(17): 8772-8787, 2018 09 28.
Article in English | MEDLINE | ID: mdl-30165493

ABSTRACT

With the availability of deep RNA sequencing, model organisms such as Xenopus offer an outstanding opportunity to investigate the genetic basis of vertebrate organ formation from its embryonic beginnings. Here we investigate dynamics of the RNA landscape during formation of the Xenopus tropicalis larval epidermis. Differentiation of non-neural ectoderm starts at gastrulation and takes about one day to produce a functional mucociliary epithelium, highly related to the one in human airways. To obtain RNA expression data uncontaminated by non-epidermal tissues of the embryo, we use prospective ectodermal explants called Animal Caps (ACs), which differentiate autonomously into a ciliated epidermis. Their global transcriptome is investigated at three key timepoints, with a cumulative sequencing depth of ∼10^8 reads per developmental stage. This database is provided as an online web tool to the scientific community. In this paper, we report on global changes in gene expression, an unanticipated diversity of mRNA splicing isoforms, expression patterns of repetitive DNA elements, and the complexity of circular RNAs during this process. Computationally we derive transcription factor hubs from this data set, which may help in the future to define novel genetic drivers of epidermal differentiation in vertebrates.


Subjects
Amphibian Proteins/genetics; Epidermis/metabolism; Gene Expression Regulation, Developmental; RNA, Messenger/genetics; Transcriptome; Xenopus laevis/genetics; Alternative Splicing; Amphibian Proteins/metabolism; Animals; Cilia/genetics; Cilia/metabolism; Databases, Genetic; Ectoderm/growth & development; Ectoderm/metabolism; Embryo, Nonmammalian; Epidermis/growth & development; Gene Regulatory Networks; High-Throughput Nucleotide Sequencing; Larva/genetics; Larva/growth & development; Larva/metabolism; Morphogenesis/genetics; RNA/genetics; RNA/metabolism; RNA, Circular; RNA, Messenger/metabolism; Transcription Factors/genetics; Transcription Factors/metabolism; Xenopus laevis/growth & development; Xenopus laevis/metabolism
10.
RNA ; 24(9): 1195-1213, 2018 09.
Article in English | MEDLINE | ID: mdl-29914874

ABSTRACT

Long noncoding RNAs (lncRNAs), which are longer than 200 nucleotides but often unstable, contribute a substantial and diverse portion to pervasive noncoding transcriptomes. Most lncRNAs are poorly annotated and understood, although several play important roles in gene regulation and diseases. Here we systematically uncover and analyze lncRNAs in Schizosaccharomyces pombe. Based on RNA-seq data from twelve RNA-processing mutants and nine physiological conditions, we identify 5775 novel lncRNAs, nearly 4× the previously annotated lncRNAs. The expression of most lncRNAs becomes strongly induced under the genetic and physiological perturbations, most notably during late meiosis. Most lncRNAs are cryptic and suppressed by three RNA-processing pathways: the nuclear exosome, cytoplasmic exonuclease, and RNAi. Double-mutant analyses reveal substantial coordination and redundancy among these pathways. We classify lncRNAs by their dominant pathway into cryptic unstable transcripts (CUTs), Xrn1-sensitive unstable transcripts (XUTs), and Dicer-sensitive unstable transcripts (DUTs). XUTs and DUTs are enriched for antisense lncRNAs, while CUTs are often bidirectional and actively translated. The cytoplasmic exonuclease, along with RNAi, dampens the expression of thousands of lncRNAs and mRNAs that become induced during meiosis. Antisense lncRNA expression mostly negatively correlates with sense mRNA expression in the physiological, but not the genetic conditions. Intergenic and bidirectional lncRNAs emerge from nucleosome-depleted regions, upstream of positioned nucleosomes. Our results highlight both similarities and differences to lncRNA regulation in budding yeast. This broad survey of the lncRNA repertoire and characteristics in S. pombe, and the interwoven regulatory pathways that target lncRNAs, provides a rich framework for their further functional analyses.


Subjects
Exonucleases/metabolism; Exosomes/metabolism; RNA, Long Noncoding/genetics; Schizosaccharomyces/genetics; Sequence Analysis, RNA/methods; Cell Nucleus/metabolism; Cytoplasm/enzymology; Fungal Proteins/metabolism; Gene Expression Profiling/methods; Gene Expression Regulation, Fungal; Meiosis; Molecular Sequence Annotation; Mutation; RNA Interference; RNA Stability; RNA, Fungal/genetics; RNA, Long Noncoding/chemistry; Schizosaccharomyces/chemistry; Schizosaccharomyces/enzymology
11.
PLoS One ; 12(2): e0171798, 2017.
Article in English | MEDLINE | ID: mdl-28207793

ABSTRACT

Hybrid incompatibility between Drosophila melanogaster and D. simulans is caused by a lethal interaction of the proteins encoded by the Hmr and Lhr genes. In D. melanogaster the loss of HMR results in mitotic defects, an increase in transcription of transposable elements and a deregulation of heterochromatic genes. To better understand the molecular mechanisms that mediate HMR's function, we measured genome-wide localization of HMR in D. melanogaster tissue culture cells by chromatin immunoprecipitation. Interestingly, we find HMR localizing to genomic insulator sites that can be classified into two groups. One group belongs to gypsy insulators and another one borders HP1a bound regions at active genes. The transcription of the latter group genes is strongly affected in larvae and ovaries of Hmr mutant flies. Our data suggest a novel link between HMR and insulator proteins, a finding that implicates a potential role for genome organization in the formation of species.


Subjects
Drosophila Proteins/physiology; Drosophila/genetics; Genetic Speciation; Genome, Insect; Animals; Biodiversity; Drosophila Proteins/genetics; Drosophila Proteins/metabolism; Hybridization, Genetic
12.
Nature ; 537(7619): 244-248, 2016 09 08.
Article in English | MEDLINE | ID: mdl-27580037

ABSTRACT

The rules defining which small fraction of related DNA sequences can be selectively bound by a transcription factor are poorly understood. One of the most challenging tasks in DNA recognition is posed by dosage compensation systems that require the distinction between sex chromosomes and autosomes. In Drosophila melanogaster, the male-specific lethal dosage compensation complex (MSL-DCC) doubles the level of transcription from the single male X chromosome, but the nature of this selectivity is not known. Previous efforts to identify X-chromosome-specific target sequences were unsuccessful as the identified MSL recognition elements lacked discriminative power. Therefore, additional determinants such as co-factors, chromatin features, RNA and chromosome conformation have been proposed to refine targeting further. Here, using an in vitro genome-wide DNA binding assay, we show that recognition of the X chromosome is an intrinsic feature of the MSL-DCC. MSL2, the male-specific organizer of the complex, uses two distinct DNA interaction surfaces (the CXC and proline/basic-residue-rich domains) to identify complex DNA elements on the X chromosome. Specificity is provided by the CXC domain, which binds a novel motif defined by DNA sequence and shape. This motif characterizes a subclass of MSL2-binding sites, which we name PionX (pioneering sites on the X) as they appeared early during the recent evolution of an X chromosome in D. miranda and are the first chromosomal sites to be bound during de novo MSL-DCC assembly. Our data provide the first, to our knowledge, documented molecular mechanism through which the dosage compensation machinery distinguishes the X chromosome from an autosome. They highlight fundamental principles in the recognition of complex DNA elements by protein that will have a strong impact on many aspects of chromosome biology.


Subjects
Dosage Compensation, Genetic/genetics; Drosophila melanogaster/genetics; Multiprotein Complexes/metabolism; Regulatory Sequences, Nucleic Acid/genetics; X Chromosome/genetics; Amino Acid Motifs; Animals; Base Sequence; Binding Sites; DNA-Binding Proteins/metabolism; Drosophila Proteins/metabolism; Evolution, Molecular; Female; Genome, Insect/genetics; Male; Multiprotein Complexes/chemistry; Nuclear Proteins/metabolism; Nucleic Acid Conformation; Nucleotide Motifs; Protein Domains; Protein Subunits/chemistry; Protein Subunits/metabolism; Substrate Specificity; Transcription Factors/metabolism; X Chromosome/metabolism
13.
Methods Mol Biol ; 1415: 341-70, 2016.
Article in English | MEDLINE | ID: mdl-27115641

ABSTRACT

Obtaining diffraction-quality crystals remains a major challenge in protein structure research. We summarize and compare methods for selecting the best protein targets for crystallization, construct optimization and crystallization condition design. Target selection methods are divided into algorithms predicting the chance of successful progression through all stages of structural determination (from cloning to solving the structure) and those focusing only on the crystallization step. We highlight the pros and cons of different approaches by examining the following aspects: data size, redundancy and representativeness, overfitting during model construction, and results evaluation. In summary, although progress has been made in recent years and several sequence properties have been reported to be relevant for crystallization, the successful prediction of protein crystallization behavior and selection of corresponding crystallization conditions continue to challenge structural researchers.


Subjects
Genomics/methods; Proteins/chemistry; Algorithms; Crystallization; Crystallography, X-Ray; Databases, Protein; Magnetic Resonance Spectroscopy; Microscopy, Electron; Proteomics
14.
EMBO J ; 35(1): 24-45, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26516211

ABSTRACT

Cell fate specification relies on the action of critical transcription factors that become available at distinct stages of embryonic development. One such factor is NeuroD1, which is essential for eliciting the neuronal development program and possesses the ability to reprogram other cell types into neurons. Given this capacity, it is important to understand its targets and the mechanism underlying neuronal specification. Here, we show that NeuroD1 directly binds regulatory elements of neuronal genes that are developmentally silenced by epigenetic mechanisms. This targeting is sufficient to initiate events that confer transcriptional competence, including reprogramming of transcription factor landscape, conversion of heterochromatin to euchromatin, and increased chromatin accessibility, indicating potential pioneer factor ability of NeuroD1. The transcriptional induction of neuronal fate genes is maintained via epigenetic memory despite a transient NeuroD1 induction during neurogenesis. NeuroD1 also induces genes involved in the epithelial-to-mesenchymal transition, thereby promoting neuronal migration. Our study not only reveals the NeuroD1-dependent gene regulatory program driving neurogenesis but also increases our understanding of how cell fate specification during development involves a concerted action of transcription factors and epigenetic mechanisms.


Subjects
Basic Helix-Loop-Helix Transcription Factors/metabolism; Cell Differentiation; Chromatin/metabolism; Gene Expression Regulation, Developmental; Neurons/physiology; Transcription Factors/metabolism; Animals; Cell Line; Epigenesis, Genetic; Gene Regulatory Networks; Mice
15.
Nucleic Acids Res ; 42(Database issue): D396-400, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24214996

ABSTRACT

Knowledge about non-interacting proteins (NIPs) is important for training algorithms to predict protein-protein interactions (PPIs) and for assessing the false positive rates of PPI detection efforts. We present the second version of Negatome, a database of proteins and protein domains that are unlikely to engage in physical interactions (available online at http://mips.helmholtz-muenchen.de/proj/ppi/negatome). Negatome is derived by manual curation of literature and by analyzing three-dimensional structures of protein complexes. The main methodological innovation in Negatome 2.0 is the utilization of an advanced text mining procedure to guide the manual annotation process. Potential non-interactions were identified by a modified version of Excerbt, a text mining tool based on semantic sentence analysis. Manual verification shows that nearly half of the text mining results with the highest confidence values correspond to NIP pairs. Compared to the first version, the contents of the database have grown by over 300%.


Subjects
Databases, Protein; Protein Interaction Domains and Motifs; Protein Interaction Mapping; Data Mining; Internet; Molecular Sequence Annotation; Protein Conformation
16.
FEBS J ; 279(12): 2192-200, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22536855

ABSTRACT

Many fields of science and industry depend on efficient production of active protein using heterologous expression in Escherichia coli. The solubility of proteins upon expression is dependent on their amino acid sequence. Prediction of solubility from sequence is therefore highly valuable. We present a novel machine-learning-based model called PROSO II, which makes use of new classification methods and growth in experimental data to improve coverage and accuracy of solubility predictions. The classification algorithm is organized as a two-layered structure in which the output of a primary Parzen window model for sequence similarity and a logistic regression classifier of amino acid k-mer composition serve as input for a second-level logistic regression classifier. Our model is trained on five times more data than used by any previously published method (82 000 proteins). When tested on a separate holdout set not used at any point of method development, our server attained the best results in comparison with other currently available methods: accuracy 75.4%, Matthews correlation coefficient 0.39, sensitivity 0.731, specificity 0.759, gain (soluble) 2.263. In summary, due to utilization of cutting-edge machine learning technologies combined with the largest currently available experimental data set, the PROSO II server constitutes a substantial improvement in protein solubility predictions. PROSO II is available at http://mips.helmholtz-muenchen.de/prosoII.


Subjects
Artificial Intelligence; Proteins/chemistry; Proteins/classification; Solubility
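
The evaluation figures quoted in the abstract above (accuracy, Matthews correlation coefficient, sensitivity, specificity) all follow from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the PROSO II holdout set.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: soluble proteins predicted soluble / insoluble
    tn/fp: insoluble proteins predicted insoluble / soluble
    """
    total = tp + fp + tn + fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        # MCC is defined as 0 when any marginal sum is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, the MCC stays informative when the two classes are unbalanced, which is one reason it is commonly reported alongside sensitivity and specificity.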
17.
Methods Mol Biol ; 609: 385-400, 2010.
Article in English | MEDLINE | ID: mdl-20221931

ABSTRACT

Obtaining well-diffracting crystals remains a major challenge in protein structure research. In this chapter, we review currently available computational methods to estimate the crystallization potential of a protein, to optimize amino acid sequences toward improved crystallization likelihood, and to design optimal crystal screen conditions.


Subjects
Computational Biology; Data Mining; Databases, Protein; Proteins/chemistry; Algorithms; Amino Acid Sequence; Animals; Crystallization; Humans; Protein Conformation; Proteins/genetics; Sequence Analysis, Protein
18.
Nucleic Acids Res ; 38(Database issue): D540-4, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19920129

ABSTRACT

The Negatome is a collection of protein and domain pairs that are unlikely to be engaged in direct physical interactions. The database currently contains experimentally supported non-interacting protein pairs derived from two distinct sources: by manual curation of literature and by analyzing protein complexes with known 3D structure. More stringent lists of non-interacting pairs were derived from these two datasets by excluding interactions detected by high-throughput approaches. Additionally, non-interacting protein domains have been derived from the stringent manual and structural data, respectively. The Negatome is much less biased toward functionally dissimilar proteins than the negative data derived by randomly selecting proteins from different cellular locations. It can be used to evaluate protein and domain interactions from new experiments and improve the training of interaction prediction algorithms. The Negatome database is available at http://mips.helmholtz-muenchen.de/proj/ppi/negatome.


Subjects
Computational Biology/methods; Databases, Genetic; Databases, Nucleic Acid; Protein Interaction Mapping; Proteins/chemistry; Algorithms; Animals; Computational Biology/trends; Databases, Protein; Genome, Fungal; Humans; Information Storage and Retrieval/methods; Internet; Protein Structure, Tertiary; Saccharomyces cerevisiae/metabolism; Software
20.
BMC Genomics ; 9: 629, 2008 Dec 23.
Article in English | MEDLINE | ID: mdl-19108706

ABSTRACT

BACKGROUND: We have recently released a comprehensive, manually curated database of mammalian protein complexes called CORUM. Combining CORUM with other resources, we assembled a dataset of over 2700 mammalian complexes. The availability of a rich information resource allows us to search for organizational properties concerning these complexes. RESULTS: As the complexity of a protein complex in terms of the number of unique subunits increases, we observed that the number of such complexes and the mean non-synonymous to synonymous substitution ratio of associated genes tend to decrease. Similarly, as the number of different complexes a given protein participates in increases, the number of such proteins and the substitution ratio of the associated gene also tend to decrease. These observations provide evidence relating natural selection and the organization of mammalian complexes. We also observed greater homogeneity in terms of predicted protein isoelectric points, secondary structure and substitution ratio in annotated versus randomly generated complexes. A large proportion of the protein content and interactions in the complexes could be predicted from known binary protein-protein and domain-domain interactions. In particular, we found that large proteins interact preferentially with much smaller proteins. CONCLUSION: We observed similar trends in yeast and other data. Our results support the existence of conserved relations associated with the mammalian protein complexes.


Subjects
Databases, Protein; Evolution, Molecular; Multiprotein Complexes/analysis; Protein Interaction Mapping; Animals; Computational Biology/methods; Linear Models; Mammals; Models, Molecular; Protein Structure, Secondary; Proteomics/methods; Sequence Analysis, Protein