1.
Bioinformatics ; 31(12): i365-74, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-26072505

ABSTRACT

MOTIVATION: Proteins are responsible for a multitude of vital tasks in all living organisms. Given that a protein's function and role are strongly related to its subcellular location, protein location prediction is an important research area. While proteins move from one location to another and can localize to multiple locations, most existing location prediction systems assign only a single location per protein. A few recent systems attempt to predict multiple locations for proteins; however, their performance leaves much room for improvement. Moreover, such systems do not capture dependencies among locations and usually consider locations as independent. We hypothesize that a multi-location predictor that captures location inter-dependencies can improve location predictions for proteins. RESULTS: We introduce a probabilistic generative model for protein localization and develop a system based on it, which we call MDLoc, that utilizes inter-dependencies among locations to predict multiple locations for proteins. The model captures location inter-dependencies using Bayesian networks and represents dependency between features and locations using a mixture model. We use iterative processes for learning model parameters and for estimating protein locations. We evaluate our classifier, MDLoc, on a dataset of single- and multi-localized proteins derived from the DBMLoc dataset, which is the most comprehensive protein multi-localization dataset currently available. Our results, obtained by using MDLoc, significantly improve upon results obtained by an initial simpler classifier, as well as on results reported by other top systems. AVAILABILITY AND IMPLEMENTATION: MDLoc is available at: http://www.eecis.udel.edu/~compbio/mdloc.


Subjects
Databases, Protein; Models, Theoretical; Proteins/metabolism; Bayes Theorem; Humans; Protein Transport; Subcellular Fractions
2.
PLoS One ; 7(11): e48723, 2012.
Article in English | MEDLINE | ID: mdl-23166592

ABSTRACT

Quantitative predictions in computational life sciences are often based on regression models. The advent of machine learning has led to highly accurate regression models that have gained widespread acceptance. While there are statistical methods available to estimate the global performance of regression models on a test or training dataset, it is often not clear how well this performance transfers to other datasets or how reliable an individual prediction is, a fact that often reduces a user's trust in a computational method. In analogy to the concept of an experimental error, we sketch how estimators for individual prediction errors can be used to provide confidence intervals for individual predictions. Two novel statistical methods, named CONFINE and CONFIVE, can estimate the reliability of an individual prediction based on the local properties of nearby training data. The methods can be applied equally to linear and non-linear regression methods with very little computational overhead. We compare our confidence estimators with other existing confidence and applicability domain estimators on two biologically relevant problems: MHC-peptide binding prediction and quantitative structure-activity relationship (QSAR) modeling. Our results suggest that the proposed confidence estimators perform comparably to or better than previously proposed estimation methods. Given a sufficient amount of training data, the estimators exhibit error estimates of high quality. In addition, we observed that the quality of estimated confidence intervals is predictable. We discuss how confidence estimation is influenced by noise, the number of features, and the dataset size. Estimating the confidence in an individual prediction in terms of error intervals represents an important step from plain, non-informative predictions towards transparent and interpretable predictions that will help to improve the acceptance of computational methods in the biological community.
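
The idea of judging an individual prediction by the local properties of nearby training data can be illustrated with a small sketch. This is not the published CONFINE or CONFIVE definition, which the abstract does not spell out; it is a generic neighbour-based estimator under stated assumptions. The function name `local_error_estimate`, the Euclidean distance, and the choice of `k` are all hypothetical.

```python
import math


def local_error_estimate(train_x, train_y, predict, query, k=3):
    """Root-mean-square residual of `predict` on the k training points
    closest to `query` (illustrative only, not the published estimator)."""
    # Rank training examples by Euclidean distance to the query point.
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    neighbours = ranked[:k]
    # Squared residuals of the model on the nearest training examples.
    squared = [(predict(x) - y) ** 2 for x, y in neighbours]
    return math.sqrt(sum(squared) / len(squared))


# Toy data: the model y = x fits well near x = 1 and poorly near x = 11.
train_x = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
train_y = [0.0, 1.0, 2.0, 13.0, 8.0, 15.0]
model = lambda x: x[0]

well_covered = local_error_estimate(train_x, train_y, model, (1.0,))
poorly_fit = local_error_estimate(train_x, train_y, model, (11.0,))
```

A larger local estimate suggests a wider error interval for that query. Turning the estimate into a calibrated confidence interval would need an additional step that this sketch does not attempt.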


Subjects
Computational Biology/methods; Confidence Intervals; Data Interpretation, Statistical; Predictive Value of Tests; Regression Analysis; Protein Binding; Quantitative Structure-Activity Relationship
3.
Nucleic Acids Res ; 38(Web Server issue): W497-502, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20507917

ABSTRACT

Predicting subcellular localization has become a valuable alternative to time-consuming experimental methods. Major drawbacks of many of these predictors are their lack of interpretability and the fact that they do not provide an estimate of the confidence of an individual prediction. We present YLoc, an interpretable web server for predicting subcellular localization. YLoc uses natural language to explain why a prediction was made and which biological property of the protein was mainly responsible for it. In addition, YLoc estimates the reliability of its own predictions. YLoc can thus assist in understanding protein localization and in location engineering of proteins. The YLoc web server is available online at www.multiloc.org/YLoc.


Subjects
Proteins/analysis; Software; Animals; Fungal Proteins/analysis; Internet; Organelles/chemistry; Plant Proteins/analysis; Reproducibility of Results; Sequence Analysis, Protein
4.
Bioinformatics ; 26(9): 1232-8, 2010 May 01.
Article in English | MEDLINE | ID: mdl-20299325

ABSTRACT

MOTIVATION: Protein subcellular localization is pivotal in understanding a protein's function. Computational prediction of subcellular localization has become a viable alternative to experimental approaches. While current machine learning-based methods yield good prediction accuracy, most of them suffer from two key problems: a lack of interpretability and difficulty in dealing with multiple locations. RESULTS: We present YLoc, a novel method for predicting protein subcellular localization that addresses these issues. Due to its simple architecture, YLoc can identify the relevant features of a protein sequence contributing to its subcellular localization, e.g. localization signals or motifs relevant to protein sorting. We present several example applications where YLoc identifies the sequence features responsible for protein localization, and thus reveals not only to which location a protein is transported, but also why it is transported there. YLoc also provides a confidence estimate for the prediction. Thus, the user can decide what level of error is acceptable for a prediction. Due to a probabilistic approach and the use of several thousand dual-targeted proteins, YLoc is able to predict multiple locations per protein. YLoc was benchmarked using several independent datasets for protein subcellular localization and performs on par with other state-of-the-art predictors. Disregarding low-confidence predictions, YLoc can achieve prediction accuracies of over 90%. Moreover, we show that YLoc is able to reliably predict multiple locations and outperforms the best predictors in this area. AVAILABILITY: www.multiloc.org/YLoc.
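
The trade-off behind "disregarding low-confidence predictions" can be made concrete with a short sketch. It is not YLoc's code: the function name `accuracy_above_threshold`, the tuple layout, and the toy values are assumptions made for illustration. It reports the accuracy on the predictions that are kept together with the fraction of proteins that still receive a prediction.

```python
def accuracy_above_threshold(predictions, threshold):
    """Return (accuracy, coverage) over predictions whose confidence is at
    least `threshold`. Each item is (predicted, true, confidence)."""
    kept = [(p, t) for p, t, c in predictions if c >= threshold]
    if not kept:
        # Nothing survives the filter: accuracy is undefined, coverage is zero.
        return None, 0.0
    accuracy = sum(p == t for p, t in kept) / len(kept)
    coverage = len(kept) / len(predictions)
    return accuracy, coverage


# Hypothetical predictions: (predicted location, true location, confidence).
toy = [
    ("nucleus", "nucleus", 0.95),
    ("cytoplasm", "cytoplasm", 0.85),
    ("mitochondrion", "cytoplasm", 0.60),
    ("nucleus", "nucleus", 0.55),
    ("cytoplasm", "nucleus", 0.30),
]

all_predictions = accuracy_above_threshold(toy, 0.0)
confident_only = accuracy_above_threshold(toy, 0.8)
```

Raising the threshold can only increase accuracy if the confidence scores are informative, and it always costs coverage, which is why both numbers are worth reporting together.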


Subjects
Computational Biology/methods; Proteins/metabolism; Algorithms; Animals; Artificial Intelligence; Bayes Theorem; Gene Expression Profiling; Humans; Models, Biological; Models, Statistical; Mutation; Probability; Reproducibility of Results; Software
5.
Eur J Cell Biol ; 89(2-3): 175-83, 2010.
Article in English | MEDLINE | ID: mdl-20047775

ABSTRACT

Reversible phosphorylation plays a crucial role in regulating the activity of enzymes and other proteins in all living organisms. In particular, the phosphorylation of transcription factors can modulate their capability to regulate downstream target genes. In plants, basic domain-containing leucine-zipper (bZIP) transcription factors have an important function in the regulation of many developmental processes and adaptive responses to the environment. By a comprehensive sequence analysis, we identified a set of highly conserved, potentially phospho-accepting serines within the DNA-binding domain of plant bZIPs. Structural modelling revealed that these serines are in physical contact with the DNA and predicts that their phosphorylation will have a major influence on the DNA-binding activity of plant bZIPs. In support of this, we show, by means of a quantitative in vitro binding assay, that phosphorylation-mimicking substitutions of some of these serines strongly interfere with the DNA binding of two prototypical Arabidopsis bZIPs, namely AtbZIP63 and HY5. Our data suggest that the identified serines could serve as in vivo targets for kinases and phosphatases, allowing the fine-tuning of bZIP factor activity at the DNA-protein interaction level.


Subjects
Arabidopsis Proteins/metabolism; Arabidopsis/metabolism; Basic-Leucine Zipper Transcription Factors/metabolism; Nuclear Proteins/metabolism; Serine/metabolism; Amino Acid Sequence; Animals; Arabidopsis/genetics; Arabidopsis Proteins/genetics; Basic-Leucine Zipper Transcription Factors/genetics; DNA, Plant/genetics; DNA, Plant/metabolism; Gene Expression Regulation, Plant; Models, Molecular; Molecular Sequence Data; Mutagenesis, Site-Directed; Nuclear Proteins/genetics; Phosphorylation; Protein Conformation
6.
BMC Bioinformatics ; 10: 274, 2009 Sep 01.
Article in English | MEDLINE | ID: mdl-19723330

ABSTRACT

BACKGROUND: Knowledge of subcellular localization of proteins is crucial to proteomics, drug target discovery and systems biology since localization and biological function are highly correlated. In recent years, numerous computational prediction methods have been developed. Nevertheless, there is still a need for prediction methods that show more robustness and higher accuracy. RESULTS: We extended our previous MultiLoc predictor by incorporating phylogenetic profiles and Gene Ontology terms. Two different datasets were used for training the system, resulting in two versions of this high-accuracy prediction method. One version is specialized for globular proteins and predicts up to five localizations, whereas a second version covers all eleven main eukaryotic subcellular localizations. In a benchmark study with five localizations, MultiLoc2 performs considerably better than other methods for animal and plant proteins and comparably for fungal proteins. Furthermore, MultiLoc2 performs clearly better when using a second dataset that extends the benchmark study to all eleven main eukaryotic subcellular localizations. CONCLUSION: MultiLoc2 is an extensive high-performance subcellular protein localization prediction system. By incorporating phylogenetic profiles and Gene Ontology terms, MultiLoc2 yields higher accuracies than its previous version. Moreover, it outperforms other prediction systems in two benchmark studies. MultiLoc2 is available as a user-friendly, free web service at: http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc2.


Subjects
Computational Biology/methods; Phylogeny; Proteins/analysis; Proteins/chemistry; Proteomics/methods; Software; Databases, Protein
7.
J Proteome Res ; 8(11): 5363-6, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19764776

ABSTRACT

SherLoc2 is a comprehensive high-accuracy subcellular localization prediction system. It is applicable to animal, fungal, and plant proteins and covers all main eukaryotic subcellular locations. SherLoc2 integrates several sequence-based features as well as text-based features. In addition, we incorporate phylogenetic profiles and Gene Ontology (GO) terms derived from the protein sequence to considerably improve the prediction performance. SherLoc2 achieves an overall classification accuracy of up to 93% in 5-fold cross-validation. A novel feature, DiaLoc, allows users to manually provide their current background knowledge by describing a protein in a short abstract which is then used to improve the prediction. SherLoc2 is available both as a free Web service and as a stand-alone version at http://www-bs.informatik.uni-tuebingen.de/Services/SherLoc2.


Subjects
Fungal Proteins; Plant Proteins; Proteins; Software; Subcellular Fractions/chemistry; Animals; Fungal Proteins/analysis; Fungal Proteins/classification; Phylogeny; Plant Proteins/analysis; Plant Proteins/classification; Proteins/analysis; Proteins/classification; Reproducibility of Results
8.
BMC Bioinformatics ; 10 Suppl 1: S61, 2009 Jan 30.
Article in English | MEDLINE | ID: mdl-19208165

ABSTRACT

BACKGROUND: The COMPARABILITY EDITING problem appears in the context of hierarchical disease classification based on noisy data. We are given a directed graph G representing hierarchical relationships between patient subgroups. The task is to identify the minimum number of edge insertions or deletions needed to transform G into a transitive graph, that is, a graph in which edge (u, w) is present whenever edges (u, v) and (v, w) are present. RESULTS: We present two new approaches for the problem based on fixed-parameter algorithmics and integer linear programming. In contrast to previously used heuristics, our approaches compute provably optimal solutions. CONCLUSION: Our computational results demonstrate that our exact algorithms are far more efficient in practice than a previously used heuristic approach. In addition to the superior running time performance, our algorithms are capable of enumerating all optimal solutions, and naturally solve the weighted version of the problem.
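
To make the problem statement concrete, the following sketch checks transitivity and finds a minimum edit set by exhaustive search on a very small graph. It is not the fixed-parameter or integer linear programming approach from the paper, and it runs in exponential time. The function names are hypothetical, and the code assumes a simple digraph without self-loops, so the transitivity condition is only checked for u different from w.

```python
from itertools import combinations, permutations


def is_transitive(edges):
    """True if (u, v) and (v, w) in `edges` implies (u, w) in `edges`.
    Assumption: no self-loops, so pairs with u == w are skipped."""
    return all(
        (u, w) in edges
        for (u, v) in edges
        for (x, w) in edges
        if v == x and u != w
    )


def min_transitivity_edits(vertices, edges):
    """Smallest number of edge insertions/deletions that makes the graph
    transitive, found by brute force (exponential; tiny inputs only)."""
    edges = frozenset(edges)
    # Every ordered pair of distinct vertices is a candidate edge to flip.
    candidates = list(permutations(vertices, 2))
    for size in range(len(candidates) + 1):
        for flips in combinations(candidates, size):
            # Flipping a pair deletes it if present and inserts it otherwise.
            edited = edges.symmetric_difference(flips)
            if is_transitive(edited):
                return size, set(flips)


# The path a -> b -> c lacks the edge (a, c), so one edit is needed.
path_cost, path_edits = min_transitivity_edits("abc", {("a", "b"), ("b", "c")})
```

The search always terminates because deleting every edge yields a transitive graph. Several edit sets of the same size may exist, and this sketch returns only the first one it encounters.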


Subjects
Algorithms; Molecular Diagnostic Techniques/methods; Humans; Molecular Diagnostic Techniques/statistics & numerical data; Programming, Linear; Software