Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 7 de 7
Filtrar
1.
Adv Bioinformatics ; : 787128, 2009.
Artigo em Inglês | MEDLINE | ID: mdl-20182643

RESUMO

A protein network shows physical interactions as well as functional associations. An important usage of such networks is to discover unknown members of partially known complexes and pathways. A number of methods exist for such analyses, and they can be divided into two main categories based on their treatment of highly connected proteins. In this paper, we show that methods that are not affected by the degree (number of linkages) of a protein give more accurate predictions for certain complexes and pathways. We propose a network flow-based technique to compute the association probability of a pair of proteins. We extend the proposed technique using hierarchical clustering in order to scale well with the size of proteome. We also show that top-k queries are not suitable for a large number of cases, and threshold queries are more meaningful in these cases. Network flow technique with clustering is able to optimize meaningful threshold queries and answer them with high efficiency compared to a similar method that uses Monte Carlo simulation.

2.
Bioinformatics ; 22(13): 1585-92, 2006 Jul 01.
Artigo em Inglês | MEDLINE | ID: mdl-16595556

RESUMO

MOTIVATION: A global view of the protein space is essential for functional and evolutionary analysis of proteins. In order to achieve this, a similarity network can be built using pairwise relationships among proteins. However, existing similarity networks employ a single similarity measure and therefore their utility depends highly on the quality of the selected measure. A more robust representation of the protein space can be realized if multiple sources of information are used. RESULTS: We propose a novel approach for analyzing multi-attribute similarity networks by combining random walks on graphs with Bayesian theory. A multi-attribute network is created by combining sequence and structure based similarity measures. For each attribute of the similarity network, one can compute a measure of affinity from a given protein to every other protein in the network using random walks. This process makes use of the implicit clustering information of the similarity network, and we show that it is superior to naive, local ranking methods. We then combine the computed affinities using a Bayesian framework. In particular, when we train a Bayesian model for automated classification of a novel protein, we achieve high classification accuracy and outperform single attribute networks. In addition, we demonstrate the effectiveness of our technique by comparison with a competing kernel-based information integration approach.


Assuntos
Biologia Computacional/métodos , Proteínas/química , Algoritmos , Automação , Teorema de Bayes , Análise por Conglomerados , Bases de Dados de Proteínas , Evolução Molecular , Modelos Estatísticos , Probabilidade , Conformação Proteica , Dobramento de Proteína , Análise de Sequência de Proteína
3.
J Bioinform Comput Biol ; 3(3): 717-42, 2005 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-16108091

RESUMO

We propose a novel technique for automatically generating the SCOP classification of a protein structure with high accuracy. We achieve accurate classification by combining the decisions of multiple methods using the consensus of a committee (or an ensemble) classifier. Our technique, based on decision trees, is rooted in machine learning which shows that by judicially employing component classifiers, an ensemble classifier can be constructed to outperform its components. We use two sequence- and three structure-comparison tools as component classifiers. Given a protein structure and using the joint hypothesis, we first determine if the protein belongs to an existing category (family, superfamily, fold) in the SCOP hierarchy. For the proteins that are predicted as members of the existing categories, we compute their family-, superfamily-, and fold-level classifications using the consensus classifier. We show that we can significantly improve the classification accuracy compared to the individual component classifiers. In particular, we achieve error rates that are 3-12 times less than the individual classifiers' error rates at the family level, 1.5-4.5 times less at the superfamily level, and 1.1-2.4 times less at the fold level.


Assuntos
Algoritmos , Inteligência Artificial , Árvores de Decisões , Proteínas/química , Proteínas/classificação , Alinhamento de Sequência/métodos , Análise de Sequência de Proteína/métodos , Sequência de Aminoácidos , Simulação por Computador , Bases de Dados de Proteínas , Armazenamento e Recuperação da Informação/métodos , Modelos Químicos , Modelos Moleculares , Dados de Sequência Molecular , Reconhecimento Automatizado de Padrão/métodos , Proteínas/análise , Integração de Sistemas
4.
J Bioinform Comput Biol ; 2(1): 99-126, 2004 Mar.
Artigo em Inglês | MEDLINE | ID: mdl-15272435

RESUMO

We propose new methods for finding similarities in protein structure databases. These methods extract feature vectors on triplets of SSEs (Secondary Structure Elements) of proteins. The feature vectors are then indexed using a multidimensional index structure. Our first technique considers the problem of finding proteins similar to a given query protein in a protein dataset. It quickly finds promising proteins using the index structure. These proteins are then aligned to the query protein using a popular pairwise alignment tool such as VAST. We also develop a novel statistical model to estimate the goodness of a match using the SSEs. Our second technique considers the problem of joining two protein datasets to find an all-to-all similarity. Experimental results show that our techniques improve the pruning time of VAST 3 to 3.5 times, while keeping the sensitivity similar. Our technique can also be incorporated with DALI and CE to improve their running times by a factor of 2 and 2.7 respectively. The software is available online at http://bioserver.cs.ucsb.edu/.


Assuntos
Algoritmos , Bases de Dados de Proteínas , Proteínas/química , Alinhamento de Sequência/métodos , Análise de Sequência de Proteína/métodos , Sequência de Aminoácidos , Análise por Conglomerados , Dados de Sequência Molecular , Reconhecimento Automatizado de Padrão , Conformação Proteica , Reprodutibilidade dos Testes , Sensibilidade e Especificidade , Homologia de Sequência de Aminoácidos
5.
Artigo em Inglês | MEDLINE | ID: mdl-16448016

RESUMO

We propose a novel technique for automatically generating the SCOP classification of a protein structure with high accuracy. High accuracy is achieved by combining the decisions of multiple methods using the consensus of a committee (or an ensemble) classifier. Our technique is rooted in machine learning which shows that by judicially employing component classifiers, an ensemble classifier can be constructed to outperform its components. We use two sequence- and three structure-comparison tools as component classifiers. Given a protein structure, using the joint hypothesis, we first determine if the protein belongs to an existing category (family, superfamily, fold) in the SCOP hierarchy. For the proteins that are predicted as members of the existing categories, we compute their family-, superfamily-, and fold-level classifications using the consensus classifier. We show that we can significantly improve the classification accuracy compared to the individual component classifiers. In particular, we achieve error rates that are 3-12 times less than the individual classifiers' error rates at the family level, 1.5-4.5 times less at the superfamily level, and 1.1-2.4 times less at the fold level.


Assuntos
Algoritmos , Inteligência Artificial , Técnicas de Apoio para a Decisão , Reconhecimento Automatizado de Padrão/métodos , Proteínas/química , Proteínas/classificação , Análise de Sequência de Proteína/métodos , Sequência de Aminoácidos , Análise por Conglomerados , Sequência Consenso , Dados de Sequência Molecular , Proteínas/análise
6.
Bioinformatics ; 19 Suppl 1: i81-3, 2003.
Artigo em Inglês | MEDLINE | ID: mdl-12855441

RESUMO

MOTIVATION: We consider the problem of finding similarities in protein structure databases. Current techniques sequentially compare the given query protein to all of the proteins in the database to find similarities. Therefore, the cost of similarity queries increases linearly as the volume of the protein databases increase. As the sizes of experimentally determined and theoretically estimated protein structure databases grow, there is a need for scalable searching techniques. RESULTS: Our techniques extract feature vectors on triplets of SSEs (Secondary Structure Elements). Later, these feature vectors are indexed using a multidimensional index structure. For a given query protein, this index structure is used to quickly prune away unpromising proteins in the database. The remaining proteins are then aligned using a popular alignment tool such as VAST. We also develop a novel statistical model to estimate the goodness of a match using the SSEs. Experimental results show that our techniques improve the pruning time of VAST 3 to 3.5 times while maintaining similar sensitivity.


Assuntos
Algoritmos , Bases de Dados de Proteínas , Armazenamento e Recuperação da Informação/métodos , Proteínas/química , Proteínas/classificação , Alinhamento de Sequência/métodos , Análise de Sequência de Proteína/métodos , Sequência de Aminoácidos , Modelos Químicos , Modelos Moleculares , Dados de Sequência Molecular , Conformação Proteica , Homologia de Sequência de Aminoácidos , Software
7.
Artigo em Inglês | MEDLINE | ID: mdl-16452789

RESUMO

We propose two methods for finding similarities in protein structure databases. Our techniques extract feature vectors on triplets of SSEs (Secondary Structure Elements) of proteins. These feature vectors are then indexed using a multidimensional index structure. Our first technique considers the problem of finding proteins similar to a given query protein in a protein dataset. This technique quickly finds promising proteins using the index structure. These proteins are then aligned to the query protein using a popular pairwise alignment tool such as VAST. We also develop a novel statistical model to estimate the goodness of a match using the SSEs. Our second technique considers the problem of joining two protein datasets to find an all-to-all similarity. Experimental results show that our techniques improve the pruning time of VAST 3 to 3.5 times while keeping the sensitivity similar.


Assuntos
Algoritmos , Bases de Dados de Proteínas , Armazenamento e Recuperação da Informação/métodos , Proteínas/química , Proteínas/classificação , Alinhamento de Sequência/métodos , Análise de Sequência de Proteína/métodos , Sequência de Aminoácidos , Inteligência Artificial , Sistemas de Gerenciamento de Base de Dados , Dados de Sequência Molecular , Reconhecimento Automatizado de Padrão/métodos
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...