1.
Genomics; 112(1): 174-183, 2020 Jan.
Article in English | MEDLINE | ID: mdl-30660789

ABSTRACT

Protein complexes are among the most important functional units driving biological processes within the cell. Experimental methods have provided valuable data for inferring protein complexes, but these methods have inherent limitations. In view of these limitations, many computational methods have been proposed in the last decade to predict protein complexes. Almost all of these in-silico methods predict protein complexes from the ever-increasing protein-protein interaction (PPI) data. These computational approaches usually take the PPI data as input in the form of a large protein-protein interaction network (PPIN) and output various sub-networks of the given PPIN as the predicted protein complexes. Some of these methods have already reached promising efficiency in protein complex detection. Nonetheless, challenges remain in predicting other types of protein complexes, especially sparse and small ones. New methods should further incorporate knowledge of the biological properties of proteins to improve performance. Additionally, several challenges should be considered more effectively when designing new complex prediction algorithms. This article not only reviews the history of computational protein complex prediction but also provides new insight for improving new methodologies. The most important computational methods for protein complex prediction are evaluated and compared. In addition, some of the challenges in the reconstruction of protein complexes are discussed. Finally, various tools for protein complex prediction and PPIN analysis, as well as the current high-throughput databases, are reviewed.
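The input/output shape described in this abstract (a PPIN in, sub-networks out) can be sketched with a deliberately simple stand-in. The example below treats maximal cliques of at least three proteins as candidate complexes, using the Bron-Kerbosch algorithm. This is an illustrative assumption, not one of the methods the review evaluates, and the function names and toy edge list are hypothetical.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {protein: set(partners)} from PPI pairs."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions in this toy sketch
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj):
    """Yield every maximal clique (Bron-Kerbosch, no pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


def predict_complexes(edges, min_size=3):
    """Return maximal cliques with at least min_size proteins as candidate complexes."""
    adj = build_adjacency(edges)
    cliques = [sorted(c) for c in maximal_cliques(adj) if len(c) >= min_size]
    return sorted(cliques, key=lambda c: (-len(c), c))


# Hypothetical toy network: two triangles joined by a single interaction.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
             ("D", "E"), ("E", "F"), ("D", "F")]
toy_complexes = predict_complexes(toy_edges)
```

A clique criterion only finds fully connected groups, so it would miss exactly the sparse and small complexes that the abstract identifies as the hard cases.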


Subjects
Multiprotein Complexes/metabolism; Protein Interaction Mapping; Computational Biology/methods; Software
2.
Bioinformatics; 32(14): 2205-7, 2016 Jul 15.
Article in English | MEDLINE | ID: mdl-27153639

ABSTRACT

We present a new R package for training gapped-kmer SVM classifiers for DNA and protein sequences. We describe an improved algorithm for kernel matrix calculation that reduces run time by about 2- to 5-fold compared with our original gkmSVM algorithm. This package supports several sequence kernels, including gkmSVM, kmer-SVM, the mismatch kernel and the wildcard kernel. AVAILABILITY AND IMPLEMENTATION: The gkmSVM package is freely available through the Comprehensive R Archive Network (CRAN) for Linux, Mac OS and Windows platforms. The C++ implementation is available at www.beerlab.org/gkmsvm. CONTACT: mghandi@gmail.com or mbeer@jhu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Sequence Analysis, DNA/methods; Sequence Analysis, Protein/methods; Software; Support Vector Machine; Algorithms
3.
Genomics; 104(6 Pt B): 496-503, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25458812

ABSTRACT

Protein-protein interaction (PPI) detection is one of the central goals of functional genomics and systems biology. Knowledge about the nature of PPIs can help fill the widening gap between sequence information and functional annotations. Although experimental methods have produced valuable PPI data, they also suffer from significant limitations. Computational PPI prediction methods have therefore attracted tremendous attention. Despite considerable efforts, PPI prediction is still in its infancy in complex multicellular organisms such as humans. Here, we propose a novel ensemble learning method, LocFuse, which is useful in human PPI prediction. This method uses eight different genomic and proteomic features along with four different types of classifiers. The prediction performance of this classifier selection method was found to be considerably better than that of previously employed methods. This confirms the complex nature of the PPI prediction problem and also the necessity of using biological information for classifier fusion. LocFuse is available at: http://lbb.ut.ac.ir/Download/LBBsoft/LocFuse. BIOLOGICAL SIGNIFICANCE: The results revealed that if we divide proteome space according to the cellular localization of proteins, the utility of some classifiers in PPI prediction can be improved. Therefore, to predict the interaction for any given protein pair, we can select the most accurate classifier with regard to the cellular localization information. Based on the results, the importance of different features for PPI prediction varies between differently localized proteins; however, in general, our novel features, which were extracted from position-specific scoring matrices (PSSMs), are the most important ones, and the Random Forest (RF) classifier performs best in most cases. LocFuse was developed with a user-friendly graphical interface and is freely available for Linux, Mac OSX and MS Windows operating systems.
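The classifier selection idea in this abstract (pick the most accurate classifier for the cellular localization of a protein pair) can be sketched as a lookup built from per-localization validation scores. This is a minimal illustration under stated assumptions, not the LocFuse implementation: it assumes one localization label per protein pair, the accuracy values are made up, and the two "classifiers" are trivial stand-in functions.

```python
def select_classifiers(validation_accuracy):
    """For each localization, keep the classifier name with the highest accuracy."""
    return {
        localization: max(scores, key=scores.get)
        for localization, scores in validation_accuracy.items()
    }


def predict_interaction(features, localization, selected, classifiers, fallback):
    """Route a protein pair to the classifier selected for its localization.

    Pairs with a localization not seen during selection use the fallback classifier.
    Returns (classifier name, predicted label).
    """
    name = selected.get(localization, fallback)
    return name, classifiers[name](features)


# Made-up validation accuracies, for illustration only (not results from the paper).
toy_accuracy = {
    "nucleus": {"random_forest": 0.81, "svm": 0.77},
    "membrane": {"random_forest": 0.70, "svm": 0.74},
}

# Stand-in classifiers: each maps a feature vector to True (interacts) or False.
toy_classifiers = {
    "random_forest": lambda x: sum(x) > 1.0,
    "svm": lambda x: x[0] > 0.5,
}

selected = select_classifiers(toy_accuracy)
```

In practice the stand-in functions would be replaced by trained models, and the selection table would be estimated on held-out data for each localization class.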


Subjects
Metabolomics/methods; Protein Processing, Post-Translational; Proteome/metabolism; Proteomics/methods; Software; Artificial Intelligence; Humans; Protein Binding; Protein Transport
4.
PLoS Comput Biol; 10(7): e1003711, 2014 Jul.
Article in English | MEDLINE | ID: mdl-25033408

ABSTRACT

Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome-wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue-specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem.
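To make the gapped k-mer feature set concrete, the sketch below counts words of total length l with k informative positions (the remaining positions are written as "-") and compares two sequences by a cosine-normalized dot product of those counts. It is a naive enumeration written for illustration; it is not the tree-based kernel computation described in the abstract, and the function names, the "-" gap symbol and the choice of normalization are assumptions.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, l, k):
    """Count gapped k-mers: every length-l window, keeping k of its l positions."""
    counts = Counter()
    position_sets = [set(keep) for keep in combinations(range(l), k)]
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for keep in position_sets:
            # Positions outside `keep` are masked with the gap symbol.
            pattern = "".join(c if i in keep else "-" for i, c in enumerate(window))
            counts[pattern] += 1
    return counts


def gapped_kmer_similarity(a, b, l, k):
    """Cosine similarity between the gapped k-mer count vectors of two sequences."""
    ca = gapped_kmer_counts(a, l, k)
    cb = gapped_kmer_counts(b, l, k)
    dot = sum(count * cb[pattern] for pattern, count in ca.items())
    norm = (sum(v * v for v in ca.values()) * sum(v * v for v in cb.values())) ** 0.5
    return dot / norm if norm else 0.0


example_counts = gapped_kmer_counts("ACGT", l=3, k=2)
```

For "ACGT" with l=3 and k=2, the two windows "ACG" and "CGT" each contribute three patterns (for example "A-G" from "ACG"), so a single mismatch between two sequences still leaves shared features, which is the robustness the abstract appeals to.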


Subjects
Computational Biology/methods; Models, Genetic; Regulatory Sequences, Nucleic Acid/genetics; Sequence Analysis, DNA/methods; Base Sequence; Bayes Theorem; Chromatin Immunoprecipitation; Oligonucleotides/genetics; Organ Specificity/genetics; Support Vector Machine
5.
Bioinformatics; 30(9): 1250-8, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24407223

ABSTRACT

MOTIVATION: RNAs play fundamental roles in cellular processes. The function of an RNA is highly dependent on its 3D conformation, which is referred to as the RNA tertiary structure. Because the prediction or experimental determination of these structures is difficult, many works focus on the problems associated with the RNA secondary structure. Here, we consider the RNA inverse folding problem, in which an RNA secondary structure is given as a target structure and the goal is to design an RNA sequence that folds into the target structure. In this article, we introduce a new evolutionary algorithm for the RNA inverse folding problem. Our algorithm, called Evolutionary RNA Design, generates a sequence whose minimum free energy structure is the same as the target structure. RESULTS: We compare our algorithm with the INFO-RNA, MODENA, RNAiFold and NUPACK approaches on some biological test sets. The results presented in this article indicate that for longer structures, our algorithm performs better than the other algorithms in terms of energy range, accuracy, speedup and nucleotide distribution. In particular, the RNA sequences generated by our method are much more reliable and more similar to natural RNA sequences.
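One small ingredient of any inverse folding approach can be sketched without a folding engine: reading the target structure and checking that a candidate sequence places pairable bases at every paired position. The example below assumes the target is given in dot-bracket notation and allows Watson-Crick and G-U wobble pairs. It is not the evolutionary algorithm from the paper, and compatibility is only a necessary condition; confirming that the minimum free energy structure matches the target would require an actual folding tool.

```python
import random

# Base pairs treated as allowed in this sketch: Watson-Crick plus G-U wobble.
CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                   ("C", "G"), ("G", "U"), ("U", "G")}


def pair_table(structure):
    """Map each paired position to its partner for a dot-bracket structure."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def is_compatible(seq, structure):
    """True if every paired position in the structure holds an allowed base pair."""
    if len(seq) != len(structure):
        return False
    pairs = pair_table(structure)
    return all((seq[i], seq[j]) in CANONICAL_PAIRS
               for i, j in pairs.items() if i < j)


def random_compatible_sequence(structure, rng):
    """Draw a random sequence whose paired positions are complementary."""
    pairs = pair_table(structure)
    seq = [rng.choice("ACGU") for _ in structure]
    for i, j in pairs.items():
        if i < j:
            seq[i], seq[j] = rng.choice(sorted(CANONICAL_PAIRS))
    return "".join(seq)


candidate = random_compatible_sequence("((..))", random.Random(0))
```

A design loop would typically start from such a compatible sequence, then mutate and re-evaluate candidates against the folded structure.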


Subjects
RNA/chemistry; Sequence Analysis, RNA/methods; Algorithms; Nucleic Acid Conformation; RNA Folding; Software
6.
J Math Biol; 69(2): 469-500, 2014 Aug.
Article in English | MEDLINE | ID: mdl-23861010

ABSTRACT

Oligomers of fixed length, k, commonly known as k-mers, are often used as fundamental elements in the description of DNA sequence features of diverse biological function, or as intermediate elements in the construction of more complex descriptors of sequence features such as position weight matrices. k-mers are very useful as general sequence features because they constitute a complete and unbiased feature set, and do not require parameterization based on incomplete knowledge of biological mechanisms. However, a fundamental limitation in the use of k-mers as sequence features is that as k is increased, larger spatial correlations in DNA sequence elements can be described, but the frequency of observing any specific k-mer becomes very small, and the k-mer counts rapidly approach a sparse matrix of binary values. Thus any statistical learning approach using k-mers will be susceptible to noisy estimation of k-mer frequencies once k becomes large. Because all molecular DNA interactions have limited spatial extent, gapped k-mers often carry the relevant biological signal. Here we use gapped k-mer counts to more robustly estimate the ungapped k-mer frequencies, by deriving an equation for the minimum norm estimate of k-mer frequencies given an observed set of gapped k-mer frequencies. We demonstrate that this approach provides a more accurate estimate of the k-mer frequencies in real biological sequences using a sample of CTCF binding sites in the human genome.
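The sparsity limitation described here is easy to see numerically: a DNA sequence of fixed length can contain only a limited number of distinct k-mers, while the number of possible k-mers grows as 4^k. The sketch below computes the fraction of possible k-mers that are actually observed. It illustrates the problem only, not the minimum norm estimator derived in the paper, and the function name and toy sequence are assumptions.

```python
from collections import Counter


def kmer_coverage(seq, k, alphabet_size=4):
    """Return (k-mer counts, fraction of all possible k-mers observed at least once)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return counts, len(counts) / alphabet_size ** k


# Toy 16-nucleotide sequence; real training sets are longer, but the trend is the same.
toy_sequence = "ACGT" * 4
coverage_by_k = {k: kmer_coverage(toy_sequence, k)[1] for k in (1, 2, 3)}
```

For this toy sequence the observed fraction falls from 1.0 at k=1 to 0.25 at k=2 and 0.0625 at k=3, because only four distinct k-mers ever occur while the number of possible k-mers quadruples with each increment of k.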


Subjects
DNA/chemistry; Genome, Human; Transcription Factors/chemistry; Binding Sites; Humans
7.
Genomics; 102(4): 237-42, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23747746

ABSTRACT

Protein-protein interactions regulate a variety of cellular processes. Because experimental techniques have many limitations, there is a great need for computational methods that complement them in predicting protein interactions. Here, we introduce a novel evolutionary-based feature extraction algorithm for protein-protein interaction (PPI) prediction. The algorithm is called PPIevo and extracts the evolutionary feature from the Position-Specific Scoring Matrix (PSSM) of a protein with known sequence. The algorithm does not depend on protein annotations, and the features are based on the evolutionary history of the proteins. This gives the algorithm more power for predicting protein-protein interactions than many sequence-based algorithms. Results on the HPRD database show better performance and robustness of the proposed method. They also reveal that negative dataset selection can lead to an acute overestimation of performance, which is the principal drawback of the available methods.


Subjects
Computational Biology/methods; Evolution, Molecular; Position-Specific Scoring Matrices; Protein Interaction Maps; Proteins/chemistry; Proteins/metabolism; Algorithms; Amino Acid Sequence; Artificial Intelligence; Databases, Protein; Humans; Phylogeny