1.
IEEE/ACM Trans Comput Biol Bioinform ; 16(5): 1702-1711, 2019.
Article in English | MEDLINE | ID: mdl-28678711

ABSTRACT

We consider the problem of sorting signed permutations by reversals, transpositions, transreversals, and block-interchanges and give a 2-approximation scheme, called the GSB (Genome Sorting by Bridges) scheme. Our result extends the 2-approximation algorithm of He and Chen [12] that allowed only reversals and block-interchanges, and also the 1.5-approximation algorithm of Hartman and Sharan [11] that allowed only transreversals and transpositions. We prove this result by introducing three bridge structures in the breakpoint graph, namely, the L-bridge, T-bridge, and X-bridge, and show that they model "proper" reversals, transpositions, transreversals, and block-interchanges, respectively. We show that we can always find at least one of these three bridges in any breakpoint graph, thus giving an upper bound on the number of operations needed. We prove a lower bound on the distance and use it to show that GSB has a 2-approximation ratio. An $O(n^3)$ algorithm called GSB-I that is based on the GSB approximation scheme presented in this paper has recently been published by Yu, Hao, and Leong in [17]. We note that our 2-approximation scheme admits many possible implementations by varying the order in which we search for proper rearrangement operations.
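
To make the rearrangement operations named in this abstract concrete, here is a minimal Python sketch of a signed reversal and a transposition on a signed permutation, plus a breakpoint count under one common definition (an adjacent pair a, b in the framed permutation with b - a != 1). The function names and the list representation are assumptions made for illustration; this is not the GSB scheme and does not build the breakpoint graph or search for bridges.

```python
def reversal(perm, i, j):
    """Reverse the segment perm[i..j] (inclusive) and flip the sign of each element."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def transposition(perm, i, j, k):
    """Exchange the two adjacent blocks perm[i:j] and perm[j:k]."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def count_breakpoints(perm):
    """Count adjacent pairs (a, b) with b - a != 1 after framing with 0 and n + 1."""
    framed = [0] + perm + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


if __name__ == "__main__":
    perm = [3, -2, -1, 4]
    print(count_breakpoints(perm))   # 3
    step = reversal(perm, 0, 2)      # [1, 2, -3, 4]
    done = reversal(step, 2, 2)      # [1, 2, 3, 4]
    print(step, done, count_breakpoints(done))
```

In this toy example two reversals sort the permutation; an approximation scheme such as the one described above bounds how far the number of operations it uses can be from the optimum.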


Subjects
Gene Rearrangement/genetics , Genome/genetics , Genomics/methods , Algorithms , Models, Genetic
2.
J Bioinform Comput Biol ; 16(3): 1840010, 2018 06.
Article in English | MEDLINE | ID: mdl-29566638

ABSTRACT

The accurate detection of genomic islands (GIs) in microbial genomes is important for both evolutionary study and medical research, because GIs may promote genome evolution and contain genes involved in pathogenesis. Various computational methods have been developed to predict GIs over the years. However, most of them cannot make full use of GI-associated features to achieve desirable performance. Additionally, many methods cannot be directly applied to newly sequenced genomes. We develop a new method called GI-Cluster, which provides an effective way to integrate multiple GI-related features via consensus clustering. GI-Cluster does not require training datasets or existing genome annotations, but it can still achieve comparable or better performance than supervised learning methods in comprehensive evaluations. Moreover, GI-Cluster is widely applicable, either to complete and incomplete genomes or to initial GI predictions from other programs. GI-Cluster also provides plots to visualize the distribution of predicted GIs and related features. GI-Cluster is available at https://github.com/icelu/GI_Cluster.


Subjects
Cluster Analysis , Computational Biology/methods , Genomic Islands , Genomics/methods , Databases, Genetic , Gene Transfer, Horizontal , Genome , Salmonella typhi/genetics , Vibrio cholerae/genetics
3.
Gene ; 626: 132-139, 2017 Aug 30.
Article in English | MEDLINE | ID: mdl-28512059

ABSTRACT

The first genome-scale metabolic network of Cordyceps militaris (iWV1170) was constructed to represent its whole metabolism. It consists of 894 metabolites and 1,267 metabolic reactions across five compartments: the plasma membrane, cytoplasm, mitochondria, peroxisome and extracellular space. The iWV1170 network could be exploited to explain the organism's phenotypes of growth ability and its production of cordycepin and other metabolites on various substrates. The C. militaris genome contains a high number of genes encoding extracellular enzymes for the degradation of complex carbohydrates, lipids and proteins. By comparative genome-scale analysis, the adenine metabolic pathway towards putative cordycepin biosynthesis was reconstructed, indicating evolutionary relationships across eleven species of entomopathogenic fungi. The overall metabolic routes involved in the putative cordycepin biosynthesis were also identified in C. militaris, including central carbon metabolism, amino acid metabolism (glycine, L-glutamine and L-aspartate) and nucleotide metabolism (adenosine and adenine). Interestingly, a lack of the sequence coding for ribonucleotide reductase inhibitor was observed in C. militaris, which might contribute to its over-production of cordycepin.


Subjects
Cordyceps/genetics , Genome, Fungal , Metabolic Networks and Pathways , Cordyceps/metabolism , Cordyceps/pathogenicity , Deoxyadenosines/biosynthesis , Deoxyadenosines/genetics , Fungal Proteins/genetics , Fungal Proteins/metabolism , Hydrolases/genetics , Hydrolases/metabolism , Ribonucleotide Reductases/genetics , Ribonucleotide Reductases/metabolism , Virulence/genetics
4.
BMC Genomics ; 18(Suppl 2): 111, 2017 03 14.
Article in English | MEDLINE | ID: mdl-28361712

ABSTRACT

BACKGROUND: Over the past two decades, phylogenetic networks have been studied to model reticulate evolutionary events. The relationships among phylogenetic networks, phylogenetic trees and clusters serve as the basis for reconstruction and comparison of phylogenetic networks. To understand these relationships, two problems are raised: the tree containment problem, which asks whether a phylogenetic tree is displayed in a phylogenetic network, and the cluster containment problem, which asks whether a cluster is represented at a node in a phylogenetic network. Both problems are NP-complete. RESULTS: A fast exponential-time algorithm for the cluster containment problem on arbitrary networks is developed and implemented in C. The resulting program is further extended into a computer program for fast computation of the Soft Robinson-Foulds distance between phylogenetic networks. CONCLUSIONS: Two computer programs are developed for facilitating reconstruction and validation of phylogenetic network models in evolutionary and comparative genomics. Our simulation tests indicate that they are fast enough for use in practice. Additionally, our simulation data suggest that the distribution of the Soft Robinson-Foulds distance between phylogenetic networks is unlikely to be normal.


Subjects
Algorithms , Computational Biology/statistics & numerical data , Models, Genetic , Phylogeny , Software , Animals , Biological Evolution , Culicidae/classification , Culicidae/genetics , Plant Proteins/genetics , Poaceae/classification , Poaceae/genetics , RNA, Double-Stranded/genetics , RNA, Fungal/genetics , Rhizoctonia/classification , Rhizoctonia/genetics
5.
Comput Struct Biotechnol J ; 14: 200-6, 2016.
Article in English | MEDLINE | ID: mdl-27293536

ABSTRACT

Clusters of genes acquired by lateral gene transfer in microbial genomes are broadly referred to as genomic islands (GIs). GIs often carry genes important for genome evolution and adaptation to niches, such as genes involved in pathogenesis and antibiotic resistance. Therefore, GI prediction has gradually become an important part of microbial genome analysis. Despite inherent difficulties in identifying GIs, many computational methods have been developed and show good performance. In this mini-review, we first summarize the general challenges in predicting GIs. Then we group existing GI detection methods by their input, briefly describe representative methods in each group, and discuss their advantages as well as limitations. Finally, we look into potential improvements for better GI prediction.

6.
J Bioinform Comput Biol ; 14(1): 1640003, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26907990

ABSTRACT

Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaptation and enable antibiotic resistance. Many methods have been proposed to predict GIs, but most of them rely on either annotations or comparisons with other closely related genomes. Hence these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods to detect GIs based solely on the sequence of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on a one-class support vector machine (SVM), utilizing composition bias in terms of k-mer content. In our evaluations on three real genomes, GI-SVM achieves higher recall than current methods, without much loss of precision. In addition, GI-SVM allows flexible parameter tuning to get optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GIs in newly sequenced genomes.
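
The abstract describes composition bias in terms of k-mer content as the signal used for detection. As a rough illustration of that feature-extraction idea only, the Python sketch below computes normalized k-mer frequency vectors for sliding windows of a sequence. The window size, step, function names, and the restriction to the ACGT alphabet are all assumptions made for this example; the one-class SVM step and the actual GI-SVM parameters are not reproduced here.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k):
    """Return normalized frequencies of all 4**k k-mers over ACGT, in lexicographic order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)  # k-mers containing other symbols are ignored
    return [counts[m] / total if total else 0.0 for m in kmers]


def window_profiles(genome, window, step, k):
    """Return (start, profile) pairs for windows of the given size along the genome."""
    return [(start, kmer_profile(genome[start:start + window], k))
            for start in range(0, len(genome) - window + 1, step)]


if __name__ == "__main__":
    toy_genome = "ACGTACGTGGGGGGGGACGTACGT"  # made-up sequence with a G-rich stretch
    for start, profile in window_profiles(toy_genome, window=8, step=8, k=1):
        print(start, profile)
```

Windows whose profile deviates strongly from the rest of the genome are the kind of region an outlier detector could flag; how GI-SVM scores and merges such windows is described in the paper itself.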


Subjects
Genome, Bacterial , Genomic Islands , Genomics/methods , Support Vector Machine , Corynebacterium diphtheriae/genetics , Gene Transfer, Horizontal , Pseudomonas aeruginosa/genetics , Salmonella typhi/genetics , Sequence Analysis, DNA
7.
J Bioinform Comput Biol ; 14(1): 1640002, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26707923

ABSTRACT

We consider the problem of sorting signed permutations by reversals, transpositions, transreversals, and block-interchanges. The problem arises in the study of species evolution via large-scale genome rearrangement operations. Recently, Hao et al. gave a 2-approximation scheme called genome sorting by bridges (GSB) for solving this problem. Their result extended and unified the results of (i) He and Chen: a 2-approximation algorithm allowing reversals, transpositions, and block-interchanges (by also allowing transreversals) and (ii) Hartman and Sharan: a 1.5-approximation algorithm allowing reversals, transpositions, and transreversals (by also allowing block-interchanges). The GSB result is based on the introduction of three bridge structures in the breakpoint graph, the L-bridge, T-bridge, and X-bridge, that model good reversal, transposition/transreversal, and block-interchange, respectively. However, the paper by Hao et al. focused on proving the 2-approximation GSB scheme and only mentioned a straightforward [Formula: see text] algorithm. In this paper, we give an [Formula: see text] algorithm for implementing the GSB scheme. The key idea behind our faster GSB algorithm is to represent cycles in the breakpoint graph by their canonical sequences, which greatly simplifies the search for these bridge structures. We also give some comparison results (running time and computed distances) against the original GSB implementation.


Subjects
Algorithms , Genomics/methods , Computational Biology/methods , DNA Transposable Elements , Genome , Models, Genetic
8.
J Bioinform Comput Biol ; 13(5): 1543003, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26542446

ABSTRACT

Determining the entire complement of enzymes and their enzymatic functions is a fundamental step for reconstructing the metabolic network of cells. High quality enzyme annotation helps in enhancing metabolic networks reconstructed from the genome, especially by reducing gaps and increasing the enzyme coverage. Currently, structure-based and network-based approaches can only cover a limited number of enzyme families, and the accuracy of homology-based approaches can be further improved. The bottom-up homology-based approach improves the coverage by rebuilding Hidden Markov Model (HMM) profiles for all known enzymes. However, its clustering procedure relies firmly on the BLAST similarity score, ignoring protein domains/patterns, and is sensitive to changes in cut-off thresholds. Here, we use functional domain architecture to score the association between domain families and enzyme families (Domain-Enzyme Association Scoring, DEAS). The DEAS score is used to calculate the similarity between proteins, which is then used in the clustering procedure instead of the sequence similarity score. We improve the enzyme annotation protocol using a stringent classification procedure, and by choosing optimal threshold settings and checking for active sites. Our analysis shows that our stringent protocol EnzDP can cover up to 90% of enzyme families available in Swiss-Prot. It achieves a high accuracy of 94.5% based on five-fold cross-validation. EnzDP outperforms existing methods across several testing scenarios. Thus, EnzDP serves as a reliable automated tool for enzyme annotation and metabolic network reconstruction. Available at: www.comp.nus.edu.sg/~nguyennn/EnzDP.


Subjects
Computational Biology/methods , Enzymes/chemistry , Enzymes/metabolism , Metabolic Networks and Pathways , Catalytic Domain , Cluster Analysis , Databases, Protein , Enzymes/classification , Machine Learning , Markov Chains , Protein Structure, Tertiary , Sequence Alignment , Structural Homology, Protein
9.
Article in English | MEDLINE | ID: mdl-26355510

ABSTRACT

Cancer forms a robust system capable of maintaining stable functioning (cell sustenance and proliferation) despite perturbations. Cancer progresses as stages over time typically with increasing aggressiveness and worsening prognosis. Characterizing these stages and identifying the genes driving transitions between them is critical to understand cancer progression and to develop effective anti-cancer therapies. In this work, we propose a novel model for the 'cancer system' as a Boolean state space in which a Boolean network, built from protein-interaction and gene-expression data from different stages of cancer, transits between Boolean satisfiability states by "editing" interactions and "flipping" genes. Edits reflect rewiring of the PPI network while flipping of genes reflects activation or silencing of genes between stages. We formulate a minimization problem min flip to identify these genes driving the transitions. The application of our model (called BoolSpace) on three case studies (pancreatic and breast tumours in human and post spinal-cord injury (SCI) in rats) reveals valuable insights into the phenomenon of cancer progression: (i) interactions involved in core cell-cycle and DNA-damage repair pathways are significantly rewired in tumours, indicating significant impact to key genome-stabilizing mechanisms; (ii) several of the genes flipped are serine/threonine kinases which act as biological switches, reflecting cellular switching mechanisms between stages; and (iii) different sets of genes are flipped during the initial and final stages indicating a pattern to tumour progression.
Based on these results, we hypothesize that robustness of cancer partly stems from "passing of the baton" between genes at different stages: genes from different biological processes and/or cellular components are involved in different stages of tumour progression, thereby allowing tumour cells to evade targeted therapy, and therefore an effective therapy should target a "cover set" of these genes. A C/C++ implementation of BoolSpace is freely available at: http://www.bioinformatics.org.au/tools-data.


Subjects
Computational Biology/methods , Evolution, Molecular , Models, Genetic , Neoplasms/genetics , Algorithms , Humans , Neoplasms/diagnosis , Neoplasms/therapy , Prognosis , Protein Interaction Maps , Transcriptome
10.
J Bioinform Comput Biol ; 11(6): 1343004, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24372033

ABSTRACT

A synteny block represents a set of contiguous genes located within the same chromosome and well conserved among various species. Through long evolutionary processes and genome rearrangement events, large numbers of synteny blocks remain highly conserved across multiple species. Understanding the distribution of conserved gene blocks helps evolutionary biologists trace the diversity of life, and it also plays an important role in orthologous gene detection and gene annotation in the genomic era. In this work, we focus on collinear synteny detection, in which the order of genes is required to be well conserved among multiple species. To achieve this goal, a suffix-tree-based algorithm for efficiently identifying homologous synteny blocks is proposed. The traditional suffix tree algorithm is modified by treating a chromosome as a string in which each gene is encoded as a symbol. Hence, a suffix tree can be built for different query chromosomes from various species. We can then efficiently search for conserved synteny blocks, which are modeled as overlapped contiguous edges in our suffix tree. In addition, we define a novel Synteny Block Conserved Index (SBCI) to evaluate the relationship of synteny block distribution between two species, which could be applied as an evolutionary indicator for constructing a phylogenetic tree from multiple species without the large computational cost of whole-genome sequence alignment.
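
The encoding described in this abstract, a chromosome as a string whose symbols are genes, can be illustrated with a small Python sketch that finds maximal runs of genes occurring contiguously and in the same order on two chromosomes. Note the swap: for brevity this uses a quadratic dynamic-programming table instead of the suffix tree the paper proposes, and the function name, the `min_len` parameter, and the toy gene labels are assumptions made for illustration.

```python
def common_gene_blocks(chrom_a, chrom_b, min_len=2):
    """Return maximal gene runs that are contiguous and collinear in both chromosomes.

    chrom_a and chrom_b are lists of gene identifiers (one symbol per gene).
    """
    blocks = set()
    prev = [0] * (len(chrom_b) + 1)
    for i in range(1, len(chrom_a) + 1):
        cur = [0] * (len(chrom_b) + 1)
        for j in range(1, len(chrom_b) + 1):
            if chrom_a[i - 1] == chrom_b[j - 1]:
                # Length of the common run ending at gene i-1 of A and gene j-1 of B.
                cur[j] = prev[j - 1] + 1
                extendable = (i < len(chrom_a) and j < len(chrom_b)
                              and chrom_a[i] == chrom_b[j])
                # Record the run only when it cannot be extended to the right.
                if not extendable and cur[j] >= min_len:
                    blocks.add(tuple(chrom_a[i - cur[j]:i]))
        prev = cur
    return sorted(blocks)


if __name__ == "__main__":
    species_a = ["g1", "g2", "g3", "g7", "g4", "g5"]
    species_b = ["g4", "g5", "g9", "g1", "g2", "g3"]
    print(common_gene_blocks(species_a, species_b))
    # [('g1', 'g2', 'g3'), ('g4', 'g5')]
```

A suffix tree answers the same kind of query across many chromosomes at once and in time linear in the input, which is why it is the better fit at genome scale.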


Subjects
Algorithms , Evolution, Molecular , Models, Genetic , Synteny , Animals , Chromosomes , Genome , Humans
11.
J Bioinform Comput Biol ; 11(2): 1230002, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23600810

ABSTRACT

Complexes of physically interacting proteins are one of the fundamental functional units responsible for driving key biological mechanisms within the cell. Their identification is therefore necessary to understand not only complex formation but also the higher level organization of the cell. With the advent of "high-throughput" techniques in molecular biology, a significant amount of physical interaction data has been cataloged from organisms such as yeast, which has in turn fueled computational approaches to systematically mine complexes from the network of physical interactions among proteins (PPI network). In this survey, we review, classify and evaluate some of the key computational methods developed to date for the identification of protein complexes from PPI networks. We present two taxonomies that reflect how these methods have evolved over the years toward improving automated complex prediction. We also discuss some open challenges facing accurate reconstruction of complexes, the crucial ones being the high proportion of errors and noise in current high-throughput datasets and some key aspects overlooked by current complex detection methods. We hope this review will not only help to condense the history of computational complex detection for easy reference but also provide valuable insights to drive further research in this area.


Subjects
Multiprotein Complexes/chemistry , Protein Interaction Maps , Algorithms , Animals , Cluster Analysis , Computational Biology , Databases, Protein/statistics & numerical data , Evolution, Molecular , Humans , Markov Chains , Membrane Proteins/chemistry , Multiprotein Complexes/classification , Multiprotein Complexes/genetics , Protein Interaction Mapping/statistics & numerical data
12.
BMC Bioinformatics ; 14 Suppl 16: S8, 2013.
Article in English | MEDLINE | ID: mdl-24564762

ABSTRACT

BACKGROUND: Protein complexes conserved across species indicate processes that are core to cellular machinery (e.g. cell-cycle or DNA damage-repair complexes conserved across human and yeast). While numerous computational methods have been devised to identify complexes from the protein interaction (PPI) networks of individual species, these are severely limited by noise and errors (false positives) in currently available datasets. Our analysis using human and yeast PPI networks revealed that these methods missed several important complexes including those conserved between the two species (e.g. the MLH1-MSH2-PMS2-PCNA mismatch-repair complex). Here, we note that much of the functionalities of yeast complexes have been conserved in human complexes not only through sequence conservation of proteins but also of critical functional domains. Therefore, integrating information of domain conservation might throw further light on conservation patterns between yeast and human complexes. RESULTS: We identify conserved complexes by constructing an interolog network (IN) leveraging on the functional conservation of proteins between species through domain conservation (from Ensembl) in addition to sequence similarity. We employ 'state-of-the-art' methods to cluster the interolog network, and map these clusters back to the original PPI networks to identify complexes conserved between the species. Evaluation of our IN-based approach (called COCIN) on human and yeast interaction data identifies several additional complexes (76% recall) compared to direct complex detection from the original PINs (54% recall). Our analysis revealed that the IN-construction removes several non-conserved interactions many of which are false positives, thereby improving complex prediction. In fact removing non-conserved interactions from the original PINs also resulted in higher number of conserved complexes, thereby validating our IN-based approach. 
These complexes included the mismatch repair complex, MLH1-MSH2-PMS2-PCNA, and other important ones namely, RNA polymerase-II, EIF3 and MCM complexes, all of which constitute core cellular processes known to be conserved across the two species. CONCLUSIONS: Our method based on integrating domain conservation and sequence similarity to construct interolog networks helps to identify considerably more conserved complexes between the PPI networks from two species compared to direct complex prediction from the PPI networks. We observe from our experiments that protein complexes are not conserved from yeast to human in a straightforward way, that is, it is not the case that a yeast complex is a (proper) sub-set of a human complex with a few additional proteins present in the human complex. Instead complexes have evolved multifold with considerable re-organization of proteins and re-distribution of their functions across complexes. This finding can have significant implications on attempts to extrapolate other kinds of relationships such as synthetic lethality from yeast to human, for example in the identification of novel cancer targets. AVAILABILITY: http://www.comp.nus.edu.sg/~leonghw/COCIN/.


Subjects
Computational Biology/methods , Protein Interaction Mapping/methods , Conserved Sequence , Humans , Proteins/metabolism , Saccharomyces cerevisiae/metabolism
13.
BMC Syst Biol ; 6 Suppl 1: S22, 2012.
Article in English | MEDLINE | ID: mdl-23046607

ABSTRACT

BACKGROUND: Identifying corresponding genes (orthologs) in different species is an important step in genome-wide comparative analysis. In particular, one-to-one correspondences between genes in different species greatly simplify certain problems such as transfer of function annotation and genome rearrangement studies. Positional homologs are the direct descendants of a single ancestral gene in the most recent common ancestor and by definition form one-to-one correspondence. RESULTS: In this work, we present a simple yet effective method (BBH-LS) for the identification of positional homologs from the comparative analysis of two genomes. Our BBH-LS method integrates sequence similarity and gene context similarity in order to get more accurate ortholog assignments. Specifically, BBH-LS applies the bidirectional best hit heuristic to a combination of sequence similarity and gene context similarity scores. CONCLUSION: We applied our method to the human, mouse, and rat genomes and found that BBH-LS produced the best results when using both sequence and gene context information equally. Compared to the state-of-the-art algorithms, such as MSOAR2, BBH-LS is able to identify more positional homologs with fewer false positives.
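
As a rough illustration of the heuristic named in this abstract, the Python sketch below applies bidirectional best hits to a combined score. The linear weighting `alpha * sequence + (1 - alpha) * context`, the dictionary input format, the function name, and the toy scores are assumptions made for this example; the abstract states only that the two similarities are combined and that equal use of both worked best, so this is not the published BBH-LS scoring.

```python
def bidirectional_best_hits(seq_sim, ctx_sim, alpha=0.5):
    """Return gene pairs that are each other's best hit under a combined score.

    seq_sim and ctx_sim map (gene_a, gene_b) pairs to similarity scores;
    alpha weights sequence similarity against gene context similarity.
    """
    combined = {pair: alpha * seq_sim[pair] + (1 - alpha) * ctx_sim.get(pair, 0.0)
                for pair in seq_sim}
    best_for_a, best_for_b = {}, {}
    for (a, b), score in combined.items():
        if a not in best_for_a or score > combined[(a, best_for_a[a])]:
            best_for_a[a] = b
        if b not in best_for_b or score > combined[(best_for_b[b], b)]:
            best_for_b[b] = a
    # Keep a pair only when the best hit holds in both directions.
    return sorted((a, b) for a, b in best_for_a.items() if best_for_b[b] == a)


if __name__ == "__main__":
    # Made-up scores for two genes in each of two genomes.
    seq = {("h1", "m1"): 0.9, ("h1", "m2"): 0.8, ("h2", "m1"): 0.85, ("h2", "m2"): 0.3}
    ctx = {("h1", "m1"): 0.2, ("h1", "m2"): 0.9, ("h2", "m1"): 0.9, ("h2", "m2"): 0.1}
    print(bidirectional_best_hits(seq, ctx, alpha=1.0))  # sequence only
    print(bidirectional_best_hits(seq, ctx, alpha=0.5))  # equal weighting
```

In the toy data, sequence similarity alone yields a single pair, while adding gene context resolves both genes into one-to-one pairs, which is the kind of effect the abstract attributes to using context information.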


Subjects
Algorithms , Genomics/methods , Sequence Homology, Nucleic Acid , Animals , Base Sequence , Dogs , Humans , Mice , Rats
14.
Phys Rev E Stat Nonlin Soft Matter Phys ; 86(3 Pt 1): 031902, 2012 Sep.
Article in English | MEDLINE | ID: mdl-23030939

ABSTRACT

We derive the grand partition function of a protein chain by restricting dihedral angles to exist only in five distinct states and assuming that the dominant noncovalent potential is the hydrogen bond interaction. We investigate the phase transition of protein secondary structures and the order of the transition by analyzing its heat capacity. Our theory demonstrates the presence of an α-β-coil structural phase transition in the protein polyalanine.


Subjects
Models, Molecular , Proteins/chemistry , Hydrogen Bonding , Protein Structure, Secondary , Static Electricity
15.
Int J Bioinform Res Appl ; 8(3-4): 286-304, 2012.
Article in English | MEDLINE | ID: mdl-22961456

ABSTRACT

Over the last few years, several computational techniques have been devised to recover protein complexes from the protein interaction (PPI) networks of organisms. These techniques model 'dense' subnetworks within PPI networks as complexes. However, our comprehensive evaluations revealed that these techniques fail to reconstruct many 'gold standard' complexes that are 'sparse' in the networks (only 71 recovered out of 123 known yeast complexes embedded in a network of 9704 interactions among 1622 proteins). In this work, we propose a novel index called the Component-Edge (CE) score to quantitatively measure the notion of 'complex derivability' from PPI networks. Using this index, we theoretically categorise complexes as 'sparse' or 'dense' with respect to a given network. We then devise an algorithm, SPARC, that selectively employs functional interactions to improve the CE scores of predicted complexes, and thereby elevates many of the 'sparse' complexes to 'dense'. This empowers existing methods to detect these 'sparse' complexes. We demonstrate that our approach is effective in reconstructing significantly more of the complexes missed previously (104 recovered out of the 123 known complexes or ~47% improvement). AVAILABILITY: http://www.comp.nus.edu.sg/leonghw/MCL-CAw/


Subjects
Algorithms , Protein Interaction Mapping/methods , Saccharomyces cerevisiae/metabolism , Binding Sites , Computer Simulation , Databases, Protein
16.
J Bioinform Comput Biol ; 10(6): 1231002, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22867628

ABSTRACT

This paper is a self-contained introductory tutorial on the problem in proteomics known as peptide sequencing using tandem mass spectrometry. This tutorial deals specifically with de novo sequencing methods (as opposed to database search methods). We first give an introduction to peptide sequencing, its importance and history and some background on proteins. Next we show the relationship between a peptide and the final spectrum produced from a tandem mass spectrometer, together with a description of the various sources of complications that arise during the process of generating the mass spectrum. From there we model the computational problem of de novo peptide sequencing, which is basically the reverse problem of identifying the peptide which produced the spectrum. We then present several major approaches to solve it (including reviewing some of the current algorithms in each approach), and also discuss related problems and post-processing approaches.


Subjects
Mass Spectrometry/methods , Peptides/chemistry , Proteomics/methods , Algorithms , Amino Acid Sequence , Databases, Protein , Molecular Sequence Data , Proteins/chemistry , Sequence Analysis, Protein , Tandem Mass Spectrometry/methods
17.
Eur Phys J E Soft Matter ; 35(4): 9704, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22526978

ABSTRACT

By introducing an additional hydrogen bond to hydrogen bond interaction in the force field of the CSAW (Conditioned Self-Avoiding Walk) model, we investigate the mechanism of antiparallel β-sheet formation based on the folding of a short polyalanine in gas phase. Through our numerical simulation, we detect the possible presence of a transient helix during β-sheet formation, whose presence is shown to slow the formation of β-sheets by an order of magnitude. While we observe the mechanisms of nucleation, zipping and induction that drive the formation of a β-sheet, we uncover a new mechanism that involves transient β-turns and short β-sheets during the formation of long β-sheets. Our results enable us to provide an overview of the mechanisms of β-sheet formation via two main folding pathways: slow folding through the intermediate state of a transient helix, and fast folding from the nucleation of a β-turn.


Subjects
Models, Chemical , Peptides/chemistry , Protein Structure, Secondary , Computer Simulation , Energy Transfer , Gases/chemistry , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Models, Molecular , Protein Folding
18.
BMC Bioinformatics ; 13 Suppl 17: S16, 2012.
Article in English | MEDLINE | ID: mdl-23282200

ABSTRACT

Complexes of physically interacting proteins are one of the fundamental functional units responsible for driving key biological mechanisms within the cell. With the advent of high-throughput techniques, significant amount of protein interaction (PPI) data has been catalogued for organisms such as yeast, which has in turn fueled computational methods for systematic identification and study of protein complexes. However, many complexes are dynamic entities - their subunits are known to assemble at a particular cellular space and time to perform a particular function and disassemble after that - and while current computational analyses have concentrated on studying the dynamics of individual or pairs of proteins in PPI networks, a crucial aspect overlooked is the dynamics of whole complex formations. In this work, using yeast as our model, we incorporate 'time' in the form of cell-cycle phases into the prediction of complexes from PPI networks and study the temporal phenomena of complex assembly and disassembly across phases. We hypothesize that 'staticness' (constitutive expression) of proteins might be related to their temporal "reusability" across complexes, and test this hypothesis using complexes predicted from large-scale PPI networks across the yeast cell cycle phases. Our results hint towards a biological design principle underlying cellular mechanisms - cells maintain generic proteins as 'static' to enable their "reusability" across multiple temporal complexes. We also demonstrate that these findings provide additional support and alternative explanations to findings from existing works on the dynamics in PPI networks.


Subjects
Cell Cycle , Models, Biological , Multiprotein Complexes/metabolism , Protein Interaction Mapping , Proteins/metabolism , Algorithms , Saccharomyces cerevisiae/cytology , Saccharomyces cerevisiae/metabolism
19.
Int J Data Min Bioinform ; 5(6): 611-25, 2011.
Article in English | MEDLINE | ID: mdl-22295747

ABSTRACT

For a set of multiple sequences, their patterns, Longest Common Subsequences (LCS) and Shortest Common Supersequences (SCS) represent different aspects of the sequences' profile. Revealing the relationship between the patterns and LCS/SCS might provide a deeper view of the patterns. In this paper, we show that patterns, LCS and SCS are closely related to each other. Based on these relations, the PALS algorithms are proposed to discover patterns in a set of biological sequences from LCS and SCS results. Experiments show that the PALS algorithms are superior in efficiency and accuracy on a variety of sequences.
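
For readers unfamiliar with the two notions this abstract relies on, the Python sketch below computes a longest common subsequence of two sequences by the textbook dynamic-programming method and derives the length of a shortest common supersequence from it, using the standard two-sequence identity |SCS| = |a| + |b| - |LCS|. This covers only the pairwise case; the PALS algorithms and the multi-sequence setting of the paper are not reproduced, and the function names are assumptions made for illustration.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Trace back from the bottom-right corner to recover one subsequence.
    out = []
    i, j = m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def scs_length(a, b):
    """Length of a shortest common supersequence of two strings."""
    return len(a) + len(b) - len(lcs(a, b))


if __name__ == "__main__":
    print(lcs("AGGTAB", "GXTXAYB"))         # GTAB
    print(scs_length("AGGTAB", "GXTXAYB"))  # 9
```

The identity used in `scs_length` is one simple instance of the close relation between LCS and SCS that the abstract refers to.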


Subjects
Algorithms , Sequence Alignment/methods , Amino Acid Sequence , Base Sequence , Conserved Sequence , Molecular Sequence Data
20.
BMC Bioinformatics ; 11: 504, 2010 Oct 12.
Article in English | MEDLINE | ID: mdl-20939868

ABSTRACT

BACKGROUND: The reconstruction of protein complexes from the physical interactome of organisms serves as a building block towards understanding the higher level organization of the cell. Over the past few years, several independent high-throughput experiments have helped to catalogue enormous amount of physical protein interaction data from organisms such as yeast. However, these individual datasets show lack of correlation with each other and also contain substantial number of false positives (noise). Over these years, several affinity scoring schemes have also been devised to improve the qualities of these datasets. Therefore, the challenge now is to detect meaningful as well as novel complexes from protein interaction (PPI) networks derived by combining datasets from multiple sources and by making use of these affinity scoring schemes. In the attempt towards tackling this challenge, the Markov Clustering algorithm (MCL) has proved to be a popular and reasonably successful method, mainly due to its scalability, robustness, and ability to work on scored (weighted) networks. However, MCL produces many noisy clusters, which either do not match known complexes or have additional proteins that reduce the accuracies of correctly predicted complexes. RESULTS: Inspired by recent experimental observations by Gavin and colleagues on the modularity structure in yeast complexes and the distinctive properties of "core" and "attachment" proteins, we develop a core-attachment based refinement method coupled to MCL for reconstruction of yeast complexes from scored (weighted) PPI networks. We combine physical interactions from two recent "pull-down" experiments to generate an unscored PPI network. We then score this network using available affinity scoring schemes to generate multiple scored PPI networks. 
The evaluation of our method (called MCL-CAw) on these networks shows that: (i) MCL-CAw derives larger number of yeast complexes and with better accuracies than MCL, particularly in the presence of natural noise; (ii) Affinity scoring can effectively reduce the impact of noise on MCL-CAw and thereby improve the quality (precision and recall) of its predicted complexes; (iii) MCL-CAw responds well to most available scoring schemes. We discuss several instances where MCL-CAw was successful in deriving meaningful complexes, and where it missed a few proteins or whole complexes due to affinity scoring of the networks. We compare MCL-CAw with several recent complex detection algorithms on unscored and scored networks, and assess the relative performance of the algorithms on these networks. Further, we study the impact of augmenting physical datasets with computationally inferred interactions for complex detection. Finally, we analyse the essentiality of proteins within predicted complexes to understand a possible correlation between protein essentiality and their ability to form complexes. CONCLUSIONS: We demonstrate that core-attachment based refinement in MCL-CAw improves the predictions of MCL on yeast PPI networks. We show that affinity scoring improves the performance of MCL-CAw.


Subjects
Markov Chains , Protein Interaction Mapping/methods , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Software , Cluster Analysis , Databases, Protein , Proteins/chemistry , Proteins/metabolism , Saccharomyces cerevisiae Proteins/chemistry