1.
Article in English | MEDLINE | ID: mdl-18989036

ABSTRACT

The topology of beta-sheets is defined by the pattern of hydrogen-bonded strand pairing. Therefore, predicting hydrogen-bonded strand partners is a fundamental step towards predicting beta-sheet topology. At the same time, finding the correct partners is very difficult because of the long-range interactions involved in strand pairing. Additionally, the patterns of amino acids involved in beta-sheet formation are very general and therefore difficult to use for computational recognition of specific contacts between strands. In this work, we report a new strand pairing algorithm. To address the above-mentioned difficulties, our algorithm attempts to mimic elements of the folding process. Namely, in addition to ensuring that the predicted hydrogen-bonded strand pairs satisfy basic global consistency constraints, it takes into account hypothetical folding pathways. Consistent with this view, introducing hydrogen bonds between a pair of strands changes the probabilities of forming hydrogen bonds between other pairs of strands. We demonstrate that this approach provides an improvement over previously proposed algorithms. We also compare the performance of this method to that of a global optimization algorithm that poses the problem as an integer linear programming optimization problem and solves it using the ILOG CPLEX package.
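
The abstract does not state the scoring model or the update rule, so the following is only a minimal sketch of the general shape such a procedure might take: a greedy loop that picks the best-scoring strand pair, enforces one simple consistency constraint (at most two partners per strand), and then lets the chosen pair change the remaining scores. The function names, the `rescore` callback, and the threshold are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[int, int]  # indices of two strands


def greedy_strand_pairing(
    scores: Dict[Pair, float],
    rescore: Callable[[Dict[Pair, float], Pair], Dict[Pair, float]],
    max_partners: int = 2,
    threshold: float = 0.0,
) -> List[Pair]:
    """Pick strand pairs greedily, re-scoring the candidates after each pick.

    `rescore` is a hypothetical stand-in for the pathway effect described in
    the abstract: it receives the remaining scores and the pair just chosen
    and returns updated scores.
    """
    scores = dict(scores)          # work on a copy
    partners: Dict[int, int] = {}  # strand index -> partners assigned so far
    chosen: List[Pair] = []
    while scores:
        # Highest score wins; ties go to the smallest pair for determinism.
        best = max(scores, key=lambda p: (scores[p], -p[0], -p[1]))
        if scores[best] <= threshold:
            break
        del scores[best]
        chosen.append(best)
        for strand in best:
            partners[strand] = partners.get(strand, 0) + 1
        # Consistency constraint: drop pairs that would give a strand
        # more than `max_partners` partners.
        scores = {
            p: s for p, s in scores.items()
            if partners.get(p[0], 0) < max_partners
            and partners.get(p[1], 0) < max_partners
        }
        # Assumed pathway effect: the chosen pair changes the other scores.
        scores = rescore(scores, best)
    return chosen


def boost_neighbours(scores: Dict[Pair, float], chosen: Pair) -> Dict[Pair, float]:
    """Toy update rule: pairs sharing a strand with `chosen` gain 0.5."""
    return {p: s + (0.5 if set(p) & set(chosen) else 0.0) for p, s in scores.items()}
```

With a toy rule such as `boost_neighbours`, the order in which pairs are selected can differ from a plain sort by initial score, which is the point of re-scoring after each pick.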


Subjects
Models, Chemical , Models, Molecular , Proteins/chemistry , Proteins/ultrastructure , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computer Simulation , Hydrogen Bonding , Molecular Sequence Data , Protein Conformation , Protein Folding
2.
BMC Syst Biol ; 2: 12, 2008 Jan 31.
Article in English | MEDLINE | ID: mdl-18237406

ABSTRACT

BACKGROUND: We investigate the cycles in the transcription network of Saccharomyces cerevisiae. Unlike a similar network of Escherichia coli, it contains many cycles. We characterize properties of these cycles and their place in the regulatory mechanism of the cell. RESULTS: Almost all cycles in the transcription network of Saccharomyces cerevisiae are contained in a single strongly connected component, which we call LSCC (L for "largest"), except for a single cycle of two transcription factors. The fact that LSCC includes almost all cycles is well explained by the properties of a random graph with the same in- and out-degrees of the nodes. Among different physiological conditions, the cell cycle has the most significant relationship with LSCC, as the set of 64 transcription interactions that are active in all phases of the cell cycle has an overlap of 27 with the interactions of LSCC (of which there are 49). Conversely, if we remove the interactions that are active in all phases of the cell cycle (25% of interactions to transcription factors), the LSCC would have only three nodes and five edges, many fewer than expected. This subgraph of the transcription network consists mostly of interactions that are active only in the stress response subnetwork. We also characterize the role of LSCC in the topology of the network. We show that LSCC can be used to define a natural hierarchy in the network and that in every physiological subnetwork LSCC plays a pivotal role. CONCLUSION: Apart from those well-defined conditions, the transcription network of Saccharomyces cerevisiae is devoid of cycles. It was observed that two conditions that were studied and that have no cycles of their own are exogenous: diauxic shift and DNA repair, while cell cycle and sporulation are endogenous. We claim that in a certain sense (slow recovery) stress response is endogenous as well.
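
The notion of a largest strongly connected component can be made concrete with a short sketch. The code below uses Kosaraju's two-pass algorithm, a standard textbook method that is not necessarily what the authors used, on a small made-up directed graph; the node labels are placeholders, not real transcription factors.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # (regulator, target)


def strongly_connected_components(edges: Iterable[Edge]) -> List[Set[Hashable]]:
    """Return the strongly connected components of a directed graph (Kosaraju)."""
    graph: Dict[Hashable, list] = defaultdict(list)
    reverse: Dict[Hashable, list] = defaultdict(list)
    nodes: Set[Hashable] = set()
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)
        nodes.update((u, v))

    # Pass 1: iterative DFS, recording nodes in order of completion.
    order: list = []
    seen: Set[Hashable] = set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:  # all successors explored: the node is finished
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph, latest-finished nodes first.
    components: List[Set[Hashable]] = []
    assigned: Set[Hashable] = set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component = {start}
        stack2 = [start]
        while stack2:
            node = stack2.pop()
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    stack2.append(prev)
        components.append(component)
    return components


def largest_scc(edges: Iterable[Edge]) -> Set[Hashable]:
    """Return the component with the most nodes (empty set for no edges)."""
    return max(strongly_connected_components(edges), key=len, default=set())
```

Every directed cycle lies inside a single strongly connected component, so a network in which one component holds nearly all cycles has only single-node components (without self-loops) elsewhere, apart from the exceptions noted in the abstract.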


Subjects
Gene Regulatory Networks , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Algorithms , Cell Cycle/genetics , Feedback, Physiological/genetics , Gene Expression Regulation, Fungal , Humans , Models, Biological , Transcription Factors/metabolism
3.
Ann N Y Acad Sci ; 1115: 132-41, 2007 Dec.
Article in English | MEDLINE | ID: mdl-17925351

ABSTRACT

This paper studies a computational problem motivated by the modular response analysis method for reverse engineering of protein and gene networks. This set-cover problem is hard to solve exactly for large networks, but efficient approximation algorithms are given and their complexity is analyzed.
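
The abstract does not say which approximation algorithms the paper gives. As a point of reference only, the sketch below shows the classic greedy heuristic for the generic set-cover problem, which repeatedly picks the subset covering the most still-uncovered elements. The subset names and the toy data are illustrative assumptions.

```python
from typing import Dict, Hashable, Iterable, List, Set


def greedy_set_cover(
    universe: Iterable[Hashable],
    subsets: Dict[str, Set[Hashable]],
) -> List[str]:
    """Return names of subsets chosen by the textbook greedy heuristic.

    This is the generic greedy rule, not the paper's specific algorithm.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # Iterate in sorted order so ties are broken deterministically.
        best = max(sorted(subsets), key=lambda name: len(subsets[name] & uncovered))
        gain = subsets[best] & uncovered
        if not gain:
            raise ValueError("the given subsets cannot cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen
```

The greedy rule is known to give a cover within a logarithmic factor of the optimum size, which is why it is a common baseline for problems of this kind.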


Subjects
Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Gene Expression/physiology , Models, Biological , Proteome/metabolism , Signal Transduction/physiology , Biomedical Engineering/methods , Computer Simulation , Gene Expression Regulation/physiology
4.
Phys Rev E Stat Nonlin Soft Matter Phys ; 75(3 Pt 2): 036104, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17500756

ABSTRACT

We study local search algorithms for networks with heterogeneous edge weights, testing them on scale-free and Erdős-Rényi networks. We assume that the location of the destination node is discovered when it is two edges away, and that the search cost is additive. It was previously shown that a search strategy preferring high-degree nodes reduces the average search cost over a simple random walk. In the prior work, for the case when the edge costs are randomly distributed, a different preference was investigated [high local betweenness centrality (LBC)], and was found to be superior to high-degree preference in scale-free networks, with the exception of the sparsest ones. We have found several preference criteria that are simpler and which, in all networks we tested, yield a lower cost than other criteria including high degree, high LBC, and low edge cost.
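
The search setting can be illustrated with a small sketch of the high-degree preference mentioned as prior work. The abstract does not specify the new criteria, so none of them is reproduced here. The sketch assumes an undirected, connected graph, a deterministic tie-break, and a "least visited first" rule to avoid loops; these details, and all names, are assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]  # node -> {neighbour: edge cost}


def build_graph(edges: Iterable[Tuple[Hashable, Hashable, float]]) -> Graph:
    """Build an undirected weighted graph from (u, v, cost) triples."""
    graph: Graph = {}
    for u, v, cost in edges:
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost
    return graph


def degree_biased_search(
    graph: Graph, source: Hashable, target: Hashable, max_steps: int = 10_000
) -> Tuple[float, List[Hashable]]:
    """Walk from source to target, preferring high-degree neighbours.

    The target is "discovered" once it is at most two edges away, after
    which the walker goes straight to it. Costs of traversed edges add up.
    """
    visits = {source: 1}
    current, cost, path = source, 0.0, [source]
    for _ in range(max_steps):
        if current == target:
            return cost, path
        neighbours = graph[current]
        if target in neighbours:  # one edge away: take the direct edge
            return cost + neighbours[target], path + [target]
        via = [n for n in neighbours if target in graph[n]]
        if via:  # two edges away: use the cheapest intermediate node
            mid = min(via, key=lambda n: (neighbours[n] + graph[n][target], str(n)))
            return cost + neighbours[mid] + graph[mid][target], path + [mid, target]
        # Otherwise: least-visited first, then highest degree, then name.
        nxt = min(neighbours, key=lambda n: (visits.get(n, 0), -len(graph[n]), str(n)))
        cost += neighbours[nxt]
        visits[nxt] = visits.get(nxt, 0) + 1
        path.append(nxt)
        current = nxt
    raise RuntimeError("target not reached within max_steps")
```

Swapping the key in the final `min` call is where a different preference criterion, such as low edge cost, would be plugged in for comparison.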

5.
Bioinformatics ; 23(8): 917-25, 2007 Apr 15.
Article in English | MEDLINE | ID: mdl-17308341

ABSTRACT

MOTIVATION: Complex genomes contain numerous repeated sequences, and genomic duplication is believed to be a main evolutionary mechanism to obtain new functions. Several tools are available for de novo repeat sequence identification, and many approaches exist for clustering homologous protein sequences. We present an efficient new approach to identify and cluster homologous DNA sequences with high accuracy at the level of whole genomes, excluding low-complexity repeats, tandem repeats and annotated interspersed repeats. We also determine the boundaries of each group member so that it closely represents a biological unit, e.g. a complete gene, or a partial gene coding a protein domain. RESULTS: We developed a program called HomologMiner to identify homologous groups applicable to genome sequences that have been properly marked for low-complexity repeats and annotated interspersed repeats. We applied it to the whole genomes of human (hg17), macaque (rheMac2) and mouse (mm8). Groups obtained include gene families (e.g. olfactory receptor gene family, zinc finger families), unannotated interspersed repeats and additional homologous groups that resulted from recent segmental duplications. Our program incorporates several new methods: a new abstract definition of consistent duplicate units, a new criterion to remove moderately frequent tandem repeats, and new algorithmic techniques. We also provide preliminary analysis of the output on the three genomes mentioned above, and show several applications including identifying boundaries of tandem gene clusters and novel interspersed repeat families. AVAILABILITY: All programs and datasets are downloadable from www.bx.psu.edu/miller_lab.


Subjects
Algorithms , Chromosome Mapping/methods , Repetitive Sequences, Nucleic Acid/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Sequence Homology, Nucleic Acid , Software , Base Sequence , Molecular Sequence Data
6.
BMC Struct Biol ; 6: 3, 2006 Mar 08.
Article in English | MEDLINE | ID: mdl-16524467

ABSTRACT

BACKGROUND: It has been proposed that secondary structure information can be used to classify (to some extent) protein folds. Since this method utilizes very limited information about the protein structure, it is not surprising that it has a higher error rate than the approaches that use a full 3D fold description. On the other hand, comparing 3D protein structures is computationally intensive. This raises the question of to what extent the error rate can be decreased with each new source of information, especially if the new information can still be used with simple alignment algorithms. We consider the question of whether information about closed loops can improve the accuracy of this approach. While the answer appears to be obvious, we had to overcome two challenges. First, how to code and compare topological information in such a way that local alignment of strings will properly identify similar structures. Second, how to properly measure the effect of new information in a large data sample. We investigate alternative ways of computing and presenting this information. RESULTS: We used the set of beta proteins with at most 30% pairwise identity to test the approach; local alignment scores were used to build a tree of clusters, which was evaluated using a new log-odds cluster scoring function. In particular, we derive a closed formula for the probability of obtaining a given score by chance. Parameters of the local alignment function were optimized using a genetic algorithm. Of 81 folds that had more than one representative in our data set, log-odds scores registered significantly better clustering in 27 cases, significantly worse clustering in 6 cases, and small differences in the remaining cases. Various notions of significant change and average change were considered and tried, and the results all pointed in the same direction.
CONCLUSION: We found that, on average, properly presented information about the loop topology noticeably improves the accuracy of the method, but the benefits, as measured by the log-odds cluster score, vary between fold families.
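
Local alignment of strings, which the abstract relies on, can be sketched with the standard Smith-Waterman recurrence. The encoding of structures as strings (here `E`, `H`, `C` for strand, helix, coil) and the scoring parameters are assumptions for illustration; the paper's own alphabet, loop encoding, and optimized parameters are not given in the abstract.

```python
def local_alignment_score(
    a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2
) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty.

    The default scores are arbitrary illustrative values.
    """
    previous = [0] * (len(b) + 1)  # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best
```

Pairwise scores of this kind could then feed a clustering step, as the abstract describes, although the tree-building and log-odds evaluation are outside the scope of this sketch.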


Subjects
Algorithms , Protein Structure, Secondary , Amino Acid Sequence , Cluster Analysis , Protein Folding , Proteins/classification , Sequence Alignment
7.
J Comput Biol ; 11(4): 766-85, 2004.
Article in English | MEDLINE | ID: mdl-15579244

ABSTRACT

In this paper, we consider several variations of the following basic tiling problem: given a sequence of real numbers with two size-bound parameters, we want to find a set of tiles of maximum total weight such that each tile satisfies the size bounds. A solution to this problem is important to a number of computational biology applications such as selecting genomic DNA fragments for PCR-based amplicon microarrays and performing homology searches with long sequence queries. Our goal is to design efficient algorithms with linear or near-linear time and space in the normal range of parameter values for these problems. For this purpose, we first discuss the solution to a basic online interval maximum problem via a sliding-window approach and show how to use this solution in a nontrivial manner for many of the tiling problems introduced. We also discuss NP-hardness results and approximation algorithms for generalizing our basic tiling problem to higher dimensions. Finally, computational results from applying our tiling algorithms to genomic sequences of five model eukaryotes are reported.
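
The abstract does not define the online interval maximum problem precisely. A closely related and well-known building block is the maximum over a fixed-width sliding window, which a monotone deque computes in linear total time. The sketch below shows that standard technique under the assumption of a fixed window width; it is not the paper's tiling algorithm.

```python
from collections import deque
from typing import List, Sequence


def sliding_window_max(values: Sequence[float], width: int) -> List[float]:
    """Maximum of every contiguous window of `width` values, in O(n) total."""
    if width <= 0:
        raise ValueError("width must be positive")
    window = deque()  # indices whose values are strictly decreasing
    result: List[float] = []
    for i, value in enumerate(values):
        # Smaller or equal values to the left can never be a maximum again.
        while window and values[window[-1]] <= value:
            window.pop()
        window.append(i)
        # Drop the front index once it falls out of the current window.
        if window[0] <= i - width:
            window.popleft()
        if i >= width - 1:
            result.append(values[window[0]])
    return result
```

Each index enters and leaves the deque at most once, which is what gives the linear running time and makes the approach suitable for processing values as they arrive.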


Subjects
Genomics/statistics & numerical data , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Sequence Alignment/statistics & numerical data , Algorithms , Computational Biology , DNA/genetics , Mathematics , Models, Genetic
...