Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 54
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
J Math Biol ; 67(5): 1261-78, 2013 Nov.
Artigo em Inglês | MEDLINE | ID: mdl-23053535

RESUMO

To an RNA pseudoknot structure is naturally associated a topological surface, which has its associated genus, and structures can thus be classified by the genus. Based on earlier work of Harer-Zagier, we compute the generating function Dg,σ (z) = ∑n dg,σ (n)zn for the number dg,σ (n) of those structures of fixed genus g and minimum stack size σ with n nucleotides so that no two consecutive nucleotides are basepaired and show that Dg,σ (z) is algebraic. In particular, we prove that dg,2(n) ∼ kg n3(g−1/2 )γ n2, where γ2 ≈ 1.9685. Thus, for stack size at least two, the genus only enters through the sub-exponential factor, and the slow growth rate compared to the number of RNA molecules implies the existence of neutral networks of distinct molecules with the same structure of any genus. Certain RNA structures called shapes are shown to be in natural one-to-one correspondence with the cells in the Penner-Strebel decomposition of Riemann's moduli space of a surface of genus g with one boundary component, thus providing a link between RNA enumerative problems and the geometry of Riemann's moduli space.


Assuntos
RNA/química , Conformação de Ácido Nucleico , RNA/classificação
3.
Proc Natl Acad Sci U S A ; 98(17): 9748-53, 2001 Aug 14.
Artigo em Inglês | MEDLINE | ID: mdl-11504945

RESUMO

For the last 20 years, fragment assembly in DNA sequencing followed the "overlap-layout-consensus" paradigm that is used in all currently available assembly tools. Although this approach proved useful in assembling clones, it faces difficulties in genomic shotgun assembly. We abandon the classical "overlap-layout-consensus" approach in favor of a new euler algorithm that, for the first time, resolves the 20-year-old "repeat problem" in fragment assembly. Our main result is the reduction of the fragment assembly to a variation of the classical Eulerian path problem that allows one to generate accurate solutions of large-scale sequencing problems. euler, in contrast to the celera assembler, does not mask such repeats but uses them instead as a powerful fragment assembly tool.


Assuntos
Algoritmos , Mapeamento de Sequências Contíguas/métodos , Análise de Sequência de DNA/métodos , Campylobacter jejuni/genética , DNA Bacteriano/genética , Genoma Bacteriano , Lactococcus lactis/genética , Modelos Teóricos , Neisseria meningitidis/genética , Alinhamento de Sequência/métodos , Software
4.
J Comput Biol ; 7(1-2): 1-46, 2000.
Artigo em Inglês | MEDLINE | ID: mdl-10890386

RESUMO

In the following, an overview is given on statistical and probabilistic properties of words, as occurring in the analysis of biological sequences. Counts of occurrence, counts of clumps, and renewal counts are distinguished, and exact distributions as well as normal approximations, Poisson process approximations, and compound Poisson approximations are derived. Here, a sequence is modelled as a stationary ergodic Markov chain; a test for determining the appropriate order of the Markov chain is described. The convergence results take the error made by estimating the Markovian transition probabilities into account. The main tools involved are moment generating functions, martingales, Stein's method, and the Chen-Stein method. Similar results are given for occurrences of multiple patterns, and, as an example, the problem of unique recoverability of a sequence from SBH chip data is discussed. Special emphasis lies on disentangling the complicated dependence structure between word occurrences, due to self-overlap as well as due to overlap between words. The results can be used to derive approximate, and conservative, confidence intervals for tests.


Assuntos
Biometria , Análise de Sequência de DNA/estatística & dados numéricos , Sequência de Bases , Cadeias de Markov , Modelos Estatísticos , Hibridização de Ácido Nucleico , Reconhecimento Automatizado de Padrão
5.
J Comput Biol ; 5(3): 505-15, 1998.
Artigo em Inglês | MEDLINE | ID: mdl-9773346

RESUMO

A fundamentally new molecular-biology approach in constructing restriction maps, Optical Mapping, has been developed by Schwartz et al. (1993). Using this method restriction maps are constructed by measuring the relevant fluorescence intensity and length measurements. However, it is difficult to directly estimate the restriction site locations of single DNA molecules based on these optical mapping data because of the precision of length measurements and the unknown number of true restriction sites in the data. We propose the use of a hierarchical Bayes model based on a mixture model with normals and random noise. In this model we explicitly consider the missing observation structure of the data, such as the orientations of molecules, the allocations of cutting sites to restriction sites, and the indicator variables of whether observed cut sites are true or false. Because of the complexity of the model, the large number of missing data, and the unknown number of restriction sites, we use Reversible-Jump Markov Chain Monte Carlo (MCMC) to estimate the number and the locations of the restriction sites. Since there exists a high multimodality due to unknown orientations of molecules, we also use a combination of our MCMC approach and the flipping algorithm suggested by Dancík and Waterman (1997). The study is highly computer-intensive and the development of an efficient algorithm is required.


Assuntos
Algoritmos , Mapeamento por Restrição/métodos , Cadeias de Markov , Modelos Genéticos , Método de Monte Carlo
6.
Appl Environ Microbiol ; 63(6): 2338-46, 1997 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-9172353

RESUMO

A new computational method (chimeric alignment) has been developed to detect chimeric 16S rRNA artifacts generated during PCR amplification from mixed bacterial populations. In contrast to other nearest-neighbor methods (e.g., CHECK_CHIMERA) that define sequence similarity by k-tuple matching, the chimeric alignment method uses the score from dynamic programming alignments. Further, the chimeric alignments are displayed to the user to assist in sequence classification. The distribution of improvement scores for 500 authentic, nonchimeric sequences and 300 artificial chimeras (constructed from authentic sequences) was used to study the sensitivity and accuracy of both chimeric alignment and CHECK_CHIMERA. At a constant rate of authentic sequence misclassification (5%), chimeric alignment incorrectly classified 13% of the artificial chimeras versus 14% for CHECK_CHIMERA. Interestingly, only 1% of nonchimeras and 10% of chimeras were misclassified by both programs, suggesting that optimum performance is obtained by using the two methods to assign sequences to three classes: high-probability nonchimeras, high-probability chimeras, and sequences that need further study by other means. This study suggests that k-tuple-based matching methods are more sensitive than alignment-based methods when there is significant parental sequence similarity, while the opposite becomes true as the sequences become more distantly related. The software and a World Wide Web-based server are available at http://www-hto.usc.edu/software/mglobal CHI.


Assuntos
Bactérias/genética , RNA Bacteriano/genética , RNA Ribossômico 16S/genética , Alinhamento de Sequência/métodos , Software , Bactérias/classificação , Sequência de Bases , Biometria , Quimera/genética , Bases de Dados Factuais , Estudos de Avaliação como Assunto , Dados de Sequência Molecular , Reação em Cadeia da Polimerase , RNA Bacteriano/isolamento & purificação , RNA Ribossômico 16S/isolamento & purificação , Alinhamento de Sequência/estatística & dados numéricos , Homologia de Sequência do Ácido Nucleico
7.
J Mol Biol ; 258(4): 650-60, 1996 May 17.
Artigo em Inglês | MEDLINE | ID: mdl-8636999

RESUMO

We construct a mathematical model for in vitro molecular selection with amplification. Using DNA-protein binding as the illustrative example, we obtain an expression for the probability that a randomly selected molecule from the final in vitro selection products is a molecule with the highest binding affinity. Experiments of this type have been reported for several examples of DNA binding proteins. Our study requires a model of the DNA-protein binding constant between DNA molecules and the target protein. The relationship between binding constants and selection probabilities is presented under simplifying but reasonable assumptions. From our analysis, we find that for successful in vitro selection experiments there should be a certain relationship between the number of polymerase chain reaction cycles and the concentration of free protein. The results obtained should be widely applicable to a variety of selection-amplification procedures.


Assuntos
Proteínas de Ligação a DNA/metabolismo , DNA/metabolismo , Modelos Teóricos , Reação em Cadeia da Polimerase , Seleção Genética , Proteínas de Ligação a DNA/genética , Ligação Proteica
8.
J Comput Biol ; 3(3): 425-63, 1996.
Artigo em Inglês | MEDLINE | ID: mdl-8891959

RESUMO

Sequencing by hybridization is a tool to determine a DNA sequence from the unordered list of all l-tuples contained in this sequence; typical numbers for l are l = 8, 10, 12. For theoretical purposes we assume that the multiset of all l-tuples is known. This multiset determines the DNA sequence uniquely if none of the so-called Ukkonen transformations are possible. These transformations require repeats of (l-1)-tuples in the sequence, with these repeats occurring in certain spatial patterns. We model DNA as an i.i.d. sequence. We first prove Poisson process approximations for the process of indicators of all leftmost long repeats allowing self-overlap and for the process of indicators of all left-most long repeats without self-overlap. Using the Chen-Stein method, we get bounds on the error of these approximations. As a corollary, we approximate the distribution of longest repeats. In the second step we analyze the spatial patterns of the repeats. Finally we combine these two steps to prove an approximation for the probability that a random sequence is uniquely recoverable from its list of l-tuples. For all our results we give some numerical examples including error bounds.


Assuntos
Hibridização de Ácido Nucleico/métodos , Distribuição de Poisson , Análise de Sequência de DNA/métodos , Algoritmos , Projeto Genoma Humano , Humanos , Sequências Repetitivas de Ácido Nucleico
9.
Nucleic Acids Res ; 23(15): 3034-40, 1995 Aug 11.
Artigo em Inglês | MEDLINE | ID: mdl-7659528

RESUMO

We construct a mathematical model for two whole genome amplification strategies, primer extension preamplification (PEP) and tagged polymerase chain reaction (tagged PCR). An explicit formula for the expected target yield of PEP is obtained. The distribution of the target yield and the coverage properties of these two strategies are studied by simulations. From our studies we find that polymerase with high processivity may increase the efficiency of PEP and tagged PCR.


Assuntos
Genoma , Modelos Genéticos , Técnicas de Amplificação de Ácido Nucleico , Reação em Cadeia da Polimerase/métodos , Simulação por Computador , Primers do DNA , DNA Polimerase Dirigida por DNA
10.
Genomics ; 26(1): 84-100, 1995 Mar 01.
Artigo em Inglês | MEDLINE | ID: mdl-7782090

RESUMO

Physical maps can be constructed by "fingerprinting" a large number of random clones and inferring overlap between clones when the fingerprints are sufficiently similar. E. Lander and M. Waterman (Genomics 2: 231-239, 1988) gave a mathematical analysis of such mapping strategies. The analysis is useful for comparing various fingerprinting methods. Recently it has been proposed that ends of clones rather than the entire clone be fingerprinted or characterized. Such fingerprints, which include sequenced clone ends, require a mathematical analysis deeper than that of Lander-Waterman. This paper studies clone islands, which can include uncharacterized regions, and also the islands that are formed entirely from the ends of clones.


Assuntos
Mapeamento Cromossômico/métodos , Impressões Digitais de DNA , Modelos Teóricos , Biblioteca Genômica , Matemática , Análise de Sequência de DNA
11.
J Comput Biol ; 2(2): 291-306, 1995.
Artigo em Inglês | MEDLINE | ID: mdl-7497130

RESUMO

Since the advent of rapid DNA sequencing methods in 1976, scientists have had the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are based on the notion of pairwise fragment overlap. While shotgun sequencing infers a DNA sequence given the sequences of overlapping fragments, a recent and complementary method, called sequencing by hybridization (SBH), infers a DNA sequence given the set of oligomers that represents all subwords of some fixed length, k. In this paper, we propose a new computer algorithm for DNA sequence assembly that combines in a novel way the techniques of both shotgun and SBH methods. Based on our preliminary investigations, the algorithm promises to be very fast and practical for DNA sequence assembly.


Assuntos
Algoritmos , Sequência de Bases , DNA/química , Matemática , Modelos Teóricos , Oligodesoxirribonucleotídeos , Dados de Sequência Molecular , Conformação de Ácido Nucleico , Hibridização de Ácido Nucleico , Software
12.
Nucleic Acids Res ; 22(22): 4828-36, 1994 Nov 11.
Artigo em Inglês | MEDLINE | ID: mdl-7984436

RESUMO

A significant portion of DNA consists of repeating patterns of various sizes, from very small (one, two and three nucleotides) to very large (over 300 nucleotides). Although the functions of these repeating regions are not well understood, they appear important for understanding the expression, regulation and evolution of DNA. For example, increases in the number of trinucleotide repeats have been associated with human genetic disease, including Fragile-X mental retardation and Huntington's disease. Repeats are also useful as a tool in mapping and identifying DNA; the number of copies of a particular pattern at a site is often variable among individuals (polymorphic) and is therefore helpful in locating genes via linkage studies and also in providing DNA fingerprints of individuals. The number of repeating regions is unknown as is the distribution of pattern sizes. It would be useful to search for such regions in the DNA database in order that they may be studied more fully. The DNA database currently consists of approximately 150 million basepairs and is growing exponentially. Therefore, any program to look for repeats must be efficient and fast. In this paper, we present some new techniques that are useful in recognizing repeating patterns and describe a new program for rapidly detecting repeat regions in the DNA database where the basic unit of the repeat has size up to 32 nucleotides. It is our hope that the examples in this paper will illustrate the unrealized diversity of repeats in DNA and that the program we have developed will be a useful tool for locating new and interesting repeats.


Assuntos
Bases de Dados Factuais , Sequências Repetitivas de Ácido Nucleico , Software , Sequência de Bases , Evolução Biológica , Sequência Consenso , DNA/genética , Humanos , Modelos Estatísticos , Dados de Sequência Molecular , Alinhamento de Sequência
13.
Bull Math Biol ; 56(4): 743-67, 1994 Jul.
Artigo em Inglês | MEDLINE | ID: mdl-8054893

RESUMO

Recently algorithms for parametric alignment (Waterman et al., 1992, Natl Acad. Sci. USA 89, 6090-6093; Gusfield et al., 1992, Proceedings of the Third Annual ACM-SIAM Discrete Algorithms) find optimal scores for all penalty parameters, both for global and local sequence alignment. This paper reviews those techniques. Then in the main part of this paper dynamic programming methods are used to compute ensemble alignment, finding all alignment scores for all parameters. Both global and local ensemble alignments are studied, and parametric alignment is used to compute near optimal ensemble alignments.


Assuntos
Algoritmos , Sequência de Bases , DNA/química , Homologia de Sequência do Ácido Nucleico , DNA/genética , Dados de Sequência Molecular
14.
Proc Natl Acad Sci U S A ; 91(11): 4625-8, 1994 May 24.
Artigo em Inglês | MEDLINE | ID: mdl-8197109

RESUMO

A central question in sequence comparison is the statistical significance of an observed similarity. For local alignment containing gaps to optimize sequence similarity this problem has so far not been solved mathematically. Using as a basis the Chen-Stein theory of Poisson approximation, we present a practical method to approximate the probability that a local alignment score is a result of chance alone. For a set of similarity scores and gap penalties only one simulation of random alignments needs to be calculated to derive the key information allowing us to estimate the significance of any alignment calculated under this setting. We present applications to data base searching and the analysis of pairwise and self-comparisons of proteins.


Assuntos
Bases de Dados Factuais , Modelos Estatísticos , Alinhamento de Sequência , Algoritmos , Humanos
16.
J Comput Biol ; 1(2): 93-104, 1994.
Artigo em Inglês | MEDLINE | ID: mdl-8790456

RESUMO

Profiles, which are summaries of multiple alignments of a sequence family, are used to find new instances of the family in databases. In this paper, we study the maximum score M obtained when the profile is aligned without indels at all possible positions of a random sequence. The main theorem gives an approximation to the distribution function of M with an explicit bound on the error. This theorem implies that M has a limiting extreme value distribution.


Assuntos
Simulação por Computador , Modelos Estatísticos , Alinhamento de Sequência/métodos , Homologia de Sequência , Bases de Dados Factuais , Matemática , Homologia de Sequência do Ácido Nucleico
17.
J Mol Biol ; 235(1): 1-12, 1994 Jan 07.
Artigo em Inglês | MEDLINE | ID: mdl-8289235

RESUMO

Alignment algorithms to compare DNA or amino acid sequences are widely used tools in molecular biology. The algorithms depend on the setting of various parameters, most notably gap penalties. The effect that such parameters have on the resulting alignments is still poorly understood. This paper begins by reviewing two recent advances in algorithms and probability that enable us to take a new approach to this question. The first tool we introduce is a newly developed method to delineate efficiently all optimal alignments arising under all choices of parameters. The second tool comprises insights into the statistical behavior of optimal alignment scores. From this we gain a better understanding of the dependence of alignments on parameters in general. We propose novel criteria to detect biologically good alignments and highlight some specific features about the interaction between similarity matrices and gap penalties. To illustrate our analysis we present a detailed study of the comparison of two immunoglobulin sequences.


Assuntos
Sequência de Aminoácidos , Sequência de Bases , DNA/química , Biologia Molecular/métodos , Proteínas/química , Algoritmos , Animais , Humanos , Cadeias Pesadas de Imunoglobulinas/química , Cadeias Leves de Imunoglobulina/química , Região Variável de Imunoglobulina/química , Matemática , Dados de Sequência Molecular , Estatística como Assunto
18.
Comput Appl Biosci ; 8(5): 511-20, 1992 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-1422885

RESUMO

For most sequence comparison problems there is a corresponding map comparison algorithm. While map data may appear to be incompatible with dynamic programming, we show in this paper that the rigor and efficiency of dynamic programming algorithms carry over to the map comparison algorithms. We present algorithms for restriction map comparison that deal with two types of map errors: (i) closely spaced sites for different enzymes can be ordered incorrectly, and (ii) closely spaced sites for the same enzyme can be mapped as a single site. The new algorithms are a natural extension of a previous map comparison model. Dynamic programming algorithms for computing optimal global and local alignments under the new model are described. The new algorithms take about the same order of time as previous map comparison algorithms. Programs implementing some of the new algorithms are used to find similar regions within the Escherichia coli restriction map of Kohara et al.


Assuntos
Algoritmos , Mapeamento por Restrição , Software , Escherichia coli/genética
19.
Genomics ; 14(1): 89-98, 1992 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-1358801

RESUMO

In this paper we describe a method for the statistical reconstruction of a large DNA sequence from a set of sequenced fragments. We assume that the fragments have been assembled and address the problem of determining the degree to which the reconstructed sequence is free from errors, i.e., its accuracy. A consensus distribution is derived from the assembled fragment configuration based upon the rates of sequencing errors in the individual fragments. The consensus distribution can be used to find a minimally redundant consensus sequence that meets a prespecified confidence level, either base by base or across any region of the sequence. A likelihood-based procedure for the estimation of the sequencing error rates, which utilizes an iterative EM algorithm, is described. Prior knowledge of the error rates is easily incorporated into the estimation procedure. The methods are applied to a set of assembled sequence fragments from the human G6PD locus. We close the paper with a brief discussion of the relevance and practical implications of this work.


Assuntos
DNA/análise , Algoritmos , Sequência de Bases , Glucosefosfato Desidrogenase/genética , Humanos , Modelos Genéticos , Modelos Teóricos , Dados de Sequência Molecular , Polimorfismo de Fragmento de Restrição , Reprodutibilidade dos Testes
20.
Bull Math Biol ; 54(5): 785-812, 1992 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-1638260

RESUMO

DNA and protein sequence comparisons are performed by a number of computational algorithms. Most of these algorithms search for the alignment of two sequences that optimizes some alignment score. It is an important problem to assess the statistical significance of a given score. In this paper we use newly developed methods for Poisson approximation to derive estimates of the statistical significance of k-word matches on a diagonal of a sequence comparison. We require at least q of the k letters of the words to match where 0 less than q less than or equal to k. The distribution of the number of matches on a diagonal is approximated as well as the distribution of the order statistics of the sizes of clumps of matches on the diagonal. These methods provide an easily computed approximation of the distribution of the longest exact matching word between sequences. The methods are validated using comparisons of vertebrate and E. coli protein sequences. In addition, we compare two HLA class II transplantation antigens by this method and contrast the results with a dynamic programming approach. Several open problems are outlined in the last section.


Assuntos
Interpretação Estatística de Dados , Distribuição de Poisson , Proteínas/química , Alinhamento de Sequência/estatística & dados numéricos , Algoritmos , Sequência de Aminoácidos , Animais , Matemática , Dados de Sequência Molecular
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...