Results 1 - 18 of 18
1.
Bioinformatics ; 28(10): 1336-44, 2012 May 15.
Article in English | MEDLINE | ID: mdl-22492645

ABSTRACT

MOTIVATION: The expansion of DNA sequencing capacity has enabled the sequencing of whole genomes from a number of related species. These genomes can be combined in a multiple alignment that provides useful information about the evolutionary history at each genomic locus. One area in which evolutionary information can productively be exploited is in aligning a new sequence to a database of existing, aligned genomes. However, existing high-throughput alignment tools are not designed to work effectively with multiple genome alignments. RESULTS: We introduce PhyLAT, the phylogenetic local alignment tool, to compute local alignments of a query sequence against a fixed multiple-genome alignment of closely related species. PhyLAT uses a known phylogenetic tree on the species in the multiple alignment to improve the quality of its computed alignments while also estimating the placement of the query on this tree. It combines a probabilistic approach to alignment with seeding and expansion heuristics to accelerate discovery of significant alignments. We provide evidence, using alignments of human chromosome 22 against a five-species alignment from the UCSC Genome Browser database, that PhyLAT's alignments are more accurate than those of other commonly used programs, including BLAST, POY, MAFFT, MUSCLE and CLUSTAL. PhyLAT also identifies more alignments in coding DNA than does pairwise alignment alone. Finally, our tool determines the evolutionary relationship of query sequences to the database more accurately than do POY, RAxML, EPA or pplacer.


Subject(s)
Phylogeny , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Animals , Artificial Intelligence , Biological Evolution , Databases, Genetic , Genome , Humans , Opossums/classification , Opossums/genetics , Software
2.
J Comput Biol ; 19(2): 139-47, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22300316

ABSTRACT

Regulatory sites that control gene expression are essential to the proper functioning of cells, and identifying them is critical for modeling regulatory networks. We have developed Magma (Multiple Aligner of Genomic Multiple Alignments), a software tool for multiple species, multiple gene motif discovery. Magma identifies putative regulatory sites that are conserved across multiple species and occur near multiple genes throughout a reference genome. Magma takes as input multiple alignments that can include gaps. It uses efficient clustering methods that make it about 70 times faster than PhyloNet, a previous program for this task, with slightly greater sensitivity. We ran Magma on all non-coding DNA conserved between Caenorhabditis elegans and five additional species, about 70 Mbp in total, in <4 h. We obtained 2,309 motifs with lengths of 6-20 bp, each occurring at least 10 times throughout the genome, which collectively covered about 566 kbp of the genomes, approximately 0.8% of the input. Predicted sites occurred in all types of non-coding sequence but were especially enriched in the promoter regions. Comparisons to several experimental datasets show that Magma motifs correspond to a variety of known regulatory motifs.


Subject(s)
Genome, Helminth , Models, Genetic , Software , Algorithms , Animals , Base Sequence , Binding Sites , Caenorhabditis elegans/genetics , Caenorhabditis elegans Proteins/genetics , Cluster Analysis , Computer Simulation , Conserved Sequence , DNA, Intergenic/genetics , Likelihood Functions , Promoter Regions, Genetic , Sequence Alignment , Transcription Factors/genetics
3.
Article in English | MEDLINE | ID: mdl-22084145

ABSTRACT

Detecting members of known noncoding RNA (ncRNA) families in genomic DNA is an important part of sequence annotation. However, the most widely used tool for modeling ncRNA families, the covariance model (CM), incurs a high computational cost when used for genome-wide search. This cost can be reduced by using a filter to exclude sequences that are unlikely to contain the ncRNA of interest, applying the CM only where it is likely to match strongly. Despite recent advances, designing an efficient filter that can detect ncRNA instances lacking strong conservation while excluding most irrelevant sequences remains challenging. In this work, we design three types of filters based on multiple secondary structure profiles (SSPs). An SSP augments a regular profile (i.e., a position weight matrix) with secondary structure information but can still be efficiently scanned against long sequences. Multi-SSP-based filters combine evidence from multiple SSP matches and can achieve high sensitivity and specificity. Our SSP-based filters are extensively tested on the BRAliBase III data set, Rfam 9.0, and a published soil metagenomic data set. In addition, we compare the SSP-based filters with several other ncRNA search tools including Infernal (with profile HMMs as filters), ERPIN, and tRNAscan-SE. Our experiments demonstrate that carefully designed SSP filters can achieve significant speedup over unfiltered CM search while maintaining high sensitivity for various ncRNA families. The designed filters and filter-scanning programs are available at our website: www.cse.msu.edu/~yannisun/ssp/.


Subject(s)
Algorithms , Markov Chains , RNA, Untranslated/chemistry , DNA/chemistry , Databases, Factual , Genome , Nucleic Acid Conformation , Sequence Alignment , Sequence Analysis, RNA
4.
Genetics ; 185(4): 1519-34, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20479145

ABSTRACT

The distal arm of the fourth ("dot") chromosome of Drosophila melanogaster is unusual in that it exhibits an amalgamation of heterochromatic properties (e.g., dense packaging, late replication) and euchromatic properties (e.g., gene density similar to euchromatic domains, replication during polytenization). To examine the evolution of this unusual domain, we undertook a comparative study by generating high-quality sequence data and manually curating gene models for the dot chromosome of D. virilis (Tucson strain 15010-1051.88). Our analysis shows that the dot chromosomes of D. melanogaster and D. virilis have higher repeat density, larger gene size, lower codon bias, and a higher rate of gene rearrangement compared to a reference euchromatic domain. Analysis of eight "wanderer" genes (present in a euchromatic chromosome arm in one species and on the dot chromosome in the other) shows that their characteristics are similar to other genes in the same domain, which suggests that these characteristics are features of the domain and are not required for these genes to function. Comparison of this strain of D. virilis with the strain sequenced by the Drosophila 12 Genomes Consortium (Tucson strain 15010-1051.87) indicates that most genes on the dot are under weak purifying selection. Collectively, despite the heterochromatin-like properties of this domain, genes on the dot evolve to maintain function while being responsive to changes in their local environment.


Subject(s)
Chromosomes, Insect/genetics , Drosophila/genetics , Evolution, Molecular , Genome, Insect/genetics , Animals , Chromosome Mapping , Drosophila/classification , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Euchromatin/genetics , Genes, Insect/genetics , Heterochromatin/genetics , INDEL Mutation/genetics , Open Reading Frames/genetics , Species Specificity , Synteny , Tandem Repeat Sequences/genetics
5.
CBE Life Sci Educ ; 9(1): 55-69, 2010.
Article in English | MEDLINE | ID: mdl-20194808

ABSTRACT

Genomics is not only essential for students to understand biology but also provides unprecedented opportunities for undergraduate research. The goal of the Genomics Education Partnership (GEP), a collaboration between a growing number of colleges and universities around the country and the Department of Biology and Genome Center of Washington University in St. Louis, is to provide such research opportunities. Using a versatile curriculum that has been adapted to many different class settings, GEP undergraduates undertake projects to bring draft-quality genomic sequence up to high quality and/or participate in the annotation of these sequences. GEP undergraduates have improved more than 2 million bases of draft genomic sequence from several species of Drosophila and have produced hundreds of gene models using evidence-based manual annotation. Students appreciate their ability to make a contribution to ongoing research, and report increased independence and a more active learning approach after participation in GEP projects. They show knowledge gains on pre- and postcourse quizzes about genes and genomes and in bioinformatic analysis. Participating faculty also report professional gains, increased access to genomics-related technology, and an overall positive experience. We have found that using a genomics research project as the core of a laboratory course is rewarding for both faculty and students.


Subject(s)
Genetic Research , Genomics/education , Laboratories , Universities , Animals , Faculty , Students/psychology
6.
Article in English | MEDLINE | ID: mdl-19407348

ABSTRACT

Profile HMMs are powerful tools for modeling conserved motifs in proteins. They are widely used by search tools to classify new protein sequences into families based on domain architecture. However, the proliferation of known motifs and new proteomic sequence data poses a computational challenge for search, requiring days of CPU time to annotate an organism's proteome. It is highly desirable to speed up HMM search in large databases. We design PROSITE-like patterns and short profiles that are used as filters to rapidly eliminate protein-motif pairs for which a full profile HMM comparison does not yield a significant match. The design of the pattern-based filters is formulated as a multichoice knapsack problem. Profile-based filters with high sensitivity are extracted from a profile HMM based on their theoretical sensitivity and false positive rate. Experiments show that our profile-based filters achieve high sensitivity (near 100 percent) while keeping around 20× speedup with respect to the unfiltered search program. Pattern-based filters typically retain at least 90 percent of the sensitivity of the source HMM with 30-40× speedup. The profile-based filters have sensitivity comparable to the multistage filtering strategy HMMERHEAD [15] and are faster in most of our experiments.


Subject(s)
Conserved Sequence , Databases, Protein , Markov Chains , Proteins/chemistry , Sequence Analysis, Protein , Algorithms , Amino Acid Motifs , Amino Acid Sequence , Pattern Recognition, Automated , Sensitivity and Specificity , Sequence Homology, Amino Acid
7.
Microprocess Microsyst ; 33(4): 281-289, 2009 Jun 01.
Article in English | MEDLINE | ID: mdl-20160873

ABSTRACT

The amount of biosequence data being produced each year is growing exponentially. Extracting useful information from this massive amount of data efficiently is becoming an increasingly difficult task. There are many available software tools that molecular biologists use for comparing genomic data. This paper focuses on accelerating the most widely used such tool, BLAST. Mercury BLAST takes a streaming approach to the BLAST computation by offloading the performance-critical sections to specialized hardware. This hardware is then used in combination with the processor of the host system to deliver BLAST results in a fraction of the time of the general-purpose processor alone. This paper presents the design of the ungapped extension stage of Mercury BLAST. The architecture of the ungapped extension stage is described along with the context of this stage within the Mercury BLAST system. The design is compact and runs at 100 MHz on available FPGAs, making it an effective and powerful component for accelerating biosequence comparisons. The performance of this stage is 25× that of the standard software distribution, yielding close to 50× performance improvement on the complete BLAST application. The sensitivity is essentially equivalent to that of the standard distribution.

8.
Article in English | MEDLINE | ID: mdl-19492068

ABSTRACT

Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence databases has required either exponentially more running time or a cluster of machines to keep pace. To address this problem, we have designed and built a high-performance FPGA-accelerated version of BLASTP, Mercury BLASTP. In this paper, we describe the architecture of the portions of the application that are accelerated in the FPGA, and we also describe the integration of these FPGA-accelerated portions with the existing BLASTP software. We have implemented Mercury BLASTP on a commodity workstation with two Xilinx Virtex-II 6000 FPGAs. We show that the new design runs 11-15 times faster than software BLASTP on a modern CPU while delivering close to 99% identical results.

9.
Article in English | MEDLINE | ID: mdl-19642276

ABSTRACT

Detecting non-coding RNAs (ncRNAs) in genomic DNA is an important part of annotation. However, the most widely used tool for modeling ncRNA families, the covariance model (CM), incurs a high computational cost when used for search. This cost can be reduced by using a filter to exclude sequence that is unlikely to contain the ncRNA of interest, applying the CM only where it is likely to match strongly. Despite recent advances, designing an efficient filter that can detect nearly all ncRNA instances while excluding most irrelevant sequences remains challenging. This work proposes a systematic procedure to convert a CM for an ncRNA family to a secondary structure profile (SSP), which augments a conservation profile with secondary structure information but can still be efficiently scanned against long sequences. We use dynamic programming to estimate an SSP's sensitivity and FP rate, yielding an efficient, fully automated filter design algorithm. Our experiments demonstrate that designed SSP filters can achieve significant speedup over unfiltered CM search while maintaining high sensitivity for various ncRNA families, including those with and without strong sequence conservation. For highly structured ncRNA families, including secondary structure conservation yields better performance than using primary sequence conservation alone.


Subject(s)
Algorithms , RNA, Untranslated/chemistry , RNA, Untranslated/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Base Sequence , Molecular Sequence Data , Nucleic Acid Conformation
10.
Methods Mol Biol ; 402: 245-68, 2007.
Article in English | MEDLINE | ID: mdl-17951799

ABSTRACT

Single-nucleotide polymorphism (SNP) genotyping is an important molecular genetics process, which can produce results that will be useful in the medical field. Because of inherent complexities in DNA manipulation and analysis, many different methods have been proposed for a standard assay. One of the proposed techniques for performing SNP genotyping requires amplifying regions of DNA surrounding a large number of SNP loci. To automate a portion of this particular method, it is necessary to select a set of primers for the experiment. Selecting these primers can be formulated as the Multiple Degenerate Primer Design (MDPD) problem. The Multiple, Iterative Primer Selector (MIPS) is an iterative beam-search algorithm for MDPD. Theoretical and experimental analyses show that this algorithm performs well compared with the limits of degenerate primer design. Furthermore, MIPS outperforms an existing algorithm that was designed for a related degenerate primer selection problem.


Subject(s)
Algorithms , DNA Primers/chemistry , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Sequence Analysis, DNA , Genotype , Sequence Analysis, DNA/methods
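
A minimal sketch of what "degenerate primer" means in the abstract above, assuming the standard IUPAC nucleotide ambiguity codes. It is not the MIPS algorithm and does not reproduce the MDPD objective, which the abstract does not spell out; the function names `degeneracy`, `primer_matches`, and `covered_sites` are hypothetical.

```python
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct concrete sequences the degenerate primer stands for."""
    return prod(len(IUPAC[code]) for code in primer)

def primer_matches(primer: str, site: str) -> bool:
    """True if every base of `site` is allowed by the code at that position."""
    return len(primer) == len(site) and all(
        base in IUPAC[code] for code, base in zip(primer, site)
    )

def covered_sites(primers, sites):
    """Sites matched by at least one primer in the set (hypothetical helper)."""
    return [s for s in sites if any(primer_matches(p, s) for p in primers)]
```

For example, `degeneracy("ACRYN")` multiplies the set sizes 1, 1, 2, 2 and 4, so the primer stands for 16 concrete sequences. A primer-set design method trades this degeneracy against the number of sites covered.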
11.
Bioinformatics ; 23(2): e36-43, 2007 Jan 15.
Article in English | MEDLINE | ID: mdl-17237102

ABSTRACT

MOTIVATION: Profile HMMs are a powerful tool for modeling conserved motifs in proteins. These models are widely used by search tools to classify new protein sequences into families based on domain architecture. However, the proliferation of known motifs and new proteomic sequence data poses a computational challenge for search, requiring days of CPU time to annotate an organism's proteome. RESULTS: We use PROSITE-like patterns as a filter to speed up the comparison between protein sequence and profile HMM. A set of patterns is designed starting from the HMM, and only sequences matching one of these patterns are compared to the HMM by full dynamic programming. We give an algorithm to design patterns with maximal sensitivity subject to a bound on the false positive rate. Experiments show that our patterns typically retain at least 90% of the sensitivity of the source HMM while accelerating search by an order of magnitude. AVAILABILITY: Contact the first author at the address below.


Subject(s)
Algorithms , Conserved Sequence , Models, Chemical , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Motifs , Amino Acid Sequence , Artificial Intelligence , Computer Simulation , Markov Chains , Models, Statistical , Molecular Sequence Data , Pattern Recognition, Automated/methods , Sequence Homology, Amino Acid
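
A minimal sketch of the pattern-as-filter idea described in the abstract above: only sequences matching at least one pattern are passed on to the expensive profile HMM comparison. It assumes a simplified subset of PROSITE syntax and uses Python's `re` module; the pattern design algorithm itself is not shown, and `prosite_to_regex` and `filter_candidates` are hypothetical names.

```python
import re

# One pattern element: x, a residue, [allowed set] or {forbidden set},
# optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":                      # wildcard: any residue
            core = "."
        elif core.startswith("{"):           # forbidden set
            core = "[^" + core[1:-1] + "]"
        if lo is not None:                   # repeat count
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)

def filter_candidates(sequences, patterns):
    """Keep only sequences matching at least one pattern; in a real pipeline
    these survivors would go on to the full profile HMM comparison."""
    regexes = [re.compile(prosite_to_regex(p)) for p in patterns]
    return [s for s in sequences if any(r.search(s) for r in regexes)]
```

The speedup reported in the abstract comes from the filter being far cheaper than full dynamic programming. The sensitivity loss comes from true family members that match none of the patterns.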
12.
Article in English | MEDLINE | ID: mdl-18846267

ABSTRACT

Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose extensional similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high-volume, high-throughput data movement off a data store and into reconfigurable hardware. An important component of application deployment on the Mercury system is the functional decomposition of the application onto both the reconfigurable hardware and the traditional processor. Both the Mercury BLASTN application design and its performance analysis are described.

13.
BMC Bioinformatics ; 7: 133, 2006 Mar 13.
Article in English | MEDLINE | ID: mdl-16533404

ABSTRACT

BACKGROUND: Seeded alignment is an important component of algorithms for fast, large-scale DNA similarity search. A good seed matching heuristic can reduce the execution time of genomic-scale sequence comparison without degrading sensitivity. Recently, many types of seed have been proposed to improve on the performance of traditional contiguous seeds as used in, e.g., NCBI BLASTN. Choosing among these seed types, particularly those that use information besides the presence or absence of matching residue pairs, requires practical guidance based on a rigorous comparison, including assessment of sensitivity, specificity, and computational efficiency. This work performs such a comparison, focusing on alignments in DNA outside widely studied coding regions. RESULTS: We compare seeds of several types, including those allowing transition mutations rather than matches at fixed positions, those allowing transitions at arbitrary positions ("BLASTZ" seeds), and those using a more general scoring matrix. For each seed type, we use an extended version of our Mandala seed design software to choose seeds with optimized sensitivity for various levels of specificity. Our results show that, on a test set biased toward alignments of noncoding DNA, transition information significantly improves seed performance, while finer distinctions between different types of mismatches do not. BLASTZ seeds perform especially well. These results depend on properties of our test set that are not shared by EST-based test sets with a strong bias toward coding DNA. CONCLUSION: Practical seed design requires careful attention to the properties of the alignments being sought. For noncoding DNA sequences, seeds that use transition information, especially BLASTZ-style seeds, are particularly useful. The Mandala seed design software can be found at http://www.cse.wustl.edu/~yanni/mandala/.


Subject(s)
Algorithms , Artificial Intelligence , Chromosome Mapping/methods , Pattern Recognition, Automated/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Base Sequence , Molecular Sequence Data
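
A minimal sketch of the seed types compared in the abstract above, assuming a seed written as a string of `1` (position must match) and `0` (position ignored). Setting `max_transitions=1` approximates a BLASTZ-style seed, where a transition (A<->G or C<->T) is tolerated at any one required position. This is an illustration, not the Mandala software; `seed_hit` and `find_seed_hits` are hypothetical names, and the brute-force scan is quadratic, whereas real tools index the seeds.

```python
# Purine-purine and pyrimidine-pyrimidine substitutions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def seed_hit(a: str, b: str, seed: str, max_transitions: int = 0) -> bool:
    """True if equal-length windows `a` and `b` satisfy `seed`.

    Up to `max_transitions` required positions may hold a transition
    instead of an exact match; any transversion there rejects the hit.
    """
    if not (len(a) == len(b) == len(seed)):
        raise ValueError("windows and seed must have equal length")
    used = 0
    for x, y, s in zip(a, b, seed):
        if s == "0" or x == y:
            continue
        if (x, y) in TRANSITIONS and used < max_transitions:
            used += 1
            continue
        return False
    return True

def find_seed_hits(query: str, target: str, seed: str, max_transitions: int = 0):
    """All (query_offset, target_offset) pairs whose windows satisfy the seed."""
    w = len(seed)
    return [
        (i, j)
        for i in range(len(query) - w + 1)
        for j in range(len(target) - w + 1)
        if seed_hit(query[i:i + w], target[j:j + w], seed, max_transitions)
    ]
```

Loosening the seed (more `0` positions or more tolerated transitions) raises sensitivity but also the number of chance hits. That is the sensitivity and specificity trade-off the abstract evaluates.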
14.
Genome Biol ; 7(2): R15, 2006.
Article in English | MEDLINE | ID: mdl-16507169

ABSTRACT

BACKGROUND: Chromosome four of Drosophila melanogaster, known as the dot chromosome, is largely heterochromatic, as shown by immunofluorescent staining with antibodies to heterochromatin protein 1 (HP1) and histone H3K9me. In contrast, the absence of HP1 and H3K9me from the dot chromosome in D. virilis suggests that this region is euchromatic. D. virilis diverged from D. melanogaster 40 to 60 million years ago. RESULTS: Here we describe finished sequencing and analysis of 11 fosmids hybridizing to the dot chromosome of D. virilis (372,650 base-pairs) and seven fosmids from major euchromatic chromosome arms (273,110 base-pairs). Most genes from the dot chromosome of D. melanogaster remain on the dot chromosome in D. virilis, but many inversions have occurred. The dot chromosomes of both species are similar to the major chromosome arms in gene density and coding density, but the dot chromosome genes of both species have larger introns. The D. virilis dot chromosome fosmids have a high repeat density (22.8%), similar to homologous regions of D. melanogaster (26.5%). There are, however, major differences in the representation of repetitive elements. Remnants of DNA transposons make up only 6.3% of the D. virilis dot chromosome fosmids, but 18.4% of the homologous regions from D. melanogaster; DINE-1 and 1360 elements are particularly enriched in D. melanogaster. Euchromatic domains on the major chromosomes in both species have very few DNA transposons (less than 0.4%). CONCLUSION: Combining these results with recent findings about RNAi, we suggest that specific repetitive elements, as well as density, play a role in determining higher-order chromatin packaging.


Subject(s)
Chromosome Mapping , DNA Transposable Elements/genetics , Drosophila melanogaster/genetics , Drosophila/genetics , Heterochromatin/genetics , Animals , DNA/genetics , Drosophila Proteins/genetics , Expressed Sequence Tags , Genome , In Situ Hybridization , Models, Genetic , Models, Statistical , RNA Interference , Repetitive Sequences, Nucleic Acid , Retroelements/genetics , Statistics, Nonparametric
15.
J Comput Biol ; 12(6): 847-61, 2005.
Article in English | MEDLINE | ID: mdl-16108721

ABSTRACT

The challenge of similarity search in massive DNA sequence databases has inspired major changes in BLAST-style alignment tools, which accelerate search by inspecting only pairs of sequences sharing a common short "seed," or pattern of matching residues. Some of these changes raise the possibility of improving search performance by probing sequence pairs with several distinct seeds, any one of which is sufficient for a seed match. However, designing a set of seeds to maximize their combined sensitivity to biologically meaningful sequence alignments is computationally difficult, even given recent advances in designing single seeds. This work describes algorithmic improvements to seed design that address the problem of designing a set of n seeds to be used simultaneously. We give a new local search method to optimize the sensitivity of seed sets. The method relies on efficient incremental computation of the probability that an alignment contains a match to a seed pi, given that it has already failed to match any of the seeds in a set Pi. We demonstrate experimentally that multi-seed designs, even with relatively few seeds, can be significantly more sensitive than even optimized single-seed designs.


Subject(s)
Algorithms , Genome , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Animals , Computational Biology , Databases, Genetic , Humans , Markov Chains , Mice , Sensitivity and Specificity
16.
Science ; 307(5717): 1955-9, 2005 Mar 25.
Article in English | MEDLINE | ID: mdl-15790854

ABSTRACT

Germ-free mice were maintained on polysaccharide-rich or simple-sugar diets and colonized for 10 days with an organism also found in human guts, Bacteroides thetaiotaomicron, followed by whole-genome transcriptional profiling of bacteria and mass spectrometry of cecal glycans. We found that these bacteria assembled on food particles and mucus, selectively induced outer-membrane polysaccharide-binding proteins and glycoside hydrolases, prioritized the consumption of liberated hexose sugars, and revealed a capacity to turn to host mucus glycans when polysaccharides were absent from the diet. This flexible foraging behavior should contribute to ecosystem stability and functional diversity.


Subject(s)
Bacterial Proteins/genetics , Bacteroides/metabolism , Cecum/microbiology , Polysaccharides/metabolism , Symbiosis , Adaptation, Physiological , Animals , Bacterial Proteins/metabolism , Bacteroides/enzymology , Bacteroides/genetics , Bacteroides/growth & development , Cluster Analysis , Diet , Dietary Carbohydrates/metabolism , Ecosystem , Gene Expression Profiling , Gene Expression Regulation, Bacterial , Germ-Free Life , Glycoside Hydrolases/genetics , Glycoside Hydrolases/metabolism , Hexoses/metabolism , Intestines/microbiology , Male , Mice , Mucus/metabolism , Oligonucleotide Array Sequence Analysis , Operon , Polysaccharide-Lyases/genetics , Polysaccharide-Lyases/metabolism , Transcription, Genetic , Up-Regulation
17.
J Comput Biol ; 10(3-4): 399-417, 2003.
Article in English | MEDLINE | ID: mdl-13677335

ABSTRACT

Fast algorithms for pairwise biosequence similarity search frequently use filtering and indexing strategies to identify potential matches between a query sequence and a database. For the most part, these strategies are not informed by the substitution score matrices commonly used by comparison algorithms to assign numerical scores to pairs of aligned residues. Consequently, although many filtering strategies offer strong formal guarantees about their ability to detect pairs of sequences differing by few substitutions, these methods can make no guarantee of detecting pairs with high similarity scores. We describe a general technique, score simulation, to help resolve the tension between existing filtering techniques and the use of score matrices. Score simulation, using score matrices, maps ungapped similarity search problems to the simpler problem of finding pairs of strings that differ by few substitutions. Score simulation leads to indexing schemes for biosequences that permit efficient ungapped similarity search with arbitrary score matrices while maintaining strong formal guarantees of sensitivity. We introduce the LSH-ALL-PAIRS-SIM algorithm for finding local similarities in large biosequence collections and show that it is both computationally feasible and sensitive in practice.


Subject(s)
Computational Biology/methods , Databases, Genetic , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods
18.
J Comput Biol ; 9(2): 225-42, 2002.
Article in English | MEDLINE | ID: mdl-12015879

ABSTRACT

The DNA motif discovery problem abstracts the task of discovering short, conserved sites in genomic DNA. Pevzner and Sze recently described a precise combinatorial formulation of motif discovery that motivates the following algorithmic challenge: find twenty planted occurrences of a motif of length fifteen in roughly twelve kilobases of genomic sequence, where each occurrence of the motif differs from its consensus in four randomly chosen positions. Such "subtle" motifs, though statistically highly significant, expose a weakness in existing motif-finding algorithms, which typically fail to discover them. Pevzner and Sze introduced new algorithms to solve their (15,4)-motif challenge, but these methods do not scale efficiently to more difficult problems in the same family, such as the (14,4)-, (16,5)-, and (18,6)-motif problems. We introduce a novel motif-discovery algorithm, PROJECTION, designed to enhance the performance of existing motif finders using random projections of the input's substrings. Experiments on synthetic data demonstrate that PROJECTION remedies the weakness observed in existing algorithms, typically solving the difficult (14,4)-, (16,5)-, and (18,6)-motif problems. Our algorithm is robust to nonuniform background sequence distributions and scales to larger amounts of sequence than that specified in the original challenge. A probabilistic estimate suggests that related motif-finding problems that PROJECTION fails to solve are in all likelihood inherently intractable. We also test the performance of our algorithm on realistic biological examples, including transcription factor binding sites in eukaryotes and ribosome binding sites in prokaryotes.


Subject(s)
Algorithms , DNA/genetics , Base Composition , Base Sequence , Binding Sites/genetics , Computational Biology , Conserved Sequence , DNA/chemistry , DNA/metabolism , Ribosomes/metabolism , Sequence Analysis, DNA/statistics & numerical data , Transcription Factors/metabolism
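
A minimal sketch of the random-projection step named in the abstract above: each length-l substring is hashed by the bases at k randomly chosen positions, so occurrences of a planted motif whose mutations fall outside those positions land in the same bucket. It covers the bucketing only. Passing enriched buckets to an existing motif finder for refinement, as the abstract describes, is omitted, and the parameter names and threshold are assumptions.

```python
import random
from collections import defaultdict

def project_buckets(sequences, l, k, rng):
    """Hash every l-mer by its bases at k randomly chosen positions.

    Returns the chosen positions and a dict mapping each projected key
    to the (sequence_index, offset) pairs of the l-mers that produced it.
    """
    positions = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for si, seq in enumerate(sequences):
        for start in range(len(seq) - l + 1):
            lmer = seq[start:start + l]
            key = "".join(lmer[p] for p in positions)
            buckets[key].append((si, start))
    return positions, buckets

def enriched_buckets(buckets, threshold):
    """Buckets holding at least `threshold` l-mers: candidate motif seeds."""
    return {key: hits for key, hits in buckets.items() if len(hits) >= threshold}
```

A single projection can miss the motif when the sampled positions overlap the mutated ones, so in practice the projection would be repeated with fresh random positions, for example by calling `project_buckets(seqs, 15, 7, random.Random(trial))` over several values of `trial`. The values 15 and 7 here are illustrative.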