Results 1 - 20 of 22
1.
Comput Biol Chem ; 33(2): 121-36, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19152793

ABSTRACT

Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data remain unresolved. Computational assembly is crucial in large genome projects as well as for the evolving high-throughput technologies and plays an important role in processing the information generated by these methods. Here, we provide a comprehensive overview of the current publicly available sequence assembly programs. We describe the basic principles of computational assembly along with the main concerns, such as repetitive sequences in genomic DNA, highly expressed genes and alternative transcripts in EST sequences. We summarize existing comparisons of different assemblers and provide detailed descriptions and directions for download of assembly programs at: http://genome.ku.dk/resources/assembly/methods.html.


Subject(s)
Genomics/methods , Computational Biology/methods , DNA/chemistry , Expressed Sequence Tags , Genome , Polymorphism, Single Nucleotide , Repetitive Sequences, Nucleic Acid
2.
Anim Genet ; 39(2): 193-5, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18261187

ABSTRACT

The Sino-Danish pig genome project produced 685 851 ESTs (Gorodkin et al. 2007), of which 41 499 originated from the mitochondrial genome. In this study, the mitochondrial ESTs were assembled, and 374 putative SNPs were found. Chromatograms for the ESTs containing SNPs were manually inspected, and 112 total (52 non-synonymous) SNPs were found to be of high confidence (five of them are close to disease-causing SNPs in humans). Nine of the high-confidence SNPs were tested experimentally, and eight were confirmed. The SNPs can be accessed online at http://pigest.ku.dk/more/mito.


Subject(s)
Expressed Sequence Tags , Mitochondria/genetics , Polymorphism, Single Nucleotide , Swine/genetics , Animals , Confidence Intervals , Gene Frequency , Genome , Humans , Species Specificity
3.
Anim Genet ; 38(4): 401-5, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17559553

ABSTRACT

A total of 10 882 porcine microsatellite repeats were identified in genomic shotgun sequences from the Sino-Danish Pig Genome Sequencing Consortium (http://www.piggenome.dk). Of these, 4528 microsatellites were placed on a pig-human comparative map by BLAST analysis of porcine sequences against the human genome (BLAST cut-off threshold = 1 x 10^-5). All microsatellite sequences placed on the comparative map are accessible at http://www.animalgenome.org/QTLdb/pig.html. These sequences increase the number of identified microsatellites in the porcine genome by several orders of magnitude. They are a new resource of microsatellite sequences for generating markers to be used in linkage studies and in fine mapping and positional cloning of quantitative trait loci.


Subject(s)
Microsatellite Repeats , Swine/genetics , Animals , Chromosome Mapping , Computational Biology , Genetic Linkage , Genetic Markers , Genome , Humans
4.
Comput Biol Chem ; 30(4): 249-54, 2006 Aug.
Article in English | MEDLINE | ID: mdl-16798093

ABSTRACT

The processing of micro RNAs (miRNAs) from their stemloop precursors has revealed asymmetry in the processing of the mature sequence and its star sequence. Furthermore, the miRNA processing system differs between organisms. To assess this at the sequence level we have investigated mature miRNAs in their genomic contexts. We have compared profiles of mature miRNAs within their genomic context of the 5' and 3' stemloop precursor arms and we find asymmetry between mature sequences of the 5' and 3' stemloop precursor arms. The main observation is that vertebrate organisms have a characteristic motif on the 5' arm, in contrast to the 3' arm motif, which mainly shows the conserved U at the position of the mature start. The vertebrate 5' arm motif also shows a semi-conserved G 13 nucleotides upstream from the first position. We compared the 5' and 3' arm profiles using the average log likelihood ratio (ALLR) score, as defined by Wang and Stormo (2003) [Wang T., Stormo, G.D., 2003. Combining phylogenetic data with co-regulated genes to identify regulatory motifs. Bioinformatics 2369-2380.]. By computing a p-value we find that the two profiles differ significantly in their 3' end, where the 5' arm motif (in contrast to the 3' arm motif) has a semi-conserved GU rich region. Similar findings are also obtained for other organisms, such as fly, worm and plants. The observed similarities and differences between closely and distantly related organisms are discussed and related to current knowledge of miRNA processing.


Subject(s)
MicroRNAs/chemistry , Animals , Arabidopsis/genetics , Base Sequence , Computational Biology , Conserved Sequence , Databases, Nucleic Acid , Genomics , Humans
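
The ALLR comparison named in the abstract above can be illustrated with a short sketch. It uses the commonly cited form of the score for two profile columns i and j: each column's base counts are weighted by the log-likelihood ratio of the other column's frequencies against the background, and the sum is divided by the total count. The function name `allr`, the pseudocount and the uniform background are assumptions made for illustration and are not taken from the paper.

```python
import math

def allr(counts_i, counts_j, background, pseudocount=0.5):
    """Average log likelihood ratio between two profile columns.

    counts_i, counts_j: dicts mapping base -> observed count in each column.
    background: dict mapping base -> background probability.
    pseudocount: illustrative smoothing value (an assumption) that keeps
    the frequencies away from zero.
    """
    bases = sorted(background)

    def frequencies(counts):
        total = sum(counts.get(b, 0) for b in bases) + pseudocount * len(bases)
        return {b: (counts.get(b, 0) + pseudocount) / total for b in bases}

    f_i, f_j = frequencies(counts_i), frequencies(counts_j)
    total_count = sum(counts_i.get(b, 0) + counts_j.get(b, 0) for b in bases)
    if total_count == 0:
        raise ValueError("both columns are empty")
    # Counts of one column are scored with the frequencies of the other.
    numerator = sum(
        counts_j.get(b, 0) * math.log(f_i[b] / background[b])
        + counts_i.get(b, 0) * math.log(f_j[b] / background[b])
        for b in bases
    )
    return numerator / total_count

uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}
similar = allr({"U": 10}, {"U": 10}, uniform)      # two U-rich columns
dissimilar = allr({"U": 10}, {"G": 10}, uniform)   # U-rich versus G-rich
```

Under these assumptions, columns with similar composition score above zero and columns with conflicting composition score below zero, which is what makes the score usable for locating where two profiles diverge.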
5.
Anim Genet ; 37(3): 199-204, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16734676

ABSTRACT

Single nucleotide polymorphisms (SNPs) were discovered in porcine expressed sequence tags (ESTs) orthologous to genes from human chromosome 13 (HSA13) and predicted to be located on pig chromosome 11 (SSC11). The SNPs were identified as sequence variants in clusters of EST sequences from pig cDNA libraries constructed in the Sino-Danish pig genome project. In total, 312 human gene sequences from HSA13 were used for similarity searches in our pig EST database. Pig ESTs showing significant similarity with HSA13 genes were clustered and candidate SNPs were identified. Allele frequencies for 26 SNPs were estimated in a group of 80 unrelated pigs from Danish commercial pig breeds: Duroc, Hampshire, Landrace and Large White. Eighteen of the 26 SNPs genotyped in the PiGMaP Reference Families were mapped by linkage analysis to SSC11. The EST-based SNPs published here are new genetic markers useful for linkage and association studies in commercial and experimental pig populations. This study represents the first gene-associated SNP linkage map of pig chromosome 11 and adds new comparative mapping information between SSC11 and HSA13. Furthermore, our data facilitate future studies aimed at the identification of interesting regions on pig chromosome 11, positional cloning and fine mapping of quantitative trait loci in pig.


Subject(s)
Chromosome Mapping , Chromosomes, Mammalian , Genetic Linkage , Polymorphism, Single Nucleotide , Swine/genetics , Animals , Breeding , Denmark , Expressed Sequence Tags , Gene Frequency , Genotype , Swine/classification
6.
Comput Biol Chem ; 28(5-6): 367-74, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15556477

ABSTRACT

Predicted assignments of biological sequences are often evaluated by the Matthews correlation coefficient. However, the Matthews correlation coefficient applies only to cases where the assignments belong to two categories, and cases with more than two categories are often artificially forced into two categories by considering what belongs and what does not belong to one of the categories, leading to the loss of information. Here, an extended correlation coefficient that applies to K-categories is proposed, and this measure is shown to be highly applicable for evaluating prediction of RNA secondary structure in cases where some predicted pairs go into the category "unknown" due to lack of reliability in predicted pairs or unpaired residues. Hence, predicting base pairs of RNA secondary structure can be a three-category problem. The measure is further shown to be well in agreement with existing performance measures used for ranking protein secondary structure predictions. Server and software are available at http://rk.kvl.dk/.


Subject(s)
Computational Biology , Databases, Factual , RNA/chemistry , RNA/classification , Sequence Alignment , Base Pairing , Molecular Sequence Data , Protein Structure, Secondary , Proteins/chemistry , Proteins/classification
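
As a rough illustration of a K-category correlation coefficient, the sketch below computes the confusion-matrix form in which such a coefficient is commonly stated. The abstract gives no formula, so this expression is an assumption rather than a quotation of the paper. The function name `k_category_correlation` and the convention of returning 0.0 when the denominator vanishes are likewise illustrative choices.

```python
import math

def k_category_correlation(confusion):
    """Correlation coefficient for a K x K confusion matrix.

    confusion[t][p] is the number of items whose true category is t
    and whose predicted category is p.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(k))
    true_totals = [sum(row) for row in confusion]
    pred_totals = [sum(confusion[t][p] for t in range(k)) for p in range(k)]

    cov_xy = correct * total - sum(t * p for t, p in zip(true_totals, pred_totals))
    cov_xx = total * total - sum(p * p for p in pred_totals)
    cov_yy = total * total - sum(t * t for t in true_totals)
    denominator = math.sqrt(cov_xx * cov_yy)
    # Assumed convention: report 0.0 when one side has no variance.
    return cov_xy / denominator if denominator else 0.0

# Three categories, e.g. "paired", "unpaired" and "unknown".
three_way = [[5, 1, 1],
             [1, 4, 0],
             [0, 1, 2]]
score = k_category_correlation(three_way)
```

For K = 2 this expression reduces to the usual two-category Matthews correlation coefficient, so a three-category evaluation needs no collapsing into "belongs" versus "does not belong".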
7.
Comput Biol Chem ; 28(3): 219-26, 2004 Jul.
Article in English | MEDLINE | ID: mdl-15261152

ABSTRACT

Predicting RNA secondary structure using evolutionary history can be carried out by using an alignment of related RNA sequences with conserved structure. Accurately determining evolutionary substitution rates for base pairs and single stranded nucleotides is a concern for methods based on this type of approach. Determining these rates can be hard to do reliably without a large and accurate initial alignment, which ideally also has structural annotation. Hence, one must often apply rates extracted from other RNA families with trusted alignments and structures. Here, we investigate this problem by applying rates derived from tRNA and rRNA to the prediction of the much more rapidly evolving 5'-region of HIV-1. We find that the HIV-1 prediction is in agreement with experimental data, even though the relative evolutionary rate between A and G is significantly increased, both in stem and loop regions. In addition we obtained an alignment of the 5' HIV-1 region that is more consistent with the structure than that currently in the database. We added randomized noise to the original values of the rates to investigate the stability of predictions to rate matrix deviations. We find that changes within a fairly large range still produce reliable predictions and conclude that using rates from a limited set of RNA sequences is valid over a broader range of sequences.


Subject(s)
Evolution, Molecular , Nucleic Acid Conformation , RNA/chemistry , Algorithms , Base Pairing/genetics , Databases, Nucleic Acid , HIV-1/chemistry , HIV-1/genetics , Kinetics , Models, Genetic , Point Mutation/genetics , RNA/genetics , RNA, Ribosomal/chemistry , RNA, Ribosomal/genetics , RNA, Transfer/chemistry , RNA, Transfer/genetics , RNA, Viral/chemistry , RNA, Viral/genetics , Sequence Alignment/methods
8.
Bioinformatics ; 17(7): 642-5, 2001 Jul.
Article in English | MEDLINE | ID: mdl-11448882

ABSTRACT

We have developed a series of programs which assist in maintenance of structural RNA databases. A main program BLASTs the RNA database against GenBank and automatically extends and realigns the sequences to include the entire range of the RNA query sequences. After manual update of the database, other programs can examine base pair consistency and phylogenetic support. The output can be applied iteratively to refine the structural alignment of the RNA database. Using these tools, the number of potential misannotations per sequence was reduced from 20 to 3 in the Signal Recognition Particle RNA database. AVAILABILITY: A quick-server and programs are available at http://www.bioinf.au.dk/rnadbtool/


Subject(s)
Databases as Topic , RNA/genetics , Sequence Alignment/statistics & numerical data , Base Sequence , Computational Biology , Molecular Sequence Data , RNA, Messenger/genetics , Sequence Analysis, RNA/statistics & numerical data , Sequence Homology, Nucleic Acid , Software
9.
Comput Chem ; 25(3): 301-7, 2001 May.
Article in English | MEDLINE | ID: mdl-11339412

ABSTRACT

Through computational analysis of high-performance liquid chromatography (HPLC) traces we find correlations between secondary metabolites and growth conditions of six varieties of barley. Using artificial neural networks, it was possible to classify chromatograms for which the varieties were fertilized by nitrogen and treated by fungicide. For each variety of barley we could also differentiate it from the others. Surprisingly, all these classification tasks could be solved successfully by a simple network with no hidden units. When pruning of the network weights was added to the methodology, we were able to reduce the set of peaks in the chromatograms and obtain a necessary subset from which the growth conditions and differentiation may be decided. In some instances, more complex networks with hidden units could lead to a further reduction of the number of peaks used. In most cases, far more than half of the peaks are redundant. We find that it requires fewer information-rich peaks to perform the variety differentiation tasks than to recognize any of the growth conditions. Analysis of the network weights reveals correlations between weighted combinations of peaks.


Subject(s)
Hordeum/chemistry , Hordeum/genetics , Neural Networks, Computer , Phenols/chemistry , Chromatography/methods , Chromatography, High Pressure Liquid , Fertilizers/analysis , Fungicides, Industrial/analysis , Hordeum/growth & development , Nitrates/analysis , Species Specificity
10.
Nucleic Acids Res ; 29(10): 2135-44, 2001 May 15.
Article in English | MEDLINE | ID: mdl-11353083

ABSTRACT

Post-transcriptional regulation of gene expression is often accomplished by proteins binding to specific sequence motifs in mRNA molecules, to affect their translation or stability. The motifs are often composed of a combination of sequence and structural constraints such that the overall structure is preserved even though much of the primary sequence is variable. While several methods exist to discover transcriptional regulatory sites in the DNA sequences of coregulated genes, the RNA motif discovery problem is much more difficult because of covariation in the positions. We describe the combined use of two approaches for RNA structure prediction, FOLDALIGN and COVE, that together can discover and model stem-loop RNA motifs in unaligned sequences, such as UTRs from post-transcriptionally coregulated genes. We evaluate the method on two datasets, one a section of rRNA genes with randomly truncated ends so that a global alignment is not possible, and the other a hyper-variable collection of IRE-like elements that were inserted into randomized UTR sequences. In both cases the combined method identified the motifs correctly, and in the rRNA example we show that it is capable of determining the structure, which includes bulge and internal loops as well as a variable length hairpin loop. Those automated results are quantitatively evaluated and found to agree closely with structures contained in curated databases, with correlation coefficients up to 0.9. A basic server, Stem-Loop Align SearcH (SLASH), which will perform stem-loop searches in unaligned RNA sequences, is available at http://www.bioinf.au.dk/slash/.


Subject(s)
Computational Biology , Nucleic Acid Conformation , RNA/chemistry , RNA/genetics , Software , Algorithms , Base Sequence , Databases as Topic , Internet , Molecular Sequence Data , RNA/metabolism , RNA, Archaeal/chemistry , RNA, Archaeal/genetics , RNA, Archaeal/metabolism , RNA, Ribosomal/chemistry , RNA, Ribosomal/genetics , RNA, Ribosomal/metabolism , Regulatory Sequences, Nucleic Acid/genetics , Sensitivity and Specificity , Sequence Alignment , Untranslated Regions/chemistry , Untranslated Regions/genetics , Untranslated Regions/metabolism
11.
Nucleic Acids Res ; 29(1): 169-70, 2001 Jan 01.
Article in English | MEDLINE | ID: mdl-11125080

ABSTRACT

Signal recognition particle (SRP) is a stable cytoplasmic ribonucleoprotein complex that serves to translocate secretory proteins across membranes during translation. The SRP Database (SRPDB) provides compilations of SRP components, ordered alphabetically and phylogenetically. Alignments emphasize phylogenetically-supported base pairs in SRP RNA and conserved residues in the proteins. Data are provided in various formats including a column arrangement for improved access and simplified computational usability. Included are motifs for identification of new sequences, SRP RNA secondary structure diagrams, 3-D models and links to high-resolution structures. This release includes 11 new SRP RNA sequences (total of 129), two protein SRP9 sequences (total of seven), two protein SRP14 sequences (total of 10), two protein SRP19 sequences (total of 16), 10 new SRP54 (ffh) sequences (total of 66), two protein SRP68 sequences (total of seven) and two protein SRP72 sequences (total of nine). Seven sequences of the SRP receptor alpha-subunit and its FtsY homolog (total of 51) are new. Also considered are beta-subunit of SRP receptor, Flhf, Hbsu, CaM kinase II and cpSRP43. Access to SRPDB is at http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html and the European mirror http://www.medkem.gu.se/dbs/SRPDB/SRPDB.html


Subject(s)
Databases, Factual , Signal Recognition Particle/genetics , Internet , Proteins/genetics , RNA/genetics
12.
Nucleic Acids Res ; 29(1): 171-2, 2001 Jan 01.
Article in English | MEDLINE | ID: mdl-11125081

ABSTRACT

The tmRNA database (tmRDB) is maintained at the University of Texas Health Science Center at Tyler, Texas, and accessible on the World Wide Web at the URL http://psyche.uthct.edu/dbs/tmRDB/tmRDB.html. Mirror sites are located at Auburn University, Auburn, Alabama (http://www.ag.auburn.edu/mirror/tmRDB/) and the Institute of Biological Sciences, Aarhus, Denmark (http://www.bioinf.au.dk/tmRDB/). The tmRDB provides information and citation links about tmRNA, a molecule that combines functions of tRNA and mRNA in trans-translation. tmRNA is likely to be present in all bacteria and has been found in algae chloroplasts, the cyanelle of Cyanophora paradoxa and the mitochondrion of the flagellate Reclinomonas americana. This release adds 26 new sequences and corresponding predicted tmRNA-encoded tag peptides for a total of 86 tmRNAs, ordered alphabetically and phylogenetically. Secondary structures and three-dimensional models in PDB format for representative molecules are being made available. tmRNA alignments prove individual base pairs and are generated manually, assisted by computational tools. The alignments with their corresponding structural annotation can be obtained in various formats, including a new column format designed to improve and simplify computational usability of the data.


Subject(s)
Databases, Factual , RNA, Messenger/genetics , RNA, Transfer/genetics , Internet , Phylogeny , Prokaryotic Cells/metabolism , Sequence Alignment
13.
Genome Inform ; 12: 184-93, 2001.
Article in English | MEDLINE | ID: mdl-11791237

ABSTRACT

When a set of coregulated genes share a common structural RNA motif, e.g. a hairpin, most motif search approaches fail to locate the covarying but structurally conserved motif. There do exist methods that can locate structural RNA motifs, like FOLDALIGN, but the main problem with these methods is that they are computationally expensive. In FOLDALIGN, a major contribution to this is the use of a greedy algorithm to construct the multiple alignment. To ensure good quality many redundant computations must be made. However, by applying the greedy algorithm on a carefully selected subset of sequences, near full greedy quality can be obtained. The basic idea is to estimate the order in which the sequences entered a good greedy alignment. If such a ranking, found from all pairwise alignments, is in good agreement with the order of appearance in the multiple alignment, the core structural motif can be found by performing the greedy algorithm on just the top sequences in the ranking. The ranking used in this mini-greedy algorithm is found by using two complementing approaches: 1) When interpreting the FOLDALIGN score as an inner product (kernel), the sequences can be ranked according to their distance to their center of mass; 2) We construct an algorithm that attempts to find the K closest sequences in the vector space associated with the inner product, and the remaining sequences can be ranked by their minimum distance to any of the sequences, or to the center of mass in this set. The two approaches are compared and merged, and the results discussed. We also show that structural alignments of near full greedy quality can be found in significantly reduced time, using these methods. The algorithm is being included in the SLASH (Stem-Loop Align SearcH) server available at http://www.bioinf.au.dk/slash.


Subject(s)
Algorithms , RNA/chemistry , RNA/genetics , Base Sequence , Computational Biology , Databases, Nucleic Acid , Nucleic Acid Conformation , Sequence Alignment/statistics & numerical data
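
The first ranking approach in the abstract above, distance to the center of mass under an inner product, can be written without ever constructing the underlying vectors. For a kernel matrix K over N sequences, the squared distance from sequence i to the center of mass is K_ii - (2/N) * sum_j K_ij + (1/N^2) * sum_jk K_jk. The sketch below applies that identity. The function name `rank_by_center_of_mass`, the symmetric score matrix and the diagonal self-scores are assumptions for illustration, not details taken from the paper.

```python
def rank_by_center_of_mass(kernel):
    """Rank items by squared distance to the centroid in kernel space.

    kernel: symmetric N x N list of lists, kernel[i][j] being the
    inner-product-like score between items i and j (self-scores on the
    diagonal are assumed to be available).
    Returns (order, squared_distances), closest item first.
    """
    n = len(kernel)
    grand_mean = sum(sum(row) for row in kernel) / (n * n)
    squared_distances = [
        kernel[i][i] - 2.0 * sum(kernel[i]) / n + grand_mean
        for i in range(n)
    ]
    order = sorted(range(n), key=lambda i: squared_distances[i])
    return order, squared_distances

# Toy check with explicit 1-D points 0, 1 and 5 and kernel K_ij = x_i * x_j.
points = [0.0, 1.0, 5.0]
toy_kernel = [[a * b for b in points] for a in points]
order, distances = rank_by_center_of_mass(toy_kernel)
```

In the toy case the centroid is at 2, so the point at 1 ranks first. A greedy alignment restricted to the top-ranked items would then start from the sequences nearest the centroid, which is the intuition the abstract describes.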
14.
Bioinformatics ; 15(9): 769-70, 1999 Sep.
Article in English | MEDLINE | ID: mdl-10498780

ABSTRACT

MatrixPlot is a program for making high-quality matrix plots, such as mutual information plots of sequence alignments and distance matrices of sequences with known three-dimensional coordinates. The user can add information about the sequences (e.g. a sequence logo profile) along the edges of the plot, as well as zoom in on any region in the plot. AVAILABILITY: MatrixPlot can be obtained on request, and can also be accessed online at http://www.cbs.dtu.dk/services/MatrixPlot. CONTACT: gorodkin@cbs.dtu.dk


Subject(s)
Proteins/chemistry , Sequence Alignment , Software , Nucleic Acids/chemistry , Sequence Analysis, Protein
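
To make concrete the kind of data a mutual information plot displays, the sketch below computes the standard mutual information, in bits, between every pair of columns of a small alignment. It does not reproduce MatrixPlot itself. The function names are hypothetical, and treating gap characters as ordinary symbols is a simplifying assumption.

```python
import math
from collections import Counter

def mutual_information(alignment, i, j):
    """Mutual information in bits between alignment columns i and j."""
    n = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    return sum(
        (c / n) * math.log2((c / n) / ((counts_i[a] / n) * (counts_j[b] / n)))
        for (a, b), c in pair_counts.items()
    )

def mutual_information_matrix(alignment):
    """Symmetric matrix of column-pair mutual information values."""
    length = len(alignment[0])
    return [
        [mutual_information(alignment, i, j) for j in range(length)]
        for i in range(length)
    ]

# Columns 0 and 2 covary perfectly (G-C, C-G, A-U, U-A); column 1 is constant.
toy_alignment = ["GAC", "CAG", "AAU", "UAA"]
matrix = mutual_information_matrix(toy_alignment)
```

Covarying column pairs, such as base-paired positions in an RNA alignment, show up as high off-diagonal values, while a fully conserved column carries no mutual information with any other column.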
15.
Article in English | MEDLINE | ID: mdl-10786291

ABSTRACT

Correlations between sequence separation (in residues) and distance (in Angstrom) of any pair of amino acids in polypeptide chains are investigated. For each sequence separation we define a distance threshold. For pairs of amino acids where the distance between C alpha atoms is smaller than the threshold, a characteristic sequence (logo) motif is found. The motifs change as the sequence separation increases: for small separations they consist of one peak located in between the two residues, then additional peaks at these residues appear, and finally the center peak smears out for very large separations. We also find correlations between the residues in the center of the motif. This and other statistical analyses are used to design neural networks with enhanced performance compared to earlier work. Importantly, the statistical analysis explains why neural networks perform better than simple statistical data-driven approaches such as pair probability density functions. The statistical results also explain characteristics of the network performance for increasing sequence separation. The improvement of the new network design is significant in the sequence separation range 10-30 residues. Finally, we find that the performance curve for increasing sequence separation is directly correlated to the corresponding information content. A WWW server, distanceP, is available at http://www.cbs.dtu.dk/services/distanceP/.


Subject(s)
Amino Acid Motifs , Neural Networks, Computer , Proteins/chemistry , Algorithms , Databases, Factual , Entropy , Models, Statistical
16.
Article in English | MEDLINE | ID: mdl-9783207

ABSTRACT

We study from a computational standpoint several different physical scales associated with structural features of DNA sequences, including dinucleotide scales such as base stacking energy and propeller twist, and trinucleotide scales such as bendability and nucleosome positioning. We show that these scales provide an alternative or complementary compact representation of DNA sequences. As an example we construct a strand invariant representation of DNA sequences. The scales can also be used to analyze and discover new DNA structural patterns, especially in combinations with hidden Markov models (HMMs). The scales are applied to HMMs of human promoter sequences revealing a number of significant differences between regions upstream and downstream of the transcriptional start point. Finally we show, with some qualifications, that such scales are by and large independent, and therefore complement each other.


Subject(s)
DNA/chemistry , Artificial Intelligence , Base Sequence , DNA/genetics , Humans , Markov Chains , Molecular Structure , Oligodeoxyribonucleotides/chemistry , Oligodeoxyribonucleotides/genetics , Pattern Recognition, Automated , Promoter Regions, Genetic , TATA Box , Thermodynamics
17.
Nucleic Acids Res ; 25(18): 3724-32, 1997 Sep 15.
Article in English | MEDLINE | ID: mdl-9278497

ABSTRACT

We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences, but maintains tractability by constructing multi-sequence alignments from pairwise comparisons. The algorithm finds the multiple alignments using a greedy approach and has similarities to both CLUSTAL and CONSENSUS, but the core algorithm assures that the pairwise alignments are optimized for both sequence and structure conservation. The choice of scoring system and the method of progressively constructing the final solution are important considerations that are discussed. Example solutions, and comparisons with other approaches, are provided. The solutions include finding consensus structures identical to published ones.


Subject(s)
Computer Simulation , RNA/genetics , Sequence Analysis , Algorithms , Animals , Databases, Factual , Humans
18.
Article in English | MEDLINE | ID: mdl-9322025

ABSTRACT

We present a computational scheme to search for the most common motif, composed of a combination of sequence and structure constraints, among a collection of RNA sequences. The method uses a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences, but maintains tractability by constructing multi-sequence alignments from pairwise comparisons. The overall method has similarities to both CLUSTAL and CONSENSUS, but the core algorithm assures that the pairwise alignments are optimized for both sequence and structure conservation. Example solutions, and comparisons with other approaches, are provided. The solutions include finding consensus structures identical to published ones.


Subject(s)
Algorithms , RNA/chemistry , RNA/genetics , Sequence Alignment/methods , Base Sequence , Databases, Factual , Molecular Structure , Nucleic Acid Conformation , Software
19.
Int J Neural Syst ; 8(5-6): 489-98, 1997.
Article in English | MEDLINE | ID: mdl-10065831

ABSTRACT

A better understanding of pruning methods based on a ranking of weights according to their saliency in a trained network requires further information on the statistical properties of such saliencies. We focus on two-layer networks with either a linear or nonlinear output unit, and obtain analytic expressions for the distribution of saliencies and their logarithms. Our results reveal unexpected universal properties of the log-saliency distribution and suggest a novel algorithm for saliency-based weight ranking that avoids the numerical cost of second derivative evaluations.


Subject(s)
Artificial Intelligence , Neural Networks, Computer , Algorithms , Linear Models , Nonlinear Dynamics
20.
Comput Appl Biosci ; 13(6): 583-6, 1997 Dec.
Article in English | MEDLINE | ID: mdl-9475985

ABSTRACT

MOTIVATION: We extend the standard 'Sequence Logo' method of Schneider and Stephens (Nucleic Acids Res., 18, 6097-6100, 1990) to incorporate prior frequencies on the bases, allow for gaps in the alignments, and indicate the mutual information of base-paired regions in RNA. RESULTS: Given an alignment of RNA sequences with the base pairings indicated, the program will calculate the information at each position, including the mutual information of the base pairs, and display the results in a 'Structure Logo'. Alignments without base pairing can also be displayed in a 'Sequence Logo', but still allowing gaps and incorporating prior frequencies if desired. AVAILABILITY: The code is available from, and an Internet server can be used to run the program at, http://www.cbs.dtu.dk/gorodkin/appl/slogo.html.


Subject(s)
Computational Biology/methods , RNA/chemistry , Sequence Alignment/methods , Algorithms , Base Composition/genetics , Computer Simulation , Mathematics , Nucleic Acid Conformation , Nucleic Acid Hybridization/genetics
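
The idea of a per-position information value that incorporates prior base frequencies can be illustrated with the relative entropy of the observed column frequencies against the prior. This is a minimal sketch under stated assumptions: the function name `column_information` is hypothetical, and gaps are simply excluded before frequencies are computed, which is a simplification and not necessarily how the published program handles them.

```python
import math
from collections import Counter

def column_information(column, prior, gap="-"):
    """Relative entropy (bits) of one alignment column against a prior.

    column: string of symbols, one per aligned sequence.
    prior: dict mapping base -> prior probability.
    gap: gap symbol; gapped entries are ignored here (an assumption).
    """
    symbols = [c for c in column if c != gap]
    if not symbols:
        return 0.0
    n = len(symbols)
    counts = Counter(symbols)
    return sum(
        (c / n) * math.log2((c / n) / prior[base])
        for base, c in counts.items()
    )

uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}
a_rich = {"A": 0.5, "C": 0.125, "G": 0.125, "U": 0.25}

conserved_uniform = column_information("AAAA", uniform)  # fully conserved A
conserved_a_rich = column_information("AAAA", a_rich)    # same column, A-rich prior
```

A conserved base that is already common under the prior is scored as less informative than the same base under a uniform prior, which is the effect of incorporating prior frequencies into a logo.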