Results 1 - 20 of 27
1.
Bioinformatics ; 36(9): 2690-2696, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31999322

ABSTRACT

MOTIVATION: Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominating model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence of better performance of higher order models such as Markov models of order one, also called adjacent dinucleotide matrices (ADMs). ADMs can model dependencies between adjacent nucleotides, unlike PPMs. A modeling technique and software tool that would estimate such models simultaneously both for monomers and their dimers have been missing. RESULTS: We present an ADM-based mixture model for monomeric and dimeric TF-binding motifs and an expectation maximization algorithm MODER2 for learning such models from training data and seeds. The model is a mixture that includes monomers and dimers, built from the monomers, with a description of the dimeric structure (spacing, orientation). The technique is modular, meaning that the co-operative effect of dimerization is made explicit by evaluating the difference between expected and observed models. The model is validated using HT-SELEX and generated datasets, and by comparing to some earlier PPM and ADM techniques. The ADM models explain data slightly better than PPM models for 314 tested TFs (or their DNA-binding domains) from four families (bHLH, bZIP, ETS and Homeodomain), the ADM mixture models by MODER2 being the best on average. AVAILABILITY AND IMPLEMENTATION: Software implementation is available from https://github.com/jttoivon/moder2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
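The difference between a PPM and an adjacent dinucleotide model can be illustrated with a small sketch. This is not the MODER2 implementation: the function names and the toy probabilities below are hypothetical, chosen only to show how a first-order model can assign a higher probability to a dinucleotide than the PPM built from its own per-position marginals.

```python
import math

BASES = "ACGT"

def ppm_log_prob(seq, ppm):
    """Log-probability of seq under a PPM: every position is independent."""
    return sum(math.log(ppm[i][b]) for i, b in enumerate(seq))

def adm_log_prob(seq, initial, transitions):
    """Log-probability under a first-order (adjacent dinucleotide) model.

    initial[b] is P(first base = b); transitions[i][a][b] is
    P(base at position i+1 = b | base at position i = a).
    """
    logp = math.log(initial[seq[0]])
    for i in range(len(seq) - 1):
        logp += math.log(transitions[i][seq[i]][seq[i + 1]])
    return logp

def marginal_ppm(initial, transitions):
    """Collapse the first-order model into per-position marginals (a PPM)."""
    columns = [dict(initial)]
    for step in transitions:
        prev = columns[-1]
        columns.append({b: sum(prev[a] * step[a][b] for a in BASES) for b in BASES})
    return columns

# Toy two-position motif (hypothetical numbers, not taken from the paper):
# A is usually followed by T, and C is usually followed by G.
initial = {"A": 0.4, "C": 0.4, "G": 0.1, "T": 0.1}
transitions = [{
    "A": {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    "C": {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    "G": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "T": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}]
ppm = marginal_ppm(initial, transitions)

adm_score = adm_log_prob("AT", initial, transitions)
ppm_score = ppm_log_prob("AT", ppm)
```

Because the PPM only sees the marginal frequency of T at the second position, it cannot express that T is much more likely after A, so `adm_score` exceeds `ppm_score` for this toy motif.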


Subject(s)
Software , Transcription Factors , Algorithms , Binding Sites , Nucleotide Motifs , Position-Specific Scoring Matrices , Protein Binding , Transcription Factors/genetics
2.
Nucleic Acids Res ; 46(8): e44, 2018 05 04.
Article in English | MEDLINE | ID: mdl-29385521

ABSTRACT

In some dimeric cases of transcription factor (TF) binding, the specificity of dimeric motifs has been observed to differ notably from what would be expected were the two factors to bind to DNA independently of each other. Current motif discovery methods are unable to learn monomeric and dimeric motifs in a modular fashion such that deviations from the expected motif would become explicit and the noise from dimeric occurrences would not corrupt monomeric models. We propose a novel modeling technique and an expectation maximization algorithm, implemented as the software tool MODER, for discovering monomeric TF binding motifs and their dimeric combinations. Given training data and seeds for monomeric motifs, the algorithm learns in the same probabilistic framework a mixture model which represents monomeric motifs as standard position-specific probability matrices (PPMs), and dimeric motifs as pairs of monomeric PPMs, with associated orientation and spacing preferences. For dimers the model represents deviations from the pure modular model of two independent monomers, thus making co-operative binding effects explicit. MODER can analyze in reasonable time tens of Mbps of training data. We validated the tool on HT-SELEX and ChIP-seq data. Our findings include some TFs whose expected model has palindromic symmetry but the observed model is directional.
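The idea of an expected dimer model built from two independent monomers can be sketched as follows. This is an illustration under stated assumptions, not MODER's code: the function names are hypothetical, orientation is reduced to a single flag, only non-negative spacings are handled, and gap columns are assumed to be uniform.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
UNIFORM = {b: 0.25 for b in "ACGT"}  # assumed gap model, for illustration only

def reverse_complement_ppm(ppm):
    """Reverse the column order and swap each base with its complement."""
    return [{COMPLEMENT[b]: p for b, p in col.items()} for col in reversed(ppm)]

def expected_dimer_ppm(first, second, spacing, flip_second=False):
    """Concatenate two monomer PPMs separated by `spacing` uniform columns."""
    if spacing < 0:
        raise ValueError("overlapping monomers are not handled in this sketch")
    right = reverse_complement_ppm(second) if flip_second else second
    gap = [dict(UNIFORM) for _ in range(spacing)]
    return [dict(c) for c in first] + gap + [dict(c) for c in right]

def deviation(observed, expected):
    """Per-column difference between an observed and an expected dimer PPM."""
    return [{b: obs[b] - exp[b] for b in "ACGT"}
            for obs, exp in zip(observed, expected)]

# Hypothetical two-column monomer that prefers "AC".
monomer = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
flipped = reverse_complement_ppm(monomer)
dimer = expected_dimer_ppm(monomer, monomer, spacing=2, flip_second=True)
```

A non-zero entry in `deviation(observed, dimer)` would mark a position where the observed dimer departs from what independent binding predicts, which is the kind of co-operative effect the abstract describes as being made explicit.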


Subject(s)
DNA/chemistry , DNA/metabolism , Transcription Factors/metabolism , Algorithms , Base Sequence , Binding Sites , Chromatin Immunoprecipitation , Computational Biology/methods , Machine Learning , Models, Statistical , Nucleotide Motifs , Probability , SELEX Aptamer Technique , Software
3.
Bioinformatics ; 33(6): 799-806, 2017 03 15.
Article in English | MEDLINE | ID: mdl-27273673

ABSTRACT

Motivation: New long read sequencing technologies, like PacBio SMRT and Oxford NanoPore, can produce sequencing reads up to 50 000 bp long but with an error rate of at least 15%. Reducing the error rate is necessary for subsequent utilization of the reads in, e.g. de novo genome assembly. The error correction problem has been tackled either by aligning the long reads against each other or by a hybrid approach that uses the more accurate short reads produced by second generation sequencing technologies to correct the long reads. Results: We present an error correction method that uses long reads only. The method consists of two phases: first, we use an iterative alignment-free correction method based on de Bruijn graphs with increasing length of k-mers, and second, the corrected reads are further polished using long-distance dependencies that are found using multiple alignments. According to our experiments, the proposed method is the most accurate one relying on long reads only for read sets with high coverage. Furthermore, when the coverage of the read set is at least 75×, the throughput of the new method is at least 20% higher. Availability and Implementation: LoRMA is freely available at http://www.cs.helsinki.fi/u/lmsalmel/LoRMA/. Contact: leena.salmela@cs.helsinki.fi.
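As a rough illustration of the de Bruijn graph underlying the first phase, the sketch below builds a graph from k-mers that occur often enough to be trusted. It is not LoRMA's implementation: the function names and the `min_count` threshold are assumptions, and the iterative correction with increasing k is not shown.

```python
from collections import Counter, defaultdict

def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def de_bruijn_graph(reads, k, min_count=2):
    """Nodes are (k-1)-mers; each sufficiently frequent k-mer adds one edge."""
    counts = count_kmers(reads, k)
    graph = defaultdict(set)
    for kmer, n in counts.items():
        if n >= min_count:  # rare k-mers are treated as likely errors
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)

# Two identical reads and one read with a single substitution (T -> A).
reads = ["ACGTAC", "ACGTAC", "ACGAAC"]
graph = de_bruijn_graph(reads, k=3, min_count=2)
```

In this toy input the k-mers introduced by the substitution (CGA, GAA, AAC) each occur once and are left out of the graph, so a read containing them would not follow a path in the graph and could be flagged for correction.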


Subject(s)
Sequence Analysis, DNA/methods , Software , Algorithms , Escherichia coli/genetics , Genome , High-Throughput Nucleotide Sequencing/methods , Saccharomyces cerevisiae/genetics
4.
Bioinformatics ; 33(4): 514-521, 2017 02 15.
Article in English | MEDLINE | ID: mdl-28011774

ABSTRACT

Motivation: While the position weight matrix (PWM) is the most popular model for sequence motifs, there is growing evidence of the usefulness of more advanced models such as first-order Markov representations, and such models are also becoming available in well-known motif databases. There has been much research on how to learn these models from training data, but the problem of predicting putative sites of the learned motifs by matching the model against new sequences has been given less attention. Moreover, motif site analysis is often concerned with how different variants in the sequence affect the sites. So far, though, the corresponding efficient software tools for motif matching have been lacking. Results: We develop fast motif matching algorithms for the aforementioned tasks. First, we formalize a framework based on high-order position weight matrices for generic representation of motif models with dinucleotide or general q-mer dependencies, and adapt fast PWM matching algorithms to the high-order PWM framework. Second, we show how to incorporate different types of sequence variants, such as SNPs and indels, and their combined effects into efficient PWM matching workflows. Benchmark results show that our algorithms perform well in practice on genome-sized sequence sets and are, for multiple motif search, much faster than the basic sliding window algorithm. Availability and Implementation: Implementations are available as a part of the MOODS software package under the GNU General Public License v3.0 and the Biopython license (http://www.cs.helsinki.fi/group/pssmfind). Contact: janne.h.korhonen@gmail.com.


Subject(s)
INDEL Mutation , Polymorphism, Single Nucleotide , Position-Specific Scoring Matrices , Sequence Analysis, DNA/methods , Software , Algorithms , Chromosomes, Human, Pair 22 , Humans
5.
Nat Commun ; 5: 4737, 2014 Sep 05.
Article in English | MEDLINE | ID: mdl-25189940

ABSTRACT

Previous studies have reported that chromosome synteny in Lepidoptera has been well conserved, yet the number of haploid chromosomes varies widely from 5 to 223. Here we report the genome (393 Mb) of the Glanville fritillary butterfly (Melitaea cinxia; Nymphalidae), a widely recognized model species in metapopulation biology and eco-evolutionary research, which has the putative ancestral karyotype of n=31. Using phylogenetic analyses of Nymphalidae and of other Lepidoptera, combined with orthologue-level comparisons of chromosomes, we conclude that the ancestral lepidopteran karyotype has been n=31 for at least 140 My. We show that fusion chromosomes have retained the ancestral chromosome segments and very few rearrangements have occurred across the fusion sites. The same, shortest ancestral chromosomes have independently participated in fusion events in species with smaller karyotypes. The short chromosomes have a higher rearrangement rate than long ones. These characteristics highlight distinctive features of the evolutionary dynamics of butterflies and moths.


Subject(s)
Butterflies/genetics , Chromosome Aberrations , Evolution, Molecular , Genome/genetics , Phylogeny , Synteny , Animals , Base Sequence , Chromosome Mapping , Karyotype , Likelihood Functions , Models, Genetic , Molecular Sequence Data , Sequence Analysis, DNA
6.
J Comput Biol ; 20(9): 621-30, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23919388

ABSTRACT

The problem of finding the locations in DNA sequences that match a given motif describing the binding specificities of a transcription factor (TF) has many applications in computational biology. This problem has been extensively studied when the position weight matrix (PWM) model is used to represent motifs. We investigate it under the feature motif model, a generalization of the PWM model that does not assume independence between positions in the pattern while being compatible with the original PWM. We present a new method for finding the binding sites of a transcription factor in a DNA sequence when the feature motif model is used to describe transcription factor binding specificities. The experimental results on random and real data show that the search algorithm is fast in practice.


Subject(s)
Models, Genetic , Response Elements/genetics , Transcription Factors/genetics , Amino Acid Motifs , Computational Biology/methods
7.
Cell ; 152(1-2): 327-39, 2013 Jan 17.
Article in English | MEDLINE | ID: mdl-23332764

ABSTRACT

Although the proteins that read the gene regulatory code, transcription factors (TFs), have been largely identified, it is not well known which sequences TFs can recognize. We have analyzed the sequence-specific binding of human TFs using high-throughput SELEX and ChIP sequencing. A total of 830 binding profiles were obtained, describing 239 distinctly different binding specificities. The models represent the majority of human TFs, approximately doubling the coverage compared to existing systematic studies. Our results reveal additional specificity determinants for a large number of factors for which a partial specificity was known, including a commonly observed A- or T-rich stretch that flanks the core motifs. Global analysis of the data revealed that homodimer orientation and spacing preferences, and base-stacking interactions, have a larger role in TF-DNA binding than previously appreciated. We further describe a binding model incorporating these features that is required to understand binding of TFs to DNA.


Subject(s)
Chromatin Immunoprecipitation , Models, Biological , SELEX Aptamer Technique , Transcription Factors/metabolism , Animals , DNA/chemistry , Humans , Markov Chains , Mice , Phylogeny , Transcription Factors/genetics
8.
Bioinformatics ; 27(23): 3259-65, 2011 Dec 01.
Article in English | MEDLINE | ID: mdl-21998153

ABSTRACT

MOTIVATION: Assembling genomes from short read data has become increasingly popular, but the problem remains computationally challenging, especially for larger genomes. We study the scaffolding phase of sequence assembly where preassembled contigs are ordered based on mate pair data. RESULTS: We present MIP Scaffolder that divides the scaffolding problem into smaller subproblems and solves these with mixed integer programming. The scaffolding problem can be represented as a graph and the biconnected components of this graph can be solved independently. We present a technique for restricting the size of these subproblems so that they can be solved accurately with mixed integer programming. We compare MIP Scaffolder to two state-of-the-art methods, SOPRA and SSPACE. MIP Scaffolder is fast and produces scaffolds that are as good as or better than those of its competitors on large genomes. AVAILABILITY: The source code of MIP Scaffolder is freely available at http://www.cs.helsinki.fi/u/lmsalmel/mip-scaffolder/. CONTACT: leena.salmela@cs.helsinki.fi.


Subject(s)
Genome , Sequence Analysis, DNA/methods , Software , Algorithms , Animals , Caenorhabditis elegans/genetics , Escherichia coli/genetics , High-Throughput Nucleotide Sequencing , Pseudomonas syringae/genetics
9.
Algorithms Mol Biol ; 6: 5, 2011 Mar 23.
Article in English | MEDLINE | ID: mdl-21429203

ABSTRACT

BACKGROUND: The discovery of surprisingly frequent patterns is of paramount interest in bioinformatics and computational biology. Among the patterns considered, those consisting of pairs of solid words that co-occur within a prescribed maximum distance (or gapped factors) emerge in a variety of contexts of DNA and protein sequence analysis. A few algorithms and tools have been developed in connection with specific formulations of the problem; however, none can handle comprehensively each of the multiple ways in which the distance between the two terms in a pair may be defined. RESULTS: This paper presents efficient algorithms and tools for the extraction of all pairs of words up to an arbitrarily large length that co-occur surprisingly often in close proximity within a sequence. Whereas the number of such pairs in a sequence of n characters can be Θ(n^4), it is shown that an exhaustive discovery process can be carried out in O(n^2) or O(n^3), depending on the way distance is measured. This is made possible by a prudent combination of properties of pattern maximality and monotonicity of scores, which reduces the number of word pairs to be weighed explicitly while still producing the scores attained by any of the pairs not explicitly considered. We applied our approach to the discovery of spaced dyads in DNA sequences. CONCLUSIONS: Experiments on biological datasets show that the method is effective and much faster than exhaustive enumeration of candidate patterns. Software is freely available to academic users via the web interface at http://bcb.dei.unipd.it:8080/dyweb.
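For orientation, the exhaustive enumeration that the paper improves upon can be sketched for a single fixed word length. This is only the naive baseline, not the authors' algorithm: the function name is hypothetical, and the distance is assumed to be the number of characters between the end of the first word and the start of the second, which is just one of the definitions the abstract alludes to.

```python
from collections import Counter

def count_gapped_pairs(seq, word_len, max_gap):
    """Count ordered word pairs (u, v) where v starts 0..max_gap
    characters after u ends; both words have length word_len."""
    counts = Counter()
    last_start = len(seq) - word_len
    for i in range(last_start + 1):
        u = seq[i:i + word_len]
        first_j = i + word_len  # v may not overlap u
        for j in range(first_j, min(first_j + max_gap, last_start) + 1):
            counts[(u, seq[j:j + word_len])] += 1
    return counts

pairs = count_gapped_pairs("ACGTACGT", word_len=2, max_gap=1)
```

Repeating this for every word length up to n is what makes naive enumeration expensive, and scoring each count against an expected frequency (not shown here) would be needed to decide which pairs are surprisingly frequent.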

10.
Article in English | MEDLINE | ID: mdl-21071798

ABSTRACT

Position weight matrices are an important method for modeling signals or motifs in biological sequences, both in DNA and protein contexts. In this paper, we present fast algorithms for the problem of finding significant matches of such matrices. Our algorithms are of the online type, and they generalize classical multipattern matching, filtering, and superalphabet techniques of combinatorial string matching to the problem of weight matrix matching. Several variants of the algorithms are developed, including multiple matrix extensions that perform the search for several matrices in one scan through the sequence database. Experimental performance evaluation is provided to compare the new techniques against each other as well as against some other online and index-based algorithms proposed in the literature. Compared to the brute-force O(mn) approach, our solutions can be faster by a factor that is proportional to the matrix length m. Our multiple-matrix filtration algorithm had the best performance in the experiments. On a current PC, this algorithm finds significant matches (p = 0.0001) of the 123 JASPAR matrices in the human genome in about 18 minutes.
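The brute-force O(mn) baseline mentioned in the abstract can be written in a few lines. This sketch is only that baseline, not the paper's filtering or superalphabet algorithms; the function names, the uniform background and the toy matrix are assumptions made for illustration.

```python
import math

def log_odds_matrix(ppm, background=0.25):
    """Convert a probability matrix into log2-odds scores against a
    uniform background (an assumption made for this sketch)."""
    return [{b: math.log2(p / background) for b, p in col.items()} for col in ppm]

def scan(seq, pwm, threshold):
    """Slide the matrix over seq and report (position, score) for every
    window whose summed score reaches the threshold: O(mn) work."""
    m = len(pwm)
    hits = []
    for i in range(len(seq) - m + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(m))
        if score >= threshold:
            hits.append((i, score))
    return hits

# Hypothetical three-column motif that prefers "TGA".
ppm = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
pwm = log_odds_matrix(ppm)
hits = scan("ACTGATTGA", pwm, threshold=4.0)
```

Every window costs m additions regardless of how poorly it starts, which is the inefficiency that the faster online algorithms described above are designed to avoid.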


Subject(s)
Algorithms , Computational Biology/methods , Pattern Recognition, Automated/methods , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , DNA/chemistry , Humans , Proteins/chemistry
11.
EMBO J ; 29(13): 2147-60, 2010 Jul 07.
Article in English | MEDLINE | ID: mdl-20517297

ABSTRACT

Members of the large ETS family of transcription factors (TFs) have highly similar DNA-binding domains (DBDs), yet they have diverse functions and activities in physiology and oncogenesis. Some differences in DNA-binding preferences within this family have been described, but they have not been analysed systematically, and their contributions to targeting remain largely uncharacterized. We report here the DNA-binding profiles for all human and mouse ETS factors, which we generated using two different methods: a high-throughput microwell-based TF DNA-binding specificity assay, and protein-binding microarrays (PBMs). Both approaches reveal that the ETS-binding profiles cluster into four distinct classes, and that all ETS factors linked to cancer, ERG, ETV1, ETV4 and FLI1, fall into just one of these classes. We identify amino-acid residues that are critical for the differences in specificity between all the classes, and confirm the specificities in vivo using chromatin immunoprecipitation followed by sequencing (ChIP-seq) for a member of each class. The results indicate that even relatively small differences in in vitro binding specificity of a TF contribute to site selectivity in vivo.


Subject(s)
DNA/metabolism , Genome-Wide Association Study , Proto-Oncogene Proteins c-ets/metabolism , Animals , Base Sequence , Binding Sites , Cell Line , DNA/chemistry , Humans , Mice , Models, Molecular , Protein Binding , Proto-Oncogene Proteins c-ets/chemistry , Sequence Analysis, DNA
12.
Genome Res ; 20(6): 861-73, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20378718

ABSTRACT

The genetic code, the binding specificity of all transfer RNAs, defines how protein primary structure is determined by DNA sequence. DNA also dictates when and where proteins are expressed, and this information is encoded in a pattern of specific sequence motifs that are recognized by transcription factors. However, the DNA-binding specificity is only known for a small fraction of the approximately 1400 human transcription factors (TFs). We describe here a high-throughput method for analyzing transcription factor binding specificity that is based on systematic evolution of ligands by exponential enrichment (SELEX) and massively parallel sequencing. The method is optimized for analysis of large numbers of TFs in parallel through the use of affinity-tagged proteins, barcoded selection oligonucleotides, and multiplexed sequencing. Data are analyzed by a new bioinformatic platform that uses the hundreds of thousands of sequencing reads obtained to control the quality of the experiments and to generate binding motifs for the TFs. The described technology allows higher throughput and identification of much longer binding profiles than current microarray-based methods. In addition, as our method is based on proteins expressed in mammalian cells, it can also be used to characterize DNA-binding preferences of full-length proteins or proteins requiring post-translational modifications. We validate the method by determining binding specificities of 14 different classes of TFs and by confirming the specificities for NFATC1 and RFX3 using ChIP-seq. Our results reveal unexpected dimeric modes of binding for several factors that were thought to preferentially bind DNA as monomers.


Subject(s)
SELEX Aptamer Technique , Transcription Factors/metabolism , Affinity Labels , Base Sequence , Binding Sites , DNA , Humans , Molecular Sequence Data
14.
Curr Opin Biotechnol ; 21(1): 70-7, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20171871

ABSTRACT

In the wake of numerous sequenced genomes becoming available, computational methods for the reconstruction of metabolic networks have received considerable attention. Here, we review recent methods and software tools useful along the reconstruction workflow, from sequence annotation and network assembly to model verification and testing against experimental data. Reconstruction methods can be divided into three categories, depending on the magnitude of network context which is taken into account in the process of assembling the metabolic model: First, each enzyme may be predicted independently by annotation transfer or machine learning methods. Second, the presence of a metabolic pathway may be detected from genome and experimental evidence, often utilizing a reference pathway database. Third, the method may attempt to directly reconstruct a consistent metabolic network without relying on predefined reference pathways. Regardless of the chosen context, all methods strive to produce genome-scale metabolic reconstructions. Currently a gap exists between software platforms dedicated to genome annotation and computational tools for automatically repairing network inconsistencies and validating against measurement data. We argue that to accelerate the reconstruction efforts, computational tools need to be developed that bridge the phases of the reconstruction workflow. In particular, the goal of finding consistent metabolic models suitable for computational analysis should be taken into account already in the beginning phases of reconstruction.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Models, Biological , Proteome/metabolism , Signal Transduction/physiology , Computer Simulation , Metabolic Clearance Rate
15.
Bioinformatics ; 25(23): 3181-2, 2009 Dec 01.
Article in English | MEDLINE | ID: mdl-19773334

ABSTRACT

UNLABELLED: MOODS (MOtif Occurrence Detection Suite) is a software package for matching position weight matrices against DNA sequences. MOODS implements state-of-the-art online matching algorithms, achieving considerably faster scanning speed than with a simple brute-force search. MOODS is written in C++, with bindings for the popular BioPerl and Biopython toolkits. It can easily be adapted for different purposes and integrated into existing workflows. It can also be used as a C++ library. AVAILABILITY: The package with documentation and examples of usage is available at http://www.cs.helsinki.fi/group/pssmfind. The source code is also available under the terms of a GNU General Public License (GPL).


Subject(s)
Computational Biology/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Base Sequence , Position-Specific Scoring Matrices
16.
Nat Genet ; 41(8): 885-90, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19561604

ABSTRACT

Homozygosity for the G allele of rs6983267 at 8q24 increases colorectal cancer (CRC) risk approximately 1.5 fold. We report here that the risk allele G shows copy number increase during CRC development. Our computer algorithm, Enhancer Element Locator (EEL), identified an enhancer element that contains rs6983267. The element drove expression of a reporter gene in a pattern that is consistent with regulation by the key CRC pathway Wnt. rs6983267 affects a binding site for the Wnt-regulated transcription factor TCF4, with the risk allele G showing stronger binding in vitro and in vivo. Genome-wide ChIP assay revealed the element as the strongest TCF4 binding site within 1 Mb of MYC. An unambiguous correlation between rs6983267 genotype and MYC expression was not detected, and additional work is required to scrutinize all possible targets of the enhancer. Our work provides evidence that the common CRC predisposition associated with 8q24 arises from enhanced responsiveness to Wnt signaling.


Subject(s)
Chromosomes, Human, Pair 8/genetics , Colorectal Neoplasms/genetics , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide/genetics , Signal Transduction , Wnt Proteins/metabolism , Animals , Base Sequence , Binding Sites , Conserved Sequence , Embryo, Mammalian/metabolism , Enhancer Elements, Genetic/genetics , Gene Dosage , Genome-Wide Association Study , Humans , Mice , Mice, Transgenic , Molecular Sequence Data , Organ Specificity , Protein Binding , Proto-Oncogene Proteins c-myc/metabolism , Reproducibility of Results , TCF Transcription Factors/metabolism , Transcription Factor 7-Like 2 Protein , beta Catenin/metabolism
17.
Genome Biol ; 10(1): 202, 2009.
Article in English | MEDLINE | ID: mdl-19226437

ABSTRACT

With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' utilizes sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome.


Subject(s)
Genomics/methods , Regulatory Elements, Transcriptional/genetics , Base Sequence , Computational Biology/methods , Databases, Nucleic Acid , Evolution, Molecular
18.
BMC Bioinformatics ; 9: 266, 2008 Jun 06.
Article in English | MEDLINE | ID: mdl-18534038

ABSTRACT

BACKGROUND: Metabolic fluxes provide invaluable insight into the integrated response of a cell to environmental stimuli or genetic modifications. Current computational methods for estimating the metabolic fluxes from 13C isotopomer measurement data rely either on manual derivation of analytic equations constraining the fluxes or on the numerical solution of a highly nonlinear system of isotopomer balance equations. In the first approach, analytic equations have to be tediously derived for each organism, substrate or labelling pattern, while in the second approach, the global nature of an optimum solution is difficult to prove and comprehensive measurements of external fluxes to augment the 13C isotopomer data are typically needed. RESULTS: We present a novel analytic framework for estimating metabolic flux ratios in the cell from 13C isotopomer measurement data. In the presented framework, equation systems constraining the fluxes are derived automatically from the model of the metabolism of an organism. The framework is designed to be applicable with all metabolic network topologies, 13C isotopomer measurement techniques, substrates and substrate labelling patterns. By analyzing nuclear magnetic resonance (NMR) and mass spectrometry (MS) measurement data obtained from the experiments on glucose with the model micro-organisms Bacillus subtilis and Saccharomyces cerevisiae we show that our framework is able to automatically produce the flux ratios discovered so far by the domain experts with tedious manual analysis. Furthermore, we show by in silico calculability analysis that our framework can rapidly produce flux ratio equations, as well as predict when the flux ratios are unobtainable by linear means, also for substrates not related to glucose. CONCLUSION: The core of the 13C metabolic flux analysis framework introduced in this article consists of flow and independence analysis of metabolic fragments and techniques for manipulating isotopomer measurements with vector space techniques. These methods facilitate efficient, analytic computation of the ratios between the fluxes of pathways that converge to a common junction metabolite. The framework can be seen as a generalization and formalization of the existing tradition for computing metabolic flux ratios where equations constraining flux ratios are manually derived, usually without explicitly showing the formal proofs of the validity of the equations.


Subject(s)
Bacillus subtilis/metabolism , Bacterial Proteins/analysis , Carbon Isotopes/pharmacokinetics , Fungal Proteins/analysis , Glucose/metabolism , Saccharomyces cerevisiae/metabolism , Artificial Intelligence , Bacterial Proteins/metabolism , Citric Acid Cycle/physiology , Computer Simulation , Databases, Factual , Fungal Proteins/metabolism , Glycolysis/physiology , Isomerism , Isotope Labeling , Magnetic Resonance Spectroscopy , Mass Spectrometry , Neural Networks, Computer , Pentose Phosphate Pathway/physiology , Research Design , Statistics as Topic/methods
19.
Brief Bioinform ; 9(3): 250-3, 2008 May.
Article in English | MEDLINE | ID: mdl-18216087

ABSTRACT

Over recent years, five European PhD programmes have organized a series of 'Bioinformatics Research and Education Workshops'. These workshops address the needs of first-year PhD students and have been designed to combine a maximum of educational impact and scientific stimulation with a minimum of financial and administrative effort. We describe the BREW experience and argue that this type of event constitutes an attractive component of PhD education in computational biology and beyond.


Subject(s)
Computational Biology/education , Curriculum , Education, Graduate/organization & administration , Education, Professional/organization & administration , Genomics/education , Teaching/methods , Education, Graduate/methods , Europe
20.
J Integr Bioinform ; 5(2), 2008 Aug 25.
Article in English | MEDLINE | ID: mdl-20134058

ABSTRACT

ReMatch is a web-based, user-friendly tool that constructs stoichiometric network models for metabolic flux analysis, integrating user-developed models into a database collected from several comprehensive metabolic data resources, including KEGG, MetaCyc and ChEBI. In particular, ReMatch augments the metabolic reactions of the model with carbon mappings to facilitate 13C metabolic flux analysis. The construction of a network model consisting of biochemical reactions is the first step in most metabolic modelling tasks. This model construction can be a tedious task as the required information is usually scattered across many separate databases whose interoperability is suboptimal, due to the heterogeneous naming conventions of metabolites in different databases. Another, particularly severe data integration problem is faced in 13C metabolic flux analysis, where the mappings of carbon atoms from substrates into products in the model are required. ReMatch has been developed to solve the above data integration problems. First, ReMatch matches the imported user-developed model against the internal ReMatch database while considering a comprehensive metabolite name thesaurus. This, together with wild card support, allows the user to specify the model quickly without having to look the names up manually. Second, ReMatch is able to augment reactions of the model with carbon mappings, obtained either from the internal database or given by the user with an easy-to-use tool. The constructed models can be exported into 13C-FLUX and SBML file formats. Further, a stoichiometric matrix and visualizations of the network model can be generated. The constructed models of metabolic networks can be optionally made available to the other users of ReMatch. Thus, ReMatch provides a common repository for metabolic network models with carbon mappings for the needs of the metabolic flux analysis community. ReMatch is freely available for academic use at http://www.cs.helsinki.fi/group/sysfys/software/rematch/.
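To make the notion of a stoichiometric matrix concrete, the sketch below builds one from a small reaction list. It does not reflect ReMatch's internal data model or file formats: the function name, the dictionary layout and the two simplified example reactions are assumptions for illustration.

```python
def stoichiometric_matrix(reactions):
    """Build a metabolites x reactions matrix.

    reactions maps a reaction name to {metabolite: coefficient}, with
    negative coefficients for substrates and positive ones for products.
    """
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    names = list(reactions)
    matrix = [[reactions[r].get(m, 0) for r in names] for m in metabolites]
    return metabolites, names, matrix

# Simplified first two steps of glycolysis (cofactor protons omitted).
reactions = {
    "hexokinase": {"glucose": -1, "ATP": -1, "G6P": 1, "ADP": 1},
    "isomerase": {"G6P": -1, "F6P": 1},
}
metabolites, names, matrix = stoichiometric_matrix(reactions)
```

Each column describes one reaction and each row one metabolite, so the G6P row reads `[1, -1]`: produced by the first reaction and consumed by the second.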


Subject(s)
Carbon/metabolism , Computational Biology/methods , Metabolic Networks and Pathways , Software , Databases, Factual , Internet , User-Computer Interface