Results 1 - 20 of 27
1.
Bioinformatics ; 36(9): 2690-2696, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31999322

ABSTRACT

MOTIVATION: Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominating model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence of better performance of higher order models such as Markov models of order one, also called adjacent dinucleotide matrices (ADMs). ADMs can model dependencies between adjacent nucleotides, unlike PPMs. A modeling technique and software tool that would estimate such models simultaneously both for monomers and their dimers have been missing. RESULTS: We present an ADM-based mixture model for monomeric and dimeric TF-binding motifs and an expectation maximization algorithm MODER2 for learning such models from training data and seeds. The model is a mixture that includes monomers and dimers, built from the monomers, with a description of the dimeric structure (spacing, orientation). The technique is modular, meaning that the co-operative effect of dimerization is made explicit by evaluating the difference between expected and observed models. The model is validated using HT-SELEX and generated datasets, and by comparing to some earlier PPM and ADM techniques. The ADM models explain data slightly better than PPM models for 314 tested TFs (or their DNA-binding domains) from four families (bHLH, bZIP, ETS and Homeodomain), the ADM mixture models by MODER2 being the best on average. AVAILABILITY AND IMPLEMENTATION: Software implementation is available from https://github.com/jttoivon/moder2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software, Transcription Factors, Algorithms, Binding Sites, Nucleotide Motifs, Position-Specific Scoring Matrices, Protein Binding, Transcription Factors/genetics
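
The difference between a PPM and an ADM described in the abstract above can be illustrated with a minimal Python sketch. This is not the MODER2 implementation; the function names, the data layout and the toy probabilities are assumptions made only for illustration. A PPM multiplies independent per-position probabilities, while a first-order model conditions each base on the previous one.

```python
import math

def ppm_log_prob(seq, ppm):
    """Log-probability of seq under a PPM: every position is independent."""
    return sum(math.log(ppm[i][base]) for i, base in enumerate(seq))

def adm_log_prob(seq, first, trans):
    """Log-probability of seq under a first-order (adjacent dinucleotide) model.

    first[b]       : probability of base b at position 0
    trans[i][a][b] : probability of base b at position i + 1, given base a at position i
    """
    logp = math.log(first[seq[0]])
    for i in range(len(seq) - 1):
        logp += math.log(trans[i][seq[i]][seq[i + 1]])
    return logp

# Hypothetical toy motif of length 2 (values invented for illustration).
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
ppm = [
    {"A": 0.4, "C": 0.1, "G": 0.4, "T": 0.1},
    {"A": 0.1, "C": 0.4, "G": 0.1, "T": 0.4},
]
first = ppm[0]
trans = [{
    "A": {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},  # A is mostly followed by C
    "C": uniform,
    "G": {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},  # G is mostly followed by T
    "T": uniform,
}]
```

With these toy values the PPM cannot distinguish "AC" from "AT", whereas the first-order model assigns "AC" a clearly higher probability, which is the kind of adjacent-nucleotide dependency the abstract refers to.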
2.
Nucleic Acids Res ; 46(8): e44, 2018 05 04.
Article in English | MEDLINE | ID: mdl-29385521

ABSTRACT

In some dimeric cases of transcription factor (TF) binding, the specificity of dimeric motifs has been observed to differ notably from what would be expected were the two factors to bind to DNA independently of each other. Current motif discovery methods are unable to learn monomeric and dimeric motifs in modular fashion such that deviations from the expected motif would become explicit and the noise from dimeric occurrences would not corrupt monomeric models. We propose a novel modeling technique and an expectation maximization algorithm, implemented as software tool MODER, for discovering monomeric TF binding motifs and their dimeric combinations. Given training data and seeds for monomeric motifs, the algorithm learns in the same probabilistic framework a mixture model which represents monomeric motifs as standard position-specific probability matrices (PPMs), and dimeric motifs as pairs of monomeric PPMs, with associated orientation and spacing preferences. For dimers the model represents deviations from pure modular model of two independent monomers, thus making co-operative binding effects explicit. MODER can analyze in reasonable time tens of Mbps of training data. We validated the tool on HT-SELEX and ChIP-seq data. Our findings include some TFs whose expected model has palindromic symmetry but the observed model is directional.


Subjects
DNA/chemistry, DNA/metabolism, Transcription Factors/metabolism, Algorithms, Base Sequence, Binding Sites, Chromatin Immunoprecipitation, Computational Biology/methods, Machine Learning, Statistical Models, Nucleotide Motifs, Probability, SELEX Aptamer Technique, Software
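
The idea of an expected dimeric model built from two independent monomers, as described in the abstract above, can be sketched as follows. This is a simplified illustration and not MODER's code: the function names are hypothetical, only non-negative spacing is handled (overlapping monomers are left out), and gap positions are assumed to be uniform.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(ppm):
    """Reverse the column order and swap each base with its complement."""
    return [{COMPLEMENT[b]: col[b] for b in BASES} for col in reversed(ppm)]

def expected_dimer(left, right, gap, right_reversed=False):
    """Expected dimer PPM if the two monomers bound independently.

    gap            : number of unconstrained positions between the monomers (>= 0)
    right_reversed : if True, the right monomer is placed in reverse-complement orientation
    """
    if gap < 0:
        raise ValueError("this sketch does not model overlapping monomers")
    if right_reversed:
        right = reverse_complement(right)
    uniform = {b: 0.25 for b in BASES}
    return ([dict(col) for col in left]
            + [dict(uniform) for _ in range(gap)]
            + [dict(col) for col in right])

def deviation(observed, expected):
    """Column-wise difference between an observed and an expected dimer PPM."""
    return [{b: o[b] - e[b] for b in BASES} for o, e in zip(observed, expected)]

# Hypothetical monomer preferring "AC" (values invented for illustration).
monomer = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
dimer = expected_dimer(monomer, monomer, gap=1, right_reversed=True)
```

A non-zero `deviation` between a dimer model learned from data and `expected_dimer` would correspond to the co-operative effect that the abstract says is made explicit.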
3.
Bioinformatics ; 33(6): 799-806, 2017 03 15.
Article in English | MEDLINE | ID: mdl-27273673

ABSTRACT

Motivation: New long read sequencing technologies, like PacBio SMRT and Oxford Nanopore, can produce sequencing reads up to 50 000 bp long but with an error rate of at least 15%. Reducing the error rate is necessary for subsequent utilization of the reads in, e.g. de novo genome assembly. The error correction problem has been tackled either by aligning the long reads against each other or by a hybrid approach that uses the more accurate short reads produced by second generation sequencing technologies to correct the long reads. Results: We present an error correction method that uses long reads only. The method consists of two phases: first, we use an iterative alignment-free correction method based on de Bruijn graphs with increasing length of k-mers, and second, the corrected reads are further polished using long-distance dependencies that are found using multiple alignments. According to our experiments, the proposed method is the most accurate one relying on long reads only for read sets with high coverage. Furthermore, when the coverage of the read set is at least 75×, the throughput of the new method is at least 20% higher. Availability and Implementation: LoRMA is freely available at http://www.cs.helsinki.fi/u/lmsalmel/LoRMA/. Contact: leena.salmela@cs.helsinki.fi.


Subjects
DNA Sequence Analysis/methods, Software, Algorithms, Escherichia coli/genetics, Genome, High-Throughput Nucleotide Sequencing/methods, Saccharomyces cerevisiae/genetics
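
The de Bruijn graph step mentioned in the abstract above rests on counting k-mers and keeping only those seen often enough to be trusted. The following Python sketch shows that idea only; it is not LoRMA's algorithm, and the function names and the `min_count` threshold are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def de_bruijn_graph(reads, k, min_count=2):
    """Build a de Bruijn graph from k-mers seen at least min_count times.

    Nodes are (k-1)-mers; each retained k-mer contributes one edge from its
    prefix to its suffix. Rare k-mers, which are likely to contain sequencing
    errors, are left out.
    """
    graph = defaultdict(set)
    for kmer, n in kmer_counts(reads, k).items():
        if n >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)

# Toy reads: the third read carries an error near its end.
reads = ["ACGTA", "ACGTA", "ACGAA"]
graph = de_bruijn_graph(reads, k=3)
# graph == {"AC": {"CG"}, "CG": {"GT"}, "GT": {"TA"}}
```

In an iterative scheme like the one the abstract describes, a step of this kind would be repeated with increasing k, using the reads corrected in the previous round as input.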
4.
Bioinformatics ; 33(4): 514-521, 2017 02 15.
Article in English | MEDLINE | ID: mdl-28011774

ABSTRACT

Motivation: While the position weight matrix (PWM) is the most popular model for sequence motifs, there is growing evidence of the usefulness of more advanced models such as first-order Markov representations, and such models are also becoming available in well-known motif databases. There has been much research on how to learn these models from training data, but the problem of predicting putative sites of the learned motifs by matching the model against new sequences has been given less attention. Moreover, motif site analysis is often concerned with how different variants in the sequence affect the sites. So far, though, the corresponding efficient software tools for motif matching have been lacking. Results: We develop fast motif matching algorithms for the aforementioned tasks. First, we formalize a framework based on high-order position weight matrices for generic representation of motif models with dinucleotide or general q-mer dependencies, and adapt fast PWM matching algorithms to the high-order PWM framework. Second, we show how to incorporate different types of sequence variants, such as SNPs and indels, and their combined effects into efficient PWM matching workflows. Benchmark results show that our algorithms perform well in practice on genome-sized sequence sets and are for multiple motif search much faster than the basic sliding window algorithm. Availability and Implementation: Implementations are available as a part of the MOODS software package under the GNU General Public License v3.0 and the Biopython license (http://www.cs.helsinki.fi/group/pssmfind). Contact: janne.h.korhonen@gmail.com.


Subjects
INDEL Mutation, Single Nucleotide Polymorphism, Position-Specific Scoring Matrices, DNA Sequence Analysis/methods, Software, Algorithms, Human Chromosome Pair 22, Humans
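
How a single-nucleotide variant can change a motif match, as discussed in the abstract above, can be sketched in Python. This is a naive illustration and does not use the MOODS API; the function names and the toy integer scores are hypothetical. It compares the best PWM score among the windows covering the variant position, before and after the substitution.

```python
def best_score_covering(seq, pwm, pos):
    """Best PWM score over all windows of seq that include position pos."""
    m = len(pwm)
    lo = max(0, pos - m + 1)
    hi = min(pos, len(seq) - m)
    if lo > hi:
        raise ValueError("no full-length window covers this position")
    return max(sum(pwm[j][seq[i + j]] for j in range(m))
               for i in range(lo, hi + 1))

def snp_effect(seq, pwm, pos, alt):
    """Change in the best covering score when seq[pos] is replaced by alt."""
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return best_score_covering(alt_seq, pwm, pos) - best_score_covering(seq, pwm, pos)

# Hypothetical toy PWM favouring the site "ACG" (scores invented for illustration).
pwm = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]
effect = snp_effect("TTACGTT", pwm, pos=3, alt="T")  # C -> T inside the site
# effect == -3: the best covering score drops from 6 to 3
```

A negative value suggests the variant weakens the site; indels and combined variants, which the abstract also covers, would need additional handling that this sketch omits.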
5.
Nat Commun ; 5: 4737, 2014 Sep 05.
Article in English | MEDLINE | ID: mdl-25189940

ABSTRACT

Previous studies have reported that chromosome synteny in Lepidoptera has been well conserved, yet the number of haploid chromosomes varies widely from 5 to 223. Here we report the genome (393 Mb) of the Glanville fritillary butterfly (Melitaea cinxia; Nymphalidae), a widely recognized model species in metapopulation biology and eco-evolutionary research, which has the putative ancestral karyotype of n=31. Using phylogenetic analyses of Nymphalidae and of other Lepidoptera, combined with orthologue-level comparisons of chromosomes, we conclude that the ancestral lepidopteran karyotype has been n=31 for at least 140 My. We show that fusion chromosomes have retained the ancestral chromosome segments and very few rearrangements have occurred across the fusion sites. The same, shortest ancestral chromosomes have independently participated in fusion events in species with smaller karyotypes. The short chromosomes have a higher rearrangement rate than long ones. These characteristics highlight distinctive features of the evolutionary dynamics of butterflies and moths.


Subjects
Butterflies/genetics, Chromosome Aberrations, Molecular Evolution, Genome/genetics, Phylogeny, Synteny, Animals, Base Sequence, Chromosome Mapping, Karyotype, Likelihood Functions, Genetic Models, Molecular Sequence Data, DNA Sequence Analysis
6.
J Comput Biol ; 20(9): 621-30, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23919388

ABSTRACT

The problem of finding the locations in DNA sequences that match a given motif describing the binding specificities of a transcription factor (TF) has many applications in computational biology. This problem has been extensively studied when the position weight matrix (PWM) model is used to represent motifs. We investigate it under the feature motif model, a generalization of the PWM model that does not assume independence between positions in the pattern while being compatible with the original PWM. We present a new method for finding the binding sites of a transcription factor in a DNA sequence when the feature motif model is used to describe transcription factor binding specificities. The experimental results on random and real data show that the search algorithm is fast in practice.


Subjects
Genetic Models, Response Elements/genetics, Transcription Factors/genetics, Amino Acid Motifs, Computational Biology/methods
7.
Cell ; 152(1-2): 327-39, 2013 Jan 17.
Article in English | MEDLINE | ID: mdl-23332764

ABSTRACT

Although the proteins that read the gene regulatory code, transcription factors (TFs), have been largely identified, it is not well known which sequences TFs can recognize. We have analyzed the sequence-specific binding of human TFs using high-throughput SELEX and ChIP sequencing. A total of 830 binding profiles were obtained, describing 239 distinctly different binding specificities. The models represent the majority of human TFs, approximately doubling the coverage compared to existing systematic studies. Our results reveal additional specificity determinants for a large number of factors for which a partial specificity was known, including a commonly observed A- or T-rich stretch that flanks the core motifs. Global analysis of the data revealed that homodimer orientation and spacing preferences, and base-stacking interactions, have a larger role in TF-DNA binding than previously appreciated. We further describe a binding model incorporating these features that is required to understand binding of TFs to DNA.


Subjects
Chromatin Immunoprecipitation, Biological Models, SELEX Aptamer Technique, Transcription Factors/metabolism, Animals, DNA/chemistry, Humans, Markov Chains, Mice, Phylogeny, Transcription Factors/genetics
8.
Bioinformatics ; 27(23): 3259-65, 2011 Dec 01.
Article in English | MEDLINE | ID: mdl-21998153

ABSTRACT

MOTIVATION: Assembling genomes from short read data has become increasingly popular, but the problem remains computationally challenging especially for larger genomes. We study the scaffolding phase of sequence assembly where preassembled contigs are ordered based on mate pair data. RESULTS: We present MIP Scaffolder that divides the scaffolding problem into smaller subproblems and solves these with mixed integer programming. The scaffolding problem can be represented as a graph and the biconnected components of this graph can be solved independently. We present a technique for restricting the size of these subproblems so that they can be solved accurately with mixed integer programming. We compare MIP Scaffolder to two state of the art methods, SOPRA and SSPACE. MIP Scaffolder is fast and produces better or as good scaffolds as its competitors on large genomes. AVAILABILITY: The source code of MIP Scaffolder is freely available at http://www.cs.helsinki.fi/u/lmsalmel/mip-scaffolder/. CONTACT: leena.salmela@cs.helsinki.fi.


Subjects
Genome, DNA Sequence Analysis/methods, Software, Algorithms, Animals, Caenorhabditis elegans/genetics, Escherichia coli/genetics, High-Throughput Nucleotide Sequencing, Pseudomonas syringae/genetics
9.
Algorithms Mol Biol ; 6: 5, 2011 Mar 23.
Article in English | MEDLINE | ID: mdl-21429203

ABSTRACT

BACKGROUND: The discovery of surprisingly frequent patterns is of paramount interest in bioinformatics and computational biology. Among the patterns considered, those consisting of pairs of solid words that co-occur within a prescribed maximum distance (or gapped factors) emerge in a variety of contexts of DNA and protein sequence analysis. A few algorithms and tools have been developed in connection with specific formulations of the problem; however, none can handle comprehensively each of the multiple ways in which the distance between the two terms in a pair may be defined. RESULTS: This paper presents efficient algorithms and tools for the extraction of all pairs of words up to an arbitrarily large length that co-occur surprisingly often in close proximity within a sequence. Whereas the number of such pairs in a sequence of n characters can be Θ(n^4), it is shown that an exhaustive discovery process can be carried out in O(n^2) or O(n^3), depending on the way distance is measured. This is made possible by a prudent combination of properties of pattern maximality and monotonicity of scores, which reduces the number of word pairs to be weighed explicitly, while still producing the scores attained by any of the pairs not explicitly considered. We applied our approach to the discovery of spaced dyads in DNA sequences. CONCLUSIONS: Experiments on biological datasets prove that the method is effective and much faster than exhaustive enumeration of candidate patterns. Software is freely available to academic users via the web interface at http://bcb.dei.unipd.it:8080/dyweb.
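
The counting problem behind gapped factors can be made concrete with a naive Python sketch. It is the kind of exhaustive enumeration that the abstract above says the paper improves on, not the paper's algorithm. The function name is hypothetical, words are restricted to one fixed length, and distance is assumed to be the number of characters between the end of the first word and the start of the second.

```python
from collections import Counter

def count_gapped_pairs(seq, word_len, max_gap):
    """Count ordered word pairs (u, v) where v starts 0..max_gap characters after u ends."""
    counts = Counter()
    n = len(seq)
    for i in range(n - word_len + 1):
        first = seq[i:i + word_len]
        start = i + word_len  # earliest start of the second word
        last = min(start + max_gap, n - word_len)
        for j in range(start, last + 1):
            counts[(first, seq[j:j + word_len])] += 1
    return counts

pairs = count_gapped_pairs("ACGTACGT", word_len=2, max_gap=2)
# pairs[("AC", "GT")] == 2
```

Deciding whether a pair occurs "surprisingly often" would additionally require comparing each count with its expectation under a background model, which this sketch leaves out.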

10.
Article in English | MEDLINE | ID: mdl-21071798

ABSTRACT

Position weight matrices are an important method for modeling signals or motifs in biological sequences, both in DNA and protein contexts. In this paper, we present fast algorithms for the problem of finding significant matches of such matrices. Our algorithms are of the online type, and they generalize classical multipattern matching, filtering, and superalphabet techniques of combinatorial string matching to the problem of weight matrix matching. Several variants of the algorithms are developed, including multiple matrix extensions that perform the search for several matrices in one scan through the sequence database. Experimental performance evaluation is provided to compare the new techniques against each other as well as against some other online and index-based algorithms proposed in the literature. Compared to the brute-force O(mn) approach, our solutions can be faster by a factor that is proportional to the matrix length m. Our multiple-matrix filtration algorithm had the best performance in the experiments. On a current PC, this algorithm finds significant matches (p = 0.0001) of the 123 JASPAR matrices in the human genome in about 18 minutes.


Subjects
Algorithms, Computational Biology/methods, Automated Pattern Recognition/methods, DNA Sequence Analysis/methods, Protein Sequence Analysis/methods, DNA/chemistry, Humans, Proteins/chemistry
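
The brute-force O(mn) baseline that the abstract above compares against can be written in a few lines of Python. This is only the naive sliding-window scan, not the filtering or superalphabet algorithms of the paper; the function name, the toy scores and the threshold are assumptions made for illustration.

```python
def scan_pwm(seq, pwm, threshold):
    """Naive O(mn) scan: score every length-m window and report those at or above threshold."""
    m = len(pwm)
    hits = []
    for i in range(len(seq) - m + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(m))
        if score >= threshold:
            hits.append((i, score))
    return hits

# Hypothetical toy PWM favouring the site "ACG" (scores invented for illustration).
pwm = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]
hits = scan_pwm("TTACGACGT", pwm, threshold=6)
# hits == [(2, 6), (5, 6)]
```

In practice the threshold would be derived from a p-value such as the p = 0.0001 quoted in the abstract, and the faster algorithms avoid scoring every window in full.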
11.
EMBO J ; 29(13): 2147-60, 2010 Jul 07.
Article in English | MEDLINE | ID: mdl-20517297

ABSTRACT

Members of the large ETS family of transcription factors (TFs) have highly similar DNA-binding domains (DBDs), yet they have diverse functions and activities in physiology and oncogenesis. Some differences in DNA-binding preferences within this family have been described, but they have not been analysed systematically, and their contributions to targeting remain largely uncharacterized. We report here the DNA-binding profiles for all human and mouse ETS factors, which we generated using two different methods: a high-throughput microwell-based TF DNA-binding specificity assay, and protein-binding microarrays (PBMs). Both approaches reveal that the ETS-binding profiles cluster into four distinct classes, and that all ETS factors linked to cancer, ERG, ETV1, ETV4 and FLI1, fall into just one of these classes. We identify amino-acid residues that are critical for the differences in specificity between all the classes, and confirm the specificities in vivo using chromatin immunoprecipitation followed by sequencing (ChIP-seq) for a member of each class. The results indicate that even relatively small differences in in vitro binding specificity of a TF contribute to site selectivity in vivo.


Subjects
DNA/metabolism, Genome-Wide Association Study, Proto-Oncogene Proteins c-ets/metabolism, Animals, Base Sequence, Binding Sites, Cell Line, DNA/chemistry, Humans, Mice, Molecular Models, Protein Binding, Proto-Oncogene Proteins c-ets/chemistry, DNA Sequence Analysis
12.
Genome Res ; 20(6): 861-73, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20378718

ABSTRACT

The genetic code (the binding specificity of all transfer-RNAs) defines how protein primary structure is determined by DNA sequence. DNA also dictates when and where proteins are expressed, and this information is encoded in a pattern of specific sequence motifs that are recognized by transcription factors. However, the DNA-binding specificity is only known for a small fraction of the approximately 1400 human transcription factors (TFs). We describe here a high-throughput method for analyzing transcription factor binding specificity that is based on systematic evolution of ligands by exponential enrichment (SELEX) and massively parallel sequencing. The method is optimized for analysis of large numbers of TFs in parallel through the use of affinity-tagged proteins, barcoded selection oligonucleotides, and multiplexed sequencing. Data are analyzed by a new bioinformatic platform that uses the hundreds of thousands of sequencing reads obtained to control the quality of the experiments and to generate binding motifs for the TFs. The described technology allows higher throughput and identification of much longer binding profiles than current microarray-based methods. In addition, as our method is based on proteins expressed in mammalian cells, it can also be used to characterize DNA-binding preferences of full-length proteins or proteins requiring post-translational modifications. We validate the method by determining binding specificities of 14 different classes of TFs and by confirming the specificities for NFATC1 and RFX3 using ChIP-seq. Our results reveal unexpected dimeric modes of binding for several factors that were thought to preferentially bind DNA as monomers.


Subjects
SELEX Aptamer Technique, Transcription Factors/metabolism, Affinity Labels, Base Sequence, Binding Sites, DNA, Humans, Molecular Sequence Data
14.
Curr Opin Biotechnol ; 21(1): 70-7, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20171871

ABSTRACT

In the wake of numerous sequenced genomes becoming available, computational methods for the reconstruction of metabolic networks have received considerable attention. Here, we review recent methods and software tools useful along the reconstruction workflow, from sequence annotation and network assembly to model verification and testing against experimental data. Reconstruction methods can be divided into three categories, depending on the magnitude of network context which is taken into account in the process of assembling the metabolic model: First, each enzyme may be predicted independently by annotation transfer or machine learning methods. Second, the presence of a metabolic pathway may be detected from genome and experimental evidence, often utilizing a reference pathway database. Third, the method may attempt to directly reconstruct a consistent metabolic network without relying on predefined reference pathways. Regardless of the chosen context, all methods strive to reconstruct genome-scale metabolic reconstructions. Currently a gap exists between software platforms dedicated to genome annotation and computational tools for automatically repairing network inconsistencies and validating against measurement data. We argue that to accelerate the reconstruction efforts, computational tools need to be developed that bridge the phases of the reconstruction workflow. In particular, the goal of finding consistent metabolic models suitable for computational analysis should be taken into account already in the beginning phases of reconstruction.


Subjects
Algorithms, Gene Expression Profiling/methods, Biological Models, Proteome/metabolism, Signal Transduction/physiology, Computer Simulation, Metabolic Clearance Rate
15.
Bioinformatics ; 25(23): 3181-2, 2009 Dec 01.
Article in English | MEDLINE | ID: mdl-19773334

ABSTRACT

UNLABELLED: MOODS (MOtif Occurrence Detection Suite) is a software package for matching position weight matrices against DNA sequences. MOODS implements state-of-the-art online matching algorithms, achieving considerably faster scanning speed than with a simple brute-force search. MOODS is written in C++, with bindings for the popular BioPerl and Biopython toolkits. It can easily be adapted for different purposes and integrated into existing workflows. It can also be used as a C++ library. AVAILABILITY: The package with documentation and examples of usage is available at http://www.cs.helsinki.fi/group/pssmfind. The source code is also available under the terms of a GNU General Public License (GPL).


Subjects
Computational Biology/methods, DNA Sequence Analysis/methods, Software, Algorithms, Base Sequence, Position-Specific Scoring Matrices
16.
Nat Genet ; 41(8): 885-90, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19561604

ABSTRACT

Homozygosity for the G allele of rs6983267 at 8q24 increases colorectal cancer (CRC) risk approximately 1.5 fold. We report here that the risk allele G shows copy number increase during CRC development. Our computer algorithm, Enhancer Element Locator (EEL), identified an enhancer element that contains rs6983267. The element drove expression of a reporter gene in a pattern that is consistent with regulation by the key CRC pathway Wnt. rs6983267 affects a binding site for the Wnt-regulated transcription factor TCF4, with the risk allele G showing stronger binding in vitro and in vivo. Genome-wide ChIP assay revealed the element as the strongest TCF4 binding site within 1 Mb of MYC. An unambiguous correlation between rs6983267 genotype and MYC expression was not detected, and additional work is required to scrutinize all possible targets of the enhancer. Our work provides evidence that the common CRC predisposition associated with 8q24 arises from enhanced responsiveness to Wnt signaling.


Subjects
Human Chromosome Pair 8/genetics, Colorectal Neoplasms/genetics, Genetic Predisposition to Disease, Single Nucleotide Polymorphism/genetics, Signal Transduction, Wnt Proteins/metabolism, Animals, Base Sequence, Binding Sites, Conserved Sequence, Mammalian Embryo/metabolism, Genetic Enhancer Elements/genetics, Gene Dosage, Genome-Wide Association Study, Humans, Mice, Transgenic Mice, Molecular Sequence Data, Organ Specificity, Protein Binding, Proto-Oncogene Proteins c-myc/metabolism, Reproducibility of Results, TCF Transcription Factors/metabolism, Transcription Factor 7-Like 2 Protein, beta Catenin/metabolism
17.
Genome Biol ; 10(1): 202, 2009.
Article in English | MEDLINE | ID: mdl-19226437

ABSTRACT

With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' utilizes sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome.


Subjects
Genomics/methods, Transcriptional Regulatory Elements/genetics, Base Sequence, Computational Biology/methods, Nucleic Acid Databases, Molecular Evolution
18.
BMC Bioinformatics ; 9: 266, 2008 Jun 06.
Article in English | MEDLINE | ID: mdl-18534038

ABSTRACT

BACKGROUND: Metabolic fluxes provide invaluable insight on the integrated response of a cell to environmental stimuli or genetic modifications. Current computational methods for estimating the metabolic fluxes from 13C isotopomer measurement data rely either on manual derivation of analytic equations constraining the fluxes or on the numerical solution of a highly nonlinear system of isotopomer balance equations. In the first approach, analytic equations have to be tediously derived for each organism, substrate or labelling pattern, while in the second approach, the global nature of an optimum solution is difficult to prove and comprehensive measurements of external fluxes to augment the 13C isotopomer data are typically needed. RESULTS: We present a novel analytic framework for estimating metabolic flux ratios in the cell from 13C isotopomer measurement data. In the presented framework, equation systems constraining the fluxes are derived automatically from the model of the metabolism of an organism. The framework is designed to be applicable with all metabolic network topologies, 13C isotopomer measurement techniques, substrates and substrate labelling patterns. By analyzing nuclear magnetic resonance (NMR) and mass spectrometry (MS) measurement data obtained from the experiments on glucose with the model micro-organisms Bacillus subtilis and Saccharomyces cerevisiae we show that our framework is able to automatically produce the flux ratios discovered so far by the domain experts with tedious manual analysis. Furthermore, we show by in silico calculability analysis that our framework can rapidly produce flux ratio equations (as well as predict when the flux ratios are unobtainable by linear means) also for substrates not related to glucose.
CONCLUSION: The core of the 13C metabolic flux analysis framework introduced in this article consists of flow and independence analysis of metabolic fragments and techniques for manipulating isotopomer measurements with vector space techniques. These methods facilitate efficient, analytic computation of the ratios between the fluxes of pathways that converge to a common junction metabolite. The framework can be seen as a generalization and formalization of the existing tradition for computing metabolic flux ratios where equations constraining flux ratios are manually derived, usually without explicitly showing the formal proofs of the validity of the equations.


Subjects
Bacillus subtilis/metabolism, Bacterial Proteins/analysis, Carbon Isotopes/pharmacokinetics, Fungal Proteins/analysis, Glucose/metabolism, Saccharomyces cerevisiae/metabolism, Artificial Intelligence, Bacterial Proteins/metabolism, Citric Acid Cycle/physiology, Computer Simulation, Factual Databases, Fungal Proteins/metabolism, Glycolysis/physiology, Isomerism, Isotope Labeling, Magnetic Resonance Spectroscopy, Mass Spectrometry, Computer Neural Networks, Pentose Phosphate Pathway/physiology, Research Design, Statistics as Topic/methods
19.
Brief Bioinform ; 9(3): 250-3, 2008 May.
Article in English | MEDLINE | ID: mdl-18216087

ABSTRACT

Over recent years, five European PhD programmes have organized a series of 'Bioinformatics Research and Education Workshops'. These workshops address the needs of first-year PhD students and have been designed to combine a maximum of educational impact and scientific stimulation with a minimum of financial and administrative effort. We describe the BREW experience and argue that this type of event constitutes an attractive component of PhD education in computational biology and beyond.


Subjects
Computational Biology/education, Curriculum, Graduate Education/organization & administration, Professional Education/organization & administration, Genomics/education, Teaching/methods, Graduate Education/methods, Europe
20.
J Integr Bioinform ; 5(2)2008 Aug 25.
Article in English | MEDLINE | ID: mdl-20134058

ABSTRACT

ReMatch is a web-based, user-friendly tool that constructs stoichiometric network models for metabolic flux analysis, integrating user-developed models into a database collected from several comprehensive metabolic data resources, including KEGG, MetaCyc and ChEBI. Particularly, ReMatch augments the metabolic reactions of the model with carbon mappings to facilitate 13C metabolic flux analysis. The construction of a network model consisting of biochemical reactions is the first step in most metabolic modelling tasks. This model construction can be a tedious task as the required information is usually scattered across many separate databases whose interoperability is suboptimal, due to the heterogeneous naming conventions of metabolites in different databases. Another, particularly severe data integration problem is faced in 13C metabolic flux analysis, where the mappings of carbon atoms from substrates into products in the model are required. ReMatch has been developed to solve the above data integration problems. First, ReMatch matches the imported user-developed model against the internal ReMatch database while considering a comprehensive metabolite name thesaurus. This, together with wild card support, allows the user to specify the model quickly without having to look the names up manually. Second, ReMatch is able to augment reactions of the model with carbon mappings, obtained either from the internal database or given by the user with an easy-to-use tool. The constructed models can be exported into 13C-FLUX and SBML file formats. Further, a stoichiometric matrix and visualizations of the network model can be generated. The constructed models of metabolic networks can be optionally made available to the other users of ReMatch. Thus, ReMatch provides a common repository for metabolic network models with carbon mappings for the needs of the metabolic flux analysis community.
ReMatch is freely available for academic use at http://www.cs.helsinki.fi/group/sysfys/software/rematch/.


Subjects
Carbon/metabolism, Computational Biology/methods, Metabolic Networks and Pathways, Software, Factual Databases, Internet, User-Computer Interface
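
The stoichiometric matrix mentioned in the abstract above has one row per metabolite and one column per reaction. The following Python sketch shows how such a matrix can be assembled from a reaction list; it is not ReMatch's code or file format, and the function name and the input layout (a dictionary of signed coefficients) are assumptions made for illustration.

```python
def stoichiometric_matrix(reactions):
    """Build a stoichiometric matrix from {reaction: {metabolite: coefficient}}.

    Coefficients are negative for consumed metabolites and positive for
    produced ones. Rows follow the sorted metabolite names, columns follow
    the insertion order of the reactions.
    """
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    names = list(reactions)
    matrix = [[reactions[r].get(m, 0) for r in names] for m in metabolites]
    return metabolites, names, matrix

# Hypothetical two-step pathway A -> B -> C.
reactions = {
    "v1": {"A": -1, "B": 1},
    "v2": {"B": -1, "C": 1},
}
metabolites, names, matrix = stoichiometric_matrix(reactions)
# metabolites == ["A", "B", "C"], names == ["v1", "v2"]
# matrix == [[-1, 0], [1, -1], [0, 1]]
```

For 13C flux analysis each reaction would additionally need a carbon mapping from substrate atoms to product atoms, which this sketch does not represent.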