Results 1 - 20 of 35
1.
FEBS Lett ; 587(17): 2832-41, 2013 Sep 02.
Article in English | MEDLINE | ID: mdl-23831062

ABSTRACT

We present an experimental and computational pipeline for the generation of kinetic models of metabolism, and demonstrate its application to glycolysis in Saccharomyces cerevisiae. Starting from an approximate mathematical model, we employ a "cycle of knowledge" strategy, identifying the steps with most control over flux. Kinetic parameters of the individual isoenzymes within these steps are measured experimentally under a standardised set of conditions. Experimental strategies are applied to establish a set of in vivo concentrations for isoenzymes and metabolites. The data are integrated into a mathematical model that is used to predict a new set of metabolite concentrations and reevaluate the control properties of the system. This bottom-up modelling study reveals that control over the metabolic network most directly involved in yeast glycolysis is more widely distributed than previously thought.
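As a hedged illustration of the control analysis above (not the paper's model), the sketch below computes flux control coefficients for an invented two-step pathway by finite-difference perturbation of each Vmax; all rate laws and constants are placeholders. The summation theorem (coefficients summing to ~1) serves as a sanity check.

```python
from scipy.optimize import brentq

S0 = 5.0                       # fixed external substrate (arbitrary units)
KEQ, KM1, KM2 = 4.0, 1.0, 1.0  # invented constants for the toy pathway

def steady_flux(vmax1, vmax2):
    """Steady-state flux of S0 -> S -> P: solve v1(S) = v2(S) for S."""
    v1 = lambda s: vmax1 * (S0 - s / KEQ) / (KM1 + S0)  # step 1, product-sensitive
    v2 = lambda s: vmax2 * s / (KM2 + s)                # step 2, Michaelis-Menten
    s_ss = brentq(lambda s: v1(s) - v2(s), 1e-9, S0 * KEQ - 1e-9)
    return v2(s_ss)

def control_coefficient(i, vmax=(1.0, 1.0), h=1e-4):
    """C_i = (Vmax_i / J) * dJ/dVmax_i, via central finite differences."""
    up, dn = list(vmax), list(vmax)
    up[i] *= 1 + h
    dn[i] *= 1 - h
    return (steady_flux(*up) - steady_flux(*dn)) / (2 * h * steady_flux(*vmax))

c1, c2 = control_coefficient(0), control_coefficient(1)
print(f"C1={c1:.3f} C2={c2:.3f} sum={c1 + c2:.3f}")  # summation theorem: sum ~ 1
```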


Subject(s)
Glycolysis; Models, Biological; Saccharomyces cerevisiae Proteins/chemistry; Saccharomyces cerevisiae/enzymology; Computer Simulation; Isoenzymes/chemistry; Kinetics; Metabolic Networks and Pathways; Saccharomyces cerevisiae/metabolism; Systems Biology
2.
Sensors (Basel) ; 11(9): 8855-87, 2011.
Article in English | MEDLINE | ID: mdl-22164110

ABSTRACT

Sensing devices are increasingly being deployed to monitor the physical world around us. One class of application for which sensor data is pertinent is environmental decision support systems, e.g., flood emergency response. For these applications, the sensor readings need to be put in context by integrating them with other sources of data about the surrounding environment. Traditional systems for predicting and detecting floods rely on methods that need significant human resources. In this paper we describe a semantic sensor web architecture for integrating multiple heterogeneous datasets, including live and historic sensor data, databases, and map layers. The architecture provides mechanisms for discovering datasets, defining integrated views over them, continuously receiving data in real-time, and visualising on screen and interacting with the data. Our approach makes extensive use of web service standards for querying and accessing data, and semantic technologies to discover and integrate datasets. We demonstrate the use of our semantic sensor web architecture in the context of a flood response planning web application that uses data from sensor networks monitoring the sea-state around the coast of England.
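The dataset-discovery step can be pictured as a SPARQL query against a sensor registry. This is a sketch only: the endpoint URL is a hypothetical placeholder, and the W3C SOSA vocabulary used here postdates the paper, standing in for whichever sensor ontology the architecture actually employed.

```python
# Sketch of semantic dataset discovery; endpoint and vocabulary are stand-ins.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/sensor-registry/sparql")  # hypothetical
sparql.setQuery("""
    PREFIX sosa: <http://www.w3.org/ns/sosa/>
    SELECT ?sensor ?property WHERE {
        ?sensor a sosa:Sensor ;
                sosa:observes ?property .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["sensor"]["value"], "observes", row["property"]["value"])
```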


Subject(s)
Decision Support Techniques; Environmental Monitoring
3.
BMC Bioinformatics ; 11: 582, 2010 Nov 29.
Article in English | MEDLINE | ID: mdl-21114840

ABSTRACT

BACKGROUND: The behaviour of biological systems can be deduced from their mathematical models. However, multiple sources of data in diverse forms are required in the construction of a model in order to define its components and their biochemical reactions, and corresponding parameters. Automating the assembly and use of systems biology models is dependent upon data integration processes involving the interoperation of data and analytical resources. RESULTS: Taverna workflows have been developed for the automated assembly of quantitative parameterised metabolic networks in the Systems Biology Markup Language (SBML). An SBML model is built in a systematic fashion by the workflows, which start with the construction of a qualitative network using data from a MIRIAM-compliant genome-scale model of yeast metabolism. This is followed by parameterisation of the SBML model with experimental data from two repositories, the SABIO-RK enzyme kinetics database and a database of quantitative experimental results. The models are then calibrated and simulated in workflows that call out to COPASIWS, the web service interface to the COPASI software application for analysing biochemical networks. These systems biology workflows were evaluated for their ability to construct a parameterised model of yeast glycolysis. CONCLUSIONS: Distributed information about metabolic reactions that have been described to MIRIAM standards enables the automated assembly of quantitative systems biology models of metabolic networks based on user-defined criteria. Such data integration processes can be implemented as Taverna workflows to provide a rapid overview of the components and their relationships within a biochemical system.
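The qualitative-network stage can be sketched with python-libsbml, a standard library for building SBML programmatically (the paper itself drives this through Taverna workflows); the species and reaction below are an invented fragment, not drawn from the yeast model.

```python
# Sketch of programmatic SBML assembly with python-libsbml.
import libsbml

doc = libsbml.SBMLDocument(3, 1)            # SBML Level 3 Version 1
model = doc.createModel()
model.setId("glycolysis_fragment")

comp = model.createCompartment()
comp.setId("cytosol"); comp.setConstant(True); comp.setSize(1.0)

for sid in ("glucose", "glucose_6_phosphate"):
    sp = model.createSpecies()
    sp.setId(sid); sp.setCompartment("cytosol")
    sp.setInitialConcentration(1.0)
    sp.setConstant(False); sp.setBoundaryCondition(False)
    sp.setHasOnlySubstanceUnits(False)

rxn = model.createReaction()
rxn.setId("hexokinase"); rxn.setReversible(False); rxn.setFast(False)
r = rxn.createReactant(); r.setSpecies("glucose"); r.setConstant(True)
p = rxn.createProduct(); p.setSpecies("glucose_6_phosphate"); p.setConstant(True)

print(libsbml.writeSBMLToString(doc))
```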


Subject(s)
Metabolic Networks and Pathways; Systems Biology/methods; Databases, Factual; Models, Biological
4.
FEBS J ; 277(18): 3769-79, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20738395

ABSTRACT

A limited number of publicly available resources provide access to enzyme kinetic parameters. These have been compiled through manual data mining of published papers, not from the original, raw experimental data from which the parameters were calculated. This is largely due to the lack of software or standards to support the capture, analysis, storage and dissemination of such experimental data. Introduced here is an integrative system to manage experimental enzyme kinetics data from instrument to browser. The approach is based on two interrelated databases: the existing SABIO-RK database, containing kinetic data and corresponding metadata, and the newly introduced experimental raw data repository, MeMo-RK. Both systems are publicly available by web browser and web service interfaces and are configurable to ensure privacy of unpublished data. Users of this system are provided with the ability to view both kinetic parameters and the experimental raw data from which they are calculated, providing increased confidence in the data. A data analysis and submission tool, the kineticswizard, has been developed to allow the experimentalist to perform data collection, analysis and submission to both data resources. The system is designed to be extensible, allowing integration with other manufacturer instruments covering a range of analytical techniques.
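Assuming SABIO-RK's REST interface, a retrieval of kinetic-law records might look like the sketch below; the endpoint path, query syntax and field names follow SABIO-RK's published web-service documentation as understood here, but should be verified against the live service before use.

```python
# Hedged sketch of pulling kinetic-law records from SABIO-RK over REST.
import requests

url = "http://sabiork.h-its.org/sabioRestWebServices/kineticlawsExportTsv"
params = {
    "q": 'Organism:"Saccharomyces cerevisiae" AND ECNumber:"2.7.1.1"',
    "fields[]": ["EntryID", "Parameter"],   # assumed field names; check the docs
}
resp = requests.get(url, params=params, timeout=30)
resp.raise_for_status()
print(resp.text.splitlines()[:5])           # first few tab-separated records
```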


Subject(s)
Databases, Protein; Enzymes/metabolism; Systems Biology/methods; Data Mining; Automatic Data Processing; Internet; Kinetics; Recombinant Proteins/metabolism; Software
5.
Proteomics ; 10(17): 3073-81, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20677327

ABSTRACT

The Human Proteome Organisation's Proteomics Standards Initiative has developed the GelML (gel electrophoresis markup language) data exchange format for representing gel electrophoresis experiments performed in proteomics investigations. The format closely follows the reporting guidelines for gel electrophoresis, which are part of the Minimum Information About a Proteomics Experiment (MIAPE) set of modules. GelML supports the capture of metadata (such as experimental protocols) and data (such as gel images) resulting from gel electrophoresis so that laboratories can be compliant with the MIAPE Gel Electrophoresis guidelines, while allowing such data sets to be exchanged or downloaded from public repositories. The format is sufficiently flexible to capture data from a broad range of experimental processes, and complements other PSI formats for MS data and the results of protein and peptide identifications to capture entire gel-based proteome workflows. GelML has resulted from the open standardisation process of PSI consisting of both public consultation and anonymous review of the specifications.
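Since GelML documents are plain XML, a fragment can be assembled with any XML library. The element and attribute names below are simplified placeholders for illustration, not the normative GelML schema; consult the PSI specification for the real element set.

```python
# Illustration only: placeholder GelML-like fragment, not the real schema.
import xml.etree.ElementTree as ET

gel = ET.Element("GelML")                                      # placeholder root
protocol = ET.SubElement(gel, "ElectrophoresisProtocol")       # hypothetical
ET.SubElement(protocol, "FirstDimension", method="IEF", pHRange="3-10")
ET.SubElement(protocol, "SecondDimension", method="SDS-PAGE",
              percentAcrylamide="12")
ET.SubElement(gel, "GelImage", location="gel_001.png")         # hypothetical

print(ET.tostring(gel, encoding="unicode"))
```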


Subject(s)
Databases, Protein; Electrophoresis, Polyacrylamide Gel; Proteomics/methods; Software; Humans; Internet; Mass Spectrometry; Models, Chemical; Proteomics/standards; Reference Standards; User-Computer Interface
7.
Bioinformatics ; 26(7): 932-8, 2010 Apr 01.
Article in English | MEDLINE | ID: mdl-20176582

ABSTRACT

MOTIVATION: Research in systems biology is carried out through a combination of experiments and models. Several data standards have been adopted for representing models (Systems Biology Markup Language) and various types of relevant experimental data (such as FuGE and those of the Proteomics Standards Initiative). However, until now, there has been no standard way to associate a model and its entities to the corresponding datasets, or vice versa. Such a standard would provide a means to represent computational simulation results as well as to frame experimental data in the context of a particular model. Target applications include model-driven data analysis, parameter estimation, and sharing and archiving model simulations. RESULTS: We propose the Systems Biology Results Markup Language (SBRML), an XML-based language that associates a model with several datasets. Each dataset is represented as a series of values associated with model variables, and their corresponding parameter values. SBRML provides a flexible way of indexing the results to model parameter values, which supports both spreadsheet-like data and multidimensional data cubes. We present and discuss several examples of SBRML usage in applications such as enzyme kinetics, microarray gene expression and various types of simulation results. AVAILABILITY AND IMPLEMENTATION: The XML Schema file for SBRML is available at http://www.comp-sys-bio.org/SBRML under the Academic Free License (AFL) v3.0.
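The core SBRML idea, dataset columns bound to model entities, can be illustrated as below; element names are simplified stand-ins rather than the normative schema, for which see the XML Schema at the AVAILABILITY URL.

```python
# Illustration of column-to-variable binding; placeholder element names.
import xml.etree.ElementTree as ET

root = ET.Element("sbrml")                               # placeholder root
ET.SubElement(root, "model", source="glycolysis.xml")    # the associated SBML model
ds = ET.SubElement(root, "dataSet", id="timecourse_1")
cols = ET.SubElement(ds, "columns")
for var in ("time", "glucose", "pyruvate"):
    ET.SubElement(cols, "column", variableRef=var)       # column -> model variable
rows = ET.SubElement(ds, "rows")
for values in ("0.0 5.0 0.0", "60.0 3.1 1.8"):           # invented data points
    ET.SubElement(rows, "row", values=values)

print(ET.tostring(root, encoding="unicode"))
```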


Subject(s)
Software; Systems Biology/methods; Computational Biology/methods; Databases, Factual; Oligonucleotide Array Sequence Analysis
8.
BMC Bioinformatics ; 10: 226, 2009 Jul 21.
Article in English | MEDLINE | ID: mdl-19622144

ABSTRACT

BACKGROUND: High content live cell imaging experiments are able to track the cellular localisation of labelled proteins in multiple live cells over a time course. Experiments using high content live cell imaging will generate multiple large datasets that are often stored in an ad-hoc manner. This hinders identification of previously gathered data that may be relevant to current analyses. Whilst solutions exist for managing image data, they are primarily concerned with storage and retrieval of the images themselves and not the data derived from the images. There is therefore a requirement for an information management solution that facilitates the indexing of experimental metadata and results of high content live cell imaging experiments. RESULTS: We have designed and implemented a data model and information management solution for the data gathered through high content live cell imaging experiments. Many of the experiments to be stored measure the translocation of fluorescently labelled proteins from cytoplasm to nucleus in individual cells. The functionality of this database has been enhanced by the addition of an algorithm that automatically annotates results of these experiments with the timings of translocations and periods of any oscillatory translocations as they are uploaded to the repository. Testing has shown the algorithm to perform well with a variety of previously unseen data. CONCLUSION: Our repository is a fully functional example of how high throughput imaging data may be effectively indexed and managed to address the requirements of end users. By implementing the automated analysis of experimental results, we have provided a clear impetus for individuals to ensure that their data forms part of that which is stored in the repository. Although focused on imaging, the solution provided is sufficiently generic to be applied to other functional proteomics and genomics experiments. The software is available from: http://code.google.com/p/livecellim/
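A hedged sketch of the kind of annotation the repository automates: from a per-cell nuclear/cytoplasmic intensity ratio, report threshold crossings (translocation timings) and mean peak spacing (oscillation period). The trace is synthetic and the paper's actual algorithm is not reproduced here.

```python
# Detect translocation onsets and oscillation period in a synthetic trace.
import numpy as np
from scipy.signal import find_peaks

t = np.arange(0, 120, 2.0)                                    # minutes
ratio = 1 + 0.8 * np.maximum(0, np.sin(2 * np.pi * t / 40))   # toy oscillation

threshold = 1.4
above = ratio > threshold
onsets = t[1:][~above[:-1] & above[1:]]       # upward crossings = nuclear entry
peaks, _ = find_peaks(ratio, height=threshold)
period = np.diff(t[peaks]).mean() if len(peaks) > 1 else None

print("translocation onsets (min):", onsets)
print("oscillation period (min):", period)
```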


Subject(s)
Computational Biology/methods; Image Processing, Computer-Assisted/methods; Information Management/methods; Microscopy; Databases, Factual; Information Storage and Retrieval; Software
9.
OMICS ; 13(3): 239-51, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19441879

ABSTRACT

The Functional Genomics Experiment data model (FuGE) has been developed to increase the consistency and efficiency of experimental data modeling in the life sciences, and it has been adopted by a number of high-profile standardization organizations. FuGE can be used: (1) directly, whereby generic modeling constructs are used to represent concepts from specific experimental activities; or (2) as a framework within which method-specific models can be developed. FuGE is both rich and flexible, providing a considerable number of modeling constructs, which can be used in a range of different ways. However, such richness and flexibility also mean that modelers and application developers have choices to make when applying FuGE in a given context. This paper captures emerging best practice in the use of FuGE in the light of the experience of several groups by: (1) proposing guidelines for the use and extension of the FuGE data model; (2) presenting design patterns that reflect recurring requirements in experimental data modeling; and (3) describing a community software tool kit (STK) that supports application development using FuGE. We anticipate that these guidelines will encourage consistent usage of FuGE, and as such, will contribute to the development of convergent data standards in omics research.
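Usage pattern (2), FuGE as a framework extended by method-specific models, can be pictured with the toy extension below; class and attribute names are illustrative only and echo the flavour of FuGE constructs rather than the real object model.

```python
# Toy extension pattern: a generic construct specialised for one method.
from dataclasses import dataclass, field

@dataclass
class Protocol:                              # generic, FuGE-like construct
    name: str
    parameters: dict = field(default_factory=dict)

@dataclass
class FlowCytometryProtocol(Protocol):       # method-specific extension
    laser_wavelength_nm: int = 488

run = FlowCytometryProtocol(name="FACS run 1",
                            parameters={"flow_rate": "medium"})
print(run)
```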


Subject(s)
Computational Biology/methods; Genomics/methods; Models, Theoretical; Computer Simulation; Flow Cytometry/instrumentation; Flow Cytometry/methods; Reproducibility of Results; Software; User-Computer Interface
10.
Bioinformatics ; 25(11): 1404-11, 2009 Jun 01.
Article in English | MEDLINE | ID: mdl-19336445

ABSTRACT

MOTIVATION: Most experimental evidence on kinetic parameters is buried in the literature, whose manual searching is complex, time-consuming and partial. These shortcomings become particularly acute in systems biology, where these parameters need to be integrated into detailed, genome-scale, metabolic models. These problems are addressed by KiPar, a dedicated information retrieval system designed to facilitate access to the literature relevant for kinetic modelling of a given metabolic pathway in yeast. Searching for kinetic data in the context of an individual pathway offers modularity as a way of tackling the complexity of developing a full metabolic model. It is also suitable for large-scale mining, since multiple reactions and their kinetic parameters can be specified in a single search request, rather than one reaction at a time, which is unsuitable given the size of genome-scale models. RESULTS: We developed an integrative approach, combining public data and software resources for the rapid development of large-scale text mining tools targeting complex biological information. The user supplies input in the form of identifiers used in relevant data resources to refer to the concepts of interest, e.g. EC numbers, GO and SBO identifiers. By doing so, the user is freed from providing any other knowledge or terminology concerned with these concepts and their relations, since they are retrieved from these and cross-referenced resources automatically. The terminology acquired is used to index the literature by mapping concepts to their synonyms, and then to textual documents mentioning them. The indexing results and the previously acquired knowledge about relations between concepts are used to formulate complex search queries aiming at documents relevant to the user's information needs. The conceptual approach is demonstrated in the implementation of KiPar. Evaluation reveals that KiPar performs better than a Boolean search. The precision achieved for abstracts (60%) and full-text articles (48%) is considerably better than the baseline precision (44% and 24%, respectively). The baseline recall is improved by 36% for abstracts and by 100% for full text. It appears that full-text articles are a much richer source of information on kinetic data than are their abstracts. Finally, the combined results for abstracts and full text compared with the curated literature provide high values for relative recall (88%) and novelty ratio (92%), suggesting that the system is able to retrieve a high proportion of new documents. AVAILABILITY: Source code and documentation are available at: (http://www.mcisb.org/resources/kipar/).
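The indexing and query-formulation idea can be sketched as follows: the user supplies identifiers only, and synonyms resolved from cross-referenced resources are expanded into a literature query. The synonym table here is an invented stub; KiPar obtains these mappings automatically from the public databases.

```python
# Stub synonym table; the identifier-to-term mappings are invented examples.
synonyms = {
    "EC 2.7.1.1": ["hexokinase", "ATP:D-hexose 6-phosphotransferase"],
    "SBO:0000025": ["Michaelis-Menten", "Km", "kcat"],
}

def build_query(identifiers):
    """Expand each identifier into an OR-clause of its synonyms."""
    clauses = []
    for ident in identifiers:
        terms = synonyms.get(ident, [ident])
        clauses.append("(" + " OR ".join(f'"{t}"' for t in terms) + ")")
    return " AND ".join(clauses)

print(build_query(["EC 2.7.1.1", "SBO:0000025"]))
```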


Subject(s)
Computational Biology/methods; Information Systems; Saccharomyces cerevisiae/metabolism; Software; Information Systems/standards; Metabolic Networks and Pathways; Systems Biology
11.
Proteomics ; 9(5): 1220-9, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19253293

ABSTRACT

LC-MS experiments can generate large quantities of data, for which a variety of database search engines are available to make peptide and protein identifications. Decoy databases are becoming widely used to place statistical confidence in result sets, allowing the false discovery rate (FDR) to be estimated. Different search engines produce different identification sets so employing more than one search engine could result in an increased number of peptides (and proteins) being identified, if an appropriate mechanism for combining data can be defined. We have developed a search engine independent score, based on FDR, which allows peptide identifications from different search engines to be combined, called the FDR Score. The results demonstrate that the observed FDR is significantly different when analysing the set of identifications made by all three search engines, by each pair of search engines or by a single search engine. Our algorithm assigns identifications to groups according to the set of search engines that have made the identification, and re-assigns the score (combined FDR Score). The combined FDR Score can differentiate between correct and incorrect peptide identifications with high accuracy, allowing on average 35% more peptide identifications to be made at a fixed FDR than using a single search engine.
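A hedged sketch of the target-decoy estimate that the FDR Score builds on: rank identifications by search-engine score and estimate the FDR at each threshold as decoys/targets. The paper's interpolated FDR Score and its grouping by search-engine agreement go beyond this minimal version.

```python
# Minimal target-decoy FDR curve; PSM scores below are invented.
def fdr_curve(psms):
    """psms: iterable of (score, is_decoy); higher score = better match."""
    curve, targets, decoys = [], 0, 0
    for score, is_decoy in sorted(psms, key=lambda p: -p[0]):
        decoys += is_decoy
        targets += not is_decoy
        curve.append((score, decoys / max(targets, 1)))
    return curve

psms = [(9.1, False), (8.7, False), (8.2, True), (7.9, False), (7.5, True)]
for score, fdr in fdr_curve(psms):
    print(f"score >= {score}: estimated FDR {fdr:.2f}")
```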


Subject(s)
Algorithms; Computational Biology/methods; Information Storage and Retrieval; Peptides/analysis; Proteomics/methods; Databases, Protein; Models, Statistical; Proteins/analysis; Reproducibility of Results; Software
12.
Bioinformatics ; 24(22): 2647-9, 2008 Nov 15.
Article in English | MEDLINE | ID: mdl-18801749

ABSTRACT

MOTIVATION: The Functional Genomics Experiment Object Model (FuGE) supports modelling of experimental processes either directly or through extensions that specialize FuGE for use in specific contexts. FuGE applications commonly include components that capture, store and search experiment descriptions, where the requirements of different applications have much in common. RESULTS: We describe a toolkit that supports data capture, storage and web-based search of FuGE experiment models; the toolkit can be used directly on FuGE compliant models or configured for use with FuGE extensions. The toolkit is illustrated using a FuGE extension standardized by the proteomics standards initiative, namely GelML. AVAILABILITY: The toolkit and a demonstration are available at http://code.google.com/p/fugetoolkit


Subject(s)
Computational Biology; Genomics/methods; Models, Genetic; Software; Internet
14.
PLoS One ; 3(6): e2300, 2008 Jun 04.
Article in English | MEDLINE | ID: mdl-18523684

ABSTRACT

Fungi and oomycetes are the causal agents of many of the most serious diseases of plants. Here we report a detailed comparative analysis of the genome sequences of thirty-six species of fungi and oomycetes, including seven plant pathogenic species, that aims to explore the common genetic features associated with plant disease-causing species. The predicted translational products of each genome have been clustered into groups of potential orthologues using Markov Chain Clustering and the data integrated into the e-Fungi object-oriented data warehouse (http://www.e-fungi.org.uk/). Analysis of the species distribution of members of these clusters has identified proteins that are specific to filamentous fungal species and a group of proteins found only in plant pathogens. By comparing the gene inventories of filamentous, ascomycetous phytopathogenic and free-living species of fungi, we have identified a set of gene families that appear to have expanded during the evolution of phytopathogens and may therefore serve important roles in plant disease. We have also characterised the predicted set of secreted proteins encoded by each genome and identified a set of protein families which are significantly over-represented in the secretomes of plant pathogenic fungi, including putative effector proteins that might perturb host cell biology during plant infection. The results demonstrate the potential of comparative genome analysis for exploring the evolution of eukaryotic microbial pathogenesis.
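The family over-representation analysis implied above can be pictured as a 2x2 Fisher's exact test per protein family, comparing pathogen and non-pathogen secretomes; the counts below are invented for illustration.

```python
# One-sided Fisher's exact test for enrichment of a family in pathogen secretomes.
from scipy.stats import fisher_exact

#              in family   not in family
table = [[40, 960],    # plant-pathogen secretome proteins (invented counts)
         [10, 990]]    # free-living secretome proteins (invented counts)
odds_ratio, p_value = fisher_exact(table, alternative="greater")
print(f"odds ratio = {odds_ratio:.2f}, one-sided p = {p_value:.2e}")
```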


Subject(s)
Fungi/genetics; Genome, Fungal; Saccharomyces cerevisiae/genetics; Biological Evolution; Species Specificity
15.
BMC Bioinformatics ; 9 Suppl 5: S5, 2008 Apr 29.
Article in English | MEDLINE | ID: mdl-18460187

ABSTRACT

BACKGROUND: Many bioinformatics applications rely on controlled vocabularies or ontologies to consistently interpret and seamlessly integrate information scattered across public resources. Experimental data sets from metabolomics studies need to be integrated with one another, but also with data produced by other types of omics studies in the spirit of systems biology, hence the pressing need for vocabularies and ontologies in metabolomics. However, it is time-consuming and non-trivial to construct these resources manually. RESULTS: We describe a methodology for rapid development of controlled vocabularies, a study originally motivated by the needs for vocabularies describing metabolomics technologies. We present case studies involving two controlled vocabularies (for nuclear magnetic resonance spectroscopy and gas chromatography) whose development is currently underway as part of the Metabolomics Standards Initiative. The initial vocabularies were compiled manually, providing a total of 243 and 152 terms. A total of 5,699 and 2,612 new terms were acquired automatically from the literature. The analysis of the results showed that full-text articles (especially the Materials and Methods sections) are the major source of technology-specific terms as opposed to paper abstracts. CONCLUSIONS: We suggest a text mining method for efficient corpus-based term acquisition as a way of rapidly expanding a set of controlled vocabularies with the terms used in the scientific literature. We adopted an integrative approach, combining relatively generic software and data resources for time- and cost-effective development of a text mining tool for expansion of controlled vocabularies across various domains, as a practical alternative to both manual term collection and tailor-made named entity recognition methods.
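A bare-bones stand-in for corpus-based term acquisition: extract candidate multi-word terms from Materials and Methods text and rank them by frequency relative to a background corpus. The texts are placeholders and the real method is considerably more sophisticated.

```python
# Crude candidate-term extraction and foreground/background frequency ranking.
import re
from collections import Counter

def candidates(text):
    # crude candidates: 1-3 word runs of alphabetic tokens, lower-cased
    return [m.lower() for m in
            re.findall(r"\b[A-Za-z][\w-]*(?: [A-Za-z][\w-]*){0,2}\b", text)]

methods = "Extracts were analysed by gas chromatography coupled to MS."
background = "The history of metabolic research is long and varied."

fg, bg = Counter(candidates(methods)), Counter(candidates(background))
ranked = sorted(fg, key=lambda t: fg[t] / (1 + bg[t]), reverse=True)
print(ranked[:5])
```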


Subject(s)
Abstracting and Indexing/methods; Metabolism; User-Computer Interface; Vocabulary, Controlled; Chromatography, Gas; Information Storage and Retrieval/methods; MEDLINE; Magnetic Resonance Spectroscopy; Natural Language Processing; Pattern Recognition, Automated/methods; Systems Biology/instrumentation; Systems Biology/statistics & numerical data; Systems Integration; Technology; Terminology as Topic; United States
16.
BMC Bioinformatics ; 9: 183, 2008 Apr 10.
Article in English | MEDLINE | ID: mdl-18402673

ABSTRACT

BACKGROUND: The systematic capture of appropriately annotated experimental data is a prerequisite for most bioinformatics analyses. Data capture is required not only for submission of data to public repositories, but also to underpin integrated analysis, archiving, and sharing - both within laboratories and in collaborative projects. The widespread requirement to capture data means that data capture and annotation are taking place at many sites, but the small scale of the literature on tools, techniques and experiences suggests that there is work to be done to identify good practice and reduce duplication of effort. RESULTS: This paper reports on experience gained in the deployment of the Pedro data capture tool in a range of representative bioinformatics applications. The paper makes explicit the requirements that have recurred when capturing data in different contexts, indicates how these requirements are addressed in Pedro, and describes case studies that illustrate where the requirements have arisen in practice. CONCLUSION: Data capture is a fundamental activity for bioinformatics; all biological data resources build on some form of data capture activity, and many require a blend of import, analysis and annotation. Recurring requirements in data capture suggest that model-driven architectures can be used to construct data capture infrastructures that can be rapidly configured to meet the needs of individual use cases. We have described how one such model-driven infrastructure, namely Pedro, has been deployed in representative case studies, and discussed the extent to which the model-driven approach has been effective in practice.
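The model-driven idea can be pictured as a declarative model of what to capture driving a generic capture routine; the entity and fields below are invented, and Pedro itself is configured from declarative schema descriptions rather than Python dictionaries.

```python
# A declarative capture model drives a generic, type-checked capture routine.
MODEL = {
    "Sample": [("name", str), ("organism", str), ("concentration_mg_ml", float)],
}

def capture(entity):
    """Prompt for each field declared in the model and coerce its type."""
    record = {}
    for field_name, field_type in MODEL[entity]:
        record[field_name] = field_type(input(f"{entity}.{field_name}: "))
    return record

if __name__ == "__main__":
    print(capture("Sample"))
```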


Subject(s)
Algorithms; Computational Biology/methods; Database Management Systems; Databases, Factual; Information Storage and Retrieval/methods; Software
17.
Nucleic Acids Res ; 36(Web Server issue): W485-90, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18440977

ABSTRACT

Despite the growing volumes of proteomic data, integration of the underlying results remains problematic owing to differences in formats, data captured, protein accessions and services available from the individual repositories. To address this, we present the ISPIDER Central Proteomic Database search (http://www.ispider.manchester.ac.uk/cgi-bin/ProteomicSearch.pl), an integration service offering novel search capabilities over leading, mature, proteomic repositories including PRoteomics IDEntifications database (PRIDE), PepSeeker, PeptideAtlas and the Global Proteome Machine. It enables users to search for proteins and peptides that have been characterised in mass spectrometry-based proteomics experiments from different groups, stored in different databases, and view the collated results with specialist viewers/clients. In order to overcome limitations imposed by the great variability in protein accessions used by individual laboratories, the European Bioinformatics Institute's Protein Identifier Cross-Reference (PICR) service is used to resolve accessions from different sequence repositories. Custom-built clients allow users to view peptide/protein identifications in different contexts from multiple experiments and repositories, as well as integration with the Dasty2 client supporting any annotations available from Distributed Annotation System servers. Further information on the protein hits may also be added via external web services able to take a protein as input. This web server offers the first truly integrated access to proteomics repositories and provides a unique service to biologists interested in mass spectrometry-based proteomics.
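The collation step can be sketched as follows: hits from different repositories are merged once their accessions map to a common identifier. The `resolve` stub stands in for the PICR cross-reference service, and the repository payloads are invented.

```python
# Group repository hits by a common accession; the lookup table is a stub.
def resolve(accession):
    mapping = {"IPI00021439": "P60709", "ACTB_HUMAN": "P60709"}  # stub lookup
    return mapping.get(accession, accession)

hits = [
    {"repo": "PRIDE", "accession": "IPI00021439", "peptides": 12},
    {"repo": "PeptideAtlas", "accession": "ACTB_HUMAN", "peptides": 8},
]
collated = {}
for hit in hits:
    collated.setdefault(resolve(hit["accession"]), []).append(hit)
print(collated)   # both hits grouped under the resolved accession
```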


Subject(s)
Databases, Protein; Proteomics; Software; Computer Graphics; Internet; Mass Spectrometry; Systems Integration
18.
Brief Bioinform ; 9(2): 174-88, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18281347

ABSTRACT

Proteomics, the study of the protein complement of a biological system, is generating increasing quantities of data from rapidly developing technologies employed in a variety of different experimental workflows. Experimental processes, e.g. for comparative 2D gel studies or LC-MS/MS analyses of complex protein mixtures, involve a number of steps: from experimental design, through wet and dry lab operations, to publication of data in repositories and finally to data annotation and maintenance. The presence of inaccuracies throughout the processing pipeline, however, results in data that can be untrustworthy, thus offsetting the benefits of high-throughput technology. While researchers and practitioners are generally aware of some of the information quality issues associated with public proteomics data, there are few accepted criteria and guidelines for dealing with them. In this article, we highlight factors that impact on the quality of experimental data and review current approaches to information quality management in proteomics. Data quality issues are considered throughout the lifecycle of a proteomics experiment, from experiment design and technique selection, through data analysis, to archiving and sharing.


Subject(s)
Information Storage and Retrieval; Proteomics; Quality Control; Database Management Systems; Electrophoresis, Gel, Two-Dimensional; Information Storage and Retrieval/methods; Information Storage and Retrieval/standards; Mass Spectrometry; Proteins/analysis; Proteomics/instrumentation; Proteomics/methods; Proteomics/standards; Software
19.
Biochem Soc Trans ; 36(Pt 1): 33-6, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18208380

ABSTRACT

Experimental processes in the life sciences are becoming increasingly complex. As a result, recording, archiving and sharing descriptions of these processes and of the results of experiments is becoming ever more challenging. However, validation of results, sharing of best practice and integrated analysis all require systematic description of experiments at carefully determined levels of detail. The present paper discusses issues associated with the management of experimental data in the life sciences, including: the different tasks that experimental data and metadata can support, the role of standards in informing data sharing and archiving, and the development of effective databases and tools, building on these standards.


Subject(s)
Cooperative Behavior; Database Management Systems/standards
20.
BMC Genomics ; 8: 426, 2007 Nov 20.
Article in English | MEDLINE | ID: mdl-18028535

ABSTRACT

BACKGROUND: The number of sequenced fungal genomes is ever increasing, with about 200 genomes fully sequenced or with sequencing in progress. Only a small percentage of those genomes have been comprehensively studied, for example using techniques from functional genomics. Comparative analysis has proven to be a useful strategy for enhancing our understanding of evolutionary biology and of the less well understood genomes. However, the data required for these analyses tend to be distributed across various heterogeneous data sources, making systematic comparative studies a cumbersome task. Furthermore, comparative analyses benefit from close integration of derived data sets that cluster genes or organisms in a way that eases the expression of requests that clarify points of similarity or difference between species. DESCRIPTION: To support systematic comparative analyses of fungal genomes we have developed the e-Fungi database, which integrates a variety of data for more than 30 fungal genomes. Publicly available genome data, functional annotations, and pathway information have been integrated into a single data repository and complemented with results of comparative analyses, such as MCL and OrthoMCL cluster analysis, and predictions of signaling proteins and the sub-cellular localisation of proteins. To access the data, a library of analysis tasks is available through a web interface. The analysis tasks are motivated by recent comparative genomics studies, and aim to support the study of evolutionary biology as well as community efforts for improving the annotation of genomes. Web services for each query are also available, enabling the tasks to be incorporated into workflows. CONCLUSION: The e-Fungi database provides fungal biologists with a resource for comparative studies of a large range of fungal genomes. Its analysis library supports the comparative study of genome data, functional annotation, and results of large scale analyses over all the genomes stored in the database. The database is accessible at http://www.e-fungi.org.uk, as is the WSDL for the web services.


Subject(s)
Databases, Genetic; Genome, Fungal/genetics; Computational Biology/methods; Database Management Systems; Internet; User-Computer Interface