1.
Front Genet ; 15: 1353553, 2024.
Article in English | MEDLINE | ID: mdl-38505828

ABSTRACT

Post-genomic implementations have expanded the experimental strategies to identify elements involved in the regulation of transcription initiation. Here, we present for the first time a detailed analysis of the sources of knowledge supporting the collection of transcriptional regulatory interactions (RIs) of Escherichia coli K-12. An RI groups the transcription factor, its effect (positive or negative) and the regulated target, a promoter, a gene or transcription unit. We improved the evidence codes so that specific methods are incorporated and classified into independent groups. On this basis we updated the computation of confidence levels, weak, strong, or confirmed, for the collection of RIs. These updates enabled us to map the RI set to the current collection of HT TF-binding datasets from ChIP-seq, ChIP-exo, gSELEX and DAP-seq in RegulonDB, enriching in this way the evidence of close to one-quarter (1329) of RIs from the current total 5446 RIs. Based on the new computational capabilities of our improved annotation of evidence sources, we can now analyze the internal architecture of evidence, their categories (experimental, classical, HT, computational), and confidence levels. This is how we know that the joint contribution of HT and computational methods increases the overall fraction of reliable RIs (the sum of confirmed and strong evidence) from 49% to 71%. Thus, the current collection has 3912 reliable RIs, with 2718 or 70% of them with classical evidence which can be used to benchmark novel HT methods. Users can selectively exclude the method they want to benchmark, or keep for instance only the confirmed interactions. The recovery of regulatory sites in RegulonDB by the different HT methods ranges from 33% by ChIP-exo to 76% by ChIP-seq, although, as discussed, many potential confounding factors limit their interpretation. 
The collection of improvements reported here provides a solid foundation to incorporate new methods and data, and to further integrate the diverse sources of knowledge of the different components of the transcriptional regulatory network. There is no other genomic database that offers this comprehensive high-quality architecture of knowledge supporting a corpus of transcriptional regulatory interactions.
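
The proportions quoted in the abstract above can be checked from the counts it states. The short Python sketch below uses only those numbers (5446 total RIs, 1329 with added HT binding evidence, 3912 reliable RIs, 2718 of them with classical evidence); the variable and function names are illustrative and not taken from the paper.

```python
# Counts quoted in the abstract above.
total_ris = 5446           # all regulatory interactions (RIs)
ht_supported = 1329        # RIs enriched with HT TF-binding evidence
reliable = 3912            # RIs with confirmed or strong confidence
reliable_classical = 2718  # reliable RIs that have classical evidence


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


ht_share = percent(ht_supported, total_ris)              # share of RIs with HT evidence
reliable_share = percent(reliable, total_ris)            # share of reliable RIs
classical_share = percent(reliable_classical, reliable)  # classical share among reliable RIs

print(ht_share, reliable_share, classical_share)
```

Under these counts the HT-supported share comes out just under one-quarter and the classical share just under 70%, consistent with the abstract; the reliable share is slightly above the quoted 71%, which suggests that figure was truncated rather than rounded.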

2.
Nucleic Acids Res ; 52(D1): D255-D264, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37971353

ABSTRACT

RegulonDB is a database that contains the most comprehensive corpus of knowledge of the regulation of transcription initiation of Escherichia coli K-12, including data from both classical molecular biology and high-throughput methodologies. Here, we describe biological advances since our last NAR paper of 2019. We explain the changes to satisfy FAIR requirements. We also present a full reconstruction of the RegulonDB computational infrastructure, which has significantly improved data storage, retrieval and accessibility and thus supports a more intuitive and user-friendly experience. The integration of graphical tools provides clear visual representations of genetic regulation data, facilitating data interpretation and knowledge integration. RegulonDB version 12.0 can be accessed at https://regulondb.ccg.unam.mx.


Subject(s)
Databases, Genetic , Escherichia coli K12 , Gene Expression Regulation, Bacterial , Computational Biology/methods , Escherichia coli K12/genetics , Internet , Transcription, Genetic
3.
EcoSal Plus ; 11(1): eesp00022023, 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-37220074

ABSTRACT

EcoCyc is a bioinformatics database available online at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene product, metabolite, reaction, operon, and metabolic pathway. The database also includes information on the regulation of gene expression, E. coli gene essentiality, and nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for the analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc and can be executed online. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. Data generated from a whole-cell model that is parameterized from the latest data on EcoCyc are also available. This review outlines the data content of EcoCyc and of the procedures by which this content is generated.


Subject(s)
Escherichia coli K12 , Escherichia coli Proteins , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli K12/genetics , Databases, Genetic , Software , Computational Biology , Escherichia coli Proteins/metabolism
4.
bioRxiv ; 2023 Dec 11.
Article in English | MEDLINE | ID: mdl-37163020

ABSTRACT

Post-genomic implementations have expanded the experimental strategies to identify elements involved in the regulation of transcription initiation. As new methodologies emerge, a natural step is to compare their results with those from established methodologies, such as the classic methods of molecular biology used to characterize transcription factor binding sites, promoters, or transcription units. In the case of Escherichia coli K-12, the best-studied microorganism, for the last 30 years we have continuously gathered such knowledge from original scientific publications, and have organized it in two databases, RegulonDB and EcoCyc. Furthermore, since RegulonDB version 11.0 (1), we offer comprehensive datasets of binding sites from chromatin immunoprecipitation combined with sequencing (ChIP-seq), ChIP combined with exonuclease digestion and next-generation sequencing (ChIP-exo), genomic SELEX screening (gSELEX), and DNA affinity purification sequencing (DAP-seq) HT technologies, as well as additional datasets for transcription start sites, transcription units and RNA sequencing (RNA-seq) expression profiles. Here, we present for the first time an analysis of the sources of knowledge supporting the collection of transcriptional regulatory interactions (RIs) of E. coli K-12. An RI is formed by the transcription factor, its positive or negative effect on a promoter, a gene or transcription unit. We improved the evidence codes so that the specific methods are described, and we classified them into seven independent groups. This is the basis for our updated computation of confidence levels, weak, strong, or confirmed, for the collection of RIs. We compare the confidence levels of the RI collection before and after adding HT evidence illustrating how knowledge will change as more HT data and methods appear in the future. 
Users can generate subsets filtering out the method they want to benchmark and avoid circularity, or keep for instance only the confirmed interactions. The comparison of different HT methods with the available datasets indicates that ChIP-seq recovers the highest fraction (>70%) of binding sites present in RegulonDB followed by gSELEX, DAP-seq and ChIP-exo. There is no other genomic database that offers this comprehensive high-quality anatomy of evidence supporting a corpus of transcriptional regulatory interactions.
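
The idea of filtering out a method before benchmarking it, and of confidence levels that depend on independent evidence groups, can be illustrated with a small Python sketch. Everything specific here is an assumption made for illustration: the data layout, the method and group labels, the toy RIs, and in particular the confidence rule (confirmed when strong evidence comes from at least two independent groups, strong when it comes from one, weak otherwise). It is not RegulonDB's schema or its actual rule set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    method: str   # hypothetical method label, e.g. "ChIP-seq"
    group: str    # hypothetical independent evidence group
    strong: bool  # whether this evidence alone counts as strong


def confidence(evidences):
    """Assumed rule: count the independent groups contributing strong evidence."""
    strong_groups = {e.group for e in evidences if e.strong}
    if len(strong_groups) >= 2:
        return "confirmed"
    if len(strong_groups) == 1:
        return "strong"
    return "weak"


def exclude_method(ris, method):
    """Drop all evidence from `method`; RIs left with no evidence are removed."""
    filtered = {}
    for ri, evidences in ris.items():
        kept = [e for e in evidences if e.method != method]
        if kept:
            filtered[ri] = kept
    return filtered


# Toy RI collection keyed by (TF, effect, target); every value is made up.
ris = {
    ("TF_A", "+", "gene1"): [
        Evidence("footprinting", "binding-classical", True),
        Evidence("ChIP-seq", "binding-HT", True),
    ],
    ("TF_A", "-", "gene2"): [Evidence("ChIP-seq", "binding-HT", True)],
    ("TF_B", "+", "gene3"): [Evidence("sequence-prediction", "computational", False)],
}

before = {ri: confidence(ev) for ri, ev in ris.items()}
benchmark_set = exclude_method(ris, "ChIP-seq")
after = {ri: confidence(ev) for ri, ev in benchmark_set.items()}
```

In this toy case, removing ChIP-seq evidence lowers the first RI from confirmed to strong and drops the second RI entirely, so a ChIP-seq dataset benchmarked against `benchmark_set` is not compared with interactions that it alone supports.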

5.
Front Bioeng Biotechnol ; 10: 823240, 2022.
Article in English | MEDLINE | ID: mdl-35237580

ABSTRACT

In free-living bacteria, the ability to regulate gene expression is at the core of adapting and interacting with the environment. For these systems to have a logic, a signal must trigger a genetic change that helps the cell to deal with what the signal's presence in the environment implies; briefly, the response is expected to include a feedback to the signal. Thus, it makes sense to think of genetic sensory mechanisms of gene regulation. Escherichia coli K-12 is the model bacterium for which the largest number of regulatory systems and their sensing capabilities have been studied in detail at the molecular level. In this special issue focused on biomolecular sensing systems, we offer an overview of the transcriptional regulatory corpus of knowledge for E. coli that has been gathered in our database, RegulonDB, from the perspective of sensing regulatory systems. Thus, we start with the beginning of the information flux, which is the signal's chemical or physical elements detected by the cell as changes in the environment; these signals are internally transduced to transcription factors and alter their conformation. Signals transduced to effectors bind allosterically to transcription factors, and this defines the dominant sensing mechanism in E. coli. We offer an updated list of the repertoire of known allosteric effectors, as well as a list of the currently known different mechanisms of this sensing capability. Our previous definition of elementary genetic sensory-response units, GENSOR units for short, that integrate signals, transport, gene regulation, and the biochemical response of the regulated gene products of a given transcription factor fits perfectly with the purpose of this overview. We summarize the functional heterogeneity of their response, based on our updated collection of GENSORs, and we use them to identify the expected feedback as part of their response. Finally, we address the question of multiple sensing in the regulatory network of E. coli. 
This overview introduces the architecture of sensing and regulation of native components in E. coli K-12, which might be a source of inspiration for bioengineering applications.

6.
Front Microbiol ; 12: 711077, 2021.
Article in English | MEDLINE | ID: mdl-34394059

ABSTRACT

The EcoCyc model-organism database collects and summarizes experimental data for Escherichia coli K-12. EcoCyc is regularly updated by the manual curation of individual database entries, such as genes, proteins, and metabolic pathways, and by the programmatic addition of results from select high-throughput analyses. Updates to the Pathway Tools software that supports EcoCyc and to the web interface that enables user access have continuously improved its usability and expanded its functionality. This article highlights recent improvements to the curated data in the areas of metabolism, transport, DNA repair, and regulation of gene expression. New and revised data analysis and visualization tools include an interactive metabolic network explorer, a circular genome viewer, and various improvements to the speed and usability of existing tools.

7.
Nucleic Acids Res ; 47(D1): D212-D220, 2019 Jan 08.
Article in English | MEDLINE | ID: mdl-30395280

ABSTRACT

RegulonDB, first published 20 years ago, is a comprehensive electronic resource about regulation of transcription initiation of Escherichia coli K-12 with decades of knowledge from classic molecular biology experiments, and recently also from high-throughput genomic methodologies. We curated the literature to keep RegulonDB up to date, and initiated curation of ChIP and gSELEX experiments. We estimate that current knowledge describes between 10% and 30% of the expected total number of transcription factor-gene regulatory interactions in E. coli. RegulonDB provides datasets for interactions for which there is no evidence that they affect expression, as well as expression datasets. We developed a proof-of-concept pipeline to merge binding and expression evidence to identify regulatory interactions. These datasets can be visualized in the RegulonDB JBrowse. We developed the Microbial Conditions Ontology with a controlled vocabulary for the minimal properties to reproduce an experiment, which contributes to integrating data from high-throughput and classic literature. At a higher level of integration, we report Genetic Sensory-Response Units for 200 transcription factors, including their regulation at the metabolic level, and include summaries for 70 of them. Finally, we summarize our research with natural language processing strategies to enhance our biocuration work.


Subject(s)
Computational Biology/methods , Escherichia coli K12/genetics , Gene Expression Regulation, Bacterial , Genomics , Gene Ontology , Gene Regulatory Networks , Genomics/methods , High-Throughput Nucleotide Sequencing
8.
EcoSal Plus ; 8(1), 2018 Nov.
Article in English | MEDLINE | ID: mdl-30406744

ABSTRACT

EcoCyc is a bioinformatics database available at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene product, metabolite, reaction, operon, and metabolic pathway. The database also includes information on E. coli gene essentiality and on nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc and can be executed via EcoCyc.org. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. This review outlines the data content of EcoCyc and of the procedures by which this content is generated.


Subject(s)
Databases, Genetic , Escherichia coli K12/genetics , Genome, Bacterial , Software , Computational Biology , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Gene Expression Regulation, Bacterial , Internet , Metabolic Flux Analysis , Metabolic Networks and Pathways/genetics , User-Computer Interface
11.
Database (Oxford) ; 2017(1), 2017 Jan 01.
Article in English | MEDLINE | ID: mdl-28365731

ABSTRACT

Experimentally generated biological information needs to be organized and structured in order to become meaningful knowledge. However, the rate at which new information is being published makes manual curation increasingly unable to cope. Devising new curation strategies that leverage upon data mining and text analysis is, therefore, a promising avenue to help life science databases to cope with the deluge of novel information. In this article, we describe the integration of text mining technologies in the curation pipeline of the RegulonDB database, and discuss how the process can enhance the productivity of the curators. Specifically, a named entity recognition approach is used to pre-annotate terms referring to a set of domain entities which are potentially relevant for the curation process. The annotated documents are presented to the curator, who, thanks to a custom-designed interface, can select sentences containing specific types of entities, thus restricting the amount of text that needs to be inspected. Additionally, a module capable of computing semantic similarity between sentences across the entire collection of articles to be curated is being integrated in the system. We tested the module using three sets of scientific articles and six domain experts. All these improvements are gradually enabling us to obtain a high throughput curation process with the same quality as manual curation.


Subject(s)
Data Curation/methods , Data Mining/methods , Databases, Factual , Regulon/physiology , Data Curation/standards
12.
Nucleic Acids Res ; 45(D1): D543-D550, 2017 Jan 04.
Article in English | MEDLINE | ID: mdl-27899573

ABSTRACT

EcoCyc (EcoCyc.org) is a freely accessible, comprehensive database that collects and summarizes experimental data for Escherichia coli K-12, the best-studied bacterial model organism. New experimental discoveries about gene products, their function and regulation, new metabolic pathways, enzymes and cofactors are regularly added to EcoCyc. New SmartTable tools allow users to browse collections of related EcoCyc content. SmartTables can also serve as repositories for user- or curator-generated lists. EcoCyc now supports running and modifying E. coli metabolic models directly on the EcoCyc website.


Subject(s)
Computational Biology/methods , Databases, Genetic , Escherichia coli K12/genetics , Escherichia coli K12/metabolism , Energy Metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Gene Expression Regulation, Bacterial , Metabolic Networks and Pathways , Signal Transduction , Software , Transcription Factors/metabolism , Web Browser
13.
Nucleic Acids Res ; 44(D1): D133-43, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26527724

ABSTRACT

RegulonDB (http://regulondb.ccg.unam.mx) is one of the most useful and important resources on bacterial gene regulation, as it integrates the scattered scientific knowledge of the best-characterized organism, Escherichia coli K-12, in a database that organizes large amounts of data. Its electronic format enables researchers to compare their results with the legacy of previous knowledge and supports bioinformatics tools and model building. Here, we summarize our progress with RegulonDB since our last Nucleic Acids Research publication describing RegulonDB, in 2013. In addition to maintaining curation up-to-date, we report a collection of 232 interactions with small RNAs affecting 192 genes, and the complete repertoire of 189 Elementary Genetic Sensory-Response units (GENSOR units), integrating the signal, regulatory interactions, and metabolic pathways they govern. These additions represent major progress to a higher level of understanding of regulated processes. We have updated the computationally predicted transcription factors, which total 304 (184 with experimental evidence and 120 from computational predictions); we updated our position-weight matrices and have included tools for clustering them in evolutionary families. We describe our semiautomatic strategy to accelerate curation, including datasets from high-throughput experiments, a novel coexpression distance to search for 'neighborhood' genes to known operons and regulons, and computational developments.


Subject(s)
Databases, Genetic , Escherichia coli K12/genetics , Gene Expression Regulation, Bacterial , Regulon , Cluster Analysis , Escherichia coli K12/metabolism , Gene Regulatory Networks , Operon , Position-Specific Scoring Matrices , RNA, Small Untranslated/metabolism , Transcription Factors/classification
14.
EcoSal Plus ; 6(1), 2014 May.
Article in English | MEDLINE | ID: mdl-26442933

ABSTRACT

EcoCyc is a bioinformatics database available at EcoCyc.org that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the complete molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. The database includes information pages on each E. coli gene, metabolite, reaction, operon, and metabolic pathway. The database also includes information on E. coli gene essentiality and on nutrient conditions that do or do not support the growth of E. coli. The website and downloadable software contain tools for analysis of high-throughput data sets. In addition, a steady-state metabolic flux model is generated from each new version of EcoCyc. The model can predict metabolic flux rates, nutrient uptake rates, and growth rates for different gene knockouts and nutrient conditions. This review provides a detailed description of the data content of EcoCyc and of the procedures by which this content is generated.

15.
Nucleic Acids Res ; 41(Database issue): D203-13, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203884

ABSTRACT

This article summarizes our progress with RegulonDB (http://regulondb.ccg.unam.mx/) during the past 2 years. We have kept up-to-date the knowledge from the published literature regarding transcriptional regulation in Escherichia coli K-12. We have maintained and expanded our curation efforts to improve the breadth and quality of the encoded experimental knowledge, and we have implemented criteria for the quality of our computational predictions. Regulatory phrases now provide high-level descriptions of regulatory regions. We expanded the assignment of quality to various sources of evidence, particularly for knowledge generated through high-throughput (HT) technology. Based on our analysis of most relevant methods, we defined rules for determining the quality of evidence when multiple independent sources support an entry. With this latest release of RegulonDB, we present a new highly reliable larger collection of transcription start sites, a result of our experimental HT genome-wide efforts. These improvements, together with several novel enhancements (the tracks display, uploading format and curational guidelines), address the challenges of incorporating HT-generated knowledge into RegulonDB. Information on the evolutionary conservation of regulatory elements is also available now. Altogether, RegulonDB version 8.0 is a much better home for integrating knowledge on gene regulation from the sources of information currently available.


Subject(s)
Databases, Genetic , Escherichia coli K12/genetics , Gene Expression Regulation, Bacterial , Regulatory Elements, Transcriptional , Transcription, Genetic , Bacterial Proteins/metabolism , Databases, Genetic/standards , Evolution, Molecular , Genomics , Internet , Promoter Regions, Genetic , Regulon , Repressor Proteins/metabolism , Sequence Analysis, RNA , Transcription Factors/metabolism , Transcription Initiation Site
16.
Nucleic Acids Res ; 41(Database issue): D605-12, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23143106

ABSTRACT

EcoCyc (http://EcoCyc.org) is a model organism database built on the genome sequence of Escherichia coli K-12 MG1655. Expert manual curation of the functions of individual E. coli gene products in EcoCyc has been based on information found in the experimental literature for E. coli K-12-derived strains. Updates to EcoCyc content continue to improve the comprehensive picture of E. coli biology. The utility of EcoCyc is enhanced by new tools available on the EcoCyc web site, and the development of EcoCyc as a teaching tool is increasing the impact of the knowledge collected in EcoCyc.


Subject(s)
Databases, Genetic , Escherichia coli K12/genetics , Binding Sites , Escherichia coli K12/metabolism , Escherichia coli Proteins/classification , Escherichia coli Proteins/metabolism , Gene Expression Regulation, Bacterial , Internet , Membrane Transport Proteins/classification , Membrane Transport Proteins/metabolism , Models, Genetic , Molecular Sequence Annotation , Phenotype , Position-Specific Scoring Matrices , Promoter Regions, Genetic , Systems Biology , Transcription Factors/metabolism , Transcription, Genetic
17.
Methods Mol Biol ; 804: 179-95, 2012.
Article in English | MEDLINE | ID: mdl-22144154

ABSTRACT

RegulonDB contains the largest and currently best-known data set on transcriptional regulation in a single free-living organism, that of Escherichia coli K-12 (Gama-Castro et al. Nucleic Acids Res 36:D120-D124, 2008). This organized knowledge has been the gold standard for the implementation of bioinformatic predictive methods on gene regulation in bacteria (Collado-Vides et al. J Bacteriol 191:23-31, 2009). Given the complexity of different types of interactions, the difficulty of visualizing the whole network in a single figure, and the different uses of this knowledge, we are making available different views of the genetic network. This chapter describes case studies about how to access these views, via precomputed files, web services and SQL, including sigma-gene relationships corresponding to transcription of alternative RNA polymerase holoenzyme promoters, as well as transcription factor (TF)-genes, TF-operons, TF-TF, and TF-regulon interactions.


Subject(s)
Computational Biology/methods , Data Mining/methods , Databases, Genetic , Escherichia coli K12/genetics , Gene Regulatory Networks/genetics , Regulon/genetics , Internet , Operon/genetics , Transcription Factors/genetics
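
The different network views mentioned in this chapter's abstract (TF-gene, TF-TF, TF-regulon) can be thought of as projections of one edge list. The Python sketch below derives a TF-TF view and a regulon view from TF-gene edges. The edge list, the gene and TF names, and the `gene_to_tf` mapping are all made up for illustration; this is not the format of RegulonDB's precomputed files, web services, or SQL tables.

```python
from collections import defaultdict

# Toy TF -> gene edges as (tf, target_gene, effect); names are hypothetical.
tf_gene = [
    ("TfA", "tfB", "+"),
    ("TfA", "geneX", "-"),
    ("TfB", "geneY", "+"),
    ("TfB", "tfA", "-"),
]

# Hypothetical map from each gene that encodes a TF to that TF's name.
gene_to_tf = {"tfA": "TfA", "tfB": "TfB"}


def tf_tf_view(edges, gene_to_tf):
    """Keep only edges whose target gene encodes a TF, relabelled as TF -> TF."""
    return [(tf, gene_to_tf[gene], effect)
            for tf, gene, effect in edges
            if gene in gene_to_tf]


def regulon_view(edges):
    """Group target genes by regulating TF (a simple TF -> regulon mapping)."""
    regulons = defaultdict(set)
    for tf, gene, _effect in edges:
        regulons[tf].add(gene)
    return dict(regulons)


tf_tf = tf_tf_view(tf_gene, gene_to_tf)
regulons = regulon_view(tf_gene)
```

Here `tf_tf` contains only the two edges between the toy TFs, while `regulons` maps each TF to the set of all genes it regulates.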
18.
Nucleic Acids Res ; 39(Database issue): D98-105, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21051347

ABSTRACT

RegulonDB (http://regulondb.ccg.unam.mx/) is the primary reference database of the best-known regulatory network of any free-living organism, that of Escherichia coli K-12. The major conceptual change since 3 years ago is an expanded biological context so that transcriptional regulation is now part of a unit that initiates with the signal and continues with the signal transduction to the core of regulation, modifying expression of the affected target genes responsible for the response. We call these genetic sensory response units, or Gensor Units. We have initiated their high-level curation, with graphic maps and superreactions with links to other databases. Additional connectivity uses expandable submaps. RegulonDB has summaries for every transcription factor (TF) and TF-binding sites with internal symmetry. Several DNA-binding motifs and their sizes have been redefined and relocated. In addition to data from the literature, we have incorporated our own information on transcription start sites (TSSs) and transcriptional units (TUs), obtained by using high-throughput whole-genome sequencing technologies. A new portable drawing tool for genomic features is also now available, as well as new ways to download the data, including web services, files for several relational database manager systems and text files including BioPAX format.


Subject(s)
Databases, Genetic , Escherichia coli K12/genetics , Gene Expression Regulation, Bacterial , Gene Regulatory Networks , Transcription Factors/metabolism , Binding Sites , Escherichia coli K12/metabolism , Signal Transduction , Systems Integration , Transcription Initiation Site , Transcription, Genetic
19.
Nucleic Acids Res ; 39(Database issue): D583-90, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21097882

ABSTRACT

EcoCyc (http://EcoCyc.org) is a comprehensive model organism database for Escherichia coli K-12 MG1655. From the scientific literature, EcoCyc captures the functions of individual E. coli gene products; their regulation at the transcriptional, post-transcriptional and protein level; and their organization into operons, complexes and pathways. EcoCyc users can search and browse the information in multiple ways. Recent improvements to the EcoCyc Web interface include combined gene/protein pages and a Regulation Summary Diagram displaying a graphical overview of all known regulatory inputs to gene expression and protein activity. The graphical representation of signal transduction pathways has been updated, and the cellular and regulatory overviews were enhanced with new functionality. A specialized undergraduate teaching resource using EcoCyc is being developed.


Subject(s)
Databases, Genetic , Escherichia coli K12/physiology , Binding Sites , Escherichia coli K12/genetics , Escherichia coli K12/metabolism , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/metabolism , Gene Expression Regulation, Bacterial , Signal Transduction , Software , Transcription Factors/metabolism , Transcription, Genetic , User-Computer Interface