Results 1 - 10 of 10
1.
Metab Eng Commun ; 17: e00225, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37435441

ABSTRACT

The goal of this study is to develop a general strategy for bacterial engineering using an integrated synthetic biology and machine learning (ML) approach. This strategy was developed in the context of increasing L-threonine production in Escherichia coli ATCC 21277. A set of 16 genes was initially selected based on metabolic pathway relevance to threonine biosynthesis and used for combinatorial cloning to construct a set of 385 strains to generate training data (i.e., a range of L-threonine titers linked to each of the specific gene combinations). Hybrid (regression/classification) deep learning (DL) models were developed and used to predict additional gene combinations in subsequent rounds of combinatorial cloning for increased L-threonine production based on the training data. As a result, E. coli strains built after just three rounds of iterative combinatorial cloning and model prediction generated higher L-threonine titers (from 2.7 g/L to 8.4 g/L) than those of patented L-threonine strains being used as controls (4-5 g/L). Interesting combinations of genes in L-threonine production included deletions of the tdh, metL, dapA, and dhaM genes as well as overexpression of the pntAB, ppc, and aspC genes. Mechanistic analysis of the metabolic system constraints for the best performing constructs offers ways to improve the models by adjusting weights for specific gene combinations. Graph theory analysis of pairwise gene modifications and corresponding levels of L-threonine production also suggests additional rules that can be incorporated into future ML models.
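The pairwise graph analysis mentioned at the end of this abstract can be illustrated with a minimal sketch. It is not the authors' method: the function name `pairwise_mean_titers`, the `del_`/`oe_` labels, and all titer values are invented here for illustration only. Each strain is treated as a set of gene modifications, every pair of modifications in a strain becomes an edge, and the edge weight is the mean titer of the strains containing that pair.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_mean_titers(strains):
    """Return {(mod_a, mod_b): mean titer} over all strains containing both mods.

    `strains` is an iterable of (set_of_modifications, titer) tuples.
    Pairs are stored with their two labels in sorted order.
    """
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [sum of titers, count]
    for mods, titer in strains:
        for pair in combinations(sorted(mods), 2):
            totals[pair][0] += titer
            totals[pair][1] += 1
    return {pair: total / count for pair, (total, count) in totals.items()}


# Toy data: the labels and titers (g/L) are hypothetical, not values from the study.
toy_strains = [
    ({"del_tdh", "del_metL", "oe_ppc"}, 6.0),
    ({"del_tdh", "oe_ppc"}, 5.0),
    ({"del_metL", "oe_aspC"}, 3.0),
]

edges = pairwise_mean_titers(toy_strains)

# Rank edges from highest to lowest mean titer.
for pair, mean_titer in sorted(edges.items(), key=lambda kv: -kv[1]):
    print(pair, round(mean_titer, 2))
```

Ranking edges by weight is one simple way to surface pairs of modifications that tend to co-occur in high-titer strains; a real analysis would also need to account for how many strains support each edge.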

2.
BMC Bioinformatics ; 22(1): 252, 2021 May 17.
Article in English | MEDLINE | ID: mdl-34001007

ABSTRACT

BACKGROUND: Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data. METHODS: We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these models. RESULTS: The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, thus suggesting that the actual shape of these curves depends on the unique pair of an ML model and a dataset. The multi-input NN (mNN), in which gene expressions of cancer cells and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, providing a broader perspective on the overall data scaling characteristics. 
CONCLUSIONS: A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments in prospective research studies.
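As a rough illustration of fitting a power law to a learning curve, the sketch below fits the simplified two-parameter form error ≈ a · m^b (m = training set size) by ordinary least squares in log-log space. The exact functional form used in the paper is not given in the abstract, so this form, the function names, and the synthetic data points are all assumptions made for the example.

```python
import math


def fit_power_law(sizes, errors):
    """Fit errors ≈ a * sizes**b via linear least squares on log-transformed data."""
    xs = [math.log(m) for m in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log-log space = log(a)
    return a, b


def predict_error(a, b, size):
    """Extrapolate the fitted curve to a new training set size."""
    return a * size ** b


# Synthetic learning curve generated from a known power law (not data from the study).
sizes = [1000, 2000, 4000, 8000, 16000]
errors = [2.0 * m ** -0.3 for m in sizes]

a, b = fit_power_law(sizes, errors)
projected = predict_error(a, b, 32000)
print(f"a={a:.3f}, b={b:.3f}, projected error at 32000 samples={projected:.4f}")
```

A negative fitted exponent b indicates that error is still decreasing with more data, which is the kind of forward-looking signal the abstract describes. Real learning curves often flatten toward an irreducible error, in which case a form with an additive offset would be a better fit than this simplified one.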


Subjects
Neoplasms; Pharmaceutical Preparations; Cell Line; Learning Curve; Machine Learning; Neoplasms/drug therapy; Neoplasms/genetics; Prospective Studies
3.
Curr Opin Biotechnol ; 17(5): 448-56, 2006 Oct.
Article in English | MEDLINE | ID: mdl-16978855

ABSTRACT

Within the past five years genome-scale gene essentiality data sets have been published for ten diverse bacterial species. These data are a rich source of information about cellular networks that we are only beginning to explore. The analysis of these data, very heterogeneous in nature, is a challenging task. Even the definition of 'essential genes' in various genome-scale studies varies from genes 'absolutely required for survival' to those 'strongly contributing to fitness' and robust competitive growth. A comparative analysis of gene essentiality across multiple organisms based on projection of experimentally observed essential genes to functional roles in a collection of metabolic pathways and subsystems is emerging as a powerful tool of systems biology.


Subjects
Genes, Essential/genetics; Metabolic Networks and Pathways/genetics; Systems Biology/methods; Computational Biology/methods; Databases, Genetic; Genome, Bacterial/genetics; Models, Biological
4.
Nucleic Acids Res ; 33(17): 5691-702, 2005.
Article in English | MEDLINE | ID: mdl-16214803

ABSTRACT

The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180 177 distinct proteins with 2133 distinct functional roles. This data comes from 173 subsystems and 383 different organisms.


Subjects
Genome, Archaeal; Genome, Bacterial; Genomics/methods; Software; Acyl Coenzyme A/metabolism; Coenzyme A/biosynthesis; Computational Biology; Internet; Leucine/metabolism; Ribosomal Proteins/classification; Terminology as Topic; Vocabulary, Controlled
5.
FEMS Microbiol Lett ; 250(2): 175-84, 2005 Sep 15.
Article in English | MEDLINE | ID: mdl-16099605

ABSTRACT

Genome features of the Bacillus cereus group genomes (representative strains of Bacillus cereus, Bacillus anthracis and Bacillus thuringiensis subsp. israelensis) were analyzed and compared with the Bacillus subtilis genome. A core set of 1381 protein families among the four Bacillus genomes, with an additional set of 933 families common to the B. cereus group, was identified. Differences in signal transduction pathways, membrane transporters, cell surface structures, cell wall, and S-layer proteins suggesting differences in their phenotype were identified. The B. cereus group has signal transduction systems including a tyrosine kinase related to two-component system histidine kinases from B. subtilis. A model for regulation of the stress responsive sigma factor sigmaB in the B. cereus group different from the well studied regulation in B. subtilis has been proposed. Despite a high degree of chromosomal synteny among these genomes, significant differences in cell wall and spore coat proteins that contribute to the survival and adaptation in specific hosts have been identified.


Subjects
Bacillus anthracis/genetics; Bacillus cereus/genetics; Bacillus subtilis/genetics; Bacillus thuringiensis/genetics; Genome, Bacterial; Bacterial Proteins/genetics; Cell Wall/genetics; Genomics; Membrane Glycoproteins/genetics; Membrane Proteins/genetics; Membrane Transport Proteins/genetics; Signal Transduction/genetics; Synteny
6.
Nat Biotechnol ; 22(12): 1554-8, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15543133

ABSTRACT

The lactic acid bacterium Streptococcus thermophilus is widely used for the manufacture of yogurt and cheese. This dairy species of major economic importance is phylogenetically close to pathogenic streptococci, raising the possibility that it has a potential for virulence. Here we report the genome sequences of two yogurt strains of S. thermophilus. We found a striking level of gene decay (10% pseudogenes) in both microorganisms. Many genes involved in carbon utilization are nonfunctional, in line with the paucity of carbon sources in milk. Notably, most streptococcal virulence-related genes that are not involved in basic cellular processes are either inactivated or absent in the dairy streptococcus. Adaptation to the constant milk environment appears to have resulted in the stabilization of the genome structure. We conclude that S. thermophilus has evolved mainly through loss-of-function events that remarkably mirror the environment of the dairy niche resulting in a severely diminished pathogenic potential.


Subjects
Bacterial Proteins/genetics; Chromosome Mapping/methods; Evolution, Molecular; Genomic Instability/genetics; Streptococcal Infections/genetics; Streptococcus thermophilus/genetics; Virulence Factors/genetics; Yogurt/microbiology; Base Sequence; Conserved Sequence; Genome, Bacterial; Molecular Sequence Data; Sequence Analysis, DNA; Sequence Homology, Nucleic Acid; Species Specificity; Streptococcus thermophilus/classification; Streptococcus thermophilus/pathogenicity
7.
Nature ; 423(6935): 87-91, 2003 May 01.
Article in English | MEDLINE | ID: mdl-12721630

ABSTRACT

Bacillus cereus is an opportunistic pathogen causing food poisoning manifested by diarrhoeal or emetic syndromes. It is closely related to the animal and human pathogen Bacillus anthracis and the insect pathogen Bacillus thuringiensis, the former being used as a biological weapon and the latter as a pesticide. B. anthracis and B. thuringiensis are readily distinguished from B. cereus by the presence of plasmid-borne specific toxins (B. anthracis and B. thuringiensis) and capsule (B. anthracis). But phylogenetic studies based on the analysis of chromosomal genes bring controversial results, and it is unclear whether B. cereus, B. anthracis and B. thuringiensis are varieties of the same species or different species. Here we report the sequencing and analysis of the type strain B. cereus ATCC 14579. The complete genome sequence of B. cereus ATCC 14579 together with the gapped genome of B. anthracis A2012 enables us to perform comparative analysis, and hence to identify the genes that are conserved between B. cereus and B. anthracis, and the genes that are unique for each species. We use the former to clarify the phylogeny of the cereus group, and the latter to determine plasmid-independent species-specific markers.


Subjects
Bacillus anthracis/genetics; Bacillus cereus/genetics; Genome, Bacterial; Base Sequence; Conserved Sequence; Genes, Bacterial/genetics; Molecular Sequence Data; Phylogeny; Plasmids/genetics; Sequence Analysis, DNA; Species Specificity
8.
Nucleic Acids Res ; 31(1): 164-71, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12519973

ABSTRACT

The ERGO (http://ergo.integratedgenomics.com/ERGO/) genome analysis and discovery suite is an integration of biological data from genomics, biochemistry, high-throughput expression profiling, genetics and peer-reviewed journals to achieve a comprehensive analysis of genes and genomes. Far beyond any conventional systems that facilitate functional assignments, ERGO combines pattern-based analysis with comparative genomics by visualizing genes within the context of regulation, expression profiling, phylogenetic clusters, fusion events, networked cellular pathways and chromosomal neighborhoods of other functionally related genes. The result of this multifaceted approach is to provide an extensively curated database of the largest available integration of genomes, with a vast collection of reconstructed cellular pathways spanning all domains of life. Although access to ERGO is provided only under subscription, it is already widely used by the academic community. The current version of the system integrates 500 genomes from all domains of life in various levels of completion, 403 of which are available for subscription.


Subjects
Databases, Genetic; Genome; Genomics; Animals; Computational Biology; Gene Expression Profiling; Metabolism; Proteins/physiology
9.
J Bacteriol ; 184(16): 4555-72, 2002 Aug.
Article in English | MEDLINE | ID: mdl-12142426

ABSTRACT

Novel drug targets are required in order to design new defenses against antibiotic-resistant pathogens. Comparative genomics provides new opportunities for finding optimal targets among previously unexplored cellular functions, based on an understanding of related biological processes in bacterial pathogens and their hosts. We describe an integrated approach to identification and prioritization of broad-spectrum drug targets. Our strategy is based on genetic footprinting in Escherichia coli followed by metabolic context analysis of essential gene orthologs in various species. Genes required for viability of E. coli in rich medium were identified on a whole-genome scale using the genetic footprinting technique. Potential target pathways were deduced from these data and compared with a panel of representative bacterial pathogens by using metabolic reconstructions from genomic data. Conserved and indispensable functions revealed by this analysis potentially represent broad-spectrum antibacterial targets. Further target prioritization involves comparison of the corresponding pathways and individual functions between pathogens and the human host. The most promising targets are validated by direct knockouts in model pathogens. The efficacy of this approach is illustrated using examples from metabolism of adenylate cofactors NAD(P), coenzyme A, and flavin adenine dinucleotide. Several drug targets within these pathways, including three distantly related adenylyltransferases (orthologs of the E. coli genes nadD, coaD, and ribF), are discussed in detail.
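The prioritization logic outlined in this abstract (essential in E. coli, conserved across pathogens, then compared against the human host) can be sketched with plain set operations. This is a simplified illustration, not the authors' pipeline: the function name, the placeholder gene labels `geneA` to `geneD`, and the ortholog assignments are hypothetical, and the host comparison is reduced here to a simple "no host ortholog" filter.

```python
def prioritize_targets(essential, pathogen_orthologs, host_orthologs):
    """Return essential genes with orthologs in every pathogen and none in the host.

    essential          -- set of genes essential in the model organism
    pathogen_orthologs -- dict: pathogen name -> set of model-organism genes
                          that have an ortholog in that pathogen
    host_orthologs     -- set of model-organism genes with a host ortholog
    """
    conserved = set(essential)
    for genes in pathogen_orthologs.values():
        conserved &= set(genes)          # keep only genes present in every pathogen
    return sorted(conserved - set(host_orthologs))


# Placeholder inputs; none of these assignments come from the study.
essential = {"geneA", "geneB", "geneC", "geneD"}
pathogen_orthologs = {
    "pathogen1": {"geneA", "geneB", "geneC"},
    "pathogen2": {"geneA", "geneC", "geneD"},
}
host_orthologs = {"geneC"}

targets = prioritize_targets(essential, pathogen_orthologs, host_orthologs)
print(targets)
```

In practice the host comparison is more graded than a presence/absence filter, since a target with a distant host ortholog may still be selectively druggable; the set difference is only the coarsest version of that step.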


Subjects
Coenzyme A/biosynthesis; Escherichia coli/metabolism; Flavin-Adenine Dinucleotide/biosynthesis; NADP/biosynthesis; Anti-Bacterial Agents; DNA Footprinting; DNA Transposable Elements; Drug Design; Drug Resistance, Bacterial; Escherichia coli/drug effects; Escherichia coli/genetics; Flavin Mononucleotide/biosynthesis; Genome, Bacterial; Mutagenesis, Insertional; Nicotinamide-Nucleotide Adenylyltransferase/metabolism; Phosphotransferases (Alcohol Group Acceptor)/genetics; Substrate Specificity
10.
J Bacteriol ; 184(7): 2005-18, 2002 Apr.
Article in English | MEDLINE | ID: mdl-11889109

ABSTRACT

We present a complete DNA sequence and metabolic analysis of the dominant oral bacterium Fusobacterium nucleatum. Although not considered a major dental pathogen on its own, this anaerobe facilitates the aggregation and establishment of several other species including the dental pathogens Porphyromonas gingivalis and Bacteroides forsythus. The F. nucleatum strain ATCC 25586 genome was assembled from shotgun sequences and analyzed using the ERGO bioinformatics suite (http://www.integratedgenomics.com). The genome contains 2.17 Mb encoding 2,067 open reading frames, organized on a single circular chromosome with 27% GC content. Despite its taxonomic position among the gram-negative bacteria, several features of its core metabolism are similar to that of gram-positive Clostridium spp., Enterococcus spp., and Lactococcus spp. The genome analysis has revealed several key aspects of the pathways of organic acid, amino acid, carbohydrate, and lipid metabolism. Nine very-high-molecular-weight outer membrane proteins are predicted from the sequence, none of which has been reported in the literature. More than 137 transporters for the uptake of a variety of substrates such as peptides, sugars, metal ions, and cofactors have been identified. Biosynthetic pathways exist for only three amino acids: glutamate, aspartate, and asparagine. The remaining amino acids are imported as such or as di- or oligopeptides that are subsequently degraded in the cytoplasm. A principal source of energy appears to be the fermentation of glutamate to butyrate. Additionally, desulfuration of cysteine and methionine yields ammonia, H(2)S, methyl mercaptan, and butyrate, which are capable of arresting fibroblast growth, thus preventing wound healing and aiding penetration of the gingival epithelium. The metabolic capabilities of F. nucleatum revealed by its genome are therefore consistent with its specialized niche in the mouth.


Subjects
Fusobacterium nucleatum/genetics; Genome, Bacterial; Protein Biosynthesis; Transcription, Genetic; Amino Acids/metabolism; Bacterial Outer Membrane Proteins/metabolism; Biological Transport; Cell Division; Coenzymes/metabolism; DNA Repair; DNA Replication; DNA Transposable Elements; DNA, Bacterial/analysis; Drug Resistance, Bacterial; Fusobacterium nucleatum/metabolism; Lipid Metabolism; Lipopolysaccharides/metabolism; Mutagenesis, Insertional; Nucleotides/metabolism; Protons; Signal Transduction/physiology; Virulence