1.
Front Genet ; 13: 935351, 2022.
Article in English | MEDLINE | ID: mdl-35938008

ABSTRACT

Small proteins, encoded by small open reading frames, are only beginning to emerge with current advances in omics technology and bioinformatics. There is increasing evidence that small proteins play roles in diverse critical biological functions, such as adjusting cellular metabolism, regulating other protein activities, controlling cell cycles, and affecting disease physiology. In prokaryotes such as bacteria, the sequence space and functional groups of small proteins are largely unexplored. Most bacterial species in a natural community cannot be easily isolated or cultured, so their peptides are better characterized metagenomically. The bacterial peptides identified from metagenomic samples can not only enrich the pool of small proteins but can also reveal community-specific microbial ecology information from a small protein perspective. In this study, metaBP (Bacterial Peptides for metagenomic sample) has been developed as a comprehensive toolkit to explore the small protein universe from metagenomic samples. It takes raw sequencing reads as input, performs protein-level meta-assembly, and computes bacterial peptide homolog groups with sample-specific mutations. metaBP also integrates general protein annotation tools as well as our small protein-specific machine learning module, metaBP-ML, to construct a full landscape for bacterial peptides. metaBP-ML shows advantages for discovering functions of bacterial peptides in a microbial community and increases the yield of annotations by up to fivefold. The metaBP toolkit demonstrates its novelty in adopting protein-level assembly to discover small proteins, integrating a protein-clustering tool in the new and flexible environment of RBiotools, and presenting, for the first time, a small protein landscape through metaBP-ML. Taken together, metaBP (and metaBP-ML) can profile functional bacterial peptides from metagenomic samples with potentially diverse mutations, in order to depict a unique landscape of small proteins from a microbial community.

2.
Sci Rep ; 7: 40712, 2017 01 19.
Article in English | MEDLINE | ID: mdl-28102365

ABSTRACT

The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral "tree of life". However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. The resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.
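The k-mer features and the Shannon diversity index mentioned in this abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, the cumulative relative entropy criterion is not covered, and the way these quantities are combined to choose k is not reproduced here.

```python
from collections import Counter
from math import log


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq, skipping k-mers with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def shannon_index(counts: Counter) -> float:
    """Shannon diversity H = -sum(p * ln p) over the observed k-mer frequencies."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * log(n / total) for n in counts.values())


def shared_features(a: Counter, b: Counter) -> int:
    """Number of distinct k-mers present in both profiles."""
    return len(a.keys() & b.keys())
```

Computing these values over a range of k for a set of genomes gives the kind of curves one would inspect when picking a feature length; the actual selection rule is described in the paper, not here.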


Subjects
Genome, Viral; Genomics; Phylogeny; Viruses/classification; Viruses/genetics; Computational Biology; Databases, Genetic; Genome Size; Genomics/methods; Whole Genome Sequencing
3.
PLoS Comput Biol ; 12(2): e1004744, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26844769

ABSTRACT

MicroRNAs are important regulators of gene expression, acting primarily by binding to sequence-specific locations on already transcribed messenger RNAs (mRNA) and typically down-regulating their stability or translation. Recent studies indicate that microRNAs may also play a role in up-regulating mRNA transcription levels, although a definitive mechanism has not been established. Double-helical DNA is capable of forming triple-helical structures through Hoogsteen and reverse Hoogsteen interactions in the major groove of the duplex, and we show physical evidence (i.e., NMR, FRET, SPR) that purine or pyrimidine-rich microRNAs of appropriate length and sequence form triple-helical structures with purine-rich sequences of duplex DNA, and identify microRNA sequences that favor triplex formation. We developed an algorithm (Trident) to search genome-wide for potential triplex-forming sites and show that several mammalian and non-mammalian genomes are enriched for strong microRNA triplex binding sites. We show that those genes containing sequences favoring microRNA triplex formation are markedly enriched (3.3-fold, p < 2.2 × 10^-16) for genes whose expression is positively correlated with expression of microRNAs targeting triplex binding sequences. This work has thus revealed a new mechanism by which microRNAs could interact with gene promoter regions to modify gene transcription.
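As a rough illustration of scanning for the purine-rich duplex stretches that this abstract describes as triplex partners, the sketch below slides a fixed window over a DNA string and reports windows dominated by A and G. It is not the Trident algorithm, which is not specified in the abstract; the function name, window length, and threshold are assumptions chosen only for the example.

```python
def purine_rich_windows(dna: str, window: int = 20, min_fraction: float = 0.9):
    """Return (start, purine_fraction) for every window meeting the threshold.

    Hypothetical helper: a purine is A or G; window and min_fraction are
    illustrative defaults, not values taken from the paper.
    """
    dna = dna.upper()
    if window <= 0 or len(dna) < window:
        return []
    is_purine = [1 if base in "AG" else 0 for base in dna]
    hits = []
    count = sum(is_purine[:window])  # purines in the first window
    for start in range(len(dna) - window + 1):
        if start > 0:
            # Slide by one base: add the entering base, drop the leaving one.
            count += is_purine[start + window - 1] - is_purine[start - 1]
        fraction = count / window
        if fraction >= min_fraction:
            hits.append((start, fraction))
    return hits
```

A real search would also need to score the pairing between a specific microRNA and the candidate site; this sketch only finds candidate regions by base composition.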


Subjects
DNA/genetics; Gene Expression Regulation/genetics; MicroRNAs/genetics; Algorithms; Base Composition/genetics; Base Sequence; Binding Sites; Computational Biology; DNA/chemistry; Humans; Leukemia/genetics
4.
FEMS Microbiol Rev ; 39(5): 764-78, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26175035

ABSTRACT

The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).


Subjects
Ebolavirus/genetics; Genome, Viral/genetics; Genomics; Filoviridae/genetics
5.
Funct Integr Genomics ; 15(2): 141-61, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25722247

ABSTRACT

Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genomes is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90% of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive, as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.
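The pan- and core genome calculation mentioned in this abstract reduces to a set union and a set intersection once each genome is represented by its gene families. The sketch below assumes that the clustering of genes into families has already been done by some other tool; the function name and input layout are hypothetical.

```python
def pan_and_core(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (pan, core) gene-family sets for a collection of genomes.

    genomes maps a genome name to the set of gene-family identifiers found in it.
    The pan-genome is every family seen in at least one genome; the core genome
    is every family seen in all of them.
    """
    families = list(genomes.values())
    if not families:
        return set(), set()
    pan = set().union(*families)
    core = set.intersection(*families)
    return pan, core
```

As more genomes are added, the core can only shrink or stay the same while the pan-genome can only grow or stay the same, which is consistent with the small core and very large pan-genome reported for E. coli above.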


Subjects
Genome, Bacterial; Bacteria/classification; Bacterial Proteins/genetics; Codon; Genetic Variation; Genome Size; Genomics; Metagenomics; Molecular Sequence Annotation; Phylogeny; Sequence Analysis, DNA
6.
Gene Regul Syst Bio ; 6: 93-107, 2012.
Article in English | MEDLINE | ID: mdl-22701314

ABSTRACT

Bacterial gene regulation involves transcription factors (TF) that bind to DNA recognition sequences in operon promoters. These recognition sequences, many of which are palindromic, are known as regulatory elements or transcription factor binding sites (TFBS). Some TFs are global regulators that can modulate the expression of hundreds of genes. In this study we examine global regulator half-sites, where a half-site, which we shall call a binding motif (BM), is one half of a palindromic TFBS. We explore the hypothesis that the number of BMs plays an important role in transcriptional regulation, examining empirical data from transcriptional profiling of the CRP and ArcA regulons. We compare the power of BM counts and of full TFBS characteristics to predict induced transcriptional activity. We find that CRP BM counts have a nonlinear effect on CRP-dependent transcriptional activity and predict this activity better than full TFBS quality or location.
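Counting binding motifs (half-sites) in a promoter, as this abstract describes, can be sketched as an exact-match scan for a half-site and its reverse complement. This is only an illustration: the paper's matching criteria are not given in the abstract, the function names are hypothetical, and exact matching is an assumption (real half-sites are usually scored with some tolerance for mismatches).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_half_sites(promoter: str, half_site: str) -> int:
    """Count positions where the half-site or its reverse complement occurs.

    Overlapping occurrences are counted; matching is exact (an assumption
    made for this sketch).
    """
    forward = half_site.upper()
    if not forward:
        return 0
    promoter = promoter.upper()
    motifs = {forward, reverse_complement(forward)}
    k = len(forward)
    return sum(1 for i in range(len(promoter) - k + 1) if promoter[i:i + k] in motifs)
```

For a palindromic site, the second half is the reverse complement of the first, so a full site contributes two half-site hits, while an isolated half-site contributes one.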

7.
Bioinformatics ; 28(5): 750-1, 2012 Mar 01.
Article in English | MEDLINE | ID: mdl-22238270

ABSTRACT

The BioEnergy Science Center (BESC) is undertaking large experimental campaigns to understand the biosynthesis and biodegradation of biomass and to develop biofuel solutions. BESC is generating large volumes of diverse data, including genome sequences, omics data and assay results. The purpose of the BESC Knowledgebase is to serve as a centralized repository for experimentally generated data and to provide an integrated, interactive and user-friendly analysis framework. The Portal makes available tools for visualization, integration and analysis of data either produced by BESC or obtained from external resources. AVAILABILITY: http://besckb.ornl.gov.


Subjects
Biofuels; Knowledge Bases; Bacteria/metabolism; Eukaryota/metabolism; Genomics; Plants/metabolism
8.
Glycobiology ; 20(12): 1574-84, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20696711

ABSTRACT

The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire nonredundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit, and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.
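The second approach in this abstract links protein family domains to CAZy families through association rule learning. The sketch below shows the two standard measures behind such rules, support and confidence, for single-domain rules of the form "domain implies family". It is not the CAZymes Analysis Toolkit implementation; the function name, thresholds, and identifiers are hypothetical.

```python
from collections import Counter


def domain_family_rules(proteins, min_support=0.1, min_confidence=0.8):
    """Find rules 'domain -> family' from annotated proteins.

    proteins is a list of (domains, families) pairs, each a set of identifiers.
    support    = fraction of proteins carrying both the domain and the family
    confidence = fraction of proteins with the domain that also have the family
    Returns a sorted list of (domain, family, support, confidence).
    """
    n = len(proteins)
    if n == 0:
        return []
    domain_count = Counter()
    pair_count = Counter()
    for domains, families in proteins:
        for d in domains:
            domain_count[d] += 1
            for f in families:
                pair_count[(d, f)] += 1
    rules = []
    for (d, f), both in pair_count.items():
        support = both / n
        confidence = both / domain_count[d]
        if support >= min_support and confidence >= min_confidence:
            rules.append((d, f, support, confidence))
    return sorted(rules)
```

A rule with high confidence can then be used to propose a CAZy family for an unannotated protein that carries the domain, which is the sense in which the two annotation approaches complement each other.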


Subjects
Alteromonadaceae/enzymology; Bacterial Proteins/genetics; Carbohydrates; Clostridium thermocellum/enzymology; Databases, Protein; Enzymes/genetics; Fungal Proteins/genetics; Neurospora crassa/enzymology; Alteromonadaceae/genetics; Bacterial Proteins/chemistry; Bacterial Proteins/metabolism; Clostridium thermocellum/genetics; Enzymes/chemistry; Enzymes/classification; Fungal Proteins/chemistry; Fungal Proteins/metabolism; Genome, Bacterial/physiology; Genome, Fungal/physiology; Molecular Sequence Annotation; Neurospora crassa/genetics
9.
Database (Oxford) ; 2010: baq012, 2010 Jul 06.
Article in English | MEDLINE | ID: mdl-20627862

ABSTRACT

Shewanellae are facultative gamma-proteobacteria whose remarkable respiratory versatility has resulted in interest in their utility for bioremediation of heavy metals and radionuclides and for energy generation in microbial fuel cells. Extensive experimental efforts over the last several years and the availability of 21 sequenced Shewanella genomes made it possible to collect and integrate a wealth of information on the genus into one public resource providing new avenues for making biological discoveries and for developing a system level understanding of the cellular processes. The Shewanella knowledgebase was established in 2005 to provide a framework for integrated genome-based studies on Shewanella ecophysiology. The present version of the knowledgebase provides access to a diverse set of experimental and genomic data along with tools for curation of genome annotations and visualization and integration of genomic data with experimental data. As a demonstration of the utility of this resource, we examined a single microarray data set from Shewanella oneidensis MR-1 for new insights into regulatory processes. The integrated analysis of the data predicted a new type of bacterial transcriptional regulation involving co-transcription of the intergenic region with the downstream gene and suggested a biological role for co-transcription that likely prevents the binding of a regulator of the upstream gene to the regulator binding site located in the intergenic region. Database URL: http://shewanella-knowledgebase.org:8080/Shewanella/ or http://spruce.ornl.gov:8080/Shewanella/


Subjects
DNA, Bacterial/genetics; DNA, Intergenic/genetics; Knowledge Bases; Shewanella/genetics; Base Sequence; Databases, Genetic; Ecosystem; Gene Silencing; Genome, Bacterial; Molecular Sequence Data; Sequence Alignment; Shewanella/physiology; Transcription, Genetic
10.
Int J Health Geogr ; 8: 45, 2009 Jul 17.
Article in English | MEDLINE | ID: mdl-19615075

ABSTRACT

BACKGROUND: The Centers for Disease Control and Prevention's (CDC's) BioSense system provides near-real-time situational awareness for public health monitoring through analysis of electronic health data. Determination of anomalous spatial and temporal disease clusters is a crucial part of the daily disease monitoring task. Our study focused on finding useful anomalies at manageable alert rates according to available BioSense data history. METHODS: The study dataset included more than 3 years of daily counts of military outpatient clinic visits for respiratory and rash syndrome groupings. We applied four spatial estimation methods in implementations of space-time scan statistics cross-checked in MATLAB and C. We compared the utility of these methods according to the resultant background cluster rate (a false alarm surrogate) and sensitivity to injected cluster signals. The comparison runs used a spatial resolution based on the facility zip code in the patient record and a finer resolution based on the residence zip code. RESULTS: Simple estimation methods that account for day-of-week (DOW) data patterns yielded a clear advantage both in background cluster rate and in signal sensitivity. A 28-day baseline gave the most robust results for this estimation; the preferred baseline is long enough to remove daily fluctuations but short enough to reflect recent disease trends and data representation. Background cluster rates were lower for the rash syndrome counts than for the respiratory counts, likely because of seasonality and the large scale of the respiratory counts. CONCLUSION: The spatial estimation method should be chosen according to characteristics of the selected data streams. In this dataset with strong day-of-week effects, the overall best detection performance was achieved using subregion averages over a 28-day baseline stratified by weekday or weekend/holiday behavior. Changing the estimation method for particular scenarios involving different spatial resolution or other syndromes can yield further improvement.
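The stratified 28-day baseline described in this abstract can be sketched for a single subregion as the mean of recent daily counts taken only from days of the same type as the target day. This is a simplified illustration, not the study's code: the function names are hypothetical, holidays are not handled (the study groups them with weekends), and the scan statistic that would consume this expected value is omitted.

```python
from datetime import date, timedelta
from typing import Optional


def is_weekend(day: date) -> bool:
    """Saturday and Sunday form one stratum; holidays are ignored in this sketch."""
    return day.weekday() >= 5


def stratified_baseline(counts: dict[date, int], target: date,
                        baseline_days: int = 28) -> Optional[float]:
    """Expected count for target: mean over the preceding baseline_days,
    using only days in the same weekday/weekend stratum as target.
    Returns None when no matching day has data."""
    stratum = is_weekend(target)
    values = []
    for offset in range(1, baseline_days + 1):
        day = target - timedelta(days=offset)
        if day in counts and is_weekend(day) == stratum:
            values.append(counts[day])
    return sum(values) / len(values) if values else None
```

Comparing the observed count on the target day with this expected value, per subregion, is what lets a detector avoid flagging the routine weekend drop in clinic visits as an anomaly.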


Subjects
Biosurveillance/methods; Cluster Analysis; Databases, Factual/standards; Humans
11.
Bioinformation ; 4(4): 169-72, 2009 Oct 15.
Article in English | MEDLINE | ID: mdl-20198195

ABSTRACT

Shewanella oneidensis MR-1 is an important model organism for environmental research, as it has exceptional metabolic and respiratory versatility regulated by a complex regulatory network. We have developed a database to collect experimental and computational data relating to regulation of gene and protein expression, and a visualization environment that enables integration of these data types. The regulatory information in the database includes predictions of DNA regulator binding sites, sigma factor binding sites, transcription units, operons, promoters, and RNA regulators including non-coding RNAs, riboswitches, and different types of terminators. AVAILABILITY: http://shewanella-knowledgebase.org:8080/Shewanella/gbrowserLanding.jsp.

12.
J Nutr ; 134(5): 1032-8, 2004 May.
Article in English | MEDLINE | ID: mdl-15113941

ABSTRACT

Despite its potential importance in obesity and related disorders, little is known about regulation of lipogenesis in human adipose tissue. To investigate this area at the molecular and mechanistic levels, we studied lipogenesis and the regulation of one of its core enzymes, fatty acid synthase (FAS), in human adipose tissue in response to hormonal and nutritional manipulation. As a paradigm for lipogenic genes, we cloned the upstream region of the human FAS gene, compared its sequence to that of FAS orthologs from other species, and identified important regulatory elements that lie upstream of the FAS coding region. Lipogenesis, as assessed by glucose incorporation into lipids, was increased by insulin and more so by the combination of insulin and dexamethasone (Dex, a potent glucocorticoid analogue). In parallel, FAS expression, activity, and gene transcription rate were also significantly increased by these treatments. We also showed that linoleic acid, a representative PUFA, attenuated the actions of insulin and Dex on fatty acid and lipid synthesis as well as FAS activity and expression. Using reporter assays, we determined that the regions responsible for hormonal regulation of the FAS gene lie in the proximal portion of the gene's 5'-flanking region, within which we identified an insulin response element similar to the E-box sequence we identified previously in the rat FAS gene. In summary, we demonstrated that lipogenesis occurs in human adipose tissue and can be induced by insulin, further enhanced by glucocorticoids, and suppressed by PUFA in a hormone-dependent manner.


Subjects
Adipose Tissue/metabolism; Fatty Acid Synthases/genetics; Gene Expression Regulation; Lipids/biosynthesis; Adipose Tissue/enzymology; Adult; Base Sequence; Culture Techniques; Dexamethasone/pharmacology; Fatty Acids/biosynthesis; Female; Gene Expression; Glucocorticoids/pharmacology; Glucose/metabolism; Humans; Insulin/pharmacology; Middle Aged; Molecular Sequence Data; Promoter Regions, Genetic/physiology