Results 1 - 10 of 10
1.
bioRxiv ; 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38617363

ABSTRACT

Transcripts are potential therapeutic targets, yet bacterial transcripts remain biological dark matter with uncharacterized biodiversity. We developed and applied an algorithm to predict transcripts for Escherichia coli K12 and E2348/69 strains (Bacteria:gamma-Proteobacteria) with newly generated ONT direct RNA sequencing data while predicting transcripts for Listeria monocytogenes strains Scott A and RO15 (Bacteria:Firmicutes), Pseudomonas aeruginosa strains SG17M and NN2 (Bacteria:gamma-Proteobacteria), and Haloferax volcanii (Archaea:Halobacteria) using publicly available data. From >5 million E. coli K12 ONT direct RNA sequencing reads, 2,484 mRNAs are predicted and contain more than half of the predicted E. coli proteins. While the number of predicted transcripts varied by strain based on the amount of sequence data used for the predictions, across all strains examined, the average size of the predicted mRNAs is 1.6-1.7 kbp while the median sizes of the predicted bacterial 5'- and 3'-UTRs are 30-90 bp. Given the lack of bacterial and archaeal transcript annotation, most predictions are of novel transcripts, but we also predicted many previously characterized mRNAs and ncRNAs, including post-transcriptionally generated transcripts and small RNAs associated with pathogenesis in the E. coli E2348/69 LEE pathogenicity islands. We predicted small transcripts in the 100-200 bp range as well as >10 kbp transcripts for all strains; the longest transcript was the nuo operon transcript for two of the seven strains and a phage/prophage transcript for another two. This quick, easy, inexpensive, and reproducible method will facilitate the presentation of operons, transcripts, and UTR predictions alongside CDS and protein predictions in bacterial genome annotation as important resources for the research community.

2.
bioRxiv ; 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38464021

ABSTRACT

The rising quality and amount of multi-omic data across biomedical science demand that we build innovative solutions to harness their collective discovery potential. From publicly available repositories, we have assembled and curated a compendium of gene-level transcriptomic data focused on mammalian excitatory neurogenesis in the neocortex. This collection is open for exploration by both computational and cell biologists at nemoanalytics.org, and this report forms a demonstration of its utility. Applying our novel structured joint decomposition approach to mouse, macaque and human data from the collection, we define transcriptome dynamics that are conserved across mammalian excitatory neurogenesis and which map onto the genetics of human brain structure and disease. Leveraging additional data within NeMO Analytics via projection methods, we chart the dynamics of these fundamental molecular elements of neurogenesis across developmental time and space and into postnatal life. Reversing the direction of our investigation, we use transcriptomic data from laminar-specific dissection of adult human neocortex to define molecular signatures specific to excitatory neuronal cell types resident in individual layers of the mature neocortex, and trace their emergence across development. We show that while many lineage-defining transcription factors are most highly expressed at early fetal ages, the laminar neuronal identities which they drive take years to decades to reach full maturity. Finally, we interrogated data from stem-cell derived cerebral organoid systems, demonstrating that many fundamental elements of in vivo development are recapitulated with high fidelity in vitro, while specific transcriptomic programs in neuronal maturation are absent. We propose these analyses as specific applications of the general approach of combining joint decomposition with large curated collections of analysis-ready multi-omics data matrices focused on particular cell and disease contexts. Importantly, these open environments are accessible to, and must be fueled with emerging data by, cell biologists with and without coding expertise.

3.
Front Immunol ; 14: 1179314, 2023.
Article in English | MEDLINE | ID: mdl-37465667

ABSTRACT

Introduction: Host gene and protein expression impact susceptibility to clinical malaria, but the balance of immune cell populations, cytokines and genes that contributes to protection remains incompletely understood. Little is known about the determinants of host susceptibility to clinical malaria at a time when acquired immunity is developing. Methods: We analyzed peripheral blood mononuclear cells (PBMCs) collected from children who differed in susceptibility to clinical malaria, all from a small town in Mali. PBMCs were collected from children aged 4-6 years at the start, peak and end of the malaria season. We characterized the immune cell composition and cytokine secretion for a subset of 20 children per timepoint (10 children with no symptomatic malaria age-matched to 10 children with >2 symptomatic malarial illnesses), and gene expression patterns for six children (three per cohort) per timepoint. Results: We observed differences between the two groups of children in the expression of genes related to cell death and inflammation; in particular, inflammatory genes such as CXCL10 and STAT1 and apoptotic genes such as XAF1 were upregulated in susceptible children before the transmission season began. We also noted a higher frequency of HLA-DR+ CD4 T cells in protected children during the peak of the malaria season and comparable levels of cytokine secretion after stimulation with malaria schizonts across all three time points. Conclusion: This study highlights the importance of baseline immune signatures in determining disease outcome. Our data suggest that differences in apoptotic and inflammatory gene expression patterns can serve as predictive markers of susceptibility to clinical malaria.


Subjects
Falciparum Malaria , Malaria , Child , Humans , Mononuclear Leukocytes , Malaria/genetics , Cytokines , Adaptive Immunity
4.
Nucleic Acids Res ; 51(D1): D1075-D1085, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36318260

ABSTRACT

Scalable technologies to sequence the transcriptomes and epigenomes of single cells are transforming our understanding of cell types and cell states. The Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative Cell Census Network (BICCN) is applying these technologies at unprecedented scale to map the cell types in the mammalian brain. In an effort to increase data FAIRness (Findable, Accessible, Interoperable, Reusable), the NIH has established repositories to make data generated by the BICCN and related BRAIN Initiative projects accessible to the broader research community. Here, we describe the Neuroscience Multi-Omic Archive (NeMO Archive; nemoarchive.org), which serves as the primary repository for genomics data from the BRAIN Initiative. Working closely with other BRAIN Initiative researchers, we have organized these data into a continually expanding, curated repository, which contains transcriptomic and epigenomic data from over 50 million brain cells, including single-cell genomic data from all of the major regions of the adult and prenatal human and mouse brains, as well as substantial single-cell genomic data from non-human primates. We make available several tools for accessing these data, including a searchable web portal, a cloud-computing interface for large-scale data processing (implemented on Terra, terra.bio), and a visualization and analysis platform, NeMO Analytics (nemoanalytics.org).


Subjects
Brain , Genetic Databases , Epigenomics , Multiomics , Transcriptome , Animals , Mice , Genomics , Mammals , Primates , Brain/cytology , Brain/metabolism
5.
Nature ; 598(7879): 103-110, 2021 10.
Article in English | MEDLINE | ID: mdl-34616066

ABSTRACT

Single-cell transcriptomics can provide quantitative molecular signatures for large, unbiased samples of the diverse cell types in the brain [1-3]. With the proliferation of multi-omics datasets, a major challenge is to validate and integrate results into a biological understanding of cell-type organization. Here we generated transcriptomes and epigenomes from more than 500,000 individual cells in the mouse primary motor cortex, a structure that has an evolutionarily conserved role in locomotion. We developed computational and statistical methods to integrate multimodal data and quantitatively validate cell-type reproducibility. The resulting reference atlas, containing over 56 neuronal cell types that are highly replicable across analysis methods, sequencing technologies and modalities, is a comprehensive molecular and genomic account of the diverse neuronal and non-neuronal cell types in the mouse primary motor cortex. The atlas includes a population of excitatory neurons that resemble pyramidal cells in layer 4 in other cortical regions [4]. We further discovered thousands of concordant marker genes and gene regulatory elements for these cell types. Our results highlight the complex molecular regulation of cell types in the brain and will directly enable the design of reagents to target specific cell types in the mouse primary motor cortex for functional analysis.


Subjects
Epigenomics , Gene Expression Profiling , Motor Cortex/cytology , Neurons/classification , Single-Cell Analysis , Transcriptome , Animals , Atlases as Topic , Datasets as Topic , Genetic Epigenesis , Female , Male , Mice , Motor Cortex/anatomy & histology , Neurons/cytology , Neurons/metabolism , Organ Specificity , Reproducibility of Results
7.
mSystems ; 6(1)2021 Jan 12.
Article in English | MEDLINE | ID: mdl-33436511

ABSTRACT

Quantification tools for RNA sequencing (RNA-Seq) analyses are often designed and tested using human transcriptomics data sets, in which full-length transcript sequences are well annotated. For prokaryotic transcriptomics experiments, full-length transcript sequences are seldom known, and coding sequences must instead be used for quantification steps in RNA-Seq analyses. However, operons confound accurate quantification of coding sequences since a single transcript does not necessarily equate to a single gene. Here, we introduce FADU (Feature Aggregate Depth Utility), a quantification tool designed specifically for prokaryotic RNA-Seq analyses. FADU assigns partial count values proportional to the length of the fragment overlapping the target feature. To assess the ability of FADU to quantify genes in prokaryotic transcriptomics analyses, we compared its performance to those of eXpress, featureCounts, HTSeq, kallisto, and Salmon across three paired-end read data sets of (i) Ehrlichia chaffeensis, (ii) Escherichia coli, and (iii) the Wolbachia endosymbiont wBm. Across each of the three data sets, we find that FADU can more accurately quantify operonic genes by deriving proportional counts for multigene fragments within operons. FADU is available at https://github.com/IGS/FADU. IMPORTANCE: Most currently available quantification tools for transcriptomics analyses have been designed for human data sets, in which full-length transcript sequences, including the untranslated regions, are well annotated. In most prokaryotic systems, full-length transcript sequences have yet to be characterized, leading to prokaryotic transcriptomics analyses being performed based on only the coding sequences. In contrast to eukaryotes, prokaryotes contain polycistronic transcripts, and when genes are quantified based on coding sequences instead of transcript sequences, this leads to an increased abundance of improperly assigned ambiguous multigene fragments, specifically those mapping to multiple genes in operons. Here, we describe FADU, a quantification tool for prokaryotic RNA-Seq analyses designed to assign proportional counts with the purpose of better quantifying operonic genes while minimizing the pitfalls associated with improperly assigning fragment counts from ambiguous transcripts.
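
The proportional counting idea described in this abstract can be illustrated with a minimal Python sketch. This is not FADU's code: the `Interval` type, the `proportional_counts` function, and the choice to divide each overlap by the fragment length are assumptions made here for illustration, based only on the sentence stating that partial counts are proportional to the length of the fragment overlapping the feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open genomic interval [start, end) on a single strand (hypothetical type)."""
    start: int
    end: int


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def proportional_counts(fragments, features):
    """Give each feature the fraction of every fragment that overlaps it.

    Assumption: a fragment contributes overlap / fragment_length to a feature,
    so a fragment spanning two genes in an operon is split between them
    instead of being discarded as ambiguous.
    """
    counts = {name: 0.0 for name in features}
    for frag in fragments:
        frag_len = frag.end - frag.start
        if frag_len <= 0:
            continue  # skip malformed fragments
        for name, feat in features.items():
            counts[name] += overlap_length(frag, feat) / frag_len
    return counts


# Two adjacent genes in a hypothetical operon.
features = {"geneA": Interval(0, 100), "geneB": Interval(100, 200)}
# One fragment straddles the gene boundary, one lies entirely inside geneA.
fragments = [Interval(50, 150), Interval(0, 80)]
print(proportional_counts(fragments, features))
```

In this toy case the straddling fragment contributes half a count to each gene, where a tool that rejects ambiguous fragments would contribute nothing to either.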

8.
Microb Genom ; 3(9): e000122, 2017 09.
Article in English | MEDLINE | ID: mdl-29114401

ABSTRACT

As sequencing technologies have evolved, the tools to analyze these sequences have made similar advances. However, for multi-species samples, we observed important and adverse differences in alignment specificity and computation time for bwa-mem (Burrows-Wheeler aligner-maximum exact matches) relative to bwa-aln. Therefore, we sought to optimize bwa-mem for alignment of data from multi-species samples in order to reduce alignment time and increase the specificity of alignments. In the multi-species cases examined, there was one majority member (i.e. Plasmodium falciparum or Brugia malayi) and one minority member (i.e. human or the Wolbachia endosymbiont wBm) of the sequence data. Increasing bwa-mem seed length from the default value reduced the number of read pairs from the majority sequence member that incorrectly aligned to the reference genome of the minority sequence member. Combining both source genomes into a single reference genome increased the specificity of mapping, while also reducing the central processing unit (CPU) time. In Plasmodium, at a seed length of 18 nt, 24.1 % of reads mapped to the human genome using 1.7±0.1 CPU hours, while 83.6 % of reads mapped to the Plasmodium genome using 0.2±0.0 CPU hours (total: 107.7 % reads mapping; in 1.9±0.1 CPU hours). In contrast, 97.1 % of the reads mapped to a combined Plasmodium-human reference in only 0.7±0.0 CPU hours. Overall, the results suggest that combining all references into a single reference database and using a 23 nt seed length reduces the computational time, while maximizing specificity. Similar results were found for simulated sequence reads from a mock metagenomic data set. We found similar improvements to computation time in a publicly available human-only data set.
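
The recommendation above (one combined reference, 23 nt seed length) might be set up as in the following Python sketch. The helper names and file names are hypothetical and not taken from the paper; the only tool options used are `bwa index`, and `bwa mem` with `-k` (minimum seed length, default 19) and `-t` (threads). The sketch only builds the command lines and does not run bwa.

```python
from pathlib import Path


def combine_references(fasta_paths, combined_path):
    """Concatenate several FASTA files into one combined reference file."""
    combined_path = Path(combined_path)
    with open(combined_path, "wb") as out:
        for path in fasta_paths:
            data = Path(path).read_bytes()
            out.write(data)
            # Make sure the next file's header starts on its own line.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return combined_path


def bwa_index_command(reference):
    """Command line that builds the bwa index for the combined reference."""
    return ["bwa", "index", str(reference)]


def bwa_mem_command(reference, reads_1, reads_2, seed_length=23, threads=4):
    """Command line for paired-end bwa mem with a longer minimum seed length.

    seed_length=23 follows the abstract's suggestion; threads=4 is an
    arbitrary illustrative value.
    """
    return [
        "bwa", "mem",
        "-k", str(seed_length),
        "-t", str(threads),
        str(reference), str(reads_1), str(reads_2),
    ]


# Hypothetical file names; nothing is executed here.
print(" ".join(bwa_index_command("combined.fa")))
print(" ".join(bwa_mem_command("combined.fa", "reads_1.fq", "reads_2.fq")))
```

Each list could be passed to `subprocess.run` once bwa is installed; keeping command construction separate from execution makes the parameters easy to inspect before committing CPU time.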


Subjects
Sequence Alignment/methods , DNA Sequence Analysis , Software , Animals , Brugia malayi/genetics , Chromosome Mapping , Data Accuracy , Genetic Databases , Datasets as Topic , Human Genome , High-Throughput Nucleotide Sequencing , Humans , Metagenomics , Plasmodium falciparum/genetics , Time Factors , Wolbachia/genetics
9.
BMC Genomics ; 18(1): 332, 2017 04 27.
Article in English | MEDLINE | ID: mdl-28449639

ABSTRACT

BACKGROUND: The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. RESULTS: CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data upload and download, pipeline configuration and monitoring, and access to Sybil are managed through the CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2. CONCLUSIONS: CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.


Subjects
Cloud Computing , Genomics/methods , Software , Automation , Microbial Genome/genetics , Sequence Alignment , Sequence Analysis
10.
Int J Comput Biol Drug Des ; 7(2-3): 130-45, 2014.
Article in English | MEDLINE | ID: mdl-24878725

ABSTRACT

Bayesian-like operational taxonomic unit examiner (BOTUX) is a new tool for the classification of 16S rRNA gene sequences into operational taxonomic units (OTUs) that addresses the problem of overestimation caused by errors introduced during PCR amplification and DNA sequencing steps. BOTUX utilises a grammar-based assignment strategy, where Bayesian models are built from each word of a given length (e.g., 8-mers). De novo analysis is possible with BOTUX as it does not require a training set, and it updates probabilistic models as new sequences are recruited to an OTU. In benchmarking tests performed with real and simulated datasets of 16S rDNA sequences, BOTUX accurately identifies OTUs with comparable or better clustering efficiency and lower execution times than other OTU algorithms tested. BOTUX is the only OTU classifier that allows incremental analysis of large datasets, and it is also adept at clustering both 454 and Illumina datasets in a reasonable timeframe.
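
As a rough illustration of word-based, incrementally updated OTU models, the Python sketch below keeps a k-mer count table per OTU, scores each new sequence against the existing tables, and updates the best-matching table when the score clears a threshold. It is not BOTUX's algorithm: the `OtuModel` class, the add-one smoothing, the mean log-probability score, and the `threshold` parameter are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter


def kmers(seq, k):
    """All overlapping words of length k in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class OtuModel:
    """Word-count model for one OTU, updated as sequences are recruited."""

    def __init__(self, k):
        self.k = k
        self.counts = Counter()
        self.total = 0

    def add(self, seq):
        words = kmers(seq, self.k)
        self.counts.update(words)
        self.total += len(words)

    def mean_log_prob(self, seq):
        """Average log-probability of the sequence's words (add-one smoothing)."""
        words = kmers(seq, self.k)
        if not words:
            return float("-inf")
        vocab = 4 ** self.k  # number of possible DNA words of length k
        return sum(
            math.log((self.counts[w] + 1) / (self.total + vocab)) for w in words
        ) / len(words)


def cluster(seqs, k, threshold):
    """Assign each sequence to the best-scoring OTU, or open a new OTU."""
    otus, labels = [], []
    for seq in seqs:
        best, best_score = None, float("-inf")
        for idx, model in enumerate(otus):
            score = model.mean_log_prob(seq)
            if score > best_score:
                best, best_score = idx, score
        if best is None or best_score < threshold:
            otus.append(OtuModel(k))
            best = len(otus) - 1
        otus[best].add(seq)  # the model grows with every recruited sequence
        labels.append(best)
    return labels


# Toy input: the second read differs from the first by one base,
# the third is unrelated. k=4 keeps the example small (the abstract cites 8-mers).
reads = ["ACGTACGTACGTACGT", "ACGTACGTACGTACGA", "GGGGCCCCGGGGCCCC"]
print(cluster(reads, k=4, threshold=-5.0))
```

Because each model is updated in place, sequences can be fed in batches without reprocessing earlier ones, which is the sense in which such a scheme supports incremental analysis.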


Subjects
16S Ribosomal RNA/genetics , DNA Sequence Analysis , Algorithms , Bayes Theorem