1.
Microb Genom ; 10(5)2024 May.
Article in English | MEDLINE | ID: mdl-38785221

ABSTRACT

Wastewater-based surveillance (WBS) is an important epidemiological and public health tool for tracking pathogens across the scale of a building, neighbourhood, city, or region. WBS gained widespread adoption globally during the SARS-CoV-2 pandemic for estimating community infection levels by qPCR. Sequencing pathogen genes or genomes from wastewater adds information about pathogen genetic diversity, which can be used to identify viral lineages (including variants of concern) that are circulating in a local population. Capturing the genetic diversity by WBS sequencing is not trivial, as wastewater samples often contain a diverse mixture of viral lineages with real mutations and sequencing errors, which must be deconvoluted computationally from short sequencing reads. In this study we assess nine different computational tools that have recently been developed to address this challenge. We simulated 100 wastewater sequence samples consisting of SARS-CoV-2 BA.1, BA.2, and Delta lineages, in various mixtures, as well as a Delta-Omicron recombinant and a synthetic 'novel' lineage. Most tools performed well in identifying the true lineages present and estimating their relative abundances and were generally robust to variation in sequencing depth and read length. While many tools identified lineages present down to 1% frequency, results were more reliable above a 5% threshold. The presence of an unknown synthetic lineage, which represents an unclassified SARS-CoV-2 lineage, increased the error in relative abundance estimates of other lineages, but the magnitude of this effect was small for most tools. The tools also varied in how they labelled novel synthetic lineages and recombinants. While our simulated dataset represents just one of many possible use cases for these methods, we hope it helps users understand potential sources of error or bias in wastewater sequencing analysis and to appreciate the commonalities and differences across methods.


Subjects
COVID-19 , Viral Genome , SARS-CoV-2 , Wastewater , Wastewater/virology , SARS-CoV-2/genetics , SARS-CoV-2/classification , COVID-19/virology , COVID-19/epidemiology , Humans , Computational Biology/methods , Genomics/methods , Wastewater-Based Epidemiological Monitoring , Phylogeny
2.
Cell Rep Methods ; 2(10): 100313, 2022 10 24.
Article in English | MEDLINE | ID: mdl-36159190

ABSTRACT

Wastewater surveillance has become essential for monitoring the spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The quantification of SARS-CoV-2 RNA in wastewater correlates with the coronavirus disease 2019 (COVID-19) caseload in a community. However, estimating the proportions of different SARS-CoV-2 haplotypes has remained technically difficult. We present a phylogenetic imputation method for improving the SARS-CoV-2 reference database and a method for estimating the relative proportions of SARS-CoV-2 haplotypes from wastewater samples. The phylogenetic imputation method uses the global SARS-CoV-2 phylogeny and imputes based on the maximum of the posterior probability of each nucleotide. We show that the imputation method has error rates comparable to, or lower than, typical sequencing error rates, which substantially improves the reference database and allows for accurate inferences of haplotype composition. Our method for estimating relative proportions of haplotypes uses an initial step to remove unlikely haplotypes and an expectation maximization (EM) algorithm for obtaining maximum likelihood estimates of the proportions of different haplotypes in a sample. Using simulations with a reference database of >3 million SARS-CoV-2 genomes, we show that the estimated proportions reflect the true proportions given sufficiently high sequencing depth.


Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , Haplotypes , Phylogeny , Viral RNA/genetics , Wastewater , Wastewater-Based Epidemiological Monitoring , Likelihood Functions
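
The expectation maximization step named in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `em_proportions`, the read-by-haplotype likelihood matrix, and the toy data are hypothetical, and the abstract's initial step that removes unlikely haplotypes is omitted.

```python
def em_proportions(likelihoods, n_iter=200, tol=1e-10):
    """Estimate mixture proportions of haplotypes by EM.

    likelihoods[i][j] is an assumed P(read i | haplotype j).
    Returns one proportion per haplotype, summing to 1.
    """
    n_haps = len(likelihoods[0])
    props = [1.0 / n_haps] * n_haps  # start from a uniform mixture
    for _ in range(n_iter):
        # E-step: accumulate each haplotype's posterior share of every read
        totals = [0.0] * n_haps
        for row in likelihoods:
            weighted = [p * lik for p, lik in zip(props, row)]
            norm = sum(weighted)
            if norm == 0.0:
                continue  # read is incompatible with every haplotype; skip it
            for j, w in enumerate(weighted):
                totals[j] += w / norm
        used = sum(totals)
        if used == 0.0:
            raise ValueError("no read is compatible with any haplotype")
        # M-step: new proportions are the normalized expected read counts
        new_props = [t / used for t in totals]
        converged = max(abs(a - b) for a, b in zip(props, new_props)) < tol
        props = new_props
        if converged:
            break
    return props


# Toy data (hypothetical): 7 reads match only haplotype A, 3 match only
# haplotype B, and 10 reads are equally compatible with both.
toy_reads = [[1.0, 0.0]] * 7 + [[0.0, 1.0]] * 3 + [[0.5, 0.5]] * 10
print(em_proportions(toy_reads))
```

On this toy input the estimates settle near 0.7 and 0.3, because the ambiguous reads are split according to the proportions implied by the unambiguous ones. Real likelihoods would also have to account for sequencing error and partial genome coverage.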
3.
Bioinformatics ; 38(3): 663-670, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34668516

ABSTRACT

MOTIVATION: Clustering is a fundamental task in the analysis of nucleotide sequences. Despite the exponential increase in the size of sequence databases of homologous genes, few methods exist to cluster divergent sequences. Traditional clustering methods have mostly focused on optimizing high speed clustering of highly similar sequences. We develop a phylogenetic clustering method which infers ancestral sequences for a set of initial clusters and then uses a greedy algorithm to cluster sequences. RESULTS: We describe a clustering program AncestralClust, which is developed for clustering divergent sequences. We compare this method with other state-of-the-art clustering methods using datasets of homologous sequences from different species. We show that, in divergent datasets, AncestralClust has higher accuracy and more even cluster sizes than current popular methods. AVAILABILITY AND IMPLEMENTATION: AncestralClust is an Open Source program available at https://github.com/lpipes/ancestralclust. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Software , Phylogeny , Base Sequence , Cluster Analysis
4.
Virus Evol ; 7(1): veaa098, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33500788

ABSTRACT

Human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is most closely related, by average genetic distance, to two coronaviruses isolated from bats, RaTG13 and RmYN02. However, there is a segment of high amino acid similarity between human SARS-CoV-2 and a pangolin-isolated strain, GD410721, in the receptor-binding domain (RBD) of the spike protein, a pattern that can be caused by either recombination or by convergent amino acid evolution driven by natural selection. We perform a detailed analysis of the synonymous divergence, which is less likely to be affected by selection than amino acid divergence, between human SARS-CoV-2 and related strains. We show that the synonymous divergence between the bat-derived viruses and SARS-CoV-2 is larger than between GD410721 and SARS-CoV-2 in the RBD, providing strong additional support for the recombination hypothesis. However, the synonymous divergence between the pangolin strain and SARS-CoV-2 is also relatively high, which is not consistent with a recent recombination between them; instead, it suggests a recombination into RaTG13. We also find a 14-fold increase in the dN/dS ratio from the lineage leading to SARS-CoV-2 to the strains of the current pandemic, suggesting that the vast majority of nonsynonymous mutations currently segregating within the human strains have a negative impact on viral fitness. Finally, we estimate that the time to the most recent common ancestor of SARS-CoV-2 and RaTG13 or RmYN02 based on synonymous divergence is 51.71 years (95% CI, 28.11-75.31) and 37.02 years (95% CI, 18.19-55.85), respectively.

5.
Mol Biol Evol ; 38(4): 1537-1543, 2021 04 13.
Article in English | MEDLINE | ID: mdl-33295605

ABSTRACT

The rooting of the SARS-CoV-2 phylogeny is important for understanding the origin and early spread of the virus. Previously published phylogenies have used different rootings that do not always provide consistent results. We investigate several different strategies for rooting the SARS-CoV-2 tree and provide measures of statistical uncertainty for all methods. We show that methods based on the molecular clock tend to place the root in the B clade, whereas methods based on outgroup rooting tend to place the root in the A clade. The results from the two approaches are statistically incompatible, possibly as a consequence of deviations from a molecular clock or excess back-mutations. We also show that none of the methods provide strong statistical support for the placement of the root in any particular edge of the tree. These results suggest that phylogenetic evidence alone is unlikely to identify the origin of the SARS-CoV-2 virus and we caution against strong inferences regarding the early spread of the virus based solely on such evidence.


Subjects
COVID-19/virology , Viral Genome , Mutation , Phylogeny , SARS-CoV-2/genetics , Algorithms , Animals , Bayes Theorem , Molecular Evolution , Humans , Likelihood Functions , Markov Chains , Genetic Models , Statistical Models , Monte Carlo Method , Missense Mutation , Viral RNA/genetics , Uncertainty
6.
Proc Natl Acad Sci U S A ; 115(41): 10398-10403, 2018 10 09.
Article in English | MEDLINE | ID: mdl-30228118

ABSTRACT

Animal domestication efforts have led to a shared spectrum of striking behavioral and morphological changes. To recapitulate this process, silver foxes have been selectively bred for tame and aggressive behaviors for more than 50 generations at the Institute for Cytology and Genetics in Novosibirsk, Russia. To understand the genetic basis and molecular mechanisms underlying the phenotypic changes, we profiled gene expression levels and coding SNP allele frequencies in two brain tissue specimens from 12 aggressive foxes and 12 tame foxes. Expression analysis revealed 146 genes in the prefrontal cortex and 33 genes in the basal forebrain that were differentially expressed, with a 5% false discovery rate (FDR). These candidates include genes in key pathways known to be critical to neurologic processing, including the serotonin and glutamate receptor pathways. In addition, 295 of the 31,000 exonic SNPs show significant allele frequency differences between the tame and aggressive populations (1% FDR), including genes with a role in neural crest cell fate determination.


Subjects
Aggression , Animal Behavior , Brain/metabolism , Foxes/genetics , Genome , Genetic Selection , Transcriptome , Animals , Foxes/psychology , Genomics , Male , Single Nucleotide Polymorphism , Russia
7.
Nature ; 553(7686): 77-81, 2018 01 03.
Article in English | MEDLINE | ID: mdl-29300007

ABSTRACT

In contrast to infections with human immunodeficiency virus (HIV) in humans and simian immunodeficiency virus (SIV) in macaques, SIV infection of a natural host, sooty mangabeys (Cercocebus atys), is non-pathogenic despite high viraemia. Here we sequenced and assembled the genome of a captive sooty mangabey. We conducted genome-wide comparative analyses of transcript assemblies from C. atys and AIDS-susceptible species, such as humans and macaques, to identify candidates for host genetic factors that influence susceptibility. We identified several immune-related genes in the genome of C. atys that show substantial sequence divergence from macaques or humans. One of these sequence divergences, a C-terminal frameshift in the toll-like receptor-4 (TLR4) gene of C. atys, is associated with a blunted in vitro response to TLR-4 ligands. In addition, we found a major structural change in exons 3-4 of the immune-regulatory protein intercellular adhesion molecule 2 (ICAM-2); expression of this variant leads to reduced cell surface expression of ICAM-2. These data provide a resource for comparative genomic studies of HIV and/or SIV pathogenesis and may help to elucidate the mechanisms by which SIV-infected sooty mangabeys avoid AIDS.


Subjects
Acquired Immunodeficiency Syndrome/genetics , Cercocebus atys/genetics , Cercocebus atys/virology , Genetic Predisposition to Disease , Genome/genetics , Host Specificity/genetics , Simian Immunodeficiency Virus , Acquired Immunodeficiency Syndrome/virology , Amino Acid Sequence , Animals , Cell Adhesion Molecules/chemistry , Cell Adhesion Molecules/genetics , Cell Adhesion Molecules/metabolism , Cercocebus atys/immunology , Exons/genetics , Female , Frameshift Mutation/genetics , Genetic Variation , Genomics , HIV/pathogenicity , Humans , Macaca/virology , Sequence Deletion , Simian Acquired Immunodeficiency Syndrome/genetics , Simian Acquired Immunodeficiency Syndrome/virology , Simian Immunodeficiency Virus/pathogenicity , Species Specificity , Toll-Like Receptor 4/chemistry , Toll-Like Receptor 4/genetics , Toll-Like Receptor 4/immunology , Transcriptome/genetics , Whole Genome Sequencing
8.
J Comp Neurol ; 524(2): 288-308, 2016 Feb 01.
Article in English | MEDLINE | ID: mdl-26132897

ABSTRACT

The human brain and human cognitive abilities are strikingly different from those of other great apes despite relatively modest genome sequence divergence. However, little is presently known about the interspecies divergence in gene structure and transcription that might contribute to these phenotypic differences. To date, most comparative studies of gene structure in the brain have examined humans, chimpanzees, and macaque monkeys. To add to this body of knowledge, we analyze here the brain transcriptome of the western lowland gorilla (Gorilla gorilla gorilla), an African great ape species that is phylogenetically closely related to humans, but with a brain that is approximately one-third the size. Manual transcriptome curation from a sample of the planum temporale region of the neocortex revealed 12 protein-coding genes and one noncoding-RNA gene with exons in the gorilla unmatched by public transcriptome data from the orthologous human loci. These interspecies gene structure differences accounted for a total of 134 amino acids in proteins found in the gorilla that were absent from protein products of the orthologous human genes. Proteins varying in structure between human and gorilla were involved in immunity and energy metabolism, suggesting their relevance to phenotypic differences. This gorilla neocortical transcriptome comprises an empirical, not homology- or prediction-driven, resource for orthologous gene comparisons between human and gorilla. These findings provide a unique repository of the sequences and structures of thousands of genes transcribed in the gorilla brain, pointing to candidate genes that may contribute to the traits distinguishing humans from other closely related great apes.


Subjects
Brain/metabolism , Gene Expression/physiology , High-Throughput Nucleotide Sequencing , RNA/metabolism , Animals , Carrier Proteins/genetics , Carrier Proteins/metabolism , Gene Expression Profiling , Gorilla gorilla/anatomy & histology , Humans/anatomy & histology , Intracellular Signaling Peptides and Proteins , Molecular Models , Muscle Proteins/genetics , Muscle Proteins/metabolism , Peroxisome Proliferator-Activated Receptor Gamma Coactivator 1-alpha , Phosphoprotein Phosphatases/genetics , Phosphoprotein Phosphatases/metabolism , Phylogeny , Species Specificity , Transcription Factors/genetics , Transcription Factors/metabolism , beta 2-Glycoprotein I/genetics , beta 2-Glycoprotein I/metabolism
9.
Nucleic Acids Res ; 43(Database issue): D737-42, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25392405

ABSTRACT

The non-human primate reference transcriptome resource (NHPRTR, available online at http://nhprtr.org/) aims to generate comprehensive RNA-seq data from a wide variety of non-human primates (NHPs), from lemurs to hominids. In the 2012 Phase I of the NHPRTR project, 19 billion fragments or 3.8 terabases of transcriptome sequences were collected from pools of ∼20 tissues in 15 species and subspecies. Here we describe a major expansion of NHPRTR by adding 10.1 billion fragments of tissue-specific RNA-seq data. For this effort, we selected 11 of the original 15 NHP species and subspecies and constructed total RNA libraries for the same ∼15 tissues in each. The sequence quality is such that 88% of the reads align to human reference sequences, allowing us to compute the full list of expression abundance across all tissues for each species, using the reads mapped to human genes. This update also includes improved transcript annotations derived from RNA-seq data for rhesus and cynomolgus macaques, two of the most commonly used NHP models, and additional RNA-seq data compiled from related projects. Together, these comprehensive reference transcriptomes from multiple primates serve as a valuable community resource for genome annotation, gene dynamics and comparative functional analysis.


Subjects
Genetic Databases , Gene Expression Profiling , Primates/genetics , RNA Sequence Analysis , Animals , Internet , Macaca , Molecular Sequence Annotation , Organ Specificity , Reference Standards , Sequence Alignment/standards
10.
Concurr Comput ; 26(13): 2157-2166, 2014 Sep 10.
Article in English | MEDLINE | ID: mdl-25294974

ABSTRACT

A variety of extremely challenging biological sequence analyses were conducted on the XSEDE large shared memory resource Blacklight, using current bioinformatics tools and encompassing a wide range of scientific applications. These include genomic sequence assembly, very large metagenomic sequence assembly, transcriptome assembly, and sequencing error correction. The data sets used in these analyses included uncategorized fungal species, reference microbial data, very large soil and human gut microbiome sequence data, and primate transcriptomes, composed of both short-read and long-read sequence data. A new parallel command execution program was developed on the Blacklight resource to handle some of these analyses. These results, initially reported previously at XSEDE13 and expanded here, represent significant advances for their respective scientific communities. The breadth and depth of the results achieved demonstrate the ease of use, versatility, and unique capabilities of the Blacklight XSEDE resource for scientific analysis of genomic and transcriptomic sequence data, and the power of these resources, together with XSEDE support, in meeting the most challenging scientific problems.

11.
J Med Primatol ; 43(5): 317-28, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24810475

ABSTRACT

BACKGROUND: The genome annotations of rhesus (Macaca mulatta) and cynomolgus (Macaca fascicularis) macaques, two of the most common non-human primate animal models, are limited. METHODS: We analyzed large-scale macaque RNA-based next-generation sequencing (RNAseq) data to identify un-annotated macaque transcripts. RESULTS: For both macaque species, we uncovered thousands of novel isoforms for annotated genes and thousands of un-annotated intergenic transcripts enriched with non-coding RNAs. We also identified thousands of transcript sequences which are partially or completely 'missing' from current macaque genome assemblies. We showed that many newly identified transcripts were differentially expressed during SIV infection of rhesus macaques or during Ebola virus infection of cynomolgus macaques. CONCLUSIONS: For two important macaque species, we uncovered thousands of novel isoforms and un-annotated intergenic transcripts including coding and non-coding RNAs, polyadenylated and non-polyadenylated transcripts. This resource will greatly improve future macaque studies, as demonstrated by their applications in infectious disease studies.


Subjects
Ebola Virus Disease/genetics , Macaca fascicularis , Macaca mulatta , Monkey Diseases/genetics , Simian Acquired Immunodeficiency Syndrome/genetics , Transcriptome , Animals , Ebolavirus/physiology , Ebola Virus Disease/virology , High-Throughput Nucleotide Sequencing , India , Mauritius , Molecular Sequence Data , Monkey Diseases/virology , Untranslated RNA/genetics , Untranslated RNA/metabolism , RNA Sequence Analysis , Simian Acquired Immunodeficiency Syndrome/virology , Simian Immunodeficiency Virus/physiology
12.
Nucleic Acids Res ; 41(Database issue): D906-14, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203872

ABSTRACT

RNA-based next-generation sequencing (RNA-Seq) provides a tremendous amount of new information regarding gene and transcript structure, expression and regulation. This is particularly true for non-coding RNAs where whole transcriptome analyses have revealed that much of the genome is transcribed and that many non-coding transcripts have widespread functionality. However, uniform resources for raw, cleaned and processed RNA-Seq data are sparse for most organisms and this is especially true for non-human primates (NHPs). Here, we describe a large-scale RNA-Seq data and analysis infrastructure, the NHP reference transcriptome resource (http://nhprtr.org); it presently hosts data from 12 species of primates, to be expanded to 15 species/subspecies spanning great apes, old world monkeys, new world monkeys and prosimians. Data are collected for each species using pools of RNA from comparable tissues. We provide data access in advance of its deposition at NCBI, as well as browsable tracks of alignments against the human genome using the UCSC genome browser. This resource will continue to host additional RNA-Seq data, alignments and assemblies as they are generated over the coming years and provide a key resource for the annotation of NHP genomes as well as informing primate studies on evolution, reproduction, infection, immunity and pharmacology.


Subjects
Nucleic Acid Databases , Genomics , Primates/genetics , Transcriptome , Animals , Human Genome , Humans , Internet , Primates/metabolism , Sequence Alignment , RNA Sequence Analysis
13.
Methods Enzymol ; 454: 367-404, 2009.
Article in English | MEDLINE | ID: mdl-19216935

ABSTRACT

This work presents a new approach to the analysis of aperiodic pulsatile heteroscedastic time-series data, specifically hormone pulsatility. We have utilized growth hormone (GH) concentration time-series data as an example for the utilization of this new algorithm. While many previously published approaches used for the analysis of GH pulsatility are both subjective and cumbersome to use, AutoDecon is a nonsubjective, standardized, and completely automated algorithm. We have employed computer simulations to evaluate the true-positive, the false-positive, the false-negative, and the sensitivity percentages of several of the routinely employed algorithms when applied to GH concentration time-series data. Based on these simulations, it was concluded that this new algorithm provides a substantial improvement over the previous methods. This novel method has many direct applications in addition to hormone pulsatility, for example, to time-domain fluorescence lifetime measurements, as the mathematical forms that describe these experimental systems are both convolution integrals.


Subjects
Algorithms , Software
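
The convolution form mentioned in the abstract above can be illustrated with a small forward simulation, in which the measured concentration is a secretion-rate function convolved with an elimination function. This is only a sketch under stated assumptions: the Gaussian burst shape, the single-exponential elimination, the function name `simulate_concentration`, and every parameter value are hypothetical choices for illustration. It is not AutoDecon, which addresses the inverse problem of recovering the bursts from the measured series.

```python
import math


def simulate_concentration(burst_times, burst_masses, burst_sd,
                           half_life, dt, n_points):
    """Forward model: concentration = secretion convolved with elimination.

    Secretion is a sum of Gaussian-shaped bursts (assumed shape) and
    elimination is a single exponential with the given half-life (assumed).
    Times are in minutes; dt is the sampling interval.
    """
    times = [i * dt for i in range(n_points)]
    norm = burst_sd * math.sqrt(2.0 * math.pi)
    # Secretion rate at each sample time: sum over all bursts
    secretion = [
        sum(m / norm * math.exp(-0.5 * ((t - t0) / burst_sd) ** 2)
            for t0, m in zip(burst_times, burst_masses))
        for t in times
    ]
    k = math.log(2.0) / half_life  # first-order elimination rate constant
    elimination = [math.exp(-k * t) for t in times]
    # Discrete approximation of the convolution integral
    conc = [
        dt * sum(secretion[j] * elimination[i - j] for j in range(i + 1))
        for i in range(n_points)
    ]
    return times, conc


# Hypothetical example: one burst at 60 min, 30 min half-life, 1 min sampling
times, conc = simulate_concentration(
    burst_times=[60.0], burst_masses=[10.0], burst_sd=5.0,
    half_life=30.0, dt=1.0, n_points=300,
)
peak_index = conc.index(max(conc))
print("peak concentration at t =", times[peak_index], "min")
```

Once secretion has died away, the simulated concentration halves every `half_life` minutes, which gives a quick sanity check on the elimination term.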
14.
Anal Biochem ; 381(1): 8-17, 2008 Oct 01.
Article in English | MEDLINE | ID: mdl-18639514

ABSTRACT

Hormone signaling is often pulsatile, and multiparameter deconvolution procedures have long been used to identify and characterize secretory events. However, the existing programs have serious limitations, including the subjective nature of initial peak selection, lack of statistical verification of presumed bursts, and user-unfriendliness of the application. Here we describe a novel deconvolution program, AutoDecon, which addresses these concerns. We validate AutoDecon for application to serum luteinizing hormone (LH) concentration time series using synthetic data mimicking real data from normal women and then comparing the performance of AutoDecon with the performance of the widely employed hormone pulsatility analysis program Cluster. The sensitivity of AutoDecon is higher than that of Cluster (approximately 96% vs. 80%, P=0.001). However, AutoDecon had a higher false-positive detection rate than did Cluster (6% vs. 1%, P=0.001). Further analysis demonstrated that the pulsatility parameters recovered by AutoDecon were indistinguishable from those characterizing the synthetic data and that sampling at 5- or 10-min intervals was optimal for maximizing the sensitivity rates for LH. Accordingly, AutoDecon presents a viable nonsubjective alternative to previous pulse detection algorithms for the analysis of LH data. It is applicable to other pulsatile hormone concentration time series and many other pulsatile phenomena. The software is free and downloadable at http://mljohnson.pharm.virginia.edu/home.html.


Subjects
Algorithms , Luteinizing Hormone/metabolism , Biological Models , Software , Adult , Animals , False Positive Reactions , Female , Gonadotropin-Releasing Hormone/blood , Half-Life , Humans , Luteinizing Hormone/blood , Menstrual Cycle/blood , Middle Aged , Postmenopause/blood , Premenopause/blood , Reproducibility of Results , Sheep , Time Factors