Results 1 - 12 of 12
1.
Microbiol Spectr ; 10(2): e0256421, 2022 04 27.
Article in English | MEDLINE | ID: mdl-35234489

ABSTRACT

Next-generation sequencing (NGS) is a powerful tool for detecting and investigating viral pathogens; however, analysis and management of the enormous amounts of data generated from these technologies remains a challenge. Here, we present VPipe (the Viral NGS Analysis Pipeline and Data Management System), an automated bioinformatics pipeline optimized for whole-genome assembly of viral sequences and identification of diverse species. VPipe automates the data quality control, assembly, and contig identification steps typically performed when analyzing NGS data. Users access the pipeline through a secure web-based portal, which provides an easy-to-use interface with advanced search capabilities for reviewing results. In addition, VPipe provides a centralized system for storing and analyzing NGS data, eliminating common bottlenecks in bioinformatics analyses for public health laboratories with limited on-site computational infrastructure. The performance of VPipe was validated through the analysis of publicly available NGS data sets for viral pathogens, generating high-quality assemblies for 12 data sets. VPipe also generated assemblies with greater contiguity than similar pipelines for 41 human respiratory syncytial virus isolates and 23 SARS-CoV-2 specimens. IMPORTANCE Computational infrastructure and bioinformatics analysis are bottlenecks in the application of NGS to viral pathogens. As of September 2021, VPipe has been used by the U.S. Centers for Disease Control and Prevention (CDC) and 12 state public health laboratories to characterize >17,500 and 1,500 clinical specimens and isolates, respectively. VPipe automates genome assembly for a wide range of viruses, including high-consequence pathogens such as SARS-CoV-2. Such automated functionality expedites public health responses to viral outbreaks and pathogen surveillance.
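The abstract says VPipe produced assemblies with "greater contiguity" but does not name the metric. N50 is a widely used contiguity statistic. The sketch below computes it to show what such a comparison usually measures. It is an assumption that N50 is relevant here, and this is not VPipe's code.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly: the length L such that contigs of
    length >= L together contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid floating-point halves.
        if 2 * running >= total:
            return length
    return lengths[-1]  # not reached for non-empty input; keeps the return type explicit
```

For contigs of 100, 80, 50, 30 and 20 nt (280 nt in total), the two longest contigs already hold 180 nt, which is at least half of the assembly, so `n50([100, 80, 50, 30, 20])` returns 80. A single complete viral genome assembled as one contig has an N50 equal to the genome length.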


Subject(s)
COVID-19 , Viruses , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Humans , SARS-CoV-2/genetics , Viruses/genetics
2.
J Clin Virol ; 134: 104718, 2021 01.
Article in English | MEDLINE | ID: mdl-33360859

ABSTRACT

BACKGROUND: The family Caliciviridae consists of a genetically diverse group of RNA viruses that infect a wide range of host species; it includes noroviruses and sapoviruses, which cause acute gastroenteritis in humans. Typing of these viruses relies on sequence-based approaches, so there is a need for rapid and accurate web-based typing tools. OBJECTIVE: To develop and evaluate a web-based tool for rapid and accurate genotyping of noroviruses and sapoviruses. METHODS: The Human Calicivirus Typing (HuCaT) tool compares query sequences against a set of curated reference sequences using a k-mer (DNA substring) based algorithm. Outputs include alignments and phylogenetic trees of the 12 top-matching reference sequences for each query. RESULTS: The HuCaT tool was validated with a set of 1310 norovirus and 239 sapovirus sequences covering all known human norovirus and sapovirus genotypes. The HuCaT tool assigned genotypes to all queries with 100% accuracy and was much faster (17 s) than BLAST (150 s) or phylogenetic analysis approaches. CONCLUSIONS: The web-based HuCaT tool supports rapid and accurate genotyping of human noroviruses and sapoviruses.
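The abstract states only that HuCaT compares queries to curated references with a k-mer based algorithm and reports the 12 top matches. The sketch below shows one common way to rank references by shared k-mers. The Jaccard similarity, the default k = 8 and the reference names are assumptions made for illustration. This is not HuCaT's actual scoring.

```python
def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k (k-mers) in a sequence, uppercased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by all distinct k-mers in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_references(query: str, references: dict[str, str], k: int = 8, top: int = 12):
    """Score every reference against the query and return the best `top`
    (score, name) pairs, highest similarity first, ties broken by name."""
    q = kmer_set(query, k)
    scored = [(jaccard(q, kmer_set(seq, k)), name) for name, seq in references.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top]
```

With two hypothetical references, `rank_references("ACGTACGGTCAGTCCA", {"ref_a": "ACGTACGGTCAGTCCA", "ref_b": "TTTTGGGGCCCCAAAA"}, k=4)` ranks `ref_a` first with a score of 1.0 and `ref_b` second with 0.0. Comparing k-mer sets avoids a full alignment for every reference, which is one reason such methods can be faster than BLAST-style searches.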


Subject(s)
Caliciviridae Infections , Norovirus , Sapovirus , Genotype , Humans , Internet , Norovirus/genetics , Phylogeny , Sapovirus/genetics
3.
Viruses ; 11(6)2019 06 08.
Article in English | MEDLINE | ID: mdl-31181749

ABSTRACT

Noroviruses evolve by antigenic drift and recombination, which occurs most frequently at the junction between the non-structural and structural protein coding genomic regions. In 2015, a novel GII.P16-GII.4 Sydney recombinant strain emerged, replacing the predominance of GII.Pe-GII.4 Sydney among US outbreaks. Distinct from GII.P16 polymerases detected since 2010, this novel GII.P16 was subsequently detected among GII.1, GII.2, GII.3, GII.10 and GII.12 viruses, prompting an investigation into the unique characteristics of these viruses. Norovirus-positive samples (n = 1807) were dual-typed, and a subset (n = 124) was sequenced to yield near-complete genomes. CaliciNet and National Outbreak Reporting System (NORS) records were matched to link outbreak characteristics and case outcomes to molecular data, and GenBank was mined for contextualization. Recombination with the novel GII.P16 polymerase extended GII.4 Sydney predominance and increased the number of GII.2 outbreaks in the US. Introduction of the novel GII.P16 noroviruses occurred without unique amino acid changes in VP1, more severe case outcomes, or differences in affected population. However, unique changes were found among the NS1/2, NS4 and VP2 proteins, which have immune antagonistic functions, and the RdRp. Multiple polymerase-capsid combinations were detected among GII viruses, including 11 involving GII.P16. Molecular surveillance of protein sequences from norovirus genomes can inform the functional importance of amino acid changes in emerging recombinant viruses and aid in vaccine and antiviral formulation.


Subject(s)
Caliciviridae Infections/epidemiology , Capsid Proteins/genetics , Genotype , Norovirus/genetics , Aged , Amino Acid Sequence , Caliciviridae Infections/immunology , Caliciviridae Infections/physiopathology , Caliciviridae Infections/virology , Capsid/immunology , Capsid Proteins/immunology , Disease Outbreaks , Humans , Immunity, Herd , Male , Middle Aged , Molecular Epidemiology , Sequence Analysis , United States , Viral Nonstructural Proteins/genetics , Whole Genome Sequencing
4.
J Clin Microbiol ; 55(2): 606-615, 2017 02.
Article in English | MEDLINE | ID: mdl-27927929

ABSTRACT

The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher-resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, with as many as 64 samples multiplexed in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially vaccine-derived polioviruses (VDPV), circulating VDPV, and immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid global poliovirus surveillance.
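The abstract uses two library-level measures. The first is efficiency, defined as viral reads per total reads. The second is genome completeness, given as the percentage of the reference length covered. The sketch below computes both. It assumes a per-position read-depth profile against the reference is already available, and the function names are hypothetical.

```python
def viral_read_efficiency(viral_reads: int, total_reads: int) -> float:
    """Viral reads as a percentage of all reads in a library."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100 * viral_reads / total_reads


def breadth_of_coverage(depths: list[int], min_depth: int = 1) -> float:
    """Fraction of reference positions covered by at least `min_depth` reads."""
    if not depths:
        raise ValueError("empty depth profile")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)
```

For example, 842 viral reads out of 1,000 total gives 84.2%. A depth profile of `[0, 3, 5, 0, 2]` covers 3 of 5 positions, which is 0.6. Raising `min_depth` gives a stricter notion of "covered" when calling consensus bases.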


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Poliovirus/classification , Poliovirus/genetics , Specimen Handling/methods , Humans , Molecular Epidemiology/methods , Pilot Projects
5.
Genome Announc ; 3(6)2015 Dec 03.
Article in English | MEDLINE | ID: mdl-26634765

ABSTRACT

Burkholderia pseudomallei strain Bp1651, a human isolate, is resistant to all clinically relevant antibiotics. We report here on the finished genome sequence assembly and annotation of the two chromosomes of this strain. This genome sequence may assist in understanding the mechanisms of antimicrobial resistance for this pathogenic species.

6.
Nucleic Acids Res ; 36(Database issue): D13-21, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18045790

ABSTRACT

In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data available through NCBI's web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace, Assembly, and Short Read Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Database of Genotype and Phenotype, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting the web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
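Among the listed resources, the Entrez Programming Utilities (E-utilities) give scripted access to Entrez databases over HTTP. The sketch below only builds an ESearch request URL against the public E-utilities endpoint, using the `db`, `term`, `retmax` and `retmode` parameters. The helper name is hypothetical, and actually fetching results (e.g. with `urllib.request.urlopen`) is subject to NCBI's usage guidelines.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an E-utilities ESearch request URL that asks for JSON output."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)
```

`esearch_url("pubmed", "norovirus genotyping")` yields `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=norovirus+genotyping&retmax=20&retmode=json`. `urlencode` handles escaping of spaces and special characters in the query term.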


Subject(s)
Databases, Genetic , National Library of Medicine (U.S.) , Animals , Databases, Nucleic Acid , Gene Expression , Genomics , Genotype , Humans , Internet , Models, Molecular , Phenotype , Proteomics , Sequence Alignment , United States
7.
Nucleic Acids Res ; 35(Database issue): D5-12, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17170002

ABSTRACT

In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace and Assembly Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Viral Genotyping Tools, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subject(s)
Databases, Genetic , National Library of Medicine (U.S.) , Animals , Databases, Nucleic Acid , Databases, Protein , Gene Expression , Genomics , Humans , Internet , Phenotype , Proteomics , PubMed , Sequence Alignment , Software , United States
8.
BMC Bioinformatics ; 4: 41, 2003 Sep 11.
Article in English | MEDLINE | ID: mdl-12969510

ABSTRACT

BACKGROUND: The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. RESULTS: We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or approximately 54% of the 110,655 analyzed eukaryotic gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included in the KOGs; addition of new eukaryotic genomes is expected to result in a substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of approximately 20% of the KOG set.
This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (approximately 1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes. CONCLUSION: The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.
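Two quantities in this abstract can be checked or illustrated directly. The first is the coverage percentages: 138,458 of 185,505 proteins and 59,838 of 110,655 gene products. The second is the "conserved core", meaning clusters whose phyletic pattern (the set of species contributing a member) includes every analyzed species. The sketch below recomputes the percentages and applies the core definition to toy patterns. The cluster names are hypothetical and the three-letter species codes are illustrative abbreviations of the seven genomes.

```python
def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage."""
    return 100 * part / whole


def conserved_core(patterns: dict[str, set[str]], species: set[str]) -> list[str]:
    """Clusters whose phyletic pattern covers every analyzed species."""
    return sorted(name for name, present in patterns.items() if species <= present)


# Coverage figures quoted in the abstract.
cog_coverage = percent(138_458, 185_505)  # about 74.6, reported as 75%
kog_coverage = percent(59_838, 110_655)   # about 54.1, reported as 54%

# Toy phyletic patterns; cluster names are hypothetical.
SEVEN = {"Ath", "Cel", "Dme", "Ecu", "Hsa", "Sce", "Spo"}
patterns = {
    "cluster_1": set(SEVEN),
    "cluster_2": {"Cel", "Dme", "Hsa"},
    "cluster_3": SEVEN - {"Ecu"},
}
core = conserved_core(patterns, SEVEN)  # ["cluster_1"]
```

The toy data also shows why the core depends on which genomes are included. `cluster_3` misses the core only because one species, the microsporidian, lacks a member, and adding or removing genomes changes which clusters qualify.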


Subject(s)
Databases, Protein/trends , Eukaryotic Cells , Proteins/classification , Proteins/genetics , Animals , Databases, Nucleic Acid/trends , Eukaryotic Cells/chemistry , Eukaryotic Cells/physiology , Evolution, Molecular , Humans , National Institutes of Health (U.S.) , Proteins/physiology , Terminology as Topic , United States
9.
Nucleic Acids Res ; 30(19): 4264-71, 2002 Oct 01.
Article in English | MEDLINE | ID: mdl-12364605

ABSTRACT

Prokaryotic genomes are considered to be 'wall-to-wall' genomes, which consist largely of genes for proteins and structural RNAs, with only a small fraction of the genomic DNA allotted to intergenic regions, which are thought to typically contain regulatory signals. The majority of bacterial and archaeal genomes contain 6-14% non-coding DNA. Significant positive correlations were detected between the fraction of non-coding DNA and inter- and intra-operonic distances, suggesting that different classes of non-coding DNA evolve congruently. In contrast, no correlation was found between any of these characteristics of non-coding sequences and the number of genes or genome size. Thus, the non-coding regions and the gene sets in prokaryotes seem to evolve in different regimes. The evolution of non-coding regions appears to be determined primarily by the selective pressure to minimize the amount of non-functional DNA while maintaining essential regulatory signals; as a result, the content of non-coding DNA in different genomes is relatively uniform, and intra- and inter-operonic non-coding regions evolve congruently. In contrast, the gene set is optimized for the particular environmental niche of the given microbe, which results in the lack of correlation between the gene number and the characteristics of non-coding regions.
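The two genome characteristics compared here are the fraction of non-coding DNA and the distances between adjacent genes. Both can be computed from gene coordinates. The sketch below assumes 1-based inclusive gene intervals on a linear sequence. It ignores strand, circular wrap-around and operon boundaries, so it does not separate intra- from inter-operonic distances as the study does. The function names are hypothetical.

```python
def noncoding_fraction(genome_length: int, genes: list[tuple[int, int]]) -> float:
    """Fraction of the genome not covered by any gene interval.
    Overlapping or adjacent genes are merged so shared bases count once."""
    merged: list[list[int]] = []
    for start, end in sorted(genes):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    coding = sum(end - start + 1 for start, end in merged)
    return 1 - coding / genome_length


def intergenic_distances(genes: list[tuple[int, int]]) -> list[int]:
    """Gap between each gene and the next one by start coordinate;
    negative values mean the two genes overlap."""
    ordered = sorted(genes)
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:])]
```

For a 100-nt toy sequence with genes at 1-30, 25-50 and 61-90, 80 nt are coding, so the non-coding fraction is 0.2. The intergenic distances are `[-6, 10]`: the first two genes overlap by 6 nt and the last two are separated by 10 nt.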


Subject(s)
DNA, Intergenic/genetics , Evolution, Molecular , Genome, Archaeal , Genome, Bacterial , Databases, Nucleic Acid , Genes, Archaeal/genetics , Genes, Bacterial/genetics , Operon/genetics
10.
Trends Genet ; 18(5): 228-32, 2002 May.
Article in English | MEDLINE | ID: mdl-12047938

ABSTRACT

In overlapping genes, the same DNA sequence codes for two proteins using different reading frames. Analysis of overlapping genes can help in understanding the mode of evolution of a coding region from noncoding DNA. We identified 71 pairs of convergent genes, with overlapping 3' ends longer than 15 nucleotides, that are conserved in at least two prokaryotic genomes. Among the overlap regions, we observed a statistically significant bias towards the 123:132 phase (i.e. the second codon base in one gene facing the degenerate third position in the second gene). This phase ensures the least mutual constraint on nonconservative amino acid replacements in both overlapping coding sequences. The excess of this phase is compatible with directional (positive) selection acting on the overlapping coding regions. This could be a general evolutionary mode for genes emerging from noncoding sequences, in which the protein sequence has not been subject to selection.
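The phase notation lists which codon position of the second gene faces positions 1, 2 and 3 of the first gene. In the 123:132 phase, position 2 faces position 3, as the abstract describes. The sketch below derives the label from coordinates. It assumes 1-based inclusive coordinates, with gene a on the forward strand and gene b on the reverse strand, so that b's 5' end is at `b_end` and the two genes converge at their 3' ends. The helper names are hypothetical and this is not the authors' code.

```python
def codon_position_forward(x: int, gene_start: int) -> int:
    """Codon position (1, 2 or 3) of base x in a gene read left to right from gene_start."""
    return (x - gene_start) % 3 + 1


def codon_position_reverse(x: int, gene_end: int) -> int:
    """Codon position of base x in a reverse-strand gene read right to left from gene_end."""
    return (gene_end - x) % 3 + 1


def overlap_phase(a_start: int, a_end: int, b_start: int, b_end: int) -> str:
    """Phase label such as '123:132' for two convergent overlapping genes."""
    lo, hi = max(a_start, b_start), min(a_end, b_end)
    if hi - lo + 1 < 3:
        raise ValueError("need an overlap of at least 3 nt")
    facing: dict[int, int] = {}
    for x in range(lo, hi + 1):
        # The pairing of codon positions is the same for every codon in the overlap.
        facing[codon_position_forward(x, a_start)] = codon_position_reverse(x, b_end)
    return "123:" + "".join(str(facing[p]) for p in (1, 2, 3))
```

With gene a at 1-30 and a 30-nt gene b overlapping its 3' end, the phase depends only on the overlap length modulo 3. A 17-nt overlap (b at 14-43) gives `"123:132"`, an 18-nt overlap (b at 13-42) gives `"123:321"`, and a 16-nt overlap (b at 15-44) gives `"123:213"`.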


Subject(s)
Genes, Overlapping , Amino Acid Sequence , Base Sequence , Chlamydiaceae/genetics , DNA, Archaeal/genetics , DNA, Bacterial/genetics , Genes, Archaeal , Genes, Bacterial , Molecular Sequence Data , Sequence Homology, Amino Acid , Sequence Homology, Nucleic Acid , Thermoplasma/genetics
11.
Nucleic Acids Res ; 30(10): 2212-23, 2002 May 15.
Article in English | MEDLINE | ID: mdl-12000841

ABSTRACT

A computational method was developed for delineating connected gene neighborhoods in bacterial and archaeal genomes. These gene neighborhoods are not typically present, in their entirety, in any single genome, but are held together by overlapping, partially conserved gene arrays. The procedure was applied to comparing the orders of orthologous genes, which were extracted from the database of Clusters of Orthologous Groups of proteins (COGs), in 31 prokaryotic genomes and resulted in the identification of 188 clusters of gene arrays, which included 1001 of 2890 COGs. These clusters were projected onto actual genomes to produce extended neighborhoods including additional genes, which are adjacent to the genes from the clusters and are transcribed in the same direction, which resulted in a total of 2387 COGs being included in the neighborhoods. Most of the neighborhoods consist predominantly of genes united by a coherent functional theme, but also include a minority of genes without an obvious functional connection to the main theme. We hypothesize that although some of the latter genes might have unsuspected roles, others are maintained within gene arrays because of the advantage of expression at a level that is typical of the given neighborhood. We designate this phenomenon 'genomic hitchhiking'. The largest neighborhood includes 79 genes (COGs) and consists of overlapping, rearranged ribosomal protein superoperons; apparent genome hitchhiking is particularly typical of this neighborhood and other neighborhoods that consist of genes coding for translation machinery components. Several neighborhoods involve previously undetected connections between genes, allowing new functional predictions. Gene neighborhoods appear to evolve via complex rearrangement, with different combinations of genes from a neighborhood fixed in different lineages.
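The central idea here is that neighborhoods are held together by overlapping, partially conserved gene arrays, so no single genome needs to contain a whole neighborhood. The sketch below illustrates that idea in a simplified form. It counts COG adjacencies conserved in at least `min_genomes` genomes and joins them into connected components with union-find. It ignores strand, gene distance and the projection step described in the abstract, and is not the authors' procedure. The COG identifiers are hypothetical letters.

```python
from collections import Counter


def conserved_adjacencies(genomes: list[list[str]], min_genomes: int) -> list[tuple[str, str]]:
    """Unordered pairs of COGs adjacent in at least `min_genomes` genomes
    (each genome counted once per pair)."""
    counts: Counter = Counter()
    for order in genomes:
        pairs = {frozenset(p) for p in zip(order, order[1:]) if p[0] != p[1]}
        counts.update(pairs)
    return sorted(tuple(sorted(pair)) for pair, n in counts.items() if n >= min_genomes)


def gene_neighborhoods(genomes: list[list[str]], min_genomes: int = 2) -> list[set[str]]:
    """Join conserved adjacencies into connected components (union-find)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in conserved_adjacencies(genomes, min_genomes):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups: dict[str, set[str]] = {}
    for cog in list(parent):
        groups.setdefault(find(cog), set()).add(cog)
    return sorted(groups.values(), key=sorted)
```

Given the toy gene orders `A B C X`, `B C D Y`, `C D E Z` and `Q A B W`, the adjacencies A-B, B-C and C-D each occur in two genomes. The sketch therefore returns a single neighborhood `{A, B, C, D}`, even though no one genome carries all four genes.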


Subject(s)
Genes, Archaeal/genetics , Genes, Bacterial/genetics , Genome, Archaeal , Genome, Bacterial , Algorithms , Archaeal Proteins/genetics , Bacterial Proteins/genetics , Databases, Factual , Gene Order , Ribosomal Proteins/genetics
12.
Proc Natl Acad Sci U S A ; 99(7): 4644-9, 2002 Apr 02.
Article in English | MEDLINE | ID: mdl-11930014

ABSTRACT

We have determined the complete 1,694,969-nt sequence of the GC-rich genome of Methanopyrus kandleri by using a whole direct genome sequencing approach. This approach is based on unlinking of genomic DNA with the ThermoFidelase version of M. kandleri topoisomerase V and cycle sequencing directed by 2'-modified oligonucleotides (Fimers). Sequencing redundancy (3.3x) was sufficient to assemble the genome with less than one error per 40 kb. Using a combination of sequence database searches and coding potential prediction, 1,692 protein-coding genes and 39 genes for structural RNAs were identified. M. kandleri proteins show an unusually high content of negatively charged amino acids, which might be an adaptation to the high intracellular salinity. Previous phylogenetic analysis of 16S RNA suggested that M. kandleri belonged to a very deep branch, close to the root of the archaeal tree. However, genome comparisons indicate that, in both trees constructed using concatenated alignments of ribosomal proteins and trees based on gene content, M. kandleri consistently groups with other archaeal methanogens. M. kandleri shares the set of genes implicated in methanogenesis and, in part, its operon organization with Methanococcus jannaschii and Methanothermobacter thermoautotrophicum. These findings indicate that archaeal methanogens are monophyletic. A distinctive feature of M. kandleri is the paucity of proteins involved in signaling and regulation of gene expression. Also, M. kandleri appears to have fewer genes acquired via lateral transfer than other archaea. These features might reflect the extreme habitat of this organism.
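The claim that M. kandleri proteins are unusually rich in negatively charged amino acids can be quantified as the share of aspartate (D) and glutamate (E) among all residues. The sketch below pools that share over a set of protein sequences. The function name is hypothetical, and it skips non-letter symbols such as a trailing `*` stop.

```python
from typing import Iterable

NEGATIVE = frozenset("DE")  # aspartate (D) and glutamate (E)


def negative_fraction(proteins: Iterable[str]) -> float:
    """Pooled fraction of D + E residues over one or more protein sequences."""
    residues = [aa for seq in proteins for aa in seq.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("no residues supplied")
    return sum(aa in NEGATIVE for aa in residues) / len(residues)
```

For example, `negative_fraction(["MDEEKR"])` is 0.5. Comparing this value across whole proteomes is one way to frame the salinity-adaptation argument. As a related arithmetic check, at fewer than one error per 40 kb, the 1,694,969-nt assembly would carry at most about 42 errors in total (1,694,969 / 40,000 ≈ 42.4).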


Subject(s)
Euryarchaeota/genetics , Genome, Archaeal , Base Sequence , Euryarchaeota/classification , Euryarchaeota/metabolism , Molecular Sequence Data , Operon , Phylogeny