Search | VHL Regional Portal

1.

Overlapping genes and the proteins they encode differ significantly in their sequence composition from non-overlapping genes.

Pavesi, Angelo; Vianelli, Alberto; Chirico, Nicola; Bao, Yiming; Blinkova, Olga; Belshaw, Robert; Firth, Andrew; Karlin, David.

PLoS One ; 13(10): e0202513, 2018.

Article in English | MEDLINE | ID: mdl-30339683

ABSTRACT

Overlapping genes represent a fascinating evolutionary puzzle, since they encode two functionally unrelated proteins from the same DNA sequence. They originate by a mechanism of overprinting, in which point mutations in an existing frame allow the expression (the "birth") of a completely new protein from a second frame. In viruses, in which overlapping genes are abundant, these new proteins often play a critical role in infection, yet they are frequently overlooked during genome annotation. This results in erroneous interpretation of mutational studies and in a significant waste of resources. Therefore, overlapping genes need to be correctly detected, especially since they are now thought to be abundant also in eukaryotes. Developing better detection methods and conducting systematic evolutionary studies require a large, reliable benchmark dataset of known cases. We thus assembled a high-quality dataset of 80 viral overlapping genes whose expression is experimentally proven. Many of them were not present in databases. We found that overall, overlapping genes differ significantly from non-overlapping genes in their nucleotide and amino acid composition. In particular, the proteins they encode are enriched in high-degeneracy amino acids and depleted in low-degeneracy ones, which may alleviate the evolutionary constraints acting on overlapping genes. Principal component analysis revealed that the vast majority of overlapping genes follow a similar composition bias, despite their heterogeneity in length and function. Six proven mammalian overlapping genes also followed this bias. We propose that this apparently near-universal composition bias may either favour the birth of overlapping genes, or/and result from selection pressure acting on them.

Subject(s)

Evolution, Molecular , Genes, Overlapping/genetics , Proteins/genetics , Amino Acid Sequence/genetics , Animals , Genes, Viral/genetics , Mammals/genetics , Mutation , Open Reading Frames/genetics , Principal Component Analysis

2.

Virus Variation Resource - improved response to emergent viral outbreaks.

Hatcher, Eneida L; Zhdanov, Sergey A; Bao, Yiming; Blinkova, Olga; Nawrocki, Eric P; Ostapchuck, Yuri; Schäffer, Alejandro A; Brister, J Rodney.

Nucleic Acids Res ; 45(D1): D482-D490, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27899678

ABSTRACT

The Virus Variation Resource is a value-added viral sequence data resource hosted by the National Center for Biotechnology Information. The resource is located at http://www.ncbi.nlm.nih.gov/genome/viruses/variation/ and includes modules for seven viral groups: influenza virus, Dengue virus, West Nile virus, Ebolavirus, MERS coronavirus, Rotavirus A and Zika virus. Each module is supported by pipelines that scan newly released GenBank records, annotate genes and proteins and parse sample descriptors and then map them to controlled vocabulary. These processes in turn support a purpose-built search interface where users can select sequences based on standardized gene, protein and metadata terms. Once sequences are selected, a suite of tools for downloading data, multi-sequence alignment and tree building supports a variety of user directed activities. This manuscript describes a series of features and functionalities recently added to the Virus Variation Resource.

Subject(s)

Computational Biology/methods , Disease Outbreaks , Genetic Variation , Software , Virus Diseases/epidemiology , Virus Diseases/virology , Viruses/genetics , Databases, Genetic

3.

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.

O'Leary, Nuala A; Wright, Mathew W; Brister, J Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S; Kodali, Vamsi K; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M; Murphy, Michael R; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H; Rausch, Daniel; Riddick, Lillian D; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E; Vatsan, Anjana R; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J; Kimchi, Avi.

Nucleic Acids Res ; 44(D1): D733-45, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26553804

ABSTRACT

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.

Subject(s)

Databases, Genetic , Genomics , Animals , Cattle , Gene Expression Profiling , Genome, Fungal , Genome, Human , Genome, Microbial , Genome, Plant , Genome, Viral , Genomics/standards , Humans , Invertebrates/genetics , Mice , Molecular Sequence Annotation , Nematoda/genetics , Phylogeny , RNA, Long Noncoding/genetics , Rats , Reference Standards , Sequence Analysis, Protein , Sequence Analysis, RNA , Vertebrates/genetics

4.

NCBI viral genomes resource.

Brister, J Rodney; Ako-Adjei, Danso; Bao, Yiming; Blinkova, Olga.

Nucleic Acids Res ; 43(Database issue): D571-7, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25428358

ABSTRACT

Recent technological innovations have ignited an explosion in virus genome sequencing that promises to fundamentally alter our understanding of viral biology and profoundly impact public health policy. Yet, any potential benefits from the billowing cloud of next generation sequence data hinge upon well implemented reference resources that facilitate the identification of sequences, aid in the assembly of sequence reads and provide reference annotation sources. The NCBI Viral Genomes Resource is a reference resource designed to bring order to this sequence shockwave and improve usability of viral sequence data. The resource can be accessed at http://www.ncbi.nlm.nih.gov/genome/viruses/ and catalogs all publicly available virus genome sequences and curates reference genome sequences. As the number of genome sequences has grown, so too have the difficulties in annotating and maintaining reference sequences. The rapid expansion of the viral sequence universe has forced a recalibration of the data model to better provide extant sequence representation and enhanced reference sequence products to serve the needs of the various viral communities. This, in turn, has placed increased emphasis on leveraging the knowledge of individual scientific communities to identify important viral sequences and develop well annotated reference virus genome sets.

Subject(s)

Databases, Nucleic Acid , Genome, Viral , High-Throughput Nucleotide Sequencing , Internet , Molecular Sequence Annotation , Software , Viruses/classification

5.

Nomenclature- and database-compatible names for the two Ebola virus variants that emerged in Guinea and the Democratic Republic of the Congo in 2014.

Kuhn, Jens H; Andersen, Kristian G; Baize, Sylvain; Bào, Yimíng; Bavari, Sina; Berthet, Nicolas; Blinkova, Olga; Brister, J Rodney; Clawson, Anna N; Fair, Joseph; Gabriel, Martin; Garry, Robert F; Gire, Stephen K; Goba, Augustine; Gonzalez, Jean-Paul; Günther, Stephan; Happi, Christian T; Jahrling, Peter B; Kapetshi, Jimmy; Kobinger, Gary; Kugelman, Jeffrey R; Leroy, Eric M; Maganga, Gael Darren; Mbala, Placide K; Moses, Lina M; Muyembe-Tamfum, Jean-Jacques; N'Faly, Magassouba; Nichol, Stuart T; Omilabu, Sunday A; Palacios, Gustavo; Park, Daniel J; Paweska, Janusz T; Radoshitzky, Sheli R; Rossi, Cynthia A; Sabeti, Pardis C; Schieffelin, John S; Schoepp, Randal J; Sealfon, Rachel; Swanepoel, Robert; Towner, Jonathan S; Wada, Jiro; Wauquier, Nadia; Yozwiak, Nathan L; Formenty, Pierre.

Viruses ; 6(11): 4760-99, 2014 Nov 24.

Article in English | MEDLINE | ID: mdl-25421896

ABSTRACT

In 2014, Ebola virus (EBOV) was identified as the etiological agent of a large and still expanding outbreak of Ebola virus disease (EVD) in West Africa and a much more confined EVD outbreak in Middle Africa. Epidemiological and evolutionary analyses confirmed that all cases of both outbreaks are connected to a single introduction each of EBOV into human populations and that both outbreaks are not directly connected. Coding-complete genomic sequence analyses of isolates revealed that the two outbreaks were caused by two novel EBOV variants, and initial clinical observations suggest that neither of them should be considered strains. Here we present consensus decisions on naming for both variants (West Africa: "Makona", Middle Africa: "Lomela") and provide database-compatible full, shortened, and abbreviated names that are in line with recently established filovirus sub-species nomenclatures.

Subject(s)

Ebolavirus/classification , Hemorrhagic Fever, Ebola/virology , Terminology as Topic , Democratic Republic of the Congo/epidemiology , Disease Outbreaks , Ebolavirus/genetics , Ebolavirus/isolation & purification , Guinea/epidemiology , Hemorrhagic Fever, Ebola/epidemiology , Humans , Phylogeny , RNA, Viral/genetics , Sequence Analysis, DNA

6.

Filovirus RefSeq entries: evaluation and selection of filovirus type variants, type sequences, and names.

Kuhn, Jens H; Andersen, Kristian G; Bào, Yimíng; Bavari, Sina; Becker, Stephan; Bennett, Richard S; Bergman, Nicholas H; Blinkova, Olga; Bradfute, Steven; Brister, J Rodney; Bukreyev, Alexander; Chandran, Kartik; Chepurnov, Alexander A; Davey, Robert A; Dietzgen, Ralf G; Doggett, Norman A; Dolnik, Olga; Dye, John M; Enterlein, Sven; Fenimore, Paul W; Formenty, Pierre; Freiberg, Alexander N; Garry, Robert F; Garza, Nicole L; Gire, Stephen K; Gonzalez, Jean-Paul; Griffiths, Anthony; Happi, Christian T; Hensley, Lisa E; Herbert, Andrew S; Hevey, Michael C; Hoenen, Thomas; Honko, Anna N; Ignatyev, Georgy M; Jahrling, Peter B; Johnson, Joshua C; Johnson, Karl M; Kindrachuk, Jason; Klenk, Hans-Dieter; Kobinger, Gary; Kochel, Tadeusz J; Lackemeyer, Matthew G; Lackner, Daniel F; Leroy, Eric M; Lever, Mark S; Mühlberger, Elke; Netesov, Sergey V; Olinger, Gene G; Omilabu, Sunday A; Palacios, Gustavo.

Viruses ; 6(9): 3663-82, 2014 Sep 26.

Article in English | MEDLINE | ID: mdl-25256396

ABSTRACT

Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information's (NCBI's) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [virus name (strain)/isolation host-suffix/country of sampling/year of sampling/genetic variant designation-isolate designation], we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences.

Subject(s)

Databases, Nucleic Acid , Filoviridae/genetics , Evolution, Molecular , Filoviridae/classification , Humans , Selection, Genetic

7.

Detection of members of the Tombusviridae in the Tallgrass Prairie Preserve, Osage County, Oklahoma, USA.

Scheets, Kay; Blinkova, Olga; Melcher, Ulrich; Palmer, Michael W; Wiley, Graham B; Ding, Tao; Roe, Bruce A.

Virus Res ; 160(1-2): 256-63, 2011 Sep.

Article in English | MEDLINE | ID: mdl-21762736

ABSTRACT

Viruses are most frequently discovered because they cause disease in organisms of importance to humans. To expand knowledge of plant-associated viruses beyond these narrow constraints, non-cultivated plants of the Tallgrass Prairie Preserve, Osage County, Oklahoma, USA were systematically surveyed for evidence of the presence of viruses. This report discusses viruses of the family Tombusviridae putatively identified by the survey. Evidence of two carmoviruses, a tombusvirus, a panicovirus and an unclassifiable tombusvirid was found. The complete genome sequence was obtained for putative TGP carmovirus 1 from the legume Lespedeza procumbens, and the virus was detected in several other plant species including the fern Pellaea atropurpurea. Phylogenetic analysis of the sequence and partial sequence of a related virus supported strongly the placement of these viruses in the genus Carmovirus. Polymorphisms in the sequences suggested existence of two populations of TGP carmovirus 1 in the study area and year-to-year variations in infection by TGP carmovirus 3.

Subject(s)

Plant Diseases/virology , Tombusviridae/classification , Tombusviridae/isolation & purification , Cluster Analysis , Lespedeza/virology , Models, Molecular , Molecular Sequence Data , Nucleic Acid Conformation , Oklahoma , Phylogeny , Pteridaceae/virology , RNA, Viral/genetics , Sequence Analysis, DNA , Tombusviridae/genetics

8.

Novel circular DNA viruses in stool samples of wild-living chimpanzees.

Blinkova, Olga; Victoria, Joseph; Li, Yingying; Keele, Brandon F; Sanz, Crickette; Ndjango, Jean-Bosco N; Peeters, Martine; Travis, Dominic; Lonsdorf, Elizabeth V; Wilson, Michael L; Pusey, Anne E; Hahn, Beatrice H; Delwart, Eric L.

J Gen Virol ; 91(Pt 1): 74-86, 2010 Jan.

Article in English | MEDLINE | ID: mdl-19759238

ABSTRACT

Viral particles in stool samples from wild-living chimpanzees were analysed using random PCR amplification and sequencing. Sequences encoding proteins distantly related to the replicase protein of single-stranded circular DNA viruses were identified. Inverse PCR was used to amplify and sequence multiple small circular DNA viral genomes. The viral genomes were related in size and genome organization to vertebrate circoviruses and plant geminiviruses but with a different location for the stem-loop structure involved in rolling circle DNA replication. The replicase genes of these viruses were most closely related to those of the much smaller (approximately 1 kb) plant nanovirus circular DNA chromosomes. Because the viruses have characteristics of both animal and plant viruses, we named them chimpanzee stool-associated circular viruses (ChiSCV). Further metagenomic studies of animal samples will greatly increase our knowledge of viral diversity and evolution.

Subject(s)

Animals, Wild/virology , DNA Virus Infections/veterinary , DNA Viruses/isolation & purification , DNA, Circular/genetics , DNA, Viral/genetics , Feces/virology , Pan troglodytes/virology , Amino Acid Sequence , Animals , Circovirus/genetics , DNA Virus Infections/virology , DNA Viruses/genetics , Geminiviridae/genetics , Genes, Viral , Models, Molecular , Molecular Sequence Data , Nanovirus/genetics , Nucleic Acid Conformation , Phylogeny , Polymerase Chain Reaction/methods , Sequence Alignment , Sequence Analysis, DNA , Sequence Homology

9.

A novel picornavirus associated with gastroenteritis.

Li, Linlin; Victoria, Joseph; Kapoor, Amit; Blinkova, Olga; Wang, Chunlin; Babrzadeh, Farbod; Mason, Carl J; Pandey, Prativa; Triki, Hinda; Bahri, Olfa; Oderinde, Bamidele Soji; Baba, Marycelin Mandu; Bukbuk, David Nadeba; Besser, John M; Bartkus, Joanne M; Delwart, Eric L.

J Virol ; 83(22): 12002-6, 2009 Nov.

Article in English | MEDLINE | ID: mdl-19759142

ABSTRACT

A novel picornavirus genome was sequenced, showing 42.6%, 35.2%, and 44.6% of deduced amino acid identities corresponding to the P1, P2, and P3 regions, respectively, of the Aichi virus. Divergent strains of this new virus, which we named salivirus, were detected in 18 stool samples from Nigeria, Tunisia, Nepal, and the United States. A statistical association was seen between virus shedding and unexplained cases of gastroenteritis in Nepal (P = 0.0056). Viruses with approximately 90% nucleotide similarity, named klassevirus, were also recently reported in three cases of unexplained diarrhea from the United States and Australia and in sewage from Spain, reflecting a global distribution and supporting a pathogenic role for this new group of picornaviruses.

Subject(s)

Gastroenteritis/virology , Picornaviridae Infections/virology , Picornaviridae/genetics , Amino Acid Sequence , Base Sequence , Genome, Viral/genetics , Humans , Molecular Sequence Data , Phylogeny , Reverse Transcriptase Polymerase Chain Reaction , Viral Proteins/genetics

10.

Metagenomic analyses of viruses in stool samples from children with acute flaccid paralysis.

Victoria, Joseph G; Kapoor, Amit; Li, Linlin; Blinkova, Olga; Slikas, Beth; Wang, Chunlin; Naeem, Asif; Zaidi, Sohail; Delwart, Eric.

J Virol ; 83(9): 4642-51, 2009 May.

Article in English | MEDLINE | ID: mdl-19211756

ABSTRACT

We analyzed viral nucleic acids in stool samples collected from 35 South Asian children with nonpolio acute flaccid paralysis (AFP). Sequence-independent reverse transcription and PCR amplification of capsid-protected, nuclease-resistant viral nucleic acids were followed by DNA sequencing and sequence similarity searches. Limited Sanger sequencing (35 to 240 subclones per sample) identified an average of 1.4 distinct eukaryotic viruses per sample, while pyrosequencing yielded 2.6 viruses per sample. In addition to bacteriophage and plant viruses, we detected known enteric viruses, including rotavirus, adenovirus, picobirnavirus, and human enterovirus species A (HEV-A) to HEV-C, as well as numerous other members of the Picornaviridae family, including parechovirus, Aichi virus, rhinovirus, and human cardiovirus. The viruses with the most divergent sequences relative to those of previously reported viruses included members of a novel Picornaviridae genus and four new viral species (members of the Dicistroviridae, Nodaviridae, and Circoviridae families and the Bocavirus genus). Samples from six healthy contacts of AFP patients were similarly analyzed and also contained numerous viruses, particularly HEV-C, including a potentially novel Enterovirus genotype. Determining the prevalences and pathogenicities of the novel genotypes, species, genera, and potential new viral families identified in this study in different demographic groups will require further studies with different demographic and patient groups, now facilitated by knowledge of these viral genomes.

Subject(s)

Feces/virology , Genome, Viral/genetics , Neurosyphilis/virology , Acute Disease , Adolescent , Asia/epidemiology , Case-Control Studies , Child , Child, Preschool , Enterovirus/classification , Enterovirus/genetics , Enterovirus Infections/epidemiology , Enterovirus Infections/virology , Female , Health , Humans , Infant , Male , Neurosyphilis/blood , Neurosyphilis/epidemiology , Phylogeny , Sequence Analysis, DNA

11.

Cardioviruses are genetically diverse and cause common enteric infections in South Asian children.

Blinkova, Olga; Kapoor, Amit; Victoria, Joseph; Jones, Morris; Wolfe, Nathan; Naeem, Asif; Shaukat, Shahzad; Sharif, Salmaan; Alam, Muhammad Masroor; Angez, Mehar; Zaidi, Sohail; Delwart, Eric L.

J Virol ; 83(9): 4631-41, 2009 May.

Article in English | MEDLINE | ID: mdl-19193786

ABSTRACT

Cardioviruses cause enteric infections in mice and rats which when disseminated have been associated with myocarditis, type 1 diabetes, encephalitis, and multiple sclerosis-like symptoms. Cardioviruses have also been detected at lower frequencies in other mammals. The Cardiovirus genus within the Picornaviridae family is currently made up of two viral species, Theilovirus and Encephalomyocarditis virus. Until recently, only a single strain of cardioviruses (Vilyuisk virus within the Theilovirus species) associated with a geographically restricted and prevalent encephalitis-like condition had been reported to occur in humans. A second theilovirus-related cardiovirus (Saffold virus [SAFV]) was reported in 2007 and subsequently found in respiratory secretions from children with respiratory problems and in stools of both healthy and diarrheic children. Using viral metagenomics, we identified RNA fragments related to SAFV in the stools of Pakistani and Afghani children with nonpolio acute flaccid paralysis (AFP). We sequenced three near-full-length genomes, showing the presence of divergent strains of SAFV and preliminary evidence of a distant recombination event between the ancestors of the Theiler-like viruses of rats and those of human SAFV. Further VP1 sequencing showed the presence of five new SAFV genotypes, doubling the reported genetic diversity of human and animal theiloviruses combined. Both AFP patients and healthy children in Pakistan were found to be excreting SAFV at high frequencies of 9 and 12%, respectively. Further studies are needed to examine the roles of these highly common and diverse SAFV genotypes in nonpolio AFP and other human diseases.

Subject(s)

Cardiovirus Infections/epidemiology , Cardiovirus Infections/virology , Cardiovirus/genetics , Cardiovirus/isolation & purification , Genetic Variation/genetics , Intestinal Diseases/epidemiology , Intestinal Diseases/virology , Acute Disease , Amino Acid Sequence , Animals , Asia/epidemiology , Capsid Proteins/chemistry , Capsid Proteins/classification , Capsid Proteins/genetics , Capsid Proteins/metabolism , Cardiovirus/classification , Cardiovirus/metabolism , Case-Control Studies , Child, Preschool , Genome, Viral/genetics , Genotype , Health , Humans , Molecular Sequence Data , Muscle Hypotonia/virology , Phylogeny , Recombination, Genetic/genetics , Sequence Alignment , Sequence Analysis , Sequence Homology, Amino Acid
