Search | VHL Regional Portal

Evidence for stabilizing selection on codon usage in chromosomal rearrangements of Drosophila pseudoobscura.

Fuller, Zachary L; Haynes, Gwilym D; Zhu, Dianhui; Batterton, Matthew; Chao, Hsu; Dugan, Shannon; Javaid, Mehwish; Jayaseelan, Joy C; Lee, Sandra; Li, Mingmei; Ongeri, Fiona; Qi, Sulan; Han, Yi; Doddapaneni, Harshavardhan; Richards, Stephen; Schaeffer, Stephen W.

G3 (Bethesda) ; 4(12): 2433-49, 2014 Oct 17.

Article in English | MEDLINE | ID: mdl-25326424

ABSTRACT

There has been a renewed interest in investigating the role of stabilizing selection acting on genome-wide traits such as codon usage bias. Codon bias, when synonymous codons are used at unequal frequencies, occurs in a wide variety of taxa. Standard evolutionary models explain the maintenance of codon bias through a balance of genetic drift, mutation and weak purifying selection. The efficacy of selection is expected to be reduced in regions of suppressed recombination. Contrary to observations in Drosophila melanogaster, some recent studies have failed to detect a relationship between the recombination rate, intensity of selection acting at synonymous sites, and the magnitude of codon bias as predicted under these standard models. Here, we examined codon bias in 2798 protein coding loci on the third chromosome of D. pseudoobscura using whole-genome sequences of 47 individuals, representing five common third chromosome gene arrangements. Fine-scale recombination maps were constructed using more than 1 million segregating sites. As expected, recombination was demonstrated to be significantly suppressed between chromosome arrangements, allowing for a direct examination of the relationship between recombination, selection, and codon bias. As with other Drosophila species, we observe a strong mutational bias away from the most frequently used codons. We find the rate of synonymous and nonsynonymous polymorphism is variable between different amino acids. However, we do not observe a reduction in codon bias or the strength of selection in regions of suppressed recombination as expected. Instead, we find that the interaction between weak stabilizing selection and mutational bias likely plays a role in shaping the composition of synonymous codons across the third chromosome in D. pseudoobscura.
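One common way to quantify the unequal use of synonymous codons described above is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonyms for that amino acid were used equally. The sketch below is illustrative only. The abstract does not state which codon bias statistic the authors used, and the function name `rscu`, the two-family codon table, and the toy sequence are assumptions.

```python
from collections import Counter

# Synonymous codon families for two amino acids only (standard genetic code).
# Assumption: a real analysis would cover every degenerate amino acid.
SYNONYMOUS = {
    "Lys": ["AAA", "AAG"],
    "Phe": ["TTT", "TTC"],
}


def rscu(cds):
    """Return RSCU per codon: observed count / count expected under equal usage."""
    # Split the coding sequence into complete codons, ignoring a trailing partial one.
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    result = {}
    for family in SYNONYMOUS.values():
        total = sum(counts[codon] for codon in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(family)
        for codon in family:
            result[codon] = counts[codon] / expected
    return result


# Hypothetical toy sequence: Lys encoded by AAG x3 and AAA x1, Phe by TTC x2.
example = rscu("AAGAAGAAGAAATTCTTC")
```

An RSCU of 1 indicates no bias within the family; values above 1 mark codons used more often than expected.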

Subject(s)

Chromosomes/genetics , Drosophila/genetics , Animals , Codon , High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide , Recombination, Genetic , Selection, Genetic , Sequence Analysis, DNA

Natural variation in genome architecture among 205 Drosophila melanogaster Genetic Reference Panel lines.

Huang, Wen; Massouras, Andreas; Inoue, Yutaka; Peiffer, Jason; Ràmia, Miquel; Tarone, Aaron M; Turlapati, Lavanya; Zichner, Thomas; Zhu, Dianhui; Lyman, Richard F; Magwire, Michael M; Blankenburg, Kerstin; Carbone, Mary Anna; Chang, Kyle; Ellis, Lisa L; Fernandez, Sonia; Han, Yi; Highnam, Gareth; Hjelmen, Carl E; Jack, John R; Javaid, Mehwish; Jayaseelan, Joy; Kalra, Divya; Lee, Sandy; Lewis, Lora; Munidasa, Mala; Ongeri, Fiona; Patel, Shohba; Perales, Lora; Perez, Agapito; Pu, LingLing; Rollmann, Stephanie M; Ruth, Robert; Saada, Nehad; Warner, Crystal; Williams, Aneisa; Wu, Yuan-Qing; Yamamoto, Akihiko; Zhang, Yiqing; Zhu, Yiming; Anholt, Robert R H; Korbel, Jan O; Mittelman, David; Muzny, Donna M; Gibbs, Richard A; Barbadilla, Antonio; Johnston, J Spencer; Stone, Eric A; Richards, Stephen; Deplancke, Bart.

Genome Res ; 24(7): 1193-208, 2014 Jul.

Article in English | MEDLINE | ID: mdl-24714809

ABSTRACT

The Drosophila melanogaster Genetic Reference Panel (DGRP) is a community resource of 205 sequenced inbred lines, derived to improve our understanding of the effects of naturally occurring genetic variation on molecular and organismal phenotypes. We used an integrated genotyping strategy to identify 4,853,802 single nucleotide polymorphisms (SNPs) and 1,296,080 non-SNP variants. Our molecular population genomic analyses show higher deletion than insertion mutation rates and stronger purifying selection on deletions. Weaker selection on insertions than deletions is consistent with our observed distribution of genome size determined by flow cytometry, which is skewed toward larger genomes. Insertion/deletion and single nucleotide polymorphisms are positively correlated with each other and with local recombination, suggesting that their nonrandom distributions are due to hitchhiking and background selection. Our cytogenetic analysis identified 16 polymorphic inversions in the DGRP. Common inverted and standard karyotypes are genetically divergent and account for most of the variation in relatedness among the DGRP lines. Intriguingly, variation in genome size and many quantitative traits are significantly associated with inversions. Approximately 50% of the DGRP lines are infected with Wolbachia, and four lines have germline insertions of Wolbachia sequences, but effects of Wolbachia infection on quantitative traits are rarely significant. The DGRP complements ongoing efforts to functionally annotate the Drosophila genome. Indeed, 15% of all D. melanogaster genes segregate for potentially damaged proteins in the DGRP, and genome-wide analyses of quantitative traits identify novel candidate genes. The DGRP lines, sequence data, genotypes, quality scores, phenotypes, and analysis and visualization tools are publicly available.

Subject(s)

Drosophila melanogaster/genetics , Genetic Variation , Genome, Insect , Phenotype , Animals , Chromatin/genetics , Chromatin/metabolism , Drosophila melanogaster/microbiology , Female , Genetic Linkage , Genome Size , Genome-Wide Association Study , Genotype , Genotyping Techniques , High-Throughput Nucleotide Sequencing , INDEL Mutation , Linkage Disequilibrium , Male , Molecular Sequence Annotation , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable , Reproducibility of Results

Finding the missing honey bee genes: lessons learned from a genome upgrade.

Elsik, Christine G; Worley, Kim C; Bennett, Anna K; Beye, Martin; Camara, Francisco; Childers, Christopher P; de Graaf, Dirk C; Debyser, Griet; Deng, Jixin; Devreese, Bart; Elhaik, Eran; Evans, Jay D; Foster, Leonard J; Graur, Dan; Guigo, Roderic; Hoff, Katharina Jasmin; Holder, Michael E; Hudson, Matthew E; Hunt, Greg J; Jiang, Huaiyang; Joshi, Vandita; Khetani, Radhika S; Kosarev, Peter; Kovar, Christie L; Ma, Jian; Maleszka, Ryszard; Moritz, Robin F A; Munoz-Torres, Monica C; Murphy, Terence D; Muzny, Donna M; Newsham, Irene F; Reese, Justin T; Robertson, Hugh M; Robinson, Gene E; Rueppell, Olav; Solovyev, Victor; Stanke, Mario; Stolle, Eckart; Tsuruda, Jennifer M; Vaerenbergh, Matthias Van; Waterhouse, Robert M; Weaver, Daniel B; Whitfield, Charles W; Wu, Yuanqing; Zdobnov, Evgeny M; Zhang, Lan; Zhu, Dianhui; Gibbs, Richard A.

BMC Genomics ; 15: 86, 2014 Jan 30.

Article in English | MEDLINE | ID: mdl-24479613

ABSTRACT

BACKGROUND: The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. RESULTS: Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. CONCLUSIONS: Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.

Subject(s)

Bees/genetics , Genes, Insect , Animals , Base Composition , Databases, Genetic , Interspersed Repetitive Sequences/genetics , Molecular Sequence Annotation , Open Reading Frames/genetics , Peptides/analysis , Sequence Analysis, RNA , Sequence Homology, Amino Acid

Epistasis dominates the genetic architecture of Drosophila quantitative traits.

Huang, Wen; Richards, Stephen; Carbone, Mary Anna; Zhu, Dianhui; Anholt, Robert R H; Ayroles, Julien F; Duncan, Laura; Jordan, Katherine W; Lawrence, Faye; Magwire, Michael M; Warner, Crystal B; Blankenburg, Kerstin; Han, Yi; Javaid, Mehwish; Jayaseelan, Joy; Jhangiani, Shalini N; Muzny, Donna; Ongeri, Fiona; Perales, Lora; Wu, Yuan-Qing; Zhang, Yiqing; Zou, Xiaoyan; Stone, Eric A; Gibbs, Richard A; Mackay, Trudy F C.

Proc Natl Acad Sci U S A ; 109(39): 15553-9, 2012 Sep 25.

Article in English | MEDLINE | ID: mdl-22949659

ABSTRACT

Epistasis (nonlinear genetic interactions between polymorphic loci) is the genetic basis of canalization and speciation, and epistatic interactions can be used to infer genetic networks affecting quantitative traits. However, the role that epistasis plays in the genetic architecture of quantitative traits is controversial. Here, we compared the genetic architecture of three Drosophila life history traits in the sequenced inbred lines of the Drosophila melanogaster Genetic Reference Panel (DGRP) and a large outbred, advanced intercross population derived from 40 DGRP lines (Flyland). We assessed allele frequency changes between pools of individuals at the extremes of the distribution for each trait in the Flyland population by deep DNA sequencing. The genetic architecture of all traits was highly polygenic in both analyses. Surprisingly, none of the SNPs associated with the traits in Flyland replicated in the DGRP and vice versa. However, the majority of these SNPs participated in at least one epistatic interaction in the DGRP. Despite apparent additive effects at largely distinct loci in the two populations, the epistatic interactions perturbed common, biologically plausible, and highly connected genetic networks. Our analysis underscores the importance of epistasis as a principal factor that determines variation for quantitative traits and provides a means to uncover genetic networks affecting these traits. Knowledge of epistatic networks will contribute to our understanding of the genetic basis of evolutionarily and clinically important traits and enhance predictive ability at an individualized level in medicine and agriculture.

Subject(s)

Epistasis, Genetic/physiology , Genes, Insect/physiology , Quantitative Trait, Heritable , Animals , Drosophila melanogaster , Polymorphism, Single Nucleotide

Using whole-genome sequence data to predict quantitative trait phenotypes in Drosophila melanogaster.

Ober, Ulrike; Ayroles, Julien F; Stone, Eric A; Richards, Stephen; Zhu, Dianhui; Gibbs, Richard A; Stricker, Christian; Gianola, Daniel; Schlather, Martin; Mackay, Trudy F C; Simianer, Henner.

PLoS Genet ; 8(5): e1002685, 2012.

Article in English | MEDLINE | ID: mdl-22570636

ABSTRACT

Predicting organismal phenotypes from genotype data is important for plant and animal breeding, medicine, and evolutionary biology. Genomic-based phenotype prediction has been applied for single-nucleotide polymorphism (SNP) genotyping platforms, but not using complete genome sequences. Here, we report genomic prediction for starvation stress resistance and startle response in Drosophila melanogaster, using ~2.5 million SNPs determined by sequencing the Drosophila Genetic Reference Panel population of inbred lines. We constructed a genomic relationship matrix from the SNP data and used it in a genomic best linear unbiased prediction (GBLUP) model. We assessed predictive ability as the correlation between predicted genetic values and observed phenotypes by cross-validation, and found a predictive ability of 0.239±0.008 (0.230±0.012) for starvation resistance (startle response). The predictive ability of BayesB, a Bayesian method with internal SNP selection, was not greater than GBLUP. Selection of the 5% SNPs with either the highest absolute effect or variance explained did not improve predictive ability. Predictive ability decreased only when fewer than 150,000 SNPs were used to construct the genomic relationship matrix. We hypothesize that predictive power in this population stems from the SNP-based modeling of the subtle relationship structure caused by long-range linkage disequilibrium and not from population structure or SNPs in linkage disequilibrium with causal variants. We discuss the implications of these results for genomic prediction in other organisms.
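The genomic relationship matrix mentioned above can be sketched as follows. This is a minimal illustration that assumes the widely used VanRaden-style formulation, G = ZZ' / (2 Σ p(1 − p)), with genotypes coded as allele counts 0, 1, or 2. The abstract does not specify the exact construction the authors used, and the function name and toy genotypes are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Build a VanRaden-style relationship matrix (assumed formulation).

    genotypes: list of individuals, each a list of allele counts (0, 1, 2) per SNP.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    # Scaling term: twice the summed p(1 - p) across SNPs.
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; matrix is undefined")
    # Center each genotype by its expected value 2p.
    centered = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in genotypes]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / scale
         for k in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: four inbred lines (homozygous, so 0 or 2) at three SNPs.
G = genomic_relationship_matrix([[0, 2, 2], [2, 0, 2], [0, 2, 0], [2, 0, 0]])
```

In a GBLUP model this matrix would serve as the covariance structure of the random genetic effects; fitting that model is outside the scope of this sketch.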

Subject(s)

Drosophila melanogaster/genetics , Genome, Insect , Genotype , Quantitative Trait Loci , Animals , Bayes Theorem , Chromosome Mapping , Genetics, Population , Linkage Disequilibrium , Models, Genetic , Models, Theoretical , Phenotype , Polymorphism, Single Nucleotide , Selection, Genetic , Sequence Analysis, DNA

The Drosophila melanogaster Genetic Reference Panel.

Mackay, Trudy F C; Richards, Stephen; Stone, Eric A; Barbadilla, Antonio; Ayroles, Julien F; Zhu, Dianhui; Casillas, Sònia; Han, Yi; Magwire, Michael M; Cridland, Julie M; Richardson, Mark F; Anholt, Robert R H; Barrón, Maite; Bess, Crystal; Blankenburg, Kerstin Petra; Carbone, Mary Anna; Castellano, David; Chaboub, Lesley; Duncan, Laura; Harris, Zeke; Javaid, Mehwish; Jayaseelan, Joy Christina; Jhangiani, Shalini N; Jordan, Katherine W; Lara, Fremiet; Lawrence, Faye; Lee, Sandra L; Librado, Pablo; Linheiro, Raquel S; Lyman, Richard F; Mackey, Aaron J; Munidasa, Mala; Muzny, Donna Marie; Nazareth, Lynne; Newsham, Irene; Perales, Lora; Pu, Ling-Ling; Qu, Carson; Ràmia, Miquel; Reid, Jeffrey G; Rollmann, Stephanie M; Rozas, Julio; Saada, Nehad; Turlapati, Lavanya; Worley, Kim C; Wu, Yuan-Qing; Yamamoto, Akihiko; Zhu, Yiming; Bergman, Casey M; Thornton, Kevin R.

Nature ; 482(7384): 173-8, 2012 Feb 08.

Article in English | MEDLINE | ID: mdl-22318601

ABSTRACT

A major challenge of biology is understanding the relationship between molecular genetic variation and variation in quantitative traits, including fitness. This relationship determines our ability to predict phenotypes from genotypes and to understand how evolutionary forces shape variation within and between species. Previous efforts to dissect the genotype-phenotype map were based on incomplete genotypic information. Here, we describe the Drosophila melanogaster Genetic Reference Panel (DGRP), a community resource for analysis of population genomics and quantitative traits. The DGRP consists of fully sequenced inbred lines derived from a natural population. Population genomic analyses reveal reduced polymorphism in centromeric autosomal regions and the X chromosome, evidence for positive and negative selection, and rapid evolution of the X chromosome. Many variants in novel genes, most at low frequency, are associated with quantitative traits and explain a large fraction of the phenotypic variance. The DGRP facilitates genotype-phenotype mapping using the power of Drosophila genetics.

Subject(s)

Drosophila melanogaster/genetics , Genome-Wide Association Study , Genomics , Quantitative Trait Loci/genetics , Alleles , Animals , Centromere/genetics , Chromosomes, Insect/genetics , Genotype , Phenotype , Polymorphism, Single Nucleotide/genetics , Selection, Genetic/genetics , Starvation/genetics , Telomere/genetics , X Chromosome/genetics

STITCH: algorithm to splice, trim, identify, track, and capture the uniqueness of 16S rRNAs sequence pairs using public or in-house database.

Zhu, Dianhui; Vaishampayan, Parag A; Venkateswaran, Kasthuri; Fox, George E.

Microb Ecol ; 61(3): 669-75, 2011 Apr.

Article in English | MEDLINE | ID: mdl-21113709

ABSTRACT

A comparison of variable regions within the 16S rRNA gene is widely used to characterize relationships between bacteria and to identify phylogenetic affiliation of unknown bacteria. In environmental studies, polymerase chain reaction amplification of 16S rRNA followed by cloning and sequencing of numerous individual clones is an extensively used molecular method for elucidating microbial diversity. The sequencing process typically utilizes a forward and reverse primer pair to produce two partial reads (~700 to 800 base pairs each) that overlap and in total cover a large region of the full 16S rRNA sequence (~1.5 k base). In a typical application, this approach rapidly generates very large numbers of 16S rRNA datasets that can overwhelm manual processing efforts leading to both delays and errors. In particular, the approach presents two computational challenges: (1) the assembly of a composite sequence from the two partial reads and (2) the subsequent appropriate identification of the organism represented by the newly sequenced clones. Herein, we describe a software package, search, trim, identify, track, and capture the uniqueness of 16S rRNAs using public and in-house database (STITCH), which offers automated sequence pair splicing and genetic identification, thus simplifying the computationally intensive analysis of large sequencing libraries. The STITCH software is freely accessible over the Internet at: http://prion.bchs.uh.edu/stitch/.
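The first computational challenge named above, assembling a composite sequence from a forward and a reverse read, can be illustrated with a simple exact-overlap merge. This is a sketch under stated assumptions, not the STITCH algorithm, which the abstract does not describe. The function names, the `min_overlap` parameter, and the toy reads are hypothetical, and real reads would need mismatch-tolerant alignment and quality trimming.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def splice_pair(forward, reverse, min_overlap=4):
    """Join a forward read with the reverse-complemented reverse read.

    Looks for the longest exact match between the end of the forward read and
    the start of the reverse-complemented reverse read. Returns the merged
    sequence, or None if no overlap of at least min_overlap bases exists.
    """
    rc = reverse_complement(reverse)
    # Try the longest candidate overlap first and shrink toward min_overlap.
    for k in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward.endswith(rc[:k]):
            return forward + rc[k:]
    return None


# Hypothetical toy reads sharing a 5-base overlap (TTGCA).
merged = splice_pair("ACGTTGCA", "TTCCTGCAA")
unmerged = splice_pair("AAAA", "CCCC")
```

Searching from the longest overlap downward favors the most specific join; a production tool would likely score approximate overlaps instead of requiring exact identity.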

Subject(s)

Algorithms , Computational Biology/methods , Databases, Nucleic Acid , RNA, Ribosomal, 16S/genetics , Sequence Analysis, RNA/methods , Bacteria/classification , Bacteria/genetics , RNA Splicing , Software , User-Computer Interface

RECOVIR: an application package to automatically identify some single stranded RNA viruses using capsid protein residues that uniquely distinguish among these viruses.

Zhu, Dianhui; Fox, George E; Chakravarty, Sugoto.

BMC Bioinformatics ; 8: 379, 2007 Oct 10.

Article in English | MEDLINE | ID: mdl-17927830

ABSTRACT

BACKGROUND: Most single stranded RNA (ssRNA) viruses mutate rapidly to generate large number of strains having highly divergent capsid sequences. Accurate strain recognition in uncharacterized target capsid sequences is essential for epidemiology, diagnostics, and vaccine development. Strain recognition based on similarity scores between target sequences and sequences of homology matched reference strains is often time consuming and ambiguous. This is especially true if only partial target sequences are available or if different ssRNA virus families are jointly analyzed. In such cases, knowledge of residues that uniquely distinguish among known reference strains is critical for rapid and unambiguous strain identification. Conventional sequence comparisons are unable to identify such capsid residues due to high sequence divergence among the ssRNA virus reference strains. Consequently, automated general methods to reliably identify strains using strain distinguishing residues are not currently available. RESULTS: We present here RECOVIR ("recognize viruses"), a software tool to automatically detect strains of caliciviruses and picornaviruses by comparing their capsid residues with built-in databases of residues that uniquely distinguish among known reference strains of these viruses. The databases were created by constructing partitioned phylogenetic trees of complete capsid sequences of these viruses. Strains were correctly identified for more than 300 complete and partial target sequences by comparing the database residues with the aligned residues of these sequences. It required about 5 seconds of real time to process each sequence. A Java-based user interface coupled with Perl-coded computational modules ensures high portability of the software. RECOVIR currently runs on Windows XP and Linux platforms. The software generalizes a manual method briefly outlined earlier for human caliciviruses. 
CONCLUSION: This study shows implementation of an automated method to identify virus strains using databases of capsid residues. The method is implemented to detect strains of caliciviruses and picornaviruses, two of the most highly divergent ssRNA virus families, and therefore, especially difficult to identify using a uniform method. It is feasible to incorporate the approach into classification schemes of caliciviruses and picornaviruses and to extend the approach to recognize and classify other ssRNA virus families.

Subject(s)

Capsid Proteins/chemistry , Chromosome Mapping/methods , Pattern Recognition, Automated/methods , RNA Viruses/isolation & purification , RNA Viruses/metabolism , Sequence Analysis, Protein/methods , Software , Algorithms , Amino Acid Sequence , Artificial Intelligence , Molecular Sequence Data
