Search | VHL Regional Portal

1.

Publisher Correction: Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza.

Stein, Joshua C; Yu, Yeisoo; Copetti, Dario; Zwickl, Derrick J; Zhang, Li; Zhang, Chengjun; Chougule, Kapeel; Gao, Dongying; Iwata, Aiko; Goicoechea, Jose Luis; Wei, Sharon; Wang, Jun; Liao, Yi; Wang, Muhua; Jacquemin, Julie; Becker, Claude; Kudrna, Dave; Zhang, Jianwei; Londono, Carlos E M; Song, Xiang; Lee, Seunghee; Sanchez, Paul; Zuccolo, Andrea; Ammiraju, Jetty S S; Talag, Jayson; Danowitz, Ann; Rivera, Luis F; Gschwend, Andrea R; Noutsos, Christos; Wu, Cheng-Chieh; Kao, Shu-Min; Zeng, Jhih-Wun; Wei, Fu-Jin; Zhao, Qiang; Feng, Qi; El Baidouri, Moaine; Carpentier, Marie-Christine; Lasserre, Eric; Cooke, Richard; da Rosa Farias, Daniel; da Maia, Luciano Carlos; Dos Santos, Railson S; Nyberg, Kevin G; McNally, Kenneth L; Mauleon, Ramil; Alexandrov, Nickolai; Schmutz, Jeremy; Flowers, Dave; Fan, Chuanzhu; Weigel, Detlef.

Nat Genet ; 50(11): 1618, 2018 Nov.

Article in English | MEDLINE | ID: mdl-30291357

ABSTRACT

This article was not made open access when initially published online, which was corrected before print publication. In addition, ORCID links were missing for 12 authors and have been added to the HTML and PDF versions of the article.

2.

The prevalence of terraced treescapes in analyses of phylogenetic data sets.

Dobrin, Barbara H; Zwickl, Derrick J; Sanderson, Michael J.

BMC Evol Biol ; 18(1): 46, 2018 Apr 04.

Article in English | MEDLINE | ID: mdl-29618314

ABSTRACT

BACKGROUND: The pattern of data availability in a phylogenetic data set may lead to the formation of terraces, collections of equally optimal trees. Terraces can arise in tree space if trees are scored with parsimony or with partitioned, edge-unlinked maximum likelihood. Theory predicts that terraces can be large, but their prevalence in contemporary data sets has never been surveyed. We selected 26 data sets and phylogenetic trees reported in recent literature and investigated the terraces to which the trees would belong, under a common set of inference assumptions. We examined terrace size as a function of the sampling properties of the data sets, including taxon coverage density (the proportion of taxon-by-gene positions with any data present) and a measure of gene sampling "sufficiency". We evaluated each data set in relation to the theoretical minimum gene sampling depth needed to reduce terrace size to a single tree, and explored the impact of the terraces found in replicate trees in bootstrap methods. RESULTS: Terraces were identified in nearly all data sets with taxon coverage densities < 0.90. They were not found, however, in high-coverage-density (i.e., ≥ 0.94) transcriptomic and genomic data sets. The terraces could be very large, and size varied inversely with taxon coverage density and with gene sampling sufficiency. Few data sets achieved a theoretical minimum gene sampling depth needed to reduce terrace size to a single tree. Terraces found during bootstrap resampling reduced overall support. CONCLUSIONS: If certain inference assumptions apply, trees estimated from empirical data sets often belong to large terraces of equally optimal trees. Terrace size correlates to data set sampling properties. Data sets seldom include enough genes to reduce terrace size to one tree. When bootstrap replicate trees lie on a terrace, statistical support for phylogenetic hypotheses may be reduced. Although some of the published analyses surveyed were conducted with edge-linked inference models (which do not induce terraces), unlinked models have been used and advocated. The present study describes the potential impact of that inference assumption on phylogenetic inference in the context of the kinds of multigene data sets now widely assembled for large-scale tree construction.
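
The abstract above defines taxon coverage density as the proportion of taxon-by-gene positions with any data present. A minimal Python sketch of that calculation follows. The function name, the dictionary layout, and the toy taxa and loci are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List, Set


def taxon_coverage_density(coverage: Dict[str, Set[str]], loci: List[str]) -> float:
    """Proportion of taxon-by-gene cells that contain any data.

    `coverage` maps each taxon to the set of loci sampled for it
    (a hypothetical layout chosen for this sketch).
    """
    total_cells = len(coverage) * len(loci)
    if total_cells == 0:
        raise ValueError("need at least one taxon and one locus")
    # Count the cells of the taxon-by-gene matrix that are filled.
    filled = sum(1 for genes in coverage.values() for locus in loci if locus in genes)
    return filled / total_cells


# Hypothetical toy matrix: 4 taxa by 3 loci, 8 of 12 cells filled.
toy_loci = ["gene1", "gene2", "gene3"]
toy_coverage = {
    "taxonA": {"gene1", "gene2", "gene3"},
    "taxonB": {"gene1", "gene2"},
    "taxonC": {"gene1"},
    "taxonD": {"gene2", "gene3"},
}
density = taxon_coverage_density(toy_coverage, toy_loci)
print(f"taxon coverage density = {density:.3f}")
```

In this toy case the density is 8/12, well below the 0.90 threshold under which the abstract reports terraces in nearly all data sets. The sketch only computes the density. It does not detect or enumerate terraces.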

Subject(s)

Databases, Genetic , Phylogeny , Genes , Models, Genetic

3.

Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza.

Stein, Joshua C; Yu, Yeisoo; Copetti, Dario; Zwickl, Derrick J; Zhang, Li; Zhang, Chengjun; Chougule, Kapeel; Gao, Dongying; Iwata, Aiko; Goicoechea, Jose Luis; Wei, Sharon; Wang, Jun; Liao, Yi; Wang, Muhua; Jacquemin, Julie; Becker, Claude; Kudrna, Dave; Zhang, Jianwei; Londono, Carlos E M; Song, Xiang; Lee, Seunghee; Sanchez, Paul; Zuccolo, Andrea; Ammiraju, Jetty S S; Talag, Jayson; Danowitz, Ann; Rivera, Luis F; Gschwend, Andrea R; Noutsos, Christos; Wu, Cheng-Chieh; Kao, Shu-Min; Zeng, Jhih-Wun; Wei, Fu-Jin; Zhao, Qiang; Feng, Qi; El Baidouri, Moaine; Carpentier, Marie-Christine; Lasserre, Eric; Cooke, Richard; da Rosa Farias, Daniel; da Maia, Luciano Carlos; Dos Santos, Railson S; Nyberg, Kevin G; McNally, Kenneth L; Mauleon, Ramil; Alexandrov, Nickolai; Schmutz, Jeremy; Flowers, Dave; Fan, Chuanzhu; Weigel, Detlef.

Nat Genet ; 50(2): 285-296, 2018 Feb.

Article in English | MEDLINE | ID: mdl-29358651

ABSTRACT

The genus Oryza is a model system for the study of molecular evolution over time scales ranging from a few thousand to 15 million years. Using 13 reference genomes spanning the Oryza species tree, we show that despite few large-scale chromosomal rearrangements, rapid species diversification is mirrored by lineage-specific emergence and turnover of many novel elements, including transposons, and potential new coding and noncoding genes. Our study resolves controversial areas of the Oryza phylogeny, showing a complex history of introgression among different chromosomes in the young 'AA' subclade containing the two domesticated species. This study highlights the prevalence of functionally coupled disease resistance genes and identifies many new haplotypes of potential use for future crop protection. Finally, this study marks a milestone in modern rice research with the release of a complete long-read assembly of IR 8 'Miracle Rice', which relieved famine and drove the Green Revolution in Asia 50 years ago.

Subject(s)

Crops, Agricultural/genetics , Evolution, Molecular , Genetic Variation , Oryza/classification , Oryza/genetics , Conserved Sequence , Domestication , Genetic Speciation , Genome, Plant , Phylogeny

4.

Impacts of Terraces on Phylogenetic Inference.

Sanderson, Michael J; McMahon, Michelle M; Stamatakis, Alexandros; Zwickl, Derrick J; Steel, Mike.

Syst Biol ; 64(5): 709-26, 2015 Sep.

Article in English | MEDLINE | ID: mdl-25999395

ABSTRACT

Terraces are sets of trees with precisely the same likelihood or parsimony score, which can be induced by missing sequences in partitioned multi-locus phylogenetic data matrices. The potentially large set of trees on a terrace can be characterized by enumeration algorithms or consensus methods that exploit the pattern of partial taxon coverage in the data, independent of the sequence data themselves. Terraces can add ambiguity and complexity to phylogenetic inference, particularly in settings where inference is already challenging: data sets with many taxa and relatively few loci. In this article we present five new findings about terraces and their impacts on phylogenetic inference. First, we clarify assumptions about partitioning scheme model parameters that are necessary for the existence of terraces. Second, we explore the dependence of terrace size on partitioning scheme and indicate how to find the partitioning scheme associated with the largest terrace containing a given tree. Third, we highlight the impact of terrace size on bootstrap estimates of confidence limits in clades, and characterize the surprising result that the bootstrap proportion for a clade, as it is usually calculated, can be entirely determined by the frequency of bipartitions on a terrace, with some bipartitions receiving high support even when incorrect. Fourth, we dissect some effects of prior distributions of edge lengths on the computed posterior probabilities of clades on terraces, to understand an example in which long edges "attract" each other in Bayesian inference. Fifth, we describe how assuming relationships between edge-lengths of different loci, as an attempt to avoid terraces, can also be problematic when taxon coverage is partial, specifically when heterotachy is present. Finally, we discuss strategies for remediation of some of these problems. One promising approach finds a minimal set of taxa which, when deleted from the data matrix, reduces the size of a terrace to a single tree.

Subject(s)

Classification/methods , Computer Simulation/standards , Phylogeny , Models, Genetic

5.

Endogenous florendoviruses are major components of plant genomes and hallmarks of virus evolution.

Geering, Andrew D W; Maumus, Florian; Copetti, Dario; Choisne, Nathalie; Zwickl, Derrick J; Zytnicki, Matthias; McTaggart, Alistair R; Scalabrin, Simone; Vezzulli, Silvia; Wing, Rod A; Quesneville, Hadi; Teycheney, Pierre-Yves.

Nat Commun ; 5: 5269, 2014 Nov 10.

Article in English | MEDLINE | ID: mdl-25381880

ABSTRACT

The extent and importance of endogenous viral elements have been extensively described in animals but are much less well understood in plants. Here we describe a new genus of Caulimoviridae called 'Florendovirus', members of which have colonized the genomes of a large diversity of flowering plants, sometimes at very high copy numbers (>0.5% total genome content). The genome invasion of Oryza is dated to over 1.8 million years ago (MYA) but phylogeographic evidence points to an even older age of 20-34 MYA for this virus group. Some appear to have had a bipartite genome organization, a unique characteristic among viral retroelements. In Vitis vinifera, 9% of the endogenous florendovirus loci are located within introns and therefore may influence host gene expression. The frequent colocation of endogenous florendovirus loci with TA simple sequence repeats, which are associated with chromosome fragility, suggests sequence capture during repair of double-stranded DNA breaks.

Subject(s)

Caulimoviridae/genetics , Evolution, Molecular , Genome, Plant/genetics , Oryza/virology , Phylogeny , Gene Dosage/genetics , Genetic Loci/genetics , Introns/genetics , Microsatellite Repeats/genetics , Virus Replication/genetics

6.

A gateway for phylogenetic analysis powered by grid computing featuring GARLI 2.0.

Bazinet, Adam L; Zwickl, Derrick J; Cummings, Michael P.

Syst Biol ; 63(5): 812-8, 2014 Sep.

Article in English | MEDLINE | ID: mdl-24789072

ABSTRACT

We introduce molecularevolution.org, a publicly available gateway for high-throughput, maximum-likelihood phylogenetic analysis powered by grid computing. The gateway features a garli 2.0 web service that enables a user to quickly and easily submit thousands of maximum likelihood tree searches or bootstrap searches that are executed in parallel on distributed computing resources. The garli web service allows one to easily specify partitioned substitution models using a graphical interface, and it performs sophisticated post-processing of phylogenetic results. Although the garli web service has been used by the research community for over three years, here we formally announce the availability of the service, describe its capabilities, highlight new features and recent improvements, and provide details about how the grid system efficiently delivers high-quality phylogenetic results.

Subject(s)

Classification/methods , Phylogeny , Software , Access to Information , Internet

7.

Disentangling methodological and biological sources of gene tree discordance on Oryza (Poaceae) chromosome 3.

Zwickl, Derrick J; Stein, Joshua C; Wing, Rod A; Ware, Doreen; Sanderson, Michael J.

Syst Biol ; 63(5): 645-59, 2014 Sep.

Article in English | MEDLINE | ID: mdl-24721692

ABSTRACT

We describe new methods for characterizing gene tree discordance in phylogenomic data sets, which screen for deviations from neutral expectations, summarize variation in statistical support among gene trees, and allow comparison of the patterns of discordance induced by various analysis choices. Using an exceptionally complete set of genome sequences for the short arm of chromosome 3 in Oryza (rice) species, we applied these methods to identify the causes and consequences of differing patterns of discordance in the sets of gene trees inferred using a panel of 20 distinct analysis pipelines. We found that discordance patterns were strongly affected by aspects of data selection, alignment, and alignment masking. Unusual patterns of discordance evident when using certain pipelines were reduced or eliminated by using alternative pipelines, suggesting that they were the product of methodological biases rather than evolutionary processes. In some cases, once such biases were eliminated, evolutionary processes such as introgression could be implicated. Additionally, patterns of gene tree discordance had significant downstream impacts on species tree inference. For example, inference from supermatrices was positively misleading when pipelines that led to biased gene trees were used. Several results may generalize to other data sets: we found that gene tree and species tree inference gave more reasonable results when intron sequence was included during sequence alignment and tree inference, the alignment software PRANK was used, and detectable "block-shift" alignment artifacts were removed. We discuss our findings in the context of well-established relationships in Oryza and continuing controversies regarding the domestication history of O. sativa.

Subject(s)

Chromosomes, Plant/genetics , Classification/methods , Oryza/classification , Oryza/genetics , Phylogeny , Genome, Plant/genetics

8.

A simple method for estimating informative node age priors for the fossil calibration of molecular divergence time analyses.

Nowak, Michael D; Smith, Andrew B; Simpson, Carl; Zwickl, Derrick J.

PLoS One ; 8(6): e66245, 2013.

Article in English | MEDLINE | ID: mdl-23755303

ABSTRACT

Molecular divergence time analyses often rely on the age of fossil lineages to calibrate node age estimates. Most divergence time analyses are now performed in a Bayesian framework, where fossil calibrations are incorporated as parametric prior probabilities on node ages. It is widely accepted that an ideal parameterization of such node age prior probabilities should be based on a comprehensive analysis of the fossil record of the clade of interest, but there is currently no generally applicable approach for calculating such informative priors. We provide here a simple and easily implemented method that employs fossil data to estimate the likely amount of missing history prior to the oldest fossil occurrence of a clade, which can be used to fit an informative parametric prior probability distribution on a node age. Specifically, our method uses the extant diversity and the stratigraphic distribution of fossil lineages confidently assigned to a clade to fit a branching model of lineage diversification. Conditioning this on a simple model of fossil preservation, we estimate the likely amount of missing history prior to the oldest fossil occurrence of a clade. The likelihood surface of missing history can then be translated into a parametric prior probability distribution on the age of the clade of interest. We show that the method performs well with simulated fossil distribution data, but that the likelihood surface of missing history can at times be too complex for the distribution-fitting algorithm employed by our software tool. An empirical example of the application of our method is performed to estimate echinoid node ages. A simulation-based sensitivity analysis using the echinoid data set shows that node age prior distributions estimated under poor preservation rates are significantly less informative than those estimated under high preservation rates.

Subject(s)

Genetic Speciation , Models, Genetic , Algorithms , Animals , Bayes Theorem , Calibration , Evolution, Molecular , Fossils , Likelihood Functions , Models, Statistical , Sea Urchins/genetics , Software

9.

A large-scale, higher-level, molecular phylogenetic study of the insect order Lepidoptera (moths and butterflies).

Regier, Jerome C; Mitter, Charles; Zwick, Andreas; Bazinet, Adam L; Cummings, Michael P; Kawahara, Akito Y; Sohn, Jae-Cheon; Zwickl, Derrick J; Cho, Soowon; Davis, Donald R; Baixeras, Joaquin; Brown, John; Parr, Cynthia; Weller, Susan; Lees, David C; Mitter, Kim T.

PLoS One ; 8(3): e58568, 2013.

Article in English | MEDLINE | ID: mdl-23554903

ABSTRACT

BACKGROUND: Higher-level relationships within the Lepidoptera, and particularly within the species-rich subclade Ditrysia, are generally not well understood, although recent studies have yielded progress. We present the most comprehensive molecular analysis of lepidopteran phylogeny to date, focusing on relationships among superfamilies. METHODOLOGY/PRINCIPAL FINDINGS: 483 taxa spanning 115 of 124 families were sampled for 19 protein-coding nuclear genes, from which maximum likelihood tree estimates and bootstrap percentages were obtained using GARLI. Assessment of heuristic search effectiveness showed that better trees and higher bootstrap percentages probably remain to be discovered even after 1000 or more search replicates, but further search proved impractical even with grid computing. Other analyses explored the effects of sampling nonsynonymous change only versus partitioned and unpartitioned total nucleotide change; deletion of rogue taxa; and compositional heterogeneity. Relationships among the non-ditrysian lineages previously inferred from morphology were largely confirmed, plus some new ones, with strong support. Robust support was also found for divergences among non-apoditrysian lineages of Ditrysia, but only rarely so within Apoditrysia. Paraphyly for Tineoidea is strongly supported by analysis of nonsynonymous-only signal; conflicting, strong support for tineoid monophyly when synonymous signal was added back is shown to result from compositional heterogeneity. CONCLUSIONS/SIGNIFICANCE: Support for among-superfamily relationships outside the Apoditrysia is now generally strong. Comparable support is mostly lacking within Apoditrysia, but dramatically increased bootstrap percentages for some nodes after rogue taxon removal, and concordance with other evidence, strongly suggest that our picture of apoditrysian phylogeny is approximately correct. This study highlights the challenge of finding optimal topologies when analyzing hundreds of taxa. It also shows that some nodes get strong support only when analysis is restricted to nonsynonymous change, while total change is necessary for strong support of others. Thus, multiple types of analyses will be necessary to fully resolve lepidopteran phylogeny.

Subject(s)

Butterflies/genetics , Moths/genetics , Phylogeny , Animals , Butterflies/classification , Moths/classification

10.

Resolving discrepancy between nucleotides and amino acids in deep-level arthropod phylogenomics: differentiating serine codons in 21-amino-acid models.

Zwick, Andreas; Regier, Jerome C; Zwickl, Derrick J.

PLoS One ; 7(11): e47450, 2012.

Article in English | MEDLINE | ID: mdl-23185239

ABSTRACT

BACKGROUND: In a previous study of higher-level arthropod phylogeny, analyses of nucleotide sequences from 62 protein-coding nuclear genes for 80 panarthropod species yielded significantly higher bootstrap support for selected nodes than did amino acids. This study investigates the cause of that discrepancy. METHODOLOGY/PRINCIPAL FINDINGS: The hypothesis is tested that failure to distinguish the serine residues encoded by two disjunct clusters of codons (TCN, AGY) in amino acid analyses leads to this discrepancy. In one test, the two clusters of serine codons (Ser1, Ser2) are conceptually translated as separate amino acids. Analysis of the resulting 21-amino-acid data matrix shows striking increases in bootstrap support, in some cases matching that in nucleotide analyses. In a second approach, nucleotide and 20-amino-acid data sets are artificially altered through targeted deletions, modifications, and replacements, revealing the pivotal contributions of distinct Ser1 and Ser2 codons. We confirm that previous methods of coding nonsynonymous nucleotide change are robust and computationally efficient by introducing two new degeneracy coding methods. We demonstrate for degeneracy coding that neither compositional heterogeneity at the level of nucleotides nor codon usage bias between Ser1 and Ser2 clusters of codons (or their separately coded amino acids) is a major source of non-phylogenetic signal. CONCLUSIONS: The incongruity in support between amino-acid and nucleotide analyses of the aforementioned arthropod data set is resolved by showing that "standard" 20-amino-acid analyses yield lower node support specifically when serine provides crucial signal. Separate coding of Ser1 and Ser2 residues yields support commensurate with that found by degenerated nucleotides, without introducing phylogenetic artifacts. While exclusion of all serine data leads to reduced support for serine-sensitive nodes, these nodes are still recovered in the ML topology, indicating that the enhanced signal from Ser1 and Ser2 is not qualitatively different from that of the other amino acids.
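
The 21-amino-acid recoding described above treats serines encoded by TCN codons (Ser1) and by AGY codons (Ser2) as separate states. The following Python sketch shows one way such a recoding could be done, given an in-frame nucleotide sequence and its ordinary 20-state translation. The function name and the use of "Z" as the symbol for Ser2 are assumptions made for illustration. The abstract does not say which symbol or software the authors used.

```python
# Ser2 = AGY codons (Y = T or C). All other serine codons are TCN (Ser1).
SER2_CODONS = {"AGT", "AGC"}


def recode_serine(nt_seq: str, aa_seq: str, ser2_symbol: str = "Z") -> str:
    """Return a 21-state protein string in which Ser2 gets its own symbol.

    `nt_seq` is the in-frame coding sequence and `aa_seq` its standard
    one-letter translation. `ser2_symbol` is a hypothetical placeholder.
    """
    nt = nt_seq.upper().replace("U", "T")  # accept RNA or DNA input
    if len(nt) != 3 * len(aa_seq):
        raise ValueError("nucleotide length must be 3x the protein length")
    recoded = []
    for i, aa in enumerate(aa_seq):
        codon = nt[3 * i : 3 * i + 3]
        # Serines from AGY codons are relabeled; Ser1 keeps the letter S.
        if aa == "S" and codon in SER2_CODONS:
            recoded.append(ser2_symbol)
        else:
            recoded.append(aa)
    return "".join(recoded)


# Codons: ATG TCT AGC AGT TCG -> Met, Ser1, Ser2, Ser2, Ser1
recoded = recode_serine("ATGTCTAGCAGTTCG", "MSSSS")
print(recoded)
```

Keeping S for Ser1 means only the AGY serines change, so the recoded matrix differs from the standard 20-state matrix only at Ser2 positions.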

Subject(s)

Amino Acids/genetics , Arthropods/genetics , Codon/genetics , Genomics/methods , Nucleotides/genetics , Phylogeny , Serine/genetics , Animals , Databases, Genetic , Likelihood Functions , Models, Genetic , Terminology as Topic

11.

BEAGLE: an application programming interface and high-performance computing library for statistical phylogenetics.

Ayres, Daniel L; Darling, Aaron; Zwickl, Derrick J; Beerli, Peter; Holder, Mark T; Lewis, Paul O; Huelsenbeck, John P; Ronquist, Fredrik; Swofford, David L; Cummings, Michael P; Rambaut, Andrew; Suchard, Marc A.

Syst Biol ; 61(1): 170-3, 2012 Jan.

Article in English | MEDLINE | ID: mdl-21963610

ABSTRACT

Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.

Subject(s)

Computational Biology/methods , Phylogeny , Software , Algorithms , Computing Methodologies , Evolution, Molecular , Genome

12.

Old gene duplication facilitates origin and diversification of an innovative communication system--twice.

Arnegard, Matthew E; Zwickl, Derrick J; Lu, Ying; Zakon, Harold H.

Proc Natl Acad Sci U S A ; 107(51): 22172-7, 2010 Dec 21.

Article in English | MEDLINE | ID: mdl-21127261

ABSTRACT

The genetic basis of parallel innovation remains poorly understood due to the rarity of independent origins of the same complex trait among model organisms. We focus on two groups of teleost fishes that independently gained myogenic electric organs underlying electrical communication. Earlier work suggested that a voltage-gated sodium channel gene (Scn4aa), which arose by whole-genome duplication, was neofunctionalized for expression in electric organ and subsequently experienced strong positive selection. However, it was not possible to determine if these changes were temporally linked to the independent origins of myogenic electric organs in both lineages. Here, we test predictions of such a relationship. We show that Scn4aa co-option and rapid sequence evolution were tightly coupled to the two origins of electric organ, providing strong evidence that Scn4aa contributed to parallel innovations underlying the evolutionary diversification of each electric fish group. Independent evolution of electric organs and Scn4aa co-option occurred more than 100 million years following the origin of Scn4aa by duplication. During subsequent diversification of the electrical communication channels, amino acid substitutions in both groups occurred in the same regions of the sodium channel that likely contribute to electric signal variation. Thus, the phenotypic similarities between independent electric fish groups are also associated with striking parallelism at genetic and molecular levels. Our results show that gene duplication can contribute to remarkably similar innovations in repeatable ways even after long waiting periods between gene duplication and the origins of novelty.

Subject(s)

Electric Organ/physiology , Evolution, Molecular , Fish Proteins/genetics , Fishes/genetics , Gene Duplication/genetics , Sodium Channels/genetics , Amino Acid Sequence , Amino Acid Substitution , Animals , Genome-Wide Association Study , Humans , Molecular Sequence Data

13.

Source identification in two criminal cases using phylogenetic analysis of HIV-1 DNA sequences.

Scaduto, Diane I; Brown, Jeremy M; Haaland, Wade C; Zwickl, Derrick J; Hillis, David M; Metzker, Michael L.

Proc Natl Acad Sci U S A ; 107(50): 21242-7, 2010 Dec 14.

Article in English | MEDLINE | ID: mdl-21078965

ABSTRACT

Phylogenetic analysis has been widely used to test the a priori hypothesis of epidemiological clustering in suspected transmission chains of HIV-1. Among studies showing strong support for relatedness between HIV samples obtained from infected individuals, evidence for the direction of transmission between epidemiologically related pairs has been lacking. During transmission of HIV, a genetic bottleneck occurs, resulting in the paraphyly of source viruses with respect to those of the recipient. This paraphyly establishes the direction of transmission, from which the source can then be inferred. Here, we present methods and results from two criminal cases, State of Washington v Anthony Eugene Whitfield, case number 04-1-0617-5 (Superior Court of the State of Washington, Thurston County, 2004) and State of Texas v Philippe Padieu, case numbers 219-82276-07, 219-82277-07, 219-82278-07, 219-82279-07, 219-82280-07, and 219-82705-07 (219th Judicial District Court, Collin County, TX, 2009), which provided evidence that direction can be established from blinded case samples. The observed paraphyly from each case study led to the identification of an inferred source (i.e., index case), whose identity was revealed at trial to be that of the defendant.
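
The direction-of-transmission argument above rests on the source's viral sequences being paraphyletic with respect to the recipient's. As a rough sketch only, and not the authors' forensic procedure, the Python below tests a simplified topological criterion on a rooted tree written as nested tuples: the recipient tips form a clade, the source tips do not, and source plus recipient together form a clade. The tree encoding, function names, and tip labels are hypothetical. Branch support and rooting uncertainty are ignored.

```python
from typing import FrozenSet, Iterable, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def collect_clades(tree: Tree, clades: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Record the tip set under every node of a rooted nested-tuple tree."""
    if isinstance(tree, str):
        tips = frozenset([tree])
    else:
        tips = frozenset().union(*(collect_clades(child, clades) for child in tree))
    clades.add(tips)
    return tips


def source_is_paraphyletic(tree: Tree, source: Iterable[str], recipient: Iterable[str]) -> bool:
    """Simplified check: recipient clade nested within non-monophyletic source."""
    clades: Set[FrozenSet[str]] = set()
    collect_clades(tree, clades)
    src, rec = frozenset(source), frozenset(recipient)
    return rec in clades and src not in clades and (src | rec) in clades


# Hypothetical toy tree: recipient sequences R1, R2 nest inside source S1-S3.
toy_tree = (("ref1", "ref2"), ("S1", ("S2", ("S3", ("R1", "R2")))))
forward = source_is_paraphyletic(toy_tree, {"S1", "S2", "S3"}, {"R1", "R2"})
reverse = source_is_paraphyletic(toy_tree, {"R1", "R2"}, {"S1", "S2", "S3"})
print(forward, reverse)
```

Swapping the two labels fails the check in this toy tree, which is the asymmetry that lets a direction be inferred. A real analysis would also weigh statistical support for the relevant nodes.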

Subject(s)

Criminal Law , DNA, Viral/analysis , Forensic Genetics/methods , HIV Infections/transmission , HIV-1/classification , HIV-1/genetics , Sequence Analysis, DNA , DNA, Viral/blood , Databases, Genetic , HIV Infections/genetics , HIV Infections/virology , Humans , Molecular Sequence Data , Phylogeny , Texas , Washington

14.

Molecular evolution of Na+ channels in teleost fishes.

Zakon, Harold H; Jost, Manda C; Zwickl, Derrick J; Lu, Ying; Hillis, David M.

Integr Zool ; 4(1): 64-74, 2009 Mar.

Article in English | MEDLINE | ID: mdl-21392277

ABSTRACT

Voltage-dependent sodium channels are critical for electrical excitability. Invertebrates possess a single sodium channel gene; two rounds of genome duplication early in vertebrates increased the number to four. Since the teleost-tetrapod split, independent gene duplications in each lineage have further increased the number of sodium channel genes to 10 in tetrapods and 8 in teleosts. Here we review how the occurrence of multiple sodium channel paralogs has influenced the evolutionary history of three groups of fishes: pufferfish, gymnotiform and mormyriform electric fish. Pufferfish (Tetraodontidae) produce a neurotoxin, tetrodotoxin, that binds to and blocks the pore of sodium channels. Pufferfish evolved resistance to their own toxins by amino acid substitutions in the pore of their sodium channels. These substitutions had to occur in parallel across multiple paralogs for organismal resistance to evolve. Gymnotiform and mormyriform fishes independently evolved electric organs to generate electricity for communication and object localization. Two sodium channel genes are expressed in muscle in most fishes. In both groups of weakly electric fishes, one gene lost its expression in muscle and became compartmentalized in the evolutionarily novel electric organ, which is a muscle derivative. This gene then evolved at elevated rates, whereas the gene that is still expressed in muscle does not show elevated rates of evolution. In the electric organ-expressing gene, amino acid substitutions occur in parts of the channel involved in determining how long the channel will be open or closed. The enhanced rate of sequence evolution of this gene likely underlies the species-level variations in the electric signal.

Subject(s)

Electric Fish/physiology , Evolution, Molecular , Sodium Channels/physiology , Tetraodontiformes/physiology , Amino Acid Sequence , Amino Acid Substitution , Animals , Drug Resistance/genetics , Electric Organ/physiology , Genes, Duplicate/genetics , Molecular Sequence Data , Muscle, Skeletal/metabolism , Phylogeny , Sequence Alignment , Sodium Channels/genetics , Tetrodotoxin/toxicity

15.

Evaluating the robustness of phylogenetic methods to among-site variability in substitution processes.

Holder, Mark T; Zwickl, Derrick J; Dessimoz, Christophe.

Philos Trans R Soc Lond B Biol Sci ; 363(1512): 4013-21, 2008 Dec 27.

Article in English | MEDLINE | ID: mdl-18852108

ABSTRACT

Computer simulations provide a flexible method for assessing the power and robustness of phylogenetic inference methods. Unfortunately, simulated data are often obviously atypical of data encountered in studies of molecular evolution. Unrealistic simulations can lead to conclusions that are irrelevant to real-data analyses or can provide a biased view of which methods perform well. Here, we present a software tool designed to generate data under a complex codon model that allows each residue in the protein sequence to have a different set of equilibrium amino acid frequencies. The software can obtain maximum-likelihood estimates of the parameters of the Halpern and Bruno model from empirical data and a fixed tree; given an arbitrary tree and a fixed set of parameters, the software can then simulate artificial datasets. We present the results of a simulation experiment using randomly generated tree shapes and substitution parameters estimated from 1610 mammalian cytochrome b sequences. We tested tree inference at the amino acid, nucleotide and codon levels and under parsimony, maximum-likelihood, Bayesian and distance criteria (for a total of more than 650 analyses on each dataset). Based on these simulations, nucleotide-level analyses seem to be more accurate than amino acid and codon analyses. The performance of distance-based phylogenetic methods appears to be quite sensitive to the choice of model and the form of rate heterogeneity used. Further studies are needed to assess the generality of these conclusions. For example, fitting parameters of the Halpern and Bruno model to sequences from other genes will reveal the extent to which our conclusions were influenced by the choice of cytochrome b. Incorporating codon bias and more sources of heterogeneity into the simulator will be crucial to determining whether the current results are caused by a bias in the current simulation study in favour of nucleotide analyses.

Subject(s)

Algorithms , Amino Acid Substitution/genetics , Classification/methods , Evolution, Molecular , Models, Genetic , Phylogeny , Bayes Theorem , Codon/genetics , Computer Simulation , Cytochromes b/genetics , Likelihood Functions

16.

Molecular evolution of communication signals in electric fish.

Zakon, Harold H; Zwickl, Derrick J; Lu, Ying; Hillis, David M.

J Exp Biol ; 211(Pt 11): 1814-8, 2008 Jun.

Article in English | MEDLINE | ID: mdl-18490397

ABSTRACT

Animal communication systems are subject to natural selection so the imprint of selection must reside in the genome of each species. Electric fish generate electric organ discharges (EODs) from a muscle-derived electric organ (EO) and use these fields for electrolocation and communication. Weakly electric teleosts have evolved at least twice (mormyriforms, gymnotiforms) allowing a comparison of the workings of evolution in two independently evolved sensory/motor systems. We focused on the genes for two Na(+) channels, Nav1.4a and Nav1.4b, which are orthologs of the mammalian muscle-expressed Na(+) channel gene Nav1.4. Both genes are expressed in muscle in non-electric fish. Nav1.4b is expressed in muscle in electric fish, but Nav1.4a expression has been lost from muscle and gained in the evolutionarily novel EO in both groups. We hypothesized that Nav1.4a might be evolving to optimize the EOD for different sensory environments and the generation of species-specific communication signals. We obtained the sequence for Nav1.4a from non-electric, mormyriform and gymnotiform species, estimated a phylogenetic tree, and determined rates of evolution. We observed elevated rates of evolution in this gene in both groups coincident with the loss of Nav1.4a from muscle and its compartmentalization in EO. We found amino acid substitutions at sites known to be critical for channel inactivation; analyses suggest that these changes are likely to be the result of positive selection. We suggest that the diversity of EOD waveforms in both groups of electric fish is correlated with accelerations in the rate of evolution of the Nav1.4a Na(+) channel gene due to changes in selection pressure on the gene once it was solely expressed in the EO.

Subject(s)

Animal Communication , Electric Fish/genetics , Evolution, Molecular , Amino Acid Sequence , Animals , Fish Proteins/chemistry , Fish Proteins/genetics , Fish Proteins/physiology , Molecular Sequence Data , Muscle Proteins/chemistry , Muscle Proteins/genetics , Muscle Proteins/physiology , Phylogeny , Selection, Genetic , Sequence Alignment , Sodium Channels/chemistry , Sodium Channels/genetics , Sodium Channels/physiology , Species Specificity

17.

Taxon sampling affects inferences of macroevolutionary processes from phylogenetic trees.

Heath, Tracy A; Zwickl, Derrick J; Kim, Junhyong; Hillis, David M.

Syst Biol ; 57(1): 160-6, 2008 Feb.

Article in English | MEDLINE | ID: mdl-18300029

Subject(s)

Phylogeny , Computer Simulation , Extinction, Biological , Genetic Speciation , Models, Biological , Sample Size

18.

Sodium channel genes and the evolution of diversity in communication signals of electric fishes: convergent molecular evolution.

Zakon, Harold H; Lu, Ying; Zwickl, Derrick J; Hillis, David M.

Proc Natl Acad Sci U S A ; 103(10): 3675-80, 2006 Mar 07.

Article in English | MEDLINE | ID: mdl-16505358

ABSTRACT

We investigated whether the evolution of electric organs and electric signal diversity in two independently evolved lineages of electric fishes was accompanied by convergent changes on the molecular level. We found that a sodium channel gene (Na(v)1.4a) that is expressed in muscle in nonelectric fishes has lost its expression in muscle and is expressed instead in the evolutionarily novel electric organ in both lineages of electric fishes. This gene appears to be evolving under positive selection in both lineages, facilitated by its restricted expression in the electric organ. This view is reinforced by the lack of evidence for selection on this gene in one electric species in which expression of this gene is retained in muscle. Amino acid replacements occur convergently in domains that influence channel inactivation, a key trait for shaping electric communication signals. Some amino acid replacements occur at or adjacent to sites at which disease-causing mutations have been mapped in human sodium channel genes, emphasizing that these replacements occur in functionally important domains. Selection appears to have acted on the final step in channel inactivation, but complementarily on the inactivation "ball" in one lineage, and its receptor site in the other lineage. Thus, changes in the expression and sequence of the same gene are associated with the independent evolution of signal complexity.

Subject(s)

Electric Fish/genetics , Evolution, Molecular , Sodium Channels/genetics , Amino Acid Sequence , Animals , Electric Fish/classification , Electric Organ/metabolism , Fishes/classification , Fishes/genetics , Gymnotiformes/classification , Gymnotiformes/genetics , Humans , Molecular Sequence Data , Phylogeny , Sequence Homology, Amino Acid , Signal Transduction/genetics , Sodium Channels/chemistry , Species Specificity

19.

Is sparse taxon sampling a problem for phylogenetic inference?

Hillis, David M; Pollock, David D; McGuire, Jimmy A; Zwickl, Derrick J.

Syst Biol ; 52(1): 124-6, 2003 Feb.

Article in English | MEDLINE | ID: mdl-12554446

Subject(s)

Data Interpretation, Statistical , Phylogeny , Sample Size

20.

Phylogenetic relationships of the dwarf boas and a comparison of Bayesian and bootstrap measures of phylogenetic support.

Wilcox, Thomas P; Zwickl, Derrick J; Heath, Tracy A; Hillis, David M.

Mol Phylogenet Evol ; 25(2): 361-71, 2002 Nov.

Article in English | MEDLINE | ID: mdl-12414316

ABSTRACT

Four New World genera of dwarf boas (Exiliboa, Trachyboa, Tropidophis, and Ungaliophis) have been placed by many systematists in a single group (traditionally called Tropidophiidae). However, the monophyly of this group has been questioned in several studies. Moreover, the overall relationships among basal snake lineages, including the placement of the dwarf boas, are poorly understood. We obtained mtDNA sequence data for 12S, 16S, and intervening tRNA-val genes from 23 species of snakes representing most major snake lineages, including all four genera of New World dwarf boas. We then examined the phylogenetic position of these species by estimating the phylogeny of the basal snakes. Our phylogenetic analysis suggests that New World dwarf boas are not monophyletic. Instead, we find Exiliboa and Ungaliophis to be most closely related to sand boas (Erycinae), boas (Boinae), and advanced snakes (Caenophidea), whereas Tropidophis and Trachyboa form an independent clade that separated relatively early in snake radiation. Our estimate of snake phylogeny differs significantly in other ways from some previous estimates of snake phylogeny. For instance, pythons do not cluster with boas and sand boas, but instead show a strong relationship with Loxocemus and Xenopeltis. Additionally, uropeltids cluster strongly with Cylindrophis, and together are embedded in what has previously been considered the macrostomatan radiation. These relationships are supported by both bootstrapping (parametric and nonparametric approaches) and Bayesian analysis, although Bayesian support values are consistently higher than those obtained from nonparametric bootstrapping. Simulations show that Bayesian support values represent much better estimates of phylogenetic accuracy than do nonparametric bootstrap support values, at least under the conditions of our study.

Subject(s)

Boidae/genetics , Phylogeny , Animals , Bayes Theorem , Data Interpretation, Statistical , Likelihood Functions , Mitochondria/genetics , RNA, Ribosomal/genetics , RNA, Ribosomal, 16S/genetics
