Results 1 - 17 of 17
1.
Bioinformatics ; 39(9), 2023 09 02.
Article in English | MEDLINE | ID: mdl-37651445

ABSTRACT

MOTIVATION: Neighbour-Joining is one of the most widely used distance-based phylogenetic inference methods. However, current implementations do not scale well for datasets with more than 10 000 sequences. Given the increasing pace of generating new sequence data, particularly in outbreaks of emerging diseases, and the already enormous existing databases of sequence data for which Neighbour-Joining is a useful approach, new implementations of existing methods are warranted. RESULTS: Here, we present DecentTree, which provides highly optimized and parallel implementations of Neighbour-Joining and several of its variants. DecentTree is designed as a stand-alone application and a header-only library easily integrated with other phylogenetic software (e.g. it is integral in the popular IQ-TREE software). We show that DecentTree achieves similar or improved performance over existing software (BIONJ, Quicktree, FastME, and RapidNJ), especially for handling very large alignments. For example, DecentTree is up to 6-fold faster than the fastest existing Neighbour-Joining software (e.g. RapidNJ) when generating a tree of 64 000 SARS-CoV-2 genomes. AVAILABILITY AND IMPLEMENTATION: DecentTree is open source and freely available at https://github.com/iqtree/decenttree. All code and data used in this analysis are available on GitHub (https://github.com/asdcid/Comparison-of-neighbour-joining-software).


Subject(s)
COVID-19 , Humans , Phylogeny , SARS-CoV-2/genetics , Genomics , Gene Library
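
For readers unfamiliar with the method named in entry 1, the following is a minimal sketch of textbook Neighbour-Joining in Python. It is the plain cubic-time algorithm, not DecentTree's optimized or parallel implementation, and the function name, the `frozenset`-keyed distance dictionary and the `nodeN` labels are assumptions made for illustration only.

```python
from itertools import combinations


def neighbour_joining(names, dist):
    """Textbook Neighbour-Joining (illustrative, unoptimized).

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns a list of (node, neighbour, branch_length) edges of an unrooted tree.
    """
    nodes = list(names)
    d = dict(dist)          # working copy; new internal nodes are added to it
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other active nodes.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimizes Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dab = d[frozenset((a, b))]
        # Branch lengths from a and b to the new internal node u.
        len_a = 0.5 * dab + (r[a] - r[b]) / (2 * (n - 2))
        len_b = dab - len_a
        u = f"node{next_id}"
        next_id += 1
        edges.append((a, u, len_a))
        edges.append((b, u, len_b))
        # Distances from u to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((u, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - dab
                )
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Connect the last two nodes directly.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges
```

Each iteration recomputes the row sums and scans every pair, which is why naive implementations struggle beyond a few thousand sequences; this scaling problem is what the abstract above describes.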
2.
Syst Biol ; 72(5): 1039-1051, 2023 11 01.
Article in English | MEDLINE | ID: mdl-37232476

ABSTRACT

Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic inference, in which all data are collected before any analysis is performed and the phylogeny is inferred once from scratch. SARS-CoV-2 data sets do not fit this mold. There are currently over 14 million sequenced SARS-CoV-2 genomes in online databases, with tens of thousands of new genomes added every day. Continuous data collection, combined with the public health relevance of SARS-CoV-2, invites an "online" approach to phylogenetics, in which new samples are added to existing phylogenetic trees every day. The extremely dense sampling of SARS-CoV-2 genomes also invites a comparison between likelihood and parsimony approaches to phylogenetic inference. Maximum likelihood (ML) and pseudo-ML methods may be more accurate when there are multiple changes at a single site on a single branch, but this accuracy comes at a large computational cost, and the dense sampling of SARS-CoV-2 genomes means that these instances will be extremely rare because each internal branch is expected to be extremely short. Therefore, it may be that approaches based on maximum parsimony (MP) are sufficiently accurate for reconstructing phylogenies of SARS-CoV-2, and their simplicity means that they can be applied to much larger data sets. Here, we evaluate the performance of de novo and online phylogenetic approaches, as well as ML, pseudo-ML, and MP frameworks for inferring large and dense SARS-CoV-2 phylogenies. Overall, we find that online phylogenetics produces similar phylogenetic trees to de novo analyses for SARS-CoV-2, and that MP optimization with UShER and matOptimize produces equivalent SARS-CoV-2 phylogenies to some of the most popular ML and pseudo-ML inference tools. 
MP optimization with UShER and matOptimize is thousands of times faster than presently available implementations of ML, and online phylogenetics is faster than de novo inference. Our results therefore suggest that parsimony-based methods like UShER and matOptimize represent an accurate and more practical alternative to established ML implementations for large SARS-CoV-2 phylogenies and could be successfully applied to other similar data sets with particularly dense sampling and short branch lengths.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Phylogeny , Probability , Genomics
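
To make the parsimony criterion discussed in entry 2 concrete, here is a small sketch of Fitch's algorithm, which counts the minimum number of state changes one alignment site requires on a fixed binary tree. It is not code from UShER or matOptimize; the nested-tuple tree encoding and the function name are assumptions chosen to keep the example short.

```python
def fitch_score(tree, leaf_states):
    """Fitch small-parsimony pass for a single site.

    tree:        a leaf name (str) or a 2-tuple of subtrees, e.g. (("A", "B"), "C").
    leaf_states: dict mapping leaf name to its observed base at this site.
    Returns (candidate_state_set, minimum_number_of_changes) for the subtree.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch_score(tree[0], leaf_states)
    right_set, right_cost = fitch_score(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed here.
        return common, left_cost + right_cost
    # Children disagree: one change is required on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1
```

Summing this count over all sites gives the parsimony score of a tree. Because only set intersections and unions are involved, the score is cheap to compute, which is consistent with the speed advantage the abstract reports for parsimony-based tools.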
3.
Proc Natl Acad Sci U S A ; 119(48): e2209766119, 2022 11 29.
Article in English | MEDLINE | ID: mdl-36417430

ABSTRACT

There is massive variation in intron numbers across eukaryotic genomes, yet the major drivers of intron content during evolution remain elusive. Rapid intron loss and gain in some lineages contrast with long-term evolutionary stasis in others. Episodic intron gain could be explained by recently discovered specialized transposons called Introners, but so far Introners are only known from a handful of species. Here, we performed a systematic search across 3,325 eukaryotic genomes and identified 27,563 Introner-derived introns in 175 genomes (5.2%). Species with Introners span remarkable phylogenetic diversity, from animals to basal protists, representing lineages whose last common ancestor dates to over 1.7 billion years ago. Aquatic organisms were 6.5 times more likely to contain Introners than terrestrial organisms. Introners exhibit mechanistic diversity but most are consistent with DNA transposition, indicating that Introners have evolved convergently hundreds of times from nonautonomous transposable elements. Transposable elements and aquatic taxa are associated with high rates of horizontal gene transfer, suggesting that this combination of factors may explain the punctuated and biased diversity of species containing Introners. More generally, our data suggest that Introners may explain the episodic nature of intron gain across the eukaryotic tree of life. These results illuminate the major source of ongoing intron creation in eukaryotic genomes.


Subject(s)
DNA Transposable Elements , Eukaryota , Animals , Introns/genetics , Eukaryota/genetics , DNA Transposable Elements/genetics , Phylogeny , Eukaryotic Cells
4.
Nature ; 609(7929): 994-997, 2022 09.
Article in English | MEDLINE | ID: mdl-35952714

ABSTRACT

Accurate and timely detection of recombinant lineages is crucial for interpreting genetic variation, reconstructing epidemic spread, identifying selection and variants of interest, and accurately performing phylogenetic analyses (refs. 1-4). During the SARS-CoV-2 pandemic, genomic data generation has exceeded the capacities of existing analysis platforms, thereby crippling real-time analysis of viral evolution (ref. 5). Here, we use a new phylogenomic method to search a nearly comprehensive SARS-CoV-2 phylogeny for recombinant lineages. In a 1.6 million sample tree from May 2021, we identify 589 recombination events, which indicate that around 2.7% of sequenced SARS-CoV-2 genomes have detectable recombinant ancestry. Recombination breakpoints are inferred to occur disproportionately in the 3' portion of the genome that contains the spike protein. Our results highlight the need for timely analyses of recombination for pinpointing the emergence of recombinant lineages with the potential to increase transmissibility or virulence of the virus. We anticipate that this approach will empower comprehensive real-time tracking of viral recombination during the SARS-CoV-2 pandemic and beyond.


Subject(s)
COVID-19 , Genome, Viral , Pandemics , Phylogeny , Recombination, Genetic , SARS-CoV-2 , COVID-19/epidemiology , COVID-19/transmission , COVID-19/virology , Genome, Viral/genetics , Humans , Mutation , Recombination, Genetic/genetics , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity , Selection, Genetic/genetics , Spike Glycoprotein, Coronavirus/genetics , Virulence/genetics
5.
Bioinformatics ; 38(15): 3734-3740, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35731204

ABSTRACT

MOTIVATION: Phylogenetic tree optimization is necessary for precise analysis of evolutionary and transmission dynamics, but existing tools are inadequate for handling the scale and pace of data produced during the coronavirus disease 2019 (COVID-19) pandemic. One transformative approach, online phylogenetics, aims to incrementally add samples to an ever-growing phylogeny, but there are no previously existing approaches that can efficiently optimize this vast phylogeny under the time constraints of the pandemic. RESULTS: Here, we present matOptimize, a fast and memory-efficient phylogenetic tree optimization tool based on parsimony that can be parallelized across multiple CPU threads and nodes, and provides orders of magnitude improvement in runtime and peak memory usage compared to existing state-of-the-art methods. We have developed this method particularly to address the pressing need during the COVID-19 pandemic for daily maintenance and optimization of a comprehensive SARS-CoV-2 phylogeny. matOptimize is currently helping refine on a daily basis possibly the largest-ever phylogenetic tree, containing millions of SARS-CoV-2 sequences. AVAILABILITY AND IMPLEMENTATION: The matOptimize code is freely available as part of the UShER package (https://github.com/yatisht/usher) and can also be installed via bioconda (https://bioconda.github.io/recipes/usher/README.html). All scripts we used to perform the experiments in this manuscript are available at https://github.com/yceh/matOptimize-experiments. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Phylogeny , SARS-CoV-2/genetics , Pandemics , Software
6.
bioRxiv ; 2022 May 18.
Article in English | MEDLINE | ID: mdl-35611334

ABSTRACT

Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic inference, in which all data are collected before any analysis is performed and the phylogeny is inferred once from scratch. SARS-CoV-2 datasets do not fit this mould. There are currently over 10 million sequenced SARS-CoV-2 genomes in online databases, with tens of thousands of new genomes added every day. Continuous data collection, combined with the public health relevance of SARS-CoV-2, invites an "online" approach to phylogenetics, in which new samples are added to existing phylogenetic trees every day. The extremely dense sampling of SARS-CoV-2 genomes also invites a comparison between likelihood and parsimony approaches to phylogenetic inference. Maximum likelihood (ML) methods are more accurate when there are multiple changes at a single site on a single branch, but this accuracy comes at a large computational cost, and the dense sampling of SARS-CoV-2 genomes means that these instances will be extremely rare because each internal branch is expected to be extremely short. Therefore, it may be that approaches based on maximum parsimony (MP) are sufficiently accurate for reconstructing phylogenies of SARS-CoV-2, and their simplicity means that they can be applied to much larger datasets. Here, we evaluate the performance of de novo and online phylogenetic approaches, and ML and MP frameworks, for inferring large and dense SARS-CoV-2 phylogenies. Overall, we find that online phylogenetics produces similar phylogenetic trees to de novo analyses for SARS-CoV-2, and that MP optimizations produce more accurate SARS-CoV-2 phylogenies than do ML optimizations. 
Since MP is thousands of times faster than presently available implementations of ML, and online phylogenetics is faster than de novo, we therefore propose that, in the context of comprehensive genomic epidemiology of SARS-CoV-2, MP online phylogenetics approaches should be favored.

7.
Nucleic Acids Res ; 50(7): 4100-4112, 2022 04 22.
Article in English | MEDLINE | ID: mdl-35380696

ABSTRACT

Metazoan organisms have many tRNA genes responsible for decoding amino acids. The set of all tRNA genes can be grouped in sets of common amino acids and isoacceptor tRNAs that are aminoacylated by corresponding aminoacyl-tRNA synthetases. Analysis of tRNA alignments shows that, despite the high number of tRNA genes, specific tRNA sequence motifs are highly conserved across multicellular eukaryotes. The conservation often extends throughout the isoacceptors and isodecoders with, in some cases, two sets of conserved isodecoders. This study is focused on non-Watson-Crick base pairs in the helical stems, especially G•U pairs. Each of the four helical stems may contain one or more conserved G•U pairs. Some are amino acid specific and could represent identity elements for the cognate aminoacyl tRNA synthetases. Other G•U pairs are found in more than a single amino acid and could be critical for native folding of the tRNAs. Interestingly, some G•U pairs are anticodon-specific, and others are found in phylogenetically-specific clades. Although the distribution of conservation likely reflects a balance between accommodating isotype-specific functions as well as those shared by all tRNAs essential for ribosomal translation, such conservations may indicate the existence of specialized tRNAs for specific translation targets, cellular conditions, or alternative functions.


Subject(s)
Amino Acyl-tRNA Synthetases , Eukaryota/genetics , RNA, Transfer , Amino Acids/genetics , Amino Acyl-tRNA Synthetases/genetics , Amino Acyl-tRNA Synthetases/metabolism , Animals , Anticodon/genetics , Base Pairing , Eukaryota/chemistry , Humans , Nucleic Acid Conformation , RNA, Transfer/chemistry , RNA, Transfer/genetics
8.
bioRxiv ; 2021 Dec 06.
Article in English | MEDLINE | ID: mdl-34927180

ABSTRACT

Phylogenetics has been central to the genomic surveillance, epidemiology and contact tracing efforts during the COVID-19 pandemic. But the massive scale of genomic sequencing has rendered the pre-pandemic tools inadequate for comprehensive phylogenetic analyses. Here, we discuss the phylogenetic package that we developed to address the needs imposed by this pandemic. The package incorporates several pandemic-specific optimization and parallelization techniques and comprises four programs: UShER, matOptimize, RIPPLES and matUtils. Using high-performance computing, UShER and matOptimize maintain and refine daily a massive mutation-annotated phylogenetic tree consisting of all SARS-CoV-2 sequences available in online repositories. With UShER and RIPPLES, individual labs - even with modest compute resources - incorporate newly-sequenced SARS-CoV-2 genomes on this phylogeny and discover evidence for recombination in real-time. With matUtils, they rapidly query and visualize massive SARS-CoV-2 phylogenies. These tools have empowered scientists worldwide to study SARS-CoV-2 evolution and transmission at an unprecedented scale, resolution and speed.

9.
Mol Biol Evol ; 38(12): 5819-5824, 2021 12 09.
Article in English | MEDLINE | ID: mdl-34469548

ABSTRACT

The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus' evolutionary history using public data. We also present matUtils - a command-line utility for rapidly querying, interpreting, and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.


Subject(s)
Evolution, Molecular , Phylogeny , SARS-CoV-2 , COVID-19/virology , Humans , Mutation , SARS-CoV-2/genetics , Software
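
The idea behind a mutation-annotated tree, as described in entry 9, can be sketched in a few lines of Python: each node stores only the mutations inferred on the branch leading to it, and a sample's full variant set is recovered by accumulating mutations along the path from the root. This is an in-memory illustration only. The `MatNode` class, the `variants_by_leaf` function and the `C241T`-style mutation strings are assumptions, and neither the real MAT file format nor the matUtils interface is reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class MatNode:
    """One node of a toy mutation-annotated tree (hypothetical structure)."""
    name: str
    # Mutations on the branch leading to this node, written as
    # <parent base><1-based position><derived base>, e.g. "C241T".
    mutations: list = field(default_factory=list)
    children: list = field(default_factory=list)


def variants_by_leaf(node, inherited=None):
    """Return {leaf_name: {position: derived_base}} for every leaf below node.

    Later mutations at the same position overwrite earlier ones, so a
    reversion is recorded as the reverted base rather than removed.
    """
    current = dict(inherited or {})
    for mut in node.mutations:
        position, derived = int(mut[1:-1]), mut[-1]
        current[position] = derived
    if not node.children:
        return {node.name: current}
    result = {}
    for child in node.children:
        result.update(variants_by_leaf(child, current))
    return result
```

Because shared mutations are stored once on an internal branch rather than once per sample, the tree and the variation data are encoded together compactly, which is the property the abstract highlights.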
10.
Nat Genet ; 53(6): 809-816, 2021 06.
Article in English | MEDLINE | ID: mdl-33972780

ABSTRACT

As the SARS-CoV-2 virus spreads through human populations, the unprecedented accumulation of viral genome sequences is ushering in a new era of 'genomic contact tracing' - that is, using viral genomes to trace local transmission dynamics. However, because the viral phylogeny is already so large - and will undoubtedly grow many fold - placing new sequences onto the tree has emerged as a barrier to real-time genomic contact tracing. Here, we resolve this challenge by building an efficient tree-based data structure encoding the inferred evolutionary history of the virus. We demonstrate that our approach greatly improves the speed of phylogenetic placement of new samples and data visualization, making it possible to complete the placements under the constraints of real-time contact tracing. Thus, our method addresses an important need for maintaining a fully updated reference phylogeny. We make these tools available to the research community through the University of California Santa Cruz SARS-CoV-2 Genome Browser to enable rapid cross-referencing of information in new virus sequences with an ever-expanding array of molecular and structural biology data. The methods described here will empower research and genomic contact tracing for SARS-CoV-2 specifically for laboratories worldwide.


Subject(s)
COVID-19/epidemiology , COVID-19/virology , Computational Biology/methods , Phylogeny , SARS-CoV-2/classification , SARS-CoV-2/genetics , Software , Algorithms , Computational Biology/standards , Databases, Genetic , Genome, Viral , Humans , Molecular Sequence Annotation , Mutation , Web Browser
11.
bioRxiv ; 2021 Apr 06.
Article in English | MEDLINE | ID: mdl-33851162

ABSTRACT

We report a SARS-CoV-2 lineage that shares N501Y, P681H, and other mutations with known variants of concern, such as B.1.1.7. This lineage, which we refer to as B.1.x (COG-UK sometimes references similar samples as B.1.324.1), is present in at least 20 states across the USA and in at least six countries. However, a large deletion causes the sequence to be automatically rejected from repositories, suggesting that the frequency of this new lineage is underestimated using public data. Recent dynamics based on 339 samples obtained in Santa Cruz County, CA, USA suggest that B.1.x may be increasing in frequency at a rate similar to that of B.1.1.7 in Southern California. At present the functional differences between this variant B.1.x and other circulating SARS-CoV-2 variants are unknown, and further studies on secondary attack rates, viral loads, immune evasion and/or disease severity are needed to determine if it poses a public health concern. Nonetheless, given what is known from well-studied circulating variants of concern, it seems unlikely that the lineage could pose larger concerns for human health than many already globally distributed lineages. Our work highlights a need for rapid turnaround time from sequence generation to submission and improved sequence quality control that removes submission bias. We identify promising paths toward this goal.

12.
bioRxiv ; 2021 Jul 13.
Article in English | MEDLINE | ID: mdl-33821270

ABSTRACT

The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently-proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus' evolutionary history using public data. We also present matUtils - a command-line utility for rapidly querying, interpreting and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.

13.
PLoS Genet ; 16(11): e1009175, 2020 11.
Article in English | MEDLINE | ID: mdl-33206635

ABSTRACT

The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques. There is thus a strong possibility that systematic errors associated with lab- or protocol-specific practices affect some sequences in the repositories. We find that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites and are more likely to affect the protein-coding sequences than other similarly recurrent mutations. We show that their inclusion can affect phylogenetic inference on scales relevant to local lineage tracing, and make it appear as though there has been an excess of recurrent mutation or recombination among viral lineages. We suggest how samples can be screened and problematic variants removed, and we plan to regularly inform the scientific community with our updated results as more SARS-CoV-2 genome sequences are shared (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 and https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480). We also develop tools for comparing and visualizing differences among very large phylogenies and we show that consistent clade- and tree-based comparisons can be made between phylogenies produced by different groups. These will facilitate evolutionary inferences and comparisons among phylogenies produced for a wide array of purposes. 
Building on the SARS-CoV-2 Genome Browser at UCSC, we present a toolkit to compare, analyze and combine SARS-CoV-2 phylogenies, find and remove potential sequencing errors and establish a widely shared, stable clade structure for a more accurate scientific inference and discourse.


Subject(s)
Genome, Viral/genetics , Phylogeny , SARS-CoV-2/genetics , Algorithms , COVID-19 , Computational Biology , Evolution, Molecular , Humans , RNA, Viral/genetics , Sequence Alignment , Whole Genome Sequencing
14.
bioRxiv ; 2020 Sep 28.
Article in English | MEDLINE | ID: mdl-33024970

ABSTRACT

As the SARS-CoV-2 virus spreads through human populations, the unprecedented accumulation of viral genome sequences is ushering in a new era of "genomic contact tracing" - that is, using viral genome sequences to trace local transmission dynamics. However, because the viral phylogeny is already so large - and will undoubtedly grow many fold - placing new sequences onto the tree has emerged as a barrier to real-time genomic contact tracing. Here, we resolve this challenge by building an efficient, tree-based data structure encoding the inferred evolutionary history of the virus. We demonstrate that our approach improves the speed of phylogenetic placement of new samples and data visualization by orders of magnitude, making it possible to complete the placements under real-time constraints. Our method also provides the key ingredient for maintaining a fully-updated reference phylogeny. We make these tools available to the research community through the UCSC SARS-CoV-2 Genome Browser to enable rapid cross-referencing of information in new virus sequences with an ever-expanding array of molecular and structural biology data. The methods described here will empower research and genomic contact tracing for laboratories worldwide. SOFTWARE AVAILABILITY: UShER is available to users through the UCSC Genome Browser at https://genome.ucsc.edu/cgi-bin/hgPhyloPlace. The source code and detailed instructions on how to compile and run UShER are available from https://github.com/yatisht/usher.

15.
Genome Res ; 30(1): 85-94, 2020 01.
Article in English | MEDLINE | ID: mdl-31857444

ABSTRACT

Transfer RNA (tRNA) genes are among the most highly transcribed genes in the genome owing to their central role in protein synthesis. However, there is evidence for a broad range of gene expression across tRNA loci. This complexity, combined with difficulty in measuring transcript abundance and high sequence identity across transcripts, has severely limited our collective understanding of tRNA gene expression regulation and evolution. We establish sequence-based correlates to tRNA gene expression and develop a tRNA gene classification method that does not require, but benefits from, comparative genomic information and achieves accuracy comparable to molecular assays. We observe that guanine + cytosine (G + C) content and CpG density surrounding tRNA loci is exceptionally well correlated with tRNA gene activity, supporting a prominent regulatory role of the local genomic context in combination with internal sequence features. We use our tRNA gene activity predictions in conjunction with a comprehensive tRNA gene ortholog set spanning 29 placental mammals to estimate the evolutionary rate of functional changes among orthologs. Our method adds a new dimension to large-scale tRNA functional prediction and will help prioritize characterization of functional tRNA variants. Its simplicity and robustness should enable development of similar approaches for other clades, as well as exploration of functional diversification of members of large gene families.


Subject(s)
Genome , Genomics , RNA, Transfer , Animals , Computational Biology/methods , CpG Islands , DNA Methylation , Epigenesis, Genetic , Epigenomics/methods , Genomics/methods , Mammals , Mice , Phylogeny , RNA, Transfer/genetics
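
Entry 15 reports that G + C content and CpG density around tRNA loci correlate with tRNA gene activity. The two quantities can be computed from a sequence window as sketched below. The function names are assumptions, and the abstract does not state which window sizes or normalizations the authors used, so this shows only the basic definitions.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_density(seq):
    """Number of CpG dinucleotides (C followed by G) per base of seq."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    return seq.count("CG") / len(seq)
```

For example, the window `ACGTCG` has a G + C content of 4/6 and contains two CpG dinucleotides, giving a CpG density of 2/6.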
16.
Genetics ; 210(3): 1089-1107, 2018 11.
Article in English | MEDLINE | ID: mdl-30206187

ABSTRACT

Admixture, the mixing of genetically distinct populations, is increasingly recognized as a fundamental biological process. One major goal of admixture analyses is to estimate the timing of admixture events. Whereas most methods today can only detect the most recent admixture event, here, we present coalescent theory and associated software that can be used to estimate the timing of multiple admixture events in an admixed population. We extensively validate this approach and evaluate the conditions under which it can successfully distinguish one- from two-pulse admixture models. We apply our approach to real and simulated data of Drosophila melanogaster. We find evidence of a single very recent pulse of cosmopolitan ancestry contributing to African populations, as well as evidence for more ancient admixture among genetically differentiated populations in sub-Saharan Africa. These results suggest our method can quantify complex admixture histories involving genetic material introduced by multiple discrete admixture pulses. The new method facilitates the exploration of admixture and its contribution to adaptation, ecological divergence, and speciation.


Subject(s)
Drosophila melanogaster/genetics , Animals , Gene Flow , Genetics, Population , Models, Genetic
17.
Proc Natl Acad Sci U S A ; 115(36): 8996-9001, 2018 09 04.
Article in English | MEDLINE | ID: mdl-30127029

ABSTRACT

Transfer RNAs (tRNAs) are a central component for the biological synthesis of proteins, and they are among the most highly conserved and frequently transcribed genes in all living things. Despite their clear significance for fundamental cellular processes, the forces governing tRNA evolution are poorly understood. We present evidence that transcription-associated mutagenesis and strong purifying selection are key determinants of patterns of sequence variation within and surrounding tRNA genes in humans and diverse model organisms. Remarkably, the mutation rate at broadly expressed cytosolic tRNA loci is likely between 7 and 10 times greater than the nuclear genome average. Furthermore, evolutionary analyses provide strong evidence that tRNA genes, but not their flanking sequences, experience strong purifying selection acting against this elevated mutation rate. We also find a strong correlation between tRNA expression levels and the mutation rates in their immediate flanking regions, suggesting a simple method for estimating individual tRNA gene activity. Collectively, this study illuminates the extreme competing forces in tRNA gene evolution and indicates that mutations at tRNA loci contribute disproportionately to mutational load and have unexplored fitness consequences in human populations.


Subject(s)
Arabidopsis/genetics , Genes, Helminth , Genes, Plant , Mutation , RNA, Helminth/genetics , RNA, Plant/genetics , RNA, Transfer/genetics , Animals , Drosophila melanogaster , Mice