1.
Article in English | MEDLINE | ID: mdl-39142817

ABSTRACT

Sheep were domesticated in the Fertile Crescent and then spread globally, where they have encountered various environmental conditions. The Tibetan sheep has adapted to high altitudes on the Qinghai-Tibet Plateau over the past 3000 years. To explore genomic variants associated with high-altitude adaptation in Tibetan sheep, we analyzed Illumina short reads from 994 whole genomes representing ~60 sheep breeds/populations at varied altitudes, PacBio high-fidelity (HiFi) reads of 13 breeds, and 96 transcriptomes from 12 sheep organs. Association testing between the inhabited altitudes and 34,298,967 variants was conducted to investigate the genetic architecture of altitude adaptation. Highly accurate HiFi reads were used to complement the current ovine reference assembly at the most significantly associated β-globin locus and to validate the presence of two haplotypes, A and B, among 13 sheep breeds. Haplotype A carried two homologous gene clusters: (1) HBE1, HBE2, HBB-like, and HBBC, and (2) HBE1-like, HBE2-like, HBB-like, and HBB; haplotype B lacked the first cluster. Haplotype A was highly frequent or nearly fixed in high-altitude sheep, while low-altitude sheep were dominated by haplotype B. We further demonstrated that sheep with haplotype A had an increased hemoglobin-O2 affinity compared with those carrying haplotype B. Another highly associated genomic region contained the EGLN1 gene, which showed varied expression between high-altitude and low-altitude sheep. Our results provide evidence that the rapid adaptive evolution of advantageous alleles plays an important role in facilitating the environmental adaptation of Tibetan sheep.


Subject(s)
Altitude , Haplotypes , Animals , Sheep/genetics , Haplotypes/genetics , Adaptation, Physiological/genetics , Transcriptome/genetics , Polymorphism, Single Nucleotide/genetics , Proteomics/methods , beta-Globins/genetics , Acclimatization/genetics , Tibet , Multiomics
2.
BMC Genomics ; 25(1): 750, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39090567

ABSTRACT

BACKGROUND: Association testing between molecular phenotypes and genomic variants can help to understand how genotype affects phenotype. RNA sequencing provides access to molecular phenotypes such as gene expression and alternative splicing while DNA sequencing or microarray genotyping are the prevailing options to obtain genomic variants. RESULTS: We genotype variants for 74 male Braunvieh cattle from both DNA (~13-fold coverage) and deep total RNA sequencing from testis, vas deferens, and epididymis tissue (~250 million reads per tissue). We show that RNA sequencing can be used to identify approximately 40% of variants (7-10 million) called from DNA sequencing, with over 80% precision. Within highly expressed coding regions, over 92% of expected variants were called with nearly 98% precision. Allele-specific expression and putative post-transcriptional modifications negatively impact variant genotyping accuracy from RNA sequencing and contribute to RNA-DNA differences. Variants called from RNA sequencing detect roughly 75% of eGenes identified using variants called from DNA sequencing, demonstrating a nearly 2-fold enrichment of eQTL variants. We observe a moderate-to-strong correlation in nominal association p-values (Spearman ρ² ~ 0.6), although only 9% of eGenes have the same top associated variant. CONCLUSIONS: We find hundreds of thousands of RNA-DNA differences in variants called from RNA and DNA sequencing on the same individuals. We identify several highly significant eQTL when using RNA sequencing variant genotypes which are not found with DNA sequencing variant genotypes, suggesting that using RNA sequencing variant genotypes for association testing results in an increased number of false positives. Our findings demonstrate that caution must be exercised beyond filtering for variant quality or imputation accuracy when analysing or imputing variants called from RNA sequencing.


Subject(s)
Quantitative Trait Loci , Animals , Cattle/genetics , Male , DNA/genetics , Genotype , Sequence Analysis, RNA , Testis/metabolism , Genetic Variation , Polymorphism, Single Nucleotide , RNA/genetics , Sequence Analysis, DNA
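
The recall ("approximately 40% of variants") and precision ("over 80%") quoted in the abstract above compare an RNA-derived call set against a DNA-derived one. The following is a minimal sketch of such a comparison, assuming each variant is keyed by (chromosome, position, reference allele, alternate allele) and that the DNA calls serve as the truth set. The function name and the toy variants are hypothetical and are not taken from the study.

```python
def precision_recall(truth, calls):
    """Return (precision, recall) of `calls` measured against `truth`.

    Both arguments are iterables of hashable variant keys, assumed here
    to be (chromosome, position, ref, alt) tuples.
    """
    truth, calls = set(truth), set(calls)
    true_pos = len(truth & calls)  # variants present in both call sets
    precision = true_pos / len(calls) if calls else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Toy call sets (hypothetical, for illustration only).
dna_calls = {
    ("1", 100, "A", "G"),
    ("1", 250, "C", "T"),
    ("2", 40, "G", "A"),
    ("2", 90, "T", "C"),
    ("3", 7, "A", "C"),
}
rna_calls = {
    ("1", 100, "A", "G"),
    ("2", 40, "G", "A"),
    ("3", 55, "C", "G"),  # called from RNA only, counts against precision
}

precision, recall = precision_recall(dna_calls, rna_calls)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three RNA calls are confirmed by DNA, and two of the five DNA calls are recovered from RNA.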
3.
Nat Genet ; 56(8): 1566-1573, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39103649

ABSTRACT

Telomere-to-telomere (T2T) assemblies reveal new insights into the structure and function of the previously 'invisible' parts of the genome and allow comparative analyses of complete genomes across entire clades. We present here an open collaborative effort, termed the 'Ruminant T2T Consortium' (RT2T), that aims to generate complete diploid assemblies for numerous species of the Artiodactyla suborder Ruminantia to examine chromosomal evolution in the context of natural selection and domestication of species used as livestock.


Subject(s)
Ruminants , Telomere , Telomere/genetics , Animals , Ruminants/genetics , Evolution, Molecular , Genome/genetics , Selection, Genetic , Phylogeny , Diploidy
4.
J Theor Biol ; 595: 111927, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39216590

ABSTRACT

The advent of rapid and inexpensive sequencing technologies has necessitated the development of computationally efficient methods for analyzing sequence data for many genes simultaneously in a phylogenetic framework. The coalescent process is the most commonly used model for linking the underlying genealogies of individual genes with the global species-level phylogeny, but inference under the coalescent model is computationally daunting in the typical inference frameworks (e.g., the likelihood and Bayesian frameworks) due to the dimensionality of the space of both gene trees and species trees. Here we consider estimation of the branch lengths in fixed species trees with three or four taxa, and show that these branch lengths are identifiable. We also show that for three and four taxa simple estimators for the branch lengths can be derived based on observed site pattern frequencies. Properties of these estimators, such as their asymptotic variances and large-sample distributions, are examined, and performance of the estimators is assessed using simulation. Finally, we use these estimators to develop a hypothesis test that can be used to delimit species under the coalescent model for three or four putative taxa.

5.
Sci Bull (Beijing) ; 2024 May 25.
Article in English | MEDLINE | ID: mdl-38945748

ABSTRACT

During the past 3000 years, cattle on the Qinghai-Xizang Plateau have developed adaptive phenotypes under the selective pressure of hypoxia, ultraviolet (UV) radiation, and extreme cold. The genetic mechanism underlying this rapid adaptation is not yet well understood. Here, we present whole-genome resequencing data for 258 cattle from 32 cattle breeds/populations, including 89 Tibetan cattle representing eight populations distributed at altitudes ranging from 3400 m to 4300 m. Our genomic analysis revealed that Tibetan cattle exhibited a continuous phylogeographic cline from the East Asian taurine to the South Asian indicine ancestries. We found that recently selected genes in Tibetan cattle were related to body size (HMGA2 and NCAPG) and energy expenditure (DUOXA2). We identified signals of sympatric introgression from yak into Tibetan cattle at different altitudes, covering 0.64%-3.26% of their genomes, which included introgressed genes responsible for hypoxia response (EGLN1), cold adaptation (LRP11), DNA damage repair (LATS1), and UV radiation resistance (GNPAT). We observed that introgressed yak alleles were associated with noncoding variants, including those present in EGLN1. In Tibetan cattle, three yak introgressed SNPs in the EGLN1 promoter region reduced the expression of EGLN1, suggesting that these genomic variants enhance hypoxia tolerance. Taken together, our results indicated complex adaptation processes in Tibetan cattle, where recently selected genes and introgressed yak alleles jointly facilitated rapid adaptation to high-altitude environments.

7.
Nat Chem ; 16(5): 755-761, 2024 May.
Article in English | MEDLINE | ID: mdl-38332330

ABSTRACT

Indenofluorenes are non-benzenoid conjugated hydrocarbons that have received great interest owing to their unusual electronic structure and potential applications in nonlinear optics and photovoltaics. Here we report the generation of unsubstituted indeno[1,2-a]fluorene on various surfaces by the cleavage of two C-H bonds in 7,12-dihydroindeno[1,2-a]fluorene through voltage pulses applied by the tip of a combined scanning tunnelling microscope and atomic force microscope. On bilayer NaCl on Au(111), indeno[1,2-a]fluorene is in the neutral charge state, but it exhibits charge bistability between neutral and anionic states on the lower-workfunction surfaces of bilayer NaCl on Ag(111) and Cu(111). In the neutral state, indeno[1,2-a]fluorene exhibits one of two ground states: an open-shell π-diradical state, predicted to be a triplet by density functional and multireference many-body perturbation theory calculations, or a closed-shell state with a para-quinodimethane moiety in the as-indacene core. We observe switching between open- and closed-shell states of a single molecule by changing its adsorption site on NaCl.

8.
Genome Res ; 34(2): 300-309, 2024 03 20.
Article in English | MEDLINE | ID: mdl-38355307

ABSTRACT

Expression and splicing quantitative trait loci (e/sQTL) are large contributors to phenotypic variability. Achieving sufficient statistical power for e/sQTL mapping requires large cohorts with both genotypes and molecular phenotypes, and so the genomic variation is often called from short-read alignments, which are unable to comprehensively resolve structural variation. Here we build a pangenome from 16 HiFi haplotype-resolved cattle assemblies to identify small and structural variation and genotype them with PanGenie in 307 short-read samples. We find high (>90%) concordance of PanGenie-genotyped and DeepVariant-called small variation and confidently genotype close to 21 million small and 43,000 structural variants in the larger population. We validate 85% of these structural variants (with MAF > 0.1) directly with a subset of 25 short-read samples that also have medium coverage HiFi reads. We then conduct e/sQTL mapping with this comprehensive variant set in a subset of 117 cattle that have testis transcriptome data, and find 92 structural variants as causal candidates for eQTL and 73 for sQTL. We find that roughly half of the top associated structural variants affecting expression or splicing are transposable elements, such as SV-eQTL for STN1 and MYH7 and SV-sQTL for CEP89 and ASAH2. Extensive linkage disequilibrium between small and structural variation results in only 28 additional eQTL and 17 sQTL discovered when including SVs, although many top associated SVs are compelling candidates.


Subject(s)
Quantitative Trait Loci , RNA Splicing , Male , Cattle/genetics , Animals , Genotype , Phenotype , Linkage Disequilibrium , Genomic Structural Variation
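
The validation step in the abstract above is restricted to structural variants with MAF > 0.1. The following is a minimal sketch of such a filter, assuming biallelic diploid genotypes coded as the number of alternate alleles (0, 1 or 2), with None marking a missing call. The function names, variant identifiers and genotypes are hypothetical and do not come from the study.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency from alt-allele counts (0/1/2) per sample.

    Missing calls (None) are ignored. Returns 0.0 if nothing was called.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))  # two alleles per diploid sample
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(variants, threshold=0.1):
    """Keep variants whose MAF is strictly greater than `threshold`."""
    return {
        variant_id: genotypes
        for variant_id, genotypes in variants.items()
        if minor_allele_frequency(genotypes) > threshold
    }


# Toy genotype table (hypothetical identifiers and values).
variants = {
    "sv_common": [0, 1, 1, 2, 0, None, 1, 0],      # alt frequency 5/14
    "sv_rare": [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],     # alt frequency 1/20
    "sv_major_alt": [2, 2, 2, 1, 2, 2],            # alt frequency 11/12, MAF 1/12
}

kept = filter_by_maf(variants)
print(sorted(kept))
```

Taking the minimum of the alternate and reference allele frequencies means a variant whose alternate allele is nearly fixed is treated as rare, just like one whose alternate allele is nearly absent.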
9.
Nat Commun ; 15(1): 674, 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38253538

ABSTRACT

Breeding bulls are well suited to investigate inherited variation in male fertility because they are genotyped and their reproductive success is monitored through semen analyses and thousands of artificial inseminations. However, functional data from relevant tissues are lacking in cattle, which prevents fine-mapping fertility-associated genomic regions. Here, we characterize gene expression and splicing variation in testis, epididymis, and vas deferens transcriptomes of 118 mature bulls and conduct association tests between 414,667 molecular phenotypes and 21,501,032 genome-wide variants to identify 41,156 regulatory loci. We show broad consensus in tissue-specific and tissue-enriched gene expression between the three bovine tissues and their human and murine counterparts. Expression- and splicing-mediating variants are more than three times as frequent in testis as in epididymis and vas deferens, highlighting the transcriptional complexity of testis. Finally, we identify genes (WDR19, SPATA16, KCTD19, ZDHHC1) and molecular phenotypes that are associated with quantitative variation in male fertility through transcriptome-wide association and colocalization analyses.


Subject(s)
Epididymis , Quantitative Trait Loci , Humans , Cattle , Animals , Male , Mice , Quantitative Trait Loci/genetics , Testis , Consensus , Fertility/genetics
10.
Nucleic Acids Res ; 51(22): 12069-12075, 2023 Dec 11.
Article in English | MEDLINE | ID: mdl-37953306

ABSTRACT

The branch point sequence is a degenerate intronic heptamer required for the assembly of the spliceosome during pre-mRNA splicing. Disruption of this motif may promote alternative splicing and eventually cause phenotype variation. Despite its functional relevance, the branch point sequence is not included in most genome annotations. Here, we predict branch point sequences in 30 plant and animal species and attempt to quantify their evolutionary constraints using public variant databases. We find an implausible variant distribution in the databases from 16 of 30 examined species. Comparative analysis of variants from whole-genome sequencing shows that variants submitted from exome sequencing or false positive variants are widespread in public databases and cause these irregularities. We then investigate evolutionary constraint with largely unbiased public variant databases in 14 species and find that the fourth and sixth position of the branch point sequence are more constrained than coding nucleotides. Our findings show that public variant databases should be scrutinized for possible biases before they qualify to analyze evolutionary constraint.


Subject(s)
Biological Evolution , Plants , RNA Splicing , Animals , Genomics , Introns/genetics , Plants/genetics , Spliceosomes , Databases, Genetic
11.
Genome Biol ; 24(1): 211, 2023 09 18.
Article in English | MEDLINE | ID: mdl-37723525

ABSTRACT

BACKGROUND: Structural variations (SVs) in individual genomes are major determinants of complex traits, including adaptability to environmental variables. The Mongolian and Hainan cattle breeds in East Asia are of taurine and indicine origins that have evolved to adapt to cold and hot environments, respectively. However, few studies have investigated SVs in East Asian cattle genomes and their roles in environmental adaptation, and little is known about adaptively introgressed SVs in East Asian cattle. RESULTS: In this study, we examine the roles of SVs in the climate adaptation of these two cattle lineages by generating highly contiguous chromosome-scale genome assemblies. Comparison of the two assemblies along with 18 Mongolian and Hainan cattle genomes obtained by long-read sequencing data provides a catalog of 123,898 nonredundant SVs. Several SVs detected from long reads are in exons of genes associated with epidermal differentiation, skin barrier, and bovine tuberculosis resistance. Functional investigations show that a 108-bp exonic insertion in SPN may affect the uptake of Mycobacterium tuberculosis by macrophages, which might contribute to the low susceptibility of Hainan cattle to bovine tuberculosis. Genotyping of 373 whole genomes from 39 breeds identifies 2610 SVs that are differentiated along a "north-south" gradient in China and overlap with 862 related genes that are enriched in pathways related to environmental adaptation. We identify 1457 Chinese indicine-stratified SVs that possibly originate from banteng and are frequent in Chinese indicine cattle. CONCLUSIONS: Our findings highlight the unique contribution of SVs in East Asian cattle to environmental adaptation and disease resistance.


Subject(s)
Adaptation, Physiological , Disease Susceptibility , Animals , Cattle , Asia, Eastern , China , Tuberculosis, Bovine/genetics , Adaptation, Physiological/genetics
13.
Nat Commun ; 14(1): 4988, 2023 Aug 17.
Article in English | MEDLINE | ID: mdl-37591847

ABSTRACT

In molecular tunnel junctions, where the molecule is decoupled from the electrodes by few-monolayers-thin insulating layers, resonant charge transport takes place by sequential charge transfer to and from the molecule which implies transient charging of the molecule. The corresponding charge state transitions, which involve tunneling through the insulating decoupling layers, are crucial for understanding electrically driven processes such as electroluminescence or photocurrent generation in such a geometry. Here, we use scanning tunneling microscopy to investigate the decharging of single ZnPc and H2Pc molecules through NaCl films of 3 to 5 monolayers thickness on Cu(111) and Au(111). To this end, we approach the tip to the molecule at resonant tunnel conditions up to a regime where charge transport is limited by tunneling through the NaCl film. The resulting saturation of the tunnel current is a direct measure of the lifetimes of the anionic and cationic states, i.e., the molecule's charge-state lifetime, and thus provides a means to study charge dynamics and, thereby, exciton dynamics. Comparison of anion and cation lifetimes on different substrates reveals the critical role of the level alignment with the insulator's conduction and valence band, and the metal-insulator interface state.

14.
ACS Nano ; 17(14): 13563-13574, 2023 Jul 25.
Article in English | MEDLINE | ID: mdl-37436943

ABSTRACT

Incipient soot early in the flame was studied by high-resolution atomic force microscopy and scanning tunneling microscopy to resolve the atomic structure and orbital densities of single soot molecules prepared on bilayer NaCl on Cu(111). We resolved extended catacondensed and pentagonal-ring linked (pentalinked) species indicating how small aromatics cross-link and cyclodehydrogenate to form moderately sized aromatics. In addition, we resolved embedded pentagonal and heptagonal rings in flame aromatics. These nonhexagonal rings suggest simultaneous growth through aromatic cross-linking/cyclodehydrogenation and hydrogen abstraction acetylene addition. Moreover, we observed three classes of open-shell π-radical species. First, radicals with an unpaired π-electron delocalized along the molecule's perimeter. Second, molecules with partially localized π-electrons at zigzag edges of a π-radical. Third, molecules with strong localization of a π-electron at pentagonal- and methylene-type sites. The third class consists of π-radicals localized enough to enable thermally stable bonds, as well as multiradical species such as diradicals in the open-shell triplet state. These π-diradicals can rapidly cluster through barrierless chain reactions enhanced by van der Waals interactions. These results improve our understanding of soot formation and the products formed by combustion and could provide insights for cleaner combustion and the production of hydrogen without CO2 emissions.

15.
Genet Sel Evol ; 55(1): 33, 2023 May 11.
Article in English | MEDLINE | ID: mdl-37170101

ABSTRACT

BACKGROUND: Low-pass sequencing followed by sequence variant genotype imputation is an alternative to the routine microarray-based genotyping in cattle. However, the impact of haplotype reference panels and their interplay with the coverage of low-pass whole-genome sequencing data have not been sufficiently explored in typical livestock settings where only a small number of reference samples is available. METHODS: Sequence variant genotyping accuracy was compared between two variant callers, GATK and DeepVariant, in 50 Brown Swiss cattle with sequencing coverages ranging from 4- to 63-fold. Haplotype reference panels of varying sizes and composition were built with DeepVariant based on 501 individuals from nine breeds. High-coverage sequence data for 24 Brown Swiss cattle were downsampled to between 0.01- and 4-fold to mimic low-pass sequencing. GLIMPSE was used to infer sequence variant genotypes from the low-pass sequencing data using different haplotype reference panels. The accuracy of the sequence variant genotypes that were inferred from low-pass sequencing data was compared with sequence variant genotypes called from high-coverage data. RESULTS: DeepVariant was used to establish bovine haplotype reference panels because it outperformed GATK in all evaluations. Within-breed haplotype reference panels were more accurate and efficient to impute sequence variant genotypes from low-pass sequencing than equally-sized multibreed haplotype reference panels for all target sample coverages and allele frequencies. F1 scores greater than 0.9, which indicate high harmonic means of recall and precision of called genotypes, were achieved with 0.25-fold sequencing coverage when large breed-specific haplotype reference panels (n = 150) were used. In the absence of such large within-breed haplotype panels, variant genotyping accuracy from low-pass sequencing could be increased either by adding non-related samples to the haplotype reference panel or by increasing the coverage of the low-pass sequencing data. Sequence variant genotyping from low-pass sequencing was substantially less accurate when the reference panel lacked individuals from the target breed. CONCLUSIONS: Variant genotyping is more accurate with DeepVariant than GATK. DeepVariant is therefore suitable to establish bovine haplotype reference panels. Medium-sized breed-specific haplotype reference panels and large multibreed haplotype reference panels enable accurate imputation of low-pass sequencing data in a typical cattle breed.


Subject(s)
Haplotypes , Animals , Cattle , Genotype , Genetic Variation
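
The abstract above describes the F1 score as the harmonic mean of recall and precision. The following is a minimal sketch of that calculation from counts of true positive, false positive and false negative genotype calls. The function name and the counts are hypothetical and are not results from the study.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall, computed from call counts.

    Returns 0.0 when there are no true positives, which avoids dividing
    by zero and matches the limiting value of the harmonic mean.
    """
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one imputed sample compared with high-coverage calls.
score = f1_score(true_pos=900, false_pos=50, false_neg=100)
print(f"F1 = {score:.3f}")
```

Because the harmonic mean is pulled towards the smaller of its two inputs, an F1 score above 0.9 requires both precision and recall to be high.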
16.
Genome Biol ; 24(1): 124, 2023 05 22.
Article in English | MEDLINE | ID: mdl-37217946

ABSTRACT

BACKGROUND: Several models and algorithms have been proposed to build pangenomes from multiple input assemblies, but their impact on variant representation, and consequently downstream analyses, is largely unknown. RESULTS: We create multi-species super-pangenomes using pggb, cactus, and minigraph with the Bos taurus taurus reference sequence and eleven haplotype-resolved assemblies from taurine and indicine cattle, bison, yak, and gaur. We recover 221 k nonredundant structural variations (SVs) from the pangenomes, of which 135 k (61%) are common to all three. SVs derived from assembly-based calling show high agreement with the consensus calls from the pangenomes (96%), but validate only a small proportion of variations private to each graph. Pggb and cactus, which also incorporate base-level variation, have approximately 95% exact matches with assembly-derived small variant calls, which significantly improves the edit rate when realigning assemblies compared to minigraph. We use the three pangenomes to investigate 9566 variable number tandem repeats (VNTRs), finding 63% have identical predicted repeat counts in the three graphs, while minigraph can over or underestimate the count given its approximate coordinate system. We examine a highly variable VNTR locus and show that repeat unit copy number impacts the expression of proximal genes and non-coding RNA. CONCLUSIONS: Our findings indicate good consensus between the three pangenome methods but also show their individual strengths and weaknesses that need to be considered when analysing different types of variants from multiple input assemblies.


Subject(s)
Cattle , Genome , Sequence Analysis, DNA , Animals , Cattle/genetics , Minisatellite Repeats , Sequence Analysis, DNA/methods
17.
Mol Biol Evol ; 39(12), 2022 12 05.
Article in English | MEDLINE | ID: mdl-36382357

ABSTRACT

Understanding the genetic mechanism of how animals adapt to extreme conditions is fundamental to determining the relationship between molecular evolution and changing environments. The goat is one of the first domesticated species and has evolved rapidly to adapt to diverse environments, including harsh high-altitude conditions with low temperature and poor oxygen supply but strong ultraviolet radiation. Here, we analyzed 331 genomes of domestic goats and wild caprid species living at varying altitudes (high > 3000 m above sea level and low < 1200 m), along with a reference-guided chromosome-scale assembly (contig-N50: 90.4 Mb) of a female Tibetan goat genome based on PacBio HiFi long reads, to dissect the genetic determinants underlying their adaptation to harsh conditions on the Qinghai-Tibetan Plateau (QTP). Population genomic analyses combined with genome-wide association studies (GWAS) revealed a genomic region harboring the 3'-phosphoadenosine 5'-phosphosulfate synthase 2 (PAPSS2) gene showing strong association with high-altitude adaptability (P_GWAS = 3.62 × 10⁻²⁵) in Tibetan goats. Transcriptomic data from 13 tissues revealed that PAPSS2 was implicated in hypoxia-related pathways in Tibetan goats. We further verified a potential functional role of PAPSS2 in response to hypoxia in PAPSS2-deficient cells. Introgression analyses suggested that the PAPSS2 haplotype conferring the high-altitude adaptability in Tibetan goats originated from a recent hybridization between goats and a wild caprid species, the markhor (Capra falconeri). In conclusion, our results uncover a hitherto unknown contribution of PAPSS2 to high-altitude adaptability in Tibetan goats on the QTP, following interspecific introgression and natural selection.


Subject(s)
Genome-Wide Association Study , Goats , Animals , Goats/genetics , Ultraviolet Rays , Genomics
18.
Nat Commun ; 13(1): 3012, 2022 05 31.
Article in English | MEDLINE | ID: mdl-35641504

ABSTRACT

Advantages of pangenomes over linear reference assemblies for genome research have recently been established. However, potential effects of sequence platform and assembly approach, or of combining assemblies created by different approaches, on pangenome construction have not been investigated. Here we generate haplotype-resolved assemblies from the offspring of three bovine trios representing increasing levels of heterozygosity that each demonstrate a substantial improvement in contiguity, completeness, and accuracy over the current Bos taurus reference genome. Diploid coverage as low as 20x for HiFi or 60x for ONT is sufficient to produce two haplotype-resolved assemblies meeting standards set by the Vertebrate Genomes Project. Structural variant-based pangenomes created from the haplotype-resolved assemblies demonstrate significant consensus regardless of sequence platform, assembler algorithm, or coverage. Inspecting pangenome topologies identifies 90 thousand structural variants including 931 overlapping with coding sequences; this approach reveals variants affecting QRICH2, PRDM9, HSPA1A, TAS2R46, and GC that have potential to affect phenotype.


Subject(s)
Genome , High-Throughput Nucleotide Sequencing , Animals , Cattle , Diploidy , Genome/genetics , Haplotypes , Sequence Analysis, DNA
19.
Proc Natl Acad Sci U S A ; 118(20), 2021 05 18.
Article in English | MEDLINE | ID: mdl-33972446

ABSTRACT

Many genomic analyses start by aligning sequencing reads to a linear reference genome. However, linear reference genomes are imperfect, lacking millions of bases of unknown relevance and are unable to reflect the genetic diversity of populations. This makes reference-guided methods susceptible to reference-allele bias. To overcome such limitations, we build a pangenome from six reference-quality assemblies from taurine and indicine cattle as well as yak. The pangenome contains an additional 70,329,827 bases compared to the Bos taurus reference genome. Our multiassembly approach reveals 30 and 10.1 million bases private to yak and indicine cattle, respectively, and between 3.3 and 4.4 million bases unique to each taurine assembly. Utilizing transcriptomes from 56 cattle, we show that these nonreference sequences encode transcripts that hitherto remained undetected from the B. taurus reference genome. We uncover genes, primarily encoding proteins contributing to immune response and pathogen-mediated immunomodulation, differentially expressed between Mycobacterium bovis-infected and noninfected cattle that are also undetectable in the B. taurus reference genome. Using whole-genome sequencing data of cattle from five breeds, we show that reads which were previously misaligned against the Bos taurus reference genome now align accurately to the pangenome sequences. This enables us to discover 83,250 polymorphic sites that segregate within and between breeds of cattle and capture genetic differentiation across breeds. Our work makes a so-far unused source of variation amenable to genetic investigations and provides methods and a framework for establishing and exploiting a more diverse reference genome.


Subject(s)
Cattle/genetics , Animals , Female , Male , Whole Genome Sequencing
20.
PLoS Comput Biol ; 15(6): e1006886, 2019 06.
Article in English | MEDLINE | ID: mdl-31158218

ABSTRACT

The self-assembly of proteins into protein quaternary structures is of fundamental importance to many biological processes, and protein misassembly is responsible for a wide range of proteopathic diseases. In recent years, abstract lattice models of protein self-assembly have been used to simulate the evolution and assembly of protein quaternary structure, and to provide a tractable way to study the genotype-phenotype map of such systems. Here we generalize these models by representing the interfaces as mutable binary strings. This simple change enables us to model the evolution of interface strengths, interface symmetry, and deterministic assembly pathways. Using the generalized model we are able to reproduce two important results established for real protein complexes: The first is that protein assembly pathways are under evolutionary selection to minimize misassembly. The second is that the assembly pathway of a complex mirrors its evolutionary history, and that both can be derived from the relative strengths of interfaces. These results demonstrate that the generalized lattice model offers a powerful new idealized framework to facilitate the study of protein self-assembly processes and their evolution.


Subject(s)
Evolution, Molecular , Protein Structure, Quaternary , Proteins , Algorithms , Computational Biology , Protein Binding , Protein Structure, Quaternary/genetics , Protein Structure, Quaternary/physiology , Proteins/chemistry , Proteins/genetics
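
The abstract above represents interfaces as mutable binary strings, from which interface strength and symmetry follow. The following is a minimal sketch of one way such a rule could work, under assumptions that are not stated in the abstract: strength is taken as the fraction of complementary bits when one interface is aligned against the reverse of the other, and a mutation flips a single bit. The function names, the 8-bit length and the scoring rule are illustrative only and should not be read as the authors' model.

```python
import random


def interaction_strength(a, b):
    """Fraction of positions where `a` and reversed `b` hold opposite bits.

    Assumed scoring rule for illustration: 1.0 means every aligned pair of
    bits is complementary, 0.0 means none are.
    """
    if len(a) != len(b):
        raise ValueError("interfaces must have equal length")
    complementary = sum(x != y for x, y in zip(a, reversed(b)))
    return complementary / len(a)


def mutate(interface, rng):
    """Return a copy of `interface` with one randomly chosen bit flipped."""
    i = rng.randrange(len(interface))
    flipped = "1" if interface[i] == "0" else "0"
    return interface[:i] + flipped + interface[i + 1:]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

self_binding = "00001111"  # complementary to its own reverse
inert = "00000000"         # identical to its own reverse

print(interaction_strength(self_binding, self_binding))  # symmetric interface
print(interaction_strength(inert, inert))
print(interaction_strength(self_binding, mutate(self_binding, rng)))
```

Under this assumed rule an interface can bind a copy of itself, which gives a simple handle on interface symmetry, and single-bit mutations change the strength in steps of 1/8 for 8-bit interfaces.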