Search | VHL Regional Portal

1.

SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms.

Pardo-Palacios, Francisco J; Arzalluz-Luque, Angeles; Kondratova, Liudmyla; Salguero, Pedro; Mestre-Tomás, Jorge; Amorín, Rocío; Estevan-Morió, Eva; Liu, Tianyuan; Nanni, Adalena; McIntyre, Lauren; Tseng, Elizabeth; Conesa, Ana.

Nat Methods ; 21(5): 793-797, 2024 May.

Article in English | MEDLINE | ID: mdl-38509328

ABSTRACT

SQANTI3 is a tool designed for the quality control, curation and annotation of long-read transcript models obtained with third-generation sequencing technologies. Leveraging its annotation framework, SQANTI3 calculates quality descriptors of transcript models, junctions and transcript ends. With this information, potential artifacts can be identified and replaced with reliable sequences. Furthermore, the integrated functional annotation feature enables subsequent functional iso-transcriptomics analyses.

Subject(s)

Molecular Sequence Annotation , Transcriptome , Humans , Molecular Sequence Annotation/methods , Software , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Protein Isoforms/genetics , High-Throughput Nucleotide Sequencing/methods

2.

Monoallelically expressed noncoding RNAs form nucleolar territories on NOR-containing chromosomes and regulate rRNA expression.

Hao, Qinyu; Liu, Minxue; Daulatabad, Swapna Vidhur; Gaffari, Saba; Song, You Jin; Srivastava, Rajneesh; Bhaskar, Shivang; Moitra, Anurupa; Mangan, Hazel; Tseng, Elizabeth; Gilmore, Rachel B; Frier, Susan M; Chen, Xin; Wang, Chengliang; Huang, Sui; Chamberlain, Stormy; Jin, Hong; Korlach, Jonas; McStay, Brian; Sinha, Saurabh; Janga, Sarath Chandra; Prasanth, Supriya G; Prasanth, Kannanganattu V.

Elife ; 13, 2024 Jan 19.

Article in English | MEDLINE | ID: mdl-38240312

ABSTRACT

Out of the several hundred copies of rRNA genes arranged in the nucleolar organizing regions (NOR) of the five human acrocentric chromosomes, ~50% remain transcriptionally inactive. NOR-associated sequences and epigenetic modifications contribute to the differential expression of rRNAs. However, the mechanism(s) controlling the dosage of active versus inactive rRNA genes within each NOR in mammals is yet to be determined. We have discovered a family of ncRNAs, SNULs (Single NUcleolus Localized RNA), which form constrained sub-nucleolar territories on individual NORs and influence rRNA expression. Individual members of the SNULs monoallelically associate with specific NOR-containing chromosomes. SNULs share sequence similarity to pre-rRNA and localize in the sub-nucleolar compartment with pre-rRNA. Finally, SNULs control rRNA expression by influencing pre-rRNA sorting to the DFC compartment and pre-rRNA processing. Our study discovered a novel class of ncRNAs influencing rRNA expression by forming constrained nucleolar territories on individual NORs.

Subject(s)

Nucleolus Organizer Region , RNA Precursors , Humans , Animals , Nucleolus Organizer Region/genetics , Nucleolus Organizer Region/metabolism , RNA Precursors/genetics , RNA Precursors/metabolism , Cell Nucleolus/genetics , Cell Nucleolus/metabolism , RNA, Ribosomal/genetics , RNA, Ribosomal/metabolism , Chromosomes, Human/metabolism , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Mammals/genetics

3.

SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms.

Pardo-Palacios, Francisco J; Arzalluz-Luque, Angeles; Kondratova, Liudmyla; Salguero, Pedro; Mestre-Tomás, Jorge; Amorín, Rocío; Estevan-Morió, Eva; Liu, Tianyuan; Nanni, Adalena; McIntyre, Lauren; Tseng, Elizabeth; Conesa, Ana.

bioRxiv ; 2023 Jun 03.

Article in English | MEDLINE | ID: mdl-37398077

ABSTRACT

The emergence of long-read RNA sequencing (lrRNA-seq) has provided an unprecedented opportunity to analyze transcriptomes at isoform resolution. However, the technology is not free from biases, and transcript models inferred from these data require quality control and curation. In this study, we introduce SQANTI3, a tool specifically designed to perform quality analysis on transcriptomes constructed using lrRNA-seq data. SQANTI3 provides an extensive naming framework to describe transcript model diversity in comparison to the reference transcriptome. Additionally, the tool incorporates a wide range of metrics to characterize various structural properties of transcript models, such as transcription start and end sites, splice junctions, and other structural features. These metrics can be utilized to filter out potential artifacts. Moreover, SQANTI3 includes a Rescue module that prevents the loss of known genes and transcripts exhibiting evidence of expression but displaying low-quality features. Lastly, SQANTI3 incorporates IsoAnnotLite, which enables functional annotation at the isoform level and facilitates functional iso-transcriptomics analyses. We demonstrate the versatility of SQANTI3 in analyzing different data types, isoform reconstruction pipelines, and sequencing platforms, and how it provides novel biological insights into isoform biology. The SQANTI3 software is available at https://github.com/ConesaLab/SQANTI3 .

4.

Transformation of alignment files improves performance of variant callers for long-read RNA sequencing data.

de Souza, Vladimir B C; Jordan, Ben T; Tseng, Elizabeth; Nelson, Elizabeth A; Hirschi, Karen K; Sheynkman, Gloria; Robinson, Mark D.

Genome Biol ; 24(1): 91, 2023 04 24.

Article in English | MEDLINE | ID: mdl-37095564

ABSTRACT

Long-read RNA sequencing (lrRNA-seq) produces detailed information about full-length transcripts, including novel and sample-specific isoforms. Furthermore, there is an opportunity to call variants directly from lrRNA-seq data. However, most state-of-the-art variant callers have been developed for genomic DNA. Here, there are two objectives: first, we perform a mini-benchmark on GATK, DeepVariant, Clair3, and NanoCaller primarily on PacBio Iso-Seq data, but also on Nanopore and Illumina RNA-seq data; second, we propose a pipeline to process spliced-alignment files, making them suitable for variant calling with DNA-based callers. With such manipulations, high calling performance can be achieved using DeepVariant on Iso-Seq data.
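The abstract does not spell out the transformation, so the following is only an illustrative sketch of one common way to make spliced alignments easier for DNA-oriented callers to handle: splitting each alignment at its intron (`N`) CIGAR operations into exon-level segments. The function name `split_at_introns` is hypothetical, and the sketch is an assumption about the general idea rather than a description of the authors' pipeline. It handles only reference positions and CIGAR strings and ignores read sequences, base qualities, and flags.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def split_at_introns(pos, cigar):
    """Split a spliced alignment into exon-level segments.

    pos   -- 1-based leftmost reference position of the alignment
    cigar -- CIGAR string, e.g. "50M200N30M"
    Returns a list of (segment_start, segment_cigar) tuples, one per exon block.
    """
    segments = []
    current_ops = []
    segment_start = pos
    ref_pos = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            # An intron closes the current segment and is skipped on the reference.
            if current_ops:
                segments.append((segment_start, "".join(current_ops)))
            current_ops = []
            ref_pos += length
            segment_start = ref_pos
        else:
            current_ops.append(f"{length}{op}")
            if op in REF_CONSUMING:
                ref_pos += length
    if current_ops:
        segments.append((segment_start, "".join(current_ops)))
    return segments


if __name__ == "__main__":
    for start, segment_cigar in split_at_introns(100, "50M200N30M"):
        print(start, segment_cigar)
```

Each returned segment contains no `N` operation, so a caller written for contiguous DNA alignments would see it as an ordinary read fragment.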

Subject(s)

High-Throughput Nucleotide Sequencing , RNA , Sequence Analysis, RNA , RNA-Seq , Exome Sequencing

5.

RNA-seq data science: From raw data to effective interpretation.

Deshpande, Dhrithi; Chhugani, Karishma; Chang, Yutong; Karlsberg, Aaron; Loeffler, Caitlin; Zhang, Jinyang; Muszynska, Agata; Munteanu, Viorel; Yang, Harry; Rotman, Jeremy; Tao, Laura; Balliu, Brunilda; Tseng, Elizabeth; Eskin, Eleazar; Zhao, Fangqing; Mohammadi, Pejman; P Labaj, Pawel; Mangul, Serghei.

Front Genet ; 14: 997383, 2023.

Article in English | MEDLINE | ID: mdl-36999049

ABSTRACT

RNA sequencing (RNA-seq) has become an exemplary technology in modern biology and clinical science. Its immense popularity is due in large part to the continuous efforts of the bioinformatics community to develop accurate and scalable computational tools to analyze the enormous amounts of transcriptomic data that it produces. RNA-seq analysis enables genes and their corresponding transcripts to be probed for a variety of purposes, such as detecting novel exons or whole transcripts, assessing expression of genes and alternative transcripts, and studying alternative splicing structure. It can be a challenge, however, to obtain meaningful biological signals from raw RNA-seq data because of the enormous scale of the data as well as the inherent limitations of different sequencing technologies, such as amplification bias or biases of library preparation. The need to overcome these technical challenges has pushed the rapid development of novel computational tools, which have evolved and diversified in accordance with technological advancements, leading to the current myriad of RNA-seq tools. These tools, combined with the diverse computational skill sets of biomedical researchers, help to unlock the full potential of RNA-seq. The purpose of this review is to explain basic concepts in the computational analysis of RNA-seq data and define discipline-specific jargon.

6.

Long read isoform sequencing reveals hidden transcriptional complexity between cattle subspecies.

Ren, Yan; Tseng, Elizabeth; Smith, Timothy P L; Hiendleder, Stefan; Williams, John L; Low, Wai Yee.

BMC Genomics ; 24(1): 108, 2023 Mar 13.

Article in English | MEDLINE | ID: mdl-36915055

ABSTRACT

The Iso-Seq method of full-length cDNA sequencing is suitable to quantify differentially expressed genes (DEGs), transcripts (DETs) and transcript usage (DTU). However, the higher cost of Iso-Seq relative to RNA-seq has limited the comparison of both methods. Transcript abundance estimated by RNA-seq and deep Iso-Seq data for fetal liver from two cattle subspecies were compared to evaluate concordance. Inter-sample correlation of gene- and transcript-level abundance was higher within technology than between technologies. Identification of DEGs between the cattle subspecies depended on sequencing method, with only 44 genes identified by both, including 6 novel genes annotated by Iso-Seq. There was a pronounced difference between Iso-Seq and RNA-seq results at the transcript level, wherein Iso-Seq revealed several orders of magnitude more transcript abundance and usage differences between subspecies. Factors influencing DEG identification included size selection during Iso-Seq library preparation, average transcript abundance, multi-mapping of RNA-seq reads to the reference genome, and overlapping coordinates of genes. Some DEGs called by RNA-seq alone appear to be sequence duplication artifacts. Among the 44 DEGs identified by both technologies, some play a role in the immune system, thyroid function and cell growth. Iso-Seq revealed hidden transcriptional complexity in DEGs, DETs and DTU genes between cattle subspecies previously missed by RNA-seq.

Subject(s)

Genome , Transcriptome , Cattle/genetics , Animals , RNA-Seq , Protein Isoforms/genetics , Gene Library , Alternative Splicing , Sequence Analysis, RNA , Gene Expression Profiling , High-Throughput Nucleotide Sequencing/methods

7.

Structure and Alternative Splicing of the Antisense FMR1 (ASFMR1) Gene.

Zafarullah, Marwa; Li, Jie; Tseng, Elizabeth; Tassone, Flora.

Mol Neurobiol ; 60(4): 2051-2061, 2023 Apr.

Article in English | MEDLINE | ID: mdl-36598648

ABSTRACT

Fragile X-associated tremor/ataxia syndrome (FXTAS) is a neurodegenerative disorder caused by an expansion of 55-200 CGG repeats (premutation) in the 5'-UTR of the FMR1 gene. Bidirectional transcription at the FMR1 locus has been demonstrated, and specific alternative splicing of the Antisense FMR1 (ASFMR1) gene has been proposed to have a contributing role in the pathogenesis of FXTAS. The structure of the ASFMR1 gene is still uncharacterized, and it is currently unknown how many isoforms of the gene are expressed, and at what level, in premutation carriers (PM), and whether they may contribute to the premutation pathology. In this study, we characterized the ASFMR1 gene structure and the transcriptional landscape by using PacBio SMRT sequencing with target enrichment (IDT customized probe panel). We identified 45 ASFMR1 isoforms ranging in size from 523 bp to 6 kb, spanning approximately 59 kb of genomic DNA. Multiplexing and sequencing of six human brain samples from PM and normal controls (HC) were carried out on the PacBio Sequel platform. We validated the presence of these isoforms by qRT-PCR and Sanger sequencing and characterized the acceptor and donor splicing site consensus sequences. Consistent with previous studies conducted in other tissue types, we found a high expression of ASFMR1 isoform Iso131bp in brain samples of PM as compared to HC, while no differences in expression levels were observed for the newly identified isoforms IsoAS1 and IsoAS2. We investigated the splicing regulatory protein Sam68 but did not observe a role for it in the alternative splicing of the ASFMR1 gene. Our study provides useful insight into the structure of the ASFMR1 gene and its transcriptional landscape, along with the expression pattern of various newly identified isoforms and their potential role in premutation pathology.

Subject(s)

Fragile X Syndrome , Trinucleotide Repeat Expansion , Humans , Alternative Splicing , Fragile X Syndrome/pathology , Protein Isoforms/metabolism , Fragile X Mental Retardation Protein/metabolism

8.

Alternative splicing and genetic variation of MHC-E: implications for rhesus cytomegalovirus-based vaccines.

Brochu, Hayden; Wang, Ruihan; Tollison, Tammy; Pyo, Chul-Woo; Thomas, Alexander; Tseng, Elizabeth; Law, Lynn; Picker, Louis J; Gale, Michael; Geraghty, Daniel E; Peng, Xinxia.

Commun Biol ; 5(1): 1387, 2022 12 19.

Article in English | MEDLINE | ID: mdl-36536032

ABSTRACT

Rhesus cytomegalovirus (RhCMV)-based vaccination against Simian Immunodeficiency virus (SIV) elicits MHC-E-restricted CD8+ T cells that stringently control SIV infection in ~55% of vaccinated rhesus macaques (RM). However, it is unclear how accurately the RM model reflects HLA-E immunobiology in humans. Using long-read sequencing, we identified 16 Mamu-E isoforms and all Mamu-E splicing junctions were detected among HLA-E isoforms in humans. We also obtained the complete Mamu-E genomic sequences covering the full coding regions of 59 RM from a RhCMV/SIV vaccine study. The Mamu-E gene was duplicated in 32 (54%) of 59 RM. Among four groups of Mamu-E alleles: three ~5% divergent full-length allele groups (G1, G2, G2_LTR) and a fourth monomorphic group (G3) with a deletion encompassing the canonical Mamu-E exon 6, the presence of G2_LTR alleles was significantly (p = 0.02) associated with the lack of RhCMV/SIV vaccine protection. These genomic resources will facilitate additional MHC-E targeted translational research.

Subject(s)

Alternative Splicing , Cytomegalovirus Vaccines , Histocompatibility Antigens Class I , Animals , Humans , Cytomegalovirus , Genetic Variation , Macaca mulatta , Simian Immunodeficiency Virus , Histocompatibility Antigens Class I/genetics , HLA-E Antigens

9.

Long-read isoform sequencing reveals tissue-specific isoform expression between active and hibernating brown bears (Ursus arctos).

Tseng, Elizabeth; Underwood, Jason G; Evans Hutzenbiler, Brandon D; Trojahn, Shawn; Kingham, Brewster; Shevchenko, Olga; Bernberg, Erin; Vierra, Michelle; Robbins, Charles T; Jansen, Heiko T; Kelley, Joanna L.

G3 (Bethesda) ; 12(3), 2022 03 04.

Article in English | MEDLINE | ID: mdl-35100340

ABSTRACT

Understanding hibernation in brown bears (Ursus arctos) can provide insight into some human diseases. During hibernation, brown bears experience periods of insulin resistance, physical inactivity, extreme bradycardia, obesity, and the absence of urine production. These states closely mimic aspects of human diseases such as type 2 diabetes, muscle atrophy, as well as renal and heart failure. The reversibility of these states from hibernation to active season enables the identification of mediators with possible therapeutic value for humans. Recent studies have identified genes and pathways that are differentially expressed between active and hibernation seasons in bears. However, little is known about the role of differential expression of gene isoforms on hibernation physiology. To identify both distinct and novel mRNA isoforms, full-length RNA-sequencing (Iso-Seq) was performed on adipose, skeletal muscle, and liver from three individual bears sampled during both active and hibernation seasons. The existing reference genome annotation was improved by combining it with the Iso-Seq data. Short-read RNA-sequencing data from six individuals were mapped to the new reference annotation to quantify differential isoform usage (DIU) between tissues and seasons. We identified differentially expressed isoforms in all three tissues, to varying degrees. Adipose had a high level of DIU with isoform switching, regardless of whether the genes were differentially expressed. Our analyses revealed that DIU, even in the absence of differential gene expression, is an important mechanism for modulating genes during hibernation. These findings demonstrate the value of isoform expression studies and will serve as the basis for deeper exploration into hibernation biology.
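To illustrate how isoform usage can differ even when total gene expression does not, the sketch below computes each isoform's share of a gene's expression in two conditions and the shift between them. The function names (`isoform_usage`, `usage_shift`) and the counts are hypothetical, and this is a toy calculation under those assumptions, not the statistical DIU test used in the study.

```python
def isoform_usage(counts):
    """Return each isoform's proportion of the gene's total expression.

    counts -- dict mapping isoform name to an expression value for one gene
    """
    total = sum(counts.values())
    if total == 0:
        return {isoform: 0.0 for isoform in counts}
    return {isoform: value / total for isoform, value in counts.items()}


def usage_shift(condition_a, condition_b):
    """Return the change in usage proportion (b minus a) for every isoform."""
    usage_a = isoform_usage(condition_a)
    usage_b = isoform_usage(condition_b)
    isoforms = sorted(set(usage_a) | set(usage_b))
    return {
        isoform: usage_b.get(isoform, 0.0) - usage_a.get(isoform, 0.0)
        for isoform in isoforms
    }


if __name__ == "__main__":
    # Made-up counts: the gene total is 100 in both seasons,
    # but the dominant isoform loses ground during hibernation.
    active = {"isoform_1": 90, "isoform_2": 10}
    hibernation = {"isoform_1": 50, "isoform_2": 50}
    for isoform, shift in usage_shift(active, hibernation).items():
        print(f"{isoform}: {shift:+.2f}")
```

In this made-up case the gene-level total is unchanged between seasons, yet the usage of each isoform moves by 0.40, which is the kind of pattern a gene-level comparison alone would miss.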

Subject(s)

Diabetes Mellitus, Type 2 , Gene Expression Regulation , Hibernation , Ursidae , Adipose Tissue/metabolism , Animals , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Hibernation/genetics , Humans , Protein Isoforms/genetics , Protein Isoforms/metabolism , Ursidae/genetics , Ursidae/metabolism

10.

Generating lineage-resolved, complete metagenome-assembled genomes from complex microbial communities.

Bickhart, Derek M; Kolmogorov, Mikhail; Tseng, Elizabeth; Portik, Daniel M; Korobeynikov, Anton; Tolstoganov, Ivan; Uritskiy, Gherman; Liachko, Ivan; Sullivan, Shawn T; Shin, Sung Bong; Zorea, Alvah; Andreu, Victòria Pascal; Panke-Buisse, Kevin; Medema, Marnix H; Mizrahi, Itzhak; Pevzner, Pavel A; Smith, Timothy P L.

Nat Biotechnol ; 40(5): 711-719, 2022 05.

Article in English | MEDLINE | ID: mdl-34980911

ABSTRACT

Microbial communities might include distinct lineages of closely related organisms that complicate metagenomic assembly and prevent the generation of complete metagenome-assembled genomes (MAGs). Here we show that deep sequencing using long (HiFi) reads combined with Hi-C binning can address this challenge even for complex microbial communities. Using existing methods, we sequenced the sheep fecal metagenome and identified 428 MAGs with more than 90% completeness, including 44 MAGs in single circular contigs. To resolve closely related strains (lineages), we developed MAGPhase, which separates lineages of related organisms by discriminating variant haplotypes across hundreds of kilobases of genomic sequence. MAGPhase identified 220 lineage-resolved MAGs in our dataset. The ability to resolve closely related microbes in complex microbial communities improves the identification of biosynthetic gene clusters and the precision of assigning mobile genetic elements to host genomes. We identified 1,400 complete and 350 partial biosynthetic gene clusters, most of which are novel, as well as 424 (298) potential host-viral (host-plasmid) associations using Hi-C data.

Subject(s)

Metagenome , Microbiota , Animals , Feces , Metagenome/genetics , Metagenomics , Microbiota/genetics , Sequence Analysis, DNA , Sheep

11.

Full-length transcript sequencing of human and mouse cerebral cortex identifies widespread isoform diversity and alternative splicing.

Leung, Szi Kay; Jeffries, Aaron R; Castanho, Isabel; Jordan, Ben T; Moore, Karen; Davies, Jonathan P; Dempster, Emma L; Bray, Nicholas J; O'Neill, Paul; Tseng, Elizabeth; Ahmed, Zeshan; Collier, David A; Jeffery, Erin D; Prabhakar, Shyam; Schalkwyk, Leonard; Jops, Connor; Gandal, Michael J; Sheynkman, Gloria M; Hannon, Eilis; Mill, Jonathan.

Cell Rep ; 37(7): 110022, 2021 11 16.

Article in English | MEDLINE | ID: mdl-34788620

ABSTRACT

Alternative splicing is a post-transcriptional regulatory mechanism producing distinct mRNA molecules from a single pre-mRNA with a prominent role in the development and function of the central nervous system. We used long-read isoform sequencing to generate full-length transcript sequences in the human and mouse cortex. We identify novel transcripts not present in existing genome annotations, including transcripts mapping to putative novel (unannotated) genes and fusion transcripts incorporating exons from multiple genes. Global patterns of transcript diversity are similar between human and mouse cortex, although certain genes are characterized by striking differences between species. We also identify developmental changes in alternative splicing, with differential transcript usage between human fetal and adult cortex. Our data confirm the importance of alternative splicing in the cortex, dramatically increasing transcriptional diversity and representing an important mechanism underpinning gene regulation in the brain. We provide transcript-level data for human and mouse cortex as a resource to the scientific community.

Subject(s)

Cerebral Cortex/metabolism , Protein Isoforms/genetics , Transcriptome/genetics , Alternative Splicing/genetics , Animals , Brain/metabolism , Cerebral Cortex/physiology , Exons/genetics , Gene Expression/genetics , Gene Expression Profiling/methods , Genome , High-Throughput Nucleotide Sequencing/methods , Humans , Mice , Protein Isoforms/metabolism , RNA Precursors/genetics , RNA Splice Sites/genetics , RNA, Messenger/genetics , Sequence Analysis, RNA/methods

12.

An anchored chromosome-scale genome assembly of spinach improves annotation and reveals extensive gene rearrangements in euasterids.

Hulse-Kemp, Amanda M; Bostan, Hamed; Chen, Shiyu; Ashrafi, Hamid; Stoffel, Kevin; Sanseverino, Walter; Li, Linzhou; Cheng, Shifeng; Schatz, Michael C; Garvin, Tyler; du Toit, Lindsey J; Tseng, Elizabeth; Chin, Jason; Iorizzo, Massimo; Van Deynze, Allen.

Plant Genome ; 14(2): e20101, 2021 07.

Article in English | MEDLINE | ID: mdl-34109759

ABSTRACT

Spinach (Spinacia oleracea L.) is a member of the Caryophyllales family, a basal eudicot asterid that includes sugar beet (Beta vulgaris L. subsp. vulgaris), quinoa (Chenopodium quinoa Willd.), and amaranth (Amaranthus hypochondriacus L.). With the introduction of baby leaf types, spinach has become a staple food in many homes. Production issues focus on yield, nitrogen-use efficiency and resistance to downy mildew (Peronospora effusa). Although genomes are available for the above species, a chromosome-level assembly exists only for quinoa, allowing for proper annotation and structural analyses to enhance crop improvement. We independently assembled and annotated genomes of the cultivar Viroflay using a short-read strategy (Illumina) and a long-read strategy (Pacific Biosciences) to develop a chromosome-level, genetically anchored assembly for spinach. Scaffold N50 for the Illumina assembly was 389 kb, whereas that for Pacific Biosciences was 4.43 Mb, representing 911 Mb (93% of the genome) in 221 scaffolds, 80% of which are anchored and oriented on a sequence-based genetic map, also described within this work. The two assemblies were 99.5% collinear. Independent annotation of the two assemblies with the same comprehensive transcriptome dataset shows that the quality of the assembly directly affects the annotation, with significantly more genes predicted (26,862 vs. 34,877) in the long-read assembly. Analysis of resistance genes confirms a bias in resistance gene motifs more typical of monocots. Evolutionary analysis indicates that Spinacia is a paleohexaploid with a whole-genome triplication followed by extensive gene rearrangements identified in this work. Diversity analysis of 75 lines indicates that variation in genes is ample for hypothesis-driven, genomic-assisted breeding enabled by this work.

Subject(s)

Peronospora , Spinacia oleracea , Chromosomes , Gene Rearrangement , Plant Breeding , Spinacia oleracea/genetics

13.

Cultivar-specific transcriptome and pan-transcriptome reconstruction of tetraploid potato.

Petek, Marko; Zagorscak, Maja; Ramsak, Ziva; Sanders, Sheri; Tomaz, Spela; Tseng, Elizabeth; Zouine, Mohamed; Coll, Anna; Gruden, Kristina.

Sci Data ; 7(1): 249, 2020 07 24.

Article in English | MEDLINE | ID: mdl-32709858

ABSTRACT

Although the reference genome of Solanum tuberosum Group Phureja double-monoploid (DM) clone is available, the genetic diversity of the highly heterozygous tetraploid Group Tuberosum, representing most cultivated varieties, remains largely unexplored. This lack of knowledge hinders further progress in potato research. In this investigation, we first merged and manually curated the two existing partially overlapping DM genome-based gene models, creating a union of genes in the Phureja scaffold. Next, we compiled available and newly generated RNA-Seq datasets (ca. 1.5 billion reads) for three tetraploid potato genotypes (cultivar Désirée, cultivar Rywal, and breeding clone PW363) with diverse breeding pedigrees. Short-read transcriptomes were assembled using several de novo assemblers under different settings to test for optimal outcome. For cultivar Rywal, PacBio Iso-Seq full-length transcriptome sequencing was also performed. The EvidentialGene redundancy-reducing pipeline, complemented with in-house developed scripts, was employed to produce accurate and complete cultivar-specific transcriptomes, as well as to attain the pan-transcriptome. The generated transcriptomes and pan-transcriptome represent a valuable resource for potato gene variability exploration, high-throughput omics analyses, and breeding programmes.

Subject(s)

Solanum tuberosum/genetics , Tetraploidy , Transcriptome , Genome, Plant , Plant Breeding , RNA-Seq

14.

An improved pig reference genome sequence to enable pig genetics and genomics research.

Warr, Amanda; Affara, Nabeel; Aken, Bronwen; Beiki, Hamid; Bickhart, Derek M; Billis, Konstantinos; Chow, William; Eory, Lel; Finlayson, Heather A; Flicek, Paul; Girón, Carlos G; Griffin, Darren K; Hall, Richard; Hannum, Greg; Hourlier, Thibaut; Howe, Kerstin; Hume, David A; Izuogu, Osagie; Kim, Kristi; Koren, Sergey; Liu, Haibou; Manchanda, Nancy; Martin, Fergal J; Nonneman, Dan J; O'Connor, Rebecca E; Phillippy, Adam M; Rohrer, Gary A; Rosen, Benjamin D; Rund, Laurie A; Sargent, Carole A; Schook, Lawrence B; Schroeder, Steven G; Schwartz, Ariel S; Skinner, Ben M; Talbot, Richard; Tseng, Elizabeth; Tuggle, Christopher K; Watson, Mick; Smith, Timothy P L; Archibald, Alan L.

Gigascience ; 9(6), 2020 06 01.

Article in English | MEDLINE | ID: mdl-32543654

ABSTRACT

BACKGROUND: The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model given its similarity in size, anatomy, physiology, metabolism, pathology, and pharmacology to humans. The draft reference genome (Sscrofa10.2) of a purebred Duroc female pig established using older clone-based sequencing methods was incomplete, and unresolved redundancies, short-range order and orientation errors, and associated misassembled genes limited its utility. RESULTS: We present 2 annotated highly contiguous chromosome-level genome assemblies created with more recent long-read technologies and a whole-genome shotgun strategy, 1 for the same Duroc female (Sscrofa11.1) and 1 for an outbred, composite-breed male (USMARCv1.0). Both assemblies are of substantially higher (>90-fold) continuity and accuracy than Sscrofa10.2. CONCLUSIONS: These highly contiguous assemblies plus annotation of a further 11 short-read assemblies provide an unprecedented view of the genetic make-up of this important agricultural and biomedical model species. We propose that the improved Duroc assembly (Sscrofa11.1) become the reference genome for genomic research in pigs.

Subject(s)

Computational Biology/methods , Genome , Genomics/methods , Sequence Analysis, DNA/methods , Sus scrofa/immunology , Animals , Molecular Sequence Annotation , Reproducibility of Results , Research , Swine

15.

ORF Capture-Seq as a versatile method for targeted identification of full-length isoforms.

Sheynkman, Gloria M; Tuttle, Katharine S; Laval, Florent; Tseng, Elizabeth; Underwood, Jason G; Yu, Liang; Dong, Da; Smith, Melissa L; Sebra, Robert; Willems, Luc; Hao, Tong; Calderwood, Michael A; Hill, David E; Vidal, Marc.

Nat Commun ; 11(1): 2326, 2020 05 11.

Article in English | MEDLINE | ID: mdl-32393825

ABSTRACT

Most human protein-coding genes are expressed as multiple isoforms, which greatly expands the functional repertoire of the encoded proteome. While at least one reliable open reading frame (ORF) model has been assigned for every coding gene, the majority of alternative isoforms remains uncharacterized due to (i) vast differences of overall levels between different isoforms expressed from common genes, and (ii) the difficulty of obtaining full-length transcript sequences. Here, we present ORF Capture-Seq (OCS), a flexible method that addresses both challenges for targeted full-length isoform sequencing applications using collections of cloned ORFs as probes. As a proof-of-concept, we show that an OCS pipeline focused on genes coding for transcription factors increases isoform detection by an order of magnitude when compared to unenriched samples. In short, OCS enables rapid discovery of isoforms from custom-selected genes and will accelerate mapping of the human transcriptome.

Subject(s)

Open Reading Frames/genetics , Sequence Analysis, RNA/methods , Humans , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Reference Standards , Transcription Factors/genetics

16.

Systematic Profiling of Full-Length Ig and TCR Repertoire Diversity in Rhesus Macaque through Long Read Transcriptome Sequencing.

Brochu, Hayden N; Tseng, Elizabeth; Smith, Elise; Thomas, Matthew J; Jones, Aiden M; Diveley, Kayleigh R; Law, Lynn; Hansen, Scott G; Picker, Louis J; Gale, Michael; Peng, Xinxia.

J Immunol ; 204(12): 3434-3444, 2020 06 15.

Article in English | MEDLINE | ID: mdl-32376650

ABSTRACT

The diversity of Ig and TCR repertoires is a focal point of immunological studies. Rhesus macaques (Macaca mulatta) are key for modeling human immune responses, placing critical importance on the accurate annotation and quantification of their Ig and TCR repertoires. However, because of incomplete reference resources, the coverage and accuracy of the traditional targeted amplification strategies for profiling rhesus Ig and TCR repertoires are largely unknown. In this study, using long read sequencing, we sequenced four Indian-origin rhesus macaque tissues and obtained high-quality, full-length sequences for over 6000 unique Ig and TCR transcripts, without the need for sequence assembly. We constructed, to our knowledge, the first complete reference set for the constant regions of all known isotypes and chain types of rhesus Ig and TCR repertoires. We show that sequence diversity exists across the entire variable regions of rhesus Ig and TCR transcripts. Consequently, existing strategies using targeted amplification of rearranged variable regions comprised of V(D)J gene segments miss a significant fraction (27-53% and 42-49%) of rhesus Ig/TCR diversity. To overcome these limitations, we designed new rhesus-specific assays that remove the need for primers conventionally targeting variable regions and allow single cell level Ig and TCR repertoire analysis. Our improved approach will enable future studies to fully capture rhesus Ig and TCR repertoire diversity and is applicable for improving annotations in any model organism.

Subject(s)

Immunoglobulins/genetics , Immunoglobulins/immunology , Macaca mulatta/immunology , Receptors, Antigen, T-Cell/immunology , Transcriptome/genetics , Transcriptome/immunology , Animals , High-Throughput Nucleotide Sequencing/methods , Humans , Macaca mulatta/genetics

17.

Haplotype-resolved genomes provide insights into structural variation and gene content in Angus and Brahman cattle.

Low, Wai Yee; Tearle, Rick; Liu, Ruijie; Koren, Sergey; Rhie, Arang; Bickhart, Derek M; Rosen, Benjamin D; Kronenberg, Zev N; Kingan, Sarah B; Tseng, Elizabeth; Thibaud-Nissen, Françoise; Martin, Fergal J; Billis, Konstantinos; Ghurye, Jay; Hastie, Alex R; Lee, Joyce; Pang, Andy W C; Heaton, Michael P; Phillippy, Adam M; Hiendleder, Stefan; Smith, Timothy P L; Williams, John L.

Nat Commun ; 11(1): 2071, 2020 04 29.

Article in English | MEDLINE | ID: mdl-32350247

ABSTRACT

Inbred animals were historically chosen for genome analysis to circumvent assembly issues caused by haplotype variation but this resulted in a composite of the two genomes. Here we report a haplotype-aware scaffolding and polishing pipeline which was used to create haplotype-resolved, chromosome-level genome assemblies of Angus (taurine) and Brahman (indicine) cattle subspecies from contigs generated by the trio binning method. These assemblies reveal structural and copy number variants that differentiate the subspecies and that variant detection is sensitive to the specific reference genome chosen. Six genes with immune related functions have additional copies in the indicine compared with taurine lineage and an indicus-specific extra copy of fatty acid desaturase is under positive selection. The haplotyped genomes also enable transcripts to be phased to detect allele-specific expression. This work exemplifies the value of haplotype-resolved genomes to better explore evolutionary and functional variations.

Subject(s)

Cattle/genetics , Genetic Variation , Genome , Haplotypes/genetics , Alleles , Allelic Imbalance , Animals , Base Sequence , Chromosomes, Mammalian/genetics , Female , Genetic Loci , INDEL Mutation/genetics , Male , Molecular Sequence Annotation , Polymorphism, Single Nucleotide/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Repetitive Sequences, Nucleic Acid/genetics

18.

De novo assembly of the cattle reference genome with single-molecule sequencing.

Rosen, Benjamin D; Bickhart, Derek M; Schnabel, Robert D; Koren, Sergey; Elsik, Christine G; Tseng, Elizabeth; Rowan, Troy N; Low, Wai Y; Zimin, Aleksey; Couldrey, Christine; Hall, Richard; Li, Wenli; Rhie, Arang; Ghurye, Jay; McKay, Stephanie D; Thibaud-Nissen, Françoise; Hoffman, Jinna; Murdoch, Brenda M; Snelling, Warren M; McDaneld, Tara G; Hammond, John A; Schwartz, John C; Nandolo, Wilson; Hagen, Darren E; Dreischer, Christian; Schultheiss, Sebastian J; Schroeder, Steven G; Phillippy, Adam M; Cole, John B; Van Tassell, Curtis P; Liu, George; Smith, Timothy P L; Medrano, Juan F.

Gigascience ; 9(3), 2020 03 01.

Article in English | MEDLINE | ID: mdl-32191811

ABSTRACT

BACKGROUND: Major advances in selection progress for cattle have been made following the introduction of genomic tools over the past 10-12 years. These tools depend upon the Bos taurus reference genome (UMD3.1.1), which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies. RESULTS: We present the new reference genome for cattle, ARS-UCD1.2, based on the same animal as the original to facilitate transfer and interpretation of results obtained from the earlier version, but applying a combination of modern technologies in a de novo assembly to increase continuity, accuracy, and completeness. The assembly includes 2.7 Gb and is >250× more continuous than the original assembly, with contig N50 >25 Mb and L50 of 32. We also greatly expanded supporting RNA-based data for annotation that identifies 30,396 total genes (21,039 protein coding). The new reference assembly is accessible in annotated form for public use. CONCLUSIONS: We demonstrate that improved continuity of assembled sequence warrants the adoption of ARS-UCD1.2 as the new cattle reference genome and that increased assembly accuracy will benefit future research on this species.

Subject(s)

Breeding/standards , Cattle/genetics , Genome , Genomics/standards , Polymorphism, Genetic , Animals , Breeding/methods , Genomics/methods , RNA-Seq/methods , RNA-Seq/standards , Reference Standards , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards

19.

Variant phasing and haplotypic expression from long-read sequencing in maize.

Wang, Bo; Tseng, Elizabeth; Baybayan, Primo; Eng, Kevin; Regulski, Michael; Jiao, Yinping; Wang, Liya; Olson, Andrew; Chougule, Kapeel; Buren, Peter Van; Ware, Doreen.

Commun Biol ; 3(1): 78, 2020 02 18.

Article in English | MEDLINE | ID: mdl-32071408

ABSTRACT

Haplotype phasing of maize genetic variants is important for genome interpretation, population genetic analysis and functional analysis of allelic activity. We performed an isoform-level phasing study using two maize inbred lines and their reciprocal crosses, based on single-molecule, full-length cDNA sequencing. To phase and analyze transcripts between hybrids and parents, we developed IsoPhase. Using this tool, we validated the majority of SNPs called against matching short-read data from embryo, endosperm and root tissues, and identified allele-specific, gene-level and isoform-level differential expression between the inbred parental lines and hybrid offspring. After phasing 6907 genes in the reciprocal hybrids, we annotated the SNPs and identified large-effect genes. In addition, we identified parent-of-origin isoforms, distinct novel isoforms in maize parent and hybrid lines, and imprinted genes from different tissues. Finally, we characterized variation in cis- and trans-regulatory effects. Our study provides measures of haplotypic expression that could increase accuracy in studies of allelic expression.

Subject(s)

Sequence Analysis, RNA/methods , Zea mays/genetics , Alleles , Endosperm/genetics , Gene Expression Profiling/methods , Gene Expression Regulation, Plant , Genes, Plant , Genome, Plant , Haplotypes , Mutation , Plant Proteins/genetics , Plants, Genetically Modified , RNA, Messenger/analysis , RNA, Messenger/genetics , Zea mays/physiology

20.

5'UTR-mediated regulation of Ataxin-1 expression.

Manek, Rachna; Nelson, Tiffany; Tseng, Elizabeth; Rodriguez-Lebron, Edgardo.

Neurobiol Dis ; 134: 104564, 2020 02.

Article in English | MEDLINE | ID: mdl-31381977

ABSTRACT

Expression of mutant Ataxin-1 with an abnormally expanded polyglutamine domain is necessary for the onset and progression of spinocerebellar ataxia type 1 (SCA1). Understanding how Ataxin-1 expression is regulated in the human brain could inspire novel molecular therapies for this fatal, dominantly inherited neurodegenerative disease. Previous studies have shown that the ATXN1 3'UTR plays a key role in regulating the Ataxin-1 cellular pool via diverse post-transcriptional mechanisms. Here we show that elements within the ATXN1 5'UTR also participate in the regulation of Ataxin-1 expression. PCR and PacBio sequencing analysis of cDNA obtained from control and SCA1 human brain samples revealed the presence of three major, alternatively spliced ATXN1 5'UTR variants. In cell-based assays, fusion of these variants upstream of an EGFP reporter construct revealed significant and differential impacts on total EGFP protein output, uncovering a type of genetic rheostat-like function of the ATXN1 5'UTR. We identified ribosomal scanning of upstream AUG codons and increased transcript instability as potential mechanisms of regulation. Importantly, transcript-based analyses revealed significant differences in the expression pattern of ATXN1 5'UTR variants between control and SCA1 cerebellum. Together, the data presented here shed light on a previously unknown role for the ATXN1 5'UTR in the regulation of Ataxin-1 and provide new opportunities for the development of SCA1 therapeutics.

Subject(s)

5' Untranslated Regions/physiology , Ataxin-1/genetics , Ataxin-1/metabolism , Gene Expression Regulation/physiology , Spinocerebellar Ataxias , Cerebellum , Humans , Protein Isoforms/genetics , Protein Isoforms/metabolism
