Search | VHL Regional Portal

1.

Reference-free assembly of long-read transcriptome sequencing data with RNA-Bloom2.

Nip, Ka Ming; Hafezqorani, Saber; Gagalova, Kristina K; Chiu, Readman; Yang, Chen; Warren, René L; Birol, Inanc.

Nat Commun ; 14(1): 2940, 2023 05 22.

Article in English | MEDLINE | ID: mdl-37217540

ABSTRACT

Long-read sequencing technologies have improved significantly since their emergence. Their read lengths, potentially spanning entire transcripts, are advantageous for reconstructing transcriptomes. Existing long-read transcriptome assembly methods are primarily reference-based, and to date there has been little focus on reference-free transcriptome assembly. We introduce RNA-Bloom2 (https://github.com/bcgsc/RNA-Bloom), a reference-free assembly method for long-read transcriptome sequencing data. Using simulated datasets and spike-in control data, we show that the transcriptome assembly quality of RNA-Bloom2 is competitive with that of reference-based methods. Furthermore, we find that RNA-Bloom2 requires 27.0 to 80.6% of the peak memory and 3.6 to 10.8% of the total wall-clock runtime of a competing reference-free method. Finally, we showcase RNA-Bloom2 in assembling a transcriptome sample of Picea sitchensis (Sitka spruce). Because our method does not rely on a reference, it also lays the groundwork for large-scale comparative transcriptomics where high-quality draft genome assemblies are not readily available.

Subject(s)

RNA , Transcriptome , Transcriptome/genetics , High-Throughput Nucleotide Sequencing/methods , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods

2.

Linked-read sequencing for detecting short tandem repeat expansions.

Chiu, Readman; Rajan-Babu, Indhu-Shree; Birol, Inanc; Friedman, Jan M.

Sci Rep ; 12(1): 9352, 2022 06 07.

Article in English | MEDLINE | ID: mdl-35672336

ABSTRACT

Detection of short tandem repeat (STR) expansions with standard short-read sequencing is challenging due to the difficulty in mapping multicopy repeat sequences. In this study, we explored how the long-range sequence information of barcode linked-read sequencing (BLRS) can be leveraged to improve repeat-read detection. We also devised a novel algorithm using BLRS barcodes for distance estimation and evaluated its application for STR genotyping. Both approaches were designed for genotyping large expansions (> 1 kb) that cannot be sized accurately by existing methods. Using simulated and experimental data of genomes with STR expansions from multiple BLRS platforms, we validated the utility of barcode and phasing information in attaining better STR genotypes compared to standard short-read sequencing. Although the coverage bias of extremely GC-rich STRs is an important limitation of BLRS, BLRS is an effective strategy for genotyping many other STR loci.

Subject(s)

High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Algorithms , High-Throughput Nucleotide Sequencing/methods , Microsatellite Repeats/genetics , Sequence Analysis, DNA/methods

3.

Correction to: Genome-wide sequencing as a first-tier screening test for short tandem repeat expansions.

Rajan-Babu, Indhu-Shree; Peng, Junran J; Chiu, Readman; Li, Chenkai; Mohajeri, Arezoo; Dolzhenko, Egor; Eberle, Michael A; Birol, Inanc; Friedman, Jan M.

Genome Med ; 13(1): 151, 2021 Sep 13.

Article in English | MEDLINE | ID: mdl-34517885

4.

Genome-wide sequencing as a first-tier screening test for short tandem repeat expansions.

Rajan-Babu, Indhu-Shree; Peng, Junran J; Chiu, Readman; Li, Chenkai; Mohajeri, Arezoo; Dolzhenko, Egor; Eberle, Michael A; Birol, Inanc; Friedman, Jan M.

Genome Med ; 13(1): 126, 2021 08 09.

Article in English | MEDLINE | ID: mdl-34372915

ABSTRACT

BACKGROUND: Screening for short tandem repeat (STR) expansions in next-generation sequencing data can enable diagnosis, optimal clinical management/treatment, and accurate genetic counseling of patients with repeat expansion disorders. We aimed to develop an efficient computational workflow for reliable detection of STR expansions in next-generation sequencing data and to demonstrate its clinical utility. METHODS: We characterized the performance of eight STR analysis methods (lobSTR, HipSTR, RepeatSeq, ExpansionHunter, TREDPARSE, GangSTR, STRetch, and exSTRa) on next-generation sequencing datasets of samples with known disease-causing full-mutation STR expansions and of genomes simulated to harbor repeat expansions at selected loci, and we optimized their sensitivity. We then used a machine learning decision tree classifier to identify an optimal combination of methods for full-mutation detection. In Burrows-Wheeler Aligner (BWA)-aligned genomes, the ensemble approach of using ExpansionHunter, STRetch, and exSTRa performed the best (precision = 82%, recall = 100%, F1-score = 90%). We applied this pipeline to screen 301 families of children with suspected genetic disorders. RESULTS: We identified 10 individuals with full mutations in the AR, ATXN1, ATXN8, DMPK, FXN, or HTT disease STR locus in the analyzed families. Additional candidates identified in our analysis include two probands with borderline ATXN2 expansions falling between the established repeat-size ranges for reduced-penetrance and full-penetrance full mutations, and seven individuals with FMR1 CGG repeats in the intermediate/premutation repeat-size range. In 67 probands with a prior negative clinical PCR test for the FMR1, FXN, or DMPK disease STR locus, or for the spinocerebellar ataxia disease STR panel, our pipeline did not falsely identify any aberrant expansion. We performed clinical PCR tests on seven of the 10 full-mutation samples identified by our pipeline and confirmed the expansion status in all of them, showing complete concordance between our bioinformatics and molecular findings. CONCLUSIONS: We have successfully demonstrated the application of a well-optimized bioinformatics pipeline that promotes the utility of genome-wide sequencing as a first-tier screening test to detect expansions of known disease STRs. Interrogating clinical next-generation sequencing data for pathogenic STR expansions using our ensemble pipeline can improve diagnostic yield and enhance clinical outcomes for patients with repeat expansion disorders.
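The reported F1-score of the ensemble follows directly from its precision and recall, since F1 is their harmonic mean, F1 = 2PR / (P + R). A short Python check, using the precision and recall values quoted in the abstract (the helper name f1_score is ours, not part of the authors' pipeline):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the ExpansionHunter + STRetch + exSTRa ensemble.
score = f1_score(0.82, 1.00)
print(f"F1 = {score:.1%}")  # → F1 = 90.1%
```

The unrounded value, about 90.1%, is consistent with the 90% stated in the abstract.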

Subject(s)

DNA Repeat Expansion , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Whole Genome Sequencing , Algorithms , Alleles , Clinical Decision-Making , Computational Biology/methods , Databases, Genetic , Decision Trees , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics , Genetic Loci , Genome-Wide Association Study/methods , Humans , Machine Learning , Molecular Diagnostic Techniques , Mutation , Reproducibility of Results

5.

Straglr: discovering and genotyping tandem repeat expansions using whole genome long-read sequences.

Chiu, Readman; Rajan-Babu, Indhu-Shree; Friedman, Jan M; Birol, Inanc.

Genome Biol ; 22(1): 224, 2021 08 13.

Article in English | MEDLINE | ID: mdl-34389037

ABSTRACT

Tandem repeat (TR) expansion is the underlying cause of over 40 neurological disorders. Long-read sequencing offers advantages over conventional technologies for detecting TR expansions. Here, we present Straglr, a robust software tool for both targeted genotyping and novel expansion detection from long-read alignments. We benchmark Straglr using various simulations, targeted genotyping data of cell lines carrying expansions of known diseases, and whole genome sequencing data with chromosome-scale assembly. Our results suggest that Straglr may be useful for investigating disease-associated TR expansions using long-read sequencing.

Subject(s)

Genotype , Genotyping Techniques/methods , Software , DNA Repeat Expansion , Disease/genetics , Humans , Whole Genome Sequencing/methods

6.

A clinical transcriptome approach to patient stratification and therapy selection in acute myeloid leukemia.

Docking, T Roderick; Parker, Jeremy D K; Jädersten, Martin; Duns, Gerben; Chang, Linda; Jiang, Jihong; Pilsworth, Jessica A; Swanson, Lucas A; Chan, Simon K; Chiu, Readman; Nip, Ka Ming; Mar, Samantha; Mo, Angela; Wang, Xuan; Martinez-Høyer, Sergio; Stubbins, Ryan J; Mungall, Karen L; Mungall, Andrew J; Moore, Richard A; Jones, Steven J M; Birol, Inanç; Marra, Marco A; Hogge, Donna; Karsan, Aly.

Nat Commun ; 12(1): 2474, 2021 04 30.

Article in English | MEDLINE | ID: mdl-33931648

ABSTRACT

As more clinically relevant genomic features of myeloid malignancies are revealed, it has become clear that targeted clinical genetic testing is inadequate for risk stratification. Here, we develop and validate a clinical transcriptome-based assay for stratification of acute myeloid leukemia (AML). Comparison of ribonucleic acid sequencing (RNA-Seq) to whole genome and exome sequencing reveals that a standalone RNA-Seq assay offers the greatest diagnostic return, enabling identification of expressed gene fusions, single nucleotide and short insertion/deletion variants, and whole-transcriptome expression information. Expression data from 154 AML patients are used to develop a novel AML prognostic score, which is strongly associated with patient outcomes across 620 patients from three independent cohorts, and 42 patients from a prospective cohort. When combined with molecular risk guidelines, the risk score allows for the re-stratification of 22.1 to 25.3% of AML patients from three independent cohorts into correct risk groups. Within the adverse-risk subgroup, we identify a subset of patients characterized by dysregulated integrin signaling and RUNX1 or TP53 mutation. We show that these patients may benefit from therapy with inhibitors of focal adhesion kinase, encoded by PTK2, demonstrating additional utility of transcriptome-based testing for therapy selection in myeloid malignancy.

Subject(s)

Biomarkers, Tumor/metabolism , Gene Expression Regulation, Neoplastic/genetics , Leukemia, Myeloid, Acute/diagnosis , Leukemia, Myeloid, Acute/metabolism , Biomarkers, Tumor/genetics , Cell Line, Tumor , Cohort Studies , Core Binding Factor Alpha 2 Subunit/genetics , Core Binding Factor Alpha 2 Subunit/metabolism , Female , Gene Fusion , Humans , INDEL Mutation , Integrins/genetics , Integrins/metabolism , Leukemia, Myeloid, Acute/genetics , Male , Polymorphism, Single Nucleotide , Prognosis , Prospective Studies , RNA-Seq , Risk Factors , Signal Transduction/genetics , Survival Analysis , Transcriptome , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism , Exome Sequencing , Whole Genome Sequencing

7.

RNA-Bloom enables reference-free and reference-guided sequence assembly for single-cell transcriptomes.

Nip, Ka Ming; Chiu, Readman; Yang, Chen; Chu, Justin; Mohamadi, Hamid; Warren, René L; Birol, Inanc.

Genome Res ; 30(8): 1191-1200, 2020 08.

Article in English | MEDLINE | ID: mdl-32817073

ABSTRACT

Despite the rapid advance in single-cell RNA sequencing (scRNA-seq) technologies within the last decade, single-cell transcriptome analysis workflows have primarily used gene expression data, while isoform sequence analysis at the single-cell level remains fairly limited. Detection and discovery of isoforms in single cells are difficult because of the inherent technical shortcomings of scRNA-seq data, and existing transcriptome assembly methods are mainly designed for bulk RNA samples. To address this challenge, we developed RNA-Bloom, an assembly algorithm that leverages the rich information content aggregated from multiple single-cell transcriptomes to reconstruct cell-specific isoforms. Assembly with RNA-Bloom can be either reference-guided or reference-free, thus enabling unbiased discovery of novel isoforms or foreign transcripts. We compared both assembly strategies of RNA-Bloom against five state-of-the-art reference-free and reference-based transcriptome assembly methods. In our benchmarks on a simulated 384-cell data set, reference-free RNA-Bloom reconstructed 37.9%-38.3% more isoforms than the best reference-free assembler, whereas reference-guided RNA-Bloom reconstructed 4.1%-11.6% more isoforms than reference-based assemblers. When applied to a real 3840-cell data set consisting of more than 4 billion reads, RNA-Bloom reconstructed 9.7%-25.0% more isoforms than the best competing reference-based and reference-free approaches evaluated. We expect RNA-Bloom to boost the utility of scRNA-seq data beyond gene expression analysis, expanding what is informatically accessible now.

Subject(s)

Gene Expression Profiling/methods , RNA-Seq/methods , Single-Cell Analysis/methods , Transcriptome/genetics , Algorithms , Animals , Base Sequence , Humans , Mice , Protein Isoforms/genetics , Software

8.

Mismatch-tolerant, alignment-free sequence classification using multiple spaced seeds and multiindex Bloom filters.

Chu, Justin; Mohamadi, Hamid; Erhan, Emre; Tse, Jeffery; Chiu, Readman; Yeo, Sarah; Birol, Inanc.

Proc Natl Acad Sci U S A ; 117(29): 16961-16968, 2020 07 21.

Article in English | MEDLINE | ID: mdl-32641514

ABSTRACT

Alignment-free classification tools have enabled high-throughput processing of sequencing data in many bioinformatics analysis pipelines, primarily due to their computational efficiency. Originally k-mer based, such tools often lack sensitivity when faced with sequencing errors and polymorphisms. In response, some tools have been augmented with spaced seeds, which are capable of tolerating mismatches. However, spaced seeds have seen little practical use in classification because they bring increased computational and memory costs compared to methods that use k-mers. These limitations have also constrained the design and length of practical spaced seeds, since storing spaced seeds can be costly. To address these challenges, we have designed a probabilistic data structure called a multiindex Bloom Filter (miBF), which can store multiple spaced seed sequences with a low memory cost that remains static regardless of seed length or seed design. We formalize how to minimize the false-positive rate of miBFs when classifying sequences from multiple targets or references. We illustrate the utility of miBF, which is available within BioBloom Tools, in two use cases: read-binning for targeted assembly, and taxonomic read assignment. In our benchmarks, an analysis pipeline based on miBF shows higher sensitivity and specificity for read-binning than sequence alignment-based methods, while also executing in less time. Similarly, for taxonomic classification, miBF enables higher sensitivity than a conventional spaced seed-based approach, while using half the memory and an order of magnitude less computational time.
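The mismatch tolerance of spaced seeds mentioned above can be illustrated with a toy Python sketch. This is a generic spaced-seed example, not the miBF data structure or the BioBloom Tools implementation; the seed pattern 1110111 and the helper names are hypothetical. A '1' marks a position that contributes to the key and a '0' marks a position that is ignored, so a mismatch that falls on a '0' leaves the key unchanged, whereas a contiguous k-mer covering the same mismatch always changes.

```python
def spaced_seed_keys(seq: str, pattern: str) -> list[str]:
    """One key per window; only positions marked '1' in the pattern contribute."""
    care = [i for i, c in enumerate(pattern) if c == "1"]
    return [
        "".join(seq[start + i] for i in care)
        for start in range(len(seq) - len(pattern) + 1)
    ]


def matching_windows(a: str, b: str, pattern: str) -> int:
    """Count aligned windows whose keys agree between two equal-length sequences."""
    return sum(
        x == y
        for x, y in zip(spaced_seed_keys(a, pattern), spaced_seed_keys(b, pattern))
    )


reference = "ACGTACGTAC"
read = "ACGAACGTAC"      # one mismatch at index 3 (T -> A)

contiguous = "1111111"   # ordinary 7-mer
spaced = "1110111"       # hypothetical spaced seed: same span, weight 6

print(matching_windows(reference, read, contiguous))  # → 0
print(matching_windows(reference, read, spaced))      # → 1
```

Every 7-bp window of this short read overlaps the mismatch, so no contiguous 7-mer survives, while the spaced seed still produces one shared key. Practical tools use several complementary seed patterns so that most mismatch positions fall on a '0' in at least one of them.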

Subject(s)

Sequence Analysis, DNA/methods , Software , Animals , Base Pair Mismatch , Humans , Phylogeny , Sequence Alignment , Sequence Analysis, DNA/standards

9.

Fusion-Bloom: fusion detection in assembled transcriptomes.

Chiu, Readman; Nip, Ka Ming; Birol, Inanc.

Bioinformatics ; 36(7): 2256-2257, 2020 04 01.

Article in English | MEDLINE | ID: mdl-31790154

ABSTRACT

SUMMARY: The presence or absence of gene fusions is one of the most important diagnostic markers in many cancer types. Consequently, fusion detection methods using various genomics data types, such as RNA sequencing (RNA-seq), are valuable tools for research and clinical applications. While information-rich RNA-seq data have proven to be instrumental in the discovery of a number of hallmark fusion events, bioinformatics tools to detect fusions still have room for improvement. Here, we present Fusion-Bloom, a fusion detection method that leverages recent developments in de novo transcriptome assembly and assembly-based structural variant calling technologies (RNA-Bloom and PAVFinder, respectively). We benchmarked Fusion-Bloom against five other state-of-the-art fusion detection tools using multiple datasets. Overall, Fusion-Bloom displayed a good balance between detection sensitivity and specificity. We expect the tool to find applications in translational research and clinical genomics pipelines. AVAILABILITY AND IMPLEMENTATION: Fusion-Bloom is implemented as a UNIX Make utility, available at https://github.com/bcgsc/pavfinder and released under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Software , Transcriptome , Genomics , RNA , Sequence Analysis, RNA

10.

Base excision repair deficiency signatures implicate germline and somatic MUTYH aberrations in pancreatic ductal adenocarcinoma and breast cancer oncogenesis.

Thibodeau, My Linh; Zhao, Eric Y; Reisle, Caralyn; Ch'ng, Carolyn; Wong, Hui-Li; Shen, Yaoqing; Jones, Martin R; Lim, Howard J; Young, Sean; Cremin, Carol; Pleasance, Erin; Zhang, Wei; Holt, Robert; Eirew, Peter; Karasinska, Joanna; Kalloger, Steve E; Taylor, Greg; Majounie, Elisa; Bonakdar, Melika; Zong, Zusheng; Bleile, Dustin; Chiu, Readman; Birol, Inanc; Gelmon, Karen; Lohrisch, Caroline; Mungall, Karen L; Mungall, Andrew J; Moore, Richard; Ma, Yussanne P; Fok, Alexandra; Yip, Stephen; Karsan, Aly; Huntsman, David; Schaeffer, David F; Laskin, Janessa; Marra, Marco A; Renouf, Daniel J; Jones, Steven J M; Schrader, Kasmintan A.

Cold Spring Harb Mol Case Stud ; 5(2), 2019 04.

Article in English | MEDLINE | ID: mdl-30833417

ABSTRACT

We report a case of early-onset pancreatic ductal adenocarcinoma in a patient harboring biallelic MUTYH germline mutations, whose tumor featured somatic mutational signatures consistent with defective MUTYH-mediated base excision repair and the associated driver KRAS transversion mutation p.Gly12Cys. Analysis of an additional 730 advanced cancer cases (N = 731) was undertaken to determine whether the mutational signatures were also present in tumors from germline MUTYH heterozygote carriers or if instead the signatures were only seen in those with biallelic loss of function. We identified two patients with breast cancer each carrying a pathogenic germline MUTYH variant with a somatic MUTYH copy loss leading to the germline variant being homozygous in the tumor and demonstrating the same somatic signatures. Our results suggest that monoallelic inactivation of MUTYH is not sufficient for C:G>A:T transversion signatures previously linked to MUTYH deficiency to arise (N = 9), but that biallelic complete loss of MUTYH function can cause such signatures to arise even in tumors not classically seen in MUTYH-associated polyposis (N = 3). Although defective MUTYH is not the only determinant of these signatures, MUTYH germline variants may be present in a subset of patients with tumors demonstrating elevated somatic signatures possibly suggestive of MUTYH deficiency (e.g., COSMIC Signature 18, SigProfiler SBS18/SBS36, SignatureAnalyzer SBS18/SBS36).
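Substitutions such as C:G>A:T are conventionally reported relative to the pyrimidine base of the pair, which collapses the 12 possible single-base changes into six classes (C>A, C>G, C>T, T>A, T>C, T>G); under this convention a G>T call on one strand and a C>A call are the same C:G>A:T event. The Python sketch below shows only that normalization step on toy calls. It ignores the trinucleotide context used by COSMIC, SigProfiler, and SignatureAnalyzer signatures and is not the authors' analysis; the function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pyrimidine_class(ref: str, alt: str) -> str:
    """Express a single-base substitution relative to the pyrimidine (C or T) reference base."""
    if ref in ("G", "A"):
        # A purine reference is reported on the opposite strand, so G>T becomes C>A.
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return f"{ref}>{alt}"


def class_fractions(snvs: list[tuple[str, str]]) -> dict[str, float]:
    """Fraction of substitutions in each pyrimidine-normalized class."""
    counts = Counter(pyrimidine_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    if not total:
        return {}
    return {cls: n / total for cls, n in counts.items()}


snvs = [("G", "T"), ("C", "A"), ("C", "T"), ("A", "G")]  # toy calls, not study data
print(class_fractions(snvs))  # → {'C>A': 0.5, 'C>T': 0.25, 'T>C': 0.25}
```

An elevated C>A fraction on its own is only suggestive; signatures such as SBS18 and SBS36 are defined over the full 96-class context spectrum.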

Subject(s)

Breast Neoplasms/genetics , Carcinoma, Pancreatic Ductal/genetics , DNA Glycosylases/genetics , Mutation , Pancreatic Neoplasms/genetics , Age of Onset , DNA Glycosylases/deficiency , Female , Germ-Line Mutation , Humans , Loss of Heterozygosity , Middle Aged , Proto-Oncogene Proteins p21(ras)/genetics

11.

TAP: a targeted clinical genomics pipeline for detecting transcript variants using RNA-seq data.

Chiu, Readman; Nip, Ka Ming; Chu, Justin; Birol, Inanc.

BMC Med Genomics ; 11(1): 79, 2018 Sep 10.

Article in English | MEDLINE | ID: mdl-30200994

ABSTRACT

BACKGROUND: RNA-seq is a powerful and cost-effective technology for molecular diagnostics of cancer and other diseases, and it can reach its full potential when coupled with validated clinical-grade informatics tools. Despite recent advances in long-read sequencing, transcriptome assembly of short reads remains a useful and cost-effective methodology for unveiling transcript-level rearrangements and novel isoforms. One of the major concerns for adopting the proven de novo assembly approach for RNA-seq data in clinical settings has been the analysis turnaround time. To address this concern, we have developed a targeted approach to expedite assembly and analysis of RNA-seq data. RESULTS: Here we present our Targeted Assembly Pipeline (TAP), which consists of four stages: 1) alignment-free gene-level classification of RNA-seq reads using BioBloomTools, 2) de novo assembly of individual targets using Trans-ABySS, 3) alignment of assembled contigs to the reference genome and transcriptome with GMAP and BWA, and 4) structural and splicing variant detection using PAVFinder. We show that PAVFinder is a robust gene fusion detection tool when compared to established methods such as Tophat-Fusion and deFuse on simulated data of 448 events. Using the Leucegene acute myeloid leukemia (AML) RNA-seq data and a set of 580 COSMIC target genes, TAP identified a wide range of hallmark molecular anomalies, including gene fusions, tandem duplications, insertions and deletions, in agreement with published results. In the same dataset, TAP also captured AML-specific splicing variants, such as skipped exons and novel splice sites, reported in other studies. The running time of TAP on 100-150 million read pairs and a 580-gene set is one to two hours on a 48-core machine. CONCLUSIONS: We demonstrated that TAP is a fast and robust RNA-seq variant detection pipeline that is potentially amenable to clinical applications. TAP is available at http://www.bcgsc.ca/platform/bioinfo/software/pavfinder.

Subject(s)

Genetic Variation , Genomics/methods , RNA/metabolism , User-Computer Interface , Humans , INDEL Mutation , Leukemia, Myeloid, Acute/genetics , Leukemia, Myeloid, Acute/pathology , RNA/chemistry , RNA/genetics , RNA Splicing , Sequence Analysis, RNA

12.

Recurrent tumor-specific regulation of alternative polyadenylation of cancer-related genes.

Xue, Zhuyi; Warren, René L; Gibb, Ewan A; MacMillan, Daniel; Wong, Johnathan; Chiu, Readman; Hammond, S Austin; Yang, Chen; Nip, Ka Ming; Ennis, Catherine A; Hahn, Abigail; Reynolds, Sheila; Birol, Inanc.

BMC Genomics ; 19(1): 536, 2018 Jul 13.

Article in English | MEDLINE | ID: mdl-30005633

ABSTRACT

BACKGROUND: Alternative polyadenylation (APA) results in messenger RNA molecules with different 3' untranslated regions (3' UTRs), affecting the molecules' stability, localization, and translation. APA is pervasive and implicated in cancer. Earlier reports on APA focused on 3' UTR length modifications and commonly characterized APA events as 3' UTR shortening or lengthening. However, such characterization oversimplifies the processing of 3' ends of transcripts and fails to adequately describe the various scenarios we observe. RESULTS: We built a cloud-based targeted de novo transcript assembly and analysis pipeline that incorporates our previously developed cleavage site prediction tool, KLEAT. We applied this pipeline to elucidate the APA profiles of 114 genes in 9939 tumor and 729 tissue normal samples from The Cancer Genome Atlas (TCGA). The full set of 10,668 RNA-Seq samples from 33 cancer types has not been utilized by previous APA studies. By comparing the frequencies of predicted cleavage sites between normal and tumor sample groups, we identified 77 events (i.e. gene-cancer type pairs) of tumor-specific APA regulation in 13 cancer types; for 15 genes, such regulation is recurrent across multiple cancers. Our results also support a previous report showing the 3' UTR shortening of FGF2 in multiple cancers. However, over half of the events we identified display complex changes to 3' UTR length that resist simple classification like shortening or lengthening. CONCLUSIONS: Recurrent tumor-specific regulation of APA is widespread in cancer. However, the regulation pattern that we observed in TCGA RNA-seq data cannot be described as straightforward 3' UTR shortening or lengthening. Continued investigation into this complex, nuanced regulatory landscape will provide further insight into its role in tumor formation and development.

Subject(s)

Neoplasms/genetics , RNA, Messenger/genetics , 3' Untranslated Regions , Cloud Computing , Databases, Genetic , Fibroblast Growth Factor 2/genetics , Gene Expression Regulation, Neoplastic , Humans , Neoplasm Recurrence, Local/genetics , Neoplasms/pathology , Polyadenylation , RNA Cleavage , RNA, Messenger/metabolism , Software

13.

Konnector v2.0: pseudo-long reads from paired-end sequencing data.

Vandervalk, Benjamin P; Yang, Chen; Xue, Zhuyi; Raghavan, Karthika; Chu, Justin; Mohamadi, Hamid; Jackman, Shaun D; Chiu, Readman; Warren, René L; Birol, Inanç.

BMC Med Genomics ; 8 Suppl 3: S1, 2015.

Article in English | MEDLINE | ID: mdl-26399504

ABSTRACT

BACKGROUND: Reading the nucleotides from two ends of a DNA fragment is called paired-end tag (PET) sequencing. When the fragment length is longer than the combined read length, there remains a gap of unsequenced nucleotides between read pairs. If the target in such experiments is sequenced at a level to provide redundant coverage, it may be possible to bridge these gaps using bioinformatics methods. Konnector is a local de novo assembly tool that addresses this problem. Here we report on version 2.0 of our tool. RESULTS: Konnector uses a probabilistic and memory-efficient data structure called a Bloom filter to represent a k-mer spectrum, that is, the set of all sequences of length k (k-mers) occurring in an input file, such as the collection of reads in a PET sequencing experiment. It performs look-ups in this data structure to construct an implicit de Bruijn graph, which describes (k-1) base pair overlaps between adjacent k-mers. It traverses this graph to bridge the gap between a given pair of flanking sequences. CONCLUSIONS: Here we report the performance of Konnector v2.0 on simulated and experimental datasets and compare it against other tools with similar functionality. We note that, representing k-mers with 1.5 bytes of memory on average, Konnector can scale to very large genomes. With our parallel implementation, it can also process over a billion bases on commodity hardware.
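The approach described in this abstract (store the k-mers from the reads in a Bloom filter, then walk the de Bruijn graph implicitly by querying the filter for each of the four possible successors of the current k-mer) can be sketched in Python. This is a minimal illustration under our own assumptions, not Konnector's implementation: the salted SHA-256 hash functions, the breadth-first search, the filter size, the max_len cutoff, and all names are hypothetical, and the sketch makes no attempt at Konnector's memory efficiency or its handling of ambiguous paths.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Fixed-size bit array queried with num_hashes salted SHA-256 hash functions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May return false positives, never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def kmers(seq: str, k: int):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def bridge(left: str, right: str, bloom: BloomFilter, k: int, max_len: int = 500):
    """Breadth-first walk from left's last k-mer to right's first k-mer; None if no path."""
    start, goal = left[-k:], right[:k]
    queue = deque([(start, left)])
    seen = {start}
    while queue:
        kmer, seq = queue.popleft()
        if kmer == goal:
            return seq + right[k:]
        if len(seq) >= max_len:
            continue
        for base in "ACGT":
            # Graph edges are implicit: a successor overlaps the current k-mer by k-1 bases.
            nxt = kmer[1:] + base
            if nxt not in seen and nxt in bloom:
                seen.add(nxt)
                queue.append((nxt, seq + base))
    return None


# Toy data: overlapping reads tile a 40-bp fragment; the two 15-bp flanks leave a 10-bp gap.
fragment = "ACGTTGCATGCCTAGGATCCAATGGTCAAGTTCGGACTTA"
reads = [fragment[i:i + 20] for i in range(0, 21, 5)]
k = 11
bloom = BloomFilter(num_bits=1 << 16, num_hashes=3)
for read in reads:
    for kmer in kmers(read, k):
        bloom.add(kmer)

left, right = fragment[:15], fragment[-15:]
print(bridge(left, right, bloom, k) == fragment)
```

Because a Bloom filter can report false positives but never false negatives, a spurious hit can add a dead-end branch to the walk but cannot remove a true edge; this trade-off is what lets the k-mer set be stored in a fixed, small amount of memory.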

Subject(s)

Sequence Analysis, DNA/methods , Software , Algorithms , DNA/chemistry , High-Throughput Nucleotide Sequencing

14.

Improved white spruce (Picea glauca) genome assemblies and annotation of large gene families of conifer terpenoid and phenolic defense metabolism.

Warren, René L; Keeling, Christopher I; Yuen, Macaire Man Saint; Raymond, Anthony; Taylor, Greg A; Vandervalk, Benjamin P; Mohamadi, Hamid; Paulino, Daniel; Chiu, Readman; Jackman, Shaun D; Robertson, Gordon; Yang, Chen; Boyle, Brian; Hoffmann, Margarete; Weigel, Detlef; Nelson, David R; Ritland, Carol; Isabel, Nathalie; Jaquish, Barry; Yanchuk, Alvin; Bousquet, Jean; Jones, Steven J M; MacKay, John; Birol, Inanc; Bohlmann, Joerg.

Plant J ; 83(2): 189-212, 2015 Jul.

Article in English | MEDLINE | ID: mdl-26017574

ABSTRACT

White spruce (Picea glauca), a gymnosperm tree, has been established as one of the models for conifer genomics. We describe the draft genome assemblies of two white spruce genotypes, PG29 and WS77111, innovative tools for the assembly of very large genomes, and the conifer genomics resources developed in this process. The two white spruce genotypes originate from distant geographic regions of western (PG29) and eastern (WS77111) North America, and represent elite trees in two Canadian tree-breeding programs. We present an update (V3 and V4) for a previously reported PG29 V2 draft genome assembly and introduce a second white spruce genome assembly for genotype WS77111. Assemblies of the PG29 and WS77111 genomes confirm the reconstructed white spruce genome size in the 20 Gbp range, and show broad synteny. Using the PG29 V3 assembly and additional white spruce genomics and transcriptomics resources, we performed MAKER-P annotation and meticulous expert annotation of very large gene families of conifer defense metabolism, the terpene synthases and cytochrome P450s. We also comprehensively annotated the white spruce mevalonate, methylerythritol phosphate and phenylpropanoid pathways. These analyses highlighted the large extent of gene and pseudogene duplications in a conifer genome, in particular for genes of secondary (i.e. specialized) metabolism, and the potential for gain and loss of function for defense and adaptation.

Subject(s)

Genome, Plant , Multigene Family , Phenols/metabolism , Picea/genetics , Terpenes/metabolism , Alkyl and Aryl Transferases/metabolism , Computational Biology , Cytochrome P-450 Enzyme System/metabolism , Transcriptome

15.

Kleat: cleavage site analysis of transcriptomes.

Birol, Inanç; Raymond, Anthony; Chiu, Readman; Nip, Ka Ming; Jackman, Shaun D; Kreitzman, Maayan; Docking, T Roderick; Ennis, Catherine A; Robertson, A Gordon; Karsan, Aly.

Pac Symp Biocomput ; : 347-58, 2015.

Article in English | MEDLINE | ID: mdl-25592595

ABSTRACT

In eukaryotic cells, alternative cleavage of 3' untranslated regions (UTRs) can affect transcript stability, transport and translation. For polyadenylated (poly(A)) transcripts, cleavage sites can be characterized with short-read sequencing using specialized library construction methods. However, for large-scale cohort studies as well as for clinical sequencing applications, it is desirable to characterize such events using RNA-seq data, as the latter are already widely applied to identify other relevant information, such as mutations, alternative splicing and chimeric transcripts. Here we describe KLEAT, an analysis tool that uses de novo assembly of RNA-seq data to characterize cleavage sites on 3' UTRs. We demonstrate the performance of KLEAT on three cell line RNA-seq libraries constructed and sequenced by the ENCODE project, and assembled using Trans-ABySS. Validating the KLEAT predictions with matched ENCODE RNA-seq and RNA-PET libraries, we show that the tool has over 90% positive predictive value when there are at least three RNA-seq reads supporting a poly(A) tail and requiring at least three RNA-PET reads mapping within 100 nucleotides as validation. We also compare the performance of KLEAT with other popular RNA-seq analysis pipelines that reconstruct 3' UTR ends, and show that it performs favourably, based on an ROC-like curve.

Subject(s)

Transcriptome , 3' Untranslated Regions , Binding Sites , Cell Line , Computational Biology , Gene Library , Humans , ROC Curve , Sequence Alignment/statistics & numerical data , Sequence Analysis, RNA/statistics & numerical data

16.

Barnacle: detecting and characterizing tandem duplications and fusions in transcriptome assemblies.

Swanson, Lucas; Robertson, Gordon; Mungall, Karen L; Butterfield, Yaron S; Chiu, Readman; Corbett, Richard D; Docking, T Roderick; Hogge, Donna; Jackman, Shaun D; Moore, Richard A; Mungall, Andrew J; Nip, Ka Ming; Parker, Jeremy D K; Qian, Jenny Qing; Raymond, Anthony; Sung, Sandy; Tam, Angela; Thiessen, Nina; Varhol, Richard; Wang, Sherry; Yorukoglu, Deniz; Zhao, Yongjun; Hoodless, Pamela A; Sahinalp, S Cenk; Karsan, Aly; Birol, Inanc.

BMC Genomics ; 14: 550, 2013 Aug 14.

Article in English | MEDLINE | ID: mdl-23941359

ABSTRACT

BACKGROUND: Chimeric transcripts, including partial and internal tandem duplications (PTDs, ITDs) and gene fusions, are important in the detection, prognosis, and treatment of human cancers. RESULTS: We describe Barnacle, a production-grade analysis tool that detects such chimeras in de novo assemblies of RNA-seq data, and supports prioritizing them for review and validation by reporting the relative coverage of co-occurring chimeric and wild-type transcripts. We demonstrate applications in large-scale disease studies, by identifying PTDs in MLL, ITDs in FLT3, and reciprocal fusions between PML and RARA, in two deeply sequenced acute myeloid leukemia (AML) RNA-seq datasets. CONCLUSIONS: Our analyses of real and simulated data sets show that, with appropriate filter settings, Barnacle makes highly specific predictions for three types of chimeric transcripts that are important in a range of cancers: PTDs, ITDs, and fusions. High specificity makes manual review and validation efficient, which is necessary in large-scale disease studies. Characterizing an extended range of chimera types will help generate insights into progression, treatment, and outcomes for complex diseases.

Subject(s)

Gene Duplication/genetics , Gene Expression Profiling/methods , Gene Fusion/genetics , Genomics , Breast Neoplasms/genetics , Exons/genetics , Humans , Leukemia, Myeloid, Acute/genetics , Molecular Sequence Annotation , RNA, Messenger/genetics , Statistics as Topic

17.

Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia.

Ley, Timothy J; Miller, Christopher; Ding, Li; Raphael, Benjamin J; Mungall, Andrew J; Robertson, A Gordon; Hoadley, Katherine; Triche, Timothy J; Laird, Peter W; Baty, Jack D; Fulton, Lucinda L; Fulton, Robert; Heath, Sharon E; Kalicki-Veizer, Joelle; Kandoth, Cyriac; Klco, Jeffery M; Koboldt, Daniel C; Kanchi, Krishna-Latha; Kulkarni, Shashikant; Lamprecht, Tamara L; Larson, David E; Lin, Ling; Lu, Charles; McLellan, Michael D; McMichael, Joshua F; Payton, Jacqueline; Schmidt, Heather; Spencer, David H; Tomasson, Michael H; Wallis, John W; Wartman, Lukas D; Watson, Mark A; Welch, John; Wendl, Michael C; Ally, Adrian; Balasundaram, Miruna; Birol, Inanc; Butterfield, Yaron; Chiu, Readman; Chu, Andy; Chuah, Eric; Chun, Hye-Jung; Corbett, Richard; Dhalla, Noreen; Guin, Ranabir; He, An; Hirst, Carrie; Hirst, Martin; Holt, Robert A; Jones, Steven.

N Engl J Med ; 368(22): 2059-74, 2013 05 30.

Article in English | MEDLINE | ID: mdl-23634996

ABSTRACT

BACKGROUND: Many mutations that contribute to the pathogenesis of acute myeloid leukemia (AML) are undefined. The relationships between patterns of mutations and epigenetic phenotypes are not yet clear. METHODS: We analyzed the genomes of 200 clinically annotated adult cases of de novo AML, using either whole-genome sequencing (50 cases) or whole-exome sequencing (150 cases), along with RNA and microRNA sequencing and DNA-methylation analysis. RESULTS: AML genomes have fewer mutations than most other adult cancers, with an average of only 13 mutations found in genes. Of these, an average of 5 are in genes that are recurrently mutated in AML. A total of 23 genes were significantly mutated, and another 237 were mutated in two or more samples. Nearly all samples had at least 1 nonsynonymous mutation in one of nine categories of genes that are almost certainly relevant for pathogenesis, including transcription-factor fusions (18% of cases), the gene encoding nucleophosmin (NPM1) (27%), tumor-suppressor genes (16%), DNA-methylation-related genes (44%), signaling genes (59%), chromatin-modifying genes (30%), myeloid transcription-factor genes (22%), cohesin-complex genes (13%), and spliceosome-complex genes (14%). Patterns of cooperation and mutual exclusivity suggested strong biologic relationships among several of the genes and categories. CONCLUSIONS: We identified at least one potential driver mutation in nearly all AML samples and found that a complex interplay of genetic events contributes to AML pathogenesis in individual patients. The databases from this study are widely available to serve as a foundation for further investigations of AML pathogenesis, classification, and risk stratification. (Funded by the National Institutes of Health.)

Subject(s)

Leukemia, Myeloid, Acute/genetics , Mutation , Adult , CpG Islands , DNA Methylation , Epigenomics , Female , Gene Expression , Gene Fusion , Genome, Human , Humans , Leukemia, Myeloid, Acute/classification , Male , MicroRNAs/genetics , Middle Aged , Nucleophosmin , Sequence Analysis, DNA/methods

18.

Mutational and structural analysis of diffuse large B-cell lymphoma using whole-genome sequencing.

Morin, Ryan D; Mungall, Karen; Pleasance, Erin; Mungall, Andrew J; Goya, Rodrigo; Huff, Ryan D; Scott, David W; Ding, Jiarui; Roth, Andrew; Chiu, Readman; Corbett, Richard D; Chan, Fong Chun; Mendez-Lago, Maria; Trinh, Diane L; Bolger-Munro, Madison; Taylor, Greg; Hadj Khodabakhshi, Alireza; Ben-Neriah, Susana; Pon, Julia; Meissner, Barbara; Woolcock, Bruce; Farnoud, Noushin; Rogic, Sanja; Lim, Emilia L; Johnson, Nathalie A; Shah, Sohrab; Jones, Steven; Steidl, Christian; Holt, Robert; Birol, Inanc; Moore, Richard; Connors, Joseph M; Gascoyne, Randy D; Marra, Marco A.

Blood ; 122(7): 1256-65, 2013 Aug 15.

Article in English | MEDLINE | ID: mdl-23699601

ABSTRACT

Diffuse large B-cell lymphoma (DLBCL) is a genetically heterogeneous cancer composed of at least 2 molecular subtypes that differ in gene expression and distribution of mutations. Recently, application of genome/exome sequencing and RNA-seq to DLBCL has revealed numerous genes that are recurrent targets of somatic point mutation in this disease. Here we provide a whole-genome-sequencing-based perspective of DLBCL mutational complexity by characterizing 40 de novo DLBCL cases and 13 DLBCL cell lines and combining these data with DNA copy number analysis and RNA-seq from an extended cohort of 96 cases. Our analysis identified widespread genomic rearrangements including evidence for chromothripsis as well as the presence of known and novel fusion transcripts. We uncovered new gene targets of recurrent somatic point mutations and genes that are targeted by focal somatic deletions in this disease. We highlight the recurrence of germinal center B-cell-restricted mutations affecting genes that encode the S1P receptor and 2 small GTPases (GNA13 and GNAI2) that together converge on regulation of B-cell homing. We further analyzed our data to approximate the relative temporal order in which some recurrent mutations were acquired and demonstrate that ongoing acquisition of mutations and intratumoral clonal heterogeneity are common features of DLBCL. This study further improves our understanding of the processes and pathways involved in lymphomagenesis, and some of the pathways mutated here may indicate new avenues for therapeutic intervention.

Subject(s)

Biomarkers, Tumor/chemistry , Biomarkers, Tumor/genetics , DNA Copy Number Variations/genetics , Genome, Human , Lymphoma, Large B-Cell, Diffuse/genetics , Mutation/genetics , GTP-Binding Protein alpha Subunits, G12-G13/chemistry , GTP-Binding Protein alpha Subunits, G12-G13/genetics , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Oligonucleotide Array Sequence Analysis , RNA, Messenger/genetics , Real-Time Polymerase Chain Reaction , Reverse Transcriptase Polymerase Chain Reaction , Tumor Cells, Cultured

19.

The genetic landscape of high-risk neuroblastoma.

Pugh, Trevor J; Morozova, Olena; Attiyeh, Edward F; Asgharzadeh, Shahab; Wei, Jun S; Auclair, Daniel; Carter, Scott L; Cibulskis, Kristian; Hanna, Megan; Kiezun, Adam; Kim, Jaegil; Lawrence, Michael S; Lichenstein, Lee; McKenna, Aaron; Pedamallu, Chandra Sekhar; Ramos, Alex H; Shefler, Erica; Sivachenko, Andrey; Sougnez, Carrie; Stewart, Chip; Ally, Adrian; Birol, Inanc; Chiu, Readman; Corbett, Richard D; Hirst, Martin; Jackman, Shaun D; Kamoh, Baljit; Khodabakshi, Alireza Hadj; Krzywinski, Martin; Lo, Allan; Moore, Richard A; Mungall, Karen L; Qian, Jenny; Tam, Angela; Thiessen, Nina; Zhao, Yongjun; Cole, Kristina A; Diamond, Maura; Diskin, Sharon J; Mosse, Yael P; Wood, Andrew C; Ji, Lingyun; Sposto, Richard; Badgett, Thomas; London, Wendy B; Moyer, Yvonne; Gastier-Foster, Julie M; Smith, Malcolm A; Guidry Auvil, Jaime M; Gerhard, Daniela S.

Nat Genet ; 45(3): 279-84, 2013 Mar.

Article in English | MEDLINE | ID: mdl-23334666

ABSTRACT

Neuroblastoma is a malignancy of the developing sympathetic nervous system that often presents with widespread metastatic disease, resulting in survival rates of less than 50%. To determine the spectrum of somatic mutation in high-risk neuroblastoma, we studied 240 affected individuals (cases) using a combination of whole-exome, genome and transcriptome sequencing as part of the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative. Here we report a low median exonic mutation frequency of 0.60 per Mb (0.48 nonsilent) and notably few recurrently mutated genes in these tumors. Genes with significant somatic mutation frequencies included ALK (9.2% of cases), PTPN11 (2.9%), ATRX (2.5%, and an additional 7.1% had focal deletions), MYCN (1.7%, causing a recurrent p.Pro44Leu alteration) and NRAS (0.83%). Rare, potentially pathogenic germline variants were significantly enriched in ALK, CHEK2, PINK1 and BARD1. The relative paucity of recurrent somatic mutations in neuroblastoma challenges current therapeutic strategies that rely on frequently altered oncogenic drivers.

Subject(s)

Exome , Mutation , Neuroblastoma , Cell Line, Tumor , Genetic Predisposition to Disease , Genome, Human , Humans , Neuroblastoma/genetics , Neuroblastoma/physiopathology , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Transcriptome

20.

Subgroup-specific structural variation across 1,000 medulloblastoma genomes.

Northcott, Paul A; Shih, David J H; Peacock, John; Garzia, Livia; Morrissy, A Sorana; Zichner, Thomas; Stütz, Adrian M; Korshunov, Andrey; Reimand, Jüri; Schumacher, Steven E; Beroukhim, Rameen; Ellison, David W; Marshall, Christian R; Lionel, Anath C; Mack, Stephen; Dubuc, Adrian; Yao, Yuan; Ramaswamy, Vijay; Luu, Betty; Rolider, Adi; Cavalli, Florence M G; Wang, Xin; Remke, Marc; Wu, Xiaochong; Chiu, Readman Y B; Chu, Andy; Chuah, Eric; Corbett, Richard D; Hoad, Gemma R; Jackman, Shaun D; Li, Yisu; Lo, Allan; Mungall, Karen L; Nip, Ka Ming; Qian, Jenny Q; Raymond, Anthony G J; Thiessen, Nina T; Varhol, Richard J; Birol, Inanc; Moore, Richard A; Mungall, Andrew J; Holt, Robert; Kawauchi, Daisuke; Roussel, Martine F; Kool, Marcel; Jones, David T W; Witt, Hendrick; Fernandez-L, Africa; Kenney, Anna M; Wechsler-Reya, Robert J.

Nature ; 488(7409): 49-56, 2012 Aug 02.

Article in English | MEDLINE | ID: mdl-22832581

ABSTRACT

Medulloblastoma, the most common malignant paediatric brain tumour, is currently treated with nonspecific cytotoxic therapies including surgery, whole-brain radiation, and aggressive chemotherapy. As medulloblastoma exhibits marked intertumoural heterogeneity, with at least four distinct molecular variants, previous attempts to identify targets for therapy have been underpowered because of small sample sizes. Here we report somatic copy number aberrations (SCNAs) in 1,087 unique medulloblastomas. SCNAs are common in medulloblastoma, and are predominantly subgroup-enriched. The most common region of focal copy number gain is a tandem duplication of SNCAIP, a gene associated with Parkinson's disease, which is exquisitely restricted to Group 4α. Recurrent translocations of PVT1, including PVT1-MYC and PVT1-NDRG1, that arise through chromothripsis are restricted to Group 3. Numerous targetable SCNAs, including recurrent events targeting TGF-β signalling in Group 3, and NF-κB signalling in Group 4, suggest future avenues for rational, targeted therapy.

Subject(s)

Cerebellar Neoplasms/classification , Cerebellar Neoplasms/genetics , Genome, Human/genetics , Genomic Structural Variation/genetics , Medulloblastoma/classification , Medulloblastoma/genetics , Carrier Proteins/genetics , Cerebellar Neoplasms/metabolism , Child , DNA Copy Number Variations/genetics , Gene Duplication/genetics , Genes, myc/genetics , Genomics , Hedgehog Proteins/metabolism , Humans , Medulloblastoma/metabolism , NF-kappa B/metabolism , Nerve Tissue Proteins/genetics , Oncogene Proteins, Fusion/genetics , Proteins/genetics , RNA, Long Noncoding , Signal Transduction , Transforming Growth Factor beta/metabolism , Translocation, Genetic/genetics
