Search | BVS Regional Portal (test)

CTAT-LR-fusion: accurate fusion transcript identification from long and short read isoform sequencing at bulk or single cell resolution.

Qin, Qian; Popic, Victoria; Yu, Houlin; White, Emily; Khorgade, Akanksha; Shin, Asa; Wienand, Kirsty; Dondi, Arthur; Beerenwinkel, Niko; Vazquez, Francisca; Al'Khafaji, Aziz M; Haas, Brian J.

bioRxiv ; 2024 Feb 28.

Article in English | MEDLINE | ID: mdl-38464114

ABSTRACT

Gene fusions are found as cancer drivers in diverse adult and pediatric cancers. Accurate detection of fusion transcripts is essential in cancer clinical diagnostics, prognostics, and for guiding therapeutic development. Most currently available methods for fusion transcript detection are compatible with Illumina RNA-seq involving highly accurate short read sequences. Recent advances in long read isoform sequencing enable the detection of fusion transcripts at unprecedented resolution in bulk and single cell samples. Here we developed a new computational tool CTAT-LR-fusion to detect fusion transcripts from long read RNA-seq with or without companion short reads, with applications to bulk or single cell transcriptomes. We demonstrate that CTAT-LR-fusion exceeds fusion detection accuracy of alternative methods as benchmarked with simulated and real long read RNA-seq. Using short and long read RNA-seq, we further apply CTAT-LR-fusion to bulk transcriptomes of nine tumor cell lines, and to tumor single cells derived from a melanoma sample and three metastatic high grade serous ovarian carcinoma samples. In both bulk and in single cell RNA-seq, long isoform reads yielded higher sensitivity for fusion detection than short reads with notable exceptions. By combining short and long reads in CTAT-LR-fusion, we are able to further maximize detection of fusion splicing isoforms and fusion-expressing tumor cells. CTAT-LR-fusion is available at https://github.com/TrinityCTAT/CTAT-LR-fusion/wiki.

High-throughput RNA isoform sequencing using programmed cDNA concatenation.

Al'Khafaji, Aziz M; Smith, Jonathan T; Garimella, Kiran V; Babadi, Mehrtash; Popic, Victoria; Sade-Feldman, Moshe; Gatzen, Michael; Sarkizova, Siranush; Schwartz, Marc A; Blaum, Emily M; Day, Allyson; Costello, Maura; Bowers, Tera; Gabriel, Stacey; Banks, Eric; Philippakis, Anthony A; Boland, Genevieve M; Blainey, Paul C; Hacohen, Nir.

Nat Biotechnol ; 42(4): 582-586, 2024 Apr.

Article in English | MEDLINE | ID: mdl-37291427

ABSTRACT

Full-length RNA-sequencing methods using long-read technologies can capture complete transcript isoforms, but their throughput is limited. We introduce multiplexed arrays isoform sequencing (MAS-ISO-seq), a technique for programmably concatenating complementary DNAs (cDNAs) into molecules optimal for long-read sequencing, increasing the throughput >15-fold to nearly 40 million cDNA reads per run on the Sequel IIe sequencer. When applied to single-cell RNA sequencing of tumor-infiltrating T cells, MAS-ISO-seq demonstrated a 12- to 32-fold increase in the discovery of differentially spliced genes.

Subjects

High-Throughput Nucleotide Sequencing, RNA Isoforms, DNA, Complementary/genetics, RNA Isoforms/genetics, High-Throughput Nucleotide Sequencing/methods, Protein Isoforms/genetics, Sequence Analysis, RNA/methods, Transcriptome, Gene Expression Profiling/methods, RNA/genetics

Cue: a deep-learning framework for structural variant discovery and genotyping.

Popic, Victoria; Rohlicek, Chris; Cunial, Fabio; Hajirasouliha, Iman; Meleshko, Dmitry; Garimella, Kiran; Maheshwari, Anant.

Nat Methods ; 20(4): 559-568, 2023 04.

Article in English | MEDLINE | ID: mdl-36959322

ABSTRACT

Structural variants (SVs) are a major driver of genetic diversity and disease in the human genome and their discovery is imperative to advances in precision medicine. Existing SV callers rely on hand-engineered features and heuristics to model SVs, which cannot scale to the vast diversity of SVs nor fully harness the information available in sequencing datasets. Here we propose an extensible deep-learning framework, Cue, to call and genotype SVs that can learn complex SV abstractions directly from the data. At a high level, Cue converts alignments to images that encode SV-informative signals and uses a stacked hourglass convolutional neural network to predict the type, genotype and genomic locus of the SVs captured in each image. We show that Cue outperforms the state of the art in the detection of several classes of SVs on synthetic and real short-read data and that it can be easily extended to other sequencing platforms, while achieving competitive performance.

Subjects

Deep Learning, Software, Humans, Genotype, Cues, Genomic Structural Variation, Genome, Human

Meltos: multi-sample tumor phylogeny reconstruction for structural variants.

Ricketts, Camir; Seidman, Daniel; Popic, Victoria; Hormozdiari, Fereydoun; Batzoglou, Serafim; Hajirasouliha, Iman.

Bioinformatics ; 36(4): 1082-1090, 2020 02 15.

Article in English | MEDLINE | ID: mdl-31584621

ABSTRACT

MOTIVATION: We propose Meltos, a novel computational framework to address the challenging problem of building tumor phylogeny trees using somatic structural variants (SVs) among multiple samples. Meltos leverages the tumor phylogeny tree built on somatic single nucleotide variants (SNVs) to identify high confidence SVs and produce a comprehensive tumor lineage tree, using a novel optimization formulation. While we do not assume the evolutionary progression of SVs is necessarily the same as SNVs, we show that a tumor phylogeny tree using high-quality somatic SNVs can act as a guide for calling and assigning somatic SVs on a tree. Meltos utilizes multiple genomic read signals for potential SV breakpoints in whole genome sequencing data and proposes a probabilistic formulation for estimating variant allele fractions (VAFs) of SV events. RESULTS: In order to assess the ability of Meltos to correctly refine SNV trees with SV information, we tested Meltos on two simulated datasets with five genomes in both. We also assessed Meltos on two real cancer datasets. We tested Meltos on multiple samples from a liposarcoma tumor and on a multi-sample breast cancer data (Yates et al., 2015), where the authors provide validated structural variation events together with deep, targeted sequencing for a collection of somatic SNVs. We show Meltos has the ability to place high confidence validated SV calls on a refined tumor phylogeny tree. We also showed the flexibility of Meltos to either estimate VAFs directly from genomic data or to use copy number corrected estimates. AVAILABILITY AND IMPLEMENTATION: Meltos is available at https://github.com/ih-lab/Meltos. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Neoplasms, Genome, Genomic Structural Variation, Genomics, High-Throughput Nucleotide Sequencing, Humans, Neoplasms/genetics, Phylogeny, Sequence Analysis, Software

Using LICHeE and BAMSE for Reconstructing Cancer Phylogenetic Trees.

Ricketts, Camir; Popic, Victoria; Toosi, Hosein; Hajirasouliha, Iman.

Curr Protoc Bioinformatics ; 62(1): e49, 2018 06.

Article in English | MEDLINE | ID: mdl-29927069

ABSTRACT

The reconstruction of cancer phylogeny trees and quantifying the evolution of the disease is a challenging task. LICHeE and BAMSE are two computational tools designed and implemented recently for this purpose. They both utilize estimated variant allele fraction of somatic mutations across multiple samples to infer the most likely cancer phylogenies. This unit provides extensive guidelines for installing and running both LICHeE and BAMSE. © 2018 by John Wiley & Sons, Inc.

Subjects

Algorithms, Computational Biology/methods, Neoplasms/genetics, Phylogeny, Humans

Fast Metagenomic Binning via Hashing and Bayesian Clustering.

Popic, Victoria; Kuleshov, Volodymyr; Snyder, Michael; Batzoglou, Serafim.

J Comput Biol ; 25(7): 677-688, 2018 07.

Article in English | MEDLINE | ID: mdl-29658784

ABSTRACT

We introduce GATTACA, a framework for fast unsupervised binning of metagenomic contigs. Similar to recent approaches, GATTACA clusters contigs based on their coverage profiles across a large cohort of metagenomic samples; however, unlike previous methods that rely on read mapping, GATTACA quickly estimates these profiles from kmer counts stored in a compact index. This approach can result in over an order of magnitude speedup, while matching the accuracy of earlier methods on synthetic and real data benchmarks. It also provides a way to index metagenomic samples (e.g., from public repositories such as the Human Microbiome Project) offline once and reuse them across experiments; furthermore, the small size of the sample indices allows them to be easily transferred and stored. Leveraging the MinHash technique, GATTACA also provides an efficient way to identify publicly available metagenomic data that can be incorporated into the set of reference metagenomes to further improve binning accuracy. Thus, enabling easy indexing and reuse of publicly available metagenomic data sets, GATTACA makes accurate metagenomic analyses accessible to a much wider range of researchers.

Subjects

Bayes Theorem, Computational Biology/statistics & numerical data, Metagenomics/statistics & numerical data, Microbiota/genetics, Cluster Analysis, Humans, Metagenome/genetics
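
The GATTACA abstract above mentions the MinHash technique for comparing metagenomic samples. As a rough illustration of the general idea only, the sketch below estimates the Jaccard similarity of two k-mer sets from fixed-size signatures. It is not GATTACA's implementation: the function names, the k-mer size, the number of hash functions, and the salted BLAKE2 hashing scheme are all assumptions chosen for the example.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _salted_hash(kmer, seed):
    """64-bit hash of a k-mer; the seed acts as a salt (hypothetical scheme)."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(kmer_set, num_hashes=64):
    """Keep, for each salted hash function, the minimum hash over the set."""
    if not kmer_set:
        raise ValueError("cannot sketch an empty k-mer set")
    return [min(_salted_hash(km, seed) for km in kmer_set)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    a = kmers("ACGTACGTTAGCCGATAGCTAGGCTA", 4)
    b = kmers("ACGTACGTTAGCCGATAGCTAGGATC", 4)
    print("exact:   ", exact_jaccard(a, b))
    print("estimate:", estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

The signatures have a fixed length regardless of how many k-mers a sample contains, which is consistent with the abstract's point that small sample indices are easy to store and transfer.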

A hybrid cloud read aligner based on MinHash and kmer voting that preserves privacy.

Popic, Victoria; Batzoglou, Serafim.

Nat Commun ; 8: 15311, 2017 05 16.

Article in English | MEDLINE | ID: mdl-28508884

ABSTRACT

Low-cost clouds can alleviate the compute and storage burden of the genome sequencing data explosion. However, moving personal genome data analysis to the cloud can raise serious privacy concerns. Here, we devise a method named Balaur, a privacy preserving read mapper for hybrid clouds based on locality sensitive hashing and kmer voting. Balaur can securely outsource a substantial fraction of the computation to the public cloud, while being highly competitive in accuracy and speed with non-private state-of-the-art read aligners on short read data. We also show that the method is significantly faster than the state of the art in long read mapping. Therefore, Balaur can enable institutions handling massive genomic data sets to shift part of their analysis to the cloud without sacrificing accuracy or exposing sensitive information to an untrusted third party.

Subjects

Algorithms, Cloud Computing, Computational Biology/methods, Computer Security, Privacy, Genomics/statistics & numerical data, High-Throughput Nucleotide Sequencing/statistics & numerical data, Humans, Reproducibility of Results

Fast and scalable inference of multi-sample cancer lineages.

Popic, Victoria; Salari, Raheleh; Hajirasouliha, Iman; Kashef-Haghighi, Dorna; West, Robert B; Batzoglou, Serafim.

Genome Biol ; 16: 91, 2015 May 06.

Article in English | MEDLINE | ID: mdl-25944252

ABSTRACT

Somatic variants can be used as lineage markers for the phylogenetic reconstruction of cancer evolution. Since somatic phylogenetics is complicated by sample heterogeneity, novel specialized tree-building methods are required for cancer phylogeny reconstruction. We present LICHeE (Lineage Inference for Cancer Heterogeneity and Evolution), a novel method that automates the phylogenetic inference of cancer progression from multiple somatic samples. LICHeE uses variant allele frequencies of somatic single nucleotide variants obtained by deep sequencing to reconstruct multi-sample cell lineage trees and infer the subclonal composition of the samples. LICHeE is open source and available at http://viq854.github.io/lichee .

Subjects

Cell Lineage/genetics, Genetic Variation, Neoplasms/genetics, Algorithms, Carcinoma, Renal Cell/genetics, Computational Biology/methods, Computer Simulation, Disease Progression, Female, High-Throughput Nucleotide Sequencing, Humans, Kidney Neoplasms/genetics, Ovarian Neoplasms/genetics, Phylogeny, Software, Xenograft Model Antitumor Assays
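
The LICHeE abstract above refers to variant allele frequencies (VAFs) of somatic SNVs measured across multiple samples. The sketch below shows only the basic arithmetic, under stated assumptions: VAF is taken as variant-supporting reads divided by total reads at a site, and variants are grouped by a binary presence/absence profile across samples. The 0.05 presence threshold, the function names, and the grouping step are illustrative choices, not LICHeE's actual algorithm or defaults.

```python
def vaf(alt_reads, total_reads):
    """Variant allele frequency: variant-supporting reads over total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def presence_profile(vafs, min_vaf=0.05):
    """Binary string with one character per sample: '1' if VAF >= min_vaf.

    min_vaf is a hypothetical cutoff chosen for this example.
    """
    return "".join("1" if v >= min_vaf else "0" for v in vafs)


def group_by_profile(variants, min_vaf=0.05):
    """Group variant names by their presence profile across samples.

    variants maps a variant name to a list of (alt_reads, total_reads)
    pairs, one pair per sample, in a fixed sample order.
    """
    groups = {}
    for name, counts in variants.items():
        vafs = [vaf(alt, total) for alt, total in counts]
        groups.setdefault(presence_profile(vafs, min_vaf), []).append(name)
    return groups


if __name__ == "__main__":
    # Made-up read counts for three variants in three samples.
    toy = {
        "snv1": [(30, 100), (0, 100), (12, 100)],
        "snv2": [(28, 100), (1, 100), (9, 100)],
        "snv3": [(40, 100), (35, 100), (38, 100)],
    }
    print(group_by_profile(toy))
```

In this toy input, variants seen in every sample fall into one group and variants private to a subset of samples into another, which is the kind of signal a lineage-tree method can build on.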

Short read alignment with populations of genomes.

Huang, Lin; Popic, Victoria; Batzoglou, Serafim.

Bioinformatics ; 29(13): i361-70, 2013 Jul 01.

Article in English | MEDLINE | ID: mdl-23813006

ABSTRACT

SUMMARY: The increasing availability of high-throughput sequencing technologies has led to thousands of human genomes having been sequenced in the past years. Efforts such as the 1000 Genomes Project further add to the availability of human genome variation data. However, to date, there is no method that can map reads of a newly sequenced human genome to a large collection of genomes. Instead, methods rely on aligning reads to a single reference genome. This leads to inherent biases and lower accuracy. To tackle this problem, a new alignment tool BWBBLE is introduced in this article. We (i) introduce a new compressed representation of a collection of genomes, which explicitly tackles the genomic variation observed at every position, and (ii) design a new alignment algorithm based on the Burrows-Wheeler transform that maps short reads from a newly sequenced genome to an arbitrary collection of two or more (up to millions of) genomes with high accuracy and no inherent bias to one specific genome. AVAILABILITY: http://viq854.github.com/bwbble.

Subjects

Genome, Human, Sequence Alignment/methods, Sequence Analysis, DNA/methods, Software, Algorithms, Genetic Variation, Genomics/methods, High-Throughput Nucleotide Sequencing, Humans
