Results 1 - 12 of 12
1.
Nat Biotechnol ; 42(4): 582-586, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37291427

ABSTRACT

Full-length RNA-sequencing methods using long-read technologies can capture complete transcript isoforms, but their throughput is limited. We introduce multiplexed arrays isoform sequencing (MAS-ISO-seq), a technique for programmably concatenating complementary DNAs (cDNAs) into molecules optimal for long-read sequencing, increasing the throughput >15-fold to nearly 40 million cDNA reads per run on the Sequel IIe sequencer. When applied to single-cell RNA sequencing of tumor-infiltrating T cells, MAS-ISO-seq demonstrated a 12- to 32-fold increase in the discovery of differentially spliced genes.


Subject(s)
High-Throughput Nucleotide Sequencing , RNA Isoforms , DNA, Complementary/genetics , RNA Isoforms/genetics , High-Throughput Nucleotide Sequencing/methods , Protein Isoforms/genetics , Sequence Analysis, RNA/methods , Transcriptome , Gene Expression Profiling/methods , RNA/genetics
2.
Nature ; 608(7922): 353-359, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35922509

ABSTRACT

Regulation of transcript structure generates transcript diversity and plays an important role in human disease [1-7]. The advent of long-read sequencing technologies offers the opportunity to study the role of genetic variation in transcript structure [8-16]. In this Article, we present a large human long-read RNA-seq dataset using the Oxford Nanopore Technologies platform from 88 samples from Genotype-Tissue Expression (GTEx) tissues and cell lines, complementing the GTEx resource. We identified just over 70,000 novel transcripts for annotated genes, and validated the protein expression of 10% of novel transcripts. We developed a new computational package, LORALS, to analyse the genetic effects of rare and common variants on the transcriptome by allele-specific analysis of long reads. We characterized allele-specific expression and transcript structure events, providing new insights into the specific transcript alterations caused by common and rare genetic variants and highlighting the resolution gained from long-read data. We were able to perturb the transcript structure upon knockdown of PTBP1, an RNA binding protein that mediates splicing, thereby finding genetic regulatory effects that are modified by the cellular environment. Finally, we used this dataset to enhance variant interpretation and study rare variants leading to aberrant splicing patterns.


Subject(s)
Alleles , Gene Expression Profiling , Organ Specificity , RNA-Seq , Transcriptome , Alternative Splicing/genetics , Cell Line , Datasets as Topic , Genotype , Heterogeneous-Nuclear Ribonucleoproteins/deficiency , Heterogeneous-Nuclear Ribonucleoproteins/genetics , Humans , Organ Specificity/genetics , Polypyrimidine Tract-Binding Protein/deficiency , Polypyrimidine Tract-Binding Protein/genetics , Reproducibility of Results , Transcriptome/genetics
3.
J Exp Med ; 218(6), 2021 Jun 07.
Article in English | MEDLINE | ID: mdl-33857290

ABSTRACT

Advances in genome sequencing have resulted in the identification of the causes for numerous rare diseases. However, many cases remain unsolved with standard molecular analyses. We describe a family presenting with a phenotype resembling inherited thrombocytopenia 2 (THC2). THC2 is generally caused by single nucleotide variants that prevent silencing of ANKRD26 expression during hematopoietic differentiation. Short-read whole-exome and genome sequencing approaches were unable to identify a causal variant in this family. Using long-read whole-genome sequencing, a large complex structural variant involving a paired-duplication inversion was identified. Through functional studies, we show that this structural variant results in a pathogenic gain-of-function WAC-ANKRD26 fusion transcript. Our findings illustrate how complex structural variants that may be missed by conventional genome sequencing approaches can cause human disease.


Subject(s)
Adaptor Proteins, Signal Transducing/genetics , Intercellular Signaling Peptides and Proteins/genetics , Polymorphism, Single Nucleotide/genetics , Thrombocytopenia/genetics , Adolescent , Adult , Aged , Cell Line , Cell Line, Tumor , Child , Chromosome Breakage , Chromosome Disorders/genetics , Exome/genetics , Female , HEK293 Cells , HeLa Cells , Humans , Male , Middle Aged , Mutation/genetics , Pedigree , Thrombocytopenia/congenital
4.
BMC Genomics ; 19(1): 332, 2018 May 08.
Article in English | MEDLINE | ID: mdl-29739332

ABSTRACT

BACKGROUND: Here we present an in-depth characterization of the mechanism of sequencer-induced sample contamination due to the phenomenon of index swapping that impacts Illumina sequencers employing patterned flow cells with Exclusion Amplification (ExAmp) chemistry (HiSeqX, HiSeq4000, and NovaSeq). We also present a remediation method that minimizes the impact of such swaps. RESULTS: Leveraging data collected over a two-year period, we demonstrate the widespread prevalence of index swapping in patterned flow cell data. We calculate mean swap rates across multiple sample preparation methods and sequencer models, demonstrating that different library methods can have vastly different swapping rates and that even non-ExAmp chemistry instruments display trace levels of index swapping. We provide methods for eliminating sample data cross contamination by utilizing non-redundant dual indexing for complete filtering of index swapped reads, and share the sequences for 96 non-combinatorial dual indexes we have validated across various library preparation methods and sequencer models. Finally, using computational methods we provide a greater insight into the mechanism of index swapping. CONCLUSIONS: Index swapping in pooled libraries is a prevalent phenomenon that we observe at a rate of 0.2 to 6% in all sequencing runs on HiSeqX, HiSeq 4000/3000, and NovaSeq. Utilizing non-redundant dual indexing allows for the removal (flagging/filtering) of these swapped reads and eliminates swapping induced sample contamination, which is critical for sensitive applications such as RNA-seq, single cell, blood biopsy using circulating tumor DNA, or clinical sequencing.


Subject(s)
High-Throughput Nucleotide Sequencing , Sequence Analysis/methods , DNA/chemistry , DNA/isolation & purification , DNA/metabolism , Gene Library , Genome, Human , Humans , Sequence Analysis, DNA
5.
Genome Biol ; 18(1): 36, 2017 Mar 06.
Article in English | MEDLINE | ID: mdl-28260531

ABSTRACT

BACKGROUND: Structural variation (SV) influences genome organization and contributes to human disease. However, the complete mutational spectrum of SV has not been routinely captured in disease association studies. RESULTS: We sequenced 689 participants with autism spectrum disorder (ASD) and other developmental abnormalities to construct a genome-wide map of large SV. Using long-insert jumping libraries at 105X mean physical coverage and linked-read whole-genome sequencing from 10X Genomics, we document seven major SV classes at ~5 kb SV resolution. Our results encompass 11,735 distinct large SV sites, 38.1% of which are novel and 16.8% of which are balanced or complex. We characterize 16 recurrent subclasses of complex SV (cxSV), revealing that: (1) cxSV are larger and rarer than canonical SV; (2) each genome harbors 14 large cxSV on average; (3) 84.4% of large cxSVs involve inversion; and (4) most large cxSV (93.8%) have not been delineated in previous studies. Rare SVs are more likely to disrupt coding and regulatory non-coding loci, particularly when truncating constrained and disease-associated genes. We also identify multiple cases of catastrophic chromosomal rearrangements known as chromoanagenesis, including somatic chromoanasynthesis, and extreme balanced germline chromothripsis events involving up to 65 breakpoints and 60.6 Mb across four chromosomes, further defining rare categories of extreme cxSV. CONCLUSIONS: These data provide a foundational map of large SV in the morbid human genome and demonstrate a previously underappreciated abundance and diversity of cxSV that should be considered in genomic studies of human disease.


Subject(s)
Chromosome Aberrations , Chromosome Inversion , Chromothripsis , Genome, Human , Genomics , Autism Spectrum Disorder/genetics , Gene Order , Gene Rearrangement , Genetic Predisposition to Disease , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans , Mutation
6.
Genome Biol ; 14(5): R51, 2013 May 29.
Article in English | MEDLINE | ID: mdl-23718773

ABSTRACT

BACKGROUND: DNA sequencing technologies deviate from the ideal uniform distribution of reads. These biases impair scientific and medical applications. Accordingly, we have developed computational methods for discovering, describing and measuring bias. RESULTS: We applied these methods to the Illumina, Ion Torrent, Pacific Biosciences and Complete Genomics sequencing platforms, using data from human and from a set of microbes with diverse base compositions. As in previous work, library construction conditions significantly influence sequencing bias. Pacific Biosciences coverage levels are the least biased, followed by Illumina, although all technologies exhibit error-rate biases in high- and low-GC regions and at long homopolymer runs. The GC-rich regions prone to low coverage include a number of human promoters, so we therefore catalog 1,000 that were exceptionally resistant to sequencing. Our results indicate that combining data from two technologies can reduce coverage bias if the biases in the component technologies are complementary and of similar magnitude. Analysis of Illumina data representing 120-fold coverage of a well-studied human sample reveals that 0.20% of the autosomal genome was covered at less than 10% of the genome-wide average. Excluding locations that were similar to known bias motifs or likely due to sample-reference variations left only 0.045% of the autosomal genome with unexplained poor coverage. CONCLUSIONS: The assays presented in this paper provide a comprehensive view of sequencing bias, which can be used to drive laboratory improvements and to monitor production processes. Development guided by these assays should result in improved genome assemblies and better coverage of biologically important loci.


Subject(s)
Base Composition , Sequence Analysis, DNA/methods , Algorithms , Genome, Bacterial , Genome, Human , Genome, Protozoan , Genomics/methods , Humans , Promoter Regions, Genetic , Sequence Analysis, DNA/instrumentation
7.
Nucleic Acids Res ; 41(6): e67, 2013 Apr 01.
Article in English | MEDLINE | ID: mdl-23303777

ABSTRACT

As researchers begin probing deep coverage sequencing data for increasingly rare mutations and subclonal events, the fidelity of next generation sequencing (NGS) laboratory methods will become increasingly critical. Although error rates for sequencing and polymerase chain reaction (PCR) are well documented, the effects that DNA extraction and other library preparation steps could have on downstream sequence integrity have not been thoroughly evaluated. Here, we describe the discovery of novel C > A/G > T transversion artifacts found at low allelic fractions in targeted capture data. Characteristics such as sequencer read orientation and presence in both tumor and normal samples strongly indicated a non-biological mechanism. We identified the source as oxidation of DNA during acoustic shearing in samples containing reactive contaminants from the extraction process. We show generation of 8-oxoguanine (8-oxoG) lesions during DNA shearing, present analysis tools to detect oxidation in sequencing data and suggest methods to reduce DNA oxidation through the introduction of antioxidants. Further, informatics methods are presented to confidently filter these artifacts from sequencing data sets. Though only seen in a low percentage of reads in affected samples, such artifacts could have profoundly deleterious effects on the ability to confidently call rare mutations, and eliminating other possible sources of artifacts should become a priority for the research community.


Subject(s)
Artifacts , DNA Damage , High-Throughput Nucleotide Sequencing/methods , Mutation , Sequence Analysis, DNA/methods , Alleles , DNA/chemistry , Genomics , Guanine/analogs & derivatives , Guanine/analysis , Humans , Melanoma/genetics , Oxidation-Reduction
8.
Proc Natl Acad Sci U S A ; 108(4): 1513-8, 2011 Jan 25.
Article in English | MEDLINE | ID: mdl-21187386

ABSTRACT

Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd.


Subject(s)
Algorithms , Genomics/methods , Sequence Analysis, DNA/methods , Software , Animals , Genome/genetics , Humans , Internet , Mice , Reproducibility of Results
9.
Curr Protoc Hum Genet ; Chapter 18: Unit 18.4, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20582916

ABSTRACT

This unit describes a protocol for the targeted enrichment of exons from randomly sheared genomic DNA libraries using an in-solution hybrid selection approach for sequencing on an Illumina Genome Analyzer II. The steps for designing and ordering a hybrid selection oligo pool are reviewed, as are critical steps for performing the preparation and hybrid selection of an Illumina paired-end library. Critical parameters, performance metrics, and analysis workflow are discussed.


Subject(s)
Exons/genetics , Nucleic Acid Hybridization/methods , Sequence Analysis, DNA/methods , Humans , Solutions
10.
Ann Hematol ; 87(3): 195-203, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18026954

ABSTRACT

BP1, a homeobox gene, is overexpressed in the bone marrow of 63% of acute myeloid leukemia patients. In this study, we compared the growth-inhibitory and cyto-differentiating activities of all-trans retinoic acid (ATRA) in NB4 (ATRA-responsive) and R4 (ATRA-resistant) acute promyelocytic leukemia (APL) cells relative to BP1 levels. Expression of two oncogenes, bcl-2 and c-myc, was also assessed. NB4 and R4 cells express BP1, bcl-2, and c-myc; the expression of all three genes was repressed after ATRA treatment of NB4 cells but not R4 cells. To determine whether BP1 overexpression affects sensitivity to ATRA, NB4 cells were transfected with a BP1-expressing plasmid and treated with ATRA. In cells overexpressing BP1: (1) proliferation was no longer inhibited; (2) differentiation was reduced two- to threefold; (3) c-myc was no longer repressed. These and other data suggest that BP1 may regulate bcl-2 and c-myc expression. Clinically, BP1 levels were elevated in all pretreatment APL patients tested, while BP1 expression was decreased in 91% of patients after combined ATRA and chemotherapy treatment. Two patients underwent disease relapse during follow-up; one patient exhibited a 42-fold increase in BP1 expression, while the other showed no change. This suggests that BP1 may be part of a pathway involved in resistance to therapy. Taken together, our data suggest that BP1 is a potential therapeutic target in APL.


Subject(s)
Antineoplastic Agents/pharmacology , Drug Resistance, Neoplasm/drug effects , Homeodomain Proteins/metabolism , Leukemia, Promyelocytic, Acute/metabolism , Transcription Factors/metabolism , Tretinoin/pharmacology , Antineoplastic Agents/therapeutic use , Cell Line, Tumor , Drug Resistance, Neoplasm/genetics , Gene Expression Regulation, Leukemic/drug effects , Gene Expression Regulation, Leukemic/genetics , Homeodomain Proteins/genetics , Humans , Leukemia, Promyelocytic, Acute/drug therapy , Leukemia, Promyelocytic, Acute/genetics , Proto-Oncogene Proteins c-bcl-2/biosynthesis , Proto-Oncogene Proteins c-bcl-2/genetics , Proto-Oncogene Proteins c-myc/biosynthesis , Proto-Oncogene Proteins c-myc/genetics , Transcription Factors/genetics , Tretinoin/therapeutic use
11.
Diabetes ; 55(3): 640-50, 2006 Mar.
Article in English | MEDLINE | ID: mdl-16505226

ABSTRACT

To identify novel pathways mediating molecular mechanisms of thiazolidinediones (TZDs) in humans, we assessed gene expression in adipose and muscle tissue from six subjects with type 2 diabetes before and after 8 weeks of treatment with rosiglitazone. mRNA was analyzed using Total Gene Expression Analysis (TOGA), an automated restriction-based cDNA display method with quantitative analysis of PCR products. The expression of cell cycle regulatory transcription factors E2F4 and the MAGE protein necdin were similarly altered in all subjects after rosiglitazone treatment. E2F4 expression was decreased by 10-fold in muscle and 2.5-fold in adipose tissue; necdin was identified in adipose tissue only and increased 1.8-fold after TZD treatment. To determine whether changes were related to an effect of the drug or adipogenesis, we evaluated the impact of rosiglitazone and differentiation independently in 3T3-L1 adipocytes. While treatment of differentiated adipocytes with rosiglitazone did not alter E2F4 or necdin, expression of both genes was significantly altered during differentiation. Differentiation was associated with increased cytosolic localization of E2F4. Moreover, necdin overexpression potently inhibited adipocyte differentiation and cell cycle progression. These data suggest that changes in necdin and E2F4 expression after rosiglitazone exposure in humans are associated with altered adipocyte differentiation and may contribute to improved insulin sensitivity in humans treated with TZDs.


Subject(s)
Adipocytes/metabolism , Diabetes Mellitus, Type 2/drug therapy , E2F4 Transcription Factor/genetics , Hypoglycemic Agents/therapeutic use , Muscles/metabolism , Nerve Tissue Proteins/genetics , Nuclear Proteins/genetics , Thiazolidinediones/therapeutic use , 3T3-L1 Cells , Adult , Aged , Animals , Cell Differentiation , Diabetes Mellitus, Type 2/metabolism , E2F4 Transcription Factor/physiology , Female , Humans , Male , Mice , Mice, Inbred ICR , Middle Aged , Nerve Tissue Proteins/physiology , Nuclear Proteins/physiology , RNA, Messenger/analysis , Rosiglitazone
12.
Proc Natl Acad Sci U S A ; 100(14): 8466-71, 2003 Jul 08.
Article in English | MEDLINE | ID: mdl-12832613

ABSTRACT

Type 2 diabetes mellitus (DM) is characterized by insulin resistance and pancreatic beta cell dysfunction. In high-risk subjects, the earliest detectable abnormality is insulin resistance in skeletal muscle. Impaired insulin-mediated signaling, gene expression, glycogen synthesis, and accumulation of intramyocellular triglycerides have all been linked with insulin resistance, but no specific defect responsible for insulin resistance and DM has been identified in humans. To identify genes potentially important in the pathogenesis of DM, we analyzed gene expression in skeletal muscle from healthy metabolically characterized nondiabetic (family history negative and positive for DM) and diabetic Mexican-American subjects. We demonstrate that insulin resistance and DM associate with reduced expression of multiple nuclear respiratory factor-1 (NRF-1)-dependent genes encoding key enzymes in oxidative metabolism and mitochondrial function. Although NRF-1 expression is decreased only in diabetic subjects, expression of both PPAR gamma coactivator 1-alpha and -beta (PGC1-alpha/PPARGC1 and PGC1-beta/PERC), coactivators of NRF-1 and PPAR gamma-dependent transcription, is decreased in both diabetic subjects and family history-positive nondiabetic subjects. Decreased PGC1 expression may be responsible for decreased expression of NRF-dependent genes, leading to the metabolic disturbances characteristic of insulin resistance and DM.


Subject(s)
DNA-Binding Proteins/physiology , Diabetes Mellitus, Type 2/genetics , Gene Expression Regulation/genetics , Insulin Resistance/genetics , Oxidative Phosphorylation , Prediabetic State/genetics , Trans-Activators/physiology , Transcription Factors/physiology , Adult , Biopsy , Citric Acid Cycle/genetics , Diabetes Mellitus/genetics , Diabetes Mellitus/metabolism , Female , Gene Expression Profiling , Genetic Predisposition to Disease , Glycolysis/genetics , Humans , Lipid Peroxidation/genetics , Male , Mexican Americans/genetics , Middle Aged , Muscle, Skeletal/metabolism , Muscle, Skeletal/pathology , NF-E2-Related Factor 1 , Nuclear Respiratory Factor 1 , Nuclear Respiratory Factors , Obesity , Oligonucleotide Array Sequence Analysis , Prediabetic State/metabolism , Receptors, Cytoplasmic and Nuclear/physiology , Transcription Factors/deficiency , Transcription Factors/genetics , Transcription, Genetic