Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 5 de 5
Filter
Add more filters










Database
Language
Publication year range
1.
Nat Commun ; 10(1): 4660, 2019 10 11.
Article in English | MEDLINE | ID: mdl-31604920

ABSTRACT

Whole-genome sequencing using sequencing technologies such as Illumina enables the accurate detection of small-scale variants but provides limited information about haplotypes and variants in repetitive regions of the human genome. Single-molecule sequencing (SMS) technologies such as Pacific Biosciences and Oxford Nanopore generate long reads that can potentially address the limitations of short-read sequencing. However, the high error rate of SMS reads makes it challenging to detect small-scale variants in diploid genomes. We introduce a variant calling method, Longshot, which leverages the haplotype information present in SMS reads to accurately detect and phase single-nucleotide variants (SNVs) in diploid genomes. We demonstrate that Longshot achieves very high accuracy for SNV detection using whole-genome Pacific Biosciences data, outperforms existing variant calling methods, and enables variant detection in duplicated regions of the genome that cannot be mapped using short reads.


Subject(s)
Polymorphism, Single Nucleotide , Software , Whole Genome Sequencing/methods , Algorithms , Diploidy , Genome, Human , Haplotypes , High-Throughput Nucleotide Sequencing/methods , Humans , Repetitive Sequences, Nucleic Acid , Sequence Analysis, DNA/methods
2.
Proc Natl Acad Sci U S A ; 114(47): 12512-12517, 2017 11 21.
Article in English | MEDLINE | ID: mdl-29078313

ABSTRACT

Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, which is dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10-8 and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs.


Subject(s)
Contig Mapping/methods , Genome, Human , Haplotypes , Microfluidic Analytical Techniques/methods , Single-Cell Analysis/methods , Whole Genome Sequencing/methods , Alleles , Cell Line , Contig Mapping/statistics & numerical data , Fibroblasts/cytology , Fibroblasts/metabolism , HLA Antigens/genetics , HLA Antigens/metabolism , Humans , Microfluidic Analytical Techniques/instrumentation , Mutation , Polymorphism, Single Nucleotide , Single-Cell Analysis/instrumentation , Whole Genome Sequencing/instrumentation
3.
Genome Res ; 27(5): 801-812, 2017 05.
Article in English | MEDLINE | ID: mdl-27940952

ABSTRACT

Many tools have been developed for haplotype assembly-the reconstruction of individual haplotypes using reads mapped to a reference genome sequence. Due to increasing interest in obtaining haplotype-resolved human genomes, a range of new sequencing protocols and technologies have been developed to enable the reconstruction of whole-genome haplotypes. However, existing computational methods designed to handle specific technologies do not scale well on data from different protocols. We describe a new algorithm, HapCUT2, that extends our previous method (HapCUT) to handle multiple sequencing technologies. Using simulations and whole-genome sequencing (WGS) data from multiple different data types-dilution pool sequencing, linked-read sequencing, single molecule real-time (SMRT) sequencing, and proximity ligation (Hi-C) sequencing-we show that HapCUT2 rapidly assembles haplotypes with best-in-class accuracy for all data types. In particular, HapCUT2 scales well for high sequencing coverage and rapidly assembled haplotypes for two long-read WGS data sets on which other methods struggled. Further, HapCUT2 directly models Hi-C specific error modalities, resulting in significant improvements in error rates compared to HapCUT, the only other method that could assemble haplotypes from Hi-C data. Using HapCUT2, haplotype assembly from a 90× coverage whole-genome Hi-C data set yielded high-resolution haplotypes (78.6% of variants phased in a single block) with high pairwise phasing accuracy (∼98% across chromosomes). Our results demonstrate that HapCUT2 is a robust tool for haplotype assembly applicable to data from diverse sequencing technologies.


Subject(s)
Contig Mapping/methods , Genomics/methods , Haplotypes , Sequence Analysis, DNA/methods , Software , Contig Mapping/standards , Genome, Human , Genomics/standards , Humans , Sequence Analysis, DNA/standards
4.
Nucleic Acids Res ; 44(12): e111, 2016 07 08.
Article in English | MEDLINE | ID: mdl-27105843

ABSTRACT

Phasing of single nucleotide (SNV), and structural variations into chromosome-wide haplotypes in humans has been challenging, and required either trio sequencing or restricting phasing to population-based haplotypes. Selvaraj et al demonstrated single individual SNV phasing is possible with proximity ligated (HiC) sequencing. Here, we demonstrate HiC can phase structural variants into phased scaffolds of SNVs. Since HiC data is noisy, and SV calling is challenging, we applied a range of supervised classification techniques, including Support Vector Machines and Random Forest, to phase deletions. Our approach was demonstrated on deletion calls and phasings on the NA12878 human genome. We used three NA12878 chromosomes and simulated chromosomes to train model parameters. The remaining NA12878 chromosomes withheld from training were used to evaluate phasing accuracy. Random Forest had the highest accuracy and correctly phased 86% of the deletions with allele-specific read evidence. Allele-specific read evidence was found for 76% of the deletions. HiC provides significant read evidence for accurately phasing 33% of the deletions. Also, eight of eight top ranked deletions phased by only HiC were validated using long range polymerase chain reaction and Sanger. Thus, deletions from a single individual can be accurately phased using a combination of shotgun and proximity ligation sequencing. InPhaDel software is available at: http://l337x911.github.io/inphadel/.


Subject(s)
Genomic Structural Variation , High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA/methods , Algorithms , Alleles , Genome, Human , Haplotypes/genetics , Humans , Sequence Deletion/genetics , Software
5.
Genome Res ; 24(12): 1963-76, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25319995

ABSTRACT

Transposable elements (TEs) have been shown to contain functional binding sites for certain transcription factors (TFs). However, the extent to which TEs contribute to the evolution of TF binding sites is not well known. We comprehensively mapped binding sites for 26 pairs of orthologous TFs in two pairs of human and mouse cell lines (representing two cell lineages), along with epigenomic profiles, including DNA methylation and six histone modifications. Overall, we found that 20% of binding sites were embedded within TEs. This number varied across different TFs, ranging from 2% to 40%. We further identified 710 TF-TE relationships in which genomic copies of a TE subfamily contributed a significant number of binding peaks for a TF, and we found that LTR elements dominated these relationships in human. Importantly, TE-derived binding peaks were strongly associated with open and active chromatin signatures, including reduced DNA methylation and increased enhancer-associated histone marks. On average, 66% of TE-derived binding events were cell type-specific with a cell type-specific epigenetic landscape. Most of the binding sites contributed by TEs were species-specific, but we also identified binding sites conserved between human and mouse, the functional relevance of which was supported by a signature of purifying selection on DNA sequences of these TEs. Interestingly, several TFs had significantly expanded binding site landscapes only in one species, which were linked to species-specific gene functions, suggesting that TEs are an important driving force for regulatory innovation. Taken together, our data suggest that TEs have significantly and continuously shaped gene regulatory networks during mammalian evolution.


Subject(s)
DNA Transposable Elements , Gene Regulatory Networks , Animals , Binding Sites , Cell Line , Chromatin/genetics , Chromatin/metabolism , Chromatin Immunoprecipitation , Epigenomics , Evolution, Molecular , High-Throughput Nucleotide Sequencing , Humans , Mice , Multigene Family , Nucleotide Motifs , Organ Specificity/genetics , Position-Specific Scoring Matrices , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism
SELECTION OF CITATIONS
SEARCH DETAIL
...