Results 1 - 20 of 55
1.
Nat Commun ; 15(1): 4220, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38760338

ABSTRACT

When somatic cells acquire complex karyotypes, they often are removed by the immune system. Mutant somatic cells that evade immune surveillance can lead to cancer. Neurons with complex karyotypes arise during neurotypical brain development, but neurons are almost never the origin of brain cancers. Instead, somatic mutations in neurons can bring about neurodevelopmental disorders, and contribute to the polygenic landscape of neuropsychiatric and neurodegenerative disease. A subset of human neurons harbors idiosyncratic copy number variants (CNVs, "CNV neurons"), but previous analyses of CNV neurons are limited by relatively small sample sizes. Here, we develop an allele-based validation approach, SCOVAL, to corroborate or reject read-depth based CNV calls in single human neurons. We apply this approach to 2,125 frontal cortical neurons from a neurotypical human brain. SCOVAL identifies 226 CNV neurons, which include a subclass of 65 CNV neurons with highly aberrant karyotypes containing whole or substantial losses on multiple chromosomes. Moreover, we find that CNV location appears to be nonrandom. Recurrent regions of neuronal genome rearrangement contain fewer, but longer, genes.


Subject(s)
DNA Copy Number Variations , Mosaicism , Neurons , Humans , Neurons/metabolism , Alleles
2.
Sci Data ; 10(1): 813, 2023 11 20.
Article in English | MEDLINE | ID: mdl-37985666

ABSTRACT

Somatic mosaicism is defined as an occurrence of two or more populations of cells having genomic sequences differing at given loci in an individual who is derived from a single zygote. It is a characteristic of multicellular organisms that plays a crucial role in normal development and disease. To study the nature and extent of somatic mosaicism in autism spectrum disorder, bipolar disorder, focal cortical dysplasia, schizophrenia, and Tourette syndrome, a multi-institutional consortium called the Brain Somatic Mosaicism Network (BSMN) was formed through the National Institute of Mental Health (NIMH). In addition to genomic data of affected and neurotypical brains, the BSMN also developed and validated a best practices somatic single nucleotide variant calling workflow through the analysis of reference brain tissue. These resources, which include >400 terabytes of data from 1087 subjects, are now available to the research community via the NIMH Data Archive (NDA) and are described here.


Subject(s)
Mental Disorders , Humans , Autism Spectrum Disorder/genetics , Brain , Genomics , Mosaicism , Genome, Human , Mental Disorders/genetics
3.
PLoS Biol ; 21(5): e3001822, 2023 05.
Article in English | MEDLINE | ID: mdl-37205709

ABSTRACT

Candida albicans is a frequent colonizer of human mucosal surfaces as well as an opportunistic pathogen. C. albicans is remarkably versatile in its ability to colonize diverse host sites with differences in oxygen and nutrient availability, pH, immune responses, and resident microbes, among other cues. It is unclear how the genetic background of a commensal colonizing population can influence the shift to pathogenicity. Therefore, we examined 910 commensal isolates from 35 healthy donors to identify host niche-specific adaptations. We demonstrate that healthy people are reservoirs for genotypically and phenotypically diverse C. albicans strains. Using limited diversity exploitation, we identified a single nucleotide change in the uncharacterized ZMS1 transcription factor that was sufficient to drive hyper invasion into agar. We found that SC5314 was significantly different from the majority of both commensal and bloodstream isolates in its ability to induce host cell death. However, our commensal strains retained the capacity to cause disease in the Galleria model of systemic infection, including outcompeting the SC5314 reference strain during systemic competition assays. This study provides a global view of commensal strain variation and within-host strain diversity of C. albicans and suggests that selection for commensalism in humans does not result in a fitness cost for invasive disease.


Subject(s)
Candida albicans , Symbiosis , Humans , Candida albicans/genetics , Transcription Factors/genetics , Gene Expression Regulation
4.
bioRxiv ; 2023 Mar 07.
Article in English | MEDLINE | ID: mdl-36945473

ABSTRACT

When somatic cells acquire complex karyotypes, they are removed by the immune system. Mutant somatic cells that evade immune surveillance can lead to cancer. Neurons with complex karyotypes arise during neurotypical brain development, but neurons are almost never the origin of brain cancers. Instead, somatic mutations in neurons can bring about neurodevelopmental disorders, and contribute to the polygenic landscape of neuropsychiatric and neurodegenerative disease. A subset of human neurons harbors idiosyncratic copy number variants (CNVs, "CNV neurons"), but previous analyses of CNV neurons have been limited by relatively small sample sizes. Here, we developed an allele-based validation approach, SCOVAL, to corroborate or reject read-depth based CNV calls in single human neurons. We applied this approach to 2,125 frontal cortical neurons from a neurotypical human brain. This approach identified 226 CNV neurons, as well as a class of CNV neurons with complex karyotypes containing whole or substantial losses on multiple chromosomes. Moreover, we found that CNV location appears to be nonrandom. Recurrent regions of neuronal genome rearrangement contained fewer, but longer, genes.

5.
bioRxiv ; 2023 Apr 21.
Article in English | MEDLINE | ID: mdl-36778249

ABSTRACT

The transfer of mitochondrial DNA into the nuclear genomes of eukaryotes (Numts) has been linked to lifespan in non-human species [1-3] and recently demonstrated to occur in rare instances from one human generation to the next [4]. Here we investigated numtogenesis dynamics in humans in two ways. First, we quantified Numts in 1,187 post-mortem brain and blood samples from different individuals. Compared to circulating immune cells (n=389), post-mitotic brain tissue (n=798) contained more Numts, consistent with their potential somatic accumulation. Within brain samples we observed a 5.5-fold enrichment of somatic Numt insertions in the dorsolateral prefrontal cortex compared to cerebellum samples, suggesting that brain Numts arose spontaneously during development or across the lifespan. Moreover, a higher number of brain Numts was linked to earlier mortality. The brains of individuals with no cognitive impairment who died at younger ages carried approximately 2 more Numts per decade of life lost than those who lived longer. Second, we tested the dynamic transfer of Numts using a repeated-measures WGS design in a human fibroblast model that recapitulates several molecular hallmarks of aging [5]. These longitudinal experiments revealed a gradual accumulation of one Numt every ~13 days. Numtogenesis was independent of large-scale genomic instability and unlikely driven by cell clonality. Targeted pharmacological perturbations including chronic glucocorticoid signaling or impairing mitochondrial oxidative phosphorylation (OxPhos) only modestly increased the rate of numtogenesis, whereas patient-derived SURF1-mutant cells exhibiting mtDNA instability accumulated Numts 4.7-fold faster than healthy donors. Combined, our data document spontaneous numtogenesis in human cells and demonstrate an association between brain cortical somatic Numts and human lifespan. These findings open the possibility that mito-nuclear horizontal gene transfer among human post-mitotic tissues produces functionally-relevant human Numts over timescales shorter than previously assumed.

6.
Viruses ; 14(11)2022 10 26.
Article in English | MEDLINE | ID: mdl-36366450

ABSTRACT

Mucoepidermoid Carcinomas (MEC) represent the most common malignancies of salivary glands. Approximately 50% of all MEC cases are known to harbor CRTC1/3-MAML2 gene fusions, but the additional molecular drivers remain largely uncharacterized. Here, we sought to resolve controversy around the role of human papillomavirus (HPV) as a potential driver of mucoepidermoid carcinoma. Bioinformatics analysis was performed on 48 MEC transcriptomes. Subsequent targeted capture DNA sequencing was used to annotate HPV content and integration status in the host genome. HPV of any type was only identified in 1/48 (2%) of the MEC transcriptomes analyzed. Importantly, the one HPV16+ tumor expressed high levels of p16, had high expression of HPV16 oncogenes E6 and E7, and displayed a complex integration pattern that included breakpoints into 13 host genes including PIK3AP1, HIPI, OLFM4, SIRT1, ARAP2, TMEM161B-AS1, and EPS15L1 as well as 9 non-genic regions. In this cohort, HPV is a rare driver of MEC but may have a substantial etiologic role in cases that harbor the virus. Genetic mechanisms of host genome integration are similar to those observed in other head and neck cancers.


Subject(s)
Alphapapillomavirus , Carcinoma, Mucoepidermoid , Papillomavirus Infections , Humans , Carcinoma, Mucoepidermoid/genetics , Carcinoma, Mucoepidermoid/metabolism , Carcinoma, Mucoepidermoid/pathology , DNA-Binding Proteins/genetics , Papillomaviridae/genetics , Trans-Activators/genetics , Nuclear Proteins/genetics , Transcription Factors/genetics
7.
Clin Cancer Res ; 28(2): 350-359, 2022 01 15.
Article in English | MEDLINE | ID: mdl-34702772

ABSTRACT

PURPOSE: In locally advanced p16+ oropharyngeal squamous cell carcinoma (OPSCC), (i) to investigate kinetics of human papillomavirus (HPV) circulating tumor DNA (ctDNA) and association with tumor progression after chemoradiation, and (ii) to compare the predictive value of ctDNA to imaging biomarkers of MRI and FDG-PET. EXPERIMENTAL DESIGN: Serial blood samples were collected from patients with AJCC8 stage III OPSCC (n = 34) enrolled on a randomized trial: pretreatment; during chemoradiation at weeks 2, 4, and 7; and posttreatment. All patients also had dynamic-contrast-enhanced and diffusion-weighted MRI, as well as FDG-PET scans pre-chemoradiation and week 2 during chemoradiation. ctDNA values were analyzed for prediction of freedom from progression (FFP), and correlations with aggressive tumor subvolumes with low blood volume (TVLBV) and low apparent diffusion coefficient (TVLADC), and metabolic tumor volume (MTV) using Cox proportional hazards model and Spearman rank correlation. RESULTS: Low pretreatment ctDNA and an early increase in ctDNA at week 2 compared with baseline were significantly associated with superior FFP (P < 0.02 and P < 0.05, respectively). At week 4 or 7, neither ctDNA counts nor clearance were significantly predictive of progression (P = 0.8). Pretreatment ctDNA values were significantly correlated with nodal TVLBV, TVLADC, and MTV pre-chemoradiation (P < 0.03), while the ctDNA values at week 2 were correlated with these imaging metrics in primary tumor. Multivariate analysis showed that ctDNA and the imaging metrics performed comparably to predict FFP. CONCLUSIONS: Early ctDNA kinetics during definitive chemoradiation may predict therapy response in stage III OPSCC.


Subject(s)
Alphapapillomavirus , Carcinoma, Squamous Cell , Circulating Tumor DNA , Head and Neck Neoplasms , Oropharyngeal Neoplasms , Papillomavirus Infections , Biomarkers , Carcinoma, Squamous Cell/diagnostic imaging , Carcinoma, Squamous Cell/genetics , Carcinoma, Squamous Cell/therapy , Circulating Tumor DNA/genetics , Fluorodeoxyglucose F18 , Humans , Kinetics , Oropharyngeal Neoplasms/diagnostic imaging , Oropharyngeal Neoplasms/genetics , Oropharyngeal Neoplasms/therapy , Papillomaviridae/genetics , Papillomavirus Infections/complications , Papillomavirus Infections/genetics , Prognosis , Retrospective Studies , Squamous Cell Carcinoma of Head and Neck
8.
Genome Biol ; 22(1): 298, 2021 10 27.
Article in English | MEDLINE | ID: mdl-34706748

ABSTRACT

We present SquiggleNet, the first deep-learning model that can classify nanopore reads directly from their electrical signals. SquiggleNet operates faster than DNA passes through the pore, allowing real-time classification and read ejection. Using 1 s of sequencing data, the classifier achieves significantly higher accuracy than base calling followed by sequence alignment. Our approach is also faster and requires an order of magnitude less memory than alignment-based approaches. SquiggleNet distinguished human from bacterial DNA with over 90% accuracy, generalized to unseen bacterial species in a human respiratory metagenome sample, and accurately classified sequences containing human long interspersed repeat elements.


Subject(s)
Deep Learning , Nanopore Sequencing/methods , DNA, Bacterial/analysis , Humans , Long Interspersed Nucleotide Elements , Metagenome , Respiratory System/microbiology
9.
Cancer ; 127(19): 3531-3540, 2021 10 01.
Article in English | MEDLINE | ID: mdl-34160069

ABSTRACT

BACKGROUND: Human papillomavirus (HPV) is a well-established driver of malignant transformation at a number of sites, including head and neck, cervical, vulvar, anorectal, and penile squamous cell carcinomas; however, the impact of HPV integration into the host human genome on this process remains largely unresolved. This is due to the technical challenge of identifying HPV integration sites, which includes limitations of existing informatics approaches to discovering viral-host breakpoints from low-read-coverage sequencing data. METHODS: To overcome this limitation, the authors developed SearcHPV, a new HPV detection pipeline based on targeted capture technology, and applied the algorithm to targeted capture data. They performed an integrated analysis of SearcHPV-defined breakpoints with genome-wide linked-read sequencing to identify potential HPV-related structural variations. RESULTS: Through an analysis of HPV+ models, the authors showed that SearcHPV detected HPV-host integration sites with a higher sensitivity and specificity than 2 other commonly used HPV detection callers. SearcHPV uncovered HPV integration sites adjacent to known cancer-related genes, including TP63, MYC, and TRAF2, and near regions of large structural variation. The authors further validated the junction contig assembly feature of SearcHPV, which helped to accurately identify viral-host junction breakpoint sequences. They found that viral integration occurred through a variety of DNA repair mechanisms, including nonhomologous end joining, alternative end joining, and microhomology-mediated repair. CONCLUSIONS: In summary, SearcHPV is a new optimized tool for the accurate detection of HPV-human integration sites from targeted capture DNA sequencing data.


Subject(s)
Alphapapillomavirus , Carcinoma, Squamous Cell , Papillomavirus Infections , Uterine Cervical Neoplasms , Alphapapillomavirus/genetics , DNA, Viral/genetics , Female , Genomics , Humans , Papillomaviridae/genetics , Papillomavirus Infections/complications , Papillomavirus Infections/genetics
10.
Nat Commun ; 12(1): 3586, 2021 06 11.
Article in English | MEDLINE | ID: mdl-34117247

ABSTRACT

Mobile element insertions (MEIs) are repetitive genomic sequences that contribute to genetic variation and can lead to genetic disorders. Targeted and whole-genome approaches using short-read sequencing have been developed to identify reference and non-reference MEIs; however, the read length hampers detection of these elements in complex genomic regions. Here, we pair Cas9-targeted nanopore sequencing with computational methodologies to capture active MEIs in human genomes. We demonstrate parallel enrichment for distinct classes of MEIs, averaging 44% of reads on-targeted signals and exhibiting a 13.4-54x enrichment over whole-genome approaches. We show an individual flow cell can recover most MEIs (97% L1Hs, 93% AluYb, 51% AluYa, 99% SVA_F, and 65% SVA_E). We identify seventeen non-reference MEIs in GM12878 overlooked by modern, long-read analysis pipelines, primarily in repetitive genomic regions. This work introduces the utility of nanopore sequencing for MEI enrichment and lays the foundation for rapid discovery of elusive, repetitive genetic elements.


Subject(s)
CRISPR-Cas Systems , Genomics , Interspersed Repetitive Sequences , Nanopore Sequencing/methods , Cell Line , DNA-Binding Proteins , Genome, Human , Humans , Repetitive Sequences, Nucleic Acid , Ribonucleoproteins/metabolism , Sequence Analysis, DNA
11.
Am J Hum Genet ; 108(5): 919-928, 2021 05 06.
Article in English | MEDLINE | ID: mdl-33789087

ABSTRACT

Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and medical genetic initiatives are reliant upon short-read whole-genome sequencing (srWGS), which presents challenges for the detection of structural variants (SVs) relative to emerging long-read WGS (lrWGS) technologies. Given this ubiquity of srWGS in large-scale genomics initiatives, we sought to establish expectations for routine SV detection from this data type by comparison with lrWGS assembly, as well as to quantify the genomic properties and added value of SVs uniquely accessible to each technology. Analyses from the Human Genome Structural Variation Consortium (HGSVC) of three families captured ~11,000 SVs per genome from srWGS and ~25,000 SVs per genome from lrWGS assembly. Detection power and precision for SV discovery varied dramatically by genomic context and variant class: 9.7% of the current GRCh38 reference is defined by segmental duplication (SD) and simple repeat (SR), yet 91.4% of deletions that were specifically discovered by lrWGS localized to these regions. Across the remaining 90.3% of reference sequence, we observed extremely high (93.8%) concordance between technologies for deletions in these datasets. In contrast, lrWGS was superior for detection of insertions across all genomic contexts. Given that non-SD/SR sequences encompass 95.9% of currently annotated disease-associated exons, improved sensitivity from lrWGS to discover novel pathogenic deletions in these currently interpretable genomic regions is likely to be incremental. However, these analyses highlight the considerable added value of assembly-based lrWGS to create new catalogs of insertions and transposable elements, as well as disease-associated repeat expansions in genomic sequences that were previously recalcitrant to routine assessment.


Subject(s)
Genome, Human/genetics , Genomic Structural Variation , Genomics/methods , Goals , Whole Genome Sequencing/methods , Whole Genome Sequencing/standards , DNA Copy Number Variations , Exons/genetics , Humans , Research Design , Segmental Duplications, Genomic , Sequence Alignment
12.
Genome Biol ; 22(1): 92, 2021 03 29.
Article in English | MEDLINE | ID: mdl-33781308

ABSTRACT

BACKGROUND: Post-zygotic mutations incurred during DNA replication, DNA repair, and other cellular processes lead to somatic mosaicism. Somatic mosaicism is an established cause of various diseases, including cancers. However, detecting mosaic variants in DNA from non-cancerous somatic tissues poses significant challenges, particularly if the variants only are present in a small fraction of cells. RESULTS: Here, the Brain Somatic Mosaicism Network conducts a coordinated, multi-institutional study to examine the ability of existing methods to detect simulated somatic single-nucleotide variants (SNVs) in DNA mixing experiments, generate multiple replicates of whole-genome sequencing data from the dorsolateral prefrontal cortex, other brain regions, dura mater, and dural fibroblasts of a single neurotypical individual, devise strategies to discover somatic SNVs, and apply various approaches to validate somatic SNVs. These efforts lead to the identification of 43 bona fide somatic SNVs that range in variant allele fractions from ~ 0.005 to ~ 0.28. Guided by these results, we devise best practices for calling mosaic SNVs from 250× whole-genome sequencing data in the accessible portion of the human genome that achieve 90% specificity and sensitivity. Finally, we demonstrate that analysis of multiple bulk DNA samples from a single individual allows the reconstruction of early developmental cell lineage trees. CONCLUSIONS: This study provides a unified set of best practices to detect somatic SNVs in non-cancerous tissues. The data and methods are freely available to the scientific community and should serve as a guide to assess the contributions of somatic SNVs to neuropsychiatric diseases.
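
As a rough illustration of why the low end of the reported variant allele fraction (VAF) range is hard to detect, the sketch below computes a VAF from read counts and the binomial probability of observing at least a given number of variant-supporting reads at 250× depth. It is not the BSMN calling workflow; the function names, the independence assumption behind the binomial model, and the example read thresholds are assumptions made for illustration only.

```python
from math import comb


def variant_allele_fraction(alt_reads: int, ref_reads: int) -> float:
    """Fraction of reads at a site that support the variant allele."""
    total = alt_reads + ref_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def prob_at_least_k_alt_reads(depth: int, vaf: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(depth, vaf).

    Assumes each read samples the variant allele independently with
    probability `vaf` and ignores sequencing error (a simplification).
    """
    p_below = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i) for i in range(k)
    )
    return 1.0 - p_below


DEPTH = 250  # sequencing depth named in the abstract

# Expected variant-supporting reads at the two ends of the reported VAF range.
expected_low = DEPTH * 0.005   # about 1.25 reads
expected_high = DEPTH * 0.28   # about 70 reads

# Chance of seeing at least 1 or at least 3 variant reads at VAF ~0.005.
p_one = prob_at_least_k_alt_reads(DEPTH, 0.005, 1)
p_three = prob_at_least_k_alt_reads(DEPTH, 0.005, 3)
```

Under these assumptions a variant at VAF ~0.005 is expected to appear in only about one read at 250× depth, which is consistent with the abstract's point that variants present in a small fraction of cells are difficult to call.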


Subject(s)
Brain/metabolism , Genetic Association Studies , Genetic Variation , Alleles , Chromosome Mapping , Computational Biology/methods , Genetic Association Studies/methods , Genomics/methods , Germ Cells/metabolism , High-Throughput Nucleotide Sequencing , Humans , Organ Specificity/genetics , Polymorphism, Single Nucleotide
13.
Gigascience ; 10(1)2021 01 13.
Article in English | MEDLINE | ID: mdl-33438729

ABSTRACT

BACKGROUND: The main goal of this collaborative effort is to provide genome-wide data for the previously underrepresented population in Eastern Europe, and to provide cross-validation of the data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing major regions of Ukraine who consented to public data release. BGISEQ-500 sequence data and genotypes by an Illumina GWAS chip were cross-validated on multiple samples and additionally referenced to 1 sample that has been resequenced by Illumina NovaSeq6000 S4 at high coverage. RESULTS: The genome data have been searched for genomic variation represented in this population, and a number of variants have been reported: large structural variants, indels, copy number variations, single-nucleotide polymorphisms, and microsatellites. To our knowledge, this study provides the largest to-date survey of genetic variation in Ukraine, creating a public reference resource aiming to provide data for medical research in a large understudied population. CONCLUSIONS: Our results indicate that the genetic diversity of the Ukrainian population is uniquely shaped by evolutionary and demographic forces and cannot be ignored in future genetic and biomedical studies. These data will contribute a wealth of new information, bringing forth novel, endemic and medically related alleles.


Subject(s)
DNA Copy Number Variations , Polymorphism, Single Nucleotide , Genome , Genomics , Humans , Ukraine
14.
NPJ Genom Med ; 5: 41, 2020.
Article in English | MEDLINE | ID: mdl-33062306

ABSTRACT

Germline copy number variants (CNVs) and single-nucleotide polymorphisms (SNPs) form the basis of inter-individual genetic variation. Although the phenotypic effects of SNPs have been extensively investigated, the effects of CNVs are relatively less well understood. To better characterize mechanisms by which CNVs affect cellular phenotype, we tested their association with variable CpG methylation in a genome-wide manner. Using paired CNV and methylation data from the 1000 Genomes and HapMap projects, we identified genome-wide associations by methylation quantitative trait locus (mQTL) analysis. We found individual CNVs being associated with methylation of multiple CpGs and vice versa. CNV-associated methylation changes were correlated with gene expression. CNV-mQTLs were enriched for regulatory regions, transcription factor-binding sites (TFBSs), and were involved in long-range physical interactions with associated CpGs. Some CNV-mQTLs were associated with methylation of imprinted genes. Several CNV-mQTLs and/or associated genes were among those previously reported by genome-wide association studies (GWASs). We demonstrate that germline CNVs in the genome are associated with CpG methylation. Our findings suggest that structural variation together with methylation may affect cellular phenotype.

16.
Nat Biotechnol ; 38(11): 1347-1355, 2020 11.
Article in English | MEDLINE | ID: mdl-32541955

ABSTRACT

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.


Subject(s)
Germ-Line Mutation/genetics , INDEL Mutation/genetics , Diploidy , Genomic Structural Variation , Humans , Molecular Sequence Annotation , Sequence Analysis, DNA
17.
NAR Genom Bioinform ; 2(4): lqaa089, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33575633

ABSTRACT

The transfer and integration of whole and partial mitochondrial genomes into the nuclear genomes of eukaryotes is an ongoing process that has facilitated the transfer of genes and contributed to the evolution of various cellular pathways. Many previous studies have explored the impact of these insertions, referred to as NumtS, but have focused primarily on older events that have become fixed and are therefore present in all individual genomes for a given species. We previously developed an approach to identify novel Numt polymorphisms from next-generation sequence data and applied it to thousands of human genomes. Here, we extend this analysis to 79 individuals of other great ape species including chimpanzee, bonobo, gorilla, orang-utan and also an old world monkey, macaque. We show that recent Numt insertions are prevalent in each species though at different apparent rates, with chimpanzees exhibiting a significant increase in both polymorphic and fixed Numt sequences as compared to other great apes. We further assessed positional effects in each species in terms of evolutionary time and rate of insertion and identified putative hotspots on chromosome 5 for Numt integration, providing insight into both recent polymorphic and older fixed reference NumtS in great apes in comparison to human events.

18.
Nucleic Acids Res ; 48(3): 1146-1163, 2020 02 20.
Article in English | MEDLINE | ID: mdl-31853540

ABSTRACT

Long Interspersed Element-1 (LINE-1) retrotransposition contributes to inter- and intra-individual genetic variation and occasionally can lead to human genetic disorders. Various strategies have been developed to identify human-specific LINE-1 (L1Hs) insertions from short-read whole genome sequencing (WGS) data; however, they have limitations in detecting insertions in complex repetitive genomic regions. Here, we developed a computational tool (PALMER) and used it to identify 203 non-reference L1Hs insertions in the NA12878 benchmark genome. Using PacBio long-read sequencing data, we identified L1Hs insertions that were absent in previous short-read studies (90/203). Approximately 81% (73/90) of the L1Hs insertions reside within endogenous LINE-1 sequences in the reference assembly and the analysis of unique breakpoint junction sequences revealed 63% (57/90) of these L1Hs insertions could be genotyped in 1000 Genomes Project sequences. Moreover, we observed that amplification biases encountered in single-cell WGS experiments led to a wide variation in L1Hs insertion detection rates between four individual NA12878 cells; under-amplification limited detection to 32% (65/203) of insertions, whereas over-amplification increased false positive calls. In sum, these data indicate that L1Hs insertions are often missed using standard short-read sequencing approaches and long-read sequencing approaches can significantly improve the detection of L1Hs insertions present in individual genomes.
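
The percentages quoted in this abstract follow directly from the stated counts; the short check below recomputes them. The helper name `percent` and the dictionary keys are illustrative labels, not terms from the paper.

```python
def percent(part: int, whole: int) -> int:
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


# (count, denominator) pairs exactly as stated in the abstract.
reported = {
    "within_reference_LINE1": (73, 90),        # reported as ~81%
    "genotypable_in_1000_genomes": (57, 90),   # reported as 63%
    "detected_when_under_amplified": (65, 203),  # reported as 32%
}

recomputed = {name: percent(part, whole) for name, (part, whole) in reported.items()}

# Share of all 203 non-reference insertions absent from earlier short-read
# studies; this figure is derived here and is not quoted in the abstract.
missed_by_short_read = percent(90, 203)
```

The recomputed values match the quoted 81%, 63% and 32%, and the 90 insertions missed by short-read studies correspond to roughly 44% of the 203 calls.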


Subject(s)
Long Interspersed Nucleotide Elements , Sequence Analysis, DNA/methods , Cell Line , Genome, Human , Humans , Polymorphism, Genetic , Single-Cell Analysis , Software , Whole Genome Sequencing
19.
Nat Rev Genet ; 21(3): 171-189, 2020 03.
Article in English | MEDLINE | ID: mdl-31729472

ABSTRACT

Identifying structural variation (SV) is essential for genome interpretation but has been historically difficult due to limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and proffer that, moving forwards, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome.


Subject(s)
Genomic Structural Variation , Sequence Analysis/methods , Algorithms , Genome, Human , Humans
20.
Gigascience ; 8(12)2019 12 01.
Article in English | MEDLINE | ID: mdl-31886876

ABSTRACT

BACKGROUND: Multiple myeloma (MM) is a hematological cancer caused by abnormal accumulation of monoclonal plasma cells in bone marrow. With the increase in treatment options, risk-adapted therapy is becoming more and more important. Survival analysis is commonly applied to study progression or other events of interest and stratify the risk of patients. RESULTS: In this study, we present the current state-of-the-art model for MM prognosis and the molecular biomarker set for stratification: the winning algorithm in the 2017 Multiple Myeloma DREAM Challenge, Sub-Challenge 3. Specifically, we built a non-parametric complete hazard ranking model to map the right-censored data into a linear space, where commonplace machine learning techniques, such as Gaussian process regression and random forests, can play their roles. Our model integrated both the gene expression profile and clinical features to predict the progression of MM. Compared with conventional models, such as Cox model and random survival forests, our model achieved higher accuracy in 3 within-cohort predictions. In addition, it showed robust predictive power in cross-cohort validations. Key molecular signatures related to MM progression were identified from our model, which may function as the core determinants of MM progression and provide important guidance for future research and clinical practice. Functional enrichment analysis and mammalian gene-gene interaction network revealed crucial biological processes and pathways involved in MM progression. The model is dockerized and publicly available at https://www.synapse.org/#!Synapse:syn11459638. Both data and reproducible code are included in the docker. CONCLUSIONS: We present the current state-of-the-art prognostic model for MM integrating gene expression and clinical features validated in an independent test set.


Subject(s)
Gene Expression Profiling/methods , Gene Regulatory Networks , Multiple Myeloma/genetics , Multiple Myeloma/mortality , Aged , Algorithms , Cohort Studies , Disease Progression , Female , Gene Expression Regulation, Neoplastic , Humans , Machine Learning , Male , Middle Aged , Models, Statistical , Prognosis , Survival Analysis