Results 1 - 12 of 12
1.
Genome Med ; 15(1): 94, 2023 11 09.
Article in English | MEDLINE | ID: mdl-37946251

ABSTRACT

BACKGROUND: Whole genome sequencing is increasingly being used for the diagnosis of patients with rare diseases. However, the diagnostic yields of many studies, particularly those conducted in a healthcare setting, are often disappointingly low, at 25-30%. This is in part because although entire genomes are sequenced, analysis is often confined to in silico gene panels or coding regions of the genome. METHODS: We undertook WGS on a cohort of 122 unrelated rare disease patients and their relatives (300 genomes) who had been pre-screened by gene panels or arrays. Patients were recruited from a broad spectrum of clinical specialties. We applied a bioinformatics pipeline that would allow comprehensive analysis of all variant types. We combined established bioinformatics tools for phenotypic and genomic analysis with our novel algorithms (SVRare, ALTSPLICE and GREEN-DB) to detect and annotate structural, splice site and non-coding variants. RESULTS: Our diagnostic yield was 43/122 cases (35%), although 47/122 cases (39%) were considered solved when novel candidate genes with supporting functional data were taken into account. Structural, splice site and deep intronic variants contributed to 20/47 (43%) of our solved cases. Five genes that are novel, or were novel at the time of discovery, were identified, whilst a further three genes are putative novel disease genes with evidence of causality. We identified variants of uncertain significance in a further fourteen candidate genes. The phenotypic spectrum associated with RMND1 was expanded to include polymicrogyria. Two patients with secondary findings in FBN1 and KCNQ1 were confirmed to have previously unidentified Marfan and long QT syndromes, respectively, and were referred for further clinical interventions. Clinical diagnoses were changed in six patients and treatment adjustments made for eight individuals, which for five patients was considered life-saving.
CONCLUSIONS: Genome sequencing is increasingly being considered as a first-line genetic test in routine clinical settings and can make a substantial contribution to rapidly identifying a causal aetiology for many patients, shortening their diagnostic odyssey. We have demonstrated that structural, splice site and intronic variants make a significant contribution to diagnostic yield and that comprehensive analysis of the entire genome is essential to maximise the value of clinical genome sequencing.


Subject(s)
Genetic Variation , Rare Diseases , Humans , Rare Diseases/diagnosis , Rare Diseases/genetics , Whole Genome Sequencing , Genetic Testing , Mutation , Cell Cycle Proteins
2.
Nat Genet ; 54(11): 1675-1689, 2022 11.
Article in English | MEDLINE | ID: mdl-36333502

ABSTRACT

The value of genome-wide over targeted driver analyses for predicting clinical outcomes of cancer patients is debated. Here, we report the whole-genome sequencing of 485 chronic lymphocytic leukemia patients enrolled in clinical trials as part of the United Kingdom's 100,000 Genomes Project. We identify an extended catalog of recurrent coding and noncoding genetic mutations that represents a source for future studies and provide the most complete high-resolution map of structural variants, copy number changes and global genome features including telomere length, mutational signatures and genomic complexity. We demonstrate the relationship of these features with clinical outcome and show that integration of 186 distinct recurrent genomic alterations defines five genomic subgroups that associate with response to therapy, refining conventional outcome prediction. While requiring independent validation, our findings highlight the potential of whole-genome sequencing to inform future risk stratification in chronic lymphocytic leukemia.


Subject(s)
Leukemia, Lymphocytic, Chronic, B-Cell , Humans , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Whole Genome Sequencing , Mutation , Genomics , Prognosis
3.
Bioinformatics ; 37(2): 147-154, 2021 04 19.
Article in English | MEDLINE | ID: mdl-32722772

ABSTRACT

MOTIVATION: Tumours are composed of distinct cancer cell populations (clones), which continuously adapt to their local micro-environment. Standard methods for clonal deconvolution seek to identify groups of mutations and estimate the prevalence of each group in the tumour, while considering its purity and copy number profile. These methods have been applied on cross-sectional data and on longitudinal data after discarding information on the timing of sample collection. Two key questions are how can we incorporate such information in our analyses and is there any benefit in doing so? RESULTS: We developed a clonal deconvolution method, which incorporates explicitly the temporal spacing of longitudinally sampled tumours. By merging a Dirichlet Process Mixture Model with Gaussian Process priors and using as input a sequence of several sparsely collected samples, our method can reconstruct the temporal profile of the abundance of any mutation cluster supported by the data as a continuous function of time. We benchmarked our method on whole genome, whole exome and targeted sequencing data from patients with chronic lymphocytic leukaemia, on liquid biopsy data from a patient with melanoma and on synthetic data and we found that incorporating information on the timing of tissue collection improves model performance, as long as data of sufficient volume and complexity are available for estimating free model parameters. Thus, our approach is particularly useful when collecting a relatively long sequence of tumour samples is feasible, as in liquid cancers (e.g. leukaemia) and liquid biopsies. AVAILABILITY AND IMPLEMENTATION: The statistical methodology presented in this paper is freely available at github.com/dvav/clonosGP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
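
As a rough illustration of how the timing of sample collection can enter such a model, the sketch below draws one cluster-abundance trajectory from a Gaussian Process prior evaluated at irregular collection times. It is not the clonosGP implementation: the squared-exponential kernel, the logistic link, the collection days and the names `rbf_kernel` and `sample_abundance` are all assumptions made for this example.

```python
import numpy as np

def rbf_kernel(times, amplitude=1.0, lengthscale=100.0):
    """Squared-exponential covariance between sampling times (assumed kernel)."""
    t = np.asarray(times, dtype=float)
    diff = t[:, None] - t[None, :]
    return amplitude**2 * np.exp(-0.5 * (diff / lengthscale) ** 2)

def sample_abundance(times, rng, amplitude=1.0, lengthscale=100.0, jitter=1e-8):
    """Draw one cluster-abundance trajectory from a GP prior.

    A latent Gaussian process is sampled at the (possibly irregular)
    collection times and squashed to (0, 1) with a logistic link.
    The link function is an assumption made for this sketch.
    """
    K = rbf_kernel(times, amplitude, lengthscale)
    K += jitter * np.eye(len(times))  # small diagonal term for numerical stability
    latent = rng.multivariate_normal(np.zeros(len(times)), K)
    return 1.0 / (1.0 + np.exp(-latent))

# Hypothetical collection days for five longitudinal samples.
days = [0, 30, 45, 200, 400]
rng = np.random.default_rng(0)
abundance = sample_abundance(days, rng)
```

Samples taken close together in time (days 30 and 45) are strongly correlated under this prior, while widely spaced samples are nearly independent, which is the information a model loses when collection times are discarded.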


Subject(s)
High-Throughput Nucleotide Sequencing , Neoplasms , Cross-Sectional Studies , Exome , Humans , Mutation , Neoplasms/genetics , Software , Tumor Microenvironment , Whole Genome Sequencing
4.
Blood ; 137(20): 2800-2816, 2021 05 20.
Article in English | MEDLINE | ID: mdl-33206936

ABSTRACT

The transformation of chronic lymphocytic leukemia (CLL) to high-grade B-cell lymphoma is known as Richter syndrome (RS), a rare event with dismal prognosis. In this study, we conducted whole-genome sequencing (WGS) of paired circulating CLL (PB-CLL) and RS biopsies (tissue-RS) from 17 patients recruited into a clinical trial (CHOP-O). We found that tissue-RS was enriched for mutations in poor-risk CLL drivers and genes in the DNA damage response (DDR) pathway. In addition, we identified genomic aberrations not previously implicated in RS, including the protein tyrosine phosphatase receptor (PTPRD) and tumor necrosis factor receptor-associated factor 3 (TRAF3). In the noncoding genome, we discovered activation-induced cytidine deaminase-related and unrelated kataegis in tissue-RS affecting regulatory regions of key immune-regulatory genes. These include BTG2, CXCR4, NFATC1, PAX5, NOTCH-1, SLC44A5, FCRL3, SELL, TNIP2, and TRIM13. Furthermore, differences between the global mutation signatures of pairs of PB-CLL and tissue-RS samples implicate DDR as the dominant mechanism driving transformation. Pathway-based clonal deconvolution analysis showed that genes in the MAPK and DDR pathways demonstrate high clonal-expansion probability. Direct comparison of nodal-CLL and tissue-RS pairs from an independent cohort confirmed differential expression of the same pathways by RNA expression profiling. Our integrated analysis of WGS and RNA expression data significantly extends previous targeted approaches, which were limited by the lack of germline samples, and it facilitates the identification of novel genomic correlates implicated in RS transformation, which could be targeted therapeutically. Our results inform the future selection of investigative agents for a UK clinical platform study. This trial was registered at www.clinicaltrials.gov as #NCT03899337.


Subject(s)
Clonal Evolution/genetics , Gene Expression Regulation, Neoplastic/genetics , Leukemia, Lymphocytic, Chronic, B-Cell/pathology , Lymphoma, Large B-Cell, Diffuse/pathology , RNA, Neoplasm/genetics , Transcriptome , Aged , Aged, 80 and over , Antibodies, Monoclonal, Humanized/therapeutic use , Antineoplastic Combined Chemotherapy Protocols/administration & dosage , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Base Sequence , Clone Cells/pathology , Combined Modality Therapy , Cyclophosphamide/administration & dosage , DNA Repair , Disease Progression , Doxorubicin/administration & dosage , Female , Gene Regulatory Networks , Genes, Neoplasm , Humans , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Lymphoma, Large B-Cell, Diffuse/drug therapy , Lymphoma, Large B-Cell, Diffuse/genetics , Male , Middle Aged , Mutation , Neoplasm Proteins/genetics , Prednisone/administration & dosage , Prospective Studies , RNA, Neoplasm/biosynthesis , Syndrome , Vincristine/administration & dosage , Whole Genome Sequencing
5.
Methods Mol Biol ; 2082: 123-146, 2020.
Article in English | MEDLINE | ID: mdl-31849012

ABSTRACT

The discovery of genomic polymorphisms influencing gene expression (also known as expression quantitative trait loci or eQTLs) can be formulated as a sparse Bayesian multivariate/multiple regression problem. An important aspect in the development of such models is the implementation of bespoke inference methodologies, a process which can become quite laborious, when multiple candidate models are being considered. We describe automatic, black-box inference in such models using Stan, a popular probabilistic programming language. The utilization of systems like Stan can facilitate model prototyping and testing, thus accelerating the data modeling process. The code described in this chapter can be found at https://github.com/dvav/eQTLBookChapter .


Subject(s)
Bayes Theorem , Chromosome Mapping , Computational Biology/methods , Gene Expression , Quantitative Trait Loci , Software , Algorithms , Gene Expression Profiling/methods , Polymorphism, Single Nucleotide , Programming Languages
7.
Genet Med ; 20(10): 1196-1205, 2018 10.
Article in English | MEDLINE | ID: mdl-29388947

ABSTRACT

PURPOSE: Fresh-frozen (FF) tissue is the optimal source of DNA for whole-genome sequencing (WGS) of cancer patients. However, it is not always available, limiting the widespread application of WGS in clinical practice. We explored the viability of using formalin-fixed, paraffin-embedded (FFPE) tissues, available routinely for cancer patients, as a source of DNA for clinical WGS. METHODS: We conducted a prospective study using DNAs from matched FF, FFPE, and peripheral blood germ-line specimens collected from 52 cancer patients (156 samples) following routine diagnostic protocols. We compared somatic variants detected in FFPE and matching FF samples. RESULTS: We found the single-nucleotide variant agreement reached 71% across the genome and somatic copy-number alterations (CNAs) detection from FFPE samples was suboptimal (0.44 median correlation with FF) due to nonuniform coverage. CNA detection was improved significantly with lower reverse crosslinking temperature in FFPE DNA extraction (80 °C or 65 °C depending on the methods). Our final data showed somatic variant detection from FFPE for clinical decision making is possible. We detected 98% of clinically actionable variants (including 30/31 CNAs). CONCLUSION: We present the first prospective WGS study of cancer patients using FFPE specimens collected in a routine clinical environment proving WGS can be applied in the clinic.


Subject(s)
DNA Copy Number Variations/genetics , Genome, Human/genetics , Neoplasms/genetics , Whole Genome Sequencing/methods , Decision Making , Female , Humans , Male , Neoplasms/blood , Neoplasms/pathology , Paraffin Embedding , Polymorphism, Single Nucleotide/genetics
8.
Bioinformatics ; 33(19): 3058-3064, 2017 Oct 01.
Article in English | MEDLINE | ID: mdl-28575251

ABSTRACT

MOTIVATION: The identification of genetic variants influencing gene expression (known as expression quantitative trait loci or eQTLs) is important in unravelling the genetic basis of complex traits. Detecting multiple eQTLs simultaneously in a population based on paired DNA-seq and RNA-seq assays employs two competing types of models: models which rely on appropriate transformations of RNA-seq data (and are powered by a mature mathematical theory), or count-based models, which represent digital gene expression explicitly, thus rendering such transformations unnecessary. The latter constitutes an immensely popular methodology, which is however plagued by mathematical intractability. RESULTS: We develop tractable count-based models, which are amenable to efficient estimation through the introduction of latent variables and the appropriate application of recent statistical theory in a sparse Bayesian modelling framework. Furthermore, we examine several transformation methods for RNA-seq read counts and we introduce arcsin, logit and Laplace smoothing as preprocessing steps for transformation-based models. Using natural and carefully simulated data from the 1000 Genomes and gEUVADIS projects, we benchmark both approaches under a variety of scenarios, including the presence of noise and violation of basic model assumptions. We demonstrate that an arcsin transformation of Laplace-smoothed data is at least as good as state-of-the-art models, particularly at small samples. Furthermore, we show that an over-dispersed Poisson model is comparable to the celebrated Negative Binomial, but much easier to estimate. These results provide strong support for transformation-based versus count-based (particularly Negative-Binomial-based) models for eQTL mapping. AVAILABILITY AND IMPLEMENTATION: All methods are implemented in the free software eQTLseq: https://github.com/dvav/eQTLseq. CONTACT: dimitris.vavoulis@well.ox.ac.uk. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
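
The preprocessing step named in the abstract, an arcsin transformation of Laplace-smoothed data, might look roughly like the sketch below. The abstract does not give the exact formula, so the add-one smoothing of per-sample proportions and the arcsin-square-root form are assumptions, and `arcsin_laplace` is a hypothetical name rather than part of eQTLseq.

```python
import numpy as np

def arcsin_laplace(counts):
    """Arcsin-square-root transform of Laplace-smoothed read counts.

    counts: array of shape (genes, samples) with non-negative integers.
    Smoothing adds one pseudo-count per gene within each sample, so every
    proportion lies strictly between 0 and 1. This smoothing form is an
    assumption; the abstract does not specify it.
    """
    counts = np.asarray(counts, dtype=float)
    n_genes = counts.shape[0]
    depth = counts.sum(axis=0, keepdims=True)    # library size per sample
    props = (counts + 1.0) / (depth + n_genes)   # Laplace (add-one) smoothing
    return np.arcsin(np.sqrt(props))

# Toy count matrix: 3 genes x 2 samples, including a zero count.
toy = np.array([[0, 10],
                [5, 20],
                [95, 70]])
transformed = arcsin_laplace(toy)
```

Smoothing before the transform keeps zero counts away from the boundary, so the transformed values can be passed to a Gaussian-style regression model without special handling of zeros.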


Subject(s)
Gene Expression , High-Throughput Nucleotide Sequencing/methods , Models, Statistical , Quantitative Trait Loci , Sequence Analysis, RNA/methods , Bayes Theorem , Genetic Variation , Software
9.
PLoS Med ; 14(2): e1002230, 2017 02.
Article in English | MEDLINE | ID: mdl-28196074

ABSTRACT

BACKGROUND: Single gene tests to predict whether cancers respond to specific targeted therapies are performed increasingly often. Advances in sequencing technology, collectively referred to as next generation sequencing (NGS), mean the entire cancer genome or parts of it can now be sequenced at speed with increased depth and sensitivity. However, translation of NGS into routine cancer care has been slow. Healthcare stakeholders are unclear about the clinical utility of NGS and are concerned it could be an expensive addition to cancer diagnostics, rather than an affordable alternative to single gene testing. METHODS AND FINDINGS: We validated a 46-gene hotspot cancer panel assay allowing multiple gene testing from small diagnostic biopsies. From 1 January 2013 to 31 December 2013, solid tumour samples (including non-small-cell lung carcinoma [NSCLC], colorectal carcinoma, and melanoma) were sequenced in the context of the UK National Health Service from 351 consecutively submitted prospective cases for which treating clinicians thought the patient had potential to benefit from more extensive genetic analysis. Following histological assessment, tumour-rich regions of formalin-fixed paraffin-embedded (FFPE) sections underwent macrodissection, DNA extraction, NGS, and analysis using a pipeline centred on Torrent Suite software. With a median turnaround time of seven working days, an integrated clinical report was produced indicating the variants detected, including those with potential diagnostic, prognostic, therapeutic, or clinical trial entry implications. Accompanying phenotypic data were collected, and a detailed cost analysis of the panel compared with single gene testing was undertaken to assess affordability for routine patient care. Panel sequencing was successful for 97% (342/351) of tumour samples in the prospective cohort and showed 100% concordance with known mutations (detected using cobas assays). 
At least one mutation was identified in 87% (296/342) of tumours. A locally actionable mutation (i.e., available targeted treatment or clinical trial) was identified in 122/351 patients (35%). Forty patients received targeted treatment, in 22/40 (55%) cases solely due to use of the panel. Examination of published data on the potential efficacy of targeted therapies showed theoretically actionable mutations (i.e., mutations for which targeted treatment was potentially appropriate) in 66% (71/107) and 39% (41/105) of melanoma and NSCLC patients, respectively. At a cost of £339 (US$449) per patient, the panel was less expensive locally than performing more than two or three single gene tests. Study limitations include the use of FFPE samples, which do not always provide high-quality DNA, and the use of "real world" data: submission of cases for sequencing did not always follow clinical guidelines, meaning that when mutations were detected, patients were not always eligible for targeted treatments on clinical grounds. CONCLUSIONS: This study demonstrates that more extensive tumour sequencing can identify mutations that could improve clinical decision-making in routine cancer care, potentially improving patient outcomes, at an affordable level for healthcare providers.
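
The percentages quoted in this abstract can be checked against their underlying fractions with a few lines of arithmetic. The sketch below only restates figures given above; the labels and the helper `percent` are ours.

```python
# Fractions quoted in the abstract: (numerator, denominator, quoted percentage).
reported = {
    "panel sequencing successful": (342, 351, 97),
    "at least one mutation": (296, 342, 87),
    "locally actionable mutation": (122, 351, 35),
    "targeted treatment due to panel": (22, 40, 55),
    "theoretically actionable, melanoma": (71, 107, 66),
    "theoretically actionable, NSCLC": (41, 105, 39),
}

def percent(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)

# Collect any entry whose recomputed percentage differs from the quoted one.
mismatches = {
    label: (percent(num, den), quoted)
    for label, (num, den, quoted) in reported.items()
    if percent(num, den) != quoted
}
```

An empty `mismatches` dictionary means every quoted percentage agrees with its fraction after rounding.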


Subject(s)
Carcinoma, Non-Small-Cell Lung/diagnosis , Colorectal Neoplasms/diagnosis , Genomics , Melanoma/diagnosis , Pathology/methods , Pathology/standards , Adolescent , Adult , Aged , Aged, 80 and over , Carcinoma, Non-Small-Cell Lung/economics , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/therapy , Child , Clinical Decision-Making , Colorectal Neoplasms/economics , Colorectal Neoplasms/genetics , Colorectal Neoplasms/therapy , Female , High-Throughput Nucleotide Sequencing , Humans , Male , Melanoma/economics , Melanoma/genetics , Melanoma/therapy , Middle Aged , National Health Programs , Prospective Studies , Retrospective Studies , United Kingdom , Young Adult
10.
Genome Biol ; 16: 39, 2015 Feb 20.
Article in English | MEDLINE | ID: mdl-25853652

ABSTRACT

We present a statistical methodology, DGEclust, for differential expression analysis of digital expression data. Our method treats differential expression as a form of clustering, thus unifying these two concepts. Furthermore, it simultaneously addresses the problem of how many clusters are supported by the data and uncertainty in parameter estimation. DGEclust successfully identifies differentially expressed genes under a number of different scenarios, maintaining a low error rate and an excellent control of its false discovery rate with reasonable computational requirements. It is formulated to perform particularly well on low-replicated data and be applicable to multi-group data. DGEclust is available at http://dvav.github.io/dgeclust/.


Subject(s)
Gene Expression Profiling/methods , Gene Expression/genetics , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Cluster Analysis , Humans , Software
11.
Nucleic Acids Res ; 43(Database issue): D227-33, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25414345

ABSTRACT

We present updates to the SUPERFAMILY 1.75 (http://supfam.org) online resource and protein sequence collection. The hidden Markov model library that provides sequence homology to SCOP structural domains remains unchanged at version 1.75. In the last 4 years SUPERFAMILY has more than doubled its holding of curated complete proteomes over all cellular life, from 1400 proteomes reported previously in 2010 up to 3258 at present. Outside of the main sequence collection, SUPERFAMILY continues to provide domain annotation for sequences provided by other resources such as: UniProt, Ensembl, PDB, much of JGI Phytozome and selected subcollections of NCBI RefSeq. Despite this growth in data volume, SUPERFAMILY now provides users with an expanded and daily updated phylogenetic tree of life (sTOL). This tree is built with genomic-scale domain annotation data as before, but constantly updated when new species are introduced to the sequence library. Our Gene Ontology and other functional and phenotypic annotations previously reported have stood up to critical assessment by the function prediction community. We have now introduced these data in an integrated manner online at the level of an individual sequence, and--in the case of whole genomes--with enrichment analysis against a taxonomically defined background.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Gene Ontology , Molecular Sequence Annotation , Phylogeny , Proteins/classification , Proteins/genetics , Proteome/chemistry , Sequence Analysis, Protein
12.
PLoS Comput Biol ; 8(3): e1002401, 2012.
Article in English | MEDLINE | ID: mdl-22396632

ABSTRACT

Traditional approaches to the problem of parameter estimation in biophysical models of neurons and neural networks usually adopt a global search algorithm (for example, an evolutionary algorithm), often in combination with a local search method (such as gradient descent) in order to minimize the value of a cost function, which measures the discrepancy between various features of the available experimental data and model output. In this study, we approach the problem of parameter estimation in conductance-based models of single neurons from a different perspective. By adopting a hidden-dynamical-systems formalism, we expressed parameter estimation as an inference problem in these systems, which can then be tackled using a range of well-established statistical inference methods. The particular method we used was Kitagawa's self-organizing state-space model, which was applied on a number of Hodgkin-Huxley-type models using simulated or actual electrophysiological data. We showed that the algorithm can be used to estimate a large number of parameters, including maximal conductances, reversal potentials, kinetics of ionic currents, measurement and intrinsic noise, based on low-dimensional experimental data and sufficiently informative priors in the form of pre-defined constraints imposed on model parameters. The algorithm remained operational even when very noisy experimental data were used. Importantly, by combining the self-organizing state-space model with an adaptive sampling algorithm akin to the Covariance Matrix Adaptation Evolution Strategy, we achieved a significant reduction in the variance of parameter estimates. The algorithm did not require the explicit formulation of a cost function and it was straightforward to apply on compartmental models and multiple data sets. 
Overall, the proposed methodology is particularly suitable for resolving high-dimensional inference problems based on noisy electrophysiological data and, therefore, a potentially useful tool in the construction of biophysical neuron models.
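
The core idea of treating parameter estimation as inference in a state-space model can be sketched with a bootstrap particle filter over an augmented state that contains both the dynamic variable and the unknown parameter. The example below is a simplification under stated assumptions: it uses a toy leaky integrator rather than a Hodgkin-Huxley-type model, a plain random walk on the parameter rather than the adaptive sampling scheme described in the abstract, and hypothetical names (`simulate`, `estimate_leak`) and noise levels.

```python
import numpy as np

def simulate(g, n_steps, rng, dt=0.1, drive=1.0, proc_sd=0.02, obs_sd=0.1):
    """Simulate a noisy leaky integrator and noisy observations of it."""
    v = 0.0
    obs = np.empty(n_steps)
    for t in range(n_steps):
        v = v + dt * (-g * v + drive) + proc_sd * rng.normal()
        obs[t] = v + obs_sd * rng.normal()
    return obs

def estimate_leak(obs, rng, n_particles=2000, bounds=(0.1, 5.0),
                  dt=0.1, drive=1.0, proc_sd=0.02, obs_sd=0.1, param_sd=0.01):
    """Bootstrap particle filter over the augmented state (v, g)."""
    lo, hi = bounds
    v = np.zeros(n_particles)
    g = rng.uniform(lo, hi, n_particles)  # prior: uniform within the bounds
    for y in obs:
        # Propagate: the parameter takes a small random walk, clipped to the prior range.
        g = np.clip(g + param_sd * rng.normal(size=n_particles), lo, hi)
        v = v + dt * (-g * v + drive) + proc_sd * rng.normal(size=n_particles)
        # Weight by the Gaussian observation likelihood (log-space for stability).
        logw = -0.5 * ((y - v) / obs_sd) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Resample particles in proportion to their weights.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        v, g = v[idx], g[idx]
    return g.mean()

rng = np.random.default_rng(1)
observations = simulate(g=1.0, n_steps=300, rng=rng)
g_hat = estimate_leak(observations, rng)
```

No explicit cost function appears anywhere in the sketch: the estimate of the leak parameter emerges from repeated weighting and resampling against the observations, and the bounds play the role of the pre-defined constraints on model parameters.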


Subject(s)
Action Potentials/physiology , Algorithms , Models, Neurological , Models, Statistical , Neurons/physiology , Animals , Computer Simulation , Humans