Search | BVS Regional Portal

1.

ICR142 Benchmarker: evaluating, optimising and benchmarking variant calling performance using the ICR142 NGS validation series.

Ruark, Elise; Holt, Esty; Renwick, Anthony; Münz, Márton; Wakeling, Matthew; Ellard, Sian; Mahamdallie, Shazia; Yost, Shawn; Rahman, Nazneen.

Wellcome Open Res ; 3: 108, 2018.

Article in English | MEDLINE | ID: mdl-30483600

ABSTRACT

Evaluating, optimising and benchmarking of next generation sequencing (NGS) variant calling performance are essential requirements for clinical, commercial and academic NGS pipelines. Such assessments should be performed in a consistent, transparent and reproducible fashion, using independently, orthogonally generated data. Here we present ICR142 Benchmarker, a tool to generate outputs for assessing germline base substitution and indel calling performance using the ICR142 NGS validation series, a dataset of Illumina platform-based exome sequence data from 142 samples together with Sanger sequence data at 704 sites. ICR142 Benchmarker provides summary and detailed information on the sensitivity, specificity and false detection rates of variant callers. ICR142 Benchmarker also automatically generates a single page report highlighting key performance metrics and how performance compares to widely-used open-source tools. We used ICR142 Benchmarker with VCF files outputted by GATK, OpEx and DeepVariant to create a benchmark for variant calling performance. This evaluation revealed pipeline-specific differences and shared challenges in variant calling, for example in detecting indels in short repeating sequence motifs. We next used ICR142 Benchmarker to perform regression testing with DeepVariant versions 0.5.2 and 0.6.1. This showed that v0.6.1 improves variant calling performance, but there was evidence of minor changes in indel calling behaviour that may benefit from attention. The data also allowed us to evaluate filters to optimise DeepVariant calling, and we recommend using 30 as the QUAL threshold for base substitution calls when using DeepVariant v0.6.1. Finally, we used ICR142 Benchmarker with VCF files from two commercial variant calling providers to facilitate optimisation of their in-house pipelines and to provide transparent benchmarking of their performance. 
ICR142 Benchmarker consistently and transparently analyses variant calling performance based on the ICR142 NGS validation series, using the standard VCF input and outputting informative metrics to enable user understanding of pipeline performance. ICR142 Benchmarker is freely available at https://github.com/RahmanTeamDevelopment/ICR142_Benchmarker/releases.
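The three metrics named in this abstract can be illustrated with a minimal sketch. It assumes the conventional confusion-matrix definitions of sensitivity, specificity and false detection rate; the abstract does not spell out the formulas, so this is not a description of how ICR142 Benchmarker itself computes them. The `Counts` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Counts from comparing variant calls with validated sites (hypothetical container)."""
    tp: int  # validated variant sites where the caller reported the variant
    fn: int  # validated variant sites the caller missed
    fp: int  # validated non-variant sites where the caller reported a variant
    tn: int  # validated non-variant sites where the caller reported nothing


def sensitivity(c: Counts) -> float:
    # Fraction of true variants that were called.
    return c.tp / (c.tp + c.fn)


def specificity(c: Counts) -> float:
    # Fraction of true non-variant sites that were left uncalled.
    return c.tn / (c.tn + c.fp)


def false_detection_rate(c: Counts) -> float:
    # Fraction of reported calls that were wrong.
    return c.fp / (c.fp + c.tp)
```

For instance, with the made-up counts `Counts(tp=90, fn=10, fp=5, tn=95)`, these functions return a sensitivity of 0.90, a specificity of 0.95 and a false detection rate of 5/95.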

2.

The Quality Sequencing Minimum (QSM): providing comprehensive, consistent, transparent next generation sequencing data quality assurance.

Mahamdallie, Shazia; Ruark, Elise; Yost, Shawn; Münz, Márton; Renwick, Anthony; Poyastro-Pearson, Emma; Strydom, Ann; Seal, Sheila; Rahman, Nazneen.

Wellcome Open Res ; 3: 37, 2018.

Article in English | MEDLINE | ID: mdl-29992192

ABSTRACT

Next generation sequencing (NGS) is routinely used in clinical genetic testing. Quality management of NGS testing is essential to ensure performance is consistently and rigorously evaluated. Three primary metrics are used in NGS quality evaluation: depth of coverage, base quality and mapping quality. To provide consistency and transparency in the utilisation of these metrics we present the Quality Sequencing Minimum (QSM). The QSM defines the minimum quality requirement a laboratory has selected for depth of coverage (C), base quality (B) and mapping quality (M) and can be applied per base, exon, gene or other genomic region, as appropriate. The QSM format is CX_BY(PY)_MZ(PZ), where X is the parameter threshold for C, Y the parameter threshold for B, PY the percentage of reads that must reach Y, Z the parameter threshold for M, and PZ the percentage of reads that must reach Z. The data underlying the QSM is in the BAM file, so a QSM can be easily and automatically calculated in any NGS pipeline. We used the QSM to optimise cancer predisposition gene testing using the TruSight Cancer Panel (TSCP). We set the QSM as C50_B10(85)_M20(95). Test regions falling below the QSM were automatically flagged for review, with 100/1471 test regions QSM-flagged in multiple individuals. Supplementing these regions with 132 additional probes improved performance in 85/100. We also used the QSM to optimise testing of genes with pseudogenes such as PTEN and PMS2. In TSCP data from 960 individuals the median number of regions that passed QSM per sample was 1429 (97%). Importantly, the QSM can be used at an individual report level to provide succinct, comprehensive quality assurance information about individual test performance. We believe many laboratories would find the QSM useful. Furthermore, widespread adoption of the QSM would facilitate consistent, transparent reporting of genetic test performance by different laboratories.
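The QSM string described in this abstract lends itself to a short parsing sketch. The code below assumes a per-position reading of the format, in which depth is the number of reads covering a position and each read contributes one base quality and one mapping quality. That reading, the regular expression, and all class and function names are illustrative assumptions, not the authors' implementation.

```python
import re
from dataclasses import dataclass
from typing import Sequence

# Matches strings such as "C50_B10(85)_M20(95)".
QSM_PATTERN = re.compile(r"^C(\d+)_B(\d+)\((\d+)\)_M(\d+)\((\d+)\)$")


@dataclass(frozen=True)
class QSM:
    min_depth: int                  # X: minimum depth of coverage
    min_base_quality: int           # Y: base quality threshold
    base_quality_percent: float     # PY: percentage of reads that must reach Y
    min_mapping_quality: int        # Z: mapping quality threshold
    mapping_quality_percent: float  # PZ: percentage of reads that must reach Z


def parse_qsm(text: str) -> QSM:
    """Parse a QSM string into its five thresholds."""
    match = QSM_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a QSM string: {text!r}")
    x, y, py, z, pz = (int(group) for group in match.groups())
    return QSM(x, y, py, z, pz)


def percent_at_least(values: Sequence[int], threshold: int) -> float:
    """Percentage of values that are greater than or equal to the threshold."""
    if not values:
        return 0.0
    return 100.0 * sum(v >= threshold for v in values) / len(values)


def passes_qsm(qsm: QSM, base_qualities: Sequence[int],
               mapping_qualities: Sequence[int]) -> bool:
    """Check one position, given one base and one mapping quality per read."""
    depth = len(mapping_qualities)
    return (
        depth >= qsm.min_depth
        and percent_at_least(base_qualities, qsm.min_base_quality)
        >= qsm.base_quality_percent
        and percent_at_least(mapping_qualities, qsm.min_mapping_quality)
        >= qsm.mapping_quality_percent
    )
```

Under these assumptions, `parse_qsm("C50_B10(85)_M20(95)")` yields a depth threshold of 50, a requirement that at least 85% of reads have base quality 10 or higher, and a requirement that at least 95% of reads have mapping quality 20 or higher. A position that fails any of the three checks would be flagged for review.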

3.

CoverView: a sequence quality evaluation tool for next generation sequencing data.

Münz, Márton; Mahamdallie, Shazia; Yost, Shawn; Rimmer, Andrew; Poyastro-Pearson, Emma; Strydom, Ann; Seal, Sheila; Ruark, Elise; Rahman, Nazneen.

Wellcome Open Res ; 3: 36, 2018.

Article in English | MEDLINE | ID: mdl-29881786

ABSTRACT

Quality assurance and quality control are essential for robust next generation sequencing (NGS). Here we present CoverView, a fast, flexible, user-friendly quality evaluation tool for NGS data. CoverView processes mapped sequencing reads and user-specified regions to report depth of coverage, base and mapping quality metrics with increasing levels of detail from a chromosome-level summary to per-base profiles. CoverView can flag regions that do not fulfil user-specified quality requirements, allowing suboptimal data to be systematically and automatically presented for review. It also provides an interactive graphical user interface (GUI) that can be opened in a web browser and allows intuitive exploration of results. We have integrated CoverView into our accredited clinical cancer predisposition gene testing laboratory that uses the TruSight Cancer Panel (TSCP). CoverView has been invaluable for optimisation and quality control of our testing pipeline, providing transparent, consistent quality metric information and automatic flagging of regions that fall below quality thresholds. We demonstrate this utility with TSCP data from the Genome in a Bottle reference sample, which CoverView analysed in 13 seconds. CoverView uses data routinely generated by NGS pipelines, reads standard input formats, and rapidly creates easy-to-parse output text (.txt) files that are customised by a simple configuration file. CoverView can therefore be easily integrated into any NGS pipeline. CoverView and detailed documentation for its use are freely available at github.com/RahmanTeamDevelopment/CoverView/releases and www.icr.ac.uk/CoverView.

4.

OpEx - a validated, automated pipeline optimised for clinical exome sequence analysis.

Ruark, Elise; Münz, Márton; Clarke, Matthew; Renwick, Anthony; Ramsay, Emma; Elliott, Anna; Seal, Sheila; Lunter, Gerton; Rahman, Nazneen.

Sci Rep ; 6: 31029, 2016 08 03.

Article in English | MEDLINE | ID: mdl-27485037

ABSTRACT

We present an easy-to-use, open-source Optimised Exome analysis tool, OpEx (http://icr.ac.uk/opex) that accurately detects small-scale variation, including indels, to clinical standards. We evaluated OpEx performance with an experimentally validated dataset (the ICR142 NGS validation series), a large 1000 exome dataset (the ICR1000 UK exome series), and a clinical proband-parent trio dataset. The performance of OpEx for high-quality base substitutions and short indels in both small and large datasets is excellent, with overall sensitivity of 95%, specificity of 97% and low false detection rate (FDR) of 3%. Depending on the individual performance requirements the OpEx output allows one to optimise the inevitable trade-offs between sensitivity and specificity. For example, in the clinical setting one could permit a higher FDR and lower specificity to maximise sensitivity. In contexts where experimental validation is not possible, minimising the FDR and improving specificity may be a preferable trade-off for slightly lower sensitivity. OpEx is simple to install and use; the whole pipeline is run from a single command. OpEx is therefore well suited to the increasing number of research and clinical laboratories undertaking exome sequencing, particularly those without in-house dedicated bioinformatics expertise.

5.

In-Depth Assessment of Within-Individual and Inter-Individual Variation in the B Cell Receptor Repertoire.

Galson, Jacob D; Trück, Johannes; Fowler, Anna; Münz, Márton; Cerundolo, Vincenzo; Pollard, Andrew J; Lunter, Gerton; Kelly, Dominic F.

Front Immunol ; 6: 531, 2015.

Article in English | MEDLINE | ID: mdl-26528292

ABSTRACT

High-throughput sequencing of the B cell receptor (BCR) repertoire can provide rapid characterization of the B cell response in a wide variety of applications in health, after vaccination and in infectious, inflammatory and immune-driven disease, and is starting to yield clinical applications. However, the interpretation of repertoire data is compromised by a lack of studies to assess the intra and inter-individual variation in the BCR repertoire over time in healthy individuals. We applied a standardized isotype-specific BCR repertoire deep sequencing protocol to a single highly sampled participant, and then evaluated the method in 9 further participants to comprehensively describe such variation. We assessed total repertoire metrics of mutation, diversity, VJ gene usage and isotype subclass usage as well as tracking specific BCR sequence clusters. There was good assay reproducibility (both in PCR amplification and biological replicates), but we detected striking fluctuations in the repertoire over time that we hypothesize may be due to subclinical immune activation. Repertoire properties were unique for each individual, which could partly be explained by a decrease in IgG2 with age, and genetic differences at the immunoglobulin locus. There was a small repertoire of public clusters (0.5, 0.3, and 1.4% of total IgA, IgG, and IgM clusters, respectively), which was enriched for expanded clusters containing sequences with suspected specificity toward antigens that should have been historically encountered by all participants through prior immunization or infection. We thus provide baseline BCR repertoire information that can be used to inform future study design, and aid in interpretation of results from these studies. 
Furthermore, our results indicate that BCR repertoire studies could be used to track changes in the public repertoire in and between populations that might relate to population immunity against infectious diseases, and identify the characteristics of inflammatory and immunological diseases.

6.

CSN and CAVA: variant annotation tools for rapid, robust next-generation sequencing analysis in the clinical setting.

Münz, Márton; Ruark, Elise; Renwick, Anthony; Ramsay, Emma; Clarke, Matthew; Mahamdallie, Shazia; Cloke, Victoria; Seal, Sheila; Strydom, Ann; Lunter, Gerton; Rahman, Nazneen.

Genome Med ; 7: 76, 2015 Jul 28.

Article in English | MEDLINE | ID: mdl-26315209

ABSTRACT

BACKGROUND: Next-generation sequencing (NGS) offers unprecedented opportunities to expand clinical genomics. It also presents challenges with respect to integration with data from other sequencing methods and historical data. Provision of consistent, clinically applicable variant annotation of NGS data has proved difficult, particularly of indels, an important variant class in clinical genomics. Annotation in relation to a reference genome sequence, the DNA strand of coding transcripts and potential alternative variant representations has not been well addressed. Here we present tools that address these challenges to provide rapid, standardized, clinically appropriate annotation of NGS data in line with existing clinical standards. METHODS: We developed a clinical sequencing nomenclature (CSN), a fixed variant annotation consistent with the principles of the Human Genome Variation Society (HGVS) guidelines, optimized for automated variant annotation of NGS data. To deliver high-throughput CSN annotation we created CAVA (Clinical Annotation of VAriants), a fast, lightweight tool designed for easy incorporation into NGS pipelines. CAVA allows transcript specification, appropriately accommodates the strand of a gene transcript and flags variants with alternative annotations to facilitate clinical interpretation and comparison with other datasets. We evaluated CAVA in exome data and a clinical BRCA1/BRCA2 gene testing pipeline. RESULTS: CAVA generated CSN calls for 10,313,034 variants in the ExAC database in 13.44 hours, and annotated the ICR1000 exome series in 6.5 hours. Evaluation of 731 different indels from a single individual revealed 92 % had alternative representations in left aligned and right aligned data. Annotation of left aligned data, as performed by many annotation tools, would thus give clinically discrepant annotation for the 339 (46 %) indels in genes transcribed from the forward DNA strand. 
By contrast, CAVA provides the correct clinical annotation for all indels. CAVA also flagged the 370 indels with alternative representations of a different functional class, which may profoundly influence clinical interpretation. CAVA annotation of 50 BRCA1/BRCA2 gene mutations from a clinical pipeline gave 100 % concordance with Sanger data; only 8/25 BRCA2 mutations were correctly clinically annotated by other tools. CONCLUSIONS: CAVA is a freely available tool that provides rapid, robust, high-throughput clinical annotation of NGS data, using a standardized clinical sequencing nomenclature.
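The alternative indel representations discussed in this abstract can be illustrated with a small sketch. It shifts a deletion to its leftmost and rightmost equivalent positions in a reference string and shows that both describe the same resulting sequence. This is a generic illustration of left and right alignment, not CAVA's algorithm; the function names, the 0-based coordinates and the toy reference sequence are assumptions made for the example.

```python
def apply_deletion(ref: str, pos: int, length: int) -> str:
    """Return the sequence after deleting ref[pos:pos + length] (0-based)."""
    return ref[:pos] + ref[pos + length:]


def left_align_deletion(ref: str, pos: int, length: int) -> int:
    """Shift a deletion to the leftmost equivalent start position."""
    # Moving one base left is equivalent when the base entering the deleted
    # window equals the base leaving it at the other end.
    while pos > 0 and ref[pos - 1] == ref[pos + length - 1]:
        pos -= 1
    return pos


def right_align_deletion(ref: str, pos: int, length: int) -> int:
    """Shift a deletion to the rightmost equivalent start position."""
    while pos + length < len(ref) and ref[pos] == ref[pos + length]:
        pos += 1
    return pos


# Toy reference containing a short repeating motif (illustrative only).
REFERENCE = "GGCACACACTT"
```

Deleting two bases starting at position 3 of `REFERENCE` can equally be written as a deletion starting at position 2 (left aligned) or position 7 (right aligned); all three give `GGCACACTT`. Because HGVS-style nomenclature describes a variant at its most 3' position relative to the transcript, which of these genomic representations matches the clinical annotation depends on the strand of the gene.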

Subjects

Computational Biology/methods; High-Throughput Nucleotide Sequencing; Molecular Sequence Annotation; Sequence Analysis, DNA; BRCA1 Protein/genetics; BRCA2 Protein/genetics; Exome; Humans; Mutation

7.

BCR repertoire sequencing: different patterns of B-cell activation after two Meningococcal vaccines.

Galson, Jacob D; Clutterbuck, Elizabeth A; Trück, Johannes; Ramasamy, Maheshi N; Münz, Márton; Fowler, Anna; Cerundolo, Vincenzo; Pollard, Andrew J; Lunter, Gerton; Kelly, Dominic F.

Immunol Cell Biol ; 93(10): 885-95, 2015 Nov.

Article in English | MEDLINE | ID: mdl-25976772

ABSTRACT

Next-generation sequencing was used to investigate the B-cell receptor heavy chain transcript repertoire of different B-cell subsets (naive, marginal zone (MZ), immunoglobulin M (IgM) memory and IgG memory) at baseline, and of plasma cells (PCs) 7 days following administration of serogroup ACWY meningococcal polysaccharide and protein-polysaccharide conjugate vaccines. Baseline B-cell subsets could be distinguished from each other using a small number of repertoire properties (clonality, mutation from germline and complementarity-determining region 3 (CDR3) length) that were conserved between individuals. However, analyzing the CDR3 amino-acid sequence (which is particularly important for antigen binding) of the baseline subsets showed few sequences shared between individuals. In contrast, day 7 PCs demonstrated nearly 10-fold greater sequence sharing between individuals than the baseline subsets, consistent with the PCs being induced by the vaccine antigen and sharing specificity for a more limited range of epitopes. By annotating PC sequences based on IgG subclass usage and mutation, and also comparing them with the sequences of the baseline cell subsets, we were able to identify different signatures after the polysaccharide and conjugate vaccines. PCs produced after conjugate vaccination were predominantly IgG1, and most related to IgG memory cells. In contrast, after polysaccharide vaccination, the PCs were predominantly IgG2, less mutated and were equally likely to be related to MZ, IgM memory or IgG memory cells. High-throughput B-cell repertoire sequencing thus provides a unique insight into patterns of B-cell activation not possible from more conventional measures of immunogenicity.

Subjects

B-Lymphocyte Subsets/immunology; B-Lymphocytes/immunology; Complementarity Determining Regions/genetics; Meningococcal Vaccines/immunology; Receptors, Antigen, B-Cell/genetics; Epitopes; Epitopes, B-Lymphocyte/metabolism; Genetic Variation/genetics; HLA-DR Antigens/metabolism; High-Throughput Nucleotide Sequencing; Immunoglobulin G/metabolism; Immunologic Memory/genetics; Lymphocyte Activation/genetics; Meningococcal Vaccines/administration & dosage; Principal Component Analysis; Transcriptome

8.

The ICR1000 UK exome series: a resource of gene variation in an outbred population.

Ruark, Elise; Münz, Márton; Renwick, Anthony; Clarke, Matthew; Ramsay, Emma; Hanks, Sandra; Mahamdallie, Shazia; Elliott, Anna; Seal, Sheila; Strydom, Ann; Lunter, Gerton; Rahman, Nazneen.

F1000Res ; 4: 883, 2015.

Article in English | MEDLINE | ID: mdl-26834991

ABSTRACT

To enhance knowledge of gene variation in outbred populations, and to provide a dataset with utility in research and clinical genomics, we performed exome sequencing of 1,000 UK individuals from the general population and applied a high-quality analysis pipeline that includes high sensitivity and specificity for indel detection. Each UK individual has, on average, 21,978 gene variants including 160 rare (0.1%) variants not present in any other individual in the series. These data provide a baseline expectation for gene variation in an outbred population. Summary data of all 295,391 variants we detected are included here and the individual exome sequences are available from the European Genome-phenome Archive as the ICR1000 UK exome series. Furthermore, samples and other phenotype and experimental data for these individuals are obtainable through application to the 1958 Birth Cohort committee.

9.

Analysis of B Cell Repertoire Dynamics Following Hepatitis B Vaccination in Humans, and Enrichment of Vaccine-specific Antibody Sequences.

Galson, Jacob D; Trück, Johannes; Fowler, Anna; Clutterbuck, Elizabeth A; Münz, Márton; Cerundolo, Vincenzo; Reinhard, Claudia; van der Most, Robbert; Pollard, Andrew J; Lunter, Gerton; Kelly, Dominic F.

EBioMedicine ; 2(12): 2070-9, 2015 Dec.

Article in English | MEDLINE | ID: mdl-26844287

ABSTRACT

Generating a diverse B cell immunoglobulin repertoire is essential for protection against infection. The repertoire in humans can now be comprehensively measured by high-throughput sequencing. Using hepatitis B vaccination as a model, we determined how the total immunoglobulin sequence repertoire changes following antigen exposure in humans, and compared this to sequences from vaccine-specific sorted cells. Clonal sequence expansions were seen 7 days after vaccination, which correlated with vaccine-specific plasma cell numbers. These expansions caused an increase in mutation, and a decrease in diversity and complementarity-determining region 3 sequence length in the repertoire. We also saw an increase in sequence convergence between participants 14 and 21 days after vaccination, coinciding with an increase of vaccine-specific memory cells. These features allowed development of a model for in silico enrichment of vaccine-specific sequences from the total repertoire. Identifying antigen-specific sequences from total repertoire data could aid our understanding of B cell driven immunity, and be used for disease diagnostics and vaccine evaluation.

Subjects

B-Lymphocyte Subsets/immunology; Hepatitis B Antibodies/immunology; Hepatitis B Vaccines/immunology; Hepatitis B virus/immunology; Hepatitis B/immunology; Hepatitis B/prevention & control; Vaccination; Adult; Antibody Specificity; B-Lymphocyte Subsets/metabolism; Computational Biology/methods; Databases, Genetic; Hepatitis B Antibodies/genetics; High-Throughput Nucleotide Sequencing; Humans; Immunoglobulin Isotypes/genetics; Immunoglobulin Isotypes/immunology; Immunologic Memory; Lymphocyte Count; Middle Aged; Plasma Cells/immunology; Plasma Cells/metabolism; Sequence Analysis, DNA; Time Factors; Young Adult

10.

The role of flexibility and conformational selection in the binding promiscuity of PDZ domains.

Münz, Márton; Hein, Jotun; Biggin, Philip C.

PLoS Comput Biol ; 8(11): e1002749, 2012.

Article in English | MEDLINE | ID: mdl-23133356

ABSTRACT

In molecular recognition, it is often the case that ligand binding is coupled to conformational change in one or both of the binding partners. Two hypotheses describe the limiting cases involved; the first is the induced fit and the second is the conformational selection model. The conformational selection model requires that the protein adopts conformations that are similar to the ligand-bound conformation in the absence of ligand, whilst the induced-fit model predicts that the ligand-bound conformation of the protein is only accessible when the ligand is actually bound. The flexibility of the apo protein clearly plays a major role in these interpretations. For many proteins involved in signaling pathways there is the added complication that they are often promiscuous in that they are capable of binding to different ligand partners. The relationship between protein flexibility and promiscuity is an area of active research and is perhaps best exemplified by the PDZ domain family of proteins. In this study we use molecular dynamics simulations to examine the relationship between flexibility and promiscuity in five PDZ domains: the human Dvl2 (Dishevelled-2) PDZ domain, the human Erbin PDZ domain, the PDZ1 domain of InaD (inactivation no after-potential D protein) from fruit fly, the PDZ7 domain of GRIP1 (glutamate receptor interacting protein 1) from rat and the PDZ2 domain of PTP-BL (protein tyrosine phosphatase) from mouse. We show that despite their high structural similarity, the PDZ binding sites have significantly different dynamics. Importantly, the degree of binding pocket flexibility was found to be closely related to the various characteristics of peptide binding specificity and promiscuity of the five PDZ domains. Our findings suggest that the intrinsic motions of the apo structures play a key role in distinguishing functional properties of different PDZ domains and allow us to make predictions that can be experimentally tested.

Subjects

PDZ Domains; Proteins/chemistry; Proteins/metabolism; Amino Acid Sequence; Animals; Binding Sites; Cluster Analysis; Computational Biology; Drosophila Proteins; Humans; Mice; Molecular Dynamics Simulation; Molecular Sequence Data; Protein Binding; Rats; Sequence Alignment; Signal Transduction

11.

JGromacs: a Java package for analyzing protein simulations.

Münz, Márton; Biggin, Philip C.

J Chem Inf Model ; 52(1): 255-9, 2012 Jan 23.

Article in English | MEDLINE | ID: mdl-22191855

ABSTRACT

In this paper, we introduce JGromacs, a Java API (Application Programming Interface) that facilitates the development of cross-platform data analysis applications for Molecular Dynamics (MD) simulations. The API supports parsing and writing file formats applied by GROMACS (GROningen MAchine for Chemical Simulations), one of the most widely used MD simulation packages. JGromacs builds on the strengths of object-oriented programming in Java by providing a multilevel object-oriented representation of simulation data to integrate and interconvert sequence, structure, and dynamics information. The easy-to-learn, easy-to-use, and easy-to-extend framework is intended to simplify and accelerate the implementation and development of complex data analysis algorithms. Furthermore, a basic analysis toolkit is included in the package. The programmer is also provided with simple tools (e.g., XML-based configuration) to create applications with a user interface resembling the command-line interface of GROMACS applications. AVAILABILITY: JGromacs and detailed documentation are freely available from http://sbcb.bioch.ox.ac.uk/jgromacs under a GPLv3 license.

Subjects

Programming Languages; Proteins/chemistry; Software; User-Computer Interface; Algorithms; Animals; Drosophila melanogaster; Humans; Molecular Dynamics Simulation; Proteins/analysis

12.

Dynamics based alignment of proteins: an alternative approach to quantify dynamic similarity.

Münz, Márton; Lyngsø, Rune; Hein, Jotun; Biggin, Philip C.

BMC Bioinformatics ; 11: 188, 2010 Apr 14.

Article in English | MEDLINE | ID: mdl-20398246

ABSTRACT

BACKGROUND: The dynamic motions of many proteins are central to their function. It therefore follows that the dynamic requirements of a protein are evolutionarily constrained. In order to assess and quantify this, one needs to compare the dynamic motions of different proteins. Comparing the dynamics of distinct proteins may also provide insight into how protein motions are modified by variations in sequence and, consequently, by structure. The optimal way of comparing complex molecular motions is, however, far from trivial. The majority of comparative molecular dynamics studies performed to date relied upon prior sequence or structural alignment to define which residues were equivalent in 3-dimensional space. RESULTS: Here we discuss an alternative methodology for comparative molecular dynamics that does not require any prior alignment information. We show it is possible to align proteins based solely on their dynamics and that we can use these dynamics-based alignments to quantify the dynamic similarity of proteins. Our method was tested on 10 representative members of the PDZ domain family. CONCLUSIONS: As a result of creating pair-wise dynamics-based alignments of PDZ domains, we have found evolutionarily conserved patterns in their backbone dynamics. The dynamic similarity of PDZ domains is highly correlated with their structural similarity as calculated with Dali. However, significant differences in their dynamics can be detected indicating that sequence has a more refined role to play in protein dynamics than just dictating the overall fold. We suggest that the method should be generally applicable.

Subjects

Proteins/chemistry; Sequence Alignment/methods; Databases, Protein; Models, Molecular; Molecular Dynamics Simulation; Protein Structure, Tertiary; Sequence Analysis, Protein/methods
