Search | BVS Regional Portal

Assessing reproducibility of inherited variants detected with short-read whole genome sequencing.

Pan, Bohu; Ren, Luyao; Onuchic, Vitor; Guan, Meijian; Kusko, Rebecca; Bruinsma, Steve; Trigg, Len; Scherer, Andreas; Ning, Baitang; Zhang, Chaoyang; Glidewell-Kenney, Christine; Xiao, Chunlin; Donaldson, Eric; Sedlazeck, Fritz J; Schroth, Gary; Yavas, Gokhan; Grunenwald, Haiying; Chen, Haodong; Meinholz, Heather; Meehan, Joe; Wang, Jing; Yang, Jingcheng; Foox, Jonathan; Shang, Jun; Miclaus, Kelci; Dong, Lianhua; Shi, Leming; Mohiyuddin, Marghoob; Pirooznia, Mehdi; Gong, Ping; Golshani, Rooz; Wolfinger, Russ; Lababidi, Samir; Sahraeian, Sayed Mohammad Ebrahim; Sherry, Steve; Han, Tao; Chen, Tao; Shi, Tieliu; Hou, Wanwan; Ge, Weigong; Zou, Wen; Guo, Wenjing; Bao, Wenjun; Xiao, Wenzhong; Fan, Xiaohui; Gondo, Yoichi; Yu, Ying; Zhao, Yongmei; Su, Zhenqiang; Liu, Zhichao.

Genome Biol ; 23(1): 2, 2022 01 03.

Article in English | MEDLINE | ID: mdl-34980216

ABSTRACT

BACKGROUND: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. RESULTS: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. CONCLUSIONS: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.

Subjects

Genome, Human, Polymorphism, Single Nucleotide, High-Throughput Nucleotide Sequencing, Humans, INDEL Mutation, Reproducibility of Results, Whole Genome Sequencing
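
Illustrative note: the abstract above does not state which reproducibility metric the study used. As a minimal sketch, assuming reproducibility is summarized as the mean pairwise Jaccard index between replicate call sets, and assuming each variant is keyed by a hypothetical (chrom, pos, ref, alt) tuple, agreement among triplicates could be computed as follows.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of variants shared by two call sets out of all variants in either."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty call sets are treated as fully concordant.
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(callsets):
    """Average Jaccard index over every pair of replicate call sets."""
    pairs = list(combinations(callsets, 2))
    if not pairs:
        raise ValueError("need at least two call sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy triplicate; each variant is (chrom, pos, ref, alt).
rep1 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "GA")}
rep2 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}
rep3 = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "GA")}

score = mean_pairwise_jaccard([rep1, rep2, rep3])
print(f"mean pairwise Jaccard: {score:.3f}")
```

Exact tuple matching ignores differences in variant representation, so a real comparison would normalize variants first.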

An open resource for accurately benchmarking small variant and reference calls.

Zook, Justin M; McDaniel, Jennifer; Olson, Nathan D; Wagner, Justin; Parikh, Hemang; Heaton, Haynes; Irvine, Sean A; Trigg, Len; Truty, Rebecca; McLean, Cory Y; De La Vega, Francisco M; Xiao, Chunlin; Sherry, Stephen; Salit, Marc.

Nat Biotechnol ; 37(5): 561-566, 2019 05.

Article in English | MEDLINE | ID: mdl-30936564

ABSTRACT

Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle (GIAB) Consortium, we apply a reproducible, cloud-based pipeline to integrate multiple short- and linked-read sequencing datasets and provide benchmark calls for human genomes. We generate benchmark calls for one previously analyzed GIAB sample, as well as six genomes from the Personal Genome Project. These new genomes have broad, open consent, making this a 'first of its kind' resource that is available to the community for multiple downstream applications. We produce 17% more benchmark single nucleotide variations, 176% more indels and 12% larger benchmark regions than previously published GIAB benchmarks. We demonstrate that this benchmark reliably identifies errors in existing callsets and highlight challenges in interpreting performance metrics when using benchmarks that are not perfect or comprehensive. Finally, we identify strengths and weaknesses of callsets by stratifying performance according to variant type and genome context.

Subjects

Benchmarking, Computational Biology/trends, Genome, Human/genetics, Genomics/trends, Genetic Variation/genetics, High-Throughput Nucleotide Sequencing, Humans, INDEL Mutation/genetics, Polymorphism, Single Nucleotide, Software/trends

Best practices for benchmarking germline small-variant calls in human genomes.

Krusche, Peter; Trigg, Len; Boutros, Paul C; Mason, Christopher E; De La Vega, Francisco M; Moore, Benjamin L; Gonzalez-Porta, Mar; Eberle, Michael A; Tezak, Zivana; Lababidi, Samir; Truty, Rebecca; Asimenos, George; Funke, Birgit; Fleharty, Mark; Chapman, Brad A; Salit, Marc; Zook, Justin M.

Nat Biotechnol ; 37(5): 555-560, 2019 05.

Article in English | MEDLINE | ID: mdl-30858580

ABSTRACT

Standardized benchmarking approaches are required to assess the accuracy of variants called from sequence data. Although variant-calling tools and the metrics used to assess their performance continue to improve, important challenges remain. Here, as part of the Global Alliance for Genomics and Health (GA4GH), we present a benchmarking framework for variant calling. We provide guidance on how to match variant calls with different representations, define standard performance metrics, and stratify performance by variant type and genome context. We describe limitations of high-confidence calls and regions that can be used as truth sets (for example, single-nucleotide variant concordance of two methods is 99.7% inside versus 76.5% outside high-confidence regions). Our web-based app enables comparison of variant calls against truth sets to obtain a standardized performance report. Our approach has been piloted in the PrecisionFDA variant-calling challenges to identify the best-in-class variant-calling methods within high-confidence regions. Finally, we recommend a set of best practices for using our tools and evaluating the results.

Subjects

Benchmarking, Exome/genetics, Genome, Human/genetics, High-Throughput Nucleotide Sequencing, Algorithms, Genomics/trends, Germ Cells, Humans, Polymorphism, Single Nucleotide/genetics, Software
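
Illustrative note: the abstract above refers to standard performance metrics for comparing a query call set against a truth set. The sketch below uses the conventional definitions of precision, recall and F1 from true-positive, false-positive and false-negative counts. The function name and the counts are hypothetical, and the harder step of matching variants with different representations is assumed to have been done already.

```python
def benchmark_metrics(tp, fp, fn):
    """Return precision, recall and F1 from variant comparison counts.

    tp: query calls that match the truth set
    fp: query calls absent from the truth set
    fn: truth variants missed by the query
    A metric whose denominator is zero is reported as 0.0 in this sketch.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, restricted to high-confidence regions.
metrics = benchmark_metrics(tp=90, fp=10, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Computing the same metrics separately per variant type or genome context gives the kind of stratified view the abstract describes.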

Author Correction: Best practices for benchmarking germline small-variant calls in human genomes.

Nat Biotechnol ; 37(5): 567, 2019 05.

Article in English | MEDLINE | ID: mdl-30899106

ABSTRACT

In the version of this article initially published online, two pairs of headings were switched with each other in Table 4: "Recall (PCR free)" was switched with "Recall (with PCR)," and "Precision (PCR free)" was switched with "Precision (with PCR)." The error has been corrected in the print, PDF and HTML versions of this article.

Joint variant and de novo mutation identification on pedigrees from high-throughput sequencing data.

Cleary, John G; Braithwaite, Ross; Gaastra, Kurt; Hilbush, Brian S; Inglis, Stuart; Irvine, Sean A; Jackson, Alan; Littin, Richard; Nohzadeh-Malakshah, Sahar; Rathod, Mehul; Ware, David; Trigg, Len; De La Vega, Francisco M.

J Comput Biol ; 21(6): 405-19, 2014 Jun.

Article in English | MEDLINE | ID: mdl-24874280

ABSTRACT

The analysis of whole-genome or exome sequencing data from trios and pedigrees has been successfully applied to the identification of disease-causing mutations. However, most methods used to identify and genotype genetic variants from next-generation sequencing data ignore the relationships between samples, resulting in significant Mendelian errors, false positives and negatives. Here we present a Bayesian network framework that jointly analyzes data from all members of a pedigree simultaneously using Mendelian segregation priors, yet providing the ability to detect de novo mutations in offspring, and is scalable to large pedigrees. We evaluated our method by simulations and analysis of whole-genome sequencing (WGS) data from a 17-individual, 3-generation CEPH pedigree sequenced to 50× average depth. Compared with singleton calling, our family caller produced more high-quality variants and eliminated spurious calls as judged by common quality metrics such as Ti/Tv, Het/Hom ratios, and dbSNP/SNP array data concordance, and by comparing to ground truth variant sets available for this sample. We identify all previously validated de novo mutations in NA12878, concurrent with a 7× precision improvement. Our results show that our method is scalable to large genomics and human disease studies.

Subjects

Genome, Human, High-Throughput Nucleotide Sequencing, Mutation, Pedigree, DNA Mutational Analysis/methods, Humans
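
Illustrative note: the abstract above mentions Mendelian errors that arise when family members are genotyped independently. The sketch below is not the paper's Bayesian network method. It is only a hard consistency check for one autosomal diploid site in a trio, with genotypes given as hypothetical allele-index pairs (0 = reference, 1 = alternate). A site that fails the check is either a genotyping error or a candidate de novo mutation.

```python
def is_mendelian_consistent(child, mother, father):
    """True if the child's two alleles can be drawn one from each parent.

    Each genotype is a pair of allele indices, e.g. (0, 1) for a heterozygote.
    Assumes an autosomal, diploid site with no missing calls.
    """
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)


# Hypothetical trio genotypes at three sites: (child, mother, father).
sites = [
    ((0, 1), (0, 0), (0, 1)),  # alternate allele inherited from the father
    ((0, 1), (0, 0), (0, 0)),  # alternate allele absent from both parents
    ((1, 1), (0, 1), (0, 0)),  # father cannot supply an alternate allele
]

for child, mother, father in sites:
    ok = is_mendelian_consistent(child, mother, father)
    print(child, mother, father, "consistent" if ok else "Mendelian violation")
```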

Data mining in bioinformatics using Weka.

Frank, Eibe; Hall, Mark; Trigg, Len; Holmes, Geoffrey; Witten, Ian H.

Bioinformatics ; 20(15): 2479-81, 2004 Oct 12.

Article in English | MEDLINE | ID: mdl-15073010

ABSTRACT

The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection, which are common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods complemented by graphical user interfaces for data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it. AVAILABILITY: http://www.cs.waikato.ac.nz/ml/weka.

Subjects

Algorithms, Artificial Intelligence, Computational Biology/methods, Database Management Systems, Databases, Factual, Information Storage and Retrieval/methods, User-Computer Interface, Natural Language Processing, Software
