Search | BVS Regional Portal

Increasing calling accuracy, coverage, and read-depth in sequence data by the use of haplotype blocks.

Pook, Torsten; Nemri, Adnane; Gonzalez Segovia, Eric Gerardo; Valle Torres, Daniel; Simianer, Henner; Schoen, Chris-Carolin.

PLoS Genet; 17(12): e1009944, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34941872

ABSTRACT

High-throughput genotyping of large numbers of lines remains a key challenge in plant genetics, requiring geneticists and breeders to find a balance between data quality and the number of genotyped lines under a variety of different existing genotyping technologies when resources are limited. In this work, we are proposing a new imputation pipeline ("HBimpute") that can be used to generate high-quality genomic data from low read-depth whole-genome-sequence data. The key idea of the pipeline is the use of haplotype blocks from the software HaploBlocker to identify locally similar lines and subsequently use the reads of all locally similar lines in the variant calling for a specific line. The effectiveness of the pipeline is showcased on a dataset of 321 doubled haploid lines of a European maize landrace, which were sequenced at 0.5X read-depth. The overall imputing error rates are cut in half compared to state-of-the-art software like BEAGLE and STITCH, while the average read-depth is increased to 83X, thus enabling the calling of copy number variation. The usefulness of the obtained imputed data panel is further evaluated by comparing the performance of sequence data in common breeding applications to that of genomic data generated with a genotyping array. For both genome-wide association studies and genomic prediction, results are on par or even slightly better than results obtained with high-density array data (600k). In particular for genomic prediction, we observe slightly higher data quality for the sequence data compared to the 600k array in the form of higher prediction accuracies. This occurred specifically when reducing the data panel to the set of overlapping markers between sequence and array, indicating that sequencing data can benefit from the same marker ascertainment as used in the array process to increase the quality and usability of genomic data.
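The key idea described above, pooling the reads of locally similar lines before calling a variant, can be illustrated with a minimal sketch. This is not the HBimpute implementation: the function `pooled_calls`, the line names, and the read counts are all hypothetical, and the groups of locally similar lines are assumed to be given (in the paper they come from HaploBlocker). Because the lines are doubled haploids, the sketch assumes a homozygous call at each marker and uses a simple majority vote.

```python
def pooled_calls(read_counts, block_members):
    """Call one marker per line using the pooled reads of locally similar lines.

    read_counts:   dict line -> (ref_reads, alt_reads) at a single marker
    block_members: list of sets of lines assumed to share a haplotype block
                   that covers this marker (hypothetical input)
    Returns dict line -> (call, pooled_depth), where call is 0 (reference),
    1 (alternative), or None when the pooled reads are absent or tied.
    """
    calls = {}
    for members in block_members:
        # Pool the reads of every line in the group.
        ref = sum(read_counts[m][0] for m in members)
        alt = sum(read_counts[m][1] for m in members)
        depth = ref + alt
        if depth == 0 or ref == alt:
            call = None  # no evidence, or no majority
        else:
            call = 0 if ref > alt else 1
        # Every line in the group receives the pooled call and depth.
        for m in members:
            calls[m] = (call, depth)
    return calls


# Made-up low-depth data: most lines have no read at this marker.
read_counts = {
    "DH1": (0, 0),
    "DH2": (1, 0),
    "DH3": (0, 0),
    "DH4": (0, 1),
    "DH5": (0, 2),
}
block_members = [{"DH1", "DH2", "DH3"}, {"DH4", "DH5"}]
calls = pooled_calls(read_counts, block_members)
print(calls["DH1"])  # DH1 has no reads of its own but inherits the group call
```

In this toy setting a line without any reads at the marker still receives a call, and its effective read-depth becomes the pooled depth of its group, which is the mechanism by which both coverage and read-depth can increase.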

Subjects

Genome-Wide Association Study/standards; Genotyping Techniques; Haplotypes/genetics; Software; DNA Copy Number Variations/genetics; Genome/genetics; Genomics/methods; Genotype; Polymorphism, Single Nucleotide/genetics; Whole Genome Sequencing; Zea mays/genetics

HaploBlocker: Creation of Subgroup-Specific Haplotype Blocks and Libraries.

Pook, Torsten; Schlather, Martin; de Los Campos, Gustavo; Mayer, Manfred; Schoen, Chris Carolin; Simianer, Henner.

Genetics; 212(4): 1045-1061, 2019 Aug.

Article in English | MEDLINE | ID: mdl-31152070

ABSTRACT

The concept of haplotype blocks has been shown to be useful in genetics. Fields of application range from the detection of regions under positive selection to statistical methods that make use of dimension reduction. We propose a novel approach ("HaploBlocker") for defining and inferring haplotype blocks that focuses on linkage instead of the commonly used population-wide measures of linkage disequilibrium. We define a haplotype block as a sequence of genetic markers that has a predefined minimum frequency in the population, and only haplotypes with a similar sequence of markers are considered to carry that block, effectively screening a dataset for group-wise identity-by-descent. From these haplotype blocks, we construct a haplotype library that represents a large proportion of genetic variability with a limited number of blocks. Our method is implemented in the associated R-package HaploBlocker, and provides flexibility not only to optimize the structure of the obtained haplotype library for subsequent analyses, but also to handle datasets of different marker density and genetic diversity. By using haplotype blocks instead of single nucleotide polymorphisms (SNPs), local epistatic interactions can be naturally modeled, and the reduced number of parameters enables a wide variety of new methods for further genomic analyses such as genomic prediction and the detection of selection signatures. We illustrate our methodology with a dataset comprising 501 doubled haploid lines in a European maize landrace genotyped at 501,124 SNPs. With the suggested approach, we identified 2991 haplotype blocks with an average length of 2685 SNPs that together represent 94% of the dataset.
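The block definition above (a marker sequence carried by at least a minimum number of haplotypes) can be illustrated with a deliberately simplified sketch. It is not the HaploBlocker algorithm: it uses a fixed marker window and requires exact identity, whereas the abstract describes matching on similar sequences. The function `window_blocks`, the line names, and the 0/1 allele coding are hypothetical.

```python
from collections import Counter


def window_blocks(haplotypes, start, end, min_count):
    """Find allele sequences in markers [start, end) shared by >= min_count lines.

    haplotypes: dict line -> list of 0/1 alleles (one haplotype per line,
                as for doubled haploids)
    Returns dict allele_sequence (tuple) -> sorted list of lines carrying it.
    """
    # Count how often each exact allele sequence occurs in the window.
    counts = Counter(tuple(h[start:end]) for h in haplotypes.values())
    frequent = {seq for seq, n in counts.items() if n >= min_count}
    # Record which lines carry each sufficiently frequent sequence.
    return {
        seq: sorted(
            name for name, h in haplotypes.items() if tuple(h[start:end]) == seq
        )
        for seq in frequent
    }


# Made-up haplotypes at four markers.
haplotypes = {
    "DH1": [0, 1, 1, 0],
    "DH2": [0, 1, 1, 1],
    "DH3": [1, 0, 0, 0],
    "DH4": [0, 1, 1, 0],
}
blocks = window_blocks(haplotypes, start=0, end=3, min_count=2)
print(blocks)
```

Here three of the four lines share the sequence at the first three markers, so that sequence qualifies as a block under the toy threshold, while the sequence carried only by DH3 does not.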

Subjects

Gene Library; Haplotypes; Algorithms; Animals; Computational Biology; Datasets as Topic; Genetic Linkage; Genetic Markers; Humans; Models, Genetic; Polymorphism, Single Nucleotide; Zea mays/genetics
