Search | BVS Regional Portal

Denoising DNA Encoded Library Screens with Sparse Learning.

Kómár, Péter; Kalinic, Marko.

ACS Comb Sci ; 22(8): 410-421, 2020 08 10.

Article in English | MEDLINE | ID: mdl-32531158

ABSTRACT

DNA-encoded libraries (DELs) are large, pooled collections of compounds in which every library member is attached to a stretch of DNA encoding its complete synthetic history. DEL-based hit discovery involves affinity selection of the library against a protein of interest, whereby compounds retained by the target are subsequently identified by next-generation sequencing of the corresponding DNA tags. When analyzing the resulting data, one typically assumes that sequencing output (i.e., read counts) is proportional to the binding affinity of a given compound, thus enabling hit prioritization and elucidation of any underlying structure-activity relationships (SAR). This assumption, though, tends to be severely confounded by a number of factors, including variable reaction yields, presence of incomplete products masquerading as their intended counterparts, and sequencing noise. In practice, these confounders are often ignored, potentially contributing to low hit validation rates, and universally leading to loss of valuable information. To address this issue, we have developed a method for comprehensively denoising DEL selection outputs. Our method, dubbed "deldenoiser", is based on sparse learning and leverages inputs that are commonly available within a DEL generation and screening workflow. Using simulated and publicly available DEL affinity selection data, we show that "deldenoiser" is not only able to recover and rank true binders much more robustly than read count-based approaches but also that it yields scores, which accurately capture the underlying SAR. The proposed method can, thus, be of significant utility in hit prioritization following DEL screens.

Subjects

DNA/chemistry, Gene Library, Machine Learning

Fast and accurate genomic analyses using genome graphs.

Rakocevic, Goran; Semenyuk, Vladimir; Lee, Wan-Ping; Spencer, James; Browning, John; Johnson, Ivan J; Arsenijevic, Vladan; Nadj, Jelena; Ghose, Kaushik; Suciu, Maria C; Ji, Sun-Gou; Demir, Gülfem; Li, Lizao; Toptas, Berke Ç; Dolgoborodov, Alexey; Pollex, Björn; Spulber, Iosif; Glotova, Irina; Kómár, Péter; Stachyra, Andrew L; Li, Yilong; Popovic, Milos; Källberg, Morten; Jain, Amit; Kural, Deniz.

Nat Genet ; 51(2): 354-362, 2019 02.

Article in English | MEDLINE | ID: mdl-30643257

ABSTRACT

The human reference genome serves as the foundation for genomics by providing a scaffold for alignment of sequencing reads, but currently only reflects a single consensus haplotype, thus impairing analysis accuracy. Here we present a graph reference genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million insertions and deletions (indels). The pipeline processes one whole-genome sequencing sample in 6.5 h using a system with 36 CPU cores. We show that using a graph genome reference improves read mapping sensitivity and produces a 0.5% increase in variant calling recall, with unaffected specificity. Structural variations incorporated into a graph genome can be genotyped accurately under a unified framework. Finally, we show that iterative augmentation of graph genomes yields incremental gains in variant calling accuracy. Our implementation is an important advance toward fulfilling the promise of graph genomes to radically enhance the scalability and accuracy of genomic analyses.

Subjects

Human Genome/genetics, Genomics/methods, Humans, Single Nucleotide Polymorphism/genetics, Sequence Alignment/methods, DNA Sequence Analysis/methods, Sequence Deletion/genetics, Whole Genome Sequencing/methods

Comparing complex variants in family trios.

Toptas, Berke Ç; Rakocevic, Goran; Kómár, Péter; Kural, Deniz.

Bioinformatics ; 34(24): 4241-4247, 2018 12 15.

Article in English | MEDLINE | ID: mdl-29868720

ABSTRACT

Motivation: Several tools exist to count Mendelian violations in family trios by comparing variants at the same genomic positions. This naive variant comparison, however, fails to assess regions where multiple variants need to be examined together, resulting in reduced accuracy of existing Mendelian violation checking tools. Results: We introduce VBT, a trio concordance analysis tool, which identifies Mendelian violations by approximately solving the 3-way variant matching problem to resolve variant representation differences in family trios. We show that VBT outperforms previous trio comparison methods by accuracy. Availability and implementation: VBT is implemented in C++ and source code is available under GNU GPLv3 license at the following URL: https://github.com/sbg/VBT-TrioAnalysis.git. Supplementary information: Supplementary data are available at Bioinformatics online.
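For illustration only, the naive per-position check that the abstract contrasts VBT with can be sketched as follows. This is not VBT's 3-way variant matching algorithm. The function names, the data layout (genotypes as tuples of allele indices keyed by chromosome and position), and the restriction to diploid autosomal sites with no missing calls are all assumptions made for this sketch.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's diploid genotype can be formed by
    taking one allele from the mother and one from the father.

    Genotypes are tuples of allele indices, e.g. (0, 1) for 0/1.
    (Hypothetical representation, assumed for this sketch.)
    """
    target = sorted(child)
    return any(sorted((m, f)) == target for m, f in product(mother, father))


def count_naive_violations(trio_calls):
    """Count sites whose genotypes violate Mendelian inheritance.

    trio_calls maps (chrom, pos) -> (child, mother, father) genotypes.
    Each site is judged in isolation, which is exactly the limitation
    the abstract describes: variants that must be examined together
    (e.g. differently represented complex variants) are not reconciled.
    """
    return sum(
        1
        for child, mother, father in trio_calls.values()
        if not is_mendelian_consistent(child, mother, father)
    )


if __name__ == "__main__":
    calls = {
        ("chr1", 100): ((0, 1), (0, 0), (1, 1)),  # consistent
        ("chr1", 200): ((1, 1), (0, 0), (0, 1)),  # violation: mother has no allele 1
        ("chr2", 50): ((0, 0), (0, 1), (0, 1)),   # consistent
    }
    print(count_naive_violations(calls))
```

Because each position is judged independently, two call sets that describe the same haplotypes with different variant representations can produce spurious violations under this scheme, which is the accuracy problem the abstract attributes to existing tools.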

Subjects

Genome, Genomics, Software, Genome/genetics, Genomics/methods

geck: trio-based comparative benchmarking of variant calls.

Kómár, Péter; Kural, Deniz.

Bioinformatics ; 34(20): 3488-3495, 2018 10 15.

Article in English | MEDLINE | ID: mdl-29850774

ABSTRACT

Motivation: Classical methods of comparing the accuracies of variant calling pipelines are based on truth sets of variants whose genotypes are previously determined with high confidence. An alternative way of performing benchmarking is based on Mendelian constraints between related individuals. Statistical analysis of Mendelian violations can provide truth set-independent benchmarking information, and enable benchmarking less-studied variants and diverse populations. Results: We introduce a statistical mixture model for comparing two variant calling pipelines from genotype data they produce after running on individual members of a trio. We determine the accuracy of our model by comparing the precision and recall of GATK Unified Genotyper and Haplotype Caller on the high-confidence SNPs of the NIST Ashkenazim trio and the two independent Platinum Genome trios. We show that our method is able to estimate differential precision and recall between the two pipelines with 10⁻³ uncertainty. Availability and implementation: The Python library geck, and usage examples are available at the following URL: https://github.com/sbg/geck, under the GNU General Public License v3. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

High-Throughput Nucleotide Sequencing/methods, Benchmarking, Genome, Genotype, Single Nucleotide Polymorphism
