Search | Portal Regional da BVS (test)

Sequre: a high-performance framework for secure multiparty computation enables biomedical data sharing.

Smajlovic, Haris; Shajii, Ariya; Berger, Bonnie; Cho, Hyunghoon; Numanagic, Ibrahim.

Genome Biol ; 24(1): 5, 2023 01 11.

Article in English | MEDLINE | ID: mdl-36631897

ABSTRACT

Secure multiparty computation (MPC) is a cryptographic tool that allows computation on top of sensitive biomedical data without revealing private information to the involved entities. Here, we introduce Sequre, an easy-to-use, high-performance framework for developing performant MPC applications. Sequre offers a set of automatic compile-time optimizations that significantly improve the performance of MPC applications and incorporates the syntax of the Python programming language to facilitate rapid application development. We demonstrate its usability and performance on various bioinformatics tasks showing up to 3-4 times increased speed over the existing pipelines with 7-fold reductions in codebase sizes.

Subjects

Computational Biology, Information Dissemination

Sequre: a high-performance framework for rapid development of secure bioinformatics pipelines.

Smajlovic, Haris; Shajii, Ariya; Berger, Bonnie; Cho, Hyunghoon; Numanagic, Ibrahim.

IEEE Int Symp Parallel Distrib Process Workshops Phd Forum ; 2022: 164-165, 2022.

Article in English | MEDLINE | ID: mdl-35958356

A Python-based programming language for high-performance computational genomics.

Shajii, Ariya; Numanagic, Ibrahim; Leighton, Alexander T; Greenyer, Haley; Amarasinghe, Saman; Berger, Bonnie.

Nat Biotechnol ; 39(9): 1062-1064, 2021 09.

Article in English | MEDLINE | ID: mdl-34282326

Subjects

Computational Biology/methods, Genomics, Programming Languages, Software

Seq: A High-Performance Language for Bioinformatics.

Shajii, Ariya; Numanagic, Ibrahim; Baghdadi, Riyadh; Berger, Bonnie; Amarasinghe, Saman.

Proc ACM Program Lang ; 3, 2019 Oct.

Article in English | MEDLINE | ID: mdl-35775031

ABSTRACT

The scope and scale of biological data are increasing at an exponential rate, as technologies like next-generation sequencing are becoming radically cheaper and more prevalent. Over the last two decades, the cost of sequencing a genome has dropped from $100 million to nearly $100 (a factor of over 10^6), and the amount of data to be analyzed has increased proportionally. Yet, as Moore's Law continues to slow, computational biologists can no longer rely on computing hardware to compensate for the ever-increasing size of biological datasets. In a field where many researchers are primarily focused on biological analysis over computational optimization, the unfortunate solution to this problem is often to simply buy larger and faster machines. Here, we introduce Seq, the first language tailored specifically to bioinformatics, which marries the ease and productivity of Python with C-like performance. Seq starts with a subset of Python (and is in many cases a drop-in replacement), yet also incorporates novel bioinformatics- and computational genomics-oriented data types, language constructs and optimizations. Seq enables users to write high-level, Pythonic code without having to worry about low-level or domain-specific optimizations, and allows for the seamless expression of the algorithms, idioms and patterns found in many genomics or bioinformatics applications. We evaluated Seq on several standard computational genomics tasks like reverse complementation, k-mer manipulation, sequence pattern matching and large genomic index queries. On equivalent CPython code, Seq attains a performance improvement of up to two orders of magnitude, and a 160× improvement once domain-specific language features and optimizations are used. With parallelism, we demonstrate up to a 650× improvement. Compared to optimized C++ code, which is already difficult for most biologists to produce, Seq frequently attains up to a 2× improvement, and with shorter, cleaner code. Thus, Seq opens the door to an age of democratization of highly-optimized bioinformatics software.

Statistical Binning for Barcoded Reads Improves Downstream Analyses.

Shajii, Ariya; Numanagic, Ibrahim; Whelan, Christopher; Berger, Bonnie.

Cell Syst ; 7(2): 219-226.e5, 2018 08 22.

Article in English | MEDLINE | ID: mdl-30138581

ABSTRACT

Sequencing technologies are capturing longer-range genomic information at lower error rates, enabling alignment to genomic regions that are inaccessible with short reads. However, many methods are unable to align reads to much of the genome, recognized as important in disease, and thus report erroneous results in downstream analyses. We introduce EMA, a novel two-tiered statistical binning model for barcoded read alignment, that first probabilistically maps reads to potentially multiple "read clouds" and then within clouds by newly exploiting the non-uniform read densities characteristic of barcoded read sequencing. EMA substantially improves downstream accuracy over existing methods, including phasing and genotyping on 10x data, with fewer false variant calls in nearly half the time. EMA effectively resolves particularly challenging alignments in genomic regions that contain nearby homologous elements, uncovering variants in the pharmacogenomically important CYP2D region, and clinically important genes C4 (schizophrenia) and AMY1A (obesity), which go undetected by existing methods. Our work provides a framework for future generation sequencing.

Subjects

Genotyping Techniques/methods, DNA Sequence Analysis/methods, Algorithms, Cytochrome P-450 CYP2D6/genetics, Cytochrome P-450 Enzyme System/genetics, Genetic Variation, Genomics/methods, Humans, Probability, Salivary alpha-Amylases/genetics, Software

Latent Variable Model for Aligning Barcoded Short-Reads Improves Downstream Analyses.

Shajii, Ariya; Numanagic, Ibrahim; Berger, Bonnie.

Res Comput Mol Biol ; 10812: 280-282, 2018 Apr.

Article in English | MEDLINE | ID: mdl-29888346

Fast genotyping of known SNPs through approximate k-mer matching.

Shajii, Ariya; Yorukoglu, Deniz; William Yu, Yun; Berger, Bonnie.

Bioinformatics ; 32(17): i538-i544, 2016 09 01.

Article in English | MEDLINE | ID: mdl-27587672

ABSTRACT

MOTIVATION: As the volume of next-generation sequencing (NGS) data increases, faster algorithms become necessary. Although speeding up individual components of a sequence analysis pipeline (e.g. read mapping) can reduce the computational cost of analysis, such approaches do not take full advantage of the particulars of a given problem. One problem of great interest, genotyping a known set of variants (e.g. dbSNP or Affymetrix SNPs), is important for characterization of known genetic traits and causative disease variants within an individual, as well as the initial stage of many ancestral and population genomic pipelines (e.g. GWAS). RESULTS: We introduce lightweight assignment of variant alleles (LAVA), an NGS-based genotyping algorithm for a given set of SNP loci, which takes advantage of the fact that approximate matching of mid-size k-mers (with k = 32) can typically uniquely identify loci in the human genome without full read alignment. LAVA accurately calls the vast majority of SNPs in dbSNP and Affymetrix's Genome-Wide Human SNP Array 6.0 up to about an order of magnitude faster than standard NGS genotyping pipelines. For Affymetrix SNPs, LAVA has significantly higher SNP calling accuracy than existing pipelines while using as low as ~5 GB of RAM. As such, LAVA represents a scalable computational method for population-level genotyping studies as well as a flexible NGS-based replacement for SNP arrays. AVAILABILITY AND IMPLEMENTATION: LAVA software is available at http://lava.csail.mit.edu CONTACT: bab@mit.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms, Genotype, Single Nucleotide Polymorphism, Alleles, Cluster Analysis, Human Genome, Humans, Software
