Search | BVS Regional Portal (test)

Accelerating minimap2 for long-read sequencing applications on modern CPUs.

Kalikar, Saurabh; Jain, Chirag; Vasimuddin, Md; Misra, Sanchit.

Nat Comput Sci ; 2(2): 78-83, 2022 Feb.

Article in English | MEDLINE | ID: mdl-38177520

ABSTRACT

Long-read sequencing is now routinely used at scale for genomics and transcriptomics applications. Mapping long reads or a draft genome assembly to a reference sequence is often one of the most time-consuming steps in these applications. Here we present techniques to accelerate minimap2, a widely used software tool for this task. We present multiple optimizations using single-instruction multiple-data parallelization, efficient cache utilization and a learned index data structure to accelerate the three main computational modules of minimap2: seeding, chaining and pairwise sequence alignment. These optimizations reduce the end-to-end mapping time of minimap2 by up to 1.8-fold while maintaining identical output.
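The learned index mentioned in the abstract can be illustrated with a minimal sketch. It is not the minimap2 implementation: it assumes a single linear model fitted over a sorted key array, and the function names `build_learned_index` and `lookup` are hypothetical. The model predicts where a key should sit, and a bounded binary search corrects the prediction.

```python
import bisect

def build_learned_index(sorted_keys):
    """Fit position ~ slope * key + intercept and record the worst prediction error."""
    n = len(sorted_keys)
    mean_key = sum(sorted_keys) / n
    mean_pos = (n - 1) / 2
    var = sum((k - mean_key) ** 2 for k in sorted_keys)
    cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(sorted_keys))
    slope = cov / var if var else 0.0
    intercept = mean_pos - slope * mean_key
    max_err = max(abs(i - round(slope * k + intercept))
                  for i, k in enumerate(sorted_keys))
    return slope, intercept, max_err

def lookup(sorted_keys, model, key):
    """Return the position of key, or None if it is absent."""
    slope, intercept, max_err = model
    n = len(sorted_keys)
    guess = round(slope * key + intercept)
    # Only the window [guess - max_err, guess + max_err] can contain the key.
    lo = min(max(0, guess - max_err), n)
    hi = max(lo, min(n, guess + max_err + 1))
    i = bisect.bisect_left(sorted_keys, key, lo, hi)
    if i < n and sorted_keys[i] == key:
        return i
    return None
```

The design idea is that a good model keeps `max_err` small, so each lookup touches only a short window of the array instead of searching all of it.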

Action research on education in Ayurveda.

Misra, Sanchit.

J Ayurveda Integr Med ; 8(4): 283-284, 2017.

Article in English | MEDLINE | ID: mdl-29122453

muBLASTP: database-indexed protein sequence search on multicore CPUs.

Zhang, Jing; Misra, Sanchit; Wang, Hao; Feng, Wu-Chun.

BMC Bioinformatics ; 17(1): 443, 2016 Nov 04.

Article in English | MEDLINE | ID: mdl-27809763

ABSTRACT

BACKGROUND: The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they cannot deliver the same level of sensitivity as the query-indexed BLAST, i.e., NCBI BLAST, or they can only support nucleotide sequence search, e.g., MegaBLAST. Because query indexing and database indexing pose different challenges and have different characteristics, the existing techniques for query-indexed search cannot be applied to database-indexed search. RESULTS: muBLASTP, a novel database-indexed BLAST for protein sequence search, delivers hits identical to those returned by NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, the single-threaded muBLASTP achieves up to a 4.41-fold speedup for alignment stages, and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, the multithreaded muBLASTP achieves up to a 5.7-fold speedup for alignment stages, and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. CONCLUSIONS: With a newly designed index structure for the protein database and associated optimizations in the BLASTP algorithm, we refactored the BLASTP algorithm for modern multicore processors, achieving much higher throughput with an acceptable memory footprint for the database index.

Subjects

Algorithms; Databases, Protein; Information Storage and Retrieval/methods; Proteins/chemistry; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Humans; Software; User-Computer Interface

A proposal for inclusion of Indology in regular school curriculum.

Misra, Sanchit.

Anc Sci Life ; 35(4): 251-2, 2016.

Article in English | MEDLINE | ID: mdl-27621526

Education: Reforms set to seep into India's schools.

Misra, Sanchit.

Nature ; 536(7615): 148, 2016 Aug 11.

Article in English | MEDLINE | ID: mdl-27510210

Subjects

Creativity; Plagiarism; Research Personnel/education; Research Personnel/standards; Science/education; Science/standards

Don't read too much into National Sample Survey Organization survey results.

Misra, Sanchit.

J Ayurveda Integr Med ; 6(3): 211-2, 2015.

Article in English | MEDLINE | ID: mdl-26604559

Parallel Mutual Information Based Construction of Genome-Scale Networks on the Intel® Xeon Phi™ Coprocessor.

Misra, Sanchit; Pamnany, Kiran; Aluru, Srinivas.

IEEE/ACM Trans Comput Biol Bioinform ; 12(5): 1008-20, 2015.

Article in English | MEDLINE | ID: mdl-26451815

ABSTRACT

Construction of whole-genome networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, most cannot handle network reconstruction at the whole-genome scale, and the few that can, require large clusters. In this paper, we present a solution on the Intel Xeon Phi coprocessor, taking advantage of its multi-level parallelism including many x86-based cores, multiple threads per core, and vector processing units. We also present a solution on the Intel® Xeon® processor. Our solution is based on TINGe, a fast parallel network reconstruction technique that uses mutual information and permutation testing for assessing statistical significance. We demonstrate the first ever inference of a plant whole genome regulatory network on a single chip by constructing a 15,575 gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in only 22 minutes. In addition, our optimization for parallelizing mutual information computation on the Intel Xeon Phi coprocessor holds out lessons that are applicable to other domains.
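The core computation named in the abstract, mutual information with a permutation test for significance, can be sketched as follows. This is a simplified, serial illustration under stated assumptions: it uses a plain histogram estimator over discretized values, which may differ from the estimator used in TINGe, and the function names and the default of 200 permutations are hypothetical.

```python
import math
import random
from collections import Counter

def discretize(values, bins):
    """Map continuous expression values to equal-width bin labels 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for constant input
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y):
    """Histogram estimate of I(X;Y) in nats for two equal-length label lists."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def permutation_p_value(x, y, permutations=200, seed=0):
    """Estimate how often shuffling y gives mutual information >= the observed value."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if mutual_information(x, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (permutations + 1)
```

In a network reconstruction setting, each gene pair would be scored this way, and pairs whose p-value falls below a chosen threshold would become edges. The pairwise scores are independent of one another, which is what makes the problem amenable to the multi-level parallelism the abstract describes.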

Subjects

Arabidopsis Proteins/metabolism; Arabidopsis/metabolism; Chromosome Mapping/instrumentation; High-Throughput Screening Assays/instrumentation; Oligonucleotide Array Sequence Analysis/instrumentation; Protein Interaction Mapping/instrumentation; Arabidopsis/genetics; Arabidopsis Proteins/genetics; High-Throughput Screening Assays/methods; Signal Transduction/physiology

Accelerating pairwise statistical significance estimation for local alignment by harvesting GPU's power.

Zhang, Yuhong; Misra, Sanchit; Agrawal, Ankit; Patwary, Md Mostofa Ali; Liao, Wei-Keng; Qin, Zhiguang; Choudhary, Alok.

BMC Bioinformatics ; 13 Suppl 5: S3, 2012 Apr 12.

Article in English | MEDLINE | ID: mdl-22537007

ABSTRACT

BACKGROUND: Pairwise statistical significance has been recognized as able to accurately identify related sequences, which is a cornerstone procedure in numerous bioinformatics applications. However, it is both computationally and data intensive, which poses a big challenge in terms of performance and scalability. RESULTS: We present a GPU implementation to accelerate pairwise statistical significance estimation of local sequence alignment using standard substitution matrices. By carefully studying the algorithm's data access characteristics, we developed a tile-based scheme that produces contiguous data access in the GPU global memory and sustains a large number of threads to achieve high GPU occupancy. We further extend the parallelization technique to estimate pairwise statistical significance using position-specific substitution matrices, an approach previously shown to give significantly better sequence comparison accuracy than standard substitution matrices. The implementation is also extended to take advantage of dual GPUs. We observe end-to-end speedups of nearly 250× with a single Tesla C2050 GPU and nearly 370× with dual Tesla C2050 GPUs over the CPU implementation on an Intel Core i7 920 processor. CONCLUSIONS: Harvesting the high performance of modern GPUs is a promising approach to accelerate pairwise statistical significance estimation for local sequence alignment.

Subjects

Computer Graphics/instrumentation; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Algorithms; Sequence Alignment/instrumentation; Sequence Analysis, Protein/instrumentation; Software

Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing.

Misra, Sanchit; Agrawal, Ankit; Liao, Wei-keng; Choudhary, Alok.

Bioinformatics ; 27(2): 189-95, 2011 Jan 15.

Article in English | MEDLINE | ID: mdl-21088030

ABSTRACT

MOTIVATION: Recently, a number of programs have been proposed for mapping short reads to a reference genome. Many of them are heavily optimized for short-read mapping and hence are very efficient for shorter queries, but that makes them inefficient or not applicable for reads longer than 200 bp. However, many sequencers are already generating longer reads and more are expected to follow. For long read sequence mapping, there are limited options; BLAT, SSAHA2, FANGS and BWA-SW are among the popular ones. However, resequencing and personalized medicine need much faster software to map these long sequencing reads to a reference genome to identify SNPs or rare transcripts. RESULTS: We present AGILE (AliGnIng Long rEads), a hash table based high-throughput sequence mapping algorithm for longer 454 reads that uses diagonal multiple seed-match criteria, customized q-gram filtering and a dynamic incremental search approach among other heuristics to optimize every step of the mapping process. In our experiments, we observe that AGILE is more accurate than BLAT, and comparable to BWA-SW and SSAHA2. For practical error rates (< 5%) and read lengths (200-1000 bp), AGILE is significantly faster than BLAT, SSAHA2 and BWA-SW. Even for the other cases, AGILE is comparable to BWA-SW and several times faster than BLAT and SSAHA2. AVAILABILITY: http://www.ece.northwestern.edu/~smi539/agile.html.
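The general idea behind hash-table seeding with a multiple-seed-match criterion on a diagonal can be sketched as below. This is a generic seed-and-vote illustration, not AGILE's actual algorithm or heuristics: the function names, the exact-match q-grams, and the `min_seeds` threshold are assumptions made for the example.

```python
from collections import Counter, defaultdict

def build_index(reference, q):
    """Hash table from every q-gram in the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - q + 1):
        index[reference[pos:pos + q]].append(pos)
    return index

def candidate_diagonals(read, index, q, min_seeds):
    """Return diagonals (reference position minus read offset) with enough seed hits."""
    votes = Counter()
    for offset in range(len(read) - q + 1):
        # .get avoids inserting empty entries for q-grams missing from the index.
        for pos in index.get(read[offset:offset + q], ()):
            votes[pos - offset] += 1
    return sorted(d for d, count in votes.items() if count >= min_seeds)
```

Seeds that come from the same true mapping location share a diagonal, so requiring several hits on one diagonal filters out most spurious single q-gram matches before any costly alignment is attempted.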

Subjects

Algorithms; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Chromosome Mapping; Genome; Software
