Improving Retrieval Efficacy of Homology Searches Using the False Discovery Rate.

Carroll, Hyrum D; Williams, Alex C; Davis, Anthony G; Spouge, John L.

IEEE/ACM Trans Comput Biol Bioinform; 12(3): 531-7, 2015.

Article in English | MEDLINE | ID: mdl-26357264

ABSTRACT

Over the past few decades, discovery based on sequence homology has become a widely accepted practice. Consequently, the comparative accuracy of retrieval algorithms (e.g., BLAST) has been rigorously studied with a view to improving it. Unlike most components of retrieval algorithms, the E-value threshold criterion has yet to be thoroughly investigated. Investigating the threshold is important because it alone dictates which sequences are declared relevant and which irrelevant. In this paper, we introduce the false discovery rate (FDR) statistic as a replacement for the uniform threshold criterion in order to improve efficacy in retrieval systems. Using NCBI's BLAST and PSI-BLAST software packages, we demonstrate the applicability of such a replacement in both non-iterative (BLAST(FDR)) and iterative (PSI-BLAST(FDR)) homology searches. For each application, we evaluated retrieval efficacy with five different multiple testing methods on a large training database. For each algorithm, we chose the best performing method, Benjamini-Hochberg, as the default statistic. As measured by the threshold average precision, BLAST(FDR) yielded 14.1 percent better retrieval performance than BLAST on a large (5,161 queries) test database, and PSI-BLAST(FDR) attained 11.8 percent better retrieval performance than PSI-BLAST. The C++ source code specific to BLAST(FDR) and PSI-BLAST(FDR) and instructions are available at http://www.cs.mtsu.edu/~hcarroll/blast_fdr/.
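The abstract names Benjamini-Hochberg as the default multiple testing method but does not describe how it is applied inside BLAST(FDR). As a rough illustration only, the standard Benjamini-Hochberg step-up procedure can be sketched as below. The function name `benjamini_hochberg`, the `alpha` parameter, and the assumption that each hit already comes with a p-value are hypothetical choices made for this sketch; they are not taken from the authors' C++ code.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard Benjamini-Hochberg step-up procedure (illustrative sketch).

    p_values: one p-value per retrieved hit (assumed input, see lead-in).
    alpha:    target false discovery rate (hypothetical default).
    Returns a list of booleans in the original order: True where the hit
    would be declared relevant.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the hits, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank

    # Every hit ranked at or below k_max is declared relevant, even if its
    # own p-value missed its own rank threshold (the "step-up" property).
    relevant = [False] * m
    for idx in order[:k_max]:
        relevant[idx] = True
    return relevant


if __name__ == "__main__":
    hits = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(hits, alpha=0.05))
```

Unlike a uniform cutoff, the acceptance threshold here depends on how many hits are being tested and on how their p-values are distributed, so the same score can be accepted in one result list and rejected in another.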

Subjects

Computational Biology/methods; Sequence Analysis, Protein/methods; Sequence Homology, Amino Acid; Algorithms; Databases, Protein; Software
