Search | Portal Regional da BVS (test)

RODAN: a fully convolutional architecture for basecalling nanopore RNA sequencing data.

Neumann, Don; Reddy, Anireddy S N; Ben-Hur, Asa.

BMC Bioinformatics ; 23(1): 142, 2022 Apr 20.

Article in English | MEDLINE | ID: mdl-35443610

ABSTRACT

BACKGROUND: Despite recent progress in basecalling of Oxford Nanopore DNA sequencing data, wide adoption of the technology is still hampered by its relatively low accuracy compared to short-read technologies. Furthermore, very little recent research has focused on basecalling of RNA data, which has different characteristics than its DNA counterpart. RESULTS: We fill this gap by benchmarking a fully convolutional deep learning basecalling architecture with improved performance compared to Oxford Nanopore's RNA basecallers. AVAILABILITY: The source code for our basecaller is available at https://github.com/biodlab/RODAN.

Subjects

Nanopore Sequencing; Nanopores; DNA; High-Throughput Nucleotide Sequencing; RNA; Sequence Analysis, DNA; Sequence Analysis, RNA

On the choice of negative examples for prediction of host-pathogen protein interactions.

Neumann, Don; Roy, Soumyadip; Minhas, Fayyaz Ul Amir Afsar; Ben-Hur, Asa.

Front Bioinform ; 2: 1083292, 2022.

Article in English | MEDLINE | ID: mdl-36591335

ABSTRACT

As practitioners of machine learning in bioinformatics, we know that the quality of the results depends crucially on the quality of our labeled data. While there is a tendency to focus on the quality of positive examples, the negative examples are equally important. In this opinion paper we revisit the problem of choosing negative examples for the task of predicting protein-protein interactions, either among proteins of a given species or for host-pathogen interactions, and describe important issues that are prevalent in the current literature. The challenge in creating datasets for this task is the noisy nature of the experimentally derived interactions and the lack of information on non-interacting proteins. A standard approach is to choose random pairs of proteins that are not known to interact as negative examples. Since the interactomes of all species are only partially known, some of these random pairs may in fact interact, which leads to a very small percentage of false negatives. This is especially true for host-pathogen interactions. To address this perceived issue, some researchers have chosen to select negative examples as pairs of proteins whose sequence similarity to the positive examples is sufficiently low. This clearly reduces the chance of false negatives, but also makes the problem much easier than it really is, leading to over-optimistic accuracy estimates. We demonstrate the effect of this form of bias using a selection of recent protein interaction prediction methods of varying complexity, and urge researchers to pay attention to the details of generating their datasets for potential biases like this.
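
The random-pair approach described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the function name `sample_random_negatives`, its parameters, and the toy protein identifiers are all hypothetical, and pairs are drawn uniformly from the host-by-pathogen grid with known positives excluded.

```python
import random


def sample_random_negatives(positives, host_proteins, pathogen_proteins, n, seed=0):
    """Draw n distinct (host, pathogen) pairs at random, skipping known positives.

    Hypothetical helper for illustration. A sampled pair is only "not known
    to interact", so a small fraction may be false negatives.
    """
    rng = random.Random(seed)
    hosts = sorted(set(host_proteins))
    pathogens = sorted(set(pathogen_proteins))
    host_set, pathogen_set = set(hosts), set(pathogens)

    # Only positives that fall inside the candidate grid reduce the pool.
    known = {(h, p) for h, p in positives if h in host_set and p in pathogen_set}
    available = len(hosts) * len(pathogens) - len(known)
    if n > available:
        raise ValueError(f"requested {n} negatives but only {available} candidates exist")

    negatives = set()
    while len(negatives) < n:
        pair = (rng.choice(hosts), rng.choice(pathogens))
        if pair not in known:
            negatives.add(pair)
    return sorted(negatives)


# Toy data (made-up identifiers, not real proteins).
positives = [("H1", "P1"), ("H2", "P2")]
negatives = sample_random_negatives(
    positives, ["H1", "H2", "H3"], ["P1", "P2"], n=3, seed=42
)
```

The rejection loop is simple but slows down when nearly every candidate pair is already a known positive; for real interactomes, where positives are a tiny fraction of all pairs, this is rarely a concern. The alternative the abstract cautions against would add a filter that also discards pairs whose sequences are similar to those of positive examples.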
