Search | BVS Regional Portal (test)

Mining sponge phenomena in RNA expression data.

Angiulli, Fabrizio; Colombo, Teresa; Fassetti, Fabio; Furfaro, Angelo; Paci, Paola.

J Bioinform Comput Biol ; 20(1): 2150022, 2022 Feb.

Article in English | MEDLINE | ID: mdl-34794369

ABSTRACT

In the last few years, the interactions among competing endogenous RNAs (ceRNAs) have been recognized as a key post-transcriptional regulatory mechanism in cell differentiation, tissue development, and disease. Notably, such sponge phenomena, which subtract active microRNAs from their silencing targets, have been recognized as having a potential oncosuppressive, or oncogenic, role in several cancer types. Hence, the ability to predict sponges from the analysis of large expression data sets (e.g. from international cancer projects) has become an important data mining task in bioinformatics. We present a technique designed to mine sponge phenomena whose presence or absence may discriminate between healthy and unhealthy populations of samples in tumoral or normal expression data sets, thus providing lists of candidates potentially relevant in the pathology. With this aim, we search for pairs of elements acting as ceRNA for a given miRNA; namely, we aim at discovering miRNA-RNA pairs involved in phenomena which are clearly present in one population and almost absent in the other. The results on tumoral expression data, concerning five different cancer types, confirmed the effectiveness of the approach in mining interesting knowledge. Indeed, 32 out of 33 miRNAs and 22 out of 25 protein-coding genes identified as top scoring in our analysis are corroborated by having been similarly associated with cancer processes in independent studies. In fact, the subset of miRNAs selected by the sponge analysis results in a significant enrichment of annotation for the KEGG pathway "microRNAs in cancer" when tested with the commonly used bioinformatic resource DAVID. Moreover, the cancer datasets in which our sponge analysis identified a miRNA as top scoring often match those already reported in the pertinent literature.

Subjects

MicroRNAs , Neoplasms , Long Noncoding RNA , Computational Biology , Data Mining , Gene Expression Regulation , Neoplastic Gene Expression Regulation , Gene Regulatory Networks , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Neoplasms/genetics , Long Noncoding RNA/genetics

Prototype-based Domain Description for one-class classification.

Angiulli, Fabrizio.

IEEE Trans Pattern Anal Mach Intell ; 34(6): 1131-44, 2012 Jun.

Article in English | MEDLINE | ID: mdl-22516649

ABSTRACT

This work introduces the Prototype-based Domain Description rule (PDD) one-class classifier. PDD is a nearest neighbor-based classifier since it accepts objects on the basis of their nearest neighbor distances in a reference set of objects, also called prototypes. For a suitable choice of the prototype set, the PDD classifier is equivalent to another nearest neighbor-based one-class classifier, namely, the NNDD classifier. Moreover, it generalizes statistical tests for outlier detection. The concept of a PDD consistent subset is introduced, which exploits only a selected subset of the training set. It is shown that computing a minimum size PDD consistent subset is, in general, not approximable within any constant factor. A logarithmic approximation factor algorithm, called the CPDD algorithm, for computing a minimum size PDD consistent subset is then introduced. In order to efficiently manage very large data sets, a variant of the basic rule, called Fast CPDD, is also presented. Experimental results show that the CPDD rule noticeably improves over the CNNDD classifier, namely the condensed variant of NNDD, in terms of size of the subset while guaranteeing a comparable classification quality, that it is competitive with other one-class classification methods, and that it is suitable for classifying large data sets.
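The link between selecting a small prototype subset and a logarithmic approximation factor can be illustrated with a greedy set-cover sketch. This is not the CPDD algorithm from the paper. It assumes, purely for illustration, that a subset counts as consistent when every training point lies within a threshold `theta` of its nearest selected prototype; the function name `greedy_prototypes` and the toy data are hypothetical.

```python
import math

def greedy_prototypes(train, theta):
    """Greedy set cover: add prototypes until every training point
    lies within theta of at least one selected prototype."""
    # covers[i] = indices of training points within theta of point i
    covers = [
        {j for j, q in enumerate(train) if math.dist(p, q) <= theta}
        for p in train
    ]
    uncovered = set(range(len(train)))
    chosen = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered points.
        best = max(range(len(train)), key=lambda i: len(covers[i] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return [train[i] for i in chosen]

# Toy data (assumed): two well-separated clusters.
train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0), (3.1, 3.0)]
prototypes = greedy_prototypes(train, theta=0.5)
```

On this toy data the greedy pass keeps one prototype per cluster. Greedy selection of this kind is the standard way to obtain a logarithmic approximation for set cover, which is why it is used here as a stand-in.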

Subjects

Algorithms , Automated Pattern Recognition/methods , Cluster Analysis , Factual Databases , Discriminant Analysis , Reproducibility of Results

Scaling up support vector machines using nearest neighbor condensation.

Angiulli, Fabrizio; Astorino, Annabella.

IEEE Trans Neural Netw ; 21(2): 351-7, 2010 Feb.

Article in English | MEDLINE | ID: mdl-20071255

ABSTRACT

In this brief, we describe the FCNN-SVM classifier, which combines the support vector machine (SVM) approach and the fast nearest neighbor condensation classification rule (FCNN) in order to make SVMs practical on large collections of data. As a main contribution, it is experimentally shown that, on very large and multidimensional data sets, the FCNN-SVM is one or two orders of magnitude faster than SVM, and that the number of support vectors (SVs) is more than halved with respect to SVM. Thus, a drastic reduction of both training and testing time is achieved by using the FCNN-SVM. This result is obtained at the expense of a small loss of accuracy. The FCNN-SVM is proposed as a viable alternative to the standard SVM in applications where a fast response time is a fundamental requirement.
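The general idea of condensing a training set before fitting an SVM can be sketched as follows. The code uses a simple Hart-style condensed nearest neighbor pass as a stand-in; it is not the FCNN rule described in the paper, and the function name `condense` and the toy data are assumptions made for illustration.

```python
import math

def condense(points, labels):
    """Hart-style condensation: keep only the points that the 1-NN rule
    over the currently kept subset would misclassify."""
    keep = [0]  # seed the subset with the first training point
    changed = True
    while changed:
        changed = False
        for i, p in enumerate(points):
            if i in keep:
                continue
            # Nearest already-kept point to p.
            nearest = min(keep, key=lambda j: math.dist(p, points[j]))
            if labels[nearest] != labels[i]:
                keep.append(i)
                changed = True
    return keep

# Toy two-class data (assumed).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (2.0, 2.0), (2.1, 1.9), (1.9, 2.2)]
labels = [0, 0, 0, 1, 1, 1]
subset = condense(points, labels)
```

The reduced set `[points[i] for i in subset]` with its labels would then be passed to an SVM trainer in place of the full data, for example scikit-learn's `SVC.fit`. Training on fewer points is what would shorten training time in such a pipeline.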

Condensed nearest neighbor data domain description.

Angiulli, Fabrizio.

IEEE Trans Pattern Anal Mach Intell ; 29(10): 1746-58, 2007 Oct.

Article in English | MEDLINE | ID: mdl-17699920

ABSTRACT

A simple yet effective unsupervised classification rule to discriminate between normal and abnormal data is based on accepting test objects whose nearest neighbor distances in a reference data set, assumed to model normal behavior, lie within a certain threshold. This work investigates the effect of using a subset of the original data set as the reference set of the classifier. With this aim, the concept of a reference consistent subset is introduced and it is shown that finding the minimum cardinality reference consistent subset is intractable. Then, the CNNDD algorithm is described, which computes a reference consistent subset with only two reference set passes. Experimental results revealed the advantages of condensing the data set and confirmed the effectiveness of the proposed approach. A thorough comparison with related methods was accomplished, pointing out the strengths and weaknesses of one-class nearest-neighbor-based training set consistent condensation.
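The acceptance rule in the first sentence can be sketched in a few lines. The abstract does not say how the nearest neighbor distances are aggregated, so this sketch assumes the sum of Euclidean distances to the `k` nearest reference points is compared against a threshold `theta`; the function names and toy data are hypothetical.

```python
import math

def knn_distance_sum(x, reference, k):
    """Sum of Euclidean distances from x to its k nearest reference points."""
    dists = sorted(math.dist(x, r) for r in reference)
    return sum(dists[:k])

def accepts(x, reference, k, theta):
    """Accept x as normal when its k-NN distance sum is within theta."""
    return knn_distance_sum(x, reference, k) <= theta

# Toy reference set (assumed) modelling normal behavior.
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
inlier = accepts((0.5, 0.5), reference, k=2, theta=2.0)
outlier = accepts((5.0, 5.0), reference, k=2, theta=2.0)
```

Here the point at the center of the reference square is accepted, while the distant point is rejected. Replacing `reference` with a smaller subset is the condensation step the abstract goes on to study.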

Subjects

Algorithms , Artificial Intelligence , Cluster Analysis , Information Storage and Retrieval/methods , Automated Pattern Recognition/methods , Discriminant Analysis , Reproducibility of Results , Sensitivity and Specificity
