Search | Index Medicus Global

Looking for exceptions on knowledge rules induced from HIV cleavage data set

Prati, Ronaldo Cristiano; Monard, Maria Carolina; Carvalho, André C. P. L. F. de.

Genet. mol. biol ; 27(4): 637-643, Dec. 2004. illus., tab.

Article in English | LILACS | ID: lil-391241

ABSTRACT

The aim of data mining is to find useful knowledge in databases. To extract such knowledge, several methods can be used, among them machine learning (ML) algorithms. In this work we focus on ML algorithms that express the extracted knowledge in a symbolic form, such as rules. This representation may allow us to "explain" the data. Rule learning algorithms are mainly designed to induce classification rules that can predict new cases with high accuracy. However, these sorts of rules generally express common sense knowledge, resulting in many interesting and useful rules not being discovered. Furthermore, the domain independent biases, especially those related to the language used to express the induced knowledge, could induce rules that are difficult to understand. Exceptions might be used in order to overcome these drawbacks. Exceptions are defined as rules that contradict common beliefs. This kind of rule can play an important role in the process of understanding the underlying data as well as in making critical decisions. By contradicting the user's common beliefs, exceptions are bound to be interesting. This work proposes a method to find exceptions. In order to illustrate the potential of our approach, we apply the method to a real world data set to discover rules and exceptions in the HIV virus protein cleavage process. A good understanding of the process that generates this data plays an important role in the research of cleavage inhibitors. We believe that the proposed approach may help the domain expert to further understand this process.
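The idea of an exception, a rule that contradicts a more general rule, can be illustrated with a minimal Python sketch. This is not the method proposed in the paper, which the abstract does not describe; it is a simple refinement search under stated assumptions. The function names, the attribute names `a` and `b`, the thresholds, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority(labels):
    """Return the most common label and its relative frequency."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def find_exceptions(records, rule_attr, rule_value, target,
                    min_support=2, min_conf=0.8):
    """Find refinements of the general rule `rule_attr == rule_value`
    whose majority class contradicts the general rule's prediction.

    Illustrative only: thresholds and search strategy are assumptions.
    """
    covered = [r for r in records if r[rule_attr] == rule_value]
    general_class, _ = majority([r[target] for r in covered])

    exceptions = []
    extra_attrs = sorted(k for k in covered[0] if k not in (rule_attr, target))
    for attr in extra_attrs:
        for value in sorted({r[attr] for r in covered}):
            subset = [r[target] for r in covered if r[attr] == value]
            if len(subset) < min_support:
                continue  # too few cases to trust the refinement
            cls, conf = majority(subset)
            if cls != general_class and conf >= min_conf:
                exceptions.append((attr, value, cls, conf))
    return general_class, exceptions


# Toy records (invented for illustration, not taken from the HIV data set).
records = [
    {"a": "x", "b": "u", "cls": "cleaved"},
    {"a": "x", "b": "u", "cls": "cleaved"},
    {"a": "x", "b": "u", "cls": "cleaved"},
    {"a": "x", "b": "u", "cls": "cleaved"},
    {"a": "x", "b": "v", "cls": "not_cleaved"},
    {"a": "x", "b": "v", "cls": "not_cleaved"},
    {"a": "y", "b": "u", "cls": "not_cleaved"},
]

general, exceptions = find_exceptions(records, "a", "x", "cls")
print(general)     # class predicted by the general rule "a == x"
print(exceptions)  # refinements of that rule that predict another class
```

In this toy example the general rule "a == x" predicts one class, while the refinement "a == x and b == v" predicts the opposite class; that refinement is what the sketch reports as an exception.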

Subjects

Databases as Topic, HIV Protease, Molecular Models, Molecular Structure

Evaluation of gene selection metrics for tumor cell classification

Faceli, Katti; Carvalho, André C. P. L. F. de; Silva Júnior, Wilson A.

Genet. mol. biol ; 27(4): 651-657, Dec. 2004. illus., tab.

Article in English | LILACS | ID: lil-391243

ABSTRACT

Gene expression profiles contain the expression level of thousands of genes. Depending on the issue under investigation, this large amount of data makes analysis impractical. Thus, it is important to select subsets of relevant genes to work with. This paper investigates different metrics for gene selection. The metrics are evaluated based on their ability to select genes whose expression profile provides information to distinguish between tumor and normal tissues. This evaluation is made by constructing classifiers using the genes selected by each metric and then comparing the performance of these classifiers. The performance of the classifiers is evaluated using the error rate in the classification of new tissues. As the dataset has few tissue samples, the leave-one-out methodology was employed to guarantee more reliable results. The classifiers are generated using different machine learning algorithms. Support Vector Machines (SVMs) and the C4.5 algorithm are employed. The experiments are conducted employing SAGE data obtained from the NCBI web site. There are few analyses involving SAGE data in the literature. It was found that the best metric for the data and algorithms employed is the logistic metric.
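The evaluation procedure described above (select genes with a metric, build a classifier on the selected genes, estimate the error rate by leave-one-out) can be sketched in plain Python. This is a minimal sketch under assumptions, not the paper's setup: the ranking metric here is the absolute difference of class means rather than any metric evaluated in the paper, a nearest-centroid classifier stands in for SVMs and C4.5, and the expression values are invented toy data rather than SAGE data. All function names are hypothetical.

```python
def select_genes(samples, labels, k):
    """Rank genes by the absolute difference of the two class means
    (an illustrative metric) and return the indices of the top k."""
    classes = sorted(set(labels))

    def class_mean(cls, gene):
        values = [s[gene] for s, y in zip(samples, labels) if y == cls]
        return sum(values) / len(values)

    n_genes = len(samples[0])
    scores = [abs(class_mean(classes[0], g) - class_mean(classes[1], g))
              for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: -scores[g])[:k]


def nearest_centroid_predict(train, train_labels, genes, x):
    """Predict the class whose centroid (over the selected genes) is closest."""
    best_label, best_dist = None, float("inf")
    for cls in sorted(set(train_labels)):
        rows = [s for s, y in zip(train, train_labels) if y == cls]
        centroid = [sum(r[g] for r in rows) / len(rows) for g in genes]
        dist = sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = cls, dist
    return best_label


def leave_one_out_error(samples, labels, k):
    """Hold out each sample once; gene selection is repeated inside
    every fold so the held-out sample never influences the ranking."""
    errors = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = select_genes(train, train_labels, k)
        if nearest_centroid_predict(train, train_labels, genes, samples[i]) != labels[i]:
            errors += 1
    return errors / len(samples)


# Toy expression levels (invented): rows are tissues, columns are genes.
samples = [
    [9.0, 1.0, 5.0], [8.0, 2.0, 4.0], [9.5, 1.5, 5.5],   # tumor
    [1.0, 1.2, 5.1], [2.0, 1.8, 4.2], [1.5, 1.4, 5.3],   # normal
]
labels = ["tumor"] * 3 + ["normal"] * 3

error_rate = leave_one_out_error(samples, labels, k=1)
print(error_rate)
```

Running the gene selection inside each fold, rather than once on the full data, is a design choice that keeps the held-out tissue from leaking into the ranking; the abstract does not say whether the paper did this.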

Subjects

Humans, Gene Expression, Neoplasms, Artificial Intelligence, Genetic Selection, Statistics

Evaluation of noise reduction techniques in the splice junction recognition problem

Lorena, Ana C; Carvalho, André C. P. L. F. de.

Genet. mol. biol ; 27(4): 665-672, Dec. 2004. illus., tab.

Article in English | LILACS | ID: lil-391245

ABSTRACT

The Human Genome Project has generated a large amount of sequence data. A number of works are currently concerned with analyzing these data. One of the analyses carried out is the identification of gene structures. Splice junctions represent a type of signal present in eukaryotic genes. Many studies have applied Machine Learning techniques in the recognition of such regions. However, most of the genetic databases are characterized by the presence of noisy data, which can affect the performance of the learning techniques. This paper evaluates the effectiveness of five data pre-processing algorithms in the elimination of noisy instances from two splice junction recognition datasets. After the pre-processing phase, two learning techniques, Decision Trees and Support Vector Machines, are employed in the recognition process.

Subjects

Humans, Computational Biology, Gene Expression, Molecular Biology, Algorithms, Artificial Intelligence, Molecular Sequence Data
