ABSTRACT
The Human Genome Project has generated a large amount of sequence data, and a number of works are currently concerned with analyzing these data. One such analysis is the identification of gene structures, in which splice junctions represent a type of signal present in eukaryotic genes. Many studies have applied Machine Learning techniques to the recognition of such regions. However, most genetic databases are characterized by the presence of noisy data, which can affect the performance of the learning techniques. This paper evaluates the effectiveness of five data pre-processing algorithms in the elimination of noisy instances from two splice junction recognition datasets. After the pre-processing phase, two learning techniques, Decision Trees and Support Vector Machines, are employed in the recognition process.