Search | VHL Regional Portal

Prediction of enzymatic function with high efficiency and a reduced number of features using genetic algorithm.

Reis, Diogo R; Santos, Bruno C; Bleicher, Lucas; Zárate, Luis E; Nobre, Cristiane N.

Comput Biol Med ; 158: 106799, 2023 05.

Article in English | MEDLINE | ID: mdl-37028140

ABSTRACT

The post-genomic era has raised a growing demand for efficient procedures to identify protein functions, which can be accomplished by applying machine learning to the set of characteristics extracted from the protein. This approach is feature-based and has been the focus of several works in bioinformatics. In this work, we investigated the characteristics of proteins, representing the primary, secondary, tertiary, and quaternary structures of the protein, that improve the model's quality by applying dimensionality reduction techniques and using the Support Vector Machine classifier for predicting the enzymes' classes. During the investigation, two approaches were evaluated: feature extraction/transformation, which was performed using the statistical technique Factor Analysis, and feature selection methods. For feature selection, we proposed an approach based on a genetic algorithm to face the optimization conflict between the simplicity and reliability of an ideal representation of the characteristics of the enzymes, and we also employed and compared other methods for this purpose. The best result was accomplished using a feature subset generated by our implementation of a multi-objective genetic algorithm enriched with features that this work identified as relevant to represent the enzymes. This subset representation reduced the dataset by about 87% and reached an F-measure of 85.78%, improving the overall quality of the model classification. In addition, we verified in this work a subset with only 28 features out of a total of 424 that reached an F-measure above 80% for four of the six evaluated classes, showing that satisfactory classification performance can be achieved with a reduced number of enzyme characteristics. The datasets and implementations are openly available.

Subject(s)

Machine Learning , Proteins , Reproducibility of Results , Computational Biology , Genomics , Support Vector Machine , Algorithms
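
The trade-off between simplicity and reliability described in the abstract above can be illustrated with a minimal sketch of genetic-algorithm feature selection. This is not the authors' implementation. It makes several simplifying assumptions: a dependency-free nearest-centroid classifier stands in for the Support Vector Machine, a single weighted fitness (training accuracy minus a penalty on the fraction of features kept) stands in for the multi-objective formulation, and the data are synthetic. All function names and parameter values are hypothetical.

```python
import random


def make_toy_data(n_samples=60, n_features=8, seed=1):
    """Synthetic two-class data (assumption): only feature 0 carries class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3 * label
        X.append(row)
        y.append(label)
    return X, y


def nearest_centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the selected columns.

    Stand-in for the SVM used in the paper, chosen to avoid external libraries.
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(cols))
        for k, j in enumerate(cols):
            acc[k] += row[j]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def distance(row, c):
        return sum((row[j] - centroids[c][k]) ** 2 for k, j in enumerate(cols))

    correct = sum(
        min(centroids, key=lambda c: distance(row, c)) == label
        for row, label in zip(X, y)
    )
    return correct / len(y)


def fitness(X, y, mask, penalty=0.1):
    """Reward accuracy, penalize the fraction of features kept (weighted sum)."""
    return nearest_centroid_accuracy(X, y, mask) - penalty * sum(mask) / len(mask)


def select_features(X, y, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Evolve 0/1 feature masks; returns the best mask found."""
    rng = random.Random(seed)
    n = len(X[0])
    # Seed the population with the all-features mask plus random masks.
    population = [[1] * n] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(X, y, m), reverse=True)
        next_gen = population[:2]  # elitism: the two best masks survive unchanged
        while len(next_gen) < pop_size:
            a, b = rng.sample(population[: pop_size // 2], 2)  # parents from top half
            cut = rng.randint(1, n - 1)                        # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=lambda m: fitness(X, y, m))


X, y = make_toy_data()
best_mask = select_features(X, y)
```

Because the all-features mask is in the initial population and the best masks are carried over each generation, the returned mask never scores worse than using every feature under this fitness. A multi-objective variant would instead keep the set of non-dominated masks rather than collapsing both goals into one number.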

Characterizing Infant Mortality Using Data Mining - A Case Study in Two Brazilian States - Santa Catarina and Amapá.

Soares, Wanderson L; Song, Mark A J; Zárate, Luis E; Nobre, Cristiane N.

Stud Health Technol Inform ; 290: 772-776, 2022 Jun 06.

Article in English | MEDLINE | ID: mdl-35673122

ABSTRACT

Infant mortality is characterized by the death of young children under the age of one, and it is an issue affecting millions of children in the world. The objective of this article is to employ concepts of knowledge discovery in databases, specifically of machine learning in the data mining phase, to characterize infant mortality in two states of Brazil: Santa Catarina, with the lowest infant mortality rate of the country's states, and Amapá, with the highest. The classifiers C4.5, JRip, Random Forest, SVM, and Multilayer Perceptron were used, and a brief comparison of the results obtained by the classifiers in both states is made. In addition, the dataset preprocessing is detailed, which includes attribute selection and class balancing. The results show that the features APGAR5, WEIGHT, and CONGENITAL ANOMALY stood out the most from the rules generated by the tree-based classifiers.

Subject(s)

Data Mining , Machine Learning , Brazil/epidemiology , Child , Child, Preschool , Humans , Infant , Infant Mortality , Neural Networks, Computer
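
The class-balancing step mentioned in the abstract above can be sketched as follows. The abstract does not say which balancing method was used, so random oversampling of the smaller classes is an assumption here. The attribute names APGAR5 and WEIGHT come from the abstract, but the records, values, and the function name `random_oversample` are purely illustrative.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Illustrative records only; the values are made up for this sketch.
rows = [
    {"APGAR5": 9, "WEIGHT": 3300},
    {"APGAR5": 10, "WEIGHT": 3500},
    {"APGAR5": 8, "WEIGHT": 3100},
    {"APGAR5": 9, "WEIGHT": 2900},
    {"APGAR5": 4, "WEIGHT": 1500},
]
labels = [0, 0, 0, 0, 1]  # 1 marks the minority class

balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)
```

Balancing is normally applied to the training split only, so that duplicated records do not leak into the evaluation data.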

Interpreting the Human Longevity Profile Through Triadic Rules - A Case Study Based on the ELSA-UK Longitudinal Study.

Noronha, Marta D M; Nobre, Cristiane N; Song, Mark A J; Zárate, Luis E.

Stud Health Technol Inform ; 290: 782-786, 2022 Jun 06.

Article in English | MEDLINE | ID: mdl-35673124

ABSTRACT

Human aging is a complex process with several interacting factors. One way to identify patterns in human aging is through longitudinal population studies. In this work, we identified longevity profiles through a process of knowledge discovery. After identifying the profiles, we applied triadic rules, which allow implication rules with conditions to be extracted. These rules can be used to identify related factors, across the various waves of longitudinal studies, that can better explain the conditions that favor longevity profiles. The results show that triadic analysis efficiently supports the analysis of the temporal evolution of clinical or environmental conditions that favor certain profiles when databases of longitudinal studies are considered.

Subject(s)

Aging , Longevity , Humans , Longitudinal Studies , United Kingdom
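
The idea of an implication rule with conditions, as mentioned in the abstract above, can be illustrated with a small sketch. The abstract does not define the rules formally, so this assumes one common reading from triadic concept analysis: the rule "premise implies conclusion under a set of conditions" holds when every object that has all premise attributes under all of those conditions also has all conclusion attributes under all of them. The participants, attributes, and wave labels below are hypothetical and are not taken from ELSA-UK.

```python
def rule_holds(context, premise, conclusion, conditions):
    """Check a conditional implication in a triadic context.

    context: set of (object, attribute, condition) triples.
    """
    objects = {obj for obj, _, _ in context}

    def has_all(obj, attributes):
        return all(
            (obj, attr, cond) in context
            for attr in attributes
            for cond in conditions
        )

    return all(has_all(obj, conclusion) for obj in objects if has_all(obj, premise))


# Hypothetical context: participants x attributes x study waves.
context = {
    ("p1", "active", "wave1"), ("p1", "active", "wave2"),
    ("p1", "good_health", "wave1"), ("p1", "good_health", "wave2"),
    ("p2", "active", "wave1"), ("p2", "active", "wave2"),
    ("p2", "good_health", "wave1"),
    ("p3", "good_health", "wave1"), ("p3", "good_health", "wave2"),
}

holds_in_wave1 = rule_holds(context, {"active"}, {"good_health"}, {"wave1"})
holds_in_both = rule_holds(context, {"active"}, {"good_health"}, {"wave1", "wave2"})
```

Here the waves play the role of conditions, which is what lets the same rule be checked against different points in time.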
