Search | VHL Regional Portal

Multiomics and eXplainable artificial intelligence for decision support in insulin resistance early diagnosis: A pediatric population-based longitudinal study.

Torres-Martos, Álvaro; Anguita-Ruiz, Augusto; Bustos-Aibar, Mireia; Ramírez-Mena, Alberto; Arteaga, María; Bueno, Gloria; Leis, Rosaura; Aguilera, Concepción M; Alcalá, Rafael; Alcalá-Fdez, Jesús.

Artif Intell Med ; 156: 102962, 2024 Oct.

Article in English | MEDLINE | ID: mdl-39180924

ABSTRACT

Pediatric obesity can drastically heighten the risk of cardiometabolic alterations later in life, with insulin resistance standing as the cornerstone linking adiposity to the increased cardiovascular risk. Puberty has been pointed out as a critical stage after which obesity-associated insulin resistance is more difficult to revert. Timely prediction of insulin resistance in pediatric obesity is therefore vital for mitigating the risk of its associated comorbidities. The construction of effective and robust predictive systems for a complex health outcome like insulin resistance during the early stages of life demands the adoption of longitudinal designs for more causal inferences, and the integration of factors of varying nature involved in its onset. In this work, we propose an eXplainable Artificial Intelligence-based decision support pipeline for early diagnosis of insulin resistance in a longitudinal cohort of 90 children. To that end, we leverage multi-omics (genomics and epigenomics) and clinical data from the pre-pubertal stage. Different combinations of data layers, pre-processing techniques (handling of missing values, feature selection, class imbalance, etc.), algorithms, and training procedures were considered, following good practices for Machine Learning. SHapley Additive exPlanations were provided so that specialists can understand both the decision-making mechanisms of the system and the impact of the features on each automatic decision, an essential issue in high-risk areas such as this one, where system decisions may affect people's lives. The system showed a relevant predictive ability (AUC and G-mean of 0.92). A deep exploration, at both the global and the local level, revealed promising biomarkers of insulin resistance in our population, highlighting classical markers, such as Body Mass Index z-score or leptin/adiponectin ratio, and novel ones, such as methylation patterns of relevant genes including HDAC4, PTPRN2, MATN2, RASGRF1 and EBF1. Our findings highlight the importance of integrating multi-omics data and following eXplainable Artificial Intelligence trends when building decision support systems.
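The abstract above reports performance as AUC and G-mean. As a minimal sketch of what these two metrics measure, the following pure-Python example computes both on hypothetical toy data. The labels, scores, threshold, and function names are illustrative assumptions and are not taken from the study.

```python
from math import sqrt


def auc_score(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sqrt(sensitivity * specificity)


# Hypothetical toy data: 1 = insulin resistant, 0 = not insulin resistant.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.05]
# Hypothetical decision threshold, chosen only for illustration.
y_pred = [1 if s >= 0.45 else 0 for s in scores]

toy_auc = auc_score(y_true, scores)
toy_g_mean = g_mean(y_true, y_pred)
print(f"AUC = {toy_auc:.3f}, G-mean = {toy_g_mean:.3f}")
```

Unlike plain accuracy, G-mean drops sharply when either class is poorly recognized, which is one reason it is commonly reported for imbalanced problems.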

Subject(s)

Artificial Intelligence , Early Diagnosis , Insulin Resistance , Pediatric Obesity , Humans , Longitudinal Studies , Child , Male , Female , Pediatric Obesity/diagnosis , Pediatric Obesity/physiopathology , Machine Learning , Genomics/methods , Epigenomics/methods , Child, Preschool , Multiomics

Explainable artificial intelligence to predict and identify prostate cancer tissue by gene expression.

Ramírez-Mena, Alberto; Andrés-León, Eduardo; Alvarez-Cubero, Maria Jesus; Anguita-Ruiz, Augusto; Martinez-Gonzalez, Luis Javier; Alcala-Fdez, Jesus.

Comput Methods Programs Biomed ; 240: 107719, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37453366

ABSTRACT

BACKGROUND AND OBJECTIVE: Prostate cancer (PCa) is one of the most prevalent forms of cancer in men worldwide. Traditional screening strategies such as serum PSA levels, which are not necessarily cancer-specific, or digital rectal exams, which are often inconclusive, are still the screening methods used for the disease. Some studies have focused on identifying biomarkers of the disease, but none have been reported for diagnosis in routine clinical practice, and few studies have provided tools to assist the pathologist in the decision-making process when analyzing prostate tissue. Therefore, a classifier is proposed to predict the occurrence of PCa that provides physicians with accurate predictions and understandable explanations.
METHODS: A set of 47 genes was selected, based on differential expression between PCa and normal tissue, Gene Ontology (GO) annotations, and the literature, to be used as input predictors for different machine learning methods based on eXplainable Artificial Intelligence. These methods were trained using different class-balancing strategies to build accurate classifiers using gene expression data from 550 samples from 'The Cancer Genome Atlas'. Our model was validated in four external cohorts with different ancestries, totaling 463 samples. In addition, a set of SHapley Additive exPlanations was provided to help clinicians understand the underlying reasons for each decision.
RESULTS: An in-depth analysis showed that the Random Forest algorithm combined with majority class downsampling was the best performing approach with robust statistical significance. Our method achieved an average sensitivity and specificity of 0.90 and 0.8 with an AUC of 0.84 across all databases. The relevance of DLX1, MYL9 and FGFR genes for PCa screening was demonstrated in addition to the important role of novel genes such as CAV2 and MYLK.
CONCLUSIONS: This model has shown good performance in 4 independent external cohorts of different ancestries and the explanations provided are consistent with each other and with the literature, opening a horizon for its application in clinical practice. In the near future, these genes, in combination with our model, could be applied to liquid biopsy to improve PCa screening.
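The abstract above names majority class downsampling as the class-balancing strategy that worked best. A minimal sketch of that idea in pure Python follows; the function name, sample identifiers, class sizes, and seed are hypothetical, and this is not the authors' implementation.

```python
import random
from collections import Counter


def downsample_majority(samples, labels, seed=0):
    """Randomly discard samples from larger classes so that every class
    ends up with as many samples as the smallest class."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        kept = xs if len(xs) == n_min else rng.sample(xs, n_min)
        out_x.extend(kept)
        out_y.extend([y] * n_min)
    return out_x, out_y


# Hypothetical toy cohort: 10 tumor samples and 3 normal samples.
samples = [f"sample_{i}" for i in range(13)]
labels = ["tumor"] * 10 + ["normal"] * 3

balanced_x, balanced_y = downsample_majority(samples, labels)
print(Counter(balanced_y))
```

In practice, downsampling is usually applied to the training split only, so that validation and test data keep their original class distribution.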

Subject(s)

Artificial Intelligence , Prostatic Neoplasms , Male , Humans , Prostatic Neoplasms/genetics , Sensitivity and Specificity , Gene Expression

Omics Data Preprocessing for Machine Learning: A Case Study in Childhood Obesity.

Torres-Martos, Álvaro; Bustos-Aibar, Mireia; Ramírez-Mena, Alberto; Cámara-Sánchez, Sofía; Anguita-Ruiz, Augusto; Alcalá, Rafael; Aguilera, Concepción M; Alcalá-Fdez, Jesús.

Genes (Basel) ; 14(2), 2023 Jan 18.

Article in English | MEDLINE | ID: mdl-36833178

ABSTRACT

The use of machine learning techniques for the construction of predictive models of disease outcomes (based on omics and other types of molecular data) has gained enormous relevance in the last few years in the biomedical field. Nonetheless, the value of omics studies and machine learning tools depends on the proper application of algorithms as well as the appropriate pre-processing and management of input omics and molecular data. Currently, many of the available approaches that use machine learning on omics data for predictive purposes make mistakes in several of the following key steps: experimental design, feature selection, data pre-processing, and algorithm selection. For this reason, we propose the current work as a guideline on how to confront the main challenges inherent to multi-omics human data. As such, a series of best practices and recommendations are also presented for each of the steps defined. In particular, the main particularities of each omics data layer, the most suitable preprocessing approaches for each source, and a compilation of best practices and tips for the study of disease development prediction using machine learning are described. Using examples of real data, we show how to address the key problems mentioned in multi-omics research (e.g., biological heterogeneity, technical noise, high dimensionality, presence of missing values, and class imbalance). Finally, we define the proposals for model improvement based on the results found, which serve as the bases for future work.

Subject(s)

Pediatric Obesity , Child , Humans , Machine Learning , Algorithms

Functional Enrichment Analysis of Regulatory Elements.

Garcia-Moreno, Adrian; López-Domínguez, Raul; Villatoro-García, Juan Antonio; Ramirez-Mena, Alberto; Aparicio-Puerta, Ernesto; Hackenberg, Michael; Pascual-Montano, Alberto; Carmona-Saez, Pedro.

Biomedicines ; 10(3), 2022 Mar 03.

Article in English | MEDLINE | ID: mdl-35327392

ABSTRACT

Statistical methods for enrichment analysis are important tools to extract biological information from omics experiments. Although these methods have been widely used for the analysis of gene and protein lists, the development of high-throughput technologies for regulatory elements demands dedicated statistical and bioinformatics tools. Here, we present a set of enrichment analysis methods for regulatory elements, including CpG sites, miRNAs, and transcription factors. Statistical significance is determined via a power weighting function for target genes and tested by the Wallenius noncentral hypergeometric distribution model to avoid selection bias. These new methodologies have been applied to the analysis of a set of miRNAs associated with arrhythmia, showing the potential of this tool to extract biological information from a list of regulatory elements. These new methods are available in GeneCodis 4, a web tool able to perform singular and modular enrichment analysis that allows the integration of heterogeneous information.
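The abstract above tests enrichment with the Wallenius noncentral hypergeometric model and a power weighting function. The sketch below shows only the simpler central hypergeometric over-representation test, which is the special case the Wallenius model reduces to when both groups have equal sampling weights (odds of 1). It is an illustrative assumption, not the GeneCodis 4 implementation, and the function name and toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, drawn, hits):
    """Upper-tail probability P(X >= hits) for a central hypergeometric
    variable: `drawn` genes are sampled without replacement from
    `population` genes, of which `annotated` carry the term of interest."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return favorable / total


# Hypothetical toy counts: 20 genes in the background, 5 annotated to a
# term, 5 genes in the input list, 3 of which carry the annotation.
p_value = hypergeom_enrichment_pvalue(population=20, annotated=5, drawn=5, hits=3)
print(f"p = {p_value:.4f}")
```

The noncentral variant named in the abstract additionally lets some genes be more likely to be selected than others, which this equal-weight sketch deliberately leaves out.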
