Machine learning based study for the classification of Type 2 diabetes mellitus subtypes.

Ordoñez-Guillen, Nelson E; Gonzalez-Compean, Jose Luis; Lopez-Arevalo, Ivan; Contreras-Murillo, Miguel; Aldana-Bobadilla, Edwin.

BioData Min ; 16(1): 24, 2023 Aug 22.

Article in English | MEDLINE | ID: mdl-37608329

ABSTRACT

PURPOSE: Data-driven diabetes research has become increasingly interested in exploring the heterogeneity of the disease, aiming to support the development of more specific prognoses and treatments within so-called precision medicine. Recently, one such study found five diabetes subgroups with varying risks of complications and treatment responses. Here, we develop and assess different models for classifying Type 2 Diabetes (T2DM) subtypes through machine learning approaches, with the aim of providing a performance comparison and new insights on the matter.
METHODS: We developed a three-stage methodology starting with the preprocessing of the public databases NHANES (USA) and ENSANUT (Mexico) to construct a dataset with N = 10,077 adult diabetes patient records. We used N = 2,768 records for training/validation of models and left the remaining records (N = 7,309) for testing. In the second stage, groups of observations (each one representing a T2DM subtype) were identified. We tested different clustering techniques and strategies and validated them using internal and external clustering indices, obtaining two annotated datasets, Dset A and Dset B. In the third stage, we developed different classification models, evaluating four algorithms, seven input-data schemes, and two validation settings on each annotated dataset. We also tested the obtained models using a majority-vote approach for classifying unseen patient records in the hold-out dataset.
RESULTS: From the independently obtained bootstrap validation for Dset A and Dset B, mean accuracies across all seven data schemes were [Formula: see text] ([Formula: see text]) and [Formula: see text] ([Formula: see text]), respectively. Best accuracies were [Formula: see text] and [Formula: see text]. Results from both validation settings were consistent. For the hold-out dataset, results were consistent with most of those reported in the literature in terms of class proportions.
CONCLUSION: The development of machine learning systems for the classification of diabetes subtypes is an important task for supporting physicians in fast and timely decision-making. We expect to deploy this methodology in a data analysis platform to conduct studies for identifying T2DM subtypes in patient records from hospitals.
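
The majority-vote step mentioned in the methods can be illustrated with a minimal sketch. The subtype labels, the number of models, and the tie-breaking rule below are assumptions made for illustration only; the abstract does not specify how the authors combined their models or resolved ties.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among per-model predictions for one record.

    Assumed tie-break: the tied label that appears first in `predictions` wins.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical subtype labels predicted by four models for three patient records.
per_record_predictions = [
    ["A", "A", "B", "A"],
    ["C", "B", "B", "C"],  # tie between "C" and "B"
    ["D", "D", "D", "D"],
]

voted = [majority_vote(p) for p in per_record_predictions]
print(voted)
```

In practice the per-model predictions would come from the trained classifiers applied to the hold-out records; here they are hard-coded so the voting logic can be read in isolation.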

A Memory-Efficient Encoding Method for Processing Mixed-Type Data on Machine Learning.

Lopez-Arevalo, Ivan; Aldana-Bobadilla, Edwin; Molina-Villegas, Alejandro; Galeana-Zapién, Hiram; Muñiz-Sanchez, Victor; Gausin-Valle, Saul.

Entropy (Basel) ; 22(12), 2020 Dec 09.

Article in English | MEDLINE | ID: mdl-33316972

ABSTRACT

The most common machine-learning methods solve supervised and unsupervised problems based on datasets whose features belong to a numerical space. However, many problems involve datasets in which numerical and categorical data coexist, which makes them challenging to handle. Transforming categorical data into numeric form requires preprocessing. Methods such as one-hot and feature-hashing have been the most widely used encoding approaches, at the expense of a significant increase in the dimensionality of the dataset. This increase introduces further challenges in dealing with an overabundance of variables and/or noisy data. In this paper, we propose a novel encoding approach that maps mixed-type data into an information space, using Shannon's theory to model the amount of information contained in the original data. We evaluated our proposal on ten mixed-type datasets from the UCI repository and two datasets representing real-world problems, obtaining promising results. To demonstrate its performance, we applied the proposal to prepare these datasets for classification, regression, and clustering tasks. We show that our encoding is markedly superior to one-hot and feature-hashing encoding in terms of memory efficiency, and that it can preserve the information conveyed by the original data.
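
To make the two ideas in this abstract concrete, the sketch below computes how many binary columns one-hot encoding would create for a small categorical dataset, and the Shannon entropy of each column. It is not the encoding proposed in the paper, whose details are not given in the abstract; the column names and values are invented for illustration.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def one_hot_width(columns):
    """Number of binary columns one-hot encoding creates: one per distinct category."""
    return sum(len(set(col)) for col in columns)


# Hypothetical categorical columns (four records each).
colour = ["red", "green", "blue", "red"]
city = ["A", "B", "C", "D"]

width = one_hot_width([colour, city])
h_colour = shannon_entropy(colour)
h_city = shannon_entropy(city)
print(width, h_colour, h_city)
```

Two categorical columns already expand to seven binary columns under one-hot encoding, and the width grows with every new distinct category, which is the dimensionality cost the abstract refers to.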

Distributed Algorithm for Base Station Assignment in 4G/5G Machine-Type Communication Scenarios with Backhaul Limited Conditions.

Esquivel-Mendiola, Edgar A; Galeana-Zapién, Hiram; Covarrubias, David H; Aldana-Bobadilla, Edwin.

Sensors (Basel) ; 20(22), 2020 Nov 17.

Article in English | MEDLINE | ID: mdl-33212750

ABSTRACT

A progressive paradigm shift from centralized to distributed network architectures has been consolidated since the 4G communication standard, calling for novel decision-making mechanisms with distributed control that operate at the network edge. This shift implies that each base station (BS) must manage resources independently to meet the quality of service (QoS) requirements of existing human-type communication (HTC) devices, as well as those of the emerging machine-type communication (MTC) devices from the internet of things (IoT). In this paper, we address the BS assignment problem, whose aim is to determine the most appropriate serving BS for each mobile device. The problem is formulated as an optimization problem that maximizes the system throughput subject to constraints on the air interface and backhaul resources. Because the assignment problem is challenging to solve, we present a simple yet valid reformulation of the original problem using dual decomposition theory. We then propose a distributed price-based BS assignment algorithm that performs the assignment process at each BS and includes a novel pricing update scheme. The simulation results show that our proposed solution outperforms the traditional maximum signal-to-interference-plus-noise ratio (Max-SINR) and minimum path-loss (Min-PL) approaches in terms of system throughput.
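
The following sketch contrasts the Max-SINR baseline with a generic price-based assignment driven by a projected-subgradient price update, the textbook form of dual decomposition. It is not the authors' algorithm or their pricing update scheme. As a simplifying assumption, the backhaul limit is modelled as a maximum number of devices per BS, and all rates, SINR values, capacities, and the step size are invented.

```python
def max_sinr_assignment(sinr):
    """Baseline: each device picks the BS with the highest SINR.

    sinr[d][b] is the SINR of device d toward BS b.
    """
    return [max(range(len(row)), key=lambda b: row[b]) for row in sinr]


def priced_assignment(rate, prices):
    """Each device picks the BS maximising its rate minus that BS's price."""
    return [
        max(range(len(prices)), key=lambda b: row[b] - prices[b])
        for row in rate
    ]


def update_prices(prices, assignment, capacity, step):
    """One projected-subgradient step: overloaded BSs raise their price."""
    load = [0] * len(prices)
    for b in assignment:
        load[b] += 1
    return [
        max(0.0, p + step * (l - c))
        for p, l, c in zip(prices, load, capacity)
    ]


# Hypothetical scenario: three devices, two BSs.
sinr = [[12.0, 3.0], [2.0, 5.0]]
rate = [[10.0, 7.0], [9.0, 8.5], [7.0, 6.0]]  # rate[d][b], arbitrary units
capacity = [1, 2]                              # assumed max devices per BS

baseline = max_sinr_assignment(sinr)

prices = [0.0, 0.0]
for _ in range(5):
    assignment = priced_assignment(rate, prices)
    prices = update_prices(prices, assignment, capacity, step=1.0)

print(baseline, assignment, prices)
```

With zero prices every device selects BS 0, which exceeds its assumed capacity; the price of BS 0 then rises until two devices move to BS 1. Each BS needs only its own load to update its price, which is what makes this style of scheme suitable for distributed operation. In general, such iterations can oscillate on integer assignment problems, so convergence in this toy case should not be read as a guarantee.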
