Search | BVS Regional Portal (Virtual Health Library)

A multimodal approach using fundus images and text meta-data in a machine learning classifier with embeddings to predict years with self-reported diabetes - An exploratory analysis.

Carrillo-Larco, Rodrigo M; Bravo-Rocca, Gusseppe; Castillo-Cara, Manuel; Xu, Xiaolin; Bernabe-Ortiz, Antonio.

Prim Care Diabetes ; 18(3): 327-332, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38616442

ABSTRACT

AIMS: Machine learning models can use image and text data to predict the number of years since diabetes diagnosis; such a model can be applied to new patients to predict, approximately, how long the new patient may have lived with diabetes unknowingly. We aimed to develop a model to predict self-reported diabetes duration. METHODS: We used the Brazilian Multilabel Ophthalmological Dataset. The unit of analysis was the fundus image and its meta-data, regardless of the patient. We included people aged 40+ years and fundus images without diabetic retinopathy. Fundus images and meta-data (sex, age, comorbidities and taking insulin) were passed to the MedCLIP model to extract the embedding representation. The embedding representation was passed to an Extra Tree Classifier to predict: 0-4, 5-9, 10-14 and 15+ years with self-reported diabetes. RESULTS: There were 988 images from 563 people (mean age = 67 years; 64% were women). Overall, the F1 score was 57%. The group 15+ years of self-reported diabetes had the highest precision (64%) and F1 score (63%), while the highest recall (69%) was observed in the group 0-4 years. The proportion of correctly classified observations was 55% for the group 0-4 years, 51% for 5-9 years, 58% for 10-14 years, and 64% for 15+ years with self-reported diabetes. CONCLUSIONS: The machine learning model had acceptable accuracy and F1 score, and correctly classified more than half of the patients according to diabetes duration. Using large foundational models to extract image and text embeddings seems a feasible and efficient approach to predict years living with self-reported diabetes.

Subjects

Diabetes Mellitus, Fundus Oculi, Machine Learning, Predictive Value of Tests, Self Report, Humans, Female, Male, Aged, Middle Aged, Time Factors, Diabetes Mellitus/diagnosis, Diabetes Mellitus/epidemiology, Brazil/epidemiology, Adult, Factual Databases, Diabetic Retinopathy/diagnosis, Diabetic Retinopathy/epidemiology, Data Mining/methods, Reproducibility of Results, Computer-Assisted Image Interpretation
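The abstract above reports precision, recall and F1 score for each diabetes-duration class. As a rough illustration of how such per-class figures are derived from true and predicted labels, here is a minimal sketch in plain Python. The function names and the toy labels are hypothetical, and the macro average shown is only one way to obtain an overall F1; the abstract does not say which averaging the authors used.

```python
from typing import Dict, List

# Duration classes named in the abstract.
LABELS = ["0-4", "5-9", "10-14", "15+"]


def per_class_metrics(y_true: List[str], y_pred: List[str],
                      labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Return precision, recall and F1 for each label (one-vs-rest)."""
    metrics = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def macro_f1(metrics: Dict[str, Dict[str, float]]) -> float:
    """Unweighted mean of per-class F1 (an assumed averaging choice)."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study.
    truth = ["0-4", "0-4", "5-9", "15+"]
    preds = ["0-4", "5-9", "5-9", "15+"]
    for label, m in per_class_metrics(truth, preds, LABELS).items():
        print(label, m)
```

Recall for a class is the share of its true members that the model recovers, which corresponds to the "proportion of correctly classified observations" per group when that proportion is computed within each true class.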

Phenotypes of non-alcoholic fatty liver disease (NAFLD) and all-cause mortality: unsupervised machine learning analysis of NHANES III.

Carrillo-Larco, Rodrigo M; Guzman-Vilca, Wilmer Cristobal; Castillo-Cara, Manuel; Alvizuri-Gómez, Claudia; Alqahtani, Saleh; Garcia-Larsen, Vanessa.

BMJ Open ; 12(11): e067203, 2022 11 23.

Article in English | MEDLINE | ID: mdl-36418130

ABSTRACT

OBJECTIVES: Non-alcoholic fatty liver disease (NAFLD) is a non-communicable disease with a rising prevalence worldwide and with a large burden for patients and health systems. To date, the presence of unique phenotypes in patients with NAFLD has not been studied, and their identification could inform precision medicine and public health with pragmatic implications in personalised management and care for patients with NAFLD. DESIGN: Cross-sectional and prospective (up to 31 December 2019) analysis of National Health and Nutrition Examination Survey III (1988-1994). PRIMARY AND SECONDARY OUTCOME MEASURES: NAFLD diagnosis was based on liver ultrasound. The following predictors informed an unsupervised machine learning algorithm (k-means): body mass index, waist circumference, systolic blood pressure (SBP), plasma glucose, total cholesterol, triglycerides, and the liver enzymes alanine aminotransferase, aspartate aminotransferase and gamma glutamyl transferase. We summarised (means) and compared the predictors across clusters. We used Cox proportional hazards models to quantify the all-cause mortality risk associated with each cluster. RESULTS: 1652 patients with NAFLD (mean age 47.2 years and 51.5% women) were grouped into 3 clusters: anthro-SBP-glucose (6.36%; highest levels of anthropometrics, SBP and glucose), lipid-liver (10.35%; highest levels of lipid and liver enzymes) and average (83.29%; predictors at average levels). Compared with the average phenotype, the anthro-SBP-glucose phenotype had higher all-cause mortality risk (aHR=2.88; 95% CI: 2.26 to 3.67); the lipid-liver phenotype was not associated with higher all-cause mortality risk (aHR=1.11; 95% CI: 0.86 to 1.42). CONCLUSIONS: There is heterogeneity in patients with NAFLD, who can be divided into three phenotypes with different mortality risk. These phenotypes could guide specific interventions and management plans, thus advancing precision medicine and public health for patients with NAFLD.

Subjects

Non-alcoholic Fatty Liver Disease, Female, Male, Humans, Non-alcoholic Fatty Liver Disease/epidemiology, Nutrition Surveys, Cross-Sectional Studies, Unsupervised Machine Learning, Prospective Studies, Triglycerides, Glucose
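The abstract above describes clustering patients with k-means on a set of continuous predictors and then summarising the predictor means per cluster. The following is a minimal sketch of that general technique in plain Python, not the authors' implementation. It assumes z-score standardisation and a deterministic farthest-point initialisation, neither of which is stated in the abstract; all function names and the toy rows are hypothetical.

```python
import math
from typing import List, Tuple

Row = List[float]


def standardize(rows: List[Row]) -> List[Row]:
    """Z-score each column so that no predictor dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def kmeans(rows: List[Row], k: int, iters: int = 100) -> Tuple[List[int], List[Row]]:
    """Lloyd's algorithm with farthest-point initialisation (an assumption)."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        # Pick the row farthest from every centroid chosen so far.
        farthest = max(rows, key=lambda r: min(math.dist(r, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda j: math.dist(row, centroids[j]))
                      for row in rows]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are final
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical toy rows: (body mass index, systolic blood pressure).
    toy = [[24.0, 118.0], [25.0, 120.0], [23.5, 115.0],
           [38.0, 160.0], [39.5, 165.0], [37.0, 158.0]]
    labels, centroids = kmeans(standardize(toy), k=2)
    print(labels)
    print(centroids)  # per-cluster predictor means on the standardised scale
```

The returned centroids are the per-cluster means of the standardised predictors, which is the kind of summary the abstract uses to name the phenotypes.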

Street images classification according to COVID-19 risk in Lima, Peru: a convolutional neural networks feasibility analysis.

Carrillo-Larco, Rodrigo M; Castillo-Cara, Manuel; Hernández Santa Cruz, Jose Francisco.

BMJ Open ; 12(9): e063411, 2022 09 19.

Article in English | MEDLINE | ID: mdl-36123096

ABSTRACT

OBJECTIVES: During the COVID-19 pandemic, convolutional neural networks (CNNs) have been used in clinical medicine (eg, X-ray classification). Whether CNNs could inform the epidemiology of COVID-19 by classifying street images according to COVID-19 risk is unknown, yet it could pinpoint high-risk places and relevant features of the built environment. In a feasibility study, we trained CNNs to classify the area surrounding bus stops (Lima, Peru) into moderate or extreme COVID-19 risk. DESIGN: CNN analysis based on images from bus stops and the surrounding area. We used transfer learning and updated the output layer of five CNNs: NASNetLarge, InceptionResNetV2, Xception, ResNet152V2 and ResNet101V2. We chose the best performing CNN, which was further tuned. We used Grad-CAM to understand the classification process. SETTING: Bus stops from Lima, Peru. We used five images per bus stop. PRIMARY AND SECONDARY OUTCOME MEASURES: Bus stop images were classified according to COVID-19 risk into two labels: moderate or extreme. RESULTS: NASNetLarge outperformed the other CNNs except in the recall metric for the moderate label and in the precision metric for the extreme label; the ResNet152V2 performed better in these two metrics (85% vs 76% and 63% vs 60%, respectively). The NASNetLarge was further tuned. The best recall (75%) and F1 score (65%) for the extreme label were reached with data augmentation techniques. Areas close to buildings or with people were often classified as extreme risk. CONCLUSIONS: This feasibility study showed that CNNs have the potential to classify street images according to levels of COVID-19 risk. In addition to applications in clinical medicine, CNNs and street images could advance the epidemiology of COVID-19 at the population level.

Subjects

COVID-19, COVID-19/epidemiology, Feasibility Studies, Humans, Computer Neural Networks, Pandemics, Peru/epidemiology

Development, validation, and application of a machine learning model to estimate salt consumption in 54 countries.

Guzman-Vilca, Wilmer Cristobal; Castillo-Cara, Manuel; Carrillo-Larco, Rodrigo M.

Elife ; 11, 2022 01 25.

Article in English | MEDLINE | ID: mdl-34984979

ABSTRACT

Global targets to reduce salt intake have been proposed, but their monitoring is challenged by the lack of population-based data on salt consumption. We developed a machine learning (ML) model to predict salt consumption at the population level based on simple predictors and applied this model to national surveys in 54 countries. We used 21 surveys with spot urine samples for the ML model derivation and validation; we developed a supervised ML regression model based on sex, age, weight, height, and systolic and diastolic blood pressure. We applied the ML model to 54 new surveys to quantify the mean salt consumption in the population. The pooled dataset in which we developed the ML model included 49,776 people. Overall, there were no substantial differences between the observed and ML-predicted mean salt intake (p<0.001). The pooled dataset where we applied the ML model included 166,677 people; the predicted mean salt consumption ranged from 6.8 g/day (95% CI: 6.8-6.8 g/day) in Eritrea to 10.0 g/day (95% CI: 9.9-10.0 g/day) in American Samoa. The countries with the highest predicted mean salt intake were in the Western Pacific. The lowest predicted intake was found in Africa. The country-specific predicted mean salt intake was within reasonable difference from the best available evidence. An ML model based on readily available predictors estimated daily salt consumption with good accuracy. This model could be used to predict mean salt consumption in the general population where urine samples are not available.

Subjects

Machine Learning, Dietary Sodium Chloride/urine, Blood Pressure, Humans

Clusters of people with type 2 diabetes in the general population: unsupervised machine learning approach using national surveys in Latin America and the Caribbean.

Carrillo-Larco, Rodrigo M; Castillo-Cara, Manuel; Anza-Ramirez, Cecilia; Bernabé-Ortiz, Antonio.

BMJ Open Diabetes Res Care ; 9(1), 2021 01.

Article in English | MEDLINE | ID: mdl-33514531

ABSTRACT

INTRODUCTION: We aimed to identify clusters of people with type 2 diabetes mellitus (T2DM) and to assess whether the frequency of these clusters was consistent across selected countries in Latin America and the Caribbean (LAC). RESEARCH DESIGN AND METHODS: We analyzed 13 population-based national surveys in nine countries (n=8361). We used k-means to develop a clustering model; predictors were age, sex, body mass index (BMI), waist circumference (WC), systolic/diastolic blood pressure (SBP/DBP), and T2DM family history. The training data set included all surveys, and the clusters were then predicted in each country-year data set. We used Euclidean distance, elbow and silhouette plots to select the optimal number of clusters and described each cluster according to the underlying predictors (means and proportions). RESULTS: The optimal number of clusters was 4. Cluster 0 grouped more men and those with the highest mean SBP/DBP. Cluster 1 had the highest mean BMI and WC, as well as the largest proportion of T2DM family history. We observed the smallest values of all predictors in cluster 2. Cluster 3 had the highest mean age. When we reflected the four clusters in each country-year data set, a different distribution was observed. For example, cluster 3 was the most frequent in the training data set, and it was also the most frequent in 7 out of 13 country-year data sets. CONCLUSIONS: Using unsupervised machine learning algorithms, it was possible to cluster people with T2DM from the general population in LAC; clusters showed unique profiles that could be used to identify the underlying characteristics of the T2DM population in LAC.

Subjects

Type 2 Diabetes Mellitus, Caribbean Region/epidemiology, Type 2 Diabetes Mellitus/epidemiology, Humans, Latin America/epidemiology, Male, Unsupervised Machine Learning, Waist Circumference
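The abstract above trains a clustering model on the pooled surveys and then predicts the clusters in each country-year data set. One plausible reading of that prediction step is nearest-centroid assignment by Euclidean distance, followed by tabulating how often each cluster occurs per data set. The sketch below illustrates that reading only; the function names, centroids and data set labels are hypothetical, and it assumes the new rows are already on the same scale as the centroids.

```python
import math
from collections import Counter
from typing import Dict, List

Row = List[float]


def predict_cluster(row: Row, centroids: List[Row]) -> int:
    """Index of the centroid closest to the row (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(row, centroids[j]))


def cluster_shares(datasets: Dict[str, List[Row]],
                   centroids: List[Row]) -> Dict[str, Dict[int, float]]:
    """Share of observations falling in each cluster, per data set."""
    shares = {}
    for name, rows in datasets.items():
        counts = Counter(predict_cluster(r, centroids) for r in rows)
        shares[name] = {j: counts.get(j, 0) / len(rows)
                        for j in range(len(centroids))}
    return shares


if __name__ == "__main__":
    # Hypothetical centroids and country-year data sets, not study data.
    centroids = [[0.0, 0.0], [10.0, 10.0]]
    datasets = {
        "country_A_2010": [[1.0, 1.0], [9.0, 9.0], [11.0, 10.0]],
        "country_B_2015": [[0.0, 1.0]],
    }
    for name, share in cluster_shares(datasets, centroids).items():
        print(name, share)
```

Comparing the resulting shares across data sets is what allows a statement such as "cluster 3 was the most frequent" to be checked for each country-year.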

Using country-level variables to classify countries according to the number of confirmed COVID-19 cases: An unsupervised machine learning approach.

Carrillo-Larco, Rodrigo M; Castillo-Cara, Manuel.

Wellcome Open Res ; 5: 56, 2020.

Article in English | MEDLINE | ID: mdl-32587900

ABSTRACT

Background: The COVID-19 pandemic has attracted the attention of researchers and clinicians who have provided evidence about risk factors and clinical outcomes. Research on the COVID-19 pandemic benefiting from open-access data and machine learning algorithms is still scarce yet can produce relevant and pragmatic information. With country-level pre-COVID-19-pandemic variables, we aimed to cluster countries into groups with shared profiles of the COVID-19 pandemic. Methods: Unsupervised machine learning algorithms (k-means) were used to define data-driven clusters of countries; the algorithm was informed by disease prevalence estimates, metrics of air pollution, socio-economic status and health system coverage. Using the one-way ANOVA test, we compared the clusters in terms of number of confirmed COVID-19 cases, number of deaths, case fatality rate and order in which the country reported the first case. Results: The model to define the clusters was developed with 155 countries. The model with three principal component analysis parameters and five or six clusters showed the best ability to group countries in relevant sets. There was strong evidence that the model with five or six clusters could stratify countries according to the number of confirmed COVID-19 cases (p<0.001). However, the model could not stratify countries in terms of number of deaths or case fatality rate. Conclusions: A simple data-driven approach using available global information before the COVID-19 pandemic seemed able to classify countries in terms of the number of confirmed COVID-19 cases. The model was not able to stratify countries based on COVID-19 mortality data.
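The abstract above compares clusters with a one-way ANOVA test. As a small worked illustration of what that test computes, the sketch below derives the F statistic from between-cluster and within-cluster sums of squares in plain Python. The function name and the toy numbers are hypothetical, and the p-value step is omitted because it requires the F distribution (available, for example, through `scipy.stats.f_oneway`).

```python
from typing import List, Tuple


def one_way_anova_f(groups: List[List[float]]) -> Tuple[float, int, int]:
    """Return (F statistic, between-group df, within-group df)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical values of one outcome in three clusters, not study data.
    clusters = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
    print(one_way_anova_f(clusters))
```

A large F value means the cluster means differ by more than the spread within clusters would suggest, which is the sense in which the clusters "stratify" countries on an outcome.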

An Empirical Study of the Transmission Power Setting for Bluetooth-Based Indoor Localization Mechanisms.

Castillo-Cara, Manuel; Lovón-Melgarejo, Jesús; Bravo-Rocca, Gusseppe; Orozco-Barbosa, Luis; García-Varea, Ismael.

Sensors (Basel) ; 17(6), 2017 Jun 07.

Article in English | MEDLINE | ID: mdl-28590413

ABSTRACT

There is great interest in developing accurate wireless indoor localization mechanisms enabling the implementation of many consumer-oriented services. Among the many proposals, wireless indoor localization mechanisms based on the Received Signal Strength Indication (RSSI) are being widely explored. Most studies have focused on the evaluation of the capabilities of different mobile device brands and wireless network technologies. Furthermore, different parameters and algorithms have been proposed as a means of improving the accuracy of wireless-based localization mechanisms. In this paper, we focus on the tuning of the RSSI fingerprint to be used in the implementation of a Bluetooth Low Energy 4.0 (BLE4.0) localization mechanism. Following a holistic approach, we start by assessing the capabilities of two Bluetooth sensor/receiver devices. We then evaluate the relevance of the RSSI fingerprint reported by each BLE4.0 beacon operating at various transmission power levels using feature selection techniques. Based on our findings, we use two classification algorithms to improve the setting of the transmission power levels of each of the BLE4.0 beacons. Our main findings show that our proposal can greatly improve the localization accuracy by setting a custom transmission power level for each BLE4.0 beacon.
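The abstract above treats localization as a classification problem over RSSI fingerprints, where each fingerprint is the vector of signal strengths received from the beacons at a known position. The abstract does not name the two classification algorithms used, so the sketch below uses k-nearest neighbours purely as an illustrative stand-in. The function name, zone labels and RSSI values are all hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# One fingerprint: (RSSI in dBm from each beacon, label of the known zone).
Fingerprint = Tuple[Sequence[float], str]


def knn_locate(fingerprints: List[Fingerprint],
               reading: Sequence[float], k: int = 3) -> str:
    """Predict the zone of a new RSSI reading by majority vote of the
    k closest stored fingerprints (Euclidean distance in RSSI space)."""
    nearest = sorted(fingerprints,
                     key=lambda fp: math.dist(fp[0], reading))[:k]
    votes = Counter(zone for _, zone in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical radio map with two beacons and two zones.
    radio_map = [
        ((-60.0, -80.0), "zone_a"),
        ((-62.0, -78.0), "zone_a"),
        ((-85.0, -55.0), "zone_b"),
        ((-83.0, -58.0), "zone_b"),
    ]
    print(knn_locate(radio_map, (-61.0, -79.0), k=3))
```

In this framing, changing a beacon's transmission power changes the RSSI values in its column of the radio map, which is why per-beacon power settings can affect how well the zones separate.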
