Search | BVS Regional Portal (test)

1.

A deep semantic matching approach for identifying relevant messages for social media analysis.

Biggers, Frederick Brown; Mohanty, Somya D; Manda, Prashanti.

Sci Rep ; 13(1): 12005, 2023 Jul 25.

Article in English | MEDLINE | ID: mdl-37491443

ABSTRACT

There is growing interest in using social media content for Natural Language Processing applications. However, it is not easy to computationally identify the most relevant set of tweets related to a specific event. Challenging semantics, coupled with the varied ways natural language is used in social media, make it difficult to retrieve the most relevant set of data from any social media outlet. This paper seeks to demonstrate a way to present the changing semantics of Twitter within the context of a crisis event, specifically tweets during Hurricane Irma. These methods can be used to identify the most relevant corpus of text for analysis related to a specific incident such as a hurricane. Using an implementation of the Word2Vec method of neural network training to create word embeddings, this paper will: discuss how the relative meaning of words changes as events unfold; present a mechanism for scoring tweets based upon dynamic, relative context relatedness; and show that similarity between words is not necessarily static. We present different methods for training the vector model in Word2Vec for identification of the most relevant tweets for any search query. The impact of tuning parameters such as Word Window Size, Minimum Word Frequency, Hidden Layer Dimensionality, and Negative Sampling on model performance was explored. The window containing the local maximum for AU-ROC for each parameter serves as a guide for other studies using the methods presented here for social media data analysis.
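The abstract does not spell out the scoring mechanism, so the following is only a minimal sketch of one plausible reading: score a tweet by the mean cosine similarity between its word vectors and the vector of a query term. The embedding values, the `score_tweet` function, and the query word are all hypothetical; in practice the vectors would come from a Word2Vec model retrained as the event unfolds, which is what would make the scores change over time.

```python
import math

# Hypothetical 2-dimensional word vectors, standing in for embeddings that a
# Word2Vec model would learn from tweets in one time slice.
TOY_EMBEDDINGS = {
    "irma": [1.0, 0.0],
    "storm": [0.9, 0.1],
    "flood": [0.8, 0.3],
    "pizza": [0.0, 1.0],
    "lunch": [0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_tweet(tokens, query, embeddings):
    """Mean similarity between the query vector and each known tweet token.

    Tokens missing from the vocabulary are skipped; a tweet with no known
    tokens (or an unknown query) scores 0.0.
    """
    if query not in embeddings:
        return 0.0
    query_vec = embeddings[query]
    sims = [cosine(embeddings[t], query_vec) for t in tokens if t in embeddings]
    return sum(sims) / len(sims) if sims else 0.0


relevant = score_tweet(["storm", "flood"], "irma", TOY_EMBEDDINGS)
unrelated = score_tweet(["pizza", "lunch"], "irma", TOY_EMBEDDINGS)
```

Ranking tweets by such a score and sweeping a cutoff is what would produce the ROC curve against which the training parameters are tuned.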

2.

A Gated Recurrent Unit based architecture for recognizing ontology concepts from biological literature.

Devkota, Pratik; Mohanty, Somya D; Manda, Prashanti.

BioData Min ; 15(1): 22, 2022 Sep 28.

Article in English | MEDLINE | ID: mdl-36171616

ABSTRACT

BACKGROUND: Annotating scientific literature with ontology concepts is a critical task in biology and several other domains for knowledge discovery. Ontology-based annotations can power large-scale comparative analyses in a wide range of applications ranging from evolutionary phenotypes to rare human diseases to the study of protein functions. Computational methods that can tag scientific text with ontology terms have included lexical/syntactic methods, traditional machine learning, and most recently, deep learning. RESULTS: Here, we present state-of-the-art deep learning architectures based on Gated Recurrent Units for annotating text with ontology concepts. We use the Colorado Richly Annotated Full Text Corpus (CRAFT) as a gold standard for training and testing. We explore a number of additional information sources including NCBI's BioThesaurus and Unified Medical Language System (UMLS) to augment information from CRAFT for increasing prediction accuracy. Our best model results in a 0.84 F1 and semantic similarity. CONCLUSION: The results shown here underscore the impact of using deep learning architectures for automatically recognizing ontology concepts from literature. The augmentation of the models with biological information beyond that present in the gold standard corpus shows a distinct improvement in prediction accuracy.

3.

Machine learning for predicting readmission risk among the frail: Explainable AI for healthcare.

Mohanty, Somya D; Lekan, Deborah; McCoy, Thomas P; Jenkins, Marjorie; Manda, Prashanti.

Patterns (N Y) ; 3(1): 100395, 2022 Jan 14.

Article in English | MEDLINE | ID: mdl-35079714

ABSTRACT

Healthcare costs due to unplanned readmissions are high and negatively affect health and wellness of patients. Hospital readmission is an undesirable outcome for elderly patients. Here, we present readmission risk prediction using five machine learning approaches for predicting 30-day unplanned readmission for elderly patients (age ≥ 50 years). We use a comprehensive and curated set of variables that include frailty, comorbidities, high-risk medications, demographics, hospital, and insurance utilization to build these models. We conduct a large-scale study with electronic health record (EHR) data with over 145,000 observations from 76,000 patients. Findings indicate that the category boost (CatBoost) model outperforms other models with a mean area under the curve (AUC) of 0.79. We find that prior readmissions, discharge to a rehabilitation facility, length of stay, comorbidities, and frailty indicators were all strong predictors of 30-day readmission. We present in-depth insights using Shapley additive explanations (SHAP), the state of the art in machine learning explainability.
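To illustrate the idea behind Shapley additive explanations, here is a minimal sketch that computes exact Shapley values by brute force over all feature orderings. It is not the study's CatBoost model or the SHAP library, which relies on much faster algorithms; the `toy_risk` function, its coefficients, and the three features are hypothetical and chosen only so the attribution can be checked by hand.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution of predict(instance) - predict(baseline).

    For every ordering of the features, switch them one at a time from the
    baseline value to the instance value and credit each feature with the
    change in prediction it causes. Averaging over all orderings gives the
    Shapley value. Cost grows factorially, so this only suits tiny examples.
    """
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = instance[i]
            updated = predict(current)
            totals[i] += updated - previous
            previous = updated
    return [t / len(orderings) for t in totals]


def toy_risk(x):
    """Hypothetical readmission-risk score with one interaction term.

    x[0]: prior readmissions, x[1]: length of stay, x[2]: frailty flag.
    """
    return 0.1 + 0.2 * x[0] + 0.05 * x[1] + 0.1 * x[0] * x[2]


patient = [1.0, 2.0, 1.0]
reference = [0.0, 0.0, 0.0]
attributions = shapley_values(toy_risk, patient, reference)
```

The attributions sum to the difference between the patient's score and the reference score, and the interaction term is split evenly between the two features that share it.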

4.

A multi-modal approach towards mining social media data during natural disasters - a case study of Hurricane Irma.

Mohanty, Somya D; Biggers, Brown; Sayedahmed, Saed; Pourebrahim, Nastaran; Goldstein, Evan B; Bunch, Rick; Chi, Guangqing; Sadri, Fereidoon; McCoy, Tom P; Cosby, Arthur.

Int J Disaster Risk Reduct ; 54, 2021 Feb 15.

Article in English | MEDLINE | ID: mdl-33542893

ABSTRACT

Streaming social media provides a real-time glimpse of extreme weather impacts. However, the volume of streaming data makes mining information a challenge for emergency managers, policy makers, and disciplinary scientists. Here we explore the effectiveness of data-learned approaches to mine and filter information from streaming social media data from Hurricane Irma's landfall in Florida, USA. We use 54,383 Twitter messages (out of 784K geolocated messages) from 16,598 users from Sept. 10 - 12, 2017 to develop 4 independent models to filter data for relevance: 1) a geospatial model based on forcing conditions at the place and time of each tweet, 2) an image classification model for tweets that include images, 3) a user model to predict the reliability of the tweeter, and 4) a text model to determine if the text is related to Hurricane Irma. All four models are independently tested, and can be combined to quickly filter and visualize tweets based on user-defined thresholds for each submodel. We envision that this type of filtering and visualization routine can be useful as a base model for data capture from noisy sources such as Twitter. The data can then be used by policy makers, environmental managers, emergency managers, and domain scientists interested in finding tweets with specific attributes to use during different stages of the disaster (e.g., preparedness, response, and recovery), or for detailed research.
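Combining the four submodels through user-defined thresholds could look roughly like the sketch below. The `TweetScores` record, the score ranges, and the rule that a missing image score skips the image threshold are assumptions made for illustration; the abstract does not describe how tweets without images are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TweetScores:
    """Per-tweet outputs of the four submodels (hypothetical 0..1 scores)."""
    tweet_id: str
    geo: float
    user: float
    text: float
    image: Optional[float] = None  # None when the tweet has no image


def passes(scores, thresholds):
    """True if every applicable submodel score meets its minimum.

    Assumption: a submodel that produced no score (None) is not applied,
    so tweets without images are not rejected by the image threshold.
    """
    for name, minimum in thresholds.items():
        value = getattr(scores, name)
        if value is not None and value < minimum:
            return False
    return True


def filter_tweets(tweets, thresholds):
    """Keep only the tweets that pass all user-defined thresholds."""
    return [t for t in tweets if passes(t, thresholds)]


tweets = [
    TweetScores("t1", geo=0.8, user=0.5, text=0.9),
    TweetScores("t2", geo=0.9, user=0.9, text=0.9, image=0.2),
    TweetScores("t3", geo=0.2, user=0.7, text=0.8, image=0.9),
]
thresholds = {"geo": 0.5, "image": 0.6, "user": 0.3, "text": 0.7}
kept = filter_tweets(tweets, thresholds)
```

Because each submodel is thresholded independently, a user can tighten or relax one criterion without retraining or re-running the others.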

5.

A data-driven approach to predicting diabetes and cardiovascular disease with machine learning.

Dinh, An; Miertschin, Stacey; Young, Amber; Mohanty, Somya D.

BMC Med Inform Decis Mak ; 19(1): 211, 2019 Nov 06.

Article in English | MEDLINE | ID: mdl-31694707

ABSTRACT

BACKGROUND: Diabetes and cardiovascular disease are two of the main causes of death in the United States. Identifying and predicting these diseases in patients is the first step towards stopping their progression. We evaluate the capabilities of machine learning models in detecting at-risk patients using survey data (and laboratory results), and identify key variables within the data contributing to these diseases among the patients. METHODS: Our research explores data-driven approaches which utilize supervised machine learning models to identify patients with such diseases. Using the National Health and Nutrition Examination Survey (NHANES) dataset, we conduct an exhaustive search of all available feature variables within the data to develop models for cardiovascular, prediabetes, and diabetes detection. Using different time-frames and feature sets for the data (based on laboratory data), multiple machine learning models (logistic regression, support vector machines, random forest, and gradient boosting) were evaluated on their classification performance. The models were then combined to develop a weighted ensemble model, capable of leveraging the performance of the disparate models to improve detection accuracy. Information gain of tree-based models was used to identify the key variables within the patient data that contributed to the detection of at-risk patients in each of the disease classes by the data-learned models. RESULTS: The developed ensemble model for cardiovascular disease (based on 131 variables) achieved an Area Under - Receiver Operating Characteristics (AU-ROC) score of 83.1% using no laboratory results, and 83.9% accuracy with laboratory results. In diabetes classification (based on 123 variables), eXtreme Gradient Boost (XGBoost) model achieved an AU-ROC score of 86.2% (without laboratory data) and 95.7% (with laboratory data). For pre-diabetic patients, the ensemble model had the top AU-ROC score of 73.7% (without laboratory data), and for laboratory-based data XGBoost performed the best at 84.4%. Top five predictors in diabetes patients were 1) waist size, 2) age, 3) self-reported weight, 4) leg length, and 5) sodium intake. For cardiovascular diseases the models identified 1) age, 2) systolic blood pressure, 3) self-reported weight, 4) occurrence of chest pain, and 5) diastolic blood pressure as key contributors. CONCLUSION: We conclude that machine-learned models based on a survey questionnaire can provide an automated identification mechanism for patients at risk of diabetes and cardiovascular diseases. We also identify key contributors to the prediction, which can be further explored for their implications on electronic health records.
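The weighted ensemble mentioned in the methods can be pictured as a weighted average of the individual models' predicted probabilities. The sketch below assumes that form; the weights and probabilities are hypothetical, and the abstract does not state how the study chose its weights.

```python
def weighted_ensemble(probabilities, weights):
    """Weighted average of per-model probabilities for one patient.

    probabilities: model name -> predicted probability of disease
    weights:       model name -> non-negative weight (hypothetical values)
    """
    total_weight = sum(weights[name] for name in probabilities)
    if total_weight == 0:
        raise ValueError("at least one model needs a positive weight")
    weighted_sum = sum(weights[name] * p for name, p in probabilities.items())
    return weighted_sum / total_weight


# Hypothetical outputs of the four model families named in the abstract.
patient_probabilities = {
    "logistic_regression": 0.6,
    "svm": 0.4,
    "random_forest": 0.8,
    "gradient_boosting": 0.9,
}
# Hypothetical weights favouring the tree-based models.
model_weights = {
    "logistic_regression": 1.0,
    "svm": 1.0,
    "random_forest": 2.0,
    "gradient_boosting": 2.0,
}
ensemble_probability = weighted_ensemble(patient_probabilities, model_weights)
```

One common design choice is to derive the weights from each model's validation performance, so that stronger models contribute more to the final score.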

Subjects

Cardiovascular Diseases/diagnosis , Cardiovascular Diseases/etiology , Diabetes Mellitus/diagnosis , Diabetes Mellitus/etiology , Machine Learning , Electronic Health Records/statistics & numerical data , Female , Humans , Logistic Models , Male , Nutrition Surveys , Predictive Value of Tests , ROC Curve , Support Vector Machine
