Results 1 - 13 of 13
1.
JMIR Ment Health ; 11: e50150, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38271138

ABSTRACT

BACKGROUND: Health care providers and health-related researchers face significant challenges when applying sentiment analysis tools to health-related free-text survey data. Most state-of-the-art applications were developed in domains such as social media, and their performance in the health care context remains relatively unknown. Moreover, existing studies indicate that these tools often lack accuracy and produce inconsistent results. OBJECTIVE: This study aims to address the lack of comparative analysis of sentiment analysis tools applied to health-related free-text survey data in the context of COVID-19. The objective was to automatically predict sentence sentiment for 2 independent COVID-19 survey data sets from the National Institutes of Health and Stanford University. METHODS: Gold standard labels were created for a subset of each data set using a panel of human raters. We compared 8 state-of-the-art sentiment analysis tools on both data sets to evaluate variability and disagreement across tools. In addition, few-shot learning was explored by fine-tuning Open Pre-Trained Transformers (OPT; a large language model [LLM] with publicly available weights) using a small annotated subset, and zero-shot learning was explored using ChatGPT (an LLM without available weights). RESULTS: The comparison of sentiment analysis tools revealed high variability and disagreement across the evaluated tools when applied to health-related survey data. OPT and ChatGPT demonstrated superior performance, outperforming all other sentiment analysis tools. Moreover, ChatGPT outperformed OPT, with accuracy higher by 6% and F-measure higher by 4% to 7%. CONCLUSIONS: This study demonstrates the effectiveness of LLMs, particularly the few-shot learning and zero-shot learning approaches, in the sentiment analysis of health-related survey data. These results have implications for saving human labor and improving efficiency in sentiment analysis tasks, contributing to advancements in the field of automated sentiment analysis.


Subjects
COVID-19 , Sentiment Analysis , United States/epidemiology , Humans , COVID-19/epidemiology , Health Surveys , Learning , Dissent and Disputes
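The abstract above compares tools against gold standard labels using accuracy and F-measure. As a minimal sketch of how such a comparison could be scored, the following assumes single-label sentence sentiment and a macro-averaged F-measure; the abstract does not state which averaging was used, and the function names and toy labels are hypothetical.

```python
def accuracy(gold, pred):
    """Fraction of sentences whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f_measure(gold, pred):
    """Unweighted mean of per-class F1 over the classes present in gold.

    Macro averaging is an assumption made for this sketch.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    scores = []
    for label in sorted(set(gold)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only.
gold = ["pos", "neg", "neu", "neg"]
pred = ["pos", "neg", "neg", "neg"]
print(accuracy(gold, pred), macro_f_measure(gold, pred))
```

Running the same two functions over each tool's predictions gives one accuracy and one F-measure per tool, which is the kind of table such a comparison would report.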
2.
JMIR Ment Health ; 10: e40899, 2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36525362

ABSTRACT

BACKGROUND: The COVID-19 pandemic and its associated restrictions have been a major stressor that has worsened mental health worldwide. Qualitative data play a unique role in documenting mental states through both language features and content. Text analysis methods can provide insights into the associations between language use and mental health and reveal relevant themes that emerge organically in open-ended responses. OBJECTIVE: The aim of this web-based longitudinal study on mental health during the early COVID-19 pandemic was to use text analysis methods to analyze free responses to the question, "Is there anything else you would like to tell us that might be important that we did not ask about?" Our goals were to determine whether individuals who responded to the item differed from nonresponders, to determine whether there were associations between language use and psychological status, and to characterize the content of responses and how responses changed over time. METHODS: A total of 3655 individuals enrolled in the study were asked to complete self-reported measures of mental health and COVID-19 pandemic-related questions every 2 weeks for 6 months. Of these 3655 participants, 2497 (68.32%) provided at least 1 free response (9741 total responses). We used various text analysis methods to measure the links between language use and mental health and to characterize response themes over the first year of the pandemic. RESULTS: Response likelihood was influenced by demographic factors and health status: those who were male, Asian, Black, or Hispanic were less likely to respond, and the odds of responding increased with age and education as well as with a history of physical health conditions. Although mental health treatment history did not influence the overall likelihood of responding, it was associated with more negative sentiment, negative word use, and higher use of first-person singular pronouns. Responses were dynamically influenced by psychological status such that distress and loneliness were positively associated with an individual's likelihood to respond at a given time point and were associated with more negativity. Finally, the responses were negative in valence overall and exhibited fluctuations linked with external events. The responses covered a variety of topics, with the most common being mental health and emotion, social or physical distancing, and policy and government. CONCLUSIONS: Our results identify trends in language use during the first year of the pandemic and suggest that both the content of responses and overall sentiments are linked to mental health.

3.
Int J Med Inform ; 162: 104739, 2022 Mar 16.
Article in English | MEDLINE | ID: mdl-35325663

ABSTRACT

BACKGROUND: The national increase in opioid use and misuse has become a public health crisis in the U.S. To tackle this crisis, the systematic evaluation and monitoring of opioid prescribing patterns is necessary. Thus, opioid prescriptions from electronic health records (EHRs) must be standardized to morphine milligram equivalents (MMEs) to facilitate monitoring and surveillance. While most studies report MMEs to describe opioid prescribing patterns, there is a lack of transparency regarding their data pre-processing and conversion processes for replication or comparison purposes. METHODS: In this work, we developed Opioid2MME, a SQL-based open-source framework, to convert opioid prescriptions to MMEs using EHR prescription data. The MME conversions were validated internally using F-measures through manual chart review and compared with two existing tools, MedEx and MedXN; the framework was also tested in an external academic EHR system. RESULTS: We identified 232,913 prescriptions for 49,060 unique patients in the EHRs, 2008-2019. We manually annotated a sample of prescriptions to assess the performance of the framework. The internal evaluation for medication information extraction achieved F-measures from 0.98 to 1.00 for each piece of the extracted information, outperforming MedEx and MedXN (F-scores 0.98 and 0.94, respectively). MME values in the internal EHR system obtained an F-measure of 0.97, with 3% of the data identified as outliers and 7% as missing values. The MME conversion in the external EHR system obtained 78.3% agreement with the MME values obtained at the development site. CONCLUSIONS: The results demonstrated that the framework is replicable and capable of converting opioid prescriptions to MMEs across different medical institutions. In summary, this work sets the groundwork for the systematic evaluation and monitoring of opioid prescribing patterns across healthcare systems.
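The abstract above does not give the conversion logic used by Opioid2MME, and the framework itself is SQL-based. Purely as an illustration of what an MME conversion involves, the sketch below uses the commonly cited formula (strength per unit × quantity ÷ days supply × conversion factor). The function name, the field names, and the small factor table are assumptions for this example; the factors shown are illustrative values and should be checked against the current CDC conversion table before any real use.

```python
# Illustrative conversion factors only (assumption, not taken from the abstract);
# verify against the current CDC table before relying on them.
MME_FACTORS = {
    "morphine": 1.0,
    "hydrocodone": 1.0,
    "oxycodone": 1.5,
    "codeine": 0.15,
}


def daily_mme(drug, strength_mg, quantity, days_supply):
    """Return the daily MME for one prescription, or None if the drug is unmapped.

    daily MME = strength per unit (mg) * units dispensed / days supply * factor
    """
    if days_supply <= 0:
        raise ValueError("days_supply must be positive")
    factor = MME_FACTORS.get(drug.lower())
    if factor is None:
        # Unmapped drugs are returned as missing rather than guessed.
        return None
    return strength_mg * quantity / days_supply * factor


# Hypothetical prescription: oxycodone 10 mg, 60 tablets, 30-day supply.
print(daily_mme("oxycodone", 10, 60, 30))
```

Returning `None` for unmapped drugs keeps missing values visible, which matters when a pipeline reports the share of missing or outlying MME values.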

4.
Front Artif Intell ; 5: 1051724, 2022.
Article in English | MEDLINE | ID: mdl-36714202

ABSTRACT

Objective: The adoption of electronic health records (EHRs) has produced enormous amounts of data, creating research opportunities in clinical data sciences. Several concept recognition systems have been developed to facilitate clinical information extraction from these data. While studies exist that compare the performance of many concept recognition systems, they are typically developed internally and may be biased due to different internal implementations, parameters used, and the limited number of systems included in the evaluations. The goal of this research is to evaluate the performance of existing systems to retrieve relevant clinical concepts from EHRs. Methods: We investigated six concept recognition systems: CLAMP, cTAKES, MetaMap, NCBO Annotator, QuickUMLS, and ScispaCy. Clinical concepts extracted included procedures, disorders, medications, and anatomical location. The system performance was evaluated on two datasets: the 2010 i2b2 and the MIMIC-III. Additionally, we assessed the performance of these systems in five challenging situations: negation, severity, abbreviation, ambiguity, and misspelling. Results: For clinical concept extraction, CLAMP achieved the best performance on exact and inexact matching, with F-scores of 0.70 and 0.94, respectively, on i2b2, and 0.39 and 0.50, respectively, on MIMIC-III. Across the five challenging situations, ScispaCy excelled in extracting abbreviation information (F-score: 0.86), followed by NCBO Annotator (F-score: 0.79). CLAMP performed best in extracting severity terms (F-score: 0.73), followed by NCBO Annotator (F-score: 0.68). CLAMP outperformed other systems in extracting negated concepts (F-score: 0.63). Conclusions: Several concept recognition systems exist to extract clinical information from unstructured data. This study provides an external evaluation by end-users of six commonly used systems across different extraction tasks. Our findings suggest that CLAMP provides the most comprehensive set of annotations for clinical concept extraction tasks and associated challenges. Comparing standard extraction tasks across systems provides guidance to other clinical researchers when selecting a concept recognition system relevant to their clinical information extraction task.
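The abstract above reports F-scores under exact and inexact matching without defining them. One common reading, assumed here, is that an exact match requires identical span boundaries and concept type, while an inexact match only requires overlapping spans of the same type. The sketch below scores predictions under both criteria; the tuple layout `(start, end, type)` and the example spans are hypothetical.

```python
def overlaps(a, b):
    """True if half-open character spans a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def f_score(gold, predicted, exact=True):
    """F-score of predicted concept spans against gold spans.

    Each span is a (start, end, concept_type) tuple. The matching criteria
    are an assumption for this sketch, not the study's stated definition.
    """
    def match(g, p):
        if g[2] != p[2]:
            return False
        return g[:2] == p[:2] if exact else overlaps(g, p)

    matched_pred = sum(any(match(g, p) for g in gold) for p in predicted)
    matched_gold = sum(any(match(g, p) for p in predicted) for g in gold)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


# Invented spans for illustration only.
gold = [(0, 12, "disorder"), (20, 27, "medication")]
pred = [(0, 12, "disorder"), (18, 27, "medication"), (30, 35, "procedure")]
print(f_score(gold, pred, exact=True), f_score(gold, pred, exact=False))
```

The second predicted span misses the gold boundary by two characters, so it counts only under inexact matching, which is why inexact scores are typically the higher of the two.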

5.
Front Psychol ; 12: 712111, 2021.
Article in English | MEDLINE | ID: mdl-34539512

ABSTRACT

COVID-19 has presented an unprecedented challenge to human welfare. Indeed, we have witnessed people experiencing a rise of depression, acute stress disorder, and worsening levels of subclinical psychological distress. Finding ways to support individuals' mental health has been particularly difficult during this pandemic. An opportunity for intervention to protect individuals' health & well-being is to identify the existing sources of consolation and hope that have helped people persevere through the early days of the pandemic. In this paper, we identified positive aspects, or "silver linings," that people experienced during the COVID-19 crisis using computational natural language processing methods and qualitative thematic content analysis. These silver linings revealed sources of strength that included finding a sense of community, closeness, gratitude, and a belief that the pandemic may spur positive social change. People's abilities to engage in benefit-finding and leverage protective factors can be bolstered and reinforced by public health policy to improve society's resilience to the distress of this pandemic and potential future health crises.

6.
Artif Intell Med ; 117: 102096, 2021 07.
Article in English | MEDLINE | ID: mdl-34127235

ABSTRACT

BACKGROUND: The Internet provides different tools for communicating with patients, such as social media (e.g., Twitter) and email platforms. These platforms provide new data sources to shed light on patient experiences with health care and improve our understanding of patient-provider communication. Several existing topic modeling and document clustering methods have been adapted to analyze these new free-text data automatically. However, both tweets and emails are often composed of short texts, and existing topic modeling and clustering approaches have suboptimal performance on these short texts. Moreover, research on health-related short texts using these methods has become difficult to reproduce and benchmark, partially due to the absence of a detailed comparison of state-of-the-art topic modeling and clustering methods on these short texts. METHODS: We trained eight state-of-the-art topic modeling and clustering algorithms on short texts from two health-related datasets (tweets and emails): Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), LDA with Gibbs Sampling (GibbsLDA), Online LDA, Biterm Model (BTM), Online Twitter LDA, and Gibbs Sampling for Dirichlet Multinomial Mixture (GSDMM), as well as the k-means clustering algorithm with two different feature representations: TF-IDF and Doc2Vec. We used cluster validity indices to evaluate the performance of topic modeling and clustering: two internal indices (i.e., assessing the goodness of a clustering structure without external information) and five external indices (i.e., comparing the results of a cluster analysis to externally provided class labels). RESULTS: Overall, for the number of clusters (k) from 2 to 50, Online Twitter LDA and GSDMM achieved the best performance in terms of internal indices, while LSI and k-means with TF-IDF had the highest external indices. Also, of all tweets (N = 286,971; HPV represents 94.6% of tweets and Lynch syndrome represents 5.4%), for k = 2, most of the methods could respect this initial clustering distribution. However, we found that model performance varies with the source of data and hyper-parameters such as the number of topics and the number of iterations used to train the models. We also conducted an error analysis using the Hamming loss metric, for which the poorest value was obtained by GSDMM on both datasets. CONCLUSIONS: Researchers hoping to group or classify health-related short-text data can expect to select the most suitable topic modeling and clustering methods for their specific research questions. Therefore, we presented a comparison of the most commonly used topic modeling and clustering algorithms over two health-related, short-text datasets using both internal and external clustering validation indices. Internal indices suggested Online Twitter LDA and GSDMM as the best, while external indices suggested LSI and k-means with TF-IDF as the best. In summary, our work suggests that researchers can improve their analysis of model performance by using a variety of metrics, since there is not a single best metric.


Subjects
Electronic Mail , Social Media , Cluster Analysis , Communication , Humans , Machine Learning
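The abstract above distinguishes internal from external cluster validity indices but does not name the ones used. As one example of an external index, the sketch below computes purity, which compares cluster assignments against known class labels; choosing purity is an assumption for illustration, and the labels are invented.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, class_labels):
    """External validity index: share of items in their cluster's majority class."""
    if len(cluster_ids) != len(class_labels) or not cluster_ids:
        raise ValueError("inputs must be non-empty and the same length")
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster].append(label)
    # For each cluster, count how many items carry its most frequent class.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(class_labels)


# Invented assignments for six short texts and two known classes.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["hpv", "hpv", "lynch", "lynch", "lynch", "hpv"]
print(purity(clusters, labels))
```

Purity rises trivially as the number of clusters grows, which is one reason to report several indices rather than a single one.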
7.
Article in English | MEDLINE | ID: mdl-35462884

ABSTRACT

The COVID-19 crisis has produced worldwide changes, from people's lifestyles to travel restrictions imposed by the world's nations aiming to keep the virus out. Several countries have created digital information applications, such as contact tracing apps, to help control and manage the COVID-19 crisis. The Peruvian government, in collaboration with several institutions, developed PerúEnTusManos, an epidemiological tracing application. The application uses georeferencing to study users' movements, creates individual mobility patterns for Peruvian citizens, and detects crowds. In this article, we present a process to detect possibly infected individuals based on probabilities assigned to people who had contact with someone who tested positive for COVID-19, using data collected from PerúEnTusManos. The preliminary evaluation shows promising results in estimating the probabilities of possibly infected individuals as well as identifying the most infected districts in Peru. The ultimate goal of the application in Peru is to provide reliable information to health authorities so that they can make informed decisions about the allocation of the available clinical tests and economic reactivation.

8.
Article in English | MEDLINE | ID: mdl-35463810

ABSTRACT

The adoption of electronic health records has increased the volume of clinical data, which has opened an opportunity for healthcare research. Several biomedical annotation systems have been used to facilitate the analysis of clinical data. However, there is a lack of clinical annotation comparisons to guide the selection of the most suitable tool for a specific clinical task. In this work, we used clinical notes from the MIMIC-III database and evaluated three annotation systems to identify four types of entities: (1) procedure, (2) disorder, (3) drug, and (4) anatomy. Our preliminary results demonstrate that BioPortal performs well when extracting disorder and drug entities. This can provide clinical researchers with real clinical insights into patients' health patterns and may allow the creation of a first version of an annotated dataset.

9.
Article in English | MEDLINE | ID: mdl-35463811

ABSTRACT

Twitter has become the most popular form of social interaction in the healthcare domain. Thus, various teams have evaluated Twitter as an additional source where patients share information about their healthcare, with the potential goal of improving their outcomes. Several existing topic modeling and document clustering applications have been adapted to assess tweets, showing that the performance of these applications is negatively affected by the nature and characteristics of tweets. Moreover, Twitter health research has become difficult to measure because of the absence of comparisons between the existing applications. In this paper, we perform an evaluation based on internal indices of different topic modeling and document clustering applications over two Twitter health-related datasets. Our results show that Online Twitter LDA and Gibbs LDA achieve better performance in extracting topics and grouping tweets. We provide this comparison so that health practitioners can select the most suitable application for their tasks.

10.
BMC Med Inform Decis Mak ; 18(Suppl 2): 55, 2018 07 23.
Article in English | MEDLINE | ID: mdl-30066655

ABSTRACT

BACKGROUND: There is strong scientific evidence linking obesity and overweight to the risk of various cancers and to cancer survivorship. Nevertheless, the existing online information about the relationship between obesity and cancer is poorly organized, not evidence-based, of poor quality, and confusing to health information consumers. A formal knowledge representation such as a Semantic Web knowledge base (KB) can help better organize and deliver quality health information. We previously presented the OC-2-KB (Obesity and Cancer to Knowledge Base), a software pipeline that can automatically build an obesity and cancer KB from scientific literature. In this work, we investigated crowdsourcing strategies to increase the number of ground truth annotations and improve the quality of the KB. METHODS: We developed a new release of the OC-2-KB system addressing key challenges in automatic KB construction. OC-2-KB automatically extracts semantic triples in the form of subject-predicate-object expressions from PubMed abstracts related to the obesity and cancer literature. The accuracy of the facts extracted from scientific literature heavily relies on both the quantity and quality of the available ground truth triples. Thus, we incorporated a crowdsourcing process to improve the quality of the KB. RESULTS: We conducted two rounds of crowdsourcing experiments using a new corpus with 82 obesity and cancer-related PubMed abstracts. We demonstrated that crowdsourcing is indeed a low-cost mechanism to collect labeled data from non-expert laypeople. Even though an individual layperson might not offer reliable answers, the collective wisdom of the crowd is comparable to expert opinions. We also retrained the relation detection machine learning models in OC-2-KB using the crowd-annotated data and evaluated the content of the curated KB with a set of competency questions. Our evaluation showed improved performance of the underlying relation detection model in comparison to the baseline OC-2-KB. CONCLUSIONS: We presented a new version of OC-2-KB, a system that automatically builds an evidence-based obesity and cancer KB from scientific literature. Our KB construction framework integrated automatic information extraction with crowdsourcing techniques to verify the extracted knowledge. Our ultimate goal is a paradigm shift in how the general public accesses, reads, digests, and uses online health information.


Subjects
Crowdsourcing , Knowledge Bases , Neoplasms , Obesity , Data Curation , Evidence-Based Medicine , Humans , Information Storage and Retrieval , Machine Learning , PubMed , Semantics , Software
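The abstract above states that the collective judgment of lay annotators can approach expert opinion, but it does not describe how individual answers were combined. A simple majority vote over subject-predicate-object triples is one possible aggregation, sketched below. The function name, the `min_votes` threshold, and the example triples are hypothetical and are not taken from OC-2-KB.

```python
from collections import Counter


def majority_vote(judgments, min_votes=3):
    """Aggregate crowd judgments per triple by simple majority.

    judgments maps a (subject, predicate, object) triple to a list of
    True/False votes. Triples with fewer than min_votes votes are skipped,
    and a tie counts as a rejection. Both rules are assumptions.
    """
    decisions = {}
    for triple, votes in judgments.items():
        if len(votes) < min_votes:
            continue
        counts = Counter(votes)
        decisions[triple] = counts[True] > counts[False]
    return decisions


# Invented triples and votes for illustration only.
crowd = {
    ("obesity", "increases_risk_of", "breast cancer"): [True, True, False],
    ("obesity", "treats", "cancer"): [False, False, True],
    ("exercise", "reduces", "obesity"): [True],
}
print(majority_vote(crowd))
```

Accepted triples could then serve as additional ground truth when retraining a relation detection model, in the spirit of the retraining step the abstract describes.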
11.
J Biomed Inform ; 80: 1-13, 2018 04.
Article in English | MEDLINE | ID: mdl-29462669

ABSTRACT

With the proliferation of heterogeneous health care data in the last three decades, biomedical ontologies and controlled biomedical terminologies play an increasingly important role in knowledge representation and management, data integration, natural language processing, as well as decision support for health information systems and biomedical research. Biomedical ontologies and controlled terminologies are intended to assure interoperability. Nevertheless, the quality of biomedical ontologies has hindered their applicability and subsequent adoption in real-world applications. Ontology evaluation is an integral part of ontology development and maintenance. In the biomedicine domain, ontology evaluation is often conducted by third parties as a quality assurance (or auditing) effort that focuses on identifying modeling errors and inconsistencies. In this work, we first organized four categorical schemes of ontology evaluation methods in the existing literature to create an integrated taxonomy. Further, to understand the ontology evaluation practice in the biomedicine domain, we reviewed a sample of 200 ontologies from the National Center for Biomedical Ontology (NCBO) BioPortal (the largest repository for biomedical ontologies) and observed that only 15 of these ontologies have documented evaluation in their corresponding inception papers. We then surveyed the recent quality assurance approaches for biomedical ontologies and their use. We also mapped these quality assurance approaches to the ontology evaluation criteria. We anticipate that ontology evaluation and quality assurance approaches will be more widely adopted in the development life cycle of biomedical ontologies.


Subjects
Biological Ontologies , Medical Informatics/standards , Electronic Health Records , Humans , Quality Assurance, Health Care , Semantics
12.
Article in English | MEDLINE | ID: mdl-29629236

ABSTRACT

Obesity has been linked to several types of cancer. Access to adequate health information activates people's participation in managing their own health, which ultimately improves their health outcomes. Nevertheless, the existing online information about the relationship between obesity and cancer is heterogeneous and poorly organized. A formal knowledge representation can help better organize and deliver quality health information. Currently, there are several efforts in the biomedical domain to convert unstructured data to structured data and store them in Semantic Web knowledge bases (KBs). In this demo paper, we present OC-2-KB (Obesity and Cancer to Knowledge Base), a system that is tailored to guide automatic KB construction for managing obesity and cancer knowledge from free-text scientific literature (i.e., PubMed abstracts) in a systematic way. OC-2-KB has two important modules, which perform the acquisition of entities and the extraction and then classification of relationships among these entities. We tested the OC-2-KB system on a data set with 23 manually annotated obesity and cancer PubMed abstracts and created a preliminary KB with 765 triples. We conducted a preliminary evaluation on this sample of triples and reported our evaluation results.

13.
Article in English | MEDLINE | ID: mdl-28503356

ABSTRACT

Obesity is associated with increased risks of various types of cancer, as well as a wide range of other chronic diseases. On the other hand, access to health information activates patient participation and improves health outcomes. However, existing online information on obesity and its relationship to cancer is heterogeneous, ranging from pre-clinical models and case studies to mere hypothesis-based scientific arguments. A formal knowledge representation (i.e., a semantic knowledge base) would help better organize and deliver the quality health information related to obesity and cancer that consumers need. Nevertheless, current ontologies describing obesity, cancer, and related entities are not designed to guide automatic knowledge base construction from heterogeneous information sources. Thus, in this paper, we present methods for named-entity recognition (NER) to extract biomedical entities from scholarly articles and for detecting whether two biomedical entities are related, with the long-term goal of building an obesity-cancer knowledge base. We leverage both linguistic and statistical approaches in the NER task, which surpasses the state-of-the-art results. Further, based on statistical features extracted from the sentences, our method for relation detection obtains an accuracy of 99.3% and an F-measure of 0.993.
