Results 1 - 20 of 4,371
1.
PeerJ Comput Sci ; 10: e2122, 2024.
Article in English | MEDLINE | ID: mdl-38983192

ABSTRACT

Grammar error correction systems are pivotal in the field of natural language processing (NLP), with a primary focus on identifying and correcting errors that compromise the grammatical integrity of written text. This is crucial for both language learning and formal communication. Recently, neural machine translation (NMT) has emerged as a promising and widely adopted approach. However, this approach faces significant challenges, particularly the scarcity of training data and the complexity of grammar error correction (GEC), especially for low-resource languages such as Indonesian. To address these challenges, we propose InSpelPoS, a confusion method that combines two synthetic data generation methods: the Inverted Spellchecker and Patterns+POS. Furthermore, we introduce an adapted seq2seq framework equipped with a dynamic decoding method and state-of-the-art Transformer-based neural language models to enhance the accuracy and efficiency of GEC. The dynamic decoding method is capable of navigating the complexities of GEC and correcting a wide range of errors, including contextual and grammatical errors. The proposed model leverages the contextual information of words and sentences to generate a corrected output. To assess the effectiveness of our proposed framework, we conducted experiments using synthetic data and compared its performance with existing GEC systems. The results demonstrate a significant improvement in the accuracy of Indonesian GEC compared to existing methods.
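
As a rough illustration of the seq2seq correction step described in this abstract (not the authors' InSpelPoS pipeline), the sketch below runs a text2text Transformer through the Hugging Face pipeline API; the checkpoint name is a hypothetical placeholder.

```python
# Minimal sketch of seq2seq grammar correction with a Transformer model.
# The model identifier below is a hypothetical placeholder, not the paper's model.
from transformers import pipeline

corrector = pipeline("text2text-generation", model="your-org/indonesian-gec-seq2seq")

sentence = "saya pergi pasar kemarin untuk beli buah buahan"
result = corrector(sentence, max_length=64)
print(result[0]["generated_text"])  # corrected sentence
```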

2.
PeerJ Comput Sci ; 10: e2063, 2024.
Article in English | MEDLINE | ID: mdl-38983191

ABSTRACT

Lack of an effective early sign language learning framework for a hard-of-hearing population can have traumatic consequences, causing social isolation and unfair treatment in workplaces. Alphabet and digit detection methods have been the basic framework for early sign language learning but are restricted by performance and accuracy, making it difficult to detect signs in real life. This article proposes an improved sign language detection method for early sign language learners based on the You Only Look Once version 8.0 (YOLOv8) algorithm, referred to as the intelligent sign language detection system (iSDS), which exploits the power of deep learning to detect sign language-distinct features. The iSDS method could overcome the false positive rates and improve the accuracy as well as the speed of sign language detection. The proposed iSDS framework for early sign language learners consists of three basic steps: (i) image pixel processing to extract features that are underrepresented in the frame, (ii) inter-dependence pixel-based feature extraction using YOLOv8, (iii) web-based signer independence validation. The proposed iSDS enables faster response times and reduces misinterpretation and inference delay time. The iSDS achieved state-of-the-art performance of over 97% for precision, recall, and F1-score with the best mAP of 87%. The proposed iSDS method has several potential applications, including continuous sign language detection systems and intelligent web-based sign recognition systems.
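
For readers unfamiliar with YOLOv8, a minimal sketch of training and running a detector with the ultralytics package is shown below; the dataset config, folder names, and hyperparameters are illustrative assumptions, not the iSDS setup.

```python
# Minimal YOLOv8 training/inference sketch with the ultralytics package.
# "sign_language.yaml" and "test_signs/" are hypothetical paths.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                        # pretrained nano checkpoint
model.train(data="sign_language.yaml", epochs=100, imgsz=640)
results = model.predict("test_signs/", conf=0.5)  # run detection on a folder
for r in results:
    print(r.boxes.cls, r.boxes.conf)              # predicted classes and scores
```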

3.
PeerJ Comput Sci ; 10: e2092, 2024.
Article in English | MEDLINE | ID: mdl-38983225

ABSTRACT

More sophisticated data access is possible with artificial intelligence (AI) techniques such as question answering (QA), but regulations and privacy concerns have limited their use. Federated learning (FL) addresses these problems and makes QA a viable AI application under such constraints. This research examines the use of hierarchical FL systems, along with an optimal method for developing client-specific adapters. The User Modified Hierarchical Federated Learning Model (UMHFLM) selects local models for users' tasks. The article employs a recurrent neural network (RNN) to automatically learn and categorize natural language questions into the appropriate templates. Local and global models are developed together, with the global model informing the local models, which are in turn combined for personalization. The method is applied in natural language processing pipelines for phrase matching employing template exact match, segmentation, and answer-type detection. SQuAD 2.0, a deep learning-oriented QA benchmark, together with complex SPARQL test questions and their accompanying SPARQL queries over the DBpedia dataset, was used to train and assess the model. Evaluated on SQuAD 2.0, the model identifies 38 distinct templates. Considering the top two most likely templates, the RNN model achieves template classification accuracies of 92.8% and 61.8% on the SQuAD 2.0 and QALD-7 datasets, respectively. A study on data scarcity among participants found that FL Match significantly outperformed BERT, with a MAP margin of 2.60% between BERT and FL Match at a 100% data ratio and an MRR margin of 7.23% at a 20% data ratio.

4.
PeerJ Comput Sci ; 10: e2138, 2024.
Article in English | MEDLINE | ID: mdl-38983234

ABSTRACT

The recent rapid growth in the number of Saudi female athletes and sports enthusiasts' presence on social media has exposed them to gender-hate speech and discrimination. Hate speech, a harmful worldwide phenomenon, can have severe consequences. Its prevalence in sports has surged alongside the growing influence of social media, with X serving as a prominent platform for the expression of hate speech and discriminatory comments, often targeting women in sports. This research combines two studies that explore online hate speech and gender biases in the context of sports, proposing an automated solution for detecting hate speech targeting women in sports on platforms like X, with a particular focus on Arabic, a challenging domain with limited prior research. In Study 1, semi-structured interviews with 33 Saudi female athletes and sports fans revealed common forms of hate speech, including gender-based derogatory comments, misogyny, and appearance-related discrimination. Building upon the foundations laid by Study 1, Study 2 addresses the pressing need for effective interventions to combat hate speech against women in sports on social media by evaluating machine learning (ML) models for identifying hate speech targeting women in sports in Arabic. A dataset of 7,487 Arabic tweets was collected, annotated, and pre-processed. Term frequency-inverse document frequency (TF-IDF) and part-of-speech (POS) feature extraction techniques were used, and various ML algorithms were trained; Random Forest consistently outperformed the other methods, achieving accuracies of 85% and 84% using TF-IDF and POS features, respectively, demonstrating the effectiveness of both feature sets in identifying Arabic hate speech. The research contribution advances the understanding of online hate targeting Arabic women in sports by identifying various forms of such hate. The systematic creation of a meticulously annotated Arabic hate speech dataset, specifically focused on women's sports, enhances the dataset's reliability and provides valuable insights for future research in countering hate speech against women in sports. This dataset forms a strong foundation for developing effective strategies to address online hate within the unique context of women's sports. The research findings contribute to the ongoing efforts to combat hate speech against women in sports on social media, aligning with the objectives of Saudi Arabia's Vision 2030 and recognizing the significance of female participation in sports.
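
A minimal sketch of the kind of TF-IDF + Random Forest pipeline described above is given below; the tiny inline dataset is a stand-in for the annotated Arabic tweet corpus.

```python
# TF-IDF features feeding a Random Forest classifier (toy data, not the study corpus).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

texts = ["example tweet one", "example tweet two", "example tweet three", "example tweet four"]
labels = [0, 1, 0, 1]  # 1 = hate speech, 0 = not

X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.5, random_state=42)
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), RandomForestClassifier(n_estimators=200))
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out split
```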

5.
Front Digit Health ; 6: 1387139, 2024.
Article in English | MEDLINE | ID: mdl-38983792

ABSTRACT

Introduction: Patient-reported outcomes measures (PROMs) are valuable tools for assessing health-related quality of life and treatment effectiveness in individuals with traumatic brain injuries (TBIs). Understanding the experiences of individuals with TBIs in completing PROMs is crucial for improving their utility and relevance in clinical practice. Methods: Sixteen semi-structured interviews were conducted with a sample of individuals with TBIs. The interviews were transcribed verbatim and analysed using Thematic Analysis (TA) and Natural Language Processing (NLP) techniques to identify themes and emotional connotations related to the experiences of completing PROMs. Results: The TA of the data revealed six key themes regarding the experiences of individuals with TBIs in completing PROMs. Participants expressed varying levels of understanding and engagement with PROMs, with factors such as cognitive impairments and communication difficulties influencing their experiences. Additionally, insightful suggestions emerged on the barriers to the completion of PROMs, the factors facilitating it, and the suggestions for improving their contents and delivery methods. The sentiment analyses performed using NLP techniques allowed for the retrieval of the general sentimental and emotional "tones" in the participants' narratives of their experiences with PROMs, which were mainly characterised by low positive sentiment connotations. Although mostly neutral, participants' narratives also revealed the presence of emotions such as fear and, to a lesser extent, anger. The combination of a semantic and sentiment analysis of the experiences of people with TBIs rendered valuable information on the views and emotional responses to different aspects of the PROMs. Discussion: The findings highlighted the complexities involved in administering PROMs to individuals with TBIs and underscored the need for tailored approaches to accommodate their unique challenges. Integrating TA-based and NLP techniques can offer valuable insights into the experiences of individuals with TBIs and enhance the interpretation of qualitative data in this population.
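
As a rough illustration of the sentiment-scoring step (the abstract does not specify which NLP toolkit was used), the sketch below scores a made-up interview excerpt with NLTK's VADER analyzer.

```python
# Rule-based sentiment scoring on an illustrative interview excerpt.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

excerpt = "Some of the questions were hard to follow, but the staff were very helpful."
scores = sia.polarity_scores(excerpt)
print(scores)  # negative, neutral, positive, and compound scores
```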

6.
Article in English | MEDLINE | ID: mdl-38980524

ABSTRACT

OBJECTIVE: Language used by providers in medical documentation may reveal evidence of race-related implicit bias. We aimed to use natural language processing (NLP) to examine whether the prevalence of stigmatizing language in emergency medicine (EM) encounter notes differs across patient race/ethnicity. METHODS: In a retrospective cohort of EM encounters, NLP techniques identified stigmatizing and positive themes. Logistic regression models analyzed the association of race/ethnicity and themes within notes. Outcomes were the presence (or absence) of 7 different themes: 5 stigmatizing (difficult, non-compliant, skepticism, substance abuse/seeking, and financial difficulty) and 2 positive (compliment and compliant). RESULTS: The sample included notes from 26,363 unique patients. Non-Hispanic (NH) Black patient notes were less likely to contain difficult (odds ratio (OR) 0.80, 95% confidence interval (CI), 0.73-0.88), skepticism (OR 0.87, 95% CI, 0.79-0.96), and substance abuse/seeking (OR 0.62, 95% CI, 0.56-0.70) compared to NH White patient notes but more likely to contain non-compliant (OR 1.26, 95% CI, 1.17-1.36) and financial difficulty (OR 1.14, 95% CI, 1.04-1.25). Hispanic patient notes were less likely to contain difficult (OR 0.68, 95% CI, 0.58-0.80) and substance abuse/seeking (OR 0.78, 95% CI, 0.66-0.93). NH NA/AI patient notes had twice the odds as NH White patient notes to contain a stigmatizing theme (OR 2.02, 95% CI, 1.64-2.49). CONCLUSIONS: Using an NLP model to analyze themes in EM notes across racial groups, we identified several inequities in the usage of positive and stigmatizing language. Interventions to minimize race-related implicit bias should be undertaken.
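
A minimal sketch of the kind of logistic regression that yields the odds ratios reported above is shown below; the toy data frame and variable names are illustrative, not the study data.

```python
# Logistic regression of theme presence on race/ethnicity, reported as odds ratios.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "theme_present": np.random.binomial(1, 0.3, 500),
    "race": np.random.choice(["NH White", "NH Black", "Hispanic"], 500),
})
model = smf.logit("theme_present ~ C(race, Treatment(reference='NH White'))", data=df).fit(disp=0)
odds_ratios = np.exp(model.params)     # ORs relative to NH White
conf_int = np.exp(model.conf_int())    # 95% CIs on the OR scale
print(pd.concat([odds_ratios, conf_int], axis=1))
```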

7.
JMIR Ment Health ; 11: e56569, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38958218

ABSTRACT

Unlabelled: Large language model (LLM)-powered services are gaining popularity in various applications due to their exceptional performance in many tasks, such as sentiment analysis and answering questions. Recently, research has been exploring their potential use in digital health contexts, particularly in the mental health domain. However, implementing LLM-enhanced conversational artificial intelligence (CAI) presents significant ethical, technical, and clinical challenges. In this viewpoint paper, we discuss 2 challenges that affect the use of LLM-enhanced CAI for individuals with mental health issues, focusing on the use case of patients with depression: the tendency to humanize LLM-enhanced CAI and their lack of contextualized robustness. Our approach is interdisciplinary, relying on considerations from philosophy, psychology, and computer science. We argue that the humanization of LLM-enhanced CAI hinges on the reflection of what it means to simulate "human-like" features with LLMs and what role these systems should play in interactions with humans. Further, ensuring the contextualization of the robustness of LLMs requires considering the specificities of language production in individuals with depression, as well as its evolution over time. Finally, we provide a series of recommendations to foster the responsible design and deployment of LLM-enhanced CAI for the therapeutic support of individuals with depression.


Subjects
Artificial Intelligence, Depression, Humans, Depression/psychology, Depression/therapy, Language, Communication, Humanism
8.
J Am Coll Radiol ; 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38960083

ABSTRACT

PURPOSE: We compared the performance of generative AI (G-AI, ATARI) and natural language processing (NLP) tools for identifying laterality errors in radiology reports and images. METHODS: We used an NLP-based (mPower) tool to identify radiology reports flagged for laterality errors in its QA Dashboard. The NLP model detects and highlights laterality mismatches in radiology reports. From an initial pool of 1124 radiology reports flagged by the NLP for laterality errors, we selected and evaluated 898 reports that encompassed radiography, CT, MRI, and ultrasound modalities to ensure comprehensive coverage. A radiologist reviewed each radiology report to assess if the flagged laterality errors were present (reporting error - true positive) or absent (NLP error - false positive). Next, we applied ATARI to 237 radiology reports and images with consecutive NLP true positive (118 reports) and false positive (119 reports) laterality errors. We estimated the accuracy of the NLP and G-AI tools to identify overall and modality-wise laterality errors. RESULTS: Among the 898 NLP-flagged laterality errors, 64% (574/898) had NLP errors and 36% (324/898) were reporting errors. The text query ATARI feature correctly identified the absence of laterality mismatch (NLP false positives) with a 97.4% accuracy (115/118 reports; 95% CI = 96.5%-98.3%). The combined vision and text query resulted in 98.3% accuracy (116/118 reports/images; 95% CI = 97.6%-99.0%), and the vision query alone also achieved 98.3% accuracy (116/118 images; 95% CI = 97.6%-99.0%). CONCLUSION: The generative AI-empowered ATARI prototype outperformed the assessed NLP tool for determining true and false laterality errors in radiology reports while enabling an image-based laterality determination. Underlying errors in the ATARI text query in complex radiology reports emphasize the need for further improvement in the technology.
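
As a toy illustration of what a laterality check over report text can look like (not the mPower or ATARI tools, which are far more sophisticated), the snippet below flags a left/right disagreement between the exam description and the findings.

```python
# Toy laterality-mismatch check between exam description and findings text.
import re

def laterality_terms(text):
    return set(re.findall(r"\b(left|right)\b", text.lower()))

exam = "XR RIGHT knee, 2 views"
findings = "Mild degenerative change of the left knee joint."

if laterality_terms(exam) and laterality_terms(findings) and \
        laterality_terms(exam) != laterality_terms(findings):
    print("Possible laterality mismatch: review report")
```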

9.
Radiol Bras ; 57: e20230096en, 2024.
Article in English | MEDLINE | ID: mdl-38993952

ABSTRACT

Objective: To develop a natural language processing application capable of automatically identifying, from radiology reports, benign gallbladder diseases that require surgery. Materials and Methods: We developed a text classifier to classify reports as describing benign diseases of the gallbladder that do or do not require surgery. We randomly selected 1,200 reports describing the gallbladder from our database, including different modalities. Four radiologists classified the reports as describing benign disease that should or should not be treated surgically. Two deep learning architectures were trained for classification: a convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) network. In order to represent words in vector form, the models included a Word2Vec representation, with dimensions of 300 or 1,000. The models were trained and evaluated by dividing the dataset into training, validation, and test subsets (80/10/10). Results: The CNN and BiLSTM performed well in both dimensional spaces. For the 300- and 1,000-dimensional spaces, respectively, the F1-scores were 0.95945 and 0.95302 for the CNN model, compared with 0.96732 and 0.96732 for the BiLSTM model. Conclusion: Our models achieved high performance, regardless of the architecture and dimensional space employed.
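
A minimal Keras sketch of a BiLSTM text classifier over word embeddings, similar in spirit to the architectures described above, is shown below; vocabulary size, sequence length, and layer widths are illustrative assumptions.

```python
# BiLSTM binary text classifier over an embedding layer (illustrative sizes).
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim, max_len = 20000, 300, 200
model = tf.keras.Sequential([
    layers.Embedding(vocab_size, embed_dim),  # pretrained Word2Vec weights could be loaded here
    layers.Bidirectional(layers.LSTM(128)),
    layers.Dense(1, activation="sigmoid"),    # surgical vs. non-surgical report
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.build(input_shape=(None, max_len))
model.summary()
```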


10.
Res Nurs Health ; 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38961672

ABSTRACT

The global prevalence of prediabetes is expected to reach 8.3% (587 million people) by 2045, with 70% of people with prediabetes developing diabetes during their lifetimes. We aimed to classify community-dwelling adults with a high risk for prediabetes based on prediabetes-related symptoms and to identify their characteristics, which might be factors associated with prediabetes. We analyzed homecare nursing records (n = 26,840) of 1628 patients aged over 20 years. Using a natural language processing algorithm, we classified each nursing episode as either low-risk or high-risk for prediabetes based on the detected number and category of prediabetes-symptom words. To identify differences between the risk groups, we employed t-tests, chi-square tests, and data visualization. Risk factors for prediabetes were identified using multiple logistic regression models with generalized estimating equations. A total of 3270 episodes (12.18%) were classified as potentially high-risk for prediabetes. There were significant differences in the personal, social, and clinical factors between groups. Results revealed that female sex, age, cancer coverage as part of homecare insurance coverage, and family caregivers were significantly associated with an increased risk of prediabetes. Although prediabetes is not a life-threatening disease, uncontrolled blood glucose can cause unfavorable outcomes for other major diseases. Thus, medical professionals should consider the associated symptoms and risk factors of prediabetes. Moreover, the proposed algorithm may support the detection of individuals at a high risk for prediabetes. Implementing this approach could facilitate proactive monitoring and early intervention, leading to reduced healthcare expenses and better health outcomes for community-dwelling adults.
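
A minimal sketch of a logistic regression fitted with generalized estimating equations to account for repeated nursing episodes per patient is given below; the toy data frame stands in for the homecare records.

```python
# Logistic GEE with an exchangeable working correlation over repeated episodes.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "high_risk": np.random.binomial(1, 0.12, 1000),
    "female": np.random.binomial(1, 0.6, 1000),
    "age": np.random.randint(20, 95, 1000),
    "patient_id": np.random.randint(0, 200, 1000),  # repeated episodes per patient
})
model = smf.gee("high_risk ~ female + age", groups="patient_id", data=df,
                family=sm.families.Binomial(),
                cov_struct=sm.cov_struct.Exchangeable()).fit()
print(np.exp(model.params))  # odds ratios
```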

11.
IEEE Open J Signal Process ; 5: 738-749, 2024.
Article in English | MEDLINE | ID: mdl-38957540

ABSTRACT

The ADReSS-M Signal Processing Grand Challenge was held at the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023. The challenge targeted difficult automatic prediction problems of great societal and medical relevance, namely, the detection of Alzheimer's Dementia (AD) and the estimation of cognitive test scores. Participants were invited to create models for the assessment of cognitive function based on spontaneous speech data. Most of these models employed signal processing and machine learning methods. The ADReSS-M challenge was designed to assess the extent to which predictive models built based on speech in one language generalise to another language. The language data compiled and made available for ADReSS-M comprised English, for model training, and Greek, for model testing and validation. To the best of our knowledge, no previous shared research task has investigated acoustic features of the speech signal or linguistic characteristics in the context of multilingual AD detection. This paper describes the context of the ADReSS-M challenge, its data sets, its predictive tasks, the evaluation methodology we employed, our baseline models and results, and the top five submissions. The paper concludes with a summary discussion of the ADReSS-M results, and our critical assessment of the future outlook in this field.

12.
BMC Med Inform Decis Mak ; 24(1): 192, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982465

ABSTRACT

BACKGROUND: As global aging intensifies, the prevalence of ocular fundus diseases continues to rise. In China, the strained doctor-to-patient ratio poses numerous challenges for the early diagnosis and treatment of ocular fundus diseases. To reduce the high risk of missed or misdiagnosed cases, avoid irreversible visual impairment for patients, and ensure good visual prognosis for patients with ocular fundus diseases, it is particularly important to enhance the growth and diagnostic capabilities of junior doctors. This study aims to leverage the value of electronic medical record data to develop a diagnostic intelligent decision support platform. This platform aims to assist junior doctors in diagnosing ocular fundus diseases quickly and accurately, expedite their professional growth, and prevent delays in patient treatment. An empirical evaluation will assess the platform's effectiveness in enhancing doctors' diagnostic efficiency and accuracy. METHODS: In this study, eight Chinese Named Entity Recognition (NER) models were compared, and the SoftLexicon-Glove-Word2vec model, achieving a high F1 score of 93.02%, was selected as the optimal recognition tool. This model was then used to extract key information from electronic medical records (EMRs) and generate feature variables based on diagnostic rule templates. Subsequently, an XGBoost algorithm was employed to construct an intelligent decision support platform for diagnosing ocular fundus diseases. The effectiveness of the platform in improving diagnostic efficiency and accuracy was evaluated through a controlled experiment comparing experienced and junior doctors. RESULTS: The use of the diagnostic intelligent decision support platform resulted in significant improvements in both diagnostic efficiency and accuracy for both experienced and junior doctors (P < 0.05). Notably, the gap in diagnostic speed and precision between junior doctors and experienced doctors narrowed considerably when the platform was used. Although the platform also provided some benefits to experienced doctors, the improvement was less pronounced compared to junior doctors. CONCLUSION: The diagnostic intelligent decision support platform established in this study, based on the XGBoost algorithm and NER, effectively enhances the diagnostic efficiency and accuracy of junior doctors in ocular fundus diseases. This has significant implications for optimizing clinical diagnosis and treatment.
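
A minimal sketch of training an XGBoost classifier on feature variables derived from extracted EMR entities is shown below; the synthetic features and class count are placeholders, not the platform's actual inputs.

```python
# XGBoost classifier over tabular feature variables (synthetic stand-in data).
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

X = np.random.rand(500, 20)        # e.g. rule-template feature variables
y = np.random.randint(0, 4, 500)   # e.g. four fundus disease categories
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```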


Assuntos
Oftalmologistas , Humanos , Tomada de Decisão Clínica , Registros Eletrônicos de Saúde/normas , Inteligência Artificial , China , Sistemas de Apoio a Decisões Clínicas
13.
Sci Rep ; 14(1): 16117, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38997332

ABSTRACT

Patient portal messages often relate to specific clinical phenomena (e.g., patients undergoing treatment for breast cancer) and, as a result, have received increasing attention in biomedical research. These messages require natural language processing and, while word embedding models, such as word2vec, have the potential to extract meaningful signals from text, they are not readily applicable to patient portal messages. This is because embedding models typically require millions of training samples to sufficiently represent semantics, while the volume of patient portal messages associated with a particular clinical phenomenon is often relatively small. We introduce a novel adaptation of the word2vec model, PK-word2vec (where PK stands for prior knowledge), for small-scale messages. PK-word2vec incorporates the most similar terms for medical words (including problems, treatments, and tests) and non-medical words from two pre-trained embedding models as prior knowledge to improve the training process. We applied PK-word2vec in a case study of patient portal messages in the Vanderbilt University Medical Center electronic health record system sent by patients diagnosed with breast cancer from December 2004 to November 2017. We evaluated the model through a set of 1000 tasks, each of which compared the relevance of a given word to a group of the five most similar words generated by PK-word2vec and a group of the five most similar words generated by the standard word2vec model. We recruited 200 Amazon Mechanical Turk (AMT) workers and 7 medical students to perform the tasks. The dataset was composed of 1389 patient records and included 137,554 messages with 10,683 unique words. Prior knowledge was available for 7981 non-medical and 1116 medical words. In over 90% of the tasks, both reviewers indicated PK-word2vec generated more similar words than standard word2vec (p = 0.01). The difference in the evaluation by AMT workers versus medical students was negligible for all comparisons of tasks' choices between the two groups of reviewers (p = 0.774 under a paired t-test). PK-word2vec can effectively learn word representations from a small message corpus, marking a significant advancement in processing patient portal messages.
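
A minimal gensim sketch of training standard word2vec on a small message corpus and querying nearest neighbours is shown below; the PK-word2vec prior-knowledge step is not reproduced, and the toy corpus is illustrative.

```python
# Standard word2vec training and nearest-neighbour query with gensim (toy corpus).
from gensim.models import Word2Vec

corpus = [
    ["pain", "after", "chemo", "session"],
    ["appointment", "for", "mammogram", "next", "week"],
    ["side", "effects", "of", "tamoxifen", "after", "chemo"],
]  # tokenized patient portal messages (toy examples)

model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, epochs=50)
print(model.wv.most_similar("chemo", topn=5))
```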


Subjects
Breast Neoplasms, Natural Language Processing, Patient Portals, Humans, Female, Semantics, Electronic Health Records
14.
Psychiatry Res ; 339: 116078, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-39003802

ABSTRACT

STUDY OBJECTIVES: Loneliness impacts the health of many older adults, yet effective and targeted interventions are lacking. Compared to surveys, speech data can capture the personalized experience of loneliness. In this proof-of-concept study, we used natural language processing to extract novel linguistic features and AI approaches to identify those features that distinguish lonely adults from non-lonely adults. METHODS: Participants completed UCLA loneliness scales and semi-structured interviews (sections: social relationships, loneliness, successful aging, meaning/purpose in life, wisdom, technology and successful aging). We used the Linguistic Inquiry and Word Count (LIWC-22) program to analyze linguistic features and built a classifier to predict loneliness. Each interview section was analyzed using an explainable AI (XAI) model to classify loneliness. RESULTS: The sample included 97 older adults (age 66-101 years, 65% women). The model achieved high performance (accuracy: 0.889, AUC: 0.8, F1: 0.8, recall: 1.0). The sections on social relationships and loneliness were most important for classifying loneliness. Social themes, conversational fillers, and pronoun usage were important features for classifying loneliness. CONCLUSIONS: XAI approaches can be used to detect loneliness through the analyses of unstructured speech and to better understand the experience of loneliness.
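
A simplified sketch of a classifier over LIWC-style linguistic features is shown below, using permutation importance as a basic stand-in for the explainable-AI analysis described in the abstract; the feature names and data are invented for illustration.

```python
# Classifier over linguistic features with permutation importance for explanation.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

np.random.seed(0)
feature_names = ["social_words", "fillers", "i_pronouns", "negative_emotion"]
X = np.random.rand(97, len(feature_names))  # one row per participant (toy data)
y = np.random.binomial(1, 0.4, 97)          # 1 = lonely, 0 = not lonely

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
imp = permutation_importance(clf, X_te, y_te, n_repeats=20, random_state=0)
for name, score in zip(feature_names, imp.importances_mean):
    print(f"{name}: {score:.3f}")
```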

15.
Health Aff Sch ; 2(7): qxae082, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38979103

ABSTRACT

Designing effective childhood vaccination counseling guidelines, public health campaigns, and school-entry mandates requires a nuanced understanding of the information ecology in which parents make vaccination decisions. However, evidence is lacking on how best to "catch the signal" about the public's attitudes, beliefs, and misperceptions. In this study, we characterize public sentiment and discourse about vaccinating children against SARS-CoV-2 with mRNA vaccines to identify prevalent concerns about the vaccine and to understand anti-vaccine rhetorical strategies. We applied computational topic modeling to 149,897 comments submitted to regulations.gov in October 2021 and February 2022 regarding the Food and Drug Administration's Vaccines and Related Biological Products Advisory Committee's emergency use authorization of the COVID-19 vaccines for children. We used a latent Dirichlet allocation topic modeling algorithm to generate topics and then used iterative thematic and discursive analysis to identify relevant domains, themes, and rhetorical strategies. Three domains emerged: (1) specific concerns about the COVID-19 vaccines; (2) foundational beliefs shaping vaccine attitudes; and (3) rhetorical strategies deployed in anti-vaccine arguments. Computational social listening approaches can contribute to misinformation surveillance and evidence-based guidelines for vaccine counseling and public health promotion campaigns.
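
A minimal sketch of latent Dirichlet allocation topic modeling over a handful of comments is given below; the comment list and topic count are illustrative, not the study corpus or configuration.

```python
# LDA topic modeling over a toy set of public comments.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

comments = [
    "concerned about long term safety data for children",
    "school mandates should be a parental choice",
    "vaccines protect kids and the community",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(comments)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {k}: {', '.join(top)}")
```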

16.
JAMIA Open ; 7(3): ooae060, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38962662

ABSTRACT

Objective: Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments and progression utilizing GPT-4, and compare its performance against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, and 2 rule-based and machine learning-based methods, namely, scispaCy and medspaCy. Materials and Methods: Phenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13,646 clinical notes for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model was evaluated against GPT-3.5-turbo, Flan-T5-xxl, Flan-T5-xl, Llama-3-8B, medspaCy, and scispaCy by comparing precision, recall, and micro-F1 scores. Results: GPT-4 achieved higher F1 score, precision, and recall compared to the Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, medspaCy, and scispaCy models. GPT-3.5-turbo performed similarly to GPT-4. The GPT, Flan-T5, and Llama models were not constrained by explicit rule requirements for contextual pattern recognition. The spaCy models relied on predefined patterns, leading to their suboptimal performance. Discussion and Conclusion: GPT-4 improves clinical phenotype identification due to its robust pre-training and remarkable pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text, and robust clinical phenotype extraction.
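
As a hedged illustration of LLM-based phenotype extraction (not the study's actual prompts), the sketch below sends a note excerpt to a GPT model via the OpenAI Python client (v1+); the prompt, note, and the OPENAI_API_KEY environment variable are assumptions.

```python
# Prompting a GPT model to extract a phenotype from an illustrative note excerpt.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
note = "Patient with stage IIIA NSCLC, status post chemoradiation, no evidence of recurrence."

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Extract the initial cancer stage from the note. Answer with the stage only."},
        {"role": "user", "content": note},
    ],
)
print(response.choices[0].message.content)
```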

18.
Res Sq ; 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38978609

ABSTRACT

The performance of deep learning-based natural language processing systems is based on large amounts of labeled training data which, in the clinical domain, are not easily available or affordable. Weak supervision and in-context learning offer partial solutions to this issue, particularly using large language models (LLMs), but their performance still trails traditional supervised methods with moderate amounts of gold-standard data. In particular, inferencing with LLMs is computationally heavy. We propose an approach leveraging fine-tuning LLMs and weak supervision with virtually no domain knowledge that still achieves consistently dominant performance. Using a prompt-based approach, the LLM is used to generate weakly-labeled data for training a downstream BERT model. The weakly supervised model is then further fine-tuned on small amounts of gold standard data. We evaluate this approach using Llama2 on three different n2c2 datasets. With no more than 10 gold standard notes, our final BERT models weakly supervised by fine-tuned Llama2-13B consistently outperformed out-of-the-box PubMedBERT by 4.7-47.9% in F1 scores. With only 50 gold standard notes, our models achieved close performance to fully fine-tuned systems.
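
A compact sketch of the two-stage idea is shown below: an LLM-style weak annotator labels notes, then a BERT classifier is fine-tuned on those weak labels. Here llm_label() is a hypothetical stub standing in for prompting a Llama2 model, and bert-base-uncased is a placeholder for the PubMedBERT baseline named in the abstract.

```python
# Weakly supervised fine-tuning: weak labels from a stub "LLM", then BERT training.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

def llm_label(note):
    """Hypothetical weak annotator: in practice, prompt an LLM and parse its answer."""
    return int("diabetes" in note.lower())  # trivial stand-in logic

notes = ["History of type 2 diabetes.", "No significant past medical history."] * 8
ds = Dataset.from_dict({"text": notes, "label": [llm_label(n) for n in notes]})

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
ds = ds.map(lambda b: tok(b["text"], truncation=True, padding="max_length", max_length=64),
            batched=True)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
args = TrainingArguments(output_dir="weak_bert", num_train_epochs=1,
                         per_device_train_batch_size=4)
Trainer(model=model, args=args, train_dataset=ds).train()
```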

19.
Eur Radiol ; 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38995378

ABSTRACT

OBJECTIVES: To compare the diagnostic accuracy of Generative Pre-trained Transformer (GPT)-4-based ChatGPT, GPT-4 with vision (GPT-4V) based ChatGPT, and radiologists in musculoskeletal radiology. MATERIALS AND METHODS: We included 106 "Test Yourself" cases from Skeletal Radiology between January 2014 and September 2023. We input the medical history and imaging findings into GPT-4-based ChatGPT and the medical history and images into GPT-4V-based ChatGPT, then both generated a diagnosis for each case. Two radiologists (a radiology resident and a board-certified radiologist) independently provided diagnoses for all cases. The diagnostic accuracy rates were determined based on the published ground truth. Chi-square tests were performed to compare the diagnostic accuracy of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and radiologists. RESULTS: GPT-4-based ChatGPT significantly outperformed GPT-4V-based ChatGPT (p < 0.001) with accuracy rates of 43% (46/106) and 8% (9/106), respectively. The radiology resident and the board-certified radiologist achieved accuracy rates of 41% (43/106) and 53% (56/106). The diagnostic accuracy of GPT-4-based ChatGPT was comparable to that of the radiology resident, but was lower than that of the board-certified radiologist although the differences were not significant (p = 0.78 and 0.22, respectively). The diagnostic accuracy of GPT-4V-based ChatGPT was significantly lower than those of both radiologists (p < 0.001 and < 0.001, respectively). CONCLUSION: GPT-4-based ChatGPT demonstrated significantly higher diagnostic accuracy than GPT-4V-based ChatGPT. While GPT-4-based ChatGPT's diagnostic performance was comparable to radiology residents, it did not reach the performance level of board-certified radiologists in musculoskeletal radiology. CLINICAL RELEVANCE STATEMENT: GPT-4-based ChatGPT outperformed GPT-4V-based ChatGPT and was comparable to radiology residents, but it did not reach the level of board-certified radiologists in musculoskeletal radiology. Radiologists should comprehend ChatGPT's current performance as a diagnostic tool for optimal utilization. KEY POINTS: This study compared the diagnostic performance of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and radiologists in musculoskeletal radiology. GPT-4-based ChatGPT was comparable to radiology residents, but did not reach the level of board-certified radiologists. When utilizing ChatGPT, it is crucial to input appropriate descriptions of imaging findings rather than the images.
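
A minimal sketch of the chi-square comparison of the two accuracy rates reported above (46/106 for GPT-4-based ChatGPT vs. 9/106 for GPT-4V-based ChatGPT) is given below.

```python
# Chi-square test comparing two diagnostic accuracy rates from the abstract.
from scipy.stats import chi2_contingency

table = [[46, 106 - 46],   # GPT-4-based ChatGPT: correct, incorrect
         [9, 106 - 9]]     # GPT-4V-based ChatGPT: correct, incorrect
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```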

20.
Neuroradiology ; 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38995393

ABSTRACT

PURPOSE: This study aimed to investigate the efficacy of fine-tuned large language models (LLM) in classifying brain MRI reports into pretreatment, posttreatment, and nontumor cases. METHODS: This retrospective study included 759, 284, and 164 brain MRI reports for the training, validation, and test datasets, respectively. Radiologists stratified the reports into three groups: nontumor (group 1), posttreatment tumor (group 2), and pretreatment tumor (group 3) cases. A pretrained Bidirectional Encoder Representations from Transformers Japanese model was fine-tuned using the training dataset and evaluated on the validation dataset. The model that demonstrated the highest accuracy on the validation dataset was selected as the final model. Two additional radiologists were involved in classifying the reports in the test dataset into the three groups. The model's performance on the test dataset was compared to that of the two radiologists. RESULTS: The fine-tuned LLM attained an overall accuracy of 0.970 (95% CI: 0.930-0.990). The model's sensitivity for group 1/2/3 was 1.000/0.864/0.978. The model's specificity for group 1/2/3 was 0.991/0.993/0.958. No statistically significant differences were found in terms of accuracy, sensitivity, and specificity between the LLM and the human readers (p ≥ 0.371). The LLM completed the classification task approximately 20-26-fold faster than the radiologists. The area under the receiver operating characteristic curve for discriminating groups 2 and 3 from group 1 was 0.994 (95% CI: 0.982-1.000), and for discriminating group 3 from groups 1 and 2 it was 0.992 (95% CI: 0.982-1.000). CONCLUSION: The fine-tuned LLM demonstrated performance comparable to that of radiologists in classifying brain MRI reports, while requiring substantially less time.
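
A minimal sketch of computing a ROC AUC such as the one reported for discriminating tumor reports (groups 2 and 3) from nontumor reports (group 1) is shown below; the labels and scores are invented for illustration.

```python
# ROC AUC for a binary discrimination task (illustrative labels and scores).
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                     # 1 = tumor report
y_score = np.array([0.1, 0.3, 0.9, 0.8, 0.7, 0.2, 0.95, 0.4])   # model probabilities
print(roc_auc_score(y_true, y_score))
```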
