Results 1 - 20 of 128
1.
Res Synth Methods ; 14(4): 608-621, 2023 Jul.
Article in English | MEDLINE | ID: covidwho-20241233

ABSTRACT

The laborious and time-consuming nature of systematic review production hinders the dissemination of up-to-date evidence synthesis. Well-performing natural language processing (NLP) tools for systematic reviews have been developed, showing promise to improve efficiency. However, the feasibility and value of these technologies have not been comprehensively demonstrated in a real-world review. We developed an NLP-assisted abstract screening tool that provides text inclusion recommendations, keyword highlights, and visual context cues. We evaluated this tool in a living systematic review on SARS-CoV-2 seroprevalence, conducting a quality improvement assessment of screening with and without the tool. We evaluated changes to abstract screening speed, screening accuracy, characteristics of included texts, and user satisfaction. The tool improved efficiency, reducing screening time per abstract by 45.9% and decreasing inter-reviewer conflict rates. The tool conserved precision of article inclusion (positive predictive value; 0.92 with tool vs. 0.88 without) and recall (sensitivity; 0.90 vs. 0.81). The summary statistics of included studies were similar with and without the tool. Users were satisfied with the tool (mean satisfaction score of 4.2/5). We evaluated an abstract screening process where one human reviewer was replaced with the tool's votes, finding that this maintained recall (0.92 one-person, one-tool vs. 0.90 two tool-assisted humans) and precision (0.91 vs. 0.92) while reducing screening time by 70%. Implementing an NLP tool in this living systematic review improved efficiency, maintained accuracy, and was well-received by researchers, demonstrating the real-world effectiveness of NLP in expediting evidence synthesis.


Subject(s)
COVID-19 , Natural Language Processing , Humans , Seroepidemiologic Studies , SARS-CoV-2 , Systematic Reviews as Topic
2.
BMC Public Health ; 23(1): 935, 2023 05 24.
Article in English | MEDLINE | ID: covidwho-20244505

ABSTRACT

BACKGROUND: The COVID-19 pandemic was a "wake-up" call for public health agencies. Often, these agencies are ill-prepared to communicate with target audiences clearly and effectively for community-level activations and safety operations. The obstacle is a lack of data-driven approaches to obtaining insights from local community stakeholders. Thus, this study suggests a focus on listening at local levels given the abundance of geo-marked data and presents a methodological solution to extracting consumer insights from unstructured text data for health communication. METHODS: This study demonstrates how to combine human and Natural Language Processing (NLP) machine analyses to reliably extract meaningful consumer insights from tweets about COVID and the vaccine. This case study employed Latent Dirichlet Allocation (LDA) topic modeling, Bidirectional Encoder Representations from Transformers (BERT) emotion analysis, and human textual analysis and examined 180,128 tweets scraped via the Twitter Application Programming Interface (API) keyword function from January 2020 to June 2021. The samples came from four medium-sized American cities with larger populations of people of color. RESULTS: The NLP method discovered four topic trends: "COVID Vaccines," "Politics," "Mitigation Measures," and "Community/Local Issues," and emotion changes over time. The human textual analysis profiled the discussions in the selected four markets to add some depth to our understanding of the uniqueness of the different challenges experienced. CONCLUSIONS: This study ultimately demonstrates that our method can efficiently reduce large volumes of community feedback (e.g., tweets, social media data) via NLP while ensuring contextualization and richness through human interpretation.
Recommendations on communicating vaccination are offered based on the findings: (1) the strategic objective should be empowering the public; (2) the message should have local relevance; and, (3) communication needs to be timely.
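Several of the listed studies (entries 2, 6, 7, and 8) rely on Latent Dirichlet Allocation (LDA) topic modeling. As a minimal illustrative sketch of the underlying idea, not any paper's actual implementation (which used larger corpora and library tooling), a collapsed Gibbs sampler for LDA can be written in pure Python; the toy hyperparameters and function name are assumptions:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = {w for d in docs for w in d}
    V = len(vocab)
    z = [[rng.randrange(k) for _ in d] for d in docs]      # topic of each token
    ndk = [[0] * k for _ in docs]                          # doc-topic counts
    nkw = [defaultdict(int) for _ in range(k)]             # topic-word counts
    nk = [0] * k                                           # tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                    # remove this token's assignment
                ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                # resample proportional to (doc-topic) * (topic-word) mass
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + V * beta)
                           for j in range(k)]
                r = rng.random() * sum(weights)
                t, acc = k - 1, 0.0
                for j, wt in enumerate(weights):
                    acc += wt
                    if r < acc:
                        t = j
                        break
                z[d][i] = t
                ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    top_words = {j: sorted(nkw[j], key=nkw[j].get, reverse=True)[:3] for j in range(k)}
    return z, top_words
```

In practice the studies above would have used library implementations (e.g., gensim or scikit-learn) with tuned topic counts; the sampler here only shows the mechanism behind the "topics" those abstracts report.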


Subject(s)
COVID-19 , Health Communication , Humans , COVID-19/epidemiology , COVID-19/prevention & control , Cities , Natural Language Processing , Pandemics/prevention & control , Public Health
3.
Sci Rep ; 13(1): 8591, 2023 05 26.
Article in English | MEDLINE | ID: covidwho-20241826

ABSTRACT

The ability to extract critical information about an infectious disease in a timely manner is vital for population health research. The lack of procedures for mining large amounts of health data is a major impediment. The goal of this research is to use natural language processing (NLP) to extract key information (clinical factors, social determinants of health) from free text. The proposed framework describes database construction, NLP modules for locating clinical and non-clinical (social determinants) information, and a detailed evaluation protocol for assessing results and demonstrating the effectiveness of the proposed framework. The use of COVID-19 case reports is demonstrated for data construction and pandemic surveillance. The proposed approach outperforms benchmark methods in F1-score by about 1-3%. A thorough examination reveals the disease's presence as well as the frequency of symptoms in patients. The findings suggest that prior knowledge gained through transfer learning can be useful when researching infectious diseases with similar presentations in order to accurately predict patient outcomes.
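The 1-3% F1-score margin reported above comes from an entity-level precision/recall evaluation. A minimal sketch of that kind of scoring (the `(span, label)` item representation and the function name are assumptions, not the paper's exact protocol):

```python
def prf1(predicted, gold):
    """Entity-level precision, recall, and F1 over sets of (span, label) items."""
    tp = len(predicted & gold)                      # exact span+label matches
    p = tp / len(predicted) if predicted else 0.0   # precision
    r = tp / len(gold) if gold else 0.0             # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0      # harmonic mean
    return p, r, f1
```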


Subject(s)
COVID-19 , Natural Language Processing , Humans , COVID-19/epidemiology , Electronic Health Records , Records , Pandemics
4.
AMIA Annu Symp Proc ; 2022: 313-322, 2022.
Article in English | MEDLINE | ID: covidwho-20238373

ABSTRACT

We investigated the utility of Twitter for conducting multi-faceted geolocation-centric pandemic surveillance, using India as an example. We collected over 4 million COVID-19-related tweets about the Indian outbreak between January and July 2021. We geolocated the tweets, applied natural language processing to characterize the tweets (e.g., identifying symptoms and emotions), and compared tweet volumes with the numbers of confirmed COVID-19 cases. Tweet numbers closely mirrored the outbreak, with the 7-day average strongly correlated with confirmed COVID-19 cases nationally (Spearman r=0.944; p=0.001), and also at the state level (Spearman r=0.84, p=0.0003). Fatigue, dyspnea, and cough were the top symptoms detected, while there was a significant increase in the proportion of tweets expressing negative emotions (e.g., fear and sadness). The surge in COVID-19 tweets was followed by an increased number of posts expressing concern about black fungus and oxygen supply. Our study illustrates the potential of social media for multi-faceted pandemic surveillance.
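The Spearman correlations above rank both series before correlating them, which makes the statistic robust to the heavy skew of daily count data. A self-contained sketch with average-rank tie handling (`spearman` is an illustrative helper, not the authors' code):

```python
def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
                j += 1                       # extend the run of tied values
            avg = (i + j) / 2 + 1            # average rank for the tie group
            for idx in order[i:j + 1]:
                r[idx] = avg
            i = j + 1
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

In practice one would call `scipy.stats.spearmanr`, which also returns the p-values quoted in the abstract.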


Subject(s)
COVID-19 , Social Media , COVID-19/epidemiology , Disease Outbreaks , Humans , Natural Language Processing , Pandemics
5.
Stud Health Technol Inform ; 302: 833-834, 2023 May 18.
Article in English | MEDLINE | ID: covidwho-2323866

ABSTRACT

Health information retrieval is the task of searching for health-related information from a variety of sources. Gathering self-reported health information may help enrich the body of knowledge about a disease and its symptoms. We investigated retrieving symptom mentions in COVID-19-related Twitter posts with a pretrained large language model (GPT-3) without providing any examples (zero-shot learning). We introduced a new performance measure of total match (TM) to include exact, partial, and semantic matches. Our results show that the zero-shot approach is a powerful method that requires no annotated data, and it can assist in generating instances for few-shot learning, which may achieve better performance.
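The total match (TM) measure described above credits exact, partial, and semantic matches between extracted and gold symptom spans. The sketch below approximates that idea with exact equality, substring overlap for partial matches, and a small synonym lexicon standing in for semantic matching; the paper's actual semantic test, the lexicon, and the function name are all assumptions:

```python
def total_match(predicted, gold, synonyms=None):
    """Fraction of predicted spans matching a gold span exactly,
    partially (substring either way), or via a synonym lexicon."""
    synonyms = synonyms or {}
    gold_norm = [g.lower().strip() for g in gold]
    matched = 0
    for p in predicted:
        p_norm = p.lower().strip()
        cands = {p_norm} | set(synonyms.get(p_norm, ()))
        if any(c == g or c in g or g in c for c in cands for g in gold_norm):
            matched += 1
    return matched / len(predicted) if predicted else 0.0
```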


Subject(s)
COVID-19 , Social Media , Humans , Language , Semantics , Natural Language Processing
6.
J Med Internet Res ; 24(11): e40160, 2022 11 18.
Article in English | MEDLINE | ID: covidwho-2310716

ABSTRACT

BACKGROUND: Dry January, a temporary alcohol abstinence campaign, encourages individuals to reflect on their relationship with alcohol by temporarily abstaining from consumption during the month of January. Though Dry January has become a global phenomenon, there has been limited investigation into Dry January participants' experiences. One means through which to gain insights into individuals' Dry January-related experiences is by leveraging large-scale social media data (eg, Twitter chatter) to explore and characterize public discourse concerning Dry January. OBJECTIVE: We sought to answer the following questions: (1) What themes are present within a corpus of tweets about Dry January, and is there consistency in the language used to discuss Dry January across multiple years of tweets (2020-2022)? (2) Do unique themes or patterns emerge in Dry January 2021 tweets after the onset of the COVID-19 pandemic? and (3) What is the association between tweet composition (ie, sentiment and human-authored vs bot-authored) and engagement with Dry January tweets? METHODS: We applied natural language processing techniques to a large sample of tweets (n=222,917) containing the term "dry january" or "dryjanuary" posted from December 15 to February 15 across three separate years of participation (2020-2022). Term frequency-inverse document frequency, k-means clustering, and principal component analysis were used for data visualization to identify the optimal number of clusters per year. Once data were visualized, we ran interpretation models to afford within-year (or within-cluster) comparisons. Latent Dirichlet allocation topic modeling was used to examine content within each cluster per given year. Valence Aware Dictionary and Sentiment Reasoner sentiment analysis was used to examine affect per cluster per year. The Botometer automated account check was used to determine average bot score per cluster per year.
Last, to assess user engagement with Dry January content, we took the average number of likes and retweets per cluster and ran correlations with other outcome variables of interest. RESULTS: We observed several similar topics per year (eg, Dry January resources, Dry January health benefits, updates related to Dry January progress), suggesting relative consistency in Dry January content over time. Although there was overlap in themes across multiple years of tweets, unique themes related to individuals' experiences with alcohol during the midst of the COVID-19 global pandemic were detected in the corpus of tweets from 2021. Also, tweet composition was associated with engagement, including number of likes, retweets, and quote-tweets per post. Bot-dominant clusters had fewer likes, retweets, or quote tweets compared with human-authored clusters. CONCLUSIONS: The findings underscore the utility for using large-scale social media, such as discussions on Twitter, to study drinking reduction attempts and to monitor the ongoing dynamic needs of persons contemplating, preparing for, or actively pursuing attempts to quit or cut down on their drinking.
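The pipeline above starts from term frequency-inverse document frequency (TF-IDF) vectors, which k-means then clusters. A minimal sketch of the weighting step (this unsmoothed variant zeroes out terms occurring in every document; real toolkits such as scikit-learn smooth the IDF term, and the function name here is an assumption):

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists -> list of {term: tf * idf} dicts."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for d in docs:
        df.update(set(d))
    out = []
    for d in docs:
        tf = Counter(d)
        out.append({t: (c / len(d)) * math.log(n / df[t]) for t, c in tf.items()})
    return out
```

K-means on these sparse vectors (typically via cosine distance) would then yield the per-year clusters that the study's LDA and VADER steps characterize.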


Subject(s)
COVID-19 , Social Media , Humans , Natural Language Processing , Infodemiology , Pandemics , COVID-19/epidemiology , Ethanol
7.
J Med Internet Res ; 24(7): e37142, 2022 07 13.
Article in English | MEDLINE | ID: covidwho-2309523

ABSTRACT

BACKGROUND: The COVID-19 pandemic has affected the lives of people globally for over 2 years. Changes in lifestyles due to the pandemic may cause psychosocial stressors for individuals and could lead to mental health problems. To provide high-quality mental health support, health care organizations need to identify COVID-19-specific stressors and monitor the trends in the prevalence of those stressors. OBJECTIVE: This study aims to apply natural language processing (NLP) techniques to social media data to identify the psychosocial stressors during the COVID-19 pandemic and to analyze the trend in the prevalence of these stressors at different stages of the pandemic. METHODS: We obtained a data set of 9266 Reddit posts from the subreddit r/COVID19_support, from February 14, 2020, to July 19, 2021. We used the latent Dirichlet allocation (LDA) topic model to identify the topics that were mentioned on the subreddit and analyzed the trends in the prevalence of the topics. Lexicons were created for each of the topics and were used to identify the topics of each post. The prevalences of topics identified by the LDA and lexicon approaches were compared. RESULTS: The LDA model identified 6 topics from the data set: (1) "fear of coronavirus," (2) "problems related to social relationships," (3) "mental health symptoms," (4) "family problems," (5) "educational and occupational problems," and (6) "uncertainty on the development of pandemic." According to the results, there was a significant decline in the number of posts about the "fear of coronavirus" after vaccine distribution started. This suggests that the distribution of vaccines may have reduced the perceived risks of coronavirus. The prevalence of discussions on the uncertainty about the pandemic did not decline with the increase in the vaccinated population.
In April 2021, when the Delta variant became prevalent in the United States, there was a significant increase in the number of posts about the uncertainty of pandemic development but no obvious effects on the topic of fear of the coronavirus. CONCLUSIONS: We created a dashboard to visualize the trend in the prevalence of topics about COVID-19-related stressors being discussed on a social media platform (Reddit). Our results provide insights into the prevalence of pandemic-related stressors during different stages of the COVID-19 pandemic. The NLP techniques leveraged in this study could also be applied to analyze event-specific stressors in the future.
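The lexicon approach described above tags a post with a topic whenever the post contains one of that topic's keywords. A minimal sketch (the toy lexicons and function name are assumptions, not the study's actual lexicons):

```python
def tag_topics(post, lexicons):
    """Return the topics whose keyword sets overlap the post's tokens.

    lexicons: {topic_name: set_of_keywords}; matching is case-insensitive
    and whitespace-tokenized, so multiword keywords are not handled here.
    """
    words = set(post.lower().split())
    return [topic for topic, keywords in lexicons.items() if words & keywords]
```

Counting the tagged posts per week then gives the prevalence trend lines that the study compares against the LDA-derived prevalences.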


Subject(s)
COVID-19 , Latent Class Analysis , Natural Language Processing , Pandemics , Social Media , Stress, Psychological , COVID-19/epidemiology , Humans , Mental Health/statistics & numerical data , Prevalence , SARS-CoV-2 , Stress, Psychological/epidemiology , United States/epidemiology
8.
Biomed Res Int ; 2023: 3728131, 2023.
Article in English | MEDLINE | ID: covidwho-2294565

ABSTRACT

Purpose: As a scientific field, bioinformatics has drawn remarkable attention from various fields, such as information technology, mathematics, and modern biological sciences, in recent years. The topic models originating from the field of natural language processing have become the focus of attention with the rapid accumulation of biological datasets. Thus, this research is aimed at modeling the topic content of the bioinformatics literature presented by Iranian researchers in the Scopus Citation Database. Methodology: This research was a descriptive-exploratory study, and the studied population included 3899 papers indexed in the Scopus database up to March 9, 2022. The topic modeling was then performed on the abstracts and titles of the papers. A combination of LDA and TF-IDF was utilized for topic modeling. Findings: The data analysis with topic modeling resulted in identifying seven main topics: "Molecular Modeling," "Gene Expression," "Biomarker," "Coronavirus," "Immunoinformatics," "Cancer Bioinformatics," and "Systems Biology." Moreover, "Systems Biology" and "Coronavirus" had the largest and smallest clusters, respectively. Conclusion: The present investigation demonstrated an acceptable performance for the LDA algorithm in classifying the topics included in this field. The extracted topic clusters indicated excellent consistency and topic connection with each other.


Subject(s)
Bibliometrics , Computational Biology , Iran , Computational Biology/methods , Natural Language Processing , Algorithms
9.
Public Health ; 218: 114-120, 2023 May.
Article in English | MEDLINE | ID: covidwho-2291388

ABSTRACT

OBJECTIVES: Mpox was declared a Public Health Emergency of International Concern by the World Health Organization on July 23, 2022. Since early May 2022, Mpox has been continuously reported in several endemic countries with alarming death rates. This led to several discussions and deliberations on the Mpox virus among the general public on social media and platforms such as health forums. This study proposes natural language processing techniques such as topic modeling to unearth the general public's perspectives and sentiments on growing Mpox cases worldwide. STUDY DESIGN: This was a detailed qualitative study using natural language processing on the user-generated comments from social media. METHODS: A detailed analysis using topic modeling and sentiment analysis on Reddit comments (n = 289,073) that were posted between June 1 and August 5, 2022, was conducted. While the topic modeling was used to infer major themes related to the health emergency and user concerns, the sentiment analysis was conducted to see how the general public responded to different aspects of the outbreak. RESULTS: The results revealed several interesting and useful themes, such as Mpox symptoms, Mpox transmission, international travel, government interventions, and homophobia, from the user-generated contents. The results further confirm that stigma and fear of the unknown nature of the Mpox virus are prevalent in almost all topics and themes unearthed. CONCLUSIONS: Analyzing public discourse and sentiments toward health emergencies and disease outbreaks is highly important. The insights that could be leveraged from user-generated comments on public forums such as social media may be important for community health intervention programs and infodemiology researchers. The findings from this study capture public perceptions and may enable quantifying the effectiveness of measures imposed by governmental administrations.
The themes unearthed may also help health policy researchers and decision-makers make informed, data-driven decisions.


Subject(s)
COVID-19 , Monkeypox , Social Media , Humans , COVID-19/epidemiology , Natural Language Processing , Monkeypox/epidemiology , Disease Outbreaks , Attitude
10.
Comput Methods Programs Biomed ; 233: 107474, 2023 May.
Article in English | MEDLINE | ID: covidwho-2305505

ABSTRACT

BACKGROUND AND OBJECTIVE: With the rapid development of information dissemination technology, the amount of event information contained in massive texts now far exceeds the intuitive cognition of humans, and it is hard to understand the progress of events in chronological order. Temporal information runs through the beginning, progression, and end of an event, and plays an important role in many natural language processing applications, such as information extraction, question answering, and text summarization. Accurately extracting temporal information from Chinese texts and automatically mapping the temporal expressions in natural language to the time axis are crucial to understanding the development of events and dynamic changes in them. METHODS: This study proposes a method integrating machine learning with linguistic features (IMLLF) for extraction and normalization of temporal expressions in Chinese texts to achieve the above objectives. Linguistic features are constructed by analyzing the expression rules of temporal information, and are combined with machine learning to map the natural language form of time onto a one-dimensional timeline. The web text dataset we build is divided into five parts for five-fold cross-validation, to compare the influence of different combinations of linguistic features and different methods. In the open medical dialog dataset, based on the training model obtained from the web text dataset, 200 disease descriptions are randomly selected each time for three rounds of experiments. RESULTS: The F1-score of multi-feature fusion is 95.2%, which is better than single-feature and double-feature combinations. The results of the experiments showed that the proposed IMLLF method can improve the accuracy of recognition of temporal information in Chinese to a greater extent than classical methods, with an F1-score of over 95% on both the web text dataset and the medical conversation dataset.
In terms of the normalization of time expressions, the accuracy of the IMLLF method is higher than 93%. CONCLUSIONS: IMLLF achieves better results in extracting and normalizing time expressions on the web text dataset and the medical conversation dataset, verifying the generality of IMLLF in identifying and quantifying temporal information. The IMLLF method can accurately map temporal information to the time axis, which makes it convenient for doctors to see at a glance when and what happened to a patient, and helps to support better medical decisions.
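The core of normalization is mapping each expression onto the timeline relative to an anchor date. IMLLF does this for Chinese expressions with learned linguistic features; the sketch below only illustrates the anchoring idea with a few English rules (the rule table, regex, and function name are assumptions, not the paper's method):

```python
import re
from datetime import date, timedelta

def normalize(text, doc_date):
    """Map simple temporal expressions onto the timeline anchored at doc_date.

    Handles three relative words and ISO-style absolute dates; returns
    (matched_expression, resolved_date) pairs in order of appearance.
    """
    rules = {"today": 0, "yesterday": -1, "tomorrow": 1}   # offsets in days
    out = []
    for m in re.finditer(r"\b(today|yesterday|tomorrow|\d{4}-\d{2}-\d{2})\b", text):
        tok = m.group(1)
        if tok in rules:
            out.append((tok, doc_date + timedelta(days=rules[tok])))
        else:
            y, mo, d = map(int, tok.split("-"))
            out.append((tok, date(y, mo, d)))
    return out
```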


Subject(s)
Electronic Health Records , Linguistics , Machine Learning , Humans , Natural Language Processing
11.
Sensors (Basel) ; 23(4)2023 Feb 06.
Article in English | MEDLINE | ID: covidwho-2274468

ABSTRACT

COVID-19 forced a number of changes in many areas of life, which resulted in an increase in human activity in cyberspace. Furthermore, the number of cyberattacks has increased. In such circumstances, detection, accurate prioritisation, and timely removal of critical vulnerabilities is of key importance for ensuring the security of various organisations. One of the most commonly used vulnerability assessment standards is the Common Vulnerability Scoring System (CVSS), which allows for assessing the degree of vulnerability criticality on a scale from 0 to 10. Unfortunately, not all detected vulnerabilities have defined CVSS base scores, or if they do, they are not always expressed using the latest standard (CVSS 3.x). In this work, we propose using machine learning algorithms to convert the CVSS vector from Version 2.0 to 3.x. We discuss in detail the individual steps of the conversion procedure, starting from data acquisition using vulnerability databases and Natural Language Processing (NLP) algorithms, to the vector mapping process based on the optimisation of ML algorithm parameters, and finally, the application of machine learning to calculate the CVSS 3.x vector components. The example calculations showed the effectiveness of the proposed method for the conversion of the CVSS 2.0 vector to the CVSS 3.x standard.
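One simple way to frame the CVSS 2.0 → 3.x conversion is as a set of per-metric classification problems: given the v2 vector components as features, predict each v3 component. The sketch below uses a frequency-table "model" purely to illustrate that framing; the paper tuned real ML algorithms, and the abbreviated metric names here are assumptions:

```python
from collections import Counter, defaultdict

def train_component_models(pairs):
    """pairs: list of (v2_components, v3_components) dicts.

    For each v3 metric, memorize the most frequent v3 value observed for
    each exact v2 feature combination -- a toy stand-in for the paper's
    optimised ML models. Returns a predict(v2, metric) function.
    """
    models = defaultdict(Counter)
    for v2, v3 in pairs:
        key = tuple(sorted(v2.items()))
        for metric, value in v3.items():
            models[(metric, key)][value] += 1

    def predict(v2, metric):
        counts = models.get((metric, tuple(sorted(v2.items()))))
        return counts.most_common(1)[0][0] if counts else None

    return predict
```

A real converter would generalize across unseen v2 combinations (e.g., via decision trees over the individual components) rather than memorizing exact vectors.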


Subject(s)
COVID-19 , Humans , Algorithms , Databases, Factual , Machine Learning , Natural Language Processing
12.
Front Public Health ; 11: 1063466, 2023.
Article in English | MEDLINE | ID: covidwho-2287550

ABSTRACT

Purpose: The COVID-19 pandemic has drastically disrupted global healthcare systems. With the higher demand for healthcare and misinformation related to COVID-19, there is a need to explore alternative models to improve communication. Artificial Intelligence (AI) and Natural Language Processing (NLP) have emerged as promising solutions to improve healthcare delivery. Chatbots could fill a pivotal role in the dissemination and easy accessibility of accurate information in a pandemic. In this study, we developed a multi-lingual NLP-based AI chatbot, DR-COVID, which responds accurately to open-ended, COVID-19 related questions. This was used to facilitate pandemic education and healthcare delivery. Methods: First, we developed DR-COVID with an ensemble NLP model on the Telegram platform (https://t.me/drcovid_nlp_chatbot). Second, we evaluated various performance metrics. Third, we evaluated multi-lingual text-to-text translation to Chinese, Malay, Tamil, Filipino, Thai, Japanese, French, Spanish, and Portuguese. We utilized 2,728 training questions and 821 test questions in English. Primary outcome measurements were (A) overall and top 3 accuracies; (B) Area Under the Curve (AUC), precision, recall, and F1 score. Overall accuracy referred to a correct response for the top answer, whereas top 3 accuracy referred to an appropriate response for any one answer amongst the top 3 answers. AUC and its related metrics were obtained from the Receiver Operating Characteristic (ROC) curve. Secondary outcomes were (A) multi-lingual accuracy; (B) comparison to enterprise-grade chatbot systems. The sharing of training and testing datasets on an open-source platform will also contribute to existing data. Results: Our NLP model, utilizing the ensemble architecture, achieved overall and top 3 accuracies of 0.838 [95% confidence interval (CI): 0.826-0.851] and 0.922 [95% CI: 0.913-0.932] respectively.
For overall and top 3 results, AUC scores of 0.917 [95% CI: 0.911-0.925] and 0.960 [95% CI: 0.955-0.964] were achieved respectively. We achieved multilingual coverage across nine non-English languages, with Portuguese performing the best overall at 0.900. Lastly, DR-COVID generated answers more accurately and quickly than other chatbots, within 1.12-2.15 s across the three devices tested. Conclusion: DR-COVID is a clinically effective NLP-based conversational AI chatbot, and a promising solution for healthcare delivery in the pandemic era.
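The overall vs. top 3 accuracy distinction above reduces to checking whether the gold answer appears among the first k ranked candidates. A minimal sketch (`topk_accuracy` is an illustrative helper, not the authors' evaluation code):

```python
def topk_accuracy(ranked_answers, gold, k=3):
    """ranked_answers: per-question candidate lists ordered by model
    confidence; gold: the correct answer per question. k=1 gives the
    'overall' accuracy, k=3 the 'top 3' accuracy."""
    hits = sum(1 for ranked, g in zip(ranked_answers, gold) if g in ranked[:k])
    return hits / len(gold)
```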


Subject(s)
COVID-19 , Deep Learning , Humans , Natural Language Processing , Artificial Intelligence , Pandemics , India
13.
Bioresour Technol ; 372: 128625, 2023 Mar.
Article in English | MEDLINE | ID: covidwho-2287473

ABSTRACT

Given the potential of machine learning algorithms in revolutionizing the bioengineering field, this paper examined and summarized the literature related to artificial intelligence (AI) in the bioprocessing field. Natural language processing (NLP) was employed to explore the direction of the research domain. All the papers from 2013 to 2022 with specific keywords of bioprocessing using AI were extracted from Scopus and grouped into two five-year periods, 2013-2017 and 2018-2022, where the past and recent research directions were compared. Based on this procedure, selected sample papers from the most recent five years were subjected to further review and analysis. The results show that 50% of the publications in the past five years focused on topics related to hybrid models, ANN, biopharmaceutical manufacturing, and biorefinery. The summarization and analysis of the outcomes indicated that implementing AI could improve the design and process engineering strategies in bioprocessing fields.


Subject(s)
Artificial Intelligence , Big Data , Machine Learning , Algorithms , Natural Language Processing
14.
J Am Med Inform Assoc ; 30(6): 1022-1031, 2023 05 19.
Article in English | MEDLINE | ID: covidwho-2265425

ABSTRACT

OBJECTIVE: To develop a computable representation for medical evidence and to contribute a gold standard dataset of annotated randomized controlled trial (RCT) abstracts, along with a natural language processing (NLP) pipeline for transforming free-text RCT evidence in PubMed into the structured representation. MATERIALS AND METHODS: Our representation, EvidenceMap, consists of 3 levels of abstraction: Medical Evidence Entity, Proposition and Map, to represent the hierarchical structure of medical evidence composition. Randomly selected RCT abstracts were annotated following EvidenceMap based on the consensus of 2 independent annotators to train an NLP pipeline. Via a user study, we measured how the EvidenceMap improved evidence comprehension and analyzed its representative capacity by comparing the evidence annotation with EvidenceMap representation and without following any specific guidelines. RESULTS: Two corpora including 229 disease-agnostic and 80 COVID-19 RCT abstracts were annotated, yielding 12 725 entities and 1602 propositions. EvidenceMap saves users 51.9% of the time compared to reading raw-text abstracts. Most evidence elements identified during the freeform annotation were successfully represented by EvidenceMap, and users gave the enrollment, study design, and study Results sections mean 5-scale Likert ratings of 4.85, 4.70, and 4.20, respectively. The end-to-end evaluations of the pipeline show that the evidence proposition formulation achieves F1 scores of 0.84 and 0.86 in the adjusted random index score. CONCLUSIONS: EvidenceMap extends the participant, intervention, comparator, and outcome framework into 3 levels of abstraction for transforming free-text evidence from the clinical literature into a computable structure. It can be used as an interoperable format for better evidence retrieval and synthesis and an interpretable representation to efficiently comprehend RCT findings.


Subject(s)
COVID-19 , Comprehension , Humans , Natural Language Processing , PubMed
15.
J Biomed Semantics ; 14(1): 1, 2023 01 31.
Article in English | MEDLINE | ID: covidwho-2264768

ABSTRACT

BACKGROUND: Information pertaining to mechanisms, management and treatment of disease-causing pathogens including viruses and bacteria is readily available from research publications indexed in MEDLINE. However, identifying the literature that specifically characterises these pathogens and their properties based on experimental research, important for understanding of the molecular basis of diseases caused by these agents, requires sifting through a large number of articles to exclude incidental mentions of the pathogens, or references to pathogens in other non-experimental contexts such as public health. OBJECTIVE: In this work, we lay the foundations for the development of automatic methods for characterising mentions of pathogens in scientific literature, focusing on the task of identifying research that involves the experimental study of a pathogen. There are no manually annotated pathogen corpora available for this purpose, while such resources are necessary to support the development of machine learning-based models. We therefore aim to fill this gap, producing a large data set automatically from MEDLINE under some simplifying assumptions for the task definition, and using it to explore automatic methods that specifically support the detection of experimentally studied pathogen mentions in research publications. METHODS: We developed a pathogen mention characterisation literature data set, READBiomed-Pathogens, automatically using NCBI resources, which we make available. Resources such as the NCBI Taxonomy, MeSH and GenBank can be used effectively to identify relevant literature about experimentally researched pathogens, more specifically using MeSH to link to MEDLINE citations including titles and abstracts with experimentally researched pathogens.
We experiment with several machine learning-based natural language processing (NLP) algorithms leveraging this data set as training data, to model the task of detecting papers that specifically describe experimental study of a pathogen. RESULTS: We show that our data set READBiomed-Pathogens can be used to explore natural language processing configurations for experimental pathogen mention characterisation. READBiomed-Pathogens includes citations related to organisms including bacteria, viruses, and a small number of toxins and other disease-causing agents. CONCLUSIONS: We studied the characterisation of experimentally studied pathogens in scientific literature, developing several natural language processing methods supported by an automatically developed data set. As a core contribution of the work, we presented a methodology to automatically construct a data set for pathogen identification using existing biomedical resources. The data set and the annotation code are made publicly available. Performance of the pathogen mention identification and characterisation algorithms was additionally evaluated on a small manually annotated data set, showing that the data set we have generated allows characterising pathogens of interest. TRIAL REGISTRATION: N/A.


Subject(s)
Algorithms , Natural Language Processing , Databases, Genetic , MEDLINE , Machine Learning
16.
J Biomed Inform ; 137: 104258, 2023 01.
Article in English | MEDLINE | ID: covidwho-2244784

ABSTRACT

Textual Emotion Detection (TED) is a rapidly growing area in Natural Language Processing (NLP) that aims to detect emotions expressed through text. In this paper, we provide a review of the latest research and development in TED as applied in health and medicine. We focus on medical and non-medical data types, use cases, and methods where TED has been integral in supporting decision-making. The application of NLP technologies in health, and particularly TED, requires high confidence that these technologies and technology-aided treatment will, first, do no harm. Therefore, this review also aims to assess the accuracy of TED systems and provide an update on the state of the technology. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines were used in this review. With a specific focus on the identification of different human emotions in text, the more general sentiment analysis studies that only recognize the polarity of text were excluded. A total of 66 papers met the inclusion criteria. This review found that TED in health and medicine is mainly used in the detection of depression, suicidal ideation, and the mental status of patients with asthma, Alzheimer's disease, cancer, and diabetes, with the major data sources being social media, healthcare services, and counseling centers. Approximately 44% of the research in the domain is related to COVID-19, investigating the public health response to vaccinations and the emotional response of the public. In most cases, deep learning-based NLP techniques were found to be preferred over other methods due to their superior performance. Developing methods for implementing and evaluating dimensional emotional models, resolving annotation challenges by utilizing health-related lexicons, and using deep learning techniques for multi-faceted and real-time applications were found to be among the main avenues for further development of TED applications in health.
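The simplest family of TED methods surveyed is lexicon-based: match tokens against per-emotion word lists and score each emotion by overlap. A minimal sketch, with an invented three-emotion lexicon fragment (real systems use published resources and, as the review notes, usually deep learning instead):

```python
# Invented mini-lexicon for illustration only; not a published resource.
EMOTION_LEXICON = {
    "sadness": {"sad", "hopeless", "grief", "lonely"},
    "fear": {"afraid", "scared", "worried", "panic"},
    "joy": {"happy", "relieved", "grateful", "hopeful"},
}

def detect_emotion(text):
    """Return the emotion with the most lexicon hits, or None if no hits."""
    tokens = set(text.lower().replace(",", " ").split())
    scores = {emo: len(tokens & words) for emo, words in EMOTION_LEXICON.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(detect_emotion("I feel so lonely and hopeless since the lockdown"))
```

Lexicon approaches are transparent and need no training data, which is why the review highlights health-related lexicons as one route around the annotation bottleneck, even though learned models dominate on accuracy.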


Subject(s)
COVID-19 , Humans , Natural Language Processing , Emotions
17.
Am J Obstet Gynecol MFM ; 5(3): 100834, 2023 03.
Article in English | MEDLINE | ID: covidwho-2227969

ABSTRACT

BACKGROUND: Maternal mental disorders are considered a leading complication of childbirth and a common contributor to maternal death. In addition to undermining maternal welfare, untreated postpartum psychopathology can result in child emotional and physical neglect and associated significant pediatric health costs. Some women may experience traumatic childbirth and develop posttraumatic stress disorder symptoms after delivery (childbirth-related posttraumatic stress disorder). Although women are routinely screened for postpartum depression in the United States, there is no recommended protocol to inform the identification of women who are likely to experience childbirth-related posttraumatic stress disorder. Advancements in computational methods of free text have shown promise in informing the diagnosis of psychiatric conditions. Although the language in narratives of stressful events has been associated with posttrauma outcomes, whether the narratives of childbirth processed via machine learning can be useful for childbirth-related posttraumatic stress disorder screening is unknown. OBJECTIVE: This study aimed to examine the use of written narrative accounts of personal childbirth experiences for the identification of women with childbirth-related posttraumatic stress disorder. To this end, we developed a model based on natural language processing and machine learning algorithms to identify childbirth-related posttraumatic stress disorder via the classification of birth narratives. STUDY DESIGN: Overall, 1127 eligible postpartum women who enrolled in a study survey during the COVID-19 pandemic provided short written childbirth narrative accounts in which they were instructed to focus on the most distressing aspects of their childbirth experience. They also completed a posttraumatic stress disorder symptom screen to determine childbirth-related posttraumatic stress disorder. After the exclusion criteria were applied, data from 995 participants were analyzed. 
A machine learning-based Sentence-Transformers natural language processing model was used to represent narratives as vectors that served as inputs for a neural network machine learning model developed in this study to identify participants with childbirth-related posttraumatic stress disorder. RESULTS: The machine learning model derived from natural language processing of childbirth narratives achieved good performance (area under the curve, 0.75; F1 score, 0.76; sensitivity, 0.8; specificity, 0.70). Moreover, women with childbirth-related posttraumatic stress disorder generated longer narratives (t test results: t=2.30; p=.02) and used more negative emotional expressions (Wilcoxon test: sadness: p=8.90e-04; W=31,017; anger: p=1.32e-02; W=35,005.50) and death-related words (Wilcoxon test: p=3.48e-05; W=34,538) in describing their childbirth experience than those with no childbirth-related posttraumatic stress disorder. CONCLUSION: This study provided proof of concept that personal childbirth narrative accounts generated in the early postpartum period and analyzed via advanced computational methods can detect with relatively high accuracy women who are likely to endorse childbirth-related posttraumatic stress disorder and those at low risk. This suggests that birth narratives could be promising for informing low-cost, noninvasive tools for maternal mental health screening, and more research using machine learning to predict early signs of maternal psychiatric morbidity is warranted.
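The pipeline shape here is embed-then-classify: each narrative becomes a fixed-length vector, and a classifier operates on the vectors. A dependency-free sketch of that shape, where a deterministic hashed bag-of-words stands in for the Sentence-Transformers embedder and a nearest-centroid rule stands in for the paper's neural network (all narratives below are invented):

```python
import hashlib
import math

DIM = 64

def embed(text):
    """Stand-in embedder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for tok in text.lower().split():
        # md5 gives a deterministic bucket, unlike Python's salted hash()
        vec[int(hashlib.md5(tok.encode()).hexdigest(), 16) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Invented toy narratives: label 1 = screen-positive, 0 = screen-negative.
TRAIN = [
    ("i thought i was going to die during the emergency", 1),
    ("the pain was terrifying and nobody listened to me", 1),
    ("the birth went smoothly and the staff were kind", 0),
    ("a calm delivery with good support from the midwife", 0),
]

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

CENTROIDS = {
    label: centroid([embed(t) for t, l in TRAIN if l == label])
    for label in (0, 1)
}

def classify(text):
    """Assign the label whose class centroid is most similar (dot product)."""
    v = embed(text)
    sims = {lab: sum(a * b for a, b in zip(v, c)) for lab, c in CENTROIDS.items()}
    return max(sims, key=sims.get)

print(classify("i thought i was going to die that night"))
```

With a real sentence embedder, semantically similar narratives land close together even without word overlap, which is what makes the approach viable on free-form accounts.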


Subject(s)
COVID-19 , Stress Disorders, Post-Traumatic , Pregnancy , Female , Humans , United States , Child , Stress Disorders, Post-Traumatic/diagnosis , Stress Disorders, Post-Traumatic/epidemiology , Stress Disorders, Post-Traumatic/psychology , Natural Language Processing , Pandemics , Delivery, Obstetric/psychology , COVID-19/complications
18.
PLoS One ; 18(2): e0281147, 2023.
Article in English | MEDLINE | ID: covidwho-2224478

ABSTRACT

The ongoing COVID-19 pandemic produced far-reaching effects throughout society, and science is no exception. The scale, speed, and breadth of the scientific community's COVID-19 response led to the emergence of new research at the remarkable rate of more than 250 papers published per day. This posed a challenge for the scientific community, as traditional methods of engagement with the literature were strained by the volume of new research being produced. Meanwhile, the urgency of the response led to an increasingly prominent role for preprint servers and a diffusion of relevant research through many channels simultaneously. These factors created a need for new tools to change the way scientific literature is organized and found by researchers. With this challenge in mind, we present an overview of COVIDScholar (https://covidscholar.org), an automated knowledge portal that uses natural language processing (NLP), built to meet these urgent needs. The search interface for this corpus of more than 260,000 research articles, patents, and clinical trials served more than 33,000 users, with an average of 2,000 monthly active users and a peak of more than 8,600 weekly active users in the summer of 2020. Additionally, we include an analysis of trends in COVID-19 research over the course of the pandemic with a particular focus on the first 10 months, which represents a unique period of rapid worldwide shift in scientific attention.
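At the core of any such portal is an index that maps terms to the documents containing them. A toy inverted-index sketch of keyword search over titles (illustrative only; COVIDScholar's actual pipeline adds NLP-based enrichment, ranking, and continuous ingestion):

```python
from collections import defaultdict

# Invented three-document corpus standing in for a literature collection.
DOCS = {
    1: "seroprevalence of sars-cov-2 antibodies in healthcare workers",
    2: "natural language processing for covid-19 literature triage",
    3: "clinical trial of antiviral treatment for covid-19",
}

# Build the inverted index: token -> set of document ids containing it.
index = defaultdict(set)
for doc_id, text in DOCS.items():
    for tok in text.split():
        index[tok].add(doc_id)

def search(query):
    """AND-query: return ids of documents containing every query token."""
    token_sets = [index.get(tok, set()) for tok in query.lower().split()]
    return sorted(set.intersection(*token_sets)) if token_sets else []

print(search("covid-19 treatment"))
```

The intersection semantics means every query token must appear in a hit, which keeps results precise at the cost of recall; production systems relax this with ranking rather than strict AND.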


Subject(s)
COVID-19 , Humans , Pandemics , Publications , Natural Language Processing
19.
J Biomed Semantics ; 14(1): 2, 2023 02 02.
Article in English | MEDLINE | ID: covidwho-2224300

ABSTRACT

BACKGROUND: Medical lexicons enable the natural language processing (NLP) of health texts. Lexicons gather terms and concepts from thesauri and ontologies, and linguistic data for part-of-speech (PoS) tagging, lemmatization or natural language generation. To date, there is no such type of resource for Spanish. CONSTRUCTION AND CONTENT: This article describes a unified medical lexicon for medical natural language processing in Spanish. MedLexSp includes terms and inflected word forms with PoS information and Unified Medical Language System (UMLS) semantic types, groups and Concept Unique Identifiers (CUIs). To create it, we used NLP techniques and domain corpora (e.g. MedlinePlus). We also collected terms from the Dictionary of Medical Terms from the Spanish Royal Academy of Medicine, the Medical Subject Headings (MeSH), the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT), the Medical Dictionary for Regulatory Activities Terminology (MedDRA), the International Classification of Diseases v. 10, the Anatomical Therapeutic Chemical Classification, the National Cancer Institute (NCI) Dictionary, the Online Mendelian Inheritance in Man (OMIM) and OrphaData. Terms related to COVID-19 were assembled by applying a similarity-based approach with word embeddings trained on a large corpus. MedLexSp includes 100 887 lemmas, 302 543 inflected forms (conjugated verbs, and number/gender variants), and 42 958 UMLS CUIs. We report two use cases of MedLexSp: first, applying the lexicon to pre-annotate a corpus of 1200 texts related to clinical trials; second, PoS tagging and lemmatizing texts about clinical cases. MedLexSp improved the scores for PoS tagging and lemmatization compared to the default Spacy and Stanza python libraries.
CONCLUSIONS: The lexicon is distributed in a delimiter-separated value file; an XML file with the Lexical Markup Framework; a lemmatizer module for the Spacy and Stanza libraries; and complementary Lexical Record (LR) files. The embeddings and code to extract COVID-19 terms, and the Spacy and Stanza lemmatizers enriched with medical terms are provided in a public repository.
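The basic way such a lexicon is consumed is lookup: map an inflected surface form to its lemma, PoS tag and UMLS CUI. A toy sketch of that lookup (the three entries and their CUIs below are invented placeholders, not taken from MedLexSp; the real resource covers over 300,000 inflected forms):

```python
# Hypothetical lexicon fragment: surface form -> (lemma, PoS, CUI).
# Entries and CUIs are illustrative placeholders, not real MedLexSp data.
LEXICON = {
    "diabetes": ("diabetes", "NOUN", "C0000001"),
    "diagnosticada": ("diagnosticar", "VERB", "C0000002"),
    "pulmones": ("pulmón", "NOUN", "C0000003"),
}

def annotate(tokens):
    """Attach (lemma, PoS, CUI) to each token; unknown tokens get UNK."""
    return [(tok, *LEXICON.get(tok.lower(), (tok, "UNK", None))) for tok in tokens]

for tok, lemma, pos, cui in annotate(["Diabetes", "diagnosticada", "ayer"]):
    print(tok, lemma, pos, cui)
```

Lexicon lookup of this kind is what lets MedLexSp plug into Spacy and Stanza as an enriched lemmatizer: domain forms absent from the libraries' default tables resolve correctly instead of falling through to heuristics.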


Subject(s)
COVID-19 , Natural Language Processing , Humans , Language , Vocabulary, Controlled , Unified Medical Language System , Semantics
20.
BMC Med Inform Decis Mak ; 23(1): 20, 2023 01 26.
Article in English | MEDLINE | ID: covidwho-2214579

ABSTRACT

BACKGROUND: Extracting relevant information about infectious diseases is an essential task. However, a significant obstacle in supporting public health research is the lack of methods for effectively mining large amounts of health data. OBJECTIVE: This study aims to use natural language processing (NLP) to extract the key information (clinical factors, social determinants of health) from published cases in the literature. METHODS: The proposed framework integrates a data layer for preparing a data cohort from clinical case reports; an NLP layer to find the clinical and demographic named entities and relations in the texts; and an evaluation layer for benchmarking performance and analysis. The focus of this study is to extract valuable information from COVID-19 case reports. RESULTS: The named entity recognition implementation in the NLP layer achieves a performance gain of about 1-3% compared to benchmark methods. Furthermore, even without extensive data labeling, the relation extraction method outperforms benchmark methods in accuracy (by 1-8%). A thorough examination reveals the prevalence of the disease and its symptoms in patients. CONCLUSIONS: A similar approach can be generalized to other infectious diseases. It is worthwhile to use prior knowledge acquired through transfer learning when researching other infectious diseases.
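To make the NLP layer's job concrete, here is a rule-based stand-in for named-entity extraction from a case-report sentence: pull out age, sex and a few symptom mentions. The patterns and symptom list are invented for illustration; the study itself used learned NER and relation-extraction models:

```python
import re

# Invented symptom vocabulary for the sketch.
SYMPTOMS = {"fever", "cough", "dyspnea", "fatigue"}

def extract(text):
    """Extract age, sex and symptom entities from one case-report sentence."""
    entities = {}
    m = re.search(r"(\d{1,3})-year-old (man|woman)", text, re.I)
    if m:
        entities["age"] = int(m.group(1))
        entities["sex"] = m.group(2).lower()
    entities["symptoms"] = sorted(
        s for s in SYMPTOMS if re.search(rf"\b{s}\b", text, re.I)
    )
    return entities

print(extract("A 54-year-old woman presented with fever and dry cough."))
```

Learned models replace these brittle patterns with contextual classification, which is where the reported 1-3% NER and 1-8% relation-extraction gains over benchmarks come from, but the input/output contract is the same: raw case text in, structured clinical and demographic entities out.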


Subject(s)
COVID-19 , Natural Language Processing , Humans , Publications