Predicting Emerging Themes in Rapidly Expanding COVID-19 Literature With Unsupervised Word Embeddings and Machine Learning: Evidence-Based Study.
Pal, Ridam; Chopra, Harshita; Awasthi, Raghav; Bandhey, Harsh; Nagori, Aditya; Sethi, Tavpritesh.
  • Pal R; Department of Computational Biology, Indraprastha Institute of Information Technology Delhi, New Delhi, India.
  • Chopra H; Maharaja Surajmal Institute of Technology, Guru Gobind Singh Indraprastha University, New Delhi, India.
  • Awasthi R; Department of Computational Biology, Indraprastha Institute of Information Technology Delhi, New Delhi, India.
  • Bandhey H; Department of Computational Biology, Indraprastha Institute of Information Technology Delhi, New Delhi, India.
  • Nagori A; Department of Computational Biology, Indraprastha Institute of Information Technology Delhi, New Delhi, India.
  • Sethi T; Council of Scientific & Industrial Research-Institute of Genomics and Integrative Biology, New Delhi, India.
J Med Internet Res; 24(11): e34067, 2022 Nov 02.
Article in English | MEDLINE | ID: covidwho-2098982
ABSTRACT

BACKGROUND:

Evidence from peer-reviewed literature is the cornerstone for designing responses to global threats such as COVID-19. In massive and rapidly growing corpora, such as COVID-19 publications, assimilating and synthesizing information is challenging. Leveraging a robust computational pipeline that evaluates multiple aspects, such as network topological features, communities, and their temporal trends, can make this process more efficient.

OBJECTIVE:

We aimed to show that new knowledge can be captured and tracked using the temporal change in the underlying unsupervised word embeddings of the literature. Furthermore, imminent themes can be predicted by applying machine learning to the evolving associations between words.

METHODS:

Frequently occurring medical entities were extracted from the abstracts of more than 150,000 COVID-19 articles published in the World Health Organization database, collected at monthly intervals starting from February 2020. Word embeddings trained on each month's literature were used to construct networks of entities with cosine similarities as edge weights. Topological features of the subsequent month's network were forecast based on prior patterns, and new links were predicted using supervised machine learning. Community detection and alluvial diagrams were used to track biomedical themes that evolved over the months.
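
The network construction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors, the `build_network` helper, and the 0.5 similarity threshold are all assumptions made for illustration, and a real pipeline would use embeddings trained on each month's abstracts.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_network(embeddings, threshold=0.5):
    """Return {(entity_a, entity_b): weight} for pairs whose cosine similarity
    meets the threshold. The threshold is a hypothetical choice, not a value
    taken from the study."""
    edges = {}
    for a, b in combinations(sorted(embeddings), 2):
        weight = cosine(embeddings[a], embeddings[b])
        if weight >= threshold:
            edges[(a, b)] = weight
    return edges


# Toy vectors standing in for one month's entity embeddings (illustrative only).
toy_embeddings = {
    "fever": [1.0, 0.0, 0.0],
    "cough": [0.9, 0.1, 0.0],
    "thrombosis": [0.0, 1.0, 0.0],
}

network = build_network(toy_embeddings, threshold=0.5)
for (a, b), weight in network.items():
    print(f"{a} -- {b}: {weight:.3f}")
```

Repeating this for each month yields a sequence of weighted networks; pairs that are absent in one month and present in the next are the "new links" a supervised model would be trained to predict.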

RESULTS:

We found that thromboembolic complications were detected as an emerging theme as early as August 2020. A shift toward the symptoms of long COVID complications was observed during March 2021, and neurological complications gained significance in June 2021. A prospective validation of the link prediction models achieved an area under the receiver operating characteristic curve of 0.87. Predictive modeling revealed predisposing conditions, symptoms, cross-infection, and neurological complications as dominant research themes in COVID-19 publications based on the patterns observed in previous months.
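
For readers unfamiliar with the metric, the area under the receiver operating characteristic curve for a link predictor can be computed as the fraction of (true link, non-link) pairs in which the true link receives the higher score. The sketch below uses made-up labels and scores purely for illustration; the `auroc` helper is hypothetical and does not reproduce the study's evaluation code or data.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison: the probability that a randomly chosen
    positive example is scored above a randomly chosen negative one
    (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative data: 1 = link appeared in the following month, 0 = it did not.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.3, 0.2]
toy_auc = auroc(toy_labels, toy_scores)
print(toy_auc)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of future links from non-links.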

CONCLUSIONS:

Machine learning-based prediction of emerging links can contribute toward steering research by capturing themes represented by groups of medical entities, based on patterns of semantic relationships over time.

Full text: Available Collection: International databases Database: MEDLINE Main subject: COVID-19 Type of study: Experimental Studies / Observational study / Prognostic study / Randomized controlled trials Topics: Long Covid Limits: Humans Language: English Journal: J Med Internet Res Journal subject: Medical Informatics Year: 2022 Document Type: Article Affiliation country: 34067
