Results 1 - 2 of 2
1.
PLOS Digit Health ; 3(4): e0000489, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38625843

ABSTRACT

The advent of patient access to complex medical information online has highlighted the need to simplify biomedical text so that patients can understand it and take ownership of their health. However, comprehending biomedical text remains difficult because it presumes domain-specific expertise. We aimed to study the simplification of biomedical text via large language models (LLMs), which are commonly used for general natural language processing tasks involving text comprehension, summarization, generation, and prediction of new text from prompts. Specifically, we fine-tuned three variants of large language models to substitute complex words and word phrases in biomedical text with a related hypernym. The output of the LLM substitution process was evaluated by comparing the pre- and post-substitution texts using four readability metrics and two measures of sentence complexity. A sample of 1,000 biomedical definitions from the National Library of Medicine's Unified Medical Language System (UMLS) was processed with three LLM approaches, and each showed an improvement in readability and sentence complexity after hypernym substitution. Readability scores improved from a collegiate reading level before processing to a US high-school level after processing. Comparison among the three LLMs showed that the GPT-J-6b approach yielded the largest improvement in measures of sentence complexity. This study demonstrates the merit of hypernym substitution for improving the readability of complex biomedical text for the public and highlights the use case for fine-tuning open-access large language models for biomedical natural language processing.
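The evaluation described above can be illustrated with a minimal sketch: score a biomedical sentence before and after hypernym substitution using the Flesch-Kincaid grade level, one of the standard readability metrics. The example sentences and the simple vowel-group syllable counter below are illustrative assumptions, not taken from the study, which used four readability metrics and UMLS definitions.

```python
import re

def count_syllables(word):
    """Approximate syllables as runs of vowel letters (minimum 1 per word)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fk_grade(text):
    """Flesch-Kincaid grade: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)

# Hypothetical pre/post pair: complex terms replaced with simpler hypernyms.
pre = "Myocardial infarction is characterized by ischemic necrosis of cardiac tissue."
post = "A heart attack is marked by the death of heart tissue."
print(f"pre:  grade {fk_grade(pre):.1f}")
print(f"post: grade {fk_grade(post):.1f}")
```

The grade level drops by roughly a full school stage after substitution, mirroring the collegiate-to-high-school shift the abstract reports.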

2.
Sci Rep ; 14(1): 2419, 2024 01 29.
Article in English | MEDLINE | ID: mdl-38287044

ABSTRACT

Scientific research is driven by the allocation of funding to different research projects based in part on the predicted scientific impact of the work. Data-driven algorithms can inform the allocation of scarce funding resources by identifying likely high-impact studies using bibliometrics. Rather than relying on standardized citation-based metrics alone, we used a machine learning pipeline that analyzes high-dimensional relationships among a range of bibliometric features to improve the accuracy of predicting high-impact research. Random forest classification models were trained on 28 bibliometric features calculated from a dataset of 1,485,958 publications in medicine to retrospectively predict whether a publication would become high-impact. For each random forest model, the balanced accuracy was above 0.95 and the area under the receiver operating characteristic curve was above 0.99. The high performance of our proposed models shows that machine learning is a promising technology for supporting funding decision-making in medical research.
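The approach above can be sketched in miniature: a random forest (here reduced to bagged depth-1 decision stumps, with no external dependencies) classifying publications as high-impact from bibliometric features. The two features, the labeling rule, and the dataset are synthetic stand-ins; the study used 28 features and roughly 1.5 million publications.

```python
import random

random.seed(0)

def make_dataset(n=400):
    """Synthetic records: (early citations, journal h-index) -> high-impact?"""
    data = []
    for _ in range(n):
        f0 = random.uniform(0, 10)  # stand-in for early citation count
        f1 = random.uniform(0, 10)  # stand-in for scaled journal h-index
        label = 1 if f0 + f1 > 10 else 0
        data.append(((f0, f1), label))
    return data

def best_stump(sample, feature):
    """Find the threshold/polarity on one feature that best splits the sample."""
    best = (0.0, 0.0, 1)  # (accuracy, threshold, polarity)
    for t in sorted({x[feature] for x, _ in sample}):
        for pol in (1, -1):
            correct = sum(1 for x, y in sample
                          if (1 if pol * (x[feature] - t) > 0 else 0) == y)
            acc = correct / len(sample)
            if acc > best[0]:
                best = (acc, t, pol)
    return feature, best[1], best[2]

def train_forest(data, n_trees=25):
    """Each tree: a stump fit on a bootstrap sample over a random feature."""
    forest = []
    for _ in range(n_trees):
        sample = [random.choice(data) for _ in data]  # bootstrap resample
        forest.append(best_stump(sample, random.randrange(2)))
    return forest

def predict(forest, x):
    """Majority vote across the ensemble."""
    votes = sum(1 if pol * (x[f] - t) > 0 else 0 for f, t, pol in forest)
    return 1 if votes * 2 > len(forest) else 0

data = make_dataset()
forest = train_forest(data)
acc = sum(1 for x, y in data if predict(forest, x) == y) / len(data)
print(f"training accuracy: {acc:.2f}")
```

In practice one would use a full implementation (e.g. scikit-learn's `RandomForestClassifier`) with deep trees, held-out evaluation, and balanced accuracy / ROC-AUC as in the paper; the sketch only shows the bootstrap-plus-vote structure.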


Subject(s)
Bibliometrics , Medicine , Retrospective Studies , Algorithms , Machine Learning