Dementia risk prediction using decision-focused content selection from medical notes.
Li, Shengyang; Dexter, Paul; Ben-Miled, Zina; Boustani, Malaz.
Affiliation
  • Li S; Department of Electrical and Computer Engineering, Indiana University-Purdue University Indianapolis (IUPUI), Indianapolis, IN, 46202, USA. Electronic address: sl137@iu.edu.
  • Dexter P; Indiana University School of Medicine, 340 W. 10th St, Indianapolis, IN, 46202, USA; Regenstrief Institute, Inc., 1101 W. 10th Street, Indianapolis, IN, 46202, USA. Electronic address: prdexter@iu.edu.
  • Ben-Miled Z; Department of Electrical and Computer Engineering, Indiana University-Purdue University Indianapolis (IUPUI), Indianapolis, IN, 46202, USA; Regenstrief Institute, Inc., 1101 W. 10th Street, Indianapolis, IN, 46202, USA. Electronic address: zmiled@iu.edu.
  • Boustani M; Indiana University School of Medicine, 340 W. 10th St, Indianapolis, IN, 46202, USA; Regenstrief Institute, Inc., 1101 W. 10th Street, Indianapolis, IN, 46202, USA. Electronic address: mboustan@iu.edu.
Comput Biol Med; 182: 109144, 2024 Sep 18.
Article in En | MEDLINE | ID: mdl-39298882
ABSTRACT
Several general-purpose language model (LM) architectures have been proposed with demonstrated improvement in text summarization and classification. Adapting these architectures to the medical domain requires additional considerations. For instance, the medical history of the patient is documented in the Electronic Health Record (EHR), which includes many medical notes drafted by healthcare providers. Direct processing of these notes may not be possible because the computational complexity of LMs imposes a limit on the length of the input text. Therefore, previous applications resorted to content selection using truncation or summarization of the text. Unfortunately, these text processing techniques may lead to information loss, redundancy, or irrelevance. In the present paper, a decision-focused content selection technique is proposed. The objective of this technique is to select a subset of sentences from the medical notes of a patient that are relevant to the target outcome over a predefined observation period. This decision-focused content selection methodology is then used to develop a dementia risk prediction model based on the Longformer LM architecture. The results show that the proposed framework delivers an AUC of 78.43 when the summary is restricted to 1024 tokens, outperforming previously proposed content selection techniques. This performance is notable given that the model estimates dementia risk with a one-year prediction horizon, relies on an observation period of only one year, and uses only medical notes without other EHR data modalities. Moreover, the proposed techniques overcome the limitation of machine learning models that use a tabular representation of the text by preserving contextual content, enable feature engineering from raw text, and circumvent the computational complexity of language models.
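
The general idea of selecting relevant sentences under a fixed token budget can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify how relevance is scored, so the keyword-based `keyword_score`, the whitespace token count, and the function name `select_sentences` are all hypothetical stand-ins chosen for illustration only.

```python
from typing import Callable, List


def select_sentences(
    sentences: List[str],
    score: Callable[[str], float],
    max_tokens: int = 1024,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedily keep the highest-scoring sentences that fit in the token budget."""
    # Rank sentence indices by descending relevance; ties keep note order.
    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
    chosen, used = [], 0
    for i in ranked:
        cost = count_tokens(sentences[i])
        if used + cost <= max_tokens:
            chosen.append(i)
            used += cost
    # Restore the original order so the selected text follows the notes.
    return [sentences[i] for i in sorted(chosen)]


# Hypothetical stand-in for a learned, outcome-specific relevance model.
KEYWORDS = {"memory", "confusion", "cognitive"}


def keyword_score(sentence: str) -> float:
    words = {w.strip(".,;:").lower() for w in sentence.split()}
    return float(len(words & KEYWORDS))


notes = [
    "Patient reports memory loss and confusion.",
    "Blood pressure is well controlled.",
    "Family notes cognitive decline over six months.",
]
# A tiny budget (13 whitespace tokens) stands in for the 1024-token limit.
summary = select_sentences(notes, keyword_score, max_tokens=13)
```

In this toy run the two sentences mentioning cognitive symptoms fill the budget and the unrelated one is dropped. A real pipeline would presumably count tokens with the Longformer tokenizer and replace the keyword score with a relevance measure tied to the prediction target.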

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: Comput Biol Med Year: 2024 Document type: Article Country of publication: United States
