A transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics.
Zhou, Hong-Yu; Yu, Yizhou; Wang, Chengdi; Zhang, Shu; Gao, Yuanxu; Pan, Jia; Shao, Jun; Lu, Guangming; Zhang, Kang; Li, Weimin.
  • Zhou HY; Department of Computer Science, The University of Hong Kong, Pokfulam, China.
  • Yu Y; Department of Computer Science, The University of Hong Kong, Pokfulam, China. yizhouy@acm.org.
  • Wang C; Department of Pulmonary and Critical Care Medicine, Med-X Center for Manufacturing, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China. chengdi_wang@scu.edu.cn.
  • Zhang S; AI Lab, Deepwise Healthcare, Beijing, China.
  • Gao Y; Guangzhou Laboratory, Guangzhou, China.
  • Pan J; Department of Computer Science, The University of Hong Kong, Pokfulam, China.
  • Shao J; Department of Pulmonary and Critical Care Medicine, Med-X Center for Manufacturing, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China.
  • Lu G; Department of Medical Imaging, Jinling Hospital, Nanjing University School of Medicine, Nanjing, China.
  • Zhang K; Zhuhai International Eye Center and Provincial Key Laboratory of Tumor Interventional Diagnosis and Treatment, Zhuhai People's Hospital and the First Affiliated Hospital of Faculty of Medicine, Macau University of Science and Technology and University Hospital, Guangdong, China. kang.zhang@gmail.com
  • Li W; Department of Big Data and Biomedical Artificial Intelligence, National Biomedical Imaging Center, College of Future Technology, Peking University, Beijing, China. kang.zhang@gmail.com.
Nat Biomed Eng; 7(6): 743-755, 2023 Jun.
Article in English | MEDLINE | ID: covidwho-20245377
ABSTRACT
During the diagnostic process, clinicians leverage multimodal information, such as the chief complaint, medical images and laboratory test results. Deep-learning models for aiding diagnosis have yet to meet this requirement of leveraging multimodal information. Here we report a transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner. Rather than learning modality-specific features, the model leverages embedding layers to convert images and unstructured and structured text into visual tokens and text tokens, and uses bidirectional blocks with intramodal and intermodal attention to learn holistic representations of radiographs, the unstructured chief complaint and clinical history, and structured clinical information such as laboratory test results and patient demographic information. The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary disease (by 12% and 9%, respectively) and in the prediction of adverse clinical outcomes in patients with COVID-19 (by 29% and 7%, respectively). Unified multimodal transformer-based models may help streamline the triaging of patients and facilitate the clinical decision-making process.
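The abstract's core idea — converting radiographs and structured/unstructured text into a shared token space and applying attention jointly, so that intramodal and intermodal interactions are learned in one pass — can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, dimensions, and random projections below are illustrative assumptions, showing only how joint self-attention over concatenated visual and text tokens covers both within-modality and cross-modality interactions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def unified_attention(visual_tokens, text_tokens, rng):
    """Single-head self-attention over the concatenation of visual and
    text tokens. Because every token attends to every other token, the
    same attention map captures intramodal (image-image, text-text) and
    intermodal (image-text) interactions -- the 'unified' processing the
    paper contrasts with modality-specific feature extractors.
    (Illustrative sketch: weights are random, not trained.)"""
    tokens = np.concatenate([visual_tokens, text_tokens], axis=0)  # (n, d)
    n, d = tokens.shape
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))  # rows sum to 1 over all tokens
    return attn @ v                       # holistic representation, (n, d)

rng = np.random.default_rng(0)
d = 16
visual = rng.standard_normal((4, d))  # e.g. 4 radiograph patch embeddings
text = rng.standard_normal((3, d))    # e.g. 3 tokens from chief complaint / labs
out = unified_attention(visual, text, rng)
print(out.shape)  # (7, 16): one updated representation per input token
```

The key design choice the abstract highlights is that no modality-specific encoder branches are needed once all inputs live in the same embedding space; the bidirectional attention blocks treat visual and text tokens uniformly.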
Full text: Available Collection: International databases Database: MEDLINE Main subject: COVID-19 Type of study: Prognostic study Limits: Humans Language: English Journal: Nat Biomed Eng Year: 2023 Document Type: Article Affiliation country: S41551-023-01045-x