Epidemiologic information discovery from open-access COVID-19 case reports via pretrained language model.
Wang, Zhizheng; Liu, Xiao Fan; Du, Zhanwei; Wang, Lin; Wu, Ye; Holme, Petter; Lachmann, Michael; Lin, Hongfei; Wong, Zoie S Y; Xu, Xiao-Ke; Sun, Yuanyuan.
  • Wang Z; College of Computer Science and Technology, Dalian University of Technology, Haishan Building No.2 Linggong Road, Dalian, Liaoning 116023, China.
  • Liu XF; Web Mining Laboratory, Department of Media and Communication, City University of Hong Kong, Hong Kong Special Administrative Region, China.
  • Du Z; WHO Collaborating Centre for Infectious Disease Epidemiology and Control, School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong Special Administrative Region, China.
  • Wang L; Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK.
  • Wu Y; Computational Communication Research Center and School of Journalism and Communication, Beijing Normal University, Beijing, China.
  • Holme P; Tokyo Tech World Research Hub Initiative (WRHI), Institute of Innovative Research, Tokyo Institute of Technology, Tokyo, Japan.
  • Lachmann M; Santa Fe Institute, Santa Fe, NM, USA.
  • Lin H; College of Computer Science and Technology, Dalian University of Technology, Haishan Building No.2 Linggong Road, Dalian, Liaoning 116023, China.
  • Wong ZSY; Graduate School of Public Health, St. Luke's International University, Tokyo, Japan.
  • Xu XK; College of Information and Communication Engineering, Dalian Minzu University, Liaoning, China.
  • Sun Y; College of Computer Science and Technology, Dalian University of Technology, Haishan Building No.2 Linggong Road, Dalian, Liaoning 116023, China.
iScience; 25(10): 105079, 2022 Oct 21.
Article in English | MEDLINE | ID: covidwho-2007782
ABSTRACT
Although open-access data are increasingly common and useful to epidemiological research, curating such datasets is resource-intensive and time-consuming. Regularly disclosed case reports are a major source of COVID-19 data, but they are often written in unstructured natural language. Here, we propose a computational framework that automatically extracts epidemiological information from open-access COVID-19 case reports. We develop this framework by coupling a language model built on deep neural networks with training samples compiled using an optimized data annotation strategy. When applied to COVID-19 case reports collected from mainland China, our framework outperforms all other state-of-the-art deep learning models. The information extracted by our approach is highly consistent with that obtained from gold-standard manual coding, with a matching rate of 80%. To disseminate our algorithm, we provide an open-access online platform that estimates key epidemiological statistics in real time with much less data-curation effort.
Full text: Available | Collection: International databases | Database: MEDLINE | Type of study: Case report / Observational study | Language: English | Journal: iScience | Year: 2022 | Document Type: Article | Affiliation country: J.isci.2022.105079
