Building an OMOP common data model-compliant annotated corpus for COVID-19 clinical trials.
Sun, Yingcheng; Butler, Alex; Stewart, Latoya A; Liu, Hao; Yuan, Chi; Southard, Christopher T; Kim, Jae Hyun; Weng, Chunhua.
  • Sun Y; Department of Biomedical Informatics, Columbia University, New York, NY, USA.
  • Butler A; Department of Biomedical Informatics, Columbia University, New York, NY, USA; Department of Medicine, Columbia University, New York, NY, USA.
  • Stewart LA; Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA.
  • Liu H; Department of Biomedical Informatics, Columbia University, New York, NY, USA.
  • Yuan C; Department of Biomedical Informatics, Columbia University, New York, NY, USA.
  • Southard CT; Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA.
  • Kim JH; Department of Biomedical Informatics, Columbia University, New York, NY, USA.
  • Weng C; Department of Biomedical Informatics, Columbia University, New York, NY, USA. Electronic address: chunhua@columbia.edu.
J Biomed Inform; 118: 103790, 2021 06.
Article in English | MEDLINE | ID: covidwho-1196724
ABSTRACT
Clinical trials are essential for generating reliable medical evidence, but often suffer from expensive and delayed patient recruitment because unstructured eligibility criteria descriptions prevent automatic query generation for eligibility screening. In response to the COVID-19 pandemic, many trials have been created, but their information is not computable. We included 700 COVID-19 trials available at the point of study and developed a semi-automatic approach to generate an annotated corpus for COVID-19 clinical trial eligibility criteria called COVIC. A hierarchical annotation schema based on the OMOP Common Data Model was developed to accommodate four levels of annotation granularity: study cohort, eligibility criteria, named entity, and standard concept. In COVIC, 39 trials with more than one study cohort were identified and labelled with an identifier for each cohort. 1,943 criteria for non-clinical characteristics such as "informed consent" and "exclusivity of participation" were annotated. 9,767 criteria were represented by 18,161 entities in 8 domains, 7,743 attributes of 7 attribute types, and 16,443 relationships of 11 relationship types. 17,171 entities were mapped to standard medical concepts and 1,009 attributes were normalized into computable representations. COVIC can serve as a corpus indexed by semantic tags for COVID-19 trial search and analytics, and as a benchmark for machine-learning-based criteria extraction.
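The four annotation levels named in the abstract (study cohort, eligibility criteria, named entity, standard concept) can be pictured as a nested data structure. The following is a minimal illustrative sketch only, not the COVIC file format or the authors' implementation: the class names, field names, the helper `mapped_fraction`, the example criteria, and the concept ID are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """Named-entity level: a text span tagged with a domain."""
    text: str
    domain: str                       # e.g. an OMOP domain such as "Condition"
    concept_id: Optional[int] = None  # standard-concept level; None if unmapped


@dataclass
class Criterion:
    """Eligibility-criteria level: one inclusion or exclusion statement."""
    text: str
    inclusion: bool
    entities: List[Entity] = field(default_factory=list)


@dataclass
class Cohort:
    """Study-cohort level: criteria grouped under a cohort identifier."""
    cohort_id: str
    criteria: List[Criterion] = field(default_factory=list)


def mapped_fraction(cohort: Cohort) -> float:
    """Share of a cohort's entities that carry a standard concept ID."""
    entities = [e for c in cohort.criteria for e in c.entities]
    if not entities:
        return 0.0
    return sum(e.concept_id is not None for e in entities) / len(entities)


# Made-up example data; 123456 is a placeholder, not a real OMOP concept ID.
example = Cohort(
    cohort_id="cohort-1",
    criteria=[
        Criterion(
            text="Confirmed COVID-19 infection",
            inclusion=True,
            entities=[Entity("COVID-19 infection", "Condition", concept_id=123456)],
        ),
        Criterion(
            text="Pregnant women",
            inclusion=False,
            entities=[Entity("Pregnant", "Condition")],  # left unmapped
        ),
    ],
)

print(mapped_fraction(example))
```

Nesting the levels this way keeps each entity attached to its criterion and cohort, so a corpus-level statistic such as the share of entities mapped to standard concepts reduces to a simple traversal.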

Full text: Available Collection: International databases Database: MEDLINE Main subject: Computer Simulation / Clinical Trials as Topic / Eligibility Determination / COVID-19 Type of study: Cohort study / Observational study / Prognostic study / Randomized controlled trials Limits: Humans Language: English Journal: J Biomed Inform Journal subject: Medical Informatics Year: 2021 Document Type: Article Affiliation country: J.jbi.2021.103790

