A comparative analysis of system features used in the TREC-COVID information retrieval challenge.
Chen, Jimmy S; Hersh, William R.
  • Chen JS; School of Medicine, Oregon Health & Science University, Portland, OR, USA. Electronic address: chenjim@ohsu.edu.
  • Hersh WR; Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA.
J Biomed Inform ; 117: 103745, 2021 05.
Article in English | MEDLINE | ID: covidwho-1163986
Preprint
This scientific journal article is probably based on a previously available preprint. It has been identified through a machine matching algorithm; human confirmation is still pending.
ABSTRACT
The COVID-19 pandemic has resulted in a rapidly growing quantity of scientific publications from journal articles, preprints, and other sources. The TREC-COVID Challenge was created to evaluate information retrieval (IR) methods and systems for this quickly expanding corpus. Using the COVID-19 Open Research Dataset (CORD-19), several dozen research teams participated across the 5 rounds of the TREC-COVID Challenge. While previous work has compared IR techniques used on other test collections, there are no studies that have analyzed the methods used by participants in the TREC-COVID Challenge. We manually reviewed team run reports from Rounds 2 and 5, extracted features from the documented methodologies, and used univariate and multivariate regression-based analyses to identify features associated with higher retrieval performance. We observed that fine-tuning datasets with relevance judgments, MS-MARCO, and CORD-19 document vectors was associated with improved performance in Round 2 but not in Round 5. Though the relatively decreased heterogeneity of runs in Round 5 may explain the lack of significance in that round, fine-tuning has been found to improve search performance in previous challenge evaluations by improving a system's ability to map relevant queries and phrases to documents. Furthermore, term expansion was associated with improvement in system performance, and the use of the narrative field in the TREC-COVID topics was associated with decreased system performance in both rounds. These findings emphasize the need for clear queries in search. While our study has some limitations in its generalizability and scope of techniques analyzed, we identified some IR techniques that may be useful in building search systems for COVID-19 using the TREC-COVID test collections.
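As an illustration of the kind of regression-based feature analysis the abstract describes, the following is a minimal sketch, not the authors' implementation. It assumes each run is coded with binary features and regressed against a retrieval score using ordinary least squares. The feature names, the scores, and the helper functions `solve_linear_system` and `fit_ols` are all hypothetical; the data are synthetic and do not come from the study.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix from copies so the inputs stay untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear features?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(features: Sequence[Sequence[int]], scores: Sequence[float]) -> List[float]:
    """Return [intercept, coef_1, ..., coef_k] via the normal equations."""
    rows = [[1.0] + [float(v) for v in f] for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical coding of runs: (fine_tuning, term_expansion, uses_narrative).
FEATURES = [(ft, te, nf) for ft in (0, 1) for te in (0, 1) for nf in (0, 1)]

# Synthetic scores generated from a known linear model, for illustration only.
SCORES = [0.5 + 0.10 * ft + 0.05 * te - 0.08 * nf for ft, te, nf in FEATURES]

# Multivariate model: all features enter together.
multivariate = fit_ols(FEATURES, SCORES)

# Univariate model: one feature at a time (here, fine_tuning only).
univariate_ft = fit_ols([[f[0]] for f in FEATURES], SCORES)
```

In this sketch a positive coefficient marks a feature associated with a higher score and a negative coefficient marks one associated with a lower score. Judging whether an association is significant, as the study does, would also require standard errors or p-values, which this example omits.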

Full text: Available
Collection: International databases
Database: MEDLINE
Main subject: Information Storage and Retrieval / Pandemics / COVID-19
Type of study: Experimental Studies
Limits: Humans
Language: English
Journal: J Biomed Inform
Journal subject: Medical Informatics
Year: 2021
Document Type: Article