How do we share data in COVID-19 research? A systematic review of COVID-19 datasets in PubMed Central Articles.
Zuo, Xu; Chen, Yong; Ohno-Machado, Lucila; Xu, Hua.
  • Zuo X; School of Biomedical Informatics, The University of Texas Health Science Center at Houston.
  • Chen Y; University of Pennsylvania.
  • Ohno-Machado L; University of California San Diego.
  • Xu H; School of Biomedical Informatics, The University of Texas Health Science Center at Houston.
Brief Bioinform ; 22(2): 800-811, 2021 03 22.
Article in English | MEDLINE | ID: covidwho-1343640
ABSTRACT

OBJECTIVE:

This study reviews novel coronavirus disease (COVID-19) datasets extracted from PubMed Central articles and provides a quantitative analysis of their contents, accessibility and citations.

METHODS:

We downloaded COVID-19-related full-text articles published up to 31 May 2020 from PubMed Central. Dataset URL links mentioned in the full-text articles were extracted, and each dataset was manually reviewed to provide information on 10 variables: (1) type of the dataset, (2) geographic region where the data were collected, (3) whether the dataset was immediately downloadable, (4) format of the dataset files, (5) where the dataset was hosted, (6) whether the dataset was updated regularly, (7) the type of license used, (8) whether the metadata were explicitly provided, (9) whether there was a PubMed Central paper describing the dataset and (10) the number of times the dataset was cited by PubMed Central articles. Descriptive statistics about these 10 variables were reported for all extracted datasets.
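The link-extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the regular expression, the function names `extract_urls` and `unique_links`, and the article IDs and URLs in the usage note are all assumptions made for the example. The abstract does not say how links were detected or how dataset links were told apart from other URLs, a step this sketch does not attempt.

```python
import re

# Assumed pattern (not from the paper): an http(s) URL runs until whitespace
# or a common closing delimiter such as a quote, bracket or parenthesis.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text):
    """Return URLs found in text, in order, with trailing punctuation stripped."""
    return [match.rstrip(".,;:") for match in URL_PATTERN.findall(text)]


def unique_links(articles):
    """Map each distinct URL to the set of article IDs that mention it.

    `articles` is a dict of {article_id: full_text}; both are hypothetical
    stand-ins for the downloaded PubMed Central full texts.
    """
    mentions = {}
    for article_id, text in articles.items():
        for url in extract_urls(text):
            mentions.setdefault(url, set()).add(article_id)
    return mentions
```

With two made-up articles, `unique_links({"A1": "Data at https://github.com/example/repo.", "A2": "We used https://github.com/example/repo, as before."})` groups both mentions under a single URL key, which is the kind of deduplication that would be needed before counting unique links and per-dataset citations.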

RESULTS:

We found that 28.5% of 12 324 COVID-19 full-text articles in PubMed Central provided at least one dataset link. In total, 128 unique dataset links were mentioned across these articles. Further analysis showed that epidemiological datasets accounted for the largest share (53.9%) of the dataset collection, and most datasets (84.4%) were available for immediate download. GitHub was the most popular repository for hosting COVID-19 datasets, and CSV, XLSX and JSON were the most popular data formats. Citation patterns also varied from dataset to dataset.
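To give a sense of scale, the reported percentages can be converted back into approximate counts. The abstract states only the percentages and the two totals (12 324 articles, 128 unique dataset links); the counts computed below are back-calculated estimates, not figures reported by the authors, and `approx_count` is a hypothetical helper.

```python
TOTAL_ARTICLES = 12324  # full-text articles, from the abstract
TOTAL_DATASETS = 128    # unique dataset links, from the abstract


def approx_count(percent, total):
    """Estimate a count from a reported percentage (rounded to the nearest integer)."""
    return round(percent / 100 * total)


# Back-calculated estimates; the abstract does not report these counts.
articles_with_link = approx_count(28.5, TOTAL_ARTICLES)    # about 3512 articles
epidemiological = approx_count(53.9, TOTAL_DATASETS)       # about 69 datasets
immediately_available = approx_count(84.4, TOTAL_DATASETS) # about 108 datasets
```

Because the published percentages are rounded to one decimal place, the article count in particular should be read as an approximation.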

CONCLUSION:

PubMed Central articles are an important source of COVID-19 datasets, but there is significant heterogeneity in the way these datasets are mentioned, shared, updated and cited.

Full text: Available Collection: International databases Database: MEDLINE Main subject: Information Dissemination / PubMed / Datasets as Topic / SARS-CoV-2 / COVID-19 Type of study: Observational study / Reviews / Systematic review/Meta Analysis Limits: Humans Language: English Journal: Brief Bioinform Journal subject: Biology / Medical Informatics Year: 2021 Document Type: Article
