ABSTRACT
Pathogen genomics is a critical tool for public health surveillance, infection control, outbreak investigations and research. To be useful, pathogen genomics data must be interpreted using contextual data (metadata). Contextual data include sample metadata, laboratory methods, patient demographics, clinical outcomes and epidemiological information. However, variability in how contextual information is captured by different authorities and encoded in different databases poses challenges for data interpretation, integration and use/re-use. The DataHarmonizer is a template-driven spreadsheet application for harmonizing, validating and transforming genomics contextual data into submission-ready formats for public or private repositories. The tool's web browser-based JavaScript environment enables validation, and its offline functionality and local installation increase data security. The DataHarmonizer was developed to address the data sharing needs that arose during the COVID-19 pandemic, and was used by members of the Canadian COVID Genomics Network (CanCOGeN) to harmonize SARS-CoV-2 contextual data for national surveillance and for public repository submission. To support coordination of international surveillance efforts, we have partnered with the Public Health Alliance for Genomic Epidemiology to also provide a template conforming to its SARS-CoV-2 contextual data specification for use worldwide. Templates are also being developed for One Health and foodborne pathogens. Overall, the DataHarmonizer tool improves the effectiveness and fidelity of contextual data capture as well as its subsequent usability. Harmonization of contextual information across authorities, platforms and systems globally improves interoperability and reusability of data for concerted public health and research initiatives to fight the current pandemic and future public health emergencies.
While the tool was initially developed for the COVID-19 pandemic, its expansion to other data management applications and pathogens is already underway.
Subject(s)
COVID-19, Humans, COVID-19/epidemiology, Pandemics, SARS-CoV-2/genetics, Canada, Genomics/methods
ABSTRACT
Genomic surveillance during the coronavirus disease 2019 (COVID-19) pandemic has been key to the timely identification of virus variants with important public health consequences, such as variants that can transmit among and cause severe disease in both vaccinated and recovered individuals. The rapid emergence of the Omicron variant highlighted the speed with which the extent of a threat must be assessed. Rapid sequencing and public health institutions' openness to sharing sequence data internationally give an unprecedented opportunity to do this; however, assessing the epidemiological and clinical properties of any new variant remains challenging. Here we highlight a "band of four" key data sources that can help to detect viral variants that threaten COVID-19 management: 1) genetic (virus sequence) data; 2) epidemiological and geographic data; 3) clinical and demographic data; and 4) immunization data. We emphasize the benefits that can be achieved by linking data from these sources, and in particular by combining the other three sources with virus sequence data. The considerable challenges of making genomic data available and linked with virus and patient attributes must be balanced against the major consequences of not doing so, especially if new variants of concern emerge and spread without timely detection and action.
ABSTRACT
COVID-19 was declared a pandemic in March 2020 by the World Health Organization. Timely sharing of viral genomic sequencing data accompanied by a minimal set of contextual data is essential for informing regional, national, and international public health responses. Such contextual data are also necessary for developing and improving clinical therapies and vaccines, and for enhancing the scientific community's understanding of the SARS-CoV-2 virus. The Canadian COVID-19 Genomics Network (CanCOGeN) was launched in April 2020 to coordinate and upscale existing genomics-based COVID-19 research and surveillance efforts. CanCOGeN is performing large-scale sequencing of the genomes of both SARS-CoV-2 virus samples (VirusSeq) and affected Canadians (HostSeq). This paper addresses the privacy concerns associated with sharing the viral sequence data with a pre-defined set of contextual data describing the sample source and case attributes of the sequence data in the Canadian context. Currently, the viral genome sequences are shared by provincial public health laboratories and their healthcare and academic partners with the Canadian National Microbiology Laboratory and with publicly accessible databases. However, data sharing delays and the provision of incomplete contextual data often occur because publicly releasing such data triggers privacy and data governance concerns. The CanCOGeN Ethics and Governance Expert Working Group has thus investigated several privacy issues cited by CanCOGeN data providers/stewards. This paper addresses these privacy concerns and offers insights primarily in the Canadian context, although similar privacy considerations also exist in other jurisdictions. We maintain that sharing viral sequencing data and its limited associated contextual data in the public domain generally does not pose insurmountable privacy challenges.
However, privacy risks associated with reidentification should be actively monitored due to advancements in reidentification methods and the evolving pandemic landscape. We also argue that during a global health emergency such as COVID-19, privacy should not be used as a blanket measure to prevent such genomic data sharing due to the significant benefits it provides towards public health responses and ongoing research activities.
ABSTRACT
One year into the global COVID-19 pandemic, the focus of attention has shifted to the emergence and spread of SARS-CoV-2 variants of concern (VOCs). After nearly a year of the pandemic with little evolutionary change affecting human health, several variants have now been shown to have substantial detrimental effects on transmission and severity of the virus. Public health officials, medical practitioners, scientists, and the broader community have since been scrambling to understand what these variants mean for diagnosis, treatment, and the control of the pandemic through nonpharmaceutical interventions and vaccines. Here we explore the evolutionary processes that are involved in the emergence of new variants, what we can expect in terms of the future emergence of VOCs, and what we can do to minimise their impact.