Results 1 - 12 of 12
1.
Microb Genom ; 10(6), 2024 Jun.
Article in English | MEDLINE | ID: mdl-38860884

ABSTRACT

As public health laboratories expand their genomic sequencing and bioinformatics capacity for the surveillance of different pathogens, labs must carry out robust validation, training, and optimization of wet- and dry-lab procedures. Achieving these goals for algorithms, pipelines and instruments often requires that lower quality datasets be made available for analysis and comparison alongside those of higher quality. This range of data quality in reference sets can complicate the sharing of sub-optimal datasets that are vital for the community and for the reproducibility of assays. Sharing of useful, but sub-optimal datasets requires careful annotation and documentation of known issues to enable appropriate interpretation, avoid being mistaken for better quality information, and for these data (and their derivatives) to be easily identifiable in repositories. Unfortunately, there are currently no standardized attributes or mechanisms for tagging poor-quality datasets, or datasets generated for a specific purpose, to maximize their utility, searchability, accessibility and reuse. The Public Health Alliance for Genomic Epidemiology (PHA4GE) is an international community of scientists from public health, industry and academia focused on improving the reproducibility, interoperability, portability, and openness of public health bioinformatic software, skills, tools and data. To address the challenges of sharing lower quality datasets, PHA4GE has developed a set of standardized contextual data tags, namely fields and terms, that can be included in public repository submissions as a means of flagging pathogen sequence data with known quality issues, increasing their discoverability. 
The contextual data tags were developed through consultations with the community including input from the International Nucleotide Sequence Data Collaboration (INSDC), and have been standardized using ontologies - community-based resources for defining the tag properties and the relationships between them. The standardized tags are agnostic to the organism and the sequencing technique used and thus can be applied to data generated from any pathogen using an array of sequencing techniques. The tags can also be applied to synthetic (lab created) data. The list of standardized tags is maintained by PHA4GE and can be found at https://github.com/pha4ge/contextual_data_QC_tags. Definitions, ontology IDs, examples of use, as well as a JSON representation, are provided. The PHA4GE QC tags were tested, and are now implemented, by the FDA's GenomeTrakr laboratory network as part of its routine submission process for SARS-CoV-2 wastewater surveillance. We hope that these simple, standardized tags will help improve communication regarding quality control in public repositories, in addition to making datasets of variable quality more easily identifiable. Suggestions for additional tags can be submitted to PHA4GE via the New Term Request Form in the GitHub repository. By providing a mechanism for feedback and suggestions, we also expect that the tags will evolve with the needs of the community.
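As an illustration of how such tags might travel with a submission, the following minimal Python sketch attaches a quality-control flag to a contextual data record and serializes it as JSON. The field name `quality_control_issues`, the example terms, and the sample identifier are hypothetical placeholders, not the PHA4GE vocabulary; the actual fields, terms and ontology IDs are those listed in the GitHub repository.

```python
import json

# Hypothetical controlled vocabulary; the real PHA4GE terms live in the repository.
ALLOWED_QC_TERMS = {"low coverage", "high contamination", "synthetic data"}


def tag_record(record, qc_terms):
    """Return a copy of `record` with validated QC terms attached."""
    unknown = sorted(set(qc_terms) - ALLOWED_QC_TERMS)
    if unknown:
        raise ValueError(f"terms not in controlled vocabulary: {unknown}")
    tagged = dict(record)  # copy so the caller's record is left untouched
    tagged["quality_control_issues"] = sorted(qc_terms)  # hypothetical field name
    return tagged


record = {"sample_id": "EXAMPLE-001", "organism": "SARS-CoV-2"}
tagged = tag_record(record, ["low coverage"])
print(json.dumps(tagged, indent=2))
```

Restricting the flag to a fixed vocabulary, rather than free text, is what would make flagged datasets searchable in a repository; a misspelled or ad hoc term is rejected before submission.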


Subject(s)
Computational Biology , Public Health , Quality Control , Humans , Computational Biology/methods , Information Dissemination/methods , Reproducibility of Results , Molecular Sequence Annotation/methods , Genomics/methods , Software
2.
ArXiv ; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38764594

ABSTRACT

The COVID-19 pandemic led to a large global effort to sequence SARS-CoV-2 genomes from patient samples to track viral evolution and inform public health response. Millions of SARS-CoV-2 genome sequences have been deposited in global public repositories. The Canadian COVID-19 Genomics Network (CanCOGeN - VirusSeq), a consortium tasked with coordinating expanded sequencing of SARS-CoV-2 genomes across Canada early in the pandemic, created the Canadian VirusSeq Data Portal, with associated data pipelines and procedures, to support these efforts. The goal of VirusSeq was to allow open access to Canadian SARS-CoV-2 genomic sequences and enhanced, standardized contextual data that were unavailable in other repositories and that meet FAIR standards (Findable, Accessible, Interoperable and Reusable). In addition, the Portal data submission pipeline contains data quality checking procedures and appropriate acknowledgement of data generators that encourages collaboration. From inception to execution, the portal was developed with a conscientious focus on strong data governance principles and practices. Extensive efforts ensured a commitment to Canadian privacy laws, data security standards, and organizational processes. This Portal has been coupled with other resources like Viral AI and was further leveraged by the Coronavirus Variants Rapid Response Network (CoVaRR-Net) to produce a suite of continually updated analytical tools and notebooks. Here we highlight this Portal, including its contextual data not available elsewhere, and the 'Duotang', a web platform that presents key genomic epidemiology and modeling analyses on circulating and emerging SARS-CoV-2 variants in Canada. Duotang presents dynamic changes in variant composition of SARS-CoV-2 in Canada and by province, estimates variant growth, and displays complementary interactive visualizations, with a text overview of the current situation. 
The VirusSeq Data Portal and Duotang resources, alongside additional analyses and resources computed from the Portal (COVID-MVP, CoVizu), are all open-source and freely available. Together, they provide an updated picture of SARS-CoV-2 evolution to spur scientific discussions, inform public discourse, and support communication with and within public health authorities. They also serve as a framework for other jurisdictions interested in open, collaborative sequence data sharing and analyses.

3.
BMJ Open ; 13(2): e066418, 2023 02 07.
Article in English | MEDLINE | ID: mdl-36750286

ABSTRACT

OBJECTIVES: COVID-19 research has significantly contributed to pandemic response and the enhancement of public health capacity. COVID-19 data collected by provincial/territorial health authorities in Canada are valuable for research advancement yet not readily available to the public, including researchers. To inform developments in public health data-sharing in Canada, we explored Canadians' opinions of public health authorities sharing deidentified individual-level COVID-19 data publicly. DESIGN/SETTING/INTERVENTIONS/OUTCOMES: A national cross-sectional survey was administered in Canada in March 2022, assessing Canadians' opinions on publicly sharing COVID-19 datatypes. Market research firm Léger was employed for recruitment and data collection. PARTICIPANTS: Anyone aged 18 years or older and currently living in Canada. RESULTS: 4981 participants completed the survey with a 92.3% response rate. 79.7% were supportive of provincial/territorial authorities publicly sharing deidentified COVID-19 data, while 20.3% were hesitant/averse/unsure. Datatypes most supported for being shared publicly were symptoms (83.0% in support), geographical region (82.6%) and COVID-19 vaccination status (81.7%). Datatypes with the most aversion were employment sector (27.4% averse), postal area (26.7%) and international travel history (19.7%). Generally supportive Canadians were characterised as being ≥50 years, with higher education, and being vaccinated against COVID-19 at least once. Vaccination status was the most influential predictor of data-sharing opinion, with respondents who were ever vaccinated being 4.20 times more likely (95% CI 3.21 to 5.48, p=0.000) to be generally supportive of data-sharing than those unvaccinated. CONCLUSIONS: These findings suggest that the Canadian public is generally favourable to deidentified data-sharing. Identifying factors that are likely to improve attitudes towards data-sharing is useful to stakeholders involved in data-sharing initiatives, such as public health agencies, in informing the development of public health communication and data-sharing policies. As Canada progresses through the COVID-19 pandemic, and with limited testing and reporting of COVID-19 data, it is essential to improve deidentified data-sharing given the public's general support for these efforts.
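The reported ratio of 4.20 (95% CI 3.21 to 5.48) can be sanity-checked with a few lines of arithmetic. The sketch below assumes, without confirmation from the abstract, that the figure is an odds ratio with a Wald-type interval that is symmetric on the log scale; under that assumption it recovers the standard error, checks that the interval is centred on the point estimate, and derives the implied z statistic and p value.

```python
import math

# Figures reported in the abstract.
ratio, ci_low, ci_high = 4.20, 3.21, 5.48

# Assumption: the interval is a Wald interval, symmetric on the log scale.
z_crit = 1.96  # two-sided 95% critical value of the standard normal
se_log = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)

# The geometric midpoint of the interval should recover the point estimate.
midpoint = math.exp((math.log(ci_low) + math.log(ci_high)) / 2)

# Wald z statistic and two-sided p value under the same assumption.
z_stat = math.log(ratio) / se_log
p_value = math.erfc(z_stat / math.sqrt(2))

print(f"SE(log ratio) = {se_log:.3f}")
print(f"geometric midpoint = {midpoint:.2f}")
print(f"z = {z_stat:.1f}, p < 0.001: {p_value < 0.001}")
```

Under these assumptions the midpoint lands close to the reported 4.20 and the implied p value is far below 0.001, which is presumably what the rounded "p=0.000" reflects.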


Subject(s)
COVID-19 , Humans , Cross-Sectional Studies , Public Opinion , Pandemics , COVID-19 Vaccines , Canada
4.
Microb Genom ; 9(1), 2023 01.
Article in English | MEDLINE | ID: mdl-36748616

ABSTRACT

Pathogen genomics is a critical tool for public health surveillance, infection control, outbreak investigations as well as research. To be useful, pathogen genomics data must be interpreted using contextual data (metadata). Contextual data include sample metadata, laboratory methods, patient demographics, clinical outcomes and epidemiological information. However, the variability in how contextual information is captured by different authorities and how it is encoded in different databases poses challenges for data interpretation, integration and their use/re-use. The DataHarmonizer is a template-driven spreadsheet application for harmonizing, validating and transforming genomics contextual data into submission-ready formats for public or private repositories. The tool's web browser-based JavaScript environment enables validation, and its offline functionality and local installation increase data security. The DataHarmonizer was developed to address the data sharing needs that arose during the COVID-19 pandemic, and was used by members of the Canadian COVID Genomics Network (CanCOGeN) to harmonize SARS-CoV-2 contextual data for national surveillance and for public repository submission. In order to support coordination of international surveillance efforts, we have partnered with the Public Health Alliance for Genomic Epidemiology to also provide a template conforming to its SARS-CoV-2 contextual data specification for use worldwide. Templates are also being developed for One Health and foodborne pathogens. Overall, the DataHarmonizer tool improves the effectiveness and fidelity of contextual data capture as well as its subsequent usability. Harmonization of contextual information across authorities, platforms and systems globally improves interoperability and reusability of data for concerted public health and research initiatives to fight the current pandemic and future public health emergencies. 
Although the DataHarmonizer was initially developed for the COVID-19 pandemic, its expansion to other data management applications and pathogens is already underway.
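The idea of template-driven validation described in this abstract can be sketched in a few lines. The example below is written in Python for illustration only and is not the DataHarmonizer's code or schema (the tool itself runs in a JavaScript environment); the template structure, field names and picklist values are all hypothetical.

```python
# Hypothetical template: field name -> rule. Not the DataHarmonizer schema.
TEMPLATE = {
    "sample_id": {"required": True},
    "collection_date": {"required": True},
    "host": {"required": False, "allowed": {"Homo sapiens", "Not Provided"}},
}


def validate_row(row, template=TEMPLATE):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field, rule in template.items():
        value = (row.get(field) or "").strip()
        if not value:
            if rule.get("required"):
                problems.append(f"{field}: missing required value")
            continue
        allowed = rule.get("allowed")
        if allowed is not None and value not in allowed:
            problems.append(f"{field}: '{value}' not in picklist")
    # Columns the template does not know about are reported as well.
    for field in row:
        if field not in template:
            problems.append(f"{field}: not defined in template")
    return problems


rows = [
    {"sample_id": "A1", "collection_date": "2021-03-04", "host": "Homo sapiens"},
    {"sample_id": "A2", "collection_date": "", "host": "human"},
]
for row in rows:
    print(row["sample_id"], validate_row(row))
```

Keeping the rules in a data structure rather than in code is what lets one validator serve several templates, for example a different specification for each pathogen or repository.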


Subject(s)
COVID-19 , Humans , COVID-19/epidemiology , Pandemics , SARS-CoV-2/genetics , Canada , Genomics/methods
5.
Gigascience ; 11, 2022 02 16.
Article in English | MEDLINE | ID: mdl-35169842

ABSTRACT

BACKGROUND: The Public Health Alliance for Genomic Epidemiology (PHA4GE) (https://pha4ge.org) is a global coalition that is actively working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatics tools and resources, and advocate for greater openness, interoperability, accessibility, and reproducibility in public health microbial bioinformatics. In the face of the current pandemic, PHA4GE has identified a need for a fit-for-purpose, open-source SARS-CoV-2 contextual data standard. RESULTS: As such, we have developed a SARS-CoV-2 contextual data specification package based on harmonizable, publicly available community standards. The specification can be implemented via a collection template, as well as an array of protocols and tools to support both the harmonization and submission of sequence data and contextual information to public biorepositories. CONCLUSIONS: Well-structured, rich contextual data add value, promote reuse, and enable aggregation and integration of disparate datasets. Adoption of the proposed standard and practices will better enable interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries in SARS-CoV-2 and COVID-19. The package is now supported by the NCBI's BioSample database.


Subject(s)
COVID-19 , SARS-CoV-2 , Genomics , Humans , Metadata , Public Health , Reproducibility of Results
6.
Front Genet ; 12: 716541, 2021.
Article in English | MEDLINE | ID: mdl-35401651

ABSTRACT

COVID-19 was declared to be a pandemic in March 2020 by the World Health Organization. Timely sharing of viral genomic sequencing data accompanied by a minimal set of contextual data is essential for informing regional, national, and international public health responses. Such contextual data are also necessary for developing and improving clinical therapies and vaccines, and for enhancing the scientific community's understanding of the SARS-CoV-2 virus. The Canadian COVID-19 Genomics Network (CanCOGeN) was launched in April 2020 to coordinate and upscale existing genomics-based COVID-19 research and surveillance efforts. CanCOGeN is performing large-scale sequencing of both the genomes of SARS-CoV-2 virus samples (VirusSeq) and affected Canadians (HostSeq). This paper addresses the privacy concerns associated with sharing the viral sequence data with a pre-defined set of contextual data describing the sample source and case attribute of the sequence data in the Canadian context. Currently, the viral genome sequences are shared by provincial public health laboratories and their healthcare and academic partners, with the Canadian National Microbiology Laboratory and with publicly accessible databases. However, data sharing delays and the provision of incomplete contextual data often occur because publicly releasing such data triggers privacy and data governance concerns. The CanCOGeN Ethics and Governance Expert Working Group has thus investigated several privacy issues cited by CanCOGeN data providers/stewards. This paper addresses these privacy concerns and offers insights primarily in the Canadian context, although similar privacy considerations also exist in other jurisdictions. We maintain that sharing viral sequencing data and its limited associated contextual data in the public domain generally does not pose insurmountable privacy challenges. 
However, privacy risks associated with reidentification should be actively monitored due to advancements in reidentification methods and the evolving pandemic landscape. We also argue that during a global health emergency such as COVID-19, privacy should not be used as a blanket measure to prevent such genomic data sharing due to the significant benefits it provides towards public health responses and ongoing research activities.

7.
NPJ Sci Food ; 2: 23, 2018.
Article in English | MEDLINE | ID: mdl-31304272

ABSTRACT

The construction of high capacity data sharing networks to support increasing government and commercial data exchange has highlighted a key roadblock: the content of existing Internet-connected information remains siloed due to a multiplicity of local languages and data dictionaries. This lack of a digital lingua franca is obvious in the domain of human food as materials travel from their wild or farm origin, through processing and distribution chains, to consumers. A well-defined, hierarchical vocabulary connected with logical relationships (in other words, an ontology) is urgently needed to help tackle data harmonization problems that span the domains of food security, safety, quality, production, distribution, and consumer health and convenience. FoodOn (http://foodon.org) is a consortium-driven project to build a comprehensive and easily accessible global farm-to-fork ontology about food, that accurately and consistently describes foods commonly known in cultures from around the world. FoodOn addresses food product terminology gaps and supports food traceability. Focusing on human and domesticated animal food description, FoodOn contains animal and plant food sources, food categories and products, and other facets like preservation processes, contact surfaces, and packaging. Much of FoodOn's vocabulary comes from transforming LanguaL, a mature and popular food indexing thesaurus, into a World Wide Web Consortium (W3C) OWL Web Ontology Language-formatted vocabulary that provides system interoperability, quality control, and software-driven intelligence. FoodOn complements other technologies facilitating food traceability, which is becoming critical in this age of increasing globalization of food networks.
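To make the notion of a hierarchical vocabulary with logical relationships concrete, the following minimal Python sketch maps labels from two imaginary local data dictionaries onto shared terms and answers a category question by walking "is a" links. The terms and hierarchy are invented for illustration and are not FoodOn identifiers or its actual class structure.

```python
# Hypothetical mini-hierarchy: child term -> parent term ("is a" relationship).
IS_A = {
    "cheddar cheese": "cheese",
    "cheese": "dairy food product",
    "dairy food product": "food product",
    "apple juice": "fruit juice",
    "fruit juice": "food product",
}

# Local labels from two imaginary data dictionaries, mapped to shared terms.
LOCAL_TO_TERM = {"Cheddar (mild)": "cheddar cheese", "queso": "cheese"}


def ancestors(term):
    """Walk the is-a chain from `term` up to the root."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_a(term, category):
    """True if `term` equals `category` or descends from it."""
    return term == category or category in ancestors(term)


for label, term in LOCAL_TO_TERM.items():
    print(label, "->", term, "| dairy:", is_a(term, "dairy food product"))
```

Once local labels point at shared terms, a query such as "all dairy products" works across both dictionaries without either one having to adopt the other's wording.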

8.
Nucleic Acids Res ; 44(D1): D646-53, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26578582

ABSTRACT

The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches.


Subject(s)
Databases, Genetic , Genome, Bacterial , Molecular Sequence Annotation , Pseudomonas/genetics , Bacterial Proteins/analysis , Bacterial Proteins/chemistry , Drug Resistance, Bacterial/genetics , Gene Ontology , Genomic Islands , Internet , Pseudomonas/drug effects , Pseudomonas/pathogenicity , Virulence Factors
9.
Mol Microbiol ; 86(6): 1404-23, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23078142

ABSTRACT

The interaction of Cryptococcus neoformans with phagocytic cells of the innate immune system is a key step in disseminated disease leading to meningoencephalitis in immunocompromised individuals. Transcriptional profiling of cryptococcal cells harvested from cell culture medium or from macrophages found differential expression of metabolic and other functions during fungal adaptation to the intracellular environment. We focused on the ACL1 gene for ATP-citrate lyase, which converts citrate to acetyl-CoA, because this gene showed elevated transcript levels in macrophages and because of the importance of acetyl-CoA as a central metabolite. Mutants lacking ACL1 showed delayed growth on medium containing glucose, reduced cellular levels of acetyl-CoA, defective production of virulence factors, increased susceptibility to the antifungal drug fluconazole and decreased survival within macrophages. Importantly, acl1 mutants were unable to cause disease in a murine inhalation model, a phenotype that was more extreme than other mutants with defects in acetyl-CoA production (e.g. an acetyl-CoA synthetase mutant). Loss of virulence is likely due to perturbation of critical physiological interconnections between virulence factor expression and metabolism in C. neoformans. Phylogenetic analysis and structural modelling of cryptococcal Acl1 identified three indels unique to fungal protein sequences; these differences may provide opportunities for the development of pathogen-specific inhibitors.


Subject(s)
ATP Citrate (pro-S)-Lyase/deficiency , Acetyl Coenzyme A/metabolism , Cryptococcus neoformans/metabolism , Cryptococcus neoformans/pathogenicity , Virulence Factors/metabolism , ATP Citrate (pro-S)-Lyase/metabolism , Amino Acid Sequence , Animals , Cell Line , Citric Acid/metabolism , Cryptococcosis/microbiology , Cryptococcosis/pathology , Cryptococcus neoformans/enzymology , Cryptococcus neoformans/genetics , Culture Media/chemistry , Disease Models, Animal , Glucose/metabolism , INDEL Mutation , Macrophages/immunology , Macrophages/microbiology , Mice , Microbial Viability , Models, Molecular , Molecular Sequence Data , Phylogeny , Sequence Homology, Amino Acid , Virulence
10.
Eukaryot Cell ; 11(2): 109-18, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22140231

ABSTRACT

The basidiomycete fungus Cryptococcus neoformans infects humans via inhalation of desiccated yeast cells or spores from the environment. In the absence of effective immune containment, the initial pulmonary infection often spreads to the central nervous system to result in meningoencephalitis. The fungus must therefore make the transition from the environment to different mammalian niches that include the intracellular locale of phagocytic cells and extracellular sites in the lung, bloodstream, and central nervous system. Recent studies provide insights into mechanisms of adaptation during this transition that include the expression of antiphagocytic functions, the remodeling of central carbon metabolism, the expression of specific nutrient acquisition systems, and the response to hypoxia. Specific transcription factors regulate these functions as well as the expression of one or more of the major known virulence factors of C. neoformans. Therefore, virulence factor expression is to a large extent embedded in the regulation of a variety of functions needed for growth in mammalian hosts. In this regard, the complex integration of these processes is reminiscent of the master regulators of virulence in bacterial pathogens.


Subject(s)
Cryptococcus neoformans/physiology , Cryptococcus neoformans/pathogenicity , Gene Expression Regulation, Fungal , Host-Pathogen Interactions , Adaptation, Physiological , Animals , Humans , Iron/metabolism , Mammals , Virulence , Virulence Factors/genetics , Virulence Factors/metabolism
12.
Nat Rev Microbiol ; 9(3): 193-203, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21326274

ABSTRACT

Cryptococcus neoformans is generally considered to be an opportunistic fungal pathogen because of its tendency to infect immunocompromised individuals, particularly those infected with HIV. However, this view has been challenged by the recent discovery of specialized interactions between the fungus and its mammalian hosts, and by the emergence of the related species Cryptococcus gattii as a primary pathogen of immunocompetent populations. In this Review, we highlight features of cryptococcal pathogens that reveal their adaptation to the mammalian environment. These features include not only remarkably sophisticated interactions with phagocytic cells to promote intracellular survival, dissemination to the central nervous system and escape, but also surprising morphological and genomic adaptations such as the formation of polyploid giant cells in the lung.


Subject(s)
Cryptococcosis/microbiology , Cryptococcus/pathogenicity , Opportunistic Infections/microbiology , Communicable Diseases, Emerging/microbiology , Cryptococcus/cytology , Humans , Spores, Fungal/pathogenicity , Virulence