Search | Portal Regional da BVS (VHL Regional Portal)

PHA4GE quality control contextual data tags: standardized annotations for sharing public health sequence datasets with known quality issues to facilitate testing and training.

Griffiths, Emma J; Mendes, Inês; Maguire, Finlay; Guthrie, Jennifer L; Wee, Bryan A; Schmedes, Sarah; Holt, Kathryn; Yadav, Chanchal; Cameron, Rhiannon; Barclay, Charlotte; Dooley, Damion; MacCannell, Duncan; Chindelevitch, Leonid; Karsch-Mizrachi, Ilene; Waheed, Zahra; Katz, Lee; Petit III, Robert; Dave, Mugdha; Oluniyi, Paul; Nasar, Muhammad Ibtisam; Raphenya, Amogelang; Hsiao, William W L; Timme, Ruth E.

Microb Genom ; 10(6), 2024 Jun.

Article in English | MEDLINE | ID: mdl-38860884

ABSTRACT

As public health laboratories expand their genomic sequencing and bioinformatics capacity for the surveillance of different pathogens, labs must carry out robust validation, training, and optimization of wet- and dry-lab procedures. Achieving these goals for algorithms, pipelines and instruments often requires that lower quality datasets be made available for analysis and comparison alongside those of higher quality. This range of data quality in reference sets can complicate the sharing of sub-optimal datasets that are vital for the community and for the reproducibility of assays. Sharing of useful, but sub-optimal datasets requires careful annotation and documentation of known issues to enable appropriate interpretation, avoid being mistaken for better quality information, and for these data (and their derivatives) to be easily identifiable in repositories. Unfortunately, there are currently no standardized attributes or mechanisms for tagging poor-quality datasets, or datasets generated for a specific purpose, to maximize their utility, searchability, accessibility and reuse. The Public Health Alliance for Genomic Epidemiology (PHA4GE) is an international community of scientists from public health, industry and academia focused on improving the reproducibility, interoperability, portability, and openness of public health bioinformatic software, skills, tools and data. To address the challenges of sharing lower quality datasets, PHA4GE has developed a set of standardized contextual data tags, namely fields and terms, that can be included in public repository submissions as a means of flagging pathogen sequence data with known quality issues, increasing their discoverability. 
The contextual data tags were developed through consultations with the community, including input from the International Nucleotide Sequence Database Collaboration (INSDC), and have been standardized using ontologies, which are community-based resources for defining the tag properties and the relationships between them. The standardized tags are agnostic to the organism and the sequencing technique used and thus can be applied to data generated from any pathogen using an array of sequencing techniques. The tags can also be applied to synthetic (lab created) data. The list of standardized tags is maintained by PHA4GE and can be found at https://github.com/pha4ge/contextual_data_QC_tags. Definitions, ontology IDs, examples of use, as well as a JSON representation, are provided. The PHA4GE QC tags were tested, and are now implemented, by the FDA's GenomeTrakr laboratory network as part of its routine submission process for SARS-CoV-2 wastewater surveillance. We hope that these simple, standardized tags will help improve communication regarding quality control in public repositories, in addition to making datasets of variable quality more easily identifiable. Suggestions for additional tags can be submitted to PHA4GE via the New Term Request Form in the GitHub repository. By providing a mechanism for feedback and suggestions, we also expect that the tags will evolve with the needs of the community.

Subjects

Computational Biology, Public Health, Quality Control, Humans, Computational Biology/methods, Information Dissemination/methods, Reproducibility of Results, Molecular Sequence Annotation/methods, Genomics/methods, Software

The Human Phenotype Ontology in 2024: phenotypes around the world.

Gargano, Michael A; Matentzoglu, Nicolas; Coleman, Ben; Addo-Lartey, Eunice B; Anagnostopoulos, Anna V; Anderton, Joel; Avillach, Paul; Bagley, Anita M; Bakstein, Eduard; Balhoff, James P; Baynam, Gareth; Bello, Susan M; Berk, Michael; Bertram, Holli; Bishop, Somer; Blau, Hannah; Bodenstein, David F; Botas, Pablo; Boztug, Kaan; Cady, Jolana; Callahan, Tiffany J; Cameron, Rhiannon; Carbon, Seth J; Castellanos, Francisco; Caufield, J Harry; Chan, Lauren E; Chute, Christopher G; Cruz-Rojo, Jaime; Dahan-Oliel, Noémi; Davids, Jon R; de Dieuleveult, Maud; de Souza, Vinicius; de Vries, Bert B A; de Vries, Esther; DePaulo, J Raymond; Derfalvi, Beata; Dhombres, Ferdinand; Diaz-Byrd, Claudia; Dingemans, Alexander J M; Donadille, Bruno; Duyzend, Michael; Elfeky, Reem; Essaid, Shahim; Fabrizzi, Carolina; Fico, Giovanna; Firth, Helen V; Freudenberg-Hua, Yun; Fullerton, Janice M; Gabriel, Davera L; Gilmour, Kimberly.

Nucleic Acids Res ; 52(D1): D1333-D1346, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37953324

ABSTRACT

The Human Phenotype Ontology (HPO) is a widely used resource that comprehensively organizes and defines the phenotypic features of human disease, enabling computational inference and supporting genomic and phenotypic analyses through semantic similarity and machine learning algorithms. The HPO has widespread applications in clinical diagnostics and translational research, including genomic diagnostics, gene-disease discovery, and cohort analytics. In recent years, groups around the world have developed translations of the HPO from English to other languages, and the HPO browser has been internationalized, allowing users to view HPO term labels and in many cases synonyms and definitions in ten languages in addition to English. Since our last report, a total of 2239 new HPO terms and 49235 new HPO annotations were developed, many in collaboration with external groups in the fields of psychiatry, arthrogryposis, immunology and cardiology. The Medical Action Ontology (MAxO) is a new effort to model treatments and other measures taken for clinical management. Finally, the HPO consortium is contributing to efforts to integrate the HPO and the GA4GH Phenopacket Schema into electronic health records (EHRs) with the goal of more standardized and computable integration of rare disease data in EHRs.

Subjects

Biological Ontologies, Humans, Phenotype, Genomics, Algorithms, Rare Diseases

The DataHarmonizer: a tool for faster data harmonization, validation, aggregation and analysis of pathogen genomics contextual information.

Gill, Ivan S; Griffiths, Emma J; Dooley, Damion; Cameron, Rhiannon; Savic Kallesøe, Sarah; John, Nithu Sara; Sehar, Anoosha; Gosal, Gurinder; Alexander, David; Chapel, Madison; Croxen, Matthew A; Delisle, Benjamin; Di Tullio, Rachelle; Gaston, Daniel; Duggan, Ana; Guthrie, Jennifer L; Horsman, Mark; Joshi, Esha; Kearny, Levon; Knox, Natalie; Lau, Lynette; LeBlanc, Jason J; Li, Vincent; Lyons, Pierre; MacKenzie, Keith; McArthur, Andrew G; Panousis, Emily M; Palmer, John; Prystajecky, Natalie; Smith, Kerri N; Tanner, Jennifer; Townend, Christopher; Tyler, Andrea; Van Domselaar, Gary; Hsiao, William W L.

Microb Genom ; 9(1), 2023 Jan.

Article in English | MEDLINE | ID: mdl-36748616

ABSTRACT

Pathogen genomics is a critical tool for public health surveillance, infection control, outbreak investigations as well as research. In order to make use of pathogen genomics data, they must be interpreted using contextual data (metadata). Contextual data include sample metadata, laboratory methods, patient demographics, clinical outcomes and epidemiological information. However, the variability in how contextual information is captured by different authorities and how it is encoded in different databases poses challenges for data interpretation, integration and use/re-use. The DataHarmonizer is a template-driven spreadsheet application for harmonizing, validating and transforming genomics contextual data into submission-ready formats for public or private repositories. The tool's web browser-based JavaScript environment enables validation, and its offline functionality and local installation increase data security. The DataHarmonizer was developed to address the data sharing needs that arose during the COVID-19 pandemic, and was used by members of the Canadian COVID Genomics Network (CanCOGeN) to harmonize SARS-CoV-2 contextual data for national surveillance and for public repository submission. In order to support coordination of international surveillance efforts, we have partnered with the Public Health Alliance for Genomic Epidemiology to also provide a template conforming to its SARS-CoV-2 contextual data specification for use worldwide. Templates are also being developed for One Health and foodborne pathogens. Overall, the DataHarmonizer tool improves the effectiveness and fidelity of contextual data capture as well as its subsequent usability. Harmonization of contextual information across authorities, platforms and systems globally improves interoperability and reusability of data for concerted public health and research initiatives to fight the current pandemic and future public health emergencies. While the tool was initially developed for the COVID-19 pandemic, its expansion to other data management applications and pathogens is already underway.

Subjects

COVID-19, Humans, COVID-19/epidemiology, Pandemics, SARS-CoV-2/genetics, Canada, Genomics/methods

Future-proofing and maximizing the utility of metadata: The PHA4GE SARS-CoV-2 contextual data specification package.

Griffiths, Emma J; Timme, Ruth E; Mendes, Catarina Inês; Page, Andrew J; Alikhan, Nabil-Fareed; Fornika, Dan; Maguire, Finlay; Campos, Josefina; Park, Daniel; Olawoye, Idowu B; Oluniyi, Paul E; Anderson, Dominique; Christoffels, Alan; da Silva, Anders Gonçalves; Cameron, Rhiannon; Dooley, Damion; Katz, Lee S; Black, Allison; Karsch-Mizrachi, Ilene; Barrett, Tanya; Johnston, Anjanette; Connor, Thomas R; Nicholls, Samuel M; Witney, Adam A; Tyson, Gregory H; Tausch, Simon H; Raphenya, Amogelang R; Alcock, Brian; Aanensen, David M; Hodcroft, Emma; Hsiao, William W L; Vasconcelos, Ana Tereza R; MacCannell, Duncan R.

Gigascience ; 11, 2022 Feb 16.

Article in English | MEDLINE | ID: mdl-35169842

ABSTRACT

BACKGROUND: The Public Health Alliance for Genomic Epidemiology (PHA4GE) (https://pha4ge.org) is a global coalition that is actively working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatics tools and resources, and advocate for greater openness, interoperability, accessibility, and reproducibility in public health microbial bioinformatics. In the face of the current pandemic, PHA4GE has identified a need for a fit-for-purpose, open-source SARS-CoV-2 contextual data standard. RESULTS: As such, we have developed a SARS-CoV-2 contextual data specification package based on harmonizable, publicly available community standards. The specification can be implemented via a collection template, as well as an array of protocols and tools to support both the harmonization and submission of sequence data and contextual information to public biorepositories. CONCLUSIONS: Well-structured, rich contextual data add value, promote reuse, and enable aggregation and integration of disparate datasets. Adoption of the proposed standard and practices will better enable interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries in SARS-CoV-2 and COVID-19. The package is now supported by the NCBI's BioSample database.

Subjects

COVID-19, SARS-CoV-2, Genomics, Humans, Metadata, Public Health, Reproducibility of Results
