PepGM: a probabilistic graphical model for taxonomic inference of viral proteome samples with associated confidence scores.
Holstein, Tanja; Kistner, Franziska; Martens, Lennart; Muth, Thilo.
  • Holstein T; Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), 12205, Berlin, Germany.
  • Kistner F; VIB-Ugent Center for Medical Biotechnology, 9052, Zwijnaarde, Belgium.
  • Martens L; Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium.
  • Muth T; Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), 12205, Berlin, Germany.
Bioinformatics; 39(5); 2023 05 04.
Article in English | MEDLINE | ID: covidwho-2315402
ABSTRACT
MOTIVATION:

Inferring taxonomy in mass spectrometry-based shotgun proteomics is a complex task. In multi-species or viral samples of unknown taxonomic origin, the presence of proteins and their corresponding taxa must be inferred from a list of identified peptides. This inference is often complicated by protein homology: many proteins share peptides not only within a taxon but also between taxa. Correct taxonomic inference is nevertheless crucial when identifying viral strains with high sequence homology; consider, e.g., the different epidemiological characteristics of the various strains of severe acute respiratory syndrome-related coronavirus-2. Additionally, many viruses mutate frequently, which further complicates the correct identification of viral proteomic samples.

RESULTS:

We present PepGM, a probabilistic graphical model for the taxonomic assignment of virus proteomic samples with strain-level resolution and associated confidence scores. PepGM combines the results of a standard proteomic database search algorithm with belief propagation to calculate the marginal distributions, and thus confidence scores, for potential taxonomic assignments. We demonstrate the performance of PepGM on several publicly available virus proteomic datasets, showing that it resolves taxa at the strain level. In two out of eight cases, the taxonomic assignments were correct only at the species level, which PepGM clearly indicates by lower confidence scores.

AVAILABILITY AND IMPLEMENTATION:

PepGM is written in Python and embedded into a Snakemake workflow. It is available at https://github.com/BAMeScience/PepGM.
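The idea of turning peptide identifications into per-taxon marginals via belief propagation can be illustrated with a toy sketch. This is not PepGM's implementation: the noisy-OR factor, the parameter values (ALPHA, BETA, PRIOR), the peptide and strain names, and the scores are all assumptions chosen for illustration. The graph is deliberately tree-shaped, so sum-product message passing yields exact marginals here; on graphs with loops the same updates would only approximate them.

```python
import itertools

# All values below are illustrative assumptions, not taken from PepGM.
ALPHA = 0.9   # assumed prob. that a present taxon gives rise to one of its peptides
BETA = 0.05   # assumed prob. that a peptide appears with no parent taxon present
PRIOR = 0.5   # assumed prior prob. that a taxon is present

# Hypothetical input: peptide -> (search-engine score in [0, 1], parent taxa)
PEPTIDES = {
    "pepA": (0.95, ("strain1",)),
    "pepB": (0.90, ("strain1", "strain2")),  # shared between both strains
    "pepC": (0.10, ("strain2",)),
}
TAXA = ("strain1", "strain2")


def peptide_factor(score, parents):
    """Noisy-OR table over the parent taxa, with the peptide's own state summed out."""
    table = {}
    for states in itertools.product((0, 1), repeat=len(parents)):
        p_present = 1.0 - (1.0 - BETA) * (1.0 - ALPHA) ** sum(states)
        table[states] = score * p_present + (1.0 - score) * (1.0 - p_present)
    return table


def normalise(msg):
    total = sum(msg)
    return [m / total for m in msg]


def belief_propagation(n_iter=10):
    """Sum-product message passing; returns P(taxon present) for each taxon."""
    factors = {name: (parents, peptide_factor(score, parents))
               for name, (score, parents) in PEPTIDES.items()}
    edges = [(f, t) for f, (parents, _) in factors.items() for t in parents]
    f2v = {edge: [1.0, 1.0] for edge in edges}  # factor -> taxon messages
    v2f = {edge: [1.0, 1.0] for edge in edges}  # taxon -> factor messages
    prior = [1.0 - PRIOR, PRIOR]
    for _ in range(n_iter):
        # Taxon -> factor: prior times the messages from all other factors.
        for (f, t) in edges:
            msg = list(prior)
            for (g, u) in edges:
                if u == t and g != f:
                    msg = [m * x for m, x in zip(msg, f2v[(g, u)])]
            v2f[(f, t)] = normalise(msg)
        # Factor -> taxon: sum the table over the states of all other parents.
        for (f, t) in edges:
            parents, table = factors[f]
            msg = [0.0, 0.0]
            for states, value in table.items():
                weight = value
                for u, s in zip(parents, states):
                    if u != t:
                        weight *= v2f[(f, u)][s]
                msg[states[parents.index(t)]] += weight
            f2v[(f, t)] = normalise(msg)
    marginals = {}
    for t in TAXA:
        belief = list(prior)
        for (f, u) in edges:
            if u == t:
                belief = [b * x for b, x in zip(belief, f2v[(f, u)])]
        marginals[t] = belief[1] / sum(belief)
    return marginals


if __name__ == "__main__":
    for taxon, prob in belief_propagation().items():
        print(f"{taxon}: {prob:.3f}")
```

In this toy setting the shared peptide alone cannot separate the two strains; the strain-specific peptides do, and the resulting marginal can be read as a confidence score for each candidate taxon.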
Full text: Available | Collection: International databases | Database: MEDLINE | Main subject: Viruses / COVID-19 | Type of study: Prognostic study | Limits: Humans | Language: English | Journal subject: Medical Informatics | Year: 2023 | Document Type: Article | Affiliation country: Bioinformatics
