1.
Proc Natl Acad Sci U S A ; 121(12): e2313574121, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38478693

ABSTRACT

This study supports the development of predictive bacteriophage (phage) therapy: selecting a phage cocktail to treat a bacterial infection on the basis of machine learning (ML) models. For this purpose, ML models were trained on thousands of measured interactions between a panel of phage and sequenced bacterial isolates. The concept was applied to Escherichia coli associated with urinary tract infections, a common and important infection in humans and companion animals from which multidrug-resistant (MDR) bloodstream infections can originate. The global threat of MDR infection has reinvigorated international efforts to develop alternatives to antibiotics, including phage therapy. E. coli exhibits extensive genome-level variation due to horizontal gene transfer via phage and plasmids. Associated with this, phage selection for E. coli is difficult, as individual isolates can vary considerably in phage susceptibility owing to differences in factors important to phage infection, including phage receptor profiles and resistance mechanisms. The activity of 31 phage was measured on 314 isolates using growth curves in artificial urine. Random Forest models were built for each phage from bacterial genome features; the more generalist phage, acting on over 20% of the bacterial population, achieved F1 scores of >0.6 and could be used to predict phage cocktails effective against previously untested strains. The study demonstrates the potential of predictive ML models that integrate bacterial genomics with phage activity datasets and can be applied to data derived from direct sequencing of clinical samples to inform rapid and effective phage therapy.


Subject(s)
Bacteriophages , Escherichia coli Infections , Phage Therapy , Urinary Tract Infections , Humans , Animals , Escherichia coli/genetics , Escherichia coli Infections/microbiology , Bacteriophages/genetics , Anti-Bacterial Agents/pharmacology , Urinary Tract Infections/drug therapy
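The abstract above reports per-phage Random Forest models, F1 scores above 0.6 for the more generalist phage (those acting on over 20% of isolates), and cocktail prediction for untested strains. The Python sketch below illustrates only the bookkeeping around such models, under stated assumptions: the helper names (`f1_score`, `is_generalist`, `select_cocktail`), the top-k rule, and the 0.5 probability cut-off are hypothetical and not taken from the study, and the predicted lysis probabilities would come from whatever per-phage classifier is used.

```python
from typing import Dict, List, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for the positive (lysis) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def is_generalist(lysis: Sequence[int], min_fraction: float = 0.2) -> bool:
    """True if the phage lyses more than min_fraction of the tested isolates."""
    if not lysis:
        return False
    return sum(lysis) / len(lysis) > min_fraction


def select_cocktail(
    pred_prob: Dict[str, float],
    model_f1: Dict[str, float],
    k: int = 3,
    min_f1: float = 0.6,
    min_prob: float = 0.5,
) -> List[str]:
    """Pick up to k phage predicted to lyse a new strain.

    Only phage whose per-phage model reached F1 > min_f1 on held-out data are
    trusted; k and min_prob are hypothetical choices, not values from the study.
    """
    trusted = [
        (phage, prob)
        for phage, prob in pred_prob.items()
        if model_f1.get(phage, 0.0) > min_f1 and prob >= min_prob
    ]
    trusted.sort(key=lambda item: item[1], reverse=True)  # most confident first
    return [phage for phage, _ in trusted[:k]]
```

For example, `select_cocktail({"phageA": 0.9, "phageB": 0.7, "phageC": 0.95}, {"phageA": 0.7, "phageB": 0.65, "phageC": 0.5})` returns `["phageA", "phageB"]`: phageC has the highest predicted probability, but its model does not meet the F1 cut-off, so its prediction is not trusted.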
2.
Microb Genom ; 9(10)2023 Oct.
Article in English | MEDLINE | ID: mdl-37843883

ABSTRACT

Salmonella enterica is a taxonomically diverse pathogen with over 2600 serovars associated with a wide variety of animal hosts including humans, other mammals, birds and reptiles. Some serovars are host-specific or host-restricted and cause disease in distinct host species, while others, such as serovar S. Typhimurium (STm), are generalists and have the potential to colonize a wide variety of species. However, even within generalist serovars such as STm it is becoming clear that pathovariants exist that differ in tropism and virulence. Identifying the genetic factors underlying host specificity is complex, but the availability of thousands of genome sequences and advances in machine learning have made it possible to build specific host prediction models to aid outbreak control and predict the human pathogenic potential of isolates from animals and other reservoirs. We have advanced this area by building host-association prediction models trained on a wide range of genomic features and compared them with predictions based on nearest-neighbour phylogeny. SNPs, protein variants (PVs), antimicrobial resistance (AMR) profiles and intergenic regions (IGRs) were extracted from 3883 high-quality STm assemblies collected from humans, swine, bovine and poultry in the USA, and used to construct Random Forest (RF) machine learning models. An additional 244 recent STm assemblies from farm animals were used as a test set for further validation. The models based on PVs and IGRs had the best performance in terms of predicting the host of origin of isolates and outperformed nearest-neighbour phylogenetic host prediction as well as models based on SNPs or AMR data. However, the models did not yield reliable predictions when tested with isolates that were phylogenetically distinct from the training set. The IGR and PV models were often able to differentiate human isolates in clusters where the majority of isolates were from a single animal source. 
Notably, IGRs were the feature with the best performance across multiple models, which may be because IGRs both represent their flanking genes, equivalently to PVs, and capture genomic regulatory variation such as altered promoter regions. The IGR and PV models predict that ~45 % of human STm infections in the USA originate from bovine sources, ~40 % from poultry and ~14.5 % from swine, although sequences of isolates from other sources were not used for training. In summary, the research demonstrates a significant gain in accuracy for models using IGRs and PVs as features compared with SNP-based and core-genome phylogeny predictions when applied within the existing population structure. This article contains data hosted by Microreact.


Subject(s)
Salmonella Infections, Animal , Salmonella typhimurium , Animals , Cattle , Humans , Swine , Salmonella Infections, Animal/epidemiology , Phylogeny , DNA, Intergenic , Genome, Bacterial , Genomics , Machine Learning , Mammals/genetics
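The abstract compares its Random Forest models against nearest-neighbour phylogenetic host prediction and reports the share of human isolates attributed to each animal host. A minimal sketch of that baseline and of the attribution tally, assuming aligned SNP strings and Hamming distance as a stand-in for phylogenetic distance (the study's own phylogeny method is not described here); the function names and toy sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two aligned SNP strings."""
    if len(a) != len(b):
        raise ValueError("SNP strings must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbour_host(query: str, reference: Sequence[Tuple[str, str]]) -> str:
    """Assign the host of the closest reference isolate (ties: first in list)."""
    _, host = min(reference, key=lambda rec: hamming(query, rec[0]))
    return host


def host_proportions(predicted_hosts: List[str]) -> Dict[str, float]:
    """Fraction of isolates attributed to each predicted host."""
    counts = Counter(predicted_hosts)
    total = sum(counts.values())
    return {host: n / total for host, n in counts.items()}
```

With a reference set `[("ACGT", "bovine"), ("ACGA", "poultry"), ("TTTT", "swine")]`, the query `"TTTA"` is assigned `"swine"` (Hamming distance 1). As the abstract notes, this kind of baseline degrades when a query is phylogenetically distant from every reference isolate.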
3.
Methods Mol Biol ; 2291: 99-117, 2021.
Article in English | MEDLINE | ID: mdl-33704750

ABSTRACT

Escherichia coli is a species of bacteria that can be present in a wide variety of mammalian hosts and potentially in soil environments. E. coli has an open genome and can show considerable diversity in gene content between isolates. It is a reasonable assumption that gene content reflects evolution of strains in particular host environments and therefore can be used to predict the host most likely to be the source of an isolate. An extrapolation of this argument is that strains may also have gene content that favors success in multiple hosts and so the possibility of successful transmission from one host to another, for example, from cattle to human, can also be predicted based on gene content. In this methods chapter, we consider the issue of Shiga toxin (Stx)-producing E. coli (STEC) strains that are present in ruminants as the main host reservoir and for which we know that a subset causes life-threatening infections in humans. We show how the genome sequences of E. coli isolated from both cattle and humans can be used to build a classifier to predict human and cattle host association and how this can be applied to score key STEC serotypes known to be associated with human infection. With the example dataset used, serogroups O157, O26, and O111 show the highest, and O103 and O145 the lowest, predictions for human association. The long-term ambition is to combine such machine learning predictions with phylogeny to predict the zoonotic threat of an isolate based on its whole genome sequence (WGS).


Subject(s)
Escherichia coli Infections/genetics , Genome, Bacterial , Machine Learning , Phylogeny , Serogroup , Shiga-Toxigenic Escherichia coli , Whole Genome Sequencing , Animals , Cattle , Humans , Shiga-Toxigenic Escherichia coli/classification , Shiga-Toxigenic Escherichia coli/genetics
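The chapter builds a classifier from gene content to predict human versus cattle host association and then scores serotypes. As a hedged illustration of that workflow, the sketch below substitutes a Bernoulli naive Bayes classifier on gene presence/absence vectors (not necessarily the classifier used in the chapter) and averages the predicted human-association probability per serogroup. All function names and the Laplace smoothing parameter `alpha` are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Model = Tuple[Dict[str, float], Dict[str, List[float]]]


def train_bernoulli_nb(X: Sequence[Sequence[int]], y: Sequence[str], alpha: float = 1.0) -> Model:
    """Estimate class priors and per-gene presence probabilities (Laplace-smoothed)."""
    priors: Dict[str, float] = {}
    gene_probs: Dict[str, List[float]] = {}
    n_genes = len(X[0])
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        priors[label] = len(rows) / len(X)
        gene_probs[label] = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_genes)
        ]
    return priors, gene_probs


def predict_proba(model: Model, x: Sequence[int]) -> Dict[str, float]:
    """Posterior probability of each host given a gene presence/absence vector."""
    priors, gene_probs = model
    log_post: Dict[str, float] = {}
    for label, prior in priors.items():
        lp = math.log(prior)
        for present, p in zip(x, gene_probs[label]):
            lp += math.log(p if present else 1.0 - p)
        log_post[label] = lp
    top = max(log_post.values())  # subtract the max for numerical stability
    weights = {label: math.exp(v - top) for label, v in log_post.items()}
    z = sum(weights.values())
    return {label: w / z for label, w in weights.items()}


def score_serogroups(
    isolates: Sequence[Tuple[str, Sequence[int]]], model: Model, positive: str = "human"
) -> Dict[str, float]:
    """Mean predicted probability of `positive` host association per serogroup."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for serogroup, genes in isolates:
        scores[serogroup].append(predict_proba(model, genes)[positive])
    return {g: sum(v) / len(v) for g, v in scores.items()}
```

Smoothing keeps every gene probability strictly between 0 and 1, so a gene never seen in one host class does not force that host's posterior to zero.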
4.
Virus Evol ; 7(1): veaa099, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33505707

ABSTRACT

Modern DNA sequencing has instituted a new era in human cytomegalovirus (HCMV) genomics. A key development has been the ability to determine the genome sequences of HCMV strains directly from clinical material. This involves the application of complex and often non-standardized bioinformatics approaches to analysing data of variable quality in a process that requires substantial manual intervention. To relieve this bottleneck, we have developed GRACy (Genome Reconstruction and Annotation of Cytomegalovirus), an easy-to-use toolkit for analysing HCMV sequence data. GRACy automates and integrates modules for read filtering, genotyping, genome assembly, genome annotation, variant analysis, and data submission. These modules were tested extensively on simulated and experimental data and outperformed generic approaches. GRACy is written in Python and is embedded in a graphical user interface with all required dependencies installed by a single command. It runs on the Linux operating system and is designed to allow the future implementation of a cross-platform version. GRACy is distributed under a GPL 3.0 license and is freely available at https://bioinformatics.cvr.ac.uk/software/ with the manual and a test dataset.
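Read filtering is the first of GRACy's modules. The abstract does not say how GRACy filters reads, so the following is only a generic sketch of a length and mean-quality filter on FASTQ records (Phred+33 encoding assumed). The thresholds and function names are hypothetical and are not GRACy's code or defaults.

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (read name, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored
        qual = next(it).rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def filter_reads(
    records: Iterable[Read], min_len: int = 50, min_mean_q: float = 20.0
) -> Iterator[Read]:
    """Keep reads that are long enough and have acceptable mean base quality."""
    for name, seq, qual in records:
        if len(seq) >= min_len and mean_phred(qual) >= min_mean_q:
            yield name, seq, qual
```

Because both functions are generators, reads stream through without loading a whole run into memory, which matters for the large read sets produced by sequencing directly from clinical material.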
