Search | VHL Regional Portal

A computational framework to explore large-scale biosynthetic diversity.

Navarro-Muñoz, Jorge C; Selem-Mojica, Nelly; Mullowney, Michael W; Kautsar, Satria A; Tryon, James H; Parkinson, Elizabeth I; De Los Santos, Emmanuel L C; Yeong, Marley; Cruz-Morales, Pablo; Abubucker, Sahar; Roeters, Arne; Lokhorst, Wouter; Fernandez-Guerra, Antonio; Cappelini, Luciana Teresa Dias; Goering, Anthony W; Thomson, Regan J; Metcalf, William W; Kelleher, Neil L; Barona-Gomez, Francisco; Medema, Marnix H.

Nat Chem Biol ; 16(1): 60-68, 2020 01.

Article in English | MEDLINE | ID: mdl-31768033

ABSTRACT

Genome mining has become a key technology to exploit natural product diversity. Although initially performed on a single-genome basis, the process is now being scaled up to mine entire genera, strain collections and microbiomes. However, no bioinformatic framework is currently available for effectively analyzing datasets of this size and complexity. In the present study, a streamlined computational workflow is provided, consisting of two new software tools: the 'biosynthetic gene similarity clustering and prospecting engine' (BiG-SCAPE), which facilitates fast and interactive sequence similarity network analysis of biosynthetic gene clusters and gene cluster families; and the 'core analysis of syntenic orthologues to prioritize natural product gene clusters' (CORASON), which elucidates phylogenetic relationships within and across these families. BiG-SCAPE is validated by correlating its output to metabolomic data across 363 actinobacterial strains and the discovery potential of CORASON is demonstrated by comprehensively mapping biosynthetic diversity across a range of detoxin/rimosamide-related gene cluster families, culminating in the characterization of seven detoxin analogues.

Subject(s)

Actinobacteria/genetics , Biosynthetic Pathways/genetics , Computational Biology/methods , Genome, Bacterial , Algorithms , Biological Products , Cluster Analysis , Data Mining/methods , Genomics , Metabolomics , Microbiota , Multigene Family , Phylogeny , Reproducibility of Results , Software
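The BiG-SCAPE abstract above describes sequence similarity network analysis of biosynthetic gene clusters (BGCs) grouped into gene cluster families. The sketch below shows only the general idea and makes several assumptions. Each BGC is reduced to a set of domain labels. Pairwise distance is taken as 1 minus the Jaccard index. Any pair closer than a cutoff is linked, and the connected components of the resulting network are treated as families. This is not BiG-SCAPE's actual scoring or family-calling procedure, which the abstract does not spell out. The function names, the domain labels and the `cutoff` value are all hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index of two sets of domain labels (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def gene_cluster_families(bgcs: dict[str, set[str]], cutoff: float = 0.3) -> list[set[str]]:
    """Hypothetical family calling: link BGC pairs whose distance (1 - Jaccard)
    is below `cutoff`, then return the connected components of that network."""
    parent = {name: name for name in bgcs}  # union-find forest, one node per BGC

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Build the similarity network implicitly by merging linked components.
    for a, b in combinations(bgcs, 2):
        distance = 1.0 - jaccard(bgcs[a], bgcs[b])
        if distance < cutoff:
            parent[find(a)] = find(b)

    # Collect BGCs by their component root to obtain the families.
    families: dict[str, set[str]] = {}
    for name in bgcs:
        families.setdefault(find(name), set()).add(name)
    return sorted(families.values(), key=lambda fam: sorted(fam))


# Toy input with made-up domain sets, for illustration only.
example_bgcs = {
    "bgc_A": {"AMP-binding", "PCP", "Condensation", "TE"},
    "bgc_B": {"AMP-binding", "PCP", "Condensation", "Epimerization"},
    "bgc_C": {"PKS_KS", "PKS_AT", "ACP"},
    "bgc_D": {"PKS_KS", "PKS_AT", "ACP", "KR"},
}
print(gene_cluster_families(example_bgcs, cutoff=0.5))
```

With `cutoff=0.5` the two NRPS-like toy clusters form one family and the two PKS-like clusters form another. Tightening the cutoff to 0.3 splits `bgc_A` and `bgc_B` apart, because their distance is 0.4. This sensitivity to the threshold is why network-based family definitions are usually explored at several cutoffs.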

HC StratoMineR: A Web-Based Tool for the Rapid Analysis of High-Content Datasets.

Omta, Wienand A; van Heesbeen, Roy G; Pagliero, Romina J; van der Velden, Lieke M; Lelieveld, Daphne; Nellen, Mehdi; Kramer, Maik; Yeong, Marley; Saeidi, Amir M; Medema, Rene H; Spruit, Marco; Brinkkemper, Sjaak; Klumperman, Judith; Egan, David A.

Assay Drug Dev Technol ; 14(8): 439-452, 2016 10.

Article in English | MEDLINE | ID: mdl-27636821

ABSTRACT

High-content screening (HCS) can generate large multidimensional datasets. When these datasets are combined with appropriate data mining tools, they can yield valuable insights into the mechanism of action of bioactive molecules. However, easy-to-use data mining tools are not widely available, so these datasets are frequently underutilized. Here, we present HC StratoMineR, a web-based tool for high-content data analysis. It is a decision-support platform that guides even non-expert users through a high-content data analysis workflow. HC StratoMineR uses MySQL for storage and querying, PHP as the main programming language, and jQuery for additional user interface functionality. R is used for statistical calculations, logic and data visualizations. For computationally intensive operations, C++ code and graphics processing unit (GPU) computation are integrated into R through the Rcpp and rpud libraries. We show that HC StratoMineR can be used to analyze multivariate data from a high-content siRNA knock-down screen and a small-molecule screen. It can be used to rapidly filter out undesirable data, to select relevant data, and to perform quality control, data reduction, data exploration, morphological hit picking and data clustering. Our results demonstrate that HC StratoMineR can functionally categorize HCS hits and thus provide valuable information for hit prioritization.

Subject(s)

Data Mining/methods , Databases, Factual/statistics & numerical data , Internet , Statistics as Topic/methods , Cluster Analysis , HeLa Cells , Humans , MCF-7 Cells
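The HC StratoMineR abstract lists morphological hit picking among its workflow steps but does not say how hits are scored. The sketch below substitutes one common HCS approach and is not the tool's documented method. Each feature of each well is scored with a robust z-score relative to negative-control wells, using the median and the median absolute deviation (MAD). A well is flagged as a hit if any feature reaches `|z| >= threshold`. The feature names, well IDs and the threshold of 3 are hypothetical.

```python
import statistics

MAD_TO_SD = 1.4826  # scales the MAD so it estimates the SD for normally distributed data


def control_stats(controls: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Per-feature (median, scaled MAD) computed from negative-control wells."""
    stats = {}
    for feature, values in controls.items():
        med = statistics.median(values)
        mad = statistics.median([abs(v - med) for v in values]) * MAD_TO_SD
        if mad == 0:
            raise ValueError(f"negative controls have zero spread for {feature!r}")
        stats[feature] = (med, mad)
    return stats


def pick_hits(
    wells: dict[str, dict[str, float]],
    controls: dict[str, list[float]],
    threshold: float = 3.0,
) -> list[str]:
    """Return IDs of wells whose robust z-score reaches |z| >= threshold on any feature."""
    stats = control_stats(controls)
    hits = []
    for well_id, features in wells.items():
        scores = [(value - stats[f][0]) / stats[f][1] for f, value in features.items()]
        if any(abs(z) >= threshold for z in scores):
            hits.append(well_id)
    return hits


# Made-up example values, for illustration only.
neg_controls = {
    "nuclear_area": [100, 102, 98, 101, 99],
    "cell_count": [500, 510, 490, 505, 495],
}
screen_wells = {
    "A01": {"nuclear_area": 101, "cell_count": 498},  # close to controls
    "B02": {"nuclear_area": 110, "cell_count": 502},  # enlarged nuclei
    "C03": {"nuclear_area": 100, "cell_count": 440},  # reduced cell count
}
print(pick_hits(screen_wells, neg_controls))
```

Median and MAD are used here instead of mean and standard deviation because a few extreme control wells would otherwise inflate the spread and hide real hits. Flagging a well on any feature is the simplest multivariate rule. A real workflow would typically reduce correlated features first, which corresponds to the data reduction step named in the abstract.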
