Search | VHL Regional Portal

Combining Supervised and Unsupervised Machine Learning Methods for Phenotypic Functional Genomics Screening.

Omta, Wienand A; van Heesbeen, Roy G; Shen, Ian; de Nobel, Jacob; Robers, Desmond; van der Velden, Lieke M; Medema, René H; Siebes, Arno P J M; Feelders, Ad J; Brinkkemper, Sjaak; Klumperman, Judith S; Spruit, Marco René; Brinkhuis, Matthieu J S; Egan, David A.

SLAS Discov ; 25(6): 655-664, 2020 Jul.

Article in English | MEDLINE | ID: mdl-32400262

ABSTRACT

There has been an increase in the use of machine learning and artificial intelligence (AI) for the analysis of image-based cellular screens. The accuracy of these analyses, however, is greatly dependent on the quality of the training sets used for building the machine learning models. We propose that unsupervised exploratory methods should first be applied to the data set to gain a better insight into the quality of the data. This improves the selection and labeling of data for creating training sets before the application of machine learning. We demonstrate this using a high-content genome-wide small interfering RNA screen. We perform an unsupervised exploratory data analysis to facilitate the identification of four robust phenotypes, which we subsequently use as a training set for building a high-quality random forest machine learning model to differentiate four phenotypes with an accuracy of 91.1% and a kappa of 0.85. Our approach enhanced our ability to extract new knowledge from the screen when compared with the use of unsupervised methods alone.
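The abstract above reports both an accuracy and a kappa for the four-class model. As a minimal sketch of how the two figures relate, the snippet below computes accuracy and Cohen's kappa from a confusion matrix. The function name `accuracy_and_kappa` and the example matrix are hypothetical illustrations, not the authors' code or data, and the abstract does not state which kappa variant was used; Cohen's kappa is assumed here.

```python
from typing import Sequence, Tuple


def accuracy_and_kappa(confusion: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are true classes and columns are predicted classes.
    Hypothetical helper for illustration; not taken from the cited paper.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: fraction of samples on the diagonal (plain accuracy).
    observed = sum(confusion[i][i] for i in range(n_classes)) / total

    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total ** 2)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Made-up four-class confusion matrix (NOT data from the screen).
example = [
    [9, 1, 0, 0],
    [1, 9, 0, 0],
    [0, 0, 9, 1],
    [0, 0, 1, 9],
]
acc, kappa = accuracy_and_kappa(example)
print(f"accuracy={acc:.3f} kappa={kappa:.3f}")
```

Kappa corrects accuracy for the agreement expected by chance, so it is lower than accuracy whenever chance agreement is above zero, which is consistent with the pairing of 91.1% and 0.85 reported in the abstract.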

Subject(s)

Genomics , High-Throughput Screening Assays/methods , Supervised Machine Learning , Unsupervised Machine Learning , Genome, Human/genetics , Humans , Phenotype , RNA, Small Interfering/genetics

Improving Comprehension Efficiency of High Content Screening Data Through Interactive Visualizations.

Omta, Wienand A; Nobel, Jacob de; Klumperman, Judith; Egan, David A; Spruit, Marco R; Brinkhuis, Matthieu J S.

Assay Drug Dev Technol ; 15(6): 247-256, 2017.

Article in English | MEDLINE | ID: mdl-28837357

ABSTRACT

This study reports an experiment measuring the effect of interactive visualizations on the speed and accuracy with which users complete analysis tasks. A platform for interactive data visualizations was implemented using Django, D3, and Angular. Using this platform, a questionnaire was designed to measure the difference in performance between interactive and noninteractive data visualizations. In this 12-question questionnaire, participants were given tasks in which they had to identify trends or patterns. Other tasks involved comparing algorithms and selecting those with a certain outcome based on visualizations. All tasks were performed on high content screening data sets with the help of visualizations. The difference in task completion time and accuracy was measured between a group viewing interactive visualizations and a group viewing noninteractive visualizations. The study shows a significant advantage in time and accuracy for the group that used interactive visualizations over the group that used noninteractive visualizations. In tasks comparing the results of different algorithms, a significant decrease in time was observed with interactive visualizations compared with noninteractive visualizations.

Subject(s)

Electronic Data Processing , High-Throughput Screening Assays , Algorithms , Surveys and Questionnaires

HC StratoMineR: A Web-Based Tool for the Rapid Analysis of High-Content Datasets.

Omta, Wienand A; van Heesbeen, Roy G; Pagliero, Romina J; van der Velden, Lieke M; Lelieveld, Daphne; Nellen, Mehdi; Kramer, Maik; Yeong, Marley; Saeidi, Amir M; Medema, Rene H; Spruit, Marco; Brinkkemper, Sjaak; Klumperman, Judith; Egan, David A.

Assay Drug Dev Technol ; 14(8): 439-452, 2016 Oct.

Article in English | MEDLINE | ID: mdl-27636821

ABSTRACT

High-content screening (HCS) can generate large multidimensional datasets, and when combined with appropriate data mining tools, it can yield valuable insights into the mechanism of action of bioactive molecules. However, easy-to-use data mining tools are not widely available, so these datasets are frequently underutilized. Here, we present HC StratoMineR, a web-based tool for high-content data analysis. It is a decision-support platform that guides even non-expert users through a high-content data analysis workflow. HC StratoMineR uses MySQL for storage and querying, PHP as the main programming language, and jQuery for additional user interface functionality. R is used for statistical calculations, logic, and data visualizations. In addition, C++ and graphics processing unit (GPU) acceleration are integrated into R through the Rcpp and rpud libraries for computationally intensive operations. We show that HC StratoMineR can be used to analyze multivariate data from a high-content siRNA knock-down screen and a small-molecule screen. It can be used to rapidly filter out undesirable data; to select relevant data; and to perform quality control, data reduction, data exploration, morphological hit picking, and data clustering. Our results demonstrate that HC StratoMineR can be used to functionally categorize HCS hits and thus provide valuable information for hit prioritization.

Subject(s)

Data Mining/methods , Databases, Factual/statistics & numerical data , Internet , Statistics as Topic/methods , Cluster Analysis , HeLa Cells , Humans , MCF-7 Cells
