ABSTRACT
This paper demonstrates the ability of machine learning approaches to select, from the 23,398 genes of the human genome, a few genes for laboratory experiments aimed at establishing new drug mechanisms. As a case study, this paper uses MDA-MB-231 breast cancer single cells treated with the antidiabetic drug metformin. We show that mixture-model-based unsupervised methods, validated by hierarchical clustering, can identify single-cell subpopulations (clusters). These clusters are characterized by a small set of genes (1% of the genome) that have significant differential expression across the clusters and are also highly correlated with pathways with anticancer effects driven by metformin. Among the identified small set of genes associated with reduced breast cancer incidence, laboratory experiments on one of the genes, CDC42, showed that its downregulation by metformin inhibited cancer cell migration and proliferation, thus validating the ability of machine learning approaches to identify biologically relevant candidates for laboratory experiments. Given the large size of the human genome and limitations in cost and skilled resources, the broader impact of identifying a small set of differentially expressed genes after drug treatment lies in augmenting the drug-disease knowledge of pharmacogenomics experts in laboratory investigations, which could help establish novel biological mechanisms associated with drug response in diseases beyond breast cancer.
Subjects
Antineoplastic Agents/pharmacology , Gene Expression Regulation, Neoplastic/drug effects , Single-Cell Analysis/methods , Triple Negative Breast Neoplasms , Unsupervised Machine Learning , Cell Line, Tumor , Cluster Analysis , Female , Gene Expression Profiling/methods , Genomics/methods , Humans , Metformin/pharmacology , Triple Negative Breast Neoplasms/genetics , Triple Negative Breast Neoplasms/metabolism
ABSTRACT
We demonstrate that model-based unsupervised learning can uniquely discriminate single-cell subpopulations by their gene expression distributions, which in turn allows us to identify specific genes for focused functional studies. This method was applied to MDA-MB-231 breast cancer cells treated with the antidiabetic drug metformin, which is being repurposed for treatment of triple-negative breast cancer. Unsupervised learning identified a cluster of metformin-treated cells characterized by a significant suppression of 230 genes (p-value < 2E-16). This analysis corroborates known studies of metformin action: a) pathway analysis indicated known mechanisms related to metformin action, including the citric acid (TCA) cycle, oxidative phosphorylation, and mitochondrial dysfunction (p-value < 1E-9); b) 70% of these 230 genes were functionally implicated in metformin response; c) among the remaining genes, which have been less functionally studied in metformin response, was CDC42, which is down-regulated in breast cancer treated with metformin. However, the mechanisms of CDC42 in metformin response remained unclear. Our functional studies showed that CDC42 was involved in metformin-induced inhibition of cell proliferation and cell migration, mediated through an AMPK-independent mechanism. Our results point to 230 genes that might serve as metformin response signatures, which need to be tested in patients treated with metformin, and to the need for further investigation of the roles of CDC42 and AMPK independence in metformin's anticancer mechanisms.
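The abstract does not state which mixture model or software was used, so the following is only a minimal sketch of the general idea of model-based clustering. It assumes a two-component, one-dimensional Gaussian mixture fitted by expectation-maximization to synthetic log-expression values for a single hypothetical gene. The function names, the synthetic data, and the "baseline"/"suppressed" labels are illustrative assumptions, not the authors' method or data.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(values, n_iter=100):
    """Fit a two-component 1D Gaussian mixture by EM (illustrative only).

    Returns (weights, means, variances, resp), where resp[i] is the
    posterior probability that values[i] belongs to the high-mean component.
    """
    n = len(values)
    means = [min(values), max(values)]  # crude initialisation at the extremes
    grand_mean = sum(values) / n
    grand_var = sum((v - grand_mean) ** 2 for v in values) / n
    variances = [grand_var, grand_var]
    weights = [0.5, 0.5]
    resp = [0.5] * n

    for _ in range(n_iter):
        # E-step: posterior probability of the high-mean component per cell.
        resp = []
        for v in values:
            p_low = weights[0] * normal_pdf(v, means[0], variances[0])
            p_high = weights[1] * normal_pdf(v, means[1], variances[1])
            resp.append(p_high / max(p_low + p_high, 1e-300))

        # M-step: re-estimate weights, means and variances from the posteriors.
        n_high = sum(resp)
        n_low = n - n_high
        means = [
            sum((1.0 - r) * v for r, v in zip(resp, values)) / n_low,
            sum(r * v for r, v in zip(resp, values)) / n_high,
        ]
        variances = [
            max(sum((1.0 - r) * (v - means[0]) ** 2 for r, v in zip(resp, values)) / n_low, 1e-6),
            max(sum(r * (v - means[1]) ** 2 for r, v in zip(resp, values)) / n_high, 1e-6),
        ]
        weights = [n_low / n, n_high / n]

    return weights, means, variances, resp


# Synthetic data (an assumption, not from the study): 100 cells with
# baseline expression and 100 cells in which the gene is suppressed.
rng = random.Random(0)
expression = [rng.gauss(6.0, 0.5) for _ in range(100)]
expression += [rng.gauss(2.0, 0.5) for _ in range(100)]

weights, means, variances, resp = fit_two_component_gmm(expression)

# Hard cluster assignment from the posterior probabilities.
labels = ["baseline" if r >= 0.5 else "suppressed" for r in resp]
```

In a real single-cell analysis the mixture would be fitted across many genes and cells, and the resulting clusters would then be compared for differential expression; this sketch only shows how a mixture model separates two expression distributions for one gene.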