Search | VHL Regional Portal

Cellular State Transformations Using Deep Learning for Precision Medicine Applications.

Targonski, Colin; Bender, M Reed; Shealy, Benjamin T; Husain, Benafsh; Paseman, Bill; Smith, Melissa C; Feltus, F Alex.

Patterns (N Y) ; 1(6): 100087, 2020 Sep 11.

Article in English | MEDLINE | ID: mdl-33205131

ABSTRACT

We introduce the Transcriptome State Perturbation Generator (TSPG) as a novel deep-learning method to identify changes in genomic expression that occur between tissue states using generative adversarial networks. TSPG learns the transcriptome perturbations from RNA-sequencing data required to shift from a source to a target class. We apply TSPG as an effective method of detecting biologically relevant alternate expression patterns between normal and tumor human tissue samples. We demonstrate that the application of TSPG to expression data obtained from a biopsy sample of a patient's kidney cancer can identify patient-specific differentially expressed genes between their individual tumor sample and a target class of healthy kidney gene expression. By utilizing TSPG in a precision medicine application in which the patient sample is not replicated (i.e., n = 1), we present a novel technique of determining significant transcriptional aberrations that can be used to help identify potential targeted therapies.
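The idea of a per-gene perturbation that moves a single sample toward a target class can be illustrated with a much simpler stand-in than the method in the abstract. The sketch below does not use a generative adversarial network; it assumes the target class can be summarized by its per-gene mean and takes the perturbation to be the difference between that mean and the patient sample. The gene names and expression values are hypothetical.

```python
def target_centroid(target_samples):
    """Per-gene mean expression across the target-class samples."""
    genes = target_samples[0].keys()
    n = len(target_samples)
    return {g: sum(s[g] for s in target_samples) / n for g in genes}

def perturbation(sample, centroid):
    """Per-gene shift that would move the sample onto the target centroid."""
    return {g: centroid[g] - sample[g] for g in sample}

def top_perturbed(shift, k):
    """The k genes with the largest absolute shift, largest first."""
    return sorted(shift.items(), key=lambda item: abs(item[1]), reverse=True)[:k]

# Hypothetical toy data: two healthy samples and one unreplicated patient sample.
healthy = [
    {"GENE_A": 2.0, "GENE_B": 5.0, "GENE_C": 1.0},
    {"GENE_A": 2.2, "GENE_B": 5.2, "GENE_C": 1.0},
]
patient = {"GENE_A": 2.0, "GENE_B": 9.1, "GENE_C": 1.1}

shift = perturbation(patient, target_centroid(healthy))
top = top_perturbed(shift, k=1)
print(top)
```

In this toy case GENE_B carries the largest shift, and its negative sign indicates that expression would have to decrease to match the healthy class. A learned generator replaces the fixed centroid with a sample-specific target, which this sketch does not attempt.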

A generalized deep learning approach for local structure identification in molecular simulations.

DeFever, Ryan S; Targonski, Colin; Hall, Steven W; Smith, Melissa C; Sarupria, Sapna.

Chem Sci ; 10(32): 7503-7515, 2019 Aug 28.

Article in English | MEDLINE | ID: mdl-31768235

ABSTRACT

Identifying local structure in molecular simulations is of utmost importance. The most common existing approach to identify local structure is to calculate some geometrical quantity referred to as an order parameter. In simple cases, order parameters are physically intuitive and trivial to develop (e.g., ion-pair distance); however, in most cases, order parameter development becomes a much more difficult endeavor (e.g., crystal structure identification). Using ideas from computer vision, we adapt a specific type of neural network called a PointNet to identify local structural environments in molecular simulations. A primary challenge in applying machine learning techniques to simulation is selecting the appropriate input features. This challenge is system-specific and requires significant human input and intuition. In contrast, our approach is a generic framework that requires no system-specific feature engineering and operates on the raw output of the simulations, i.e., atomic positions. We demonstrate the method on crystal structure identification in Lennard-Jones (four different phases), water (eight different phases), and mesophase (six different phases) systems. The method achieves as high as 99.5% accuracy in crystal structure identification. The method is applicable to heterogeneous nucleation and it can even predict the crystal phases of atoms near external interfaces. We demonstrate the versatility of our approach by using our method to identify surface hydrophobicity based solely upon positions and orientations of surrounding water molecules. Our results suggest the approach will be broadly applicable to many types of local structure in simulations.
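One reason a PointNet-style network can operate directly on atomic positions is that it applies the same transform to every point and then pools with a symmetric function, so the result does not depend on the order in which neighbors are listed. The following is a minimal sketch of that single idea, assuming one shared linear layer with ReLU and max pooling; the weights are random and untrained, and the function names are hypothetical rather than taken from the paper.

```python
import random

def shared_layer(point, weights, biases):
    """Apply the same linear map and ReLU to one 3D position."""
    return [max(0.0, sum(w * x for w, x in zip(row, point)) + b)
            for row, b in zip(weights, biases)]

def local_environment_feature(neighbors, weights, biases):
    """Max-pool the per-point features over all neighbor positions."""
    per_point = [shared_layer(p, weights, biases) for p in neighbors]
    return [max(column) for column in zip(*per_point)]

rng = random.Random(0)
n_features = 8

# Random, untrained weights: 8 output features from 3 input coordinates.
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_features)]
biases = [rng.uniform(-1, 1) for _ in range(n_features)]

# Hypothetical neighbor positions relative to a central atom.
neighbors = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(12)]
shuffled = neighbors[:]
rng.shuffle(shuffled)

feature = local_environment_feature(neighbors, weights, biases)
feature_shuffled = local_environment_feature(shuffled, weights, biases)
print(feature == feature_shuffled)
```

Because max pooling is symmetric in its inputs, reordering the neighbor list leaves the feature vector unchanged. A classifier for crystal phases would be trained on top of such pooled features, which this sketch omits.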

Uncovering biomarker genes with enriched classification potential from Hallmark gene sets.

Targonski, Colin A; Shearer, Courtney A; Shealy, Benjamin T; Smith, Melissa C; Feltus, F Alex.

Sci Rep ; 9(1): 9747, 2019 Jul 05.

Article in English | MEDLINE | ID: mdl-31278367

ABSTRACT

Given the complex relationship between gene expression and phenotypic outcomes, computationally efficient approaches are needed to sift through large high-dimensional datasets in order to identify biologically relevant biomarkers. In this report, we describe a method of identifying the most salient biomarker genes in a dataset, which we call "candidate genes", by evaluating the ability of gene combinations to classify samples from a dataset, which we call "classification potential". Our algorithm, Gene Oracle, uses a neural network to test user defined gene sets for polygenic classification potential and then uses a combinatorial approach to further decompose selected gene sets into candidate and non-candidate biomarker genes. We tested this algorithm on curated gene sets from the Molecular Signatures Database (MSigDB) quantified in RNAseq gene expression matrices obtained from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) data repositories. First, we identified which MSigDB Hallmark subsets have significant classification potential for both the TCGA and GTEx datasets. Then, we identified the most discriminatory candidate biomarker genes in each Hallmark gene set and provide evidence that the improved biomarker potential of these genes may be due to reduced functional complexity.
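The notion of scoring gene combinations by how well they classify samples can be sketched on toy data. The example below is an illustration only: it swaps the neural network described in the abstract for a nearest-centroid classifier with leave-one-out evaluation, and the gene names, expression values, and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy expression data: (per-gene values, class label) per sample.
GENES = ["g0", "g1", "g2", "g3"]
SAMPLES = [
    ([5.0, 1.0, 2.0, 3.0], "tumor"),
    ([5.2, 3.0, 2.1, 1.0], "tumor"),
    ([4.9, 2.0, 1.9, 2.0], "tumor"),
    ([1.0, 2.5, 2.0, 2.2], "normal"),
    ([1.1, 1.2, 2.1, 3.1], "normal"),
    ([0.9, 3.1, 1.9, 0.9], "normal"),
]

def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(column) / len(column) for column in zip(*rows)]

def classification_potential(samples, gene_idx):
    """Leave-one-out accuracy of a nearest-centroid classifier on a gene subset."""
    correct = 0
    for i, (values, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        best_label, best_dist = None, float("inf")
        for cls in sorted({lab for _, lab in rest}):
            rows = [[v[j] for j in gene_idx] for v, lab in rest if lab == cls]
            center = centroid(rows)
            dist = sum((values[j] - c) ** 2 for j, c in zip(gene_idx, center))
            if dist < best_dist:
                best_label, best_dist = cls, dist
        correct += best_label == label
    return correct / len(samples)

# Score every gene subset of size 1 and 2.
scores = {}
for size in (1, 2):
    for idx in combinations(range(len(GENES)), size):
        subset = tuple(GENES[j] for j in idx)
        scores[subset] = classification_potential(SAMPLES, idx)

for subset, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(subset, round(score, 2))
```

In this toy set, g0 separates the two classes on its own, so it scores perfectly as a singleton. Genes that recur in the highest-scoring subsets would be the natural candidates in a combinatorial decomposition of the kind the abstract describes.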

Subject(s)

Biomarkers, Tumor/genetics; Genetic Association Studies; Genetic Predisposition to Disease; Oncogenes; Algorithms; Computational Biology/methods; Databases, Genetic; Gene Expression Profiling; Gene Ontology; Humans
