Results 1 - 5 of 5
1.
Front Big Data ; 5: 941451, 2022.
Article in English | MEDLINE | ID: mdl-36172548

ABSTRACT

Recent years have seen an increase in the application of machine learning to the analysis of physical and biological systems, including cancer progression. A fundamental downside to these tools is that their complexity and nonlinearity make it almost impossible to establish a deterministic, a priori relationship between their input and output, and thus their predictions are not wholly accountable. We begin with a series of proofs establishing that this holds even for the simplest possible model of a neural network; the effects of specific loss functions are explored more fully in the Appendices. We return to first principles and consider how to construct a physics-inspired model of tumor growth without resorting to stochastic gradient descent or artificial nonlinearities. We derive an algorithm that explores the space of possible parameters in a model of tumor growth and identifies candidate equations much faster than a simulated annealing approach. We test this algorithm on synthetic tumor-growth trajectories and show that it can efficiently and reliably narrow down the area of parameter space where the correct values are located. This approach has the potential to greatly improve the speed and reliability with which patient-specific models of cancer growth can be identified in a clinical setting.

2.
Bioinformatics ; 38(5): 1277-1286, 2022 02 07.
Article in English | MEDLINE | ID: mdl-34864884

ABSTRACT

MOTIVATION: Single-cell RNA sequencing allows high-resolution views of individual cells for libraries of up to millions of samples, thus motivating the use of deep learning for analysis. In this study, we introduce the use of graph neural networks for the unsupervised exploration of scRNA-seq data by developing a variational graph autoencoder architecture with graph attention layers that operates directly on the connectivity between cells, focusing on dimensionality reduction and clustering. With the help of several case studies, we show that our model, named CellVGAE, can be effectively used for exploratory analysis even on challenging datasets, by extracting meaningful features from the data and providing the means to visualize and interpret different aspects of the model. RESULTS: We show that CellVGAE is more interpretable than existing scRNA-seq variational architectures by analysing the graph attention coefficients. By drawing parallels with other scRNA-seq studies on interpretability, we assess the validity of the relationships modelled by attention, and furthermore, we show that CellVGAE can intrinsically capture information such as pseudotime and NF-κB activation dynamics, the latter being a property that is not generally shared by existing neural alternatives. We then evaluate the dimensionality reduction and clustering performance on 9 difficult and well-annotated datasets by comparing with three leading neural and non-neural techniques, concluding that CellVGAE outperforms competing methods. Finally, we report a decrease in training times of up to 20× on a dataset of 1.3 million cells compared to existing deep learning architectures. AVAILABILITY AND IMPLEMENTATION: The CellVGAE code is available at https://github.com/davidbuterez/CellVGAE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling, Single-Cell Gene Expression Analysis, Gene Expression Profiling/methods, Sequence Analysis, RNA/methods, Workflow, Single-Cell Analysis/methods, Cluster Analysis
3.
Bioinformatics ; 38(3): 730-737, 2022 01 12.
Article in English | MEDLINE | ID: mdl-33471074

ABSTRACT

MOTIVATION: High-throughput gene expression can be used to address a wide range of fundamental biological problems, but datasets of an appropriate size are often unavailable. Moreover, existing transcriptomics simulators have been criticized because they fail to emulate key properties of gene expression data. In this article, we develop a method based on a conditional generative adversarial network to generate realistic transcriptomics data for Escherichia coli and humans. We assess the performance of our approach across several tissues and cancer types. RESULTS: We show that our model preserves several gene expression properties significantly better than widely used simulators, such as SynTReN or GeneNetWeaver. The synthetic data preserve tissue- and cancer-specific properties of transcriptomics data. Moreover, they exhibit real gene clusters and ontologies at both local and global scales, suggesting that the model learns to approximate the gene expression manifold in a biologically meaningful way. AVAILABILITY AND IMPLEMENTATION: Code is available at: https://github.com/rvinas/adversarial-gene-expression. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Escherichia coli, Gene Expression Profiling, Humans, Gene Expression Profiling/methods, Gene Expression
4.
Sci Rep ; 10(1): 9790, 2020 06 17.
Article in English | MEDLINE | ID: mdl-32555334

ABSTRACT

Using machine learning techniques to build representations from biomedical data can help us understand the latent biological mechanism of action and lead to important discoveries. Recent developments in single-cell RNA-sequencing protocols have allowed measuring gene expression for individual cells in a population, thus opening up the possibility of finding answers to biomedical questions about cell differentiation. In this paper, we explore unsupervised generative neural methods, based on the variational autoencoder, that can model cell differentiation by building meaningful representations from the high-dimensional and complex gene expression data. We use disentanglement methods based on information theory to improve the data representation and achieve better separation of the biological factors of variation in the gene expression data. In addition, we use a graph autoencoder consisting of graph convolutional layers to predict relationships between single cells. Based on these models, we develop a computational framework that consists of methods for identifying the cell types in the dataset, finding driver genes for the differentiation process, and obtaining a better understanding of relationships between cells. We illustrate our methods on datasets from multiple species and from different sequencing technologies.


Subject(s)
Cell Differentiation, Machine Learning, Models, Biological, Animals, Datasets as Topic, Gene Expression, Humans, Models, Statistical, RNA-Seq
5.
Front Genet ; 10: 1205, 2019.
Article in English | MEDLINE | ID: mdl-31921281

ABSTRACT

International initiatives such as the Molecular Taxonomy of Breast Cancer International Consortium are collecting multiple data sets at different genome scales with the aim of identifying novel cancer biomarkers and predicting patient survival. To analyze such data, several machine learning, bioinformatics, and statistical methods have been applied, among them neural networks such as autoencoders. Although these models provide a good statistical learning framework to analyze multi-omic and/or clinical data, there is a distinct lack of work on how to integrate diverse patient data and identify the optimal design best suited to the available data. In this paper, we investigate several autoencoder architectures that integrate a variety of cancer patient data types (e.g., multi-omics and clinical data). We perform extensive analyses of these approaches and provide a clear methodological and computational framework for designing systems that enable clinicians to investigate cancer traits and translate the results into clinical applications. We demonstrate how these networks can be designed, built, and, in particular, applied to tasks of integrative analyses of heterogeneous breast cancer data. The results show that these approaches yield relevant data representations that, in turn, lead to accurate and stable diagnosis.
