Results 1 - 4 of 4
1.
Bioinformatics ; 37(19): 3212-3219, 2021 Oct 11.
Article in English | MEDLINE | ID: mdl-33822889

ABSTRACT

MOTIVATION: When learning to subtype complex diseases based on next-generation sequencing data, the amount of available data is often limited. Recent works have tried to leverage data from other domains to design better predictors in the target domain of interest, with varying degrees of success. However, these methods either are limited to cases that require outcome-label correspondence across domains or cannot leverage label information at all. Moreover, existing methods usually cannot benefit from other information available a priori, such as gene interaction networks. RESULTS: In this article, we develop a generative optimal Bayesian supervised domain adaptation (OBSDA) model that can integrate RNA sequencing (RNA-Seq) data from different domains along with their labels to improve prediction accuracy in the target domain. Our model can be applied in cases where different domains share the same labels or have different ones. OBSDA is based on a hierarchical Bayesian negative binomial model with parameter factorization, for which the optimal predictor can be derived by marginalizing the likelihood over the posterior of the parameters. We first provide an efficient Gibbs sampler for parameter inference in OBSDA. Then, we leverage the gene-gene network prior information and construct an informed and flexible variational family to infer the posterior distributions of model parameters. Comprehensive experiments on real-world RNA-Seq data demonstrate the superior performance of OBSDA in terms of accuracy in identifying cancer subtypes by utilizing data from different domains. Moreover, we show that taking advantage of the prior network information further improves performance. AVAILABILITY AND IMPLEMENTATION: The source code for the implementations of OBSDA and SI-OBSDA is available at https://github.com/SHBLK/BSDA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
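
The abstract above describes a predictor obtained by marginalizing a negative binomial likelihood over the posterior of the parameters. The following is a minimal sketch of that idea only, not the OBSDA implementation from the linked repository. It assumes posterior draws are already available (for example from a Gibbs sampler) and approximates the marginal likelihood of a new count vector by a Monte Carlo average. The function names, the (r, p) parameterization, and the toy draws are all hypothetical.

```python
import math


def nb_log_pmf(k, r, p):
    """Log-probability of count k under a negative binomial NB(r, p).

    Assumed parameterization: r > 0 is the dispersion and p in (0, 1),
    so that the mean is r * p / (1 - p).
    """
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + k * math.log(p) + r * math.log(1.0 - p))


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def predictive_log_scores(counts, posterior_draws, class_priors):
    """Log prior plus Monte Carlo estimate of the log marginal likelihood.

    posterior_draws[label] is a list of draws; each draw holds one (r, p)
    pair per gene. Genes are treated as conditionally independent.
    """
    scores = {}
    for label, draws in posterior_draws.items():
        per_draw = [
            sum(nb_log_pmf(k, r, p) for k, (r, p) in zip(counts, draw))
            for draw in draws
        ]
        scores[label] = math.log(class_priors[label]) + log_mean_exp(per_draw)
    return scores


def classify(counts, posterior_draws, class_priors):
    """Pick the label with the largest approximate posterior score."""
    scores = predictive_log_scores(counts, posterior_draws, class_priors)
    return max(scores, key=scores.get)


# Toy posterior draws for two genes and two hypothetical subtypes.
draws = {
    "subtype_A": [[(5.0, 0.5), (5.0, 0.8)], [(6.0, 0.5), (4.0, 0.8)]],
    "subtype_B": [[(5.0, 0.8), (5.0, 0.5)], [(4.0, 0.8), (6.0, 0.5)]],
}
priors = {"subtype_A": 0.5, "subtype_B": 0.5}
print(classify([4, 22], draws, priors))
```

Averaging in log space with `log_mean_exp` avoids underflow when many genes are multiplied together, which matters for realistic RNA-Seq dimensions.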

2.
BMC Bioinformatics ; 20(Suppl 12): 321, 2019 Jun 20.
Article in English | MEDLINE | ID: mdl-31216989

ABSTRACT

BACKGROUND: Missing values frequently arise in modern biomedical studies for various reasons, including missing tests or complex profiling technologies for different omics measurements. Missing values can complicate the application of clustering algorithms, whose goal is to group points based on some similarity criterion. A common practice for dealing with missing values in the context of clustering is to first impute the missing values and then apply the clustering algorithm to the completed data. RESULTS: We consider missing values in the context of optimal clustering, which finds an optimal clustering operator with reference to an underlying random labeled point process (RLPP). We show how the missing-value problem fits neatly into the overall framework of optimal clustering by incorporating the missing-value mechanism into the random labeled point process and then marginalizing out the missing-value process. In particular, we demonstrate the proposed framework for the Gaussian model with arbitrary covariance structures. Comprehensive experimental studies on both synthetic and real-world RNA-seq data show the superior performance of the proposed optimal clustering with missing values when compared to various clustering approaches. CONCLUSION: Optimal clustering with missing values obviates the need for imputation-based pre-processing of the data while also achieving smaller clustering errors.


Subject(s)
Algorithms , Breast Neoplasms/genetics , Cluster Analysis , Computer Simulation , Female , Gene Expression Profiling , Humans , Models, Theoretical , Normal Distribution , Probability
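
To illustrate the marginalization step described in the abstract for record 2, here is a minimal sketch of how a Gaussian with arbitrary covariance can score a point with missing coordinates without imputation. It assumes that missingness does not depend on the unobserved values, in which case the observed coordinates follow a Gaussian with the corresponding sub-vector of the mean and sub-matrix of the covariance. This is only the likelihood building block, not the optimal clustering operator itself; the function names and the `None` convention for missing entries are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def observed_log_density(x, mean, cov):
    """Gaussian log-density of the observed entries of x (None marks missing).

    Missing coordinates are integrated out by restricting the mean and
    covariance to the observed indices.
    """
    obs = [i for i, v in enumerate(x) if v is not None]
    if not obs:
        return 0.0  # nothing observed: the marginal integrates to one
    diff = [x[i] - mean[i] for i in obs]
    sub_cov = [[cov[i][j] for j in obs] for i in obs]
    low = cholesky(sub_cov)
    # Forward substitution solves low * z = diff, so sum(z^2) is the
    # Mahalanobis term diff^T sub_cov^{-1} diff.
    z = []
    for i in range(len(obs)):
        z.append((diff[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(len(obs)))
    return -0.5 * (len(obs) * math.log(2.0 * math.pi) + log_det
                   + sum(v * v for v in z))


# A point whose second coordinate is missing, scored under two
# hypothetical cluster models that share a covariance matrix.
cov = [[2.0, 1.0], [1.0, 2.0]]
point = [0.2, None]
print(observed_log_density(point, [0.0, 0.0], cov))
print(observed_log_density(point, [3.0, 3.0], cov))
```
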
3.
Article in English | MEDLINE | ID: mdl-29990066

ABSTRACT

Gene-expression-based classification and regression are major concerns in translational genomics. If the feature-label distribution is known, then an optimal classifier can be derived. If the predictor-target distribution is known, then an optimal regression function can be derived. In practice, neither is known, data must be employed, and, for small samples, prior knowledge concerning the feature-label or predictor-target distribution can be used in the learning process. Optimal Bayesian classification and optimal Bayesian regression provide optimality under uncertainty. With optimal Bayesian classification (or regression), uncertainty is treated directly on the feature-label (or predictor-target) distribution. The fundamental engineering problem is prior construction. The Regularized Expected Mean Log-Likelihood Prior (REMLP) utilizes pathway information and provides viable priors for the feature-label distribution, assuming that the training data contain labels. In practice, the labels may not be observed. This paper extends the REMLP methodology to a Gaussian mixture model (GMM) when the labels are unknown. Prior construction bundled with prior update via Bayesian sampling results in Monte Carlo approximations to the optimal Bayesian regression function and optimal Bayesian classifier. Simulations demonstrate that the GMM REMLP prior yields better performance than the EM algorithm for small data sets. We apply it to phenotype classification when the prior knowledge consists of colon cancer pathways.


Subject(s)
Gene Expression Profiling/methods , Genomics/methods , Models, Statistical , Algorithms , Bayes Theorem , Colonic Neoplasms/genetics , Databases, Genetic , Humans , Normal Distribution
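
For readers unfamiliar with the terminology in the abstract for record 3, the following block sketches the usual form of an optimal Bayesian classifier and its Monte Carlo approximation. The notation is an assumption of this note rather than taken from the abstract: $\pi^{*}$ denotes the posterior over the parameters $\theta_y$ of the class-$y$ feature distribution, $c_y$ the class probability (assumed independent of $\theta_y$), and $M$ the number of posterior draws.

```latex
% Effective class-conditional density: the likelihood averaged over the posterior
f_{\Theta}(x \mid y) = \int f_{\theta_y}(x \mid y)\, \pi^{*}(\theta_y)\, d\theta_y

% Optimal Bayesian classifier: plug the effective densities into the Bayes rule
\psi_{\mathrm{OBC}}(x) = \arg\max_{y}\; \mathrm{E}_{\pi^{*}}[c_y]\, f_{\Theta}(x \mid y)

% Monte Carlo approximation from draws \theta_y^{(m)} \sim \pi^{*}, m = 1, \dots, M
f_{\Theta}(x \mid y) \approx \frac{1}{M} \sum_{m=1}^{M} f_{\theta_y^{(m)}}(x \mid y)
```

When labels are unobserved, as in the Gaussian mixture setting the abstract describes, the draws would come from a sampler over the mixture parameters; the averaging step stays the same.
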
4.
BMC Bioinformatics ; 18(Suppl 14): 552, 2017 Dec 28.
Article in English | MEDLINE | ID: mdl-29297278

ABSTRACT

BACKGROUND: Phenotypic classification is problematic because small samples are ubiquitous; and, for these, use of prior knowledge is critical. If knowledge concerning the feature-label distribution - for instance, genetic pathways - is available, then it can be used in learning. Optimal Bayesian classification provides optimal classification under model uncertainty. It differs from classical Bayesian methods in which a classification model is assumed and prior distributions are placed on model parameters. With optimal Bayesian classification, uncertainty is treated directly on the feature-label distribution, which assures full utilization of prior knowledge and is guaranteed to outperform classical methods. RESULTS: The salient problem confronting optimal Bayesian classification is prior construction. In this paper, we propose a new prior construction methodology based on a general framework of constraints in the form of conditional probability statements. We call this prior the maximal knowledge-driven information prior (MKDIP). The new constraint framework is more flexible than our previous methods as it naturally handles the potential inconsistency in archived regulatory relationships and conditioning can be augmented by other knowledge, such as population statistics. We also extend the application of prior construction to a multinomial mixture model when labels are unknown, which often occurs in practice. The performance of the proposed methods is examined on two important pathway families, the mammalian cell-cycle and a set of p53-related pathways, and also on a publicly available gene expression dataset of non-small cell lung cancer when combined with the existing prior knowledge on relevant signaling pathways. CONCLUSION: The new proposed general prior construction framework extends the prior construction methodology to a more flexible framework that results in better inference when proper prior knowledge exists. 
Moreover, the extension of optimal Bayesian classification to multinomial mixtures where data sets are both small and unlabeled enables superior classifier design using small, unstructured data sets. We have demonstrated the effectiveness of our approach using pathway information and available knowledge of gene-regulating functions; however, the underlying theory can be applied to a wide variety of knowledge types and to other applications involving small samples.


Subject(s)
Algorithms , Animals , Bayes Theorem , Carcinoma, Non-Small-Cell Lung/genetics , Cell Cycle , Entropy , Humans , Information Theory , Lung Neoplasms/genetics , Mammals/metabolism , Probability , Signal Transduction , Tumor Suppressor Protein p53/metabolism
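
As a rough illustration of how a constructed prior enters optimal Bayesian classification in a discrete (multinomial) setting like the one in record 4, the sketch below assumes a Dirichlet prior over bin probabilities for each class. Under that assumption the effective class-conditional density is the posterior mean of the bin probabilities. This is not the MKDIP construction: the hyperparameters `alphas` are hypothetical placeholders standing in for whatever a prior-construction method would supply, and the labeled-data case is shown for simplicity.

```python
def effective_density(counts, alpha):
    """Posterior-mean bin probabilities under a Dirichlet(alpha) prior."""
    total = sum(counts) + sum(alpha)
    return [(c + a) / total for c, a in zip(counts, alpha)]


def discrete_obc(bin_counts, alphas, class_weights):
    """Return the predicted label for every bin.

    bin_counts[y] holds observed counts per bin for class y, alphas[y] the
    Dirichlet hyperparameters for class y, and class_weights[y] the expected
    class probability (supplied by the caller in this sketch).
    """
    dens = {y: effective_density(bin_counts[y], alphas[y]) for y in bin_counts}
    n_bins = len(next(iter(dens.values())))
    return [
        max(dens, key=lambda y: class_weights[y] * dens[y][b])
        for b in range(n_bins)
    ]


# Four bins (for example, the joint states of two binary genes) and a very
# small sample; the prior for class 1 puts extra weight on the last bin.
counts = {0: [3, 1, 0, 0], 1: [0, 0, 1, 2]}
alphas = {0: [1, 1, 1, 1], 1: [1, 1, 1, 3]}
weights = {0: 0.5, 1: 0.5}
print(discrete_obc(counts, alphas, weights))
```

With so few observations, the hyperparameters contribute as much to each effective density as the data do, which is why prior construction is the central problem in the small-sample setting.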