Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 3 de 3
Filter
Add more filters










Database
Language
Publication year range
1.
BMC Bioinformatics ; 20(Suppl 12): 321, 2019 Jun 20.
Article in English | MEDLINE | ID: mdl-31216989

ABSTRACT

BACKGROUND: Missing values frequently arise in modern biomedical studies due to various reasons, including missing tests or complex profiling technologies for different omics measurements. Missing values can complicate the application of clustering algorithms, whose goals are to group points based on some similarity criterion. A common practice for dealing with missing values in the context of clustering is to first impute the missing values, and then apply the clustering algorithm on the completed data. RESULTS: We consider missing values in the context of optimal clustering, which finds an optimal clustering operator with reference to an underlying random labeled point process (RLPP). We show how the missing-value problem fits neatly into the overall framework of optimal clustering by incorporating the missing value mechanism into the random labeled point process and then marginalizing out the missing-value process. In particular, we demonstrate the proposed framework for the Gaussian model with arbitrary covariance structures. Comprehensive experimental studies on both synthetic and real-world RNA-seq data show the superior performance of the proposed optimal clustering with missing values when compared to various clustering approaches. CONCLUSION: Optimal clustering with missing values obviates the need for imputation-based pre-processing of the data, while at the same time possessing smaller clustering errors.


Subject(s)
Algorithms , Breast Neoplasms/genetics , Cluster Analysis , Computer Simulation , Female , Gene Expression Profiling , Humans , Models, Theoretical , Normal Distribution , Probability
2.
Bioinformatics ; 34(13): i61-i69, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29949981

ABSTRACT

Motivation: High-throughput sequencing technologies, in particular RNA sequencing (RNA-seq), have become the basic practice for genomic studies in biomedical research. In addition to studying genes individually, for example, through differential expression analysis, investigating co-ordinated expression variations of genes may help reveal the underlying cellular mechanisms to derive better understanding and more effective prognosis and intervention strategies. Although there exists a variety of co-expression network based methods to analyze microarray data for this purpose, instead of blindly extending these methods for microarray data that may introduce unnecessary bias, it is crucial to develop methods well adapted to RNA-seq data to identify the functional modules of genes with similar expression patterns. Results: We have developed a fully Bayesian covariate-dependent negative binomial factor analysis (dNBFA) method-dNBFA-for RNA-seq count data, to capture coordinated gene expression changes, while considering effects from covariates reflecting different influencing factors. Unlike existing co-expression network based methods, our proposed model does not require multiple ad-hoc choices on data processing, transformation, as well as co-expression measures and can be directly applied to RNA-seq data. Furthermore, being capable of incorporating covariate information, the proposed method can tackle setups with complex confounding factors in different experiment designs. Finally, the natural model parameterization removes the need for a normalization preprocessing step, as commonly adopted to compensate for the effect of sequencing-depth variations. Efficient Bayesian inference of model parameters is derived by exploiting conditional conjugacy via novel data augmentation techniques. Experimental results on several real-world RNA-seq datasets on complex diseases suggest dNBFA as a powerful tool for discovering the gene modules with significant differential expression and meaningful biological insight. Availability and implementation: dNBFA is implemented in R language and is available at https://github.com/siamakz/dNBFA.


Subject(s)
Gene Expression Profiling/methods , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Software , Autistic Disorder/genetics , Bayes Theorem , Factor Analysis, Statistical , Humans , Neoplasms/genetics
3.
EURASIP J Bioinform Syst Biol ; 2016(1): 5, 2016 Dec.
Article in English | MEDLINE | ID: mdl-26893596

ABSTRACT

BACKGROUND AND MOTIVATIONS: Module identification has been studied extensively in order to gain deeper understanding of complex systems, such as social networks as well as biological networks. Modules are often defined as groups of vertices in these networks that are topologically cohesive with similar interaction patterns with the rest of the vertices. Most of the existing module identification algorithms assume that the given networks are faithfully measured without errors. However, in many real-world applications, for example, when analyzing protein-protein interaction networks from high-throughput profiling techniques, there is significant noise with both false positive and missing links between vertices. In this paper, we propose a new model for more robust module identification by taking advantage of multiple observed networks with significant noise so that signals in multiple networks can be strengthened and help improve the solution quality by combining information from various sources. METHODS: We adopt a hierarchical Bayesian model to integrate multiple noisy snapshots that capture the underlying modular structure of the networks under study. By introducing a latent root assignment matrix and its relations to instantaneous module assignments in all the observed networks to capture the underlying modular structure and combine information across multiple networks, an efficient variational Bayes algorithm can be derived to accurately and robustly identify the underlying modules from multiple noisy networks. RESULTS: Experiments on synthetic and protein-protein interaction data sets show that our proposed model enhances both the accuracy and resolution in detecting cohesive modules, and it is less vulnerable to noise in the observed data. In addition, it shows higher power in predicting missing edges compared to individual-network methods.

SELECTION OF CITATIONS
SEARCH DETAIL
...