Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 7 de 7
Filter
Add more filters










Database
Language
Publication year range
1.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36445207

ABSTRACT

Driven by multi-omics data, some multi-view clustering algorithms have been successfully applied to cancer subtypes prediction, aiming to identify subtypes with biometric differences in the same cancer, thereby improving the clinical prognosis of patients and designing personalized treatment plan. Due to the fact that the number of patients in omics data is much smaller than the number of genes, multi-view spectral clustering based on similarity learning has been widely developed. However, these algorithms still suffer some problems, such as over-reliance on the quality of pre-defined similarity matrices for clustering results, inability to reasonably handle noise and redundant information in high-dimensional omics data, ignoring complementary information between omics data, etc. This paper proposes multi-view spectral clustering with latent representation learning (MSCLRL) method to alleviate the above problems. First, MSCLRL generates a corresponding low-dimensional latent representation for each omics data, which can effectively retain the unique information of each omics and improve the robustness and accuracy of the similarity matrix. Second, the obtained latent representations are assigned appropriate weights by MSCLRL, and global similarity learning is performed to generate an integrated similarity matrix. Third, the integrated similarity matrix is used to feed back and update the low-dimensional representation of each omics. Finally, the final integrated similarity matrix is used for clustering. In 10 benchmark multi-omics datasets and 2 separate cancer case studies, the experiments confirmed that the proposed method obtained statistically and biologically meaningful cancer subtypes.


Subject(s)
Multiomics , Neoplasms , Humans , Algorithms , Neoplasms/genetics , Cluster Analysis
2.
Front Genet ; 13: 1032768, 2022.
Article in English | MEDLINE | ID: mdl-36685873

ABSTRACT

Integrating multi-omics data for cancer subtype recognition is an important task in bioinformatics. Recently, deep learning has been applied to recognize the subtype of cancers. However, existing studies almost integrate the multi-omics data simply by concatenation as the single data and then learn a latent low-dimensional representation through a deep learning model, which did not consider the distribution differently of omics data. Moreover, these methods ignore the relationship of samples. To tackle these problems, we proposed SADLN: A self-attention based deep learning network of integrating multi-omics data for cancer subtype recognition. SADLN combined encoder, self-attention, decoder, and discriminator into a unified framework, which can not only integrate multi-omics data but also adaptively model the sample's relationship for learning an accurately latent low-dimensional representation. With the integrated representation learned from the network, SADLN used Gaussian Mixture Model to identify cancer subtypes. Experiments on ten cancer datasets of TCGA demonstrated the advantages of SADLN compared to ten methods. The Self-Attention Based Deep Learning Network (SADLN) is an effective method of integrating multi-omics data for cancer subtype recognition.

3.
Comput Intell Neurosci ; 2021: 9990297, 2021.
Article in English | MEDLINE | ID: mdl-34925501

ABSTRACT

Clustering of tumor samples can help identify cancer types and discover new cancer subtypes, which is essential for effective cancer treatment. Although many traditional clustering methods have been proposed for tumor sample clustering, advanced algorithms with better performance are still needed. Low-rank subspace clustering is a popular algorithm in recent years. In this paper, we propose a novel one-step robust low-rank subspace segmentation method (ORLRS) for clustering the tumor sample. For a gene expression data set, we seek its lowest rank representation matrix and the noise matrix. By imposing the discrete constraint on the low-rank matrix, without performing spectral clustering, ORLRS learns the cluster indicators of subspaces directly, i.e., performing the clustering task in one step. To improve the robustness of the method, capped norm is adopted to remove the extreme data outliers in the noise matrix. Furthermore, we conduct an efficient solution to solve the problem of ORLRS. Experiments on several tumor gene expression data demonstrate the effectiveness of ORLRS.


Subject(s)
Algorithms , Neoplasms , Cluster Analysis , Gene Expression , Humans
4.
Front Genet ; 12: 718915, 2021.
Article in English | MEDLINE | ID: mdl-34552619

ABSTRACT

It is a vital task to design an integrated machine learning model to discover cancer subtypes and understand the heterogeneity of cancer based on multiple omics data. In recent years, some multi-view clustering algorithms have been proposed and applied to the prediction of cancer subtypes. Among them, the multi-view clustering methods based on graph learning are widely concerned. These multi-view approaches usually have one or more of the following problems. Many multi-view algorithms use the original omics data matrix to construct the similarity matrix and ignore the learning of the similarity matrix. They separate the data clustering process from the graph learning process, resulting in a highly dependent clustering performance on the predefined graph. In the process of graph fusion, these methods simply take the average value of the affinity graph of multiple views to represent the result of the fusion graph, and the rich heterogeneous information is not fully utilized. To solve the above problems, in this paper, a Multi-view Spectral Clustering Based on Multi-smooth Representation Fusion (MRF-MSC) method was proposed. Firstly, MRF-MSC constructs a smooth representation for each data type, which can be viewed as a sample (patient) similarity matrix. The smooth representation can explicitly enhance the grouping effect. Secondly, MRF-MSC integrates the smooth representation of multiple omics data to form a similarity matrix containing all biological data information through graph fusion. In addition, MRF-MSC adaptively gives weight factors to the smooth regularization representation of each omics data by using the self-weighting method. Finally, MRF-MSC imposes constrained Laplacian rank on the fusion similarity matrix to get a better cluster structure. The above problems can be transformed into spectral clustering for solving, and the clustering results can be obtained. MRF-MSC unifies the above process of graph construction, graph fusion and spectral clustering under one framework, which can learn better data representation and high-quality graphs, so as to achieve better clustering effect. In the experiment, MRF-MSC obtained good experimental results on the TCGA cancer data sets.

5.
Genes (Basel) ; 12(4)2021 04 03.
Article in English | MEDLINE | ID: mdl-33916856

ABSTRACT

Integrating multigenomic data to recognize cancer subtype is an important task in bioinformatics. In recent years, some multiview clustering algorithms have been proposed and applied to identify cancer subtype. However, these clustering algorithms ignore that each data contributes differently to the clustering results during the fusion process, and they require additional clustering steps to generate the final labels. In this paper, a new one-step method for cancer subtype recognition based on graph learning framework is designed, called Laplacian Rank Constrained Multiview Clustering (LRCMC). LRCMC first forms a graph for a single biological data to reveal the relationship between data points and uses affinity matrix to encode the graph structure. Then, it adds weights to measure the contribution of each graph and finally merges these individual graphs into a consensus graph. In addition, LRCMC constructs the adaptive neighbors to adjust the similarity of sample points, and it uses the rank constraint on the Laplacian matrix to ensure that each graph structure has the same connected components. Experiments on several benchmark datasets and The Cancer Genome Atlas (TCGA) datasets have demonstrated the effectiveness of the proposed algorithm comparing to the state-of-the-art methods.


Subject(s)
Algorithms , Biomarkers, Tumor/metabolism , Computational Biology/methods , Gene Expression Regulation, Neoplastic , Models, Theoretical , Neoplasms/classification , Biomarkers, Tumor/genetics , Cluster Analysis , Humans , Neoplasms/genetics , Neoplasms/metabolism , Neoplasms/pathology , Pattern Recognition, Automated , Prognosis , Survival Rate
6.
Oncotarget ; 8(51): 89021-89032, 2017 Oct 24.
Article in English | MEDLINE | ID: mdl-29179495

ABSTRACT

Recently, with the rapid progress of high-throughput sequencing technology, diverse genomic data are easy to be obtained. To effectively exploit the value of those data, integrative methods are urgently needed. In this paper, based on SNF (Similarity Network Diffusion) [1], we proposed a new integrative method named ndmaSNF (network diffusion model assisted SNF), which can be used for cancer subtype discovery with the advantage of making use of somatic mutation data and other discrete data. Firstly, we incorporate network diffusion model on mutation data to make it smoothed and adaptive. Then, the mutation data along with other data types are utilized in the SNF framework by constructing patient-by-patient similarity networks for each data type. Finally, a fused patient network containing all the information from different input data types is obtained by using a nonlinear iterative method. The fused network can be used for cancer subtype discovery through the clustering algorithm. Experimental results on four cancer datasets showed that our ndmaSNF method can find subtypes with significant differences in the survival profile and other clinical features.

7.
IEEE/ACM Trans Comput Biol Bioinform ; 14(5): 1115-1121, 2017.
Article in English | MEDLINE | ID: mdl-28113782

ABSTRACT

One major goal of large-scale cancer omics study is to understand molecular mechanisms of cancer and find new biomedical targets. To deal with the high-dimensional multidimensional cancer omics data (DNA methylation, mRNA expression, etc.), which can be used to discover new insight on identifying cancer subtypes, clustering methods are usually used to find an effective low-dimensional subspace of the original data and then cluster cancer samples in the reduced subspace. However, due to data-type diversity and big data volume, few methods can integrate these data and map them into an effective low-dimensional subspace. In this paper, we develop a dimension-reduction and data-integration method for indentifying cancer subtypes, named Scluster. First, Scluster, respectively, projects the different original data into the principal subspaces by an adaptive sparse reduced-rank regression method. Then, a fused patient-by-patient network is obtained for these subgroups through a scaled exponential similarity kernel method. Finally, candidate cancer subtypes are identified using spectral clustering method. We demonstrate the efficiency of our Scluster method using three cancers by jointly analyzing mRNA expression, miRNA expression, and DNA methylation data. The evaluation results and analyses show that Scluster is effective for predicting survival and identifies novel cancer subtypes of large-scale multi-omics data.


Subject(s)
Biomarkers, Tumor/genetics , Genes, Neoplasm/genetics , Genomics/methods , Models, Genetic , Neoplasm Proteins/genetics , Neoplasms/genetics , Neoplasms/mortality , Computer Simulation , Data Interpretation, Statistical , Humans , Neoplasms/classification , Survival Analysis , Systems Integration
SELECTION OF CITATIONS
SEARCH DETAIL
...