Search | VHL Regional Portal

1.

A Delayed Spiking Neural Membrane System for Adaptive Nearest Neighbor-Based Density Peak Clustering.

Ren, Qianqian; Zhang, Lianlian; Liu, Shaoyi; Liu, Jin-Xing; Shang, Junliang; Liu, Xiyu.

Int J Neural Syst ; : 2450050, 2024 Jul 06.

Article in English | MEDLINE | ID: mdl-38973024

ABSTRACT

Although the density peak clustering (DPC) algorithm can effectively distribute samples and quickly identify noise points, it lacks adaptability and cannot consider the local data structure. In addition, clustering algorithms generally suffer from high time complexity. Prior research suggests that clustering algorithms grounded in P systems can mitigate time complexity concerns. Within the realm of membrane systems (P systems), spiking neural P systems (SN P systems), inspired by biological nervous systems, are third-generation neural networks that possess intricate structures and offer substantial parallelism advantages. Thus, this study first improved the DPC by introducing the maximum nearest neighbor distance and K-nearest neighbors (KNN). Moreover, a method based on delayed spiking neural P systems (DSN P systems) was proposed to improve the performance of the algorithm. Subsequently, the DSNP-ANDPC algorithm was proposed. The effectiveness of DSNP-ANDPC was evaluated through comprehensive evaluations across four synthetic datasets and 10 real-world datasets. The proposed method outperformed the other comparison methods in most cases.
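For orientation, the classic density peak clustering quantities that this abstract builds on can be sketched as follows. This is a minimal illustration of the standard cutoff-kernel DPC scoring (local density and distance to the nearest denser point), not the adaptive nearest-neighbor variant or the DSN P system described above; the function names, the toy points, and the cutoff `d_c` are assumptions chosen for the example.

```python
import math

def dpc_scores(points, d_c):
    """Return (rho, delta) for classic cutoff-kernel density peak clustering."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other points closer than the cutoff distance d_c
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        # delta_i: distance to the nearest point with strictly higher density
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbor gets its largest distance to any point
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

def pick_centers(rho, delta, k):
    """Pick the k points with the largest rho * delta as cluster centers."""
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)[:k]

# Hypothetical toy data: two tight groups, each with one central point
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05)]
rho, delta = dpc_scores(points, d_c=0.12)
centers = pick_centers(rho, delta, k=2)
```

In this toy case the two central points combine high density with a large distance to any denser point, so they are selected as centers. A KNN-based density, as the abstract suggests, would replace the fixed cutoff `d_c`.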

2.

FSCME: A Feature Selection Method Combining Copula Correlation and Maximal Information Coefficient by Entropy Weights.

Zhong, Qi; Shang, Junliang; Ren, Qianqian; Li, Feng; Jiao, Cui-Na; Liu, Jin-Xing.

IEEE J Biomed Health Inform ; PP, 2024 Jun 04.

Article in English | MEDLINE | ID: mdl-38833405

ABSTRACT

Feature selection is a critical component of data mining and has garnered significant attention in recent years. However, feature selection methods based on information entropy often introduce complex mutual information forms to measure features, leading to increased redundancy and potential errors. To address this issue, we propose FSCME, a feature selection method combining Copula correlation (Ccor) and the maximum information coefficient (MIC) by entropy weights. The FSCME takes into consideration the relevance between features and labels, as well as the redundancy among candidate features and selected features. Therefore, the FSCME utilizes Ccor to measure the redundancy between features, while also estimating the relevance between features and labels. Meanwhile, the FSCME employs MIC to enhance the credibility of the correlation between features and labels. Moreover, this study employs the Entropy Weight Method (EWM) to evaluate and assign weights to the Ccor and MIC. The experimental results demonstrate that FSCME yields a more effective feature subset for subsequent clustering processes, significantly improving the classification performance compared to the other six feature selection methods. The source codes of the FSCME are available online at https://github.com/CDMBlab/FSCME.
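The entropy weighting step mentioned in this abstract can be illustrated with the textbook Entropy Weight Method. The sketch below is an assumption about the general technique only; how FSCME actually applies it to the Ccor and MIC scores is not stated in the abstract, and the score lists are invented for illustration.

```python
import math

def entropy_weights(columns):
    """Textbook entropy weight method for non-negative score columns.

    Each column is one criterion scored over the same n items.
    Columns with lower entropy (more contrast) receive larger weights.
    """
    n = len(columns[0])
    divergences = []
    for col in columns:
        total = sum(col)
        probs = [x / total for x in col]
        # Normalized Shannon entropy in [0, 1]; 0 * log(0) is treated as 0
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    return [d / s for d in divergences]

# Hypothetical scores of four features under two criteria
ccor_scores = [0.25, 0.25, 0.25, 0.25]  # uniform, so it carries no contrast
mic_scores = [0.70, 0.10, 0.10, 0.10]   # uneven, so it carries more information
weights = entropy_weights([ccor_scores, mic_scores])
```

Here the uniform criterion receives a weight close to zero and the uneven one a weight close to one. The sketch assumes at least one criterion is non-uniform; otherwise the normalizing sum is zero.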

3.

Diagnosis-Guided Deep Subspace Clustering Association Study for Pathogenetic Markers Identification of Alzheimer's Disease Based on Comparative Atlases.

Jiao, Cui-Na; Shang, Junliang; Li, Feng; Cui, Xinchun; Wang, Yan-Li; Gao, Ying-Lian; Liu, Jin-Xing.

IEEE J Biomed Health Inform ; 28(5): 3029-3041, 2024 May.

Article in English | MEDLINE | ID: mdl-38427553

ABSTRACT

The roles of brain region activities and genotypic functions in the pathogenesis of Alzheimer's disease (AD) remain unclear. Meanwhile, current imaging genetics methods struggle to identify potential pathogenetic markers through correlation analysis between brain networks and genetic variation. To discover the disease-related brain connectome from the specific brain structure and at the fine-grained level, the functional brain network is first constructed for each subject based on the Automated Anatomical Labeling (AAL) and human Brainnetome atlases. Specifically, the upper triangle elements of the functional connectivity matrix are extracted as connectivity features. The clustering coefficient and the average weighted node degree are developed to assess the significance of every brain area. Since the constructed brain network and genetic data are characterized by non-linearity, high dimensionality, and few subjects, the deep subspace clustering algorithm is proposed to reconstruct the original data. Our multilayer neural network helps capture the non-linear manifolds, and subspace clustering learns pairwise affinities between samples. Moreover, most approaches in neuroimaging genetics are unsupervised and neglect the diagnostic information related to diseases. We present a label constraint with diagnostic status to guide the imaging genetics correlation analysis. To this end, a diagnosis-guided deep subspace clustering association (DDSCA) method is developed to discover brain connectome and risk genetic factors by integrating genotypes with functional network phenotypes. Extensive experiments show that DDSCA achieves superior performance to most association methods and effectively selects disease-relevant genetic markers and brain connectome at the coarse-grained and fine-grained levels.

Subject(s)

Alzheimer Disease , Brain , Magnetic Resonance Imaging , Humans , Alzheimer Disease/genetics , Alzheimer Disease/diagnostic imaging , Cluster Analysis , Brain/diagnostic imaging , Magnetic Resonance Imaging/methods , Connectome/methods , Algorithms , Aged , Biomarkers , Female , Male , Atlases as Topic , Neuroimaging/methods
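The feature extraction step named in this abstract (taking the upper triangle of a symmetric functional connectivity matrix) can be sketched as below. The matrix values and the function name are hypothetical; the sketch only shows why a matrix over n regions yields n(n-1)/2 connectivity features.

```python
def upper_triangle_features(matrix):
    """Flatten the strict upper triangle of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

# Hypothetical 3-region functional connectivity matrix (symmetric, unit diagonal)
fc = [
    [1.0, 0.2, 0.5],
    [0.2, 1.0, 0.3],
    [0.5, 0.3, 1.0],
]
features = upper_triangle_features(fc)
```

The diagonal and the lower triangle are skipped because they duplicate information in a symmetric matrix.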

4.

A Clustering Method for Single-Cell RNA-Seq Data Based on Automatic Weighting Penalty and Low-Rank Representation.

Wang, Juan; Wang, Zhenchang; Yuan, Shasha; Zheng, Chunhou; Liu, Jinxing; Shang, Junliang.

IEEE/ACM Trans Comput Biol Bioinform ; 21(3): 360-371, 2024.

Article in English | MEDLINE | ID: mdl-38319777

ABSTRACT

Advances in high-throughput single-cell RNA sequencing (scRNA-seq) technology have provided more comprehensive biological information on cell expression. Clustering analysis is a critical step in scRNA-seq research and provides clear knowledge of the cell identity. Unfortunately, the characteristics of scRNA-seq data and the limitations of existing technologies make clustering encounter a considerable challenge. Meanwhile, some existing methods treat different features equally and ignore differences in feature contributions, which leads to a loss of information. To overcome limitations, we introduce a weighted distance constraint into the construction of the similarity graph and combine the similarity constraint. We propose the Joint Automatic Weighting Similarity Graph and Low-rank Representation (JAGLRR) clustering method. Evaluating the contributions of each feature and assigning various weight values can increase the significance of valuable features while decreasing the interference of redundant features. The similarity constraint allows the model to generate a more symmetric affinity matrix. Benefitting from that affinity matrix, JAGLRR recovers the original linear relationship of the data more accurately and obtains more discriminative information. The results on simulated datasets and 8 real datasets show that JAGLRR outperforms 11 existing comparison methods in clustering experiments, with higher clustering accuracy and stability.

Subject(s)

Algorithms , Computational Biology , RNA-Seq , Single-Cell Analysis , Cluster Analysis , Single-Cell Analysis/methods , Computational Biology/methods , RNA-Seq/methods , Humans , Animals , Sequence Analysis, RNA/methods , Mice , Single-Cell Gene Expression Analysis

5.

HGSMDA: miRNA-Disease Association Prediction Based on HyperGCN and Sørensen-Dice Loss.

Chang, Zhenghua; Zhu, Rong; Liu, Jinxing; Shang, Junliang; Dai, Lingyun.

Noncoding RNA ; 10(1), 2024 Jan 26.

Article in English | MEDLINE | ID: mdl-38392964

ABSTRACT

Biological research has demonstrated the significance of identifying miRNA-disease associations in the context of disease prevention, diagnosis, and treatment. However, the utilization of experimental approaches involving biological subjects to infer these associations is both costly and inefficient. Consequently, there is a pressing need to devise novel approaches that offer enhanced accuracy and effectiveness. Presently, the predominant methods employed for predicting disease associations rely on Graph Convolutional Network (GCN) techniques. However, the GCN algorithm, which aggregates locally, incorporates information only from the immediate neighboring nodes of a given node at each layer. Consequently, GCN cannot simultaneously aggregate information from multiple nodes. This constraint significantly impacts the predictive efficacy of the model. To tackle this problem, we propose a novel approach, based on HyperGCN and Sørensen-Dice loss (HGSMDA), for predicting associations between miRNAs and diseases. In the initial phase, we developed multiple networks to represent the similarity between miRNAs and diseases and employed GCNs to extract information from diverse perspectives. Subsequently, we introduced HyperGCN to construct a miRNA-disease heteromorphic hypergraph using hypernodes and trained a GCN on the graph to aggregate information. Finally, we utilized the Sørensen-Dice loss function to evaluate the degree of similarity between the predicted outcomes and the ground truth values, thereby enabling the prediction of associations between miRNAs and diseases. In order to assess the soundness of our methodology, an extensive series of experiments was conducted employing the Human MicroRNA Disease Database (HMDD v3.2) as the dataset. The experimental outcomes unequivocally indicate that HGSMDA exhibits remarkable efficacy when compared to alternative methodologies. Furthermore, the predictive capacity of HGSMDA was corroborated through a case study focused on colon cancer. These findings strongly imply that HGSMDA represents a dependable and valid framework, thereby offering a novel avenue for investigating the intricate association between miRNAs and diseases.
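As a rough illustration of the loss named in this abstract, the following sketch computes a Sørensen-Dice loss between predicted association scores and binary ground truth. It assumes the common "soft Dice" form with a small smoothing constant `eps`; the exact variant used by HGSMDA (for example, squared terms in the denominator) is not given in the abstract.

```python
def dice_loss(pred, target, eps=1e-7):
    """Soft Sørensen-Dice loss: 1 - 2|P∩T| / (|P| + |T|).

    pred   : predicted scores in [0, 1]
    target : ground truth labels (0 or 1)
    eps    : smoothing term (an assumption) that avoids division by zero
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)

perfect = dice_loss([1.0, 0.0, 1.0], [1, 0, 1])   # close to 0
disjoint = dice_loss([1.0, 0.0], [0, 1])          # close to 1
```

Because the loss depends on overlap rather than on per-element error, it tends to be less dominated by the many negative (unassociated) pairs than a plain cross-entropy would be.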

6.

Joint L_2,p-norm and random walk graph constrained PCA for single-cell RNA-seq data.

Wang, Tai-Ge; Shang, Jun-Liang; Liu, Jin-Xing; Li, Feng; Yuan, Shasha; Wang, Juan.

Comput Methods Biomech Biomed Engin ; 27(4): 498-511, 2024.

Article in English | MEDLINE | ID: mdl-36912759

ABSTRACT

The development and widespread utilization of high-throughput sequencing technologies in biology has fueled the rapid growth of single-cell RNA sequencing (scRNA-seq) data over the past decade. The development of scRNA-seq technology has significantly expanded researchers' understanding of cellular heterogeneity. Accurate cell type identification is the prerequisite for any research on heterogeneous cell populations. However, due to the high noise and high dimensionality of scRNA-seq data, improving the effectiveness of cell type identification remains a challenge. As an effective dimensionality reduction method, Principal Component Analysis (PCA) is an essential tool for visualizing high-dimensional scRNA-seq data and identifying cell subpopulations. However, traditional PCA has some defects when used in mining the nonlinear manifold structure of the data and usually suffers from over-density of principal components (PCs). Therefore, we present a novel method in this paper called joint L2,p-norm and random walk graph constrained PCA (RWPPCA). RWPPCA aims to retain the data's local information in the process of mapping high-dimensional data to low-dimensional space, to more accurately obtain sparse principal components and to then identify cell types more precisely. Specifically, RWPPCA combines the random walk (RW) algorithm with graph regularization to more accurately determine the local geometric relationships between data points. Moreover, to mitigate the adverse effects of dense PCs, the L2,p-norm is introduced to make the PCs sparser, thus increasing their interpretability. Then, we evaluate the effectiveness of RWPPCA on simulated data and scRNA-seq data. The results show that RWPPCA performs well in cell type identification and outperforms other comparison methods.

Subject(s)

Single-Cell Analysis , Single-Cell Gene Expression Analysis , Principal Component Analysis , Single-Cell Analysis/methods , Algorithms , Cluster Analysis
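The sparsity-inducing norm mentioned in this abstract can be written out concretely. The sketch below uses the common row-wise definition, (Σ_i ‖x_i‖₂^p)^(1/p); whether RWPPCA applies it to rows or columns, or uses the p-th power without the outer root, is not specified in the abstract, so treat those choices as assumptions.

```python
def l2p_norm(matrix, p):
    """Row-wise L2,p-norm: (sum over rows of ||row||_2 ** p) ** (1 / p), p > 0."""
    row_norms = [sum(x * x for x in row) ** 0.5 for row in matrix]
    return sum(r ** p for r in row_norms) ** (1.0 / p)

# Hypothetical loading matrix with one all-zero row
pcs = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
l21 = l2p_norm(pcs, 1)   # p = 1 gives the L2,1-norm: 5 + 0 + 10
frob = l2p_norm(pcs, 2)  # p = 2 gives the Frobenius norm
```

Choosing p at or below 1 penalizes the number of non-zero rows more strongly than the Frobenius norm does, which is why such norms are used to encourage row-sparse principal components.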

7.

iLncDA-RSN: identification of lncRNA-disease associations based on reliable similarity networks.

Li, Yahan; Zhang, Mingrui; Shang, Junliang; Li, Feng; Ren, Qianqian; Liu, Jin-Xing.

Front Genet ; 14: 1249171, 2023.

Article in English | MEDLINE | ID: mdl-37614816

ABSTRACT

Identification of disease-associated long non-coding RNAs (lncRNAs) is crucial for unveiling the underlying genetic mechanisms of complex diseases. Multiple types of similarity networks of lncRNAs (or diseases) can complementarily and comprehensively characterize their similarities. Hence, in this study, we presented a computational model, iLncDA-RSN, based on reliable similarity networks for identifying potential lncRNA-disease associations (LDAs). Specifically, for constructing reliable similarity networks of lncRNAs and diseases, miRNA heuristic information with lncRNAs and diseases is first introduced to construct their respective Jaccard similarity networks; then Gaussian interaction profile (GIP) kernel similarity networks and Jaccard similarity networks of lncRNAs and diseases are provided based on the lncRNA-disease association network; finally, a random walk with restart strategy is applied to the Jaccard similarity networks, the GIP kernel similarity networks, the lncRNA functional similarity network, and the disease semantic similarity network to construct reliable similarity networks. Based on the lncRNA-disease association network and the reliable similarity networks, feature vectors of lncRNA-disease pairs are integrated from the lncRNA and disease perspectives respectively, and then reduced in dimensionality by the elastic net. Finally, two random forests are used together on different lncRNA-disease association feature sets to identify potential LDAs. iLncDA-RSN is evaluated by five-fold cross-validation to analyse its prediction performance, and the results show that it outperforms the compared models. Furthermore, case studies of different complex diseases demonstrate the effectiveness of iLncDA-RSN in identifying potential LDAs.
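Two of the similarity measures named in this abstract can be sketched in a few lines. The code below assumes the widely used Gaussian interaction profile (GIP) kernel, with the bandwidth normalized by the mean squared norm of the profiles, and the ordinary Jaccard index over sets of associated miRNAs. The toy association matrix, the miRNA identifiers, and the parameter `gamma_prime` are illustrative assumptions, not values from the study.

```python
import math

def gip_kernel(profiles, gamma_prime=1.0):
    """GIP kernel similarity between rows of a binary association matrix."""
    n = len(profiles)
    # Bandwidth is normalized by the average squared norm of the profiles
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    gamma = gamma_prime / mean_sq_norm

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [[math.exp(-gamma * sq_dist(profiles[i], profiles[j]))
             for j in range(n)] for i in range(n)]

def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical lncRNA-by-disease association matrix (rows are lncRNAs)
lnc_disease = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
K = gip_kernel(lnc_disease)
shared = jaccard({"miR-a", "miR-b"}, {"miR-b", "miR-c"})
```

Identical interaction profiles get a similarity of exactly 1, and the similarity decays with the squared distance between profiles.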

8.

The lower He-sea points playing a significant role in postoperative ileus in colorectal cancer treated with acupuncture: based on machine-learning.

Zhang, Xu; Yang, Wenjing; Shang, Junliang; Dan, Wenchao; Shi, Lin; Tong, Li; Yang, Guowang.

Front Oncol ; 13: 1206196, 2023.

Article in English | MEDLINE | ID: mdl-37564931

ABSTRACT

Background: Postoperative ileus (POI) is a common complication following abdominal surgery, which can lead to significant negative impacts on patients' well-being and healthcare costs. However, the efficacy of current treatments is not satisfactory. The purpose of this study was to evaluate the therapeutic effects of acupuncture intervention and explore the regulation of acupoint selection for treating POI in colorectal cancer (CRC) patients. Methods: We searched eight electronic databases to identify randomized controlled trials (RCTs) on acupuncture for POI in CRC and conducted a meta-analysis. Subsequently, we utilized the Apriori algorithm and the Frequent pattern growth algorithm, in conjunction with complex network and cluster analysis, to identify association rules of acupoints. Results: The meta-analysis showed that acupuncture led to significant reductions in time to first defecation (MD=-20.93, 95%CI: -25.35, -16.51; I² = 93.0%; p < 0.01; n=2805), first flatus (MD=-15.08, 95%CI: -18.39, -11.76; I² = 96%; p < 0.01; n=3284), and bowel sounds recovery (MD=-10.96, 95%CI: -14.20, -7.72; I² = 94%; p < 0.01; n=2043). A subgroup analysis revealed that acupuncture not only reduced the duration of POI when administered alongside conventional care but also further expedited the recovery of gut function after colorectal surgery when integrated into the enhanced recovery after surgery (ERAS) pathway. The studies included in the analysis reported no instances of serious adverse events associated with acupuncture. We identified Zusanli (ST36), Shangjuxu (ST37), Neiguan (PC6), Sanyinjiao (SP6), Xiajuxu (ST39), Hegu (LI4), Tianshu (ST25), and Zhongwan (RN12) as primary acupoints for treating POI. Association rule mining suggested potential acupoint combinations including {ST37, ST39}⇒{ST36}, {PC6, ST37}⇒{ST36}, {SP6, ST37}⇒{ST36}, and {ST25, ST37}⇒{ST36}.
Conclusion: Meta-analysis indicates acupuncture's safety and superior effectiveness over postoperative care alone in facilitating gastrointestinal recovery. Machine-learning approaches highlight the importance of the lower He-sea points, including Zusanli (ST36) and Shangjuxu (ST37), in treating POI in CRC patients. Incorporating additional acupoints such as Neiguan (PC6) (for pain and vomiting) and Sanyinjiao (SP6) (for abdominal distension and poor appetite) can optimize treatment outcomes. These findings offer valuable insights for refining treatment protocols in both clinical and experimental settings, ultimately enhancing patient care.
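The association rules reported in this abstract have the form antecedent ⇒ consequent and are usually ranked by support and confidence, the measures that Apriori-style mining relies on. The sketch below computes both for one rule. The prescriptions in `toy_prescriptions` are invented purely for illustration and do not come from the study.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical acupoint prescriptions (NOT data from the study)
toy_prescriptions = [
    {"ST36", "ST37", "ST39"},
    {"ST36", "ST37", "ST39", "PC6"},
    {"ST36", "PC6"},
    {"ST37", "ST39"},
]
rule_support = support(toy_prescriptions, {"ST36", "ST37", "ST39"})
rule_confidence = confidence(toy_prescriptions, {"ST37", "ST39"}, {"ST36"})
```

In this toy set, the combination ST37 and ST39 appears in three of four prescriptions, and ST36 accompanies it in two of those three, which gives a support of 0.5 and a confidence of 2/3 for the rule.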

9.

ARGLRR: A Sparse Low-Rank Representation Single-Cell RNA-Sequencing Data Clustering Method Combined with a New Graph Regularization.

Wang, Zhen-Chang; Liu, Jin-Xing; Shang, Jun-Liang; Dai, Ling-Yun; Zheng, Chun-Hou; Wang, Juan.

J Comput Biol ; 30(8): 848-860, 2023 08.

Article in English | MEDLINE | ID: mdl-37471220

ABSTRACT

The development of single-cell transcriptome sequencing technologies has opened new ways to study biological phenomena at the cellular level. A key application of such technologies involves the employment of single-cell RNA sequencing (scRNA-seq) data to identify distinct cell types through clustering, which in turn provides evidence for revealing heterogeneity. Despite the promise of this approach, the inherent characteristics of scRNA-seq data, such as higher noise levels and lower coverage, pose major challenges to existing clustering methods and compromise their accuracy. In this study, we propose a method called Adjusted Random walk Graph regularization Sparse Low-Rank Representation (ARGLRR), a practical sparse subspace clustering method, to identify cell types. The fundamental low-rank representation (LRR) model is concerned with the global structure of data. To address the limited ability of the LRR method to capture local structure, we introduced adjusted random walk graph regularization in its framework. ARGLRR allows for the capture of both local and global structures in scRNA-seq data. Additionally, the imposition of similarity constraints into the LRR framework further improves the ability of the proposed model to estimate cell-to-cell similarity and capture global structural relationships between cells. ARGLRR surpasses other advanced comparison approaches on nine known scRNA-seq data sets judging by the results. In the normalized mutual information and Adjusted Rand Index metrics on the scRNA-seq data sets clustering experiments, ARGLRR outperforms the best-performing comparative method by 6.99% and 5.85%, respectively. In addition, we visualize the result using Uniform Manifold Approximation and Projection. Visualization results show that the usage of ARGLRR enhances the separation of different cell types within the similarity matrix.

Subject(s)

Algorithms , RNA , Cluster Analysis , Single-Cell Analysis/methods , Sequence Analysis, RNA , Gene Expression Profiling
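One of the two evaluation metrics cited in this abstract, the Adjusted Rand Index, can be computed from pair counts as sketched below. This is the standard formula written with the Python standard library and is offered only to make the metric concrete; the label lists are hypothetical and no reported result is reproduced.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index from the contingency table of two labelings."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    # Pairs of samples that share a cluster in both labelings
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. every sample in a single cluster
    return (sum_cells - expected) / (max_index - expected)

same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index is invariant to how clusters are named, so relabeled but identical partitions score 1, while partitions that agree less than chance would predict score below 0.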

10.

Multi-View Enhanced Tensor Nuclear Norm and Local Constraint Model for Cancer Clustering and Feature Gene Selection.

Qiao, Qian; Yuan, Sha-Sha; Shang, Junliang; Liu, Jin-Xing.

J Comput Biol ; 30(8): 889-899, 2023 08.

Article in English | MEDLINE | ID: mdl-37471239

ABSTRACT

The analysis of cancer data from multi-omics can effectively promote cancer research. The main focus of this article is to cluster cancer samples and identify feature genes to reveal the correlation between cancers and genes, with the primary approach being the analysis of multi-view cancer omics data. Our proposed solution, the Multi-View Enhanced Tensor Nuclear Norm and Local Constraint (MVET-LC) model, aims to utilize the consistency and complementarity of omics data to support biological research. The model is designed to maximize the utilization of multi-view data and incorporates a nuclear norm and local constraint to achieve this goal. The first step involves introducing the concept of enhanced partial sum of tensor nuclear norm, which significantly enhances the flexibility of the tensor nuclear norm. After that, we incorporate total variation regularization into the MVET-LC model to further augment its performance. It enables MVET-LC to make use of the relationship between tensor data structures and sparse data while paying attention to the feature details of the tensor data. To tackle the iterative optimization problem of MVET-LC, the alternating direction method of multipliers is utilized. Through experimental validation, it is demonstrated that our proposed model outperforms other comparison models.

Subject(s)

Algorithms , Neoplasms , Humans , Neoplasms/genetics , Cluster Analysis

11.

A Graph Representation Approach Based on Light Gradient Boosting Machine for Predicting Drug-Disease Associations.

Wang, Ying; Liu, Jin-Xing; Wang, Juan; Shang, Junliang; Gao, Ying-Lian.

J Comput Biol ; 30(8): 937-947, 2023 08.

Article in English | MEDLINE | ID: mdl-37486669

ABSTRACT

Determining the association between drug and disease is important in drug development. However, existing approaches for predicting drug-disease associations (DDAs) are too homogeneous in terms of feature extraction. Here, a novel graph representation approach based on light gradient boosting machine (GRLGB) is proposed for the prediction of DDAs. After proteins were introduced into a heterogeneous network, node features were extracted from two perspectives: network topology and biological knowledge. Finally, the GRLGB classifier was applied to predict potential DDAs. GRLGB achieved satisfactory results on Bdataset and Fdataset through 10-fold cross-validation. To further demonstrate the reliability of GRLGB, case studies involving anxiety disorders and clozapine were conducted. The results suggest that GRLGB can identify novel DDAs.

Subject(s)

Computational Biology , Proteins , Reproducibility of Results , Computational Biology/methods , Algorithms

12.

MGCNRF: Prediction of Disease-Related miRNAs Based on Multiple Graph Convolutional Networks and Random Forest.

Yang, Yi; Sun, Yan; Li, Feng; Guan, Boxin; Liu, Jin-Xing; Shang, Junliang.

IEEE Trans Neural Netw Learn Syst ; PP, 2023 Jul 17.

Article in English | MEDLINE | ID: mdl-37459265

ABSTRACT

An increasing number of microRNAs (miRNAs) have been confirmed to be inextricably linked to various diseases, and the discovery of their associations has become a routine way of treating diseases. To overcome the time-consuming and laborious nature of traditional experiments for verifying miRNA-disease associations (MDAs), a variety of computational methods have emerged. However, these methods still have many shortcomings in terms of predictive performance and accuracy. In this study, a model based on multiple graph convolutional networks and random forest (MGCNRF) was proposed for the prediction of MDAs. Specifically, MGCNRF first mapped miRNA functional similarity and sequence similarity, disease semantic similarity and target similarity, and the known MDAs into four different two-layer heterogeneous networks. Second, MGCNRF fed the four heterogeneous networks into four different layered attention graph convolutional networks (GCNs), respectively, to extract MDA embeddings. Finally, MGCNRF integrated the embeddings of every MDA into the features of the miRNA-disease pair and predicted potential MDAs through the random forest (RF). Fivefold cross-validation was applied to verify the prediction performance of MGCNRF, which outperforms the other seven state-of-the-art methods in terms of area under the curve. Furthermore, the accuracy and the case studies of different diseases further demonstrate the scientific rationale of MGCNRF. In conclusion, MGCNRF can serve as a scientific tool for predicting potential MDAs.

13.

NLRRC: A Novel Clustering Method of Jointing Non-Negative LRR and Random Walk Graph Regularized NMF for Single-Cell Type Identification.

Wang, Juan; Wang, Lin-Ping; Yuan, Sha-Sha; Li, Feng; Liu, Jin-Xing; Shang, Jun-Liang.

IEEE J Biomed Health Inform ; 27(10): 5199-5209, 2023 10.

Article in English | MEDLINE | ID: mdl-37506010

ABSTRACT

The development of single-cell RNA sequencing (scRNA-seq) technology has opened up a new perspective for us to study disease mechanisms at the single cell level. Cell clustering reveals the natural grouping of cells, which is a vital step in scRNA-seq data analysis. However, the high noise and dropout of single-cell data pose numerous challenges to cell clustering. In this study, we propose a novel matrix factorization method named NLRRC for single-cell type identification. NLRRC joins non-negative low-rank representation (LRR) and random walk graph regularized NMF (RWNMFC) to accurately reveal the natural grouping of cells. Specifically, we find the lowest rank representation of single-cell samples by non-negative LRR to reduce the difficulty of analyzing high-dimensional samples and capture the global information of the samples. Meanwhile, by using random walk graph regularization (RWGR) and NMF, RWNMFC captures manifold structure and cluster information before generating a cluster allocation matrix. The cluster assignment matrix contains cluster labels, which can be used directly to get the clustering results. The performance of NLRRC is validated on simulated and real single-cell datasets. The results of the experiments illustrate that NLRRC has a significant advantage in single-cell type identification.

Subject(s)

Algorithms , Single-Cell Analysis , Humans , Cluster Analysis , Gene Expression Profiling/methods

14.

Network embedding framework for driver gene discovery by combining functional and structural information.

Chu, Xin; Guan, Boxin; Dai, Lingyun; Liu, Jin-Xing; Li, Feng; Shang, Junliang.

BMC Genomics ; 24(1): 426, 2023 Jul 29.

Article in English | MEDLINE | ID: mdl-37516822

ABSTRACT

Comprehensive analysis of multiple data sets can identify potential driver genes for various cancers. In recent years, driver gene discovery based on massive mutation data and gene interaction networks has attracted increasing attention, but there is still a need to explore combining functional and structural information of genes in protein interaction networks to identify driver genes. Therefore, we propose a network embedding framework combining functional and structural information to identify driver genes. First, we combine the mutation data and gene interaction networks to construct a mutation integration network using a network propagation algorithm. Second, the struc2vec model is used to extract gene features from the mutation integration network, which contains both the functional and structural information of genes. Finally, machine learning algorithms are utilized to identify the driver genes. Compared with four previous methods, our method can find gene pairs that are distant from each other through structural similarities and has better performance in identifying driver genes for 12 cancers in The Cancer Genome Atlas. We also conduct a comparative analysis of three gene interaction networks, three gene standard sets, and five machine learning algorithms. Our framework provides a new perspective for feature selection to identify novel driver genes.

Subject(s)

Algorithms , Gene Regulatory Networks , Genetic Association Studies , Machine Learning , Protein Interaction Mapping

15.

scFED: Clustering Identifying Cell Types of scRNA-Seq Data Based on Feature Engineering Denoising.

Liu, Yang; Li, Feng; Shang, Junliang; Liu, Jinxing; Wang, Juan; Ge, Daohui.

Interdiscip Sci ; 15(4): 590-601, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37402002

ABSTRACT

Recently developed single-cell RNA-seq (scRNA-seq) technology has given researchers the chance to investigate disease development at the single-cell level. Clustering is one of the most essential strategies for analyzing scRNA-seq data. Choosing high-quality feature sets can significantly enhance the outcomes of single-cell clustering and classification. However, for technical reasons, computationally burdensome and highly expressed genes cannot provide a stable and predictive feature set. In this study, we introduce scFED, a feature-engineered gene selection framework. scFED identifies prospective feature sets to eliminate noise fluctuation and fuses them with existing knowledge from the tissue-specific cellular taxonomy reference database (CellMatch) to avoid the influence of subjective factors. It then presents a reconstruction approach for noise reduction and crucial information amplification. We apply scFED to four genuine single-cell datasets and compare it with other techniques. According to the results, scFED improves clustering, decreases the dimension of the scRNA-seq data, improves cell type identification when combined with clustering algorithms, and has higher performance than other methods. Therefore, scFED offers certain benefits in scRNA-seq data gene selection.

16.

ETGPDA: identification of piRNA-disease associations based on embedding transformation graph convolutional network.

Meng, Xianghan; Shang, Junliang; Ge, Daohui; Yang, Yi; Zhang, Tongdui; Liu, Jin-Xing.

BMC Genomics ; 24(1): 279, 2023 May 25.

Article in English | MEDLINE | ID: mdl-37226081

ABSTRACT

BACKGROUND: Piwi-interacting RNAs (piRNAs) have been proven to be closely associated with human diseases. The identification of the potential associations between piRNA and disease is of great significance for complex diseases. Because traditional "wet experiments" are time-consuming and expensive, predicting piRNA-disease associations by computational methods is of great significance. METHODS: In this paper, a method based on the embedding transformation graph convolution network, named ETGPDA, is proposed to predict piRNA-disease associations. Specifically, a heterogeneous network is constructed based on the similarity information of piRNA and disease, as well as the known piRNA-disease associations, and is used to extract low-dimensional embeddings of piRNA and disease with a graph convolutional network with an attention mechanism. Furthermore, the embedding transformation module is developed to address the problem of embedding space inconsistency; it is more lightweight and offers stronger learning ability and higher accuracy. Finally, the piRNA-disease association score is calculated from the similarity of the piRNA and disease embeddings. RESULTS: Evaluated by fivefold cross-validation, ETGPDA achieves an AUC of 0.9603, which is better than the other five selected computational models. The case studies based on head and neck squamous cell carcinoma and Alzheimer's disease further prove the superior performance of ETGPDA. CONCLUSIONS: Hence, ETGPDA is an effective method for predicting hidden piRNA-disease associations.

Subject(s)

Alzheimer Disease , Head and Neck Neoplasms , Humans , Piwi-Interacting RNA , Alzheimer Disease/genetics , Learning , Research Design

17.

GCCN: Graph Capsule Convolutional Network for Progressive Mild Cognitive Impairment Prediction and Pathogenesis Identification Based on Imaging Genetic Data.

Shang, Junliang; Zou, Qi; Ren, Qianqian; Guan, Boxin; Li, Feng; Liu, Jin-Xing; Sun, Yan.

IEEE J Biomed Health Inform ; 27(6): 2968-2979, 2023 06.

Article in English | MEDLINE | ID: mdl-37030856

ABSTRACT

In this study, we proposed a novel method called the graph capsule convolutional network (GCCN) to predict the progression from mild cognitive impairment to dementia and identify its pathogenesis. First, we proposed a novel risk gene discovery component to indirectly target genes with higher interactions with others. These risk genes and brain regions were collected as nodes to construct heterogeneous pathogenic information association graphs. Second, the graph capsules were established by projecting heterogeneous pathogenic information into a set of disentangled latent components. The orientation and length of capsules represent the format and intensity of pathogenic information. Third, the graph capsule convolution network was used to model the information flows among pathogenic factors and to elaborate the convergence of primary capsules to advanced capsules. The advanced capsule is a concept that organizes pathogenic information based on its consistency, and the synergistic effects of advanced capsules direct the development of the disease. Finally, discriminative pathogenic information flows were captured by a straightforward built-in interpretation mechanism, i.e., the dynamic routing mechanism, and applied to the identification of pathogenesis. GCCN has been experimentally shown to be significantly advanced on public datasets. Further experiments have shown that the pathogenic factors identified by GCCN are evidential and closely related to progressive mild cognitive impairment.

Subject(s)

Cognitive Dysfunction , Humans , Capsules , Cognitive Dysfunction/diagnostic imaging , Cognitive Dysfunction/genetics , Diagnostic Imaging

18.

KGLRR: A low-rank representation K-means with graph regularization constraint method for Single-cell type identification.

Wang, Lin-Ping; Liu, Jin-Xing; Shang, Jun-Liang; Kong, Xiang-Zhen; Guan, Bo-Xin; Wang, Juan.

Comput Biol Chem ; 104: 107862, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37031647

ABSTRACT

Single-cell RNA sequencing technology provides a tremendous opportunity for studying disease mechanisms at the single-cell level. Cell type identification is a key step in the research of disease mechanisms. Many clustering algorithms have been proposed to identify cell types. Most clustering algorithms perform similarity calculation before cell clustering. Because clustering and similarity calculation are independent, a low-rank matrix obtained only by similarity calculation may be unable to fully reveal the patterns in single-cell data. In this study, to capture accurate single-cell clustering information, we propose a novel method based on a low-rank representation model, called KGLRR, that combines the low-rank representation approach with K-means clustering. The cluster centroid is updated as the cell dimension decreases to better form new clusters and improve the quality of clustering information. In addition, the low-rank representation model ignores local geometric information, so the graph regularization constraint is introduced. KGLRR is tested on both simulated and real single-cell datasets to validate the effectiveness of the new method. The experimental results show that KGLRR is more robust and accurate in cell type identification than other advanced algorithms.

Subject(s)

Algorithms , Cluster Analysis

19.

A Personalized Low-Rank Subspace Clustering Method Based on Locality and Similarity Constraints for scRNA-seq Data Analysis.

Qiao, Tian-Jing; Liu, Jin-Xing; Shang, Junliang; Yuan, Shasha; Zheng, Chun-Hou; Wang, Juan.

IEEE J Biomed Health Inform ; 27(5): 2575-2584, 2023 05.

Article in English | MEDLINE | ID: mdl-37027680

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) technology can provide expression profiles of single cells, which propels biological research into a new chapter. Clustering individual cells based on their transcriptome is a critical objective of scRNA-seq data analysis. However, the high-dimensional, sparse and noisy nature of scRNA-seq data poses a challenge to single-cell clustering. Therefore, it is urgent to develop a clustering method targeting scRNA-seq data characteristics. Due to its powerful subspace learning capability and robustness to noise, the subspace segmentation method based on low-rank representation (LRR) is broadly used in clustering research and achieves satisfactory results. In view of this, we propose a personalized low-rank subspace clustering method, namely PLRLS, to learn more accurate subspace structures from both global and local perspectives. Specifically, we first introduce the local structure constraint to capture the local structure information of the data, while helping our method to obtain better inter-cluster separability and intra-cluster compactness. Then, in order to retain the important similarity information that is ignored by the LRR model, we utilize the fractional function to extract similarity information between cells, and introduce this information as the similarity constraint into the LRR framework. The fractional function is an efficient similarity measure designed for scRNA-seq data, which has theoretical and practical implications. In the end, based on the LRR matrix learned from PLRLS, we perform downstream analyses on real scRNA-seq datasets, including spectral clustering, visualization and marker gene identification. Comparative experiments show that the proposed method achieves superior clustering accuracy and robustness.

Subject(s)

Algorithms , Single-Cell Gene Expression Analysis , Humans , Transcriptome , Cluster Analysis , Data Analysis , Single-Cell Analysis/methods , Gene Expression Profiling/methods

20.

DM-MOGA: a multi-objective optimization genetic algorithm for identifying disease modules of non-small cell lung cancer.

Shang, Junliang; Zhu, Xuhui; Sun, Yan; Li, Feng; Kong, Xiangzhen; Liu, Jin-Xing.

BMC Bioinformatics ; 24(1): 13, 2023 Jan 09.

Article in English | MEDLINE | ID: mdl-36624376

ABSTRACT

BACKGROUND: Constructing molecular interaction networks from microarray data and then identifying disease module biomarkers can provide insight into the underlying pathogenic mechanisms of non-small cell lung cancer. A promising approach for identifying disease modules in the network is community detection. RESULTS: In order to identify disease modules from gene co-expression networks, a community detection method is proposed based on a multi-objective optimization genetic algorithm with decomposition. The method is named DM-MOGA and possesses two highlights. First, the boundary correction strategy is designed for the modules obtained in the process of local module detection and pre-simplification. Second, during the evolution, we introduce the Davies-Bouldin index and the clustering coefficient as fitness functions, which are improved and migrated to weighted networks. In order to identify modules that are more relevant to diseases, the above strategies are designed to consider the network topology of genes and the strength of connections with other genes at the same time. Experimental results on different gene expression datasets of non-small cell lung cancer demonstrate that the core modules obtained by DM-MOGA are more effective than those obtained by several other advanced module identification methods. CONCLUSIONS: The proposed method identifies disease-relevant modules by optimizing two novel fitness functions to simultaneously consider the local topology of each gene and its connection strength with other genes. The association of the identified core modules with lung cancer has been confirmed by pathway and gene ontology enrichment analysis.

Subject(s)

Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Humans , Carcinoma, Non-Small-Cell Lung/genetics , Lung Neoplasms/genetics , Gene Regulatory Networks , Microarray Analysis , Algorithms , Gene Expression Profiling/methods
