Search | VHL Regional Portal

1.

Drug-target interaction predictions with multi-view similarity network fusion strategy and deep interactive attention mechanism.

Song, Wei; Xu, Lewen; Han, Chenguang; Tian, Zhen; Zou, Quan.

Bioinformatics ; 2024 Jun 05.

Article in English | MEDLINE | ID: mdl-38837345

ABSTRACT

MOTIVATION: Accurately identifying drug-target interactions (DTIs) is one of the crucial steps in the drug discovery and drug repositioning process. Currently, many computational models have been proposed for DTI prediction and have achieved significant improvements. However, these approaches pay little attention to fusing the multi-view similarity networks related to drugs and targets in an appropriate way. Besides, how to fully incorporate the known interaction relationships to accurately represent drugs and targets has not been well investigated. Therefore, there is still a need to improve the accuracy of DTI prediction models. RESULTS: In this study, we propose a novel approach that employs Multi-view similarity network fusion strategy and deep Interactive attention mechanism to predict Drug-Target Interactions (MIDTI). First, MIDTI constructs multi-view similarity networks of drugs and targets with their diverse information and integrates these similarity networks effectively in an unsupervised manner. Then, MIDTI obtains the embeddings of drugs and targets from multi-type networks simultaneously. After that, MIDTI adopts the deep interactive attention mechanism to further learn their discriminative embeddings comprehensively with the known DTI relationships. Finally, we feed the learned representations of drugs and targets to the multilayer perceptron (MLP) model and predict the underlying interactions. Extensive results indicate that MIDTI significantly outperforms other baseline methods on the DTI prediction task. The results of the ablation experiments also confirm the effectiveness of the attention mechanism in the multi-view similarity network fusion strategy and the deep interactive attention mechanism. AVAILABILITY: https://github.com/XuLew/MIDTI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

2.

Fusion of multi-source relationships and topology to infer lncRNA-protein interactions.

Zhang, Xinyu; Liu, Mingzhe; Li, Zhen; Zhuo, Linlin; Fu, Xiangzheng; Zou, Quan.

Mol Ther Nucleic Acids ; 35(2): 102187, 2024 Jun 11.

Article in English | MEDLINE | ID: mdl-38706631

ABSTRACT

Long non-coding RNAs (lncRNAs) are important factors involved in biological regulatory networks. Accurately predicting lncRNA-protein interactions (LPIs) is vital for clarifying lncRNA's functions and pathogenic mechanisms. Existing deep learning models have yet to yield satisfactory results in LPI prediction. Recently, graph autoencoders (GAEs) have seen rapid development, excelling in tasks like link prediction and node classification. We employed GAE technology for LPI prediction, devising the FMSRT-LPI model based on path masking and degree regression strategies and thereby achieving satisfactory outcomes. This represents the first known integration of path masking and degree regression strategies into the GAE framework for potential LPI inference. The effectiveness of our FMSRT-LPI model primarily relies on four key aspects. First, within the GAE framework, our model integrates multi-source relationships of lncRNAs and proteins with LPN's topological data. Second, the implemented masking strategy efficiently identifies LPN's key paths, reconstructs the network, and reduces the impact of redundant or incorrect data. Third, the integrated degree decoder balances degree and structural information, enhancing node representation. Fourth, the PolyLoss function we introduced is more appropriate for LPI prediction tasks. The results on multiple public datasets further demonstrate our model's potential in LPI prediction.
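For orientation on the PolyLoss mentioned in the FMSRT-LPI abstract, here is a minimal Python sketch of the Poly-1 form of that loss for a single binary prediction. The abstract does not say which PolyLoss variant or coefficient the authors use, so the function name `poly1_loss` and the default `epsilon=1.0` are assumptions made purely for illustration.

```python
import math


def poly1_loss(p_pos: float, label: int, epsilon: float = 1.0) -> float:
    """Poly-1 loss for one binary prediction (illustrative sketch).

    p_pos   -- predicted probability that the lncRNA-protein pair interacts
    label   -- 1 for an interacting pair, 0 otherwise
    epsilon -- weight of the first polynomial term (assumed value)
    """
    # Probability the model assigned to the true class.
    p_t = p_pos if label == 1 else 1.0 - p_pos
    # Clamp away from zero so the logarithm stays finite.
    p_t = max(p_t, 1e-12)
    cross_entropy = -math.log(p_t)
    # Poly-1 adds a linear penalty on (1 - p_t) to plain cross-entropy.
    return cross_entropy + epsilon * (1.0 - p_t)
```

With `epsilon=0.0` the function reduces to ordinary cross-entropy, which makes the extra term easy to isolate when comparing the two losses.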

3.

scMNMF: a novel method for single-cell multi-omics clustering based on matrix factorization.

Qiu, Yushan; Guo, Dong; Zhao, Pu; Zou, Quan.

Brief Bioinform ; 25(3), 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38754408

ABSTRACT

MOTIVATION: The technology for analyzing single-cell multi-omics data has advanced rapidly and has provided comprehensive and accurate cellular information by exploring cell heterogeneity in genomics, transcriptomics, epigenomics, metabolomics and proteomics data. However, because of the high-dimensional and sparse characteristics of single-cell multi-omics data, as well as the limitations of various analysis algorithms, the clustering performance is generally poor. Matrix factorization is an unsupervised, dimensionality reduction-based method that can cluster individuals and discover related omics variables from different blocks. Here, we present a novel algorithm that performs joint dimensionality reduction learning and cell clustering analysis on single-cell multi-omics data using non-negative matrix factorization that we named scMNMF. We formulate the objective function of joint learning as a constrained optimization problem and derive the corresponding iterative formulas through alternating iterative algorithms. The major advantage of the scMNMF algorithm remains its capability to explore hidden related features among omics data. Additionally, the feature selection for dimensionality reduction and cell clustering mutually influence each other iteratively, leading to a more effective discovery of cell types. We validated the performance of the scMNMF algorithm using two simulated and five real datasets. The results show that scMNMF outperformed seven other state-of-the-art algorithms in various measurements. AVAILABILITY AND IMPLEMENTATION: scMNMF code can be found at https://github.com/yushanqiu/scMNMF.

Subject(s)

Algorithms , Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , Humans , Genomics/methods , Computational Biology/methods , Proteomics/methods , Metabolomics/methods , Epigenomics/methods , Multiomics

4.

ADAM17 variant causes hair loss via ubiquitin ligase TRIM47 mediated degradation.

Wang, Xiaoxiao; Pan, Chaolan; Zheng, Luyao; Wang, Jianbo; Zou, Quan; Sun, Peiyi; Zhou, Kaili; Zhao, Anqi; Cao, Qiaoyu; He, Wei; Wang, Yumeng; Cheng, Ruhong; Yao, Zhirong; Zhang, Si; Zhang, Hui; Li, Ming.

JCI Insight ; 2024 May 21.

Article in English | MEDLINE | ID: mdl-38771644

ABSTRACT

Hypotrichosis is a genetic disorder characterized by a diffuse and progressive loss of scalp and/or body hair. Nonetheless, the causative genes for several affected individuals remain elusive, and the underlying mechanisms have yet to be fully elucidated. Here, we discovered a dominant variant in the ADAM17 gene that caused hypotrichosis with woolly hair. An Adam17 (p.D647N) knock-in mouse model mimicked the hair abnormality in patients. The ADAM17 (p.D647N) mutation led to exhaustion of hair follicle stem cells (HFSCs) and caused abnormal hair follicles, ultimately resulting in alopecia. Mechanistic studies revealed that ADAM17 binds directly to the E3 ubiquitin ligase TRIM47. The ADAM17 (p.D647N) variant enhanced the association between ADAM17 and TRIM47, leading to an increase in ubiquitination and subsequent degradation of ADAM17 protein. Furthermore, reduced ADAM17 protein expression affected the Notch signaling pathway, impairing the activation, proliferation, and differentiation of HFSCs during hair follicle regeneration. Overexpression of NICD rescued the reduced proliferation ability caused by the Adam17 variant in primary fibroblast cells.

5.

RDscan: Extracting RNA-disease relationship from the literature based on pre-training model.

Zhang, Yang; Yang, Yu; Ren, Liping; Ning, Lin; Zou, Quan; Luo, Nanchao; Zhang, Yinghui; Liu, Ruijun.

Methods ; 2024 May 22.

Article in English | MEDLINE | ID: mdl-38789016

ABSTRACT

With the rapid advancements in molecular biology and genomics, a multitude of connections between RNA and diseases have been unveiled, making the efficient and accurate extraction of RNA-disease (RD) relationships from extensive biomedical literature crucial for advancing research in this field. This study introduces RDscan, a novel text mining method developed based on the pre-training and fine-tuning strategy, aimed at automatically extracting RD-related information from a vast corpus of literature using pre-trained biomedical large language models (LLMs). Initially, we constructed a dedicated RD corpus by manual curation of the literature, comprising 2,082 positive and 2,000 negative sentences, alongside an independent test dataset (comprising 500 positive and 500 negative sentences) for training and evaluating RDscan. Subsequently, by fine-tuning the Bioformer and BioBERT pre-trained models, RDscan demonstrated exceptional performance in text classification and named entity recognition (NER) tasks. In 5-fold cross-validation, RDscan significantly outperformed traditional machine learning methods (Support Vector Machine, Logistic Regression and Random Forest). In addition, we have developed an accessible webserver that assists users in extracting RD relationships from text. In summary, RDscan represents the first text mining tool specifically designed for RD relationship extraction, and is poised to emerge as an invaluable tool for researchers dedicated to exploring the intricate interactions between RNA and diseases. The RDscan webserver is freely available at https://cellknowledge.com.cn/RDscan/.

6.

Integrated convolution and self-attention for improving peptide toxicity prediction.

Jiao, Shihu; Ye, Xiucai; Sakurai, Tetsuya; Zou, Quan; Liu, Ruijun.

Bioinformatics ; 40(5), 2024 May 02.

Article in English | MEDLINE | ID: mdl-38696758

ABSTRACT

MOTIVATION: Peptides are promising agents for the treatment of a variety of diseases due to their specificity and efficacy. However, the development of peptide-based drugs is often hindered by the potential toxicity of peptides, which poses a significant barrier to their clinical application. Traditional experimental methods for evaluating peptide toxicity are time-consuming and costly, making the development process inefficient. Therefore, there is an urgent need for computational tools specifically designed to predict peptide toxicity accurately and rapidly, facilitating the identification of safe peptide candidates for drug development. RESULTS: We provide here a novel computational approach, CAPTP, which leverages the power of convolutional and self-attention to enhance the prediction of peptide toxicity from amino acid sequences. CAPTP demonstrates outstanding performance, achieving a Matthews correlation coefficient of approximately 0.82 in both cross-validation settings and on independent test datasets. This performance surpasses that of existing state-of-the-art peptide toxicity predictors. Importantly, CAPTP maintains its robustness and generalizability even when dealing with data imbalances. Further analysis by CAPTP reveals that certain sequential patterns, particularly in the head and central regions of peptides, are crucial in determining their toxicity. This insight can significantly inform and guide the design of safer peptide drugs. AVAILABILITY AND IMPLEMENTATION: The source code for CAPTP is freely available at https://github.com/jiaoshihu/CAPTP.

Subject(s)

Computational Biology , Peptides , Peptides/chemistry , Computational Biology/methods , Humans , Amino Acid Sequence , Algorithms , Software
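The CAPTP abstract reports its headline result as a Matthews correlation coefficient (MCC). As a reminder of what that number measures, the sketch below computes MCC from the four cells of a binary confusion matrix. The function name and the convention of returning 0 when a marginal is empty are choices made for this illustration, and the counts in the usage note are invented rather than taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0
```

For example, hypothetical counts of `tp=90, tn=92, fp=8, fn=10` give an MCC of about 0.82. Because every cell of the confusion matrix enters the formula, the score stays informative when toxic and non-toxic peptides are unevenly represented.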

7.

Deciphering Microbial Adaptation in the Rhizosphere: Insights into Niche Preference, Functional Profiles, and Cross-Kingdom Co-occurrences.

Wang, Yansu; Zou, Quan.

Microb Ecol ; 87(1): 74, 2024 May 21.

Article in English | MEDLINE | ID: mdl-38771320

ABSTRACT

Rhizosphere microbial communities are critical factors for plant growth and vitality, and their adaptive differentiation strategies have received increasing attention but are poorly understood. In this study, we obtained bacterial and fungal amplicon sequences from the rhizosphere and bulk soils of various ecosystems to investigate the potential mechanisms of microbial adaptation to the rhizosphere environment. Our focus encompasses three aspects: niche preference, functional profiles, and cross-kingdom co-occurrence patterns. Our findings revealed a correlation between niche similarity and nucleotide distance, suggesting that niche adaptation explains nucleotide variation among some closely related amplicon sequence variants (ASVs). Furthermore, biological macromolecule metabolism and communication among abundant bacteria increase under rhizosphere conditions, suggesting that bacterial function is trait-mediated in terms of fitness in new habitats. Additionally, our analysis of cross-kingdom networks revealed that fungi act as intermediaries that facilitate connections between bacteria, indicating that microbes can modify their cooperative relationships to adapt. Overall, the evidence for rhizosphere microbial community adaptation, via differences in genetic, functional, and co-occurrence patterns, elucidates the adaptive benefits of genetic and functional flexibility of the rhizosphere microbiota through niche shifts.

Subject(s)

Adaptation, Physiological , Bacteria , Fungi , Microbiota , Rhizosphere , Soil Microbiology , Fungi/genetics , Fungi/classification , Fungi/physiology , Bacteria/genetics , Bacteria/classification , Bacteria/metabolism , Bacteria/isolation & purification , Ecosystem , Bacterial Physiological Phenomena

8.

msBERT-Promoter: a multi-scale ensemble predictor based on BERT pre-trained model for the two-stage prediction of DNA promoters and their strengths.

Li, Yazi; Wei, Xiaoman; Yang, Qinglin; Xiong, An; Li, Xingfeng; Zou, Quan; Cui, Feifei; Zhang, Zilong.

BMC Biol ; 22(1): 126, 2024 May 30.

Article in English | MEDLINE | ID: mdl-38816885

ABSTRACT

BACKGROUND: A promoter is a specific sequence in DNA that has transcriptional regulatory functions, playing a role in initiating gene expression. Identifying promoters and their strengths can provide valuable information related to human diseases. In recent years, computational methods have gained prominence as an effective means for identifying promoters, offering a more efficient alternative to labor-intensive biological approaches. RESULTS: In this study, a two-stage integrated predictor called "msBERT-Promoter" is proposed for identifying promoters and predicting their strengths. The model incorporates multi-scale sequence information through a tokenization strategy and fine-tunes the DNABERT model. Soft voting is then used to fuse the multi-scale information, effectively addressing the issue of insufficient DNA sequence information extraction in traditional models. To the best of our knowledge, this is the first time an integrated approach has been used in the DNABERT model for promoter identification and strength prediction. Our model achieves accuracy rates of 96.2% for promoter identification and 79.8% for promoter strength prediction, significantly outperforming existing methods. Furthermore, through attention mechanism analysis, we demonstrate that our model can effectively combine local and global sequence information, enhancing its interpretability. CONCLUSIONS: msBERT-Promoter provides an effective tool that successfully captures sequence-related attributes of DNA promoters and can accurately identify promoters and predict their strengths. This work paves a new path for the application of artificial intelligence in traditional biology.

Subject(s)

Promoter Regions, Genetic , Computational Biology/methods , DNA/genetics , Humans , Models, Genetic , Sequence Analysis, DNA/methods
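The msBERT-Promoter abstract says soft voting is used to fuse predictions made at several tokenization scales. The sketch below shows soft voting in its generic form: average the class-probability vectors from each model, then pick the class with the highest mean. It assumes equal weights for all models, and the function names are hypothetical; the abstract does not give the authors' exact fusion details.

```python
from typing import List, Sequence


def soft_vote(prob_rows: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by several models."""
    if not prob_rows:
        raise ValueError("need at least one model output")
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    return [sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)]


def predict(prob_rows: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    averaged = soft_vote(prob_rows)
    return max(range(len(averaged)), key=averaged.__getitem__)
```

Soft voting can disagree with a majority vote: for outputs `[0.55, 0.45]`, `[0.55, 0.45]` and `[0.1, 0.9]`, two of three models favour class 0, but the averaged probabilities are roughly `[0.4, 0.6]`, so `predict` returns class 1.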

9.

Application and Comparison of Machine Learning and Database-Based Methods in Taxonomic Classification of High-Throughput Sequencing Data.

Tian, Qinzhong; Zhang, Pinglu; Zhai, Yixiao; Wang, Yansu; Zou, Quan.

Genome Biol Evol ; 16(5), 2024 May 02.

Article in English | MEDLINE | ID: mdl-38748485

ABSTRACT

The advent of high-throughput sequencing technologies has not only revolutionized the field of bioinformatics but has also heightened the demand for efficient taxonomic classification. Despite technological advancements, efficiently processing and analyzing the deluge of sequencing data for precise taxonomic classification remains a formidable challenge. Existing classification approaches primarily fall into two categories, database-based methods and machine learning methods, each presenting its own set of challenges and advantages. On this basis, the aim of our study was to conduct a comparative analysis between these two methods while also investigating the merits of integrating multiple database-based methods. Through an in-depth comparative study, we evaluated the performance of both methodological categories in taxonomic classification by utilizing simulated data sets. Our analysis revealed that database-based methods excel in classification accuracy when backed by a rich and comprehensive reference database. Conversely, while machine learning methods show superior performance in scenarios where reference sequences are sparse or lacking, they generally show inferior performance compared with database methods under most conditions. Moreover, our study confirms that integrating multiple database-based methods does, in fact, enhance classification accuracy. These findings shed new light on the taxonomic classification of high-throughput sequencing data and bear substantial implications for the future development of computational biology. For those interested in further exploring our methods, the source code of this study is publicly available on https://github.com/LoadStar822/Genome-Classifier-Performance-Evaluator. Additionally, a dedicated webpage showcasing our collected database, data sets, and various classification software can be found at http://lab.malab.cn/~tqz/project/taxonomic/.

Subject(s)

High-Throughput Nucleotide Sequencing , Machine Learning , Databases, Genetic , Computational Biology/methods , Classification/methods

10.

DeepAVP-TPPred: identification of antiviral peptides using transformed image-based localized descriptors and binary tree growth algorithm.

Ullah, Matee; Akbar, Shahid; Raza, Ali; Zou, Quan.

Bioinformatics ; 40(5), 2024 May 02.

Article in English | MEDLINE | ID: mdl-38710482

ABSTRACT

MOTIVATION: Despite the extensive manufacturing of antiviral drugs and vaccination, viral infections continue to be a major human ailment. Antiviral peptides (AVPs) have emerged as potential candidates in the pursuit of novel antiviral drugs. These peptides show vigorous antiviral activity against a diverse range of viruses by targeting different phases of the viral life cycle. Therefore, the accurate prediction of AVPs is an essential yet challenging task. Lately, many machine learning-based approaches have been developed for this purpose; however, these methods are restricted by their limited capabilities in terms of feature engineering, accuracy, and generalization. RESULTS: In the present study, we aim to develop an efficient machine learning-based approach for the identification of AVPs, referred to as DeepAVP-TPPred, to address the aforementioned problems. First, we extract two new transformed feature sets using our designed image-based feature extraction algorithms and integrate them with an evolutionary information-based feature. Next, these feature sets are optimized using a novel feature selection approach called the binary tree growth algorithm. Finally, the optimal feature space from the training dataset is fed to the deep neural network to build the final classification model. The proposed model DeepAVP-TPPred was tested using stringent 5-fold cross-validation and two independent dataset testing methods, which achieved the maximum performance and showed enhanced efficiency over existing predictors in terms of both accuracy and generalization capabilities. AVAILABILITY AND IMPLEMENTATION: https://github.com/MateeullahKhan/DeepAVP-TPPred.

Subject(s)

Algorithms , Antiviral Agents , Machine Learning , Antiviral Agents/pharmacology , Antiviral Agents/chemistry , Peptides/chemistry , Humans , Computational Biology/methods , Neural Networks, Computer

11.

TPMA: A two pointers meta-alignment tool to ensemble different multiple nucleic acid sequence alignments.

Zhai, Yixiao; Chao, Jiannan; Wang, Yizheng; Zhang, Pinglu; Tang, Furong; Zou, Quan.

PLoS Comput Biol ; 20(4): e1011988, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38557416

ABSTRACT

Accurate multiple sequence alignment (MSA) is imperative for the comprehensive analysis of biological sequences. However, a notable challenge arises as no single MSA tool consistently outperforms its counterparts across diverse datasets. Users often have to try multiple MSA tools to achieve optimal alignment results, which can be time-consuming and memory-intensive. While the overall accuracy of certain MSA results may be lower, there could be local regions with the highest alignment scores, prompting researchers to seek a tool capable of merging these locally optimal results from multiple initial alignments into a globally optimal alignment. In this study, we introduce Two Pointers Meta-Alignment (TPMA), a novel tool designed for the integration of nucleic acid sequence alignments. TPMA employs two pointers to partition the initial alignments into blocks containing identical sequence fragments. It selects blocks with high sum-of-pairs (SP) scores and concatenates them into an alignment with an overall SP score superior to that of the initial alignments. Through tests on simulated and real datasets, the experimental results consistently demonstrate that TPMA outperforms M-Coffee in terms of aSP, Q, and total column (TC) scores across most datasets. Even in cases where TPMA's scores are comparable to those of M-Coffee, TPMA exhibits significantly lower running time and memory consumption. Furthermore, we comprehensively assessed all the MSA tools used in the experiments, considering accuracy, time, and memory consumption. We propose accurate and fast combination strategies for small and large datasets, which streamline the user tool selection process and facilitate large-scale dataset integration. The dataset and source code of TPMA are available on GitHub (https://github.com/malabz/TPMA).

Subject(s)

Algorithms , Nucleic Acids , Sequence Alignment , Coffee , Software
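The TPMA abstract describes choosing, for each block of identical sequence fragments, the candidate alignment with the better sum-of-pairs (SP) score. The sketch below illustrates that scoring and selection step only; it is not the TPMA implementation. The match, mismatch and gap values and both function names are assumptions, and the two-pointer partitioning that produces the blocks is not shown.

```python
from itertools import combinations


def sp_score(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of aligned rows of equal length.

    The scoring values are illustrative defaults, not those used by TPMA.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        # Score every unordered pair of characters in this column.
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # gap-gap pairs contribute nothing
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


def pick_best_block(candidate_blocks):
    """Among alternative alignments of the same fragments, keep the top scorer."""
    return max(candidate_blocks, key=sp_score)
```

Given two candidate alignments of the fragment pair `ACG`/`ACG`, namely `["AC-G", "A-CG"]` (score -2) and `["ACG", "ACG"]` (score 3), `pick_best_block` keeps the second.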

12.

Electrocatalytic C-H/S-H Coupling of Amino Pyrazoles and Thiophenols: Synthesis of Amino Pyrazole Thioether Derivatives.

Zhang, Wenbao; Zou, Quan; Wang, Qian; Jin, Dongsheng; Jiang, Shan; Qian, Peng.

J Org Chem ; 89(8): 5434-5441, 2024 Apr 19.

Article in English | MEDLINE | ID: mdl-38581391

ABSTRACT

A mild method for the C-H/S-H coupling of pyrazol-5-amines and thiophenols was developed via electrochemistry, giving diverse amino pyrazole thioether derivatives in 37-98% yields. This electrochemical reaction is a sustainable and atom-efficient approach with good functional group tolerance and scalability, and it avoids metals and external chemical oxidants.

13.

PseU-KeMRF: A novel method for identifying RNA pseudouridine sites.

Chen, Mingshuai; Zou, Quan; Qi, Ren; Ding, Yijie.

IEEE/ACM Trans Comput Biol Bioinform ; PP, 2024 Apr 16.

Article in English | MEDLINE | ID: mdl-38625768

ABSTRACT

Pseudouridine is a type of abundant RNA modification that is seen in many different animals and is crucial for a variety of biological functions. Accurately identifying pseudouridine sites within the RNA sequence is vital for the subsequent study of various biological mechanisms of pseudouridine. However, the use of traditional experimental methods faces certain challenges. The development of fast and convenient computational methods is necessary to accurately identify pseudouridine sites from RNA sequence information. To address this, we introduce a novel pseudouridine site prediction model called PseU-KeMRF, which can identify pseudouridine sites in three species, H. sapiens, S. cerevisiae, and M. musculus. Through comprehensive analysis, we selected four RNA coding schemes, including binary feature, position-specific trinucleotide propensity based on single strand (PSTNPss), nucleotide chemical property (NCP) and pseudo k-tuple composition (PseKNC). Then the support vector machine-recursive feature elimination (SVM-RFE) method was used for feature selection and the feature subset was optimized. Finally, the best feature subsets were input into the kernel based on multinomial random forests (KeMRF) classifier for cross-validation and independent testing. As a new classification method, compared with the traditional random forest, KeMRF not only improves the node splitting process of decision tree construction based on multinomial distribution, but also combines the easy-to-interpret kernel method for prediction, which improves classification performance. Our results indicate superior predictive performance of PseU-KeMRF over other existing models. On the three species' training datasets, the testing accuracy of PseU-KeMRF was 0.66%, 3.66%, and 2.76% higher, respectively, than the best available methods. Moreover, PseU-KeMRF's accuracy on independent testing datasets was 15.15% and 11.0% higher, respectively, than the best available methods. These results indicate that PseU-KeMRF is a highly competitive predictive model that can successfully identify pseudouridine sites in RNA sequences.
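Of the four RNA coding schemes listed in the PseU-KeMRF abstract, the "binary feature" is the simplest to illustrate. The sketch below shows the usual one-hot reading of that scheme, with four bits per nucleotide. The alphabet order `ACGU` and the function name are assumptions, and the abstract does not specify how the authors handle ambiguous bases.

```python
RNA_ALPHABET = "ACGU"  # assumed column order for the one-hot vector


def binary_encode(sequence: str) -> list:
    """Flattened one-hot encoding, four bits per nucleotide."""
    encoded = []
    for base in sequence.upper():
        if base not in RNA_ALPHABET:
            raise ValueError(f"unexpected nucleotide: {base!r}")
        # Exactly one of the four positions is set for each base.
        encoded.extend(1 if base == letter else 0 for letter in RNA_ALPHABET)
    return encoded
```

A window of length n around a candidate site therefore becomes a vector of length 4n, which can be concatenated with the other encodings before feature selection.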

14.

MAMLCDA: A Meta-Learning Model for Predicting circRNA-Disease Association Based on MAML Combined With CNN.

Tian, Yuanyi; Zou, Quan; Wang, ChunYu; Jia, Cangzhi.

IEEE J Biomed Health Inform ; PP, 2024 Apr 05.

Article in English | MEDLINE | ID: mdl-38578862

ABSTRACT

Circular RNAs (circRNAs) exist in vivo and are a class of noncoding RNA molecules. They have a single-stranded, closed, annular structure. Many studies have shown that circRNAs and diseases are linked. Therefore, it is critical to build a reliable and accurate predictor to find the circRNA-disease association. In this paper, we presented a meta-learning model named MAMLCDA to identify the circRNA-disease association, which is based on model-agnostic meta-learning (MAML) combined with CNN classification. Specifically, similarities between diseases and circRNAs are extracted and integrated to characterize their relationships, and k-means is used to cluster majority samples and select a certain number of samples from each cluster to obtain the same number of negative samples as the positive samples. To further reduce the dimension of the features and save operation time, we applied probabilistic principal component analysis (PPCA) to compact the integrated circRNA and disease similarity network feature vectors. The feature vectors are converted into images. At this time, the prediction problem is transformed into the 2-way 1-shot problem of the image and input into the model with MAML as the meta-learner and CNN as the base-learner. Comparison results of five-fold cross-validation on two benchmark datasets illustrate that MAMLCDA outperforms several state-of-the-art approaches with the best accuracies of 95.33% and 98%. Therefore, MAMLCDA can help to understand the pathogenesis of complex diseases at the circRNA level.

15.

Multi-kernel Learning Fusion Algorithm Based on RNN and GRU for ASD Diagnosis and Pathogenic Brain Region Extraction.

Chen, Jie; Zhang, Huilian; Zou, Quan; Liao, Bo; Bi, Xia-An.

Interdiscip Sci ; 2024 Apr 29.

Article in English | MEDLINE | ID: mdl-38683281

ABSTRACT

Autism spectrum disorder (ASD) is a complex, severe disorder related to brain development. It impairs patients' language communication and social behaviors. In recent years, ASD research has focused on single-modal neuroimaging data, neglecting the complementarity between multi-modal data. This omission may lead to poor classification. Therefore, it is important to study multi-modal data of ASD to reveal its pathogenesis. Furthermore, recurrent neural networks (RNNs) and gated recurrent units (GRUs) are effective for sequence data processing. In this paper, we introduce a novel framework for a Multi-Kernel Learning Fusion algorithm based on RNN and GRU (MKLF-RAG). The framework utilizes RNN and GRU to provide feature selection for data of different modalities. Then these features are fused by the MKLF algorithm to detect the pathological mechanisms of ASD and extract the Regions of Interest (ROIs) most relevant to the disease. The MKLF-RAG proposed in this paper has been tested in a variety of experiments with the Autism Brain Imaging Data Exchange (ABIDE) database. Experimental findings indicate that our framework notably enhances the classification accuracy for ASD. Compared with other methods, MKLF-RAG demonstrates superior efficacy across multiple evaluation metrics and could provide valuable insights into the early diagnosis of ASD.

16.

Electrochemically Enable N-Sulfenylation/Phosphinylation of Sulfoximines via Oxidative Dehydrocoupling Reaction.

Zhang, Wenbao; Jin, Dongsheng; Hu, Yongkang; Yin, Kun; Zou, Quan; Tang, Liang; Qian, Peng.

J Org Chem ; 89(9): 6106-6116, 2024 May 03.

Article in English | MEDLINE | ID: mdl-38632856

ABSTRACT

An electrochemical oxidative cross-coupling strategy for the synthesis of N-sulfenylsulfoximines from sulfoximines and thiols was accomplished, giving diverse N-sulfenylsulfoximines in moderate to good yields. Moreover, this strategy can be extended to construct the N-P bond of N-phosphinylated sulfoximines. With electrons as reagents, the oxidative dehydrogenation cross-coupling reaction proceeds smoothly in the absence of traditional redox reagents.

17.

scTPC: a novel semisupervised deep clustering model for scRNA-seq data.

Qiu, Yushan; Yang, Lingfei; Jiang, Hao; Zou, Quan.

Bioinformatics ; 40(5), 2024 May 02.

Article in English | MEDLINE | ID: mdl-38684178

ABSTRACT

MOTIVATION: Continuous advancements in single-cell RNA sequencing (scRNA-seq) technology have enabled researchers to further explore the study of cell heterogeneity, trajectory inference, identification of rare cell types, and neurology. Accurate scRNA-seq data clustering is crucial in single-cell sequencing data analysis. However, the high dimensionality, sparsity, and presence of "false" zero values in the data can pose challenges to clustering. Furthermore, current unsupervised clustering algorithms have not effectively leveraged prior biological knowledge, making cell clustering even more challenging. RESULTS: This study investigates a semisupervised clustering model called scTPC, which integrates the triplet constraint, pairwise constraint, and cross-entropy constraint based on deep learning. Specifically, the model begins by pretraining a denoising autoencoder based on a zero-inflated negative binomial distribution. Deep clustering is then performed in the learned latent feature space using triplet constraints and pairwise constraints generated from partially labeled cells. Finally, to address imbalanced cell-type datasets, a weighted cross-entropy loss is introduced to optimize the model. A series of experimental results on 10 real scRNA-seq datasets and five simulated datasets demonstrate that scTPC achieves accurate clustering with a well-designed framework. AVAILABILITY AND IMPLEMENTATION: scTPC is a Python-based algorithm, and the code is available from https://github.com/LF-Yang/Code or https://zenodo.org/records/10951780.

Subject(s)

Algorithms , Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , Humans , Sequence Analysis, RNA/methods , RNA-Seq/methods , Deep Learning , Software , Single-Cell Gene Expression Analysis
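The scTPC abstract mentions a weighted cross-entropy loss for imbalanced cell-type datasets. The sketch below shows one common way such a loss is built, using inverse-frequency class weights. The weighting formula, the function names, and the dictionary-based inputs are all assumptions for illustration; the abstract does not state how scTPC derives its weights.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_cross_entropy(probabilities, labels, weights):
    """Weighted mean of -log p(true class).

    probabilities -- one dict per cell, mapping class -> predicted probability
    labels        -- true class of each cell
    weights       -- per-class weights, e.g. from inverse_frequency_weights
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for probs, label in zip(probabilities, labels):
        w = weights[label]
        # Clamp so a zero probability does not produce an infinite loss.
        weighted_sum += -w * math.log(max(probs[label], 1e-12))
        weight_total += w
    return weighted_sum / weight_total
```

With three cells of one type and one cell of another, the rare type receives weight 2.0 and the common type about 0.67, so an error on the rare cell contributes three times as much to the loss.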

18.

Revisiting drug-protein interaction prediction: a novel global-local perspective.

Zhou, Zhecheng; Liao, Qingquan; Wei, Jinhang; Zhuo, Linlin; Wu, Xiaonan; Fu, Xiangzheng; Zou, Quan.

Bioinformatics ; 40(5), 2024 May 02.

Article in English | MEDLINE | ID: mdl-38648052

ABSTRACT

MOTIVATION: Accurate inference of potential drug-protein interactions (DPIs) aids in understanding drug mechanisms and developing novel treatments. Existing deep learning models, however, struggle with accurate node representation in DPI prediction, limiting their performance. RESULTS: We propose a new computational framework that integrates global and local features of nodes in the drug-protein bipartite graph for efficient DPI inference. Initially, we employ pre-trained models to acquire fundamental knowledge of drugs and proteins and to determine their initial features. Subsequently, the MinHash and HyperLogLog algorithms are utilized to estimate the similarity and set cardinality between drug and protein subgraphs, serving as their local features. Then, an energy-constrained diffusion mechanism is integrated into the transformer architecture, capturing interdependencies between nodes in the drug-protein bipartite graph and extracting their global features. Finally, we fuse the local and global features of nodes and employ multilayer perceptrons to predict the likelihood of potential DPIs. A comprehensive and precise node representation guarantees efficient prediction of unknown DPIs by the model. Various experiments validate the accuracy and reliability of our model, with molecular docking results revealing its capability to identify potential DPIs not present in existing databases. This approach is expected to offer valuable insights for furthering drug repurposing and personalized medicine research. AVAILABILITY AND IMPLEMENTATION: Our code and data are accessible at: https://github.com/ZZCrazy00/DPI.

Subject(s)

Algorithms , Molecular Docking Simulation , Proteins , Proteins/chemistry , Proteins/metabolism , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism , Computational Biology/methods , Deep Learning
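
The abstract above mentions MinHash as one of the tools for estimating similarity between drug and protein subgraphs. The following is a minimal sketch of MinHash-based Jaccard estimation on plain sets, assuming each subgraph is represented as a set of node identifiers. The helper names, the use of seeded BLAKE2b hashes, and the signature length are illustrative assumptions and do not come from the authors' repository.

```python
import hashlib


def _hash(item, seed):
    """64-bit hash of an item under a given seed (one seed per hash function)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_perm=128):
    """MinHash signature: the minimum hash value of the set under each seed."""
    items = list(items)
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_hash(x, seed) for x in items) for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


if __name__ == "__main__":
    # Hypothetical node-id sets standing in for two subgraphs.
    subgraph_a = set(range(0, 100))
    subgraph_b = set(range(50, 150))
    sig_a = minhash_signature(subgraph_a, num_perm=256)
    sig_b = minhash_signature(subgraph_b, num_perm=256)
    true_jaccard = len(subgraph_a & subgraph_b) / len(subgraph_a | subgraph_b)
    print(f"estimated: {estimate_jaccard(sig_a, sig_b):.3f}, true: {true_jaccard:.3f}")
```

The estimate converges to the true Jaccard similarity as the signature length grows, so fixed-size signatures can stand in for the full neighbor sets when comparing many subgraph pairs.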

19.

Integrating Single-Cell and Spatial Transcriptomics Reveals Heterogeneity of Early Pig Skin Development and a Subpopulation with Hair Placode Formation.

Wang, Yi; Jiang, Yao; Ni, Guiyan; Li, Shujuan; Balderson, Brad; Zou, Quan; Liu, Huatao; Jiang, Yifan; Sun, Jingchun; Ding, Xiangdong.

Adv Sci (Weinh) ; 11(20): e2306703, 2024 May.

Article in English | MEDLINE | ID: mdl-38561967

ABSTRACT

The dermis and epidermis, crucial structural layers of the skin, encompass appendages, hair follicles (HFs), and intricate cellular heterogeneity. However, an integrated spatiotemporal transcriptomic atlas of embryonic skin has not yet been described and would be invaluable for studying skin-related diseases in humans. Here, single-cell and spatial transcriptomic analyses are performed on skin samples of normal and hairless fetal pigs across four developmental periods. The cross-species comparison of skin cells illustrated that the pig epidermis is more representative of the human epidermis than the mouse epidermis is. Moreover, phenome-wide association study analysis revealed that the conserved genes between pigs and humans are strongly associated with human skin-related diseases. In the epidermis, two lineage differentiation trajectories describe hair follicle (HF) morphogenesis and epidermal development. By comparing normal and hairless fetal pigs, it is found that the hair placode (Pc), the most characteristic initial structure in HFs, arises from progenitor-like OGN+/UCHL1+ cells. These progenitors appear earlier in development than the previously described early Pc cells and exhibit abnormal proliferation and migration during differentiation in hairless pigs. The study provides a valuable resource for in-depth insights into HF development, which may serve as a key reference atlas for studying human skin disease etiology using porcine models.

Subject(s)

Hair Follicle , Transcriptome , Animals , Swine/genetics , Swine/embryology , Hair Follicle/metabolism , Hair Follicle/embryology , Hair Follicle/growth & development , Transcriptome/genetics , Single-Cell Analysis/methods , Skin/metabolism , Skin/embryology , Cell Differentiation/genetics , Gene Expression Profiling/methods , Humans , Mice

20.

Editorial: Artificial intelligence in drug discovery and development.

Wei, Leyi; Zou, Quan; Zeng, Xiangxiang.

Methods ; 226: 133-137, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38582311

Subject(s)

Artificial Intelligence , Drug Discovery , Drug Discovery/methods , Humans , Drug Development/methods , Drug Development/trends
