Identifying B-cell epitopes using AlphaFold2 predicted structures and pretrained language model.

Zeng, Yuansong; Wei, Zhuoyi; Yuan, Qianmu; Chen, Sheng; Yu, Weijiang; Lu, Yutong; Gao, Jianzhao; Yang, Yuedong.

Bioinformatics ; 39(4)2023 04 03.

Article in English | MEDLINE | ID: mdl-37039829

ABSTRACT

MOTIVATION: Identifying B-cell epitopes is an essential step for guiding rational vaccine development and immunotherapies. Since experimental approaches are expensive and time-consuming, many computational methods have been designed to assist B-cell epitope prediction. However, existing sequence-based methods have limited performance because they use only contextual features of the sequential neighbors and neglect structural information.

RESULTS: Based on the recent breakthrough of AlphaFold2 in protein structure prediction, we propose GraphBepi, a novel graph-based model for accurate B-cell epitope prediction. For a given protein, the structure predicted by AlphaFold2 is used to construct the protein graph, where the nodes (residues) are encoded by ESM-2 learned representations. The graph is input into an edge-enhanced deep graph neural network (EGNN) to capture the spatial information in the predicted 3D structure. In parallel, a bidirectional long short-term memory neural network (BiLSTM) is employed to capture long-range dependencies in the sequence. The low-dimensional representations learned by the EGNN and the BiLSTM are then combined and fed into a multilayer perceptron for predicting B-cell epitopes. Through comprehensive tests on the curated epitope dataset, GraphBepi was shown to outperform the state-of-the-art methods by more than 5.5% and 44.0% in terms of AUC and AUPR, respectively. A web server is freely available at http://bio-web1.nscc-gz.cn/app/graphbepi.

AVAILABILITY AND IMPLEMENTATION: The datasets, pre-computed features, source code, and the trained model are available at https://github.com/biomed-AI/GraphBepi.
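The graph-construction step described above can be illustrated with a minimal sketch. It assumes that residues are connected when their C-alpha atoms lie within a distance cutoff; the abstract does not state how edges are defined, so the 10 angstrom cutoff, the function name `build_contact_graph`, and the toy coordinates are all hypothetical and are not taken from GraphBepi.

```python
import math

def build_contact_graph(ca_coords, cutoff=10.0):
    """Link residues whose C-alpha atoms lie within `cutoff` angstroms.

    Returns an adjacency list: residue index -> list of (neighbor, distance).
    The cutoff value is an illustrative assumption, not the paper's setting.
    """
    n = len(ca_coords)
    neighbors = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(ca_coords[i], ca_coords[j])
            if d <= cutoff:
                # Store the distance so it can serve as an edge feature.
                neighbors[i].append((j, d))
                neighbors[j].append((i, d))
    return neighbors

# Toy coordinates: three residues along a line plus one distant residue.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
graph = build_contact_graph(coords)
```

In a full pipeline, each node would additionally carry a per-residue embedding and the stored distances could feed an edge-aware network; both are omitted here to keep the sketch dependency-free.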

Subject(s)

Epitopes, B-Lymphocyte , Neural Networks, Computer , Epitopes, B-Lymphocyte/chemistry , Proteins/chemistry , Software , Language

Spatial transcriptomics prediction from histology jointly through Transformer and graph neural networks.

Zeng, Yuansong; Wei, Zhuoyi; Yu, Weijiang; Yin, Rui; Yuan, Yuchen; Li, Bingling; Tang, Zhonghui; Lu, Yutong; Yang, Yuedong.

Brief Bioinform ; 23(5)2022 09 20.

Article in English | MEDLINE | ID: mdl-35849101

ABSTRACT

The rapid development of spatial transcriptomics allows the measurement of RNA abundance at a high spatial resolution, making it possible to simultaneously profile gene expression, spatial locations of cells or spots, and the corresponding hematoxylin and eosin-stained histology images. It has therefore become promising to predict gene expression from histology images, which are relatively easy and cheap to obtain. Several methods have been devised for this purpose, but they have not fully captured the internal relations of the 2D vision features or the spatial dependency between spots. Here, we developed Hist2ST, a deep learning-based model to predict RNA-seq expression from histology images. Around each sequenced spot, the corresponding histology image is cropped into an image patch and fed into a convolutional module to extract 2D vision features. Meanwhile, the spatial relations with the whole image and with neighboring patches are captured through Transformer and graph neural network modules, respectively. These learned features are then used to predict gene expression under a zero-inflated negative binomial distribution. To alleviate the impact of the small size of spatial transcriptomics datasets, a self-distillation mechanism is employed for efficient learning of the model. In comprehensive tests on cancer and normal datasets, Hist2ST was shown to outperform existing methods in terms of both gene expression prediction and spatial region identification. Further pathway analyses indicated that our model could preserve biological information. Thus, Hist2ST enables generating spatial transcriptomics data from histology images for elucidating molecular signatures of tissues.
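For readers unfamiliar with the zero-inflated negative binomial (ZINB) distribution mentioned above, the following is a minimal sketch of its log-probability, using the common parameterization by mean `mu`, dispersion `theta`, and zero-inflation probability `pi`. The function names and parameter values are illustrative assumptions; the abstract does not specify how Hist2ST parameterizes or implements its loss.

```python
import math

def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial (mean mu, dispersion theta)."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))

def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability under ZINB: a point mass at zero (weight pi) mixed with an NB."""
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can come from the inflation component or from the NB itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb

# Illustrative parameters (hypothetical values, not fitted to any data).
mu, theta, pi = 2.0, 1.5, 0.3
total_mass = sum(math.exp(zinb_log_pmf(x, mu, theta, pi)) for x in range(200))
p_zero_nb = math.exp(nb_log_pmf(0, mu, theta))
p_zero_zinb = math.exp(zinb_log_pmf(0, mu, theta, pi))
```

A training loss would typically be the negative of this log-probability summed over genes and spots, with `mu`, `theta`, and `pi` produced by the network.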

Subject(s)

Image Processing, Computer-Assisted , Transcriptome , Eosine Yellowish-(YS) , Hematoxylin , Image Processing, Computer-Assisted/methods , Neural Networks, Computer , RNA

A parameter-free deep embedded clustering method for single-cell RNA-seq data.

Zeng, Yuansong; Wei, Zhuoyi; Zhong, Fengqi; Pan, Zixiang; Lu, Yutong; Yang, Yuedong.

Brief Bioinform ; 23(5)2022 09 20.

Article in English | MEDLINE | ID: mdl-35524494

ABSTRACT

Clustering analysis is widely used in single-cell ribonucleic acid (RNA)-sequencing (scRNA-seq) data to discover cell heterogeneity and cell states. While many clustering methods have been developed for scRNA-seq analysis, most of them require the number of clusters to be provided. However, it is not easy to know the exact number of cell types in advance, and determination based on experience is not always reliable. Here, we have developed ADClust, an automatic deep embedding clustering method for scRNA-seq data, which can accurately cluster cells without requiring a predefined number of clusters. Specifically, ADClust first obtains low-dimensional representations through a pre-trained autoencoder and uses them to cluster cells into initial micro-clusters. The micro-clusters are then compared with one another by a statistical test, and similar micro-clusters are merged into larger clusters. According to the clustering, cell representations are updated so that each cell is pulled toward the centers of its assigned cluster and of similar clusters, while cells are separated to keep distances between clusters. This is accomplished by jointly optimizing the carefully designed clustering and autoencoder loss functions. The merging process continues until convergence. ADClust was tested on 11 real scRNA-seq datasets and was shown to outperform existing methods in terms of both clustering performance and the accuracy of the determined number of clusters. More importantly, our model provides high speed and scalability for large datasets.
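The idea of starting from many micro-clusters and merging similar ones, so that the final number of clusters is not fixed in advance, can be sketched as follows. Note that this sketch swaps the statistical test described above for a plain centroid-distance threshold, purely for illustration; the function `merge_micro_clusters`, the threshold, and the toy centroids are hypothetical and do not reproduce ADClust.

```python
import math

def merge_micro_clusters(centroids, threshold):
    """Merge micro-clusters whose centroids are closer than `threshold`.

    Stand-in criterion: ADClust uses a statistical test, whereas this sketch
    uses a simple distance threshold. Returns one merged label per micro-cluster.
    """
    parent = list(range(len(centroids)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) < threshold:
                parent[find(j)] = find(i)

    # Relabel roots as consecutive integers in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(centroids))]

# Five toy micro-cluster centroids in a 2D embedding space.
centroids = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1), (10.0, 0.0)]
labels = merge_micro_clusters(centroids, threshold=1.0)
n_clusters = len(set(labels))
```

Here the number of final clusters falls out of the merging step rather than being supplied by the user, which is the property the abstract emphasizes.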

Subject(s)

RNA , Single-Cell Analysis , Algorithms , Cluster Analysis , Gene Expression Profiling/methods , RNA/genetics , RNA-Seq , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods

A robust and scalable graph neural network for accurate single-cell classification.

Zeng, Yuansong; Wei, Zhuoyi; Pan, Zixiang; Lu, Yutong; Yang, Yuedong.

Brief Bioinform ; 23(2)2022 03 10.

Article in English | MEDLINE | ID: mdl-35018408

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) techniques provide high-resolution data on cellular heterogeneity in diverse tissues, and a critical step in the data analysis is cell type identification. Traditional methods usually cluster the cells and manually identify cell clusters through marker genes, which is time-consuming and subjective. With the launch of several large-scale single-cell projects, millions of sequenced cells have been annotated, and it is promising to transfer labels from the annotated datasets to newly generated datasets. One powerful way to transfer labels is to learn cell relations through a graph neural network (GNN), but traditional GNNs struggle to process millions of cells because of the expensive message-passing procedure at each training epoch. Here, we have developed a robust and scalable GNN-based method for accurate single-cell classification (GraphCS), where the graph is constructed to connect similar cells within and between labelled and unlabelled scRNA-seq datasets for propagation of shared information. To overcome the slow information propagation of GNNs at each training epoch, the diffused information is pre-calculated via the approximate Generalized PageRank algorithm, enabling sublinear complexity over cell numbers. Compared with existing methods, GraphCS demonstrates better performance on simulated, cross-platform, cross-species and cross-omics scRNA-seq datasets. More importantly, our model provides high speed and scalability on large datasets, and can achieve superior performance for 1 million cells within 50 min.
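The pre-calculation of diffused information can be illustrated with a small sketch of Generalized PageRank propagation, which computes a weighted sum of multi-hop feature averages once, before any training. This is the exact dense form written for clarity, not the approximate sublinear algorithm the abstract refers to; the row normalization, the hop weights, and the function name `generalized_pagerank` are assumptions made for illustration.

```python
def generalized_pagerank(adj, features, weights):
    """Return sum_k weights[k] * P^k @ features, with P the row-normalized adjacency.

    adj:      n x n adjacency matrix as nested lists
    features: n x d feature matrix as nested lists
    weights:  one coefficient per hop (hop 0 is the raw features)
    """
    n = len(adj)
    dim = len(features[0])
    # Row-normalize so each propagation step averages over a cell's neighbors.
    P = []
    for row in adj:
        s = sum(row)
        P.append([v / s if s else 0.0 for v in row])

    current = [list(f) for f in features]
    out = [[0.0] * dim for _ in range(n)]
    for w in weights:
        # Accumulate the current hop, scaled by its weight.
        for i in range(n):
            for d in range(dim):
                out[i][d] += w * current[i][d]
        # Advance to the next hop: current <- P @ current.
        current = [[sum(P[i][j] * current[j][d] for j in range(n))
                    for d in range(dim)] for i in range(n)]
    return out

# Two connected cells; only the first carries a non-zero feature.
adj = [[0, 1], [1, 0]]
features = [[1.0], [0.0]]
diffused = generalized_pagerank(adj, features, weights=[0.5, 0.5])
```

Because the diffused features are computed once up front, a downstream classifier can be trained on them with ordinary mini-batches and no message passing per epoch, which is the scalability argument made above.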

Subject(s)

Neural Networks, Computer , Single-Cell Analysis , Algorithms , Learning , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Exome Sequencing
