Search | VHL Regional Portal

Prediction of chemical reaction yields with large-scale multi-view pre-training.

Shi, Runhan; Yu, Gufeng; Huo, Xiaohong; Yang, Yang.

J Cheminform ; 16(1): 22, 2024 Feb 25.

Article in English | MEDLINE | ID: mdl-38403627

ABSTRACT

Developing machine learning models with high generalization capability for predicting chemical reaction yields is of significant interest and importance. The efficacy of such models depends heavily on the representation of chemical reactions, which has commonly been learned from SMILES or graphs of molecules using deep neural networks. However, the progression of chemical reactions is inherently determined by the molecular 3D geometric properties, which have been recently highlighted as crucial features in accurately predicting molecular properties and chemical reactions. Additionally, large-scale pre-training has been shown to be essential in enhancing the generalization capability of complex deep learning models. Based on these considerations, we propose the Reaction Multi-View Pre-training (ReaMVP) framework, which leverages self-supervised learning techniques and a two-stage pre-training strategy to predict chemical reaction yields. By incorporating multi-view learning with 3D geometric information, ReaMVP achieves state-of-the-art performance on two benchmark datasets. Notably, the experimental results indicate that ReaMVP has a significant advantage in predicting out-of-sample data, suggesting an enhanced generalization ability to predict new reactions. Scientific Contribution: This study presents the ReaMVP framework, which improves the generalization capability of machine learning models for predicting chemical reaction yields. By integrating sequential and geometric views and leveraging self-supervised learning techniques with a two-stage pre-training strategy, ReaMVP achieves state-of-the-art performance on benchmark datasets. The framework demonstrates superior predictive ability for out-of-sample data and enhances the prediction of new reactions.

MIGGRI: A multi-instance graph neural network model for inferring gene regulatory networks for Drosophila from spatial expression images.

Huang, Yuyang; Yu, Gufeng; Yang, Yang.

PLoS Comput Biol ; 19(11): e1011623, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37939200

ABSTRACT

Recent breakthrough in spatial transcriptomics has brought great opportunities for exploring gene regulatory networks (GRNs) from a brand-new perspective. Especially, the local expression patterns and spatio-temporal regulation mechanisms captured by spatial expression images allow more delicate delineation of the interplay between transcript factors and their target genes. However, the complexity and size of spatial image collections pose significant challenges to GRN inference using image-based methods. Extracting regulatory information from expression images is difficult due to the lack of supervision and the multi-instance nature of the problem, where a gene often corresponds to multiple images captured from different views. While graph models, particularly graph neural networks, have emerged as a promising method for leveraging underlying structure information from known GRNs, incorporating expression images into graphs is not straightforward. To address these challenges, we propose a two-stage approach, MIGGRI, for capturing comprehensive regulatory patterns from image collections for each gene and known interactions. Our approach involves a multi-instance graph neural network (GNN) model for GRN inference, which first extracts gene regulatory features from spatial expression images via contrastive learning, and then feeds them to a multi-instance GNN for semi-supervised learning. We apply our approach to a large set of Drosophila embryonic spatial gene expression images. MIGGRI achieves outstanding performance in the inference of GRNs for early eye development and mesoderm development of Drosophila, and shows robustness in the scenarios of missing image information. Additionally, we perform interpretable analysis on image reconstruction and functional subgraphs that may reveal potential pathways or coordinate regulations. By leveraging the power of graph neural networks and the information contained in spatial expression images, our approach has the potential to advance our understanding of gene regulation in complex biological systems.

Subject(s)

Drosophila , Gene Regulatory Networks , Animals , Drosophila/genetics , Gene Regulatory Networks/genetics , Computational Biology/methods , Neural Networks, Computer , Gene Expression Profiling/methods

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL