ABSTRACT
Protein-ligand interactions (PLIs) determine the efficacy and safety profiles of small molecule drugs. Existing methods rely on either structural information or resource-intensive computations to predict PLI, casting doubt on whether it is possible to perform structure-free PLI predictions at low computational cost. Here we show that a light-weight graph neural network (GNN), trained with quantitative PLIs of a small number of proteins and ligands, is able to predict the strength of unseen PLIs. The model has no direct access to structural information about the protein-ligand complexes. Instead, the predictive power is provided by encoding the entire chemical and proteomic space in a single heterogeneous graph, encapsulating primary protein sequence, gene expression, the protein-protein interaction network, and structural similarities between ligands. This novel approach performs competitively with, or better than, structure-aware models. Our results suggest that existing PLI prediction methods may be improved by incorporating representation learning techniques that embed biological and chemical knowledge.