Search | VHL Regional Portal

The generative capacity of probabilistic protein sequence models.

McGee, Francisco; Hauri, Sandro; Novinger, Quentin; Vucetic, Slobodan; Levy, Ronald M; Carnevale, Vincenzo; Haldane, Allan.

Nat Commun; 12(1): 6302, 2021 Nov 02.

Article in English | MEDLINE | ID: mdl-34728624

ABSTRACT

Potts models and variational autoencoders (VAEs) have recently gained popularity as generative protein sequence models (GPSMs) to explore fitness landscapes and predict mutation effects. Despite encouraging results, current model evaluation metrics leave unclear whether GPSMs faithfully reproduce the complex multi-residue mutational patterns observed in natural sequences due to epistasis. Here, we develop a set of sequence statistics to assess the "generative capacity" of three current GPSMs: the pairwise Potts Hamiltonian, the VAE, and the site-independent model. We show that the Potts model's generative capacity is largest, as the higher-order mutational statistics generated by the model agree with those observed for natural sequences, while the VAE's generative capacity lies between those of the Potts and site-independent models. Importantly, our work provides a new framework for evaluating and interpreting GPSM accuracy which emphasizes the role of higher-order covariation and epistasis, with broader implications for probabilistic sequence models in general.
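The idea of comparing sequence statistics between natural and model-generated alignments can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy two-column alignment, and the choice of pairwise covariance C_ij(a,b) = f_ij(a,b) - f_i(a) f_j(b) as the statistic are assumptions made for illustration. The sketch fits a site-independent model to a toy alignment, samples from it, and shows that the samples keep the single-site frequencies but lose the pairwise covariation.

```python
import numpy as np


def site_frequencies(msa, q):
    """Single-site frequencies f_i(a) for an integer-encoded alignment.

    msa: array of shape (N, L) with entries in [0, q).
    Returns an array of shape (L, q).
    """
    onehot = np.eye(q)[msa]          # shape (N, L, q)
    return onehot.mean(axis=0)


def pair_covariances(msa, q):
    """Pairwise covariances C_ij(a,b) = f_ij(a,b) - f_i(a) f_j(b).

    Returns an array of shape (L, L, q, q).
    """
    n = msa.shape[0]
    onehot = np.eye(q)[msa]
    f_ij = np.einsum("nia,njb->ijab", onehot, onehot) / n
    f_i = onehot.mean(axis=0)
    return f_ij - np.einsum("ia,jb->ijab", f_i, f_i)


def sample_site_independent(f, n, rng):
    """Draw n sequences with every column sampled independently from f."""
    length, q = f.shape
    columns = [rng.choice(q, size=n, p=f[i]) for i in range(length)]
    return np.stack(columns, axis=1)


# Toy "natural" alignment (an assumption for illustration): two columns
# that always mutate together, so the rows are either (0, 0) or (1, 1).
natural = np.array([[0, 0], [1, 1]] * 500)
q = 2

rng = np.random.default_rng(0)
f = site_frequencies(natural, q)
generated = sample_site_independent(f, 20000, rng)

c_nat = pair_covariances(natural, q)
c_gen = pair_covariances(generated, q)

print("natural   C_01(0,0):", c_nat[0, 1, 0, 0])
print("generated C_01(0,0):", c_gen[0, 1, 0, 0])
```

In this toy case the natural covariance between the two columns is 0.25, while the site-independent samples give a value close to zero. A model that captures pairwise couplings would be expected to reproduce the natural value; the abstract's "higher-order" statistics extend the same comparison to groups of more than two positions.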

Subject(s)

Mutation, Proteins/chemistry, Sequence Alignment/methods, Algorithms, Amino Acid Sequence, Computer Simulation, Databases, Protein, Humans, Models, Statistical, Protein Structural Elements, Proteins/genetics, Structure-Activity Relationship

Structure motif-centric learning framework for inorganic crystalline systems.

Banjade, Huta R; Hauri, Sandro; Zhang, Shanshan; Ricci, Francesco; Gong, Weiyi; Hautier, Geoffroy; Vucetic, Slobodan; Yan, Qimin.

Sci Adv; 7(17), 2021 Apr.

Article in English | MEDLINE | ID: mdl-33883136

ABSTRACT

Incorporating physical principles into a machine learning (ML) architecture is a fundamental step toward the continued development of artificial intelligence for inorganic materials. Inspired by Pauling's rules, we propose that structure motifs in inorganic crystals can serve as a central input to a machine learning framework. We demonstrate that the presence of structure motifs and their connections in a large set of crystalline compounds can be converted into unique vector representations using an unsupervised learning algorithm. To demonstrate the use of structure motif information, we create a motif-centric learning framework by combining motif information with atom-based graph neural networks to form an atom-motif dual graph network (AMDNet), which is more accurate in predicting electronic structure properties of metal oxides, such as bandgaps. The work illustrates a route toward the fundamental design of graph neural network learning architectures for complex materials by incorporating beyond-atom physical principles.
