Results 1 - 20 of 70
1.
Comput Struct Biotechnol J ; 23: 1016-1025, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38425487

ABSTRACT

Geometric deep learning has demonstrated great potential in non-Euclidean data analysis. The incorporation of geometric insights into the learning architecture is vital to its success. Here we propose a curvature-enhanced graph convolutional network (CGCN) for biomolecular interaction prediction. Our CGCN employs Ollivier-Ricci curvature (ORC) to characterize local geometric properties of the network and enhance the learning capability of GCNs. More specifically, ORCs are evaluated based on the local topology of node neighborhoods and then incorporated into the weight function for feature aggregation in the message-passing procedure. Our CGCN model is extensively validated on fourteen real-world biomolecular interaction networks and analyzed in detail using a series of well-designed simulated datasets. Our CGCN achieves state-of-the-art results: as far as we know, it outperforms all existing models on thirteen of the fourteen real-world datasets and ranks second on the remaining one. The results from the simulated data show that our CGCN model is superior to the traditional GCN models regardless of the positive-to-negative-curvature ratios, network densities, and network sizes (when larger than 500).

2.
Comput Biol Med ; 169: 107918, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38194782

ABSTRACT

Protein mutations can significantly influence protein solubility, which results in altered protein functions and leads to various diseases. Despite tremendous effort, machine learning prediction of protein solubility changes upon mutation remains a challenging task, as indicated by the poor scores of normalized Correct Prediction Ratio (CPR). Part of the challenge stems from the fact that there are no three-dimensional (3D) structures for the wild-type and mutant proteins. This work integrates persistent Laplacians and a pre-trained Transformer for the task. The Transformer, pre-trained with hundreds of millions of protein sequences, embeds wild-type and mutant sequences, while persistent Laplacians track the topological invariant change and homotopic shape evolution induced by mutations in 3D protein structures, which are rendered from AlphaFold2. The resulting machine learning model was trained on an extensive data set labeled with three solubility types. Our model outperforms all existing predictive methods and improves on the state of the art by up to 15%.


Subject(s)
Machine Learning , Solubility , Amino Acid Sequence , Mutation
3.
ArXiv ; 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37961732

ABSTRACT

Protein mutations can significantly influence protein solubility, which results in altered protein functions and leads to various diseases. Despite tremendous effort, machine learning prediction of protein solubility changes upon mutation remains a challenging task, as indicated by the poor scores of normalized Correct Prediction Ratio (CPR). Part of the challenge stems from the fact that there are no three-dimensional (3D) structures for the wild-type and mutant proteins. This work integrates persistent Laplacians and a pre-trained Transformer for the task. The Transformer, pre-trained with hundreds of millions of protein sequences, embeds wild-type and mutant sequences, while persistent Laplacians track the topological invariant change and homotopic shape evolution induced by mutations in 3D protein structures, which are rendered from AlphaFold2. The resulting machine learning model was trained on an extensive data set labeled with three solubility types. Our model outperforms all existing predictive methods and improves on the state of the art by up to 15%.

4.
J Phys Chem B ; 127(46): 10077-10087, 2023 Nov 23.
Article in English | MEDLINE | ID: mdl-37942925

ABSTRACT

To discover new materials, high-throughput screening (HTS) with machine learning (ML) requires universally available descriptors that can accurately predict the desired properties. For elastomers, the experimental and simulation data used in current descriptors may not be available for all candidates of interest, hindering elastomer discovery through HTS. To address this challenge, we introduce structure-based multilevel (SM) descriptors of elastomers derived solely from molecular structure, which is universally available. Our SM descriptors are hierarchically organized to capture both local soft and hard segment structures as well as the global structures of elastomers. With the SM-Morgan Fingerprint (SM-MF) descriptor, one of our SM descriptors, a machine learning model predicts elastomer toughness with an accuracy of 0.91. Furthermore, an HTS pipeline is established to swiftly screen elastomers with targeted toughness. We also demonstrate the generality and applicability of SM descriptors by using them to construct HTS pipelines for screening elastomers with a targeted critical strain or Young's modulus. The user-friendliness and low computational cost of SM descriptors make them a promising tool to significantly enhance HTS in the search for novel materials.

5.
Cell Rep Methods ; 3(11): 100621, 2023 Nov 20.
Article in English | MEDLINE | ID: mdl-37875121

ABSTRACT

Molecular representation learning plays an important role in molecular property prediction. Existing molecular property prediction models rely on the de facto standard of covalent-bond-based molecular graphs for representing molecular topology at the atomic level and totally ignore the non-covalent interactions within the molecule. In this study, we propose a molecular geometric deep learning model to predict the properties of molecules that aims to comprehensively consider the information of covalent and non-covalent interactions of molecules. The essential idea is to incorporate a more general molecular representation into geometric deep learning (GDL) models. We systematically test molecular GDL (Mol-GDL) on fourteen commonly used benchmark datasets. The results show that Mol-GDL can achieve a better performance than state-of-the-art (SOTA) methods. Extensive tests have demonstrated the important role of non-covalent interactions in molecular property prediction and the effectiveness of Mol-GDL models.


Subject(s)
Deep Learning , Benchmarking , Models, Molecular
6.
Sci Rep ; 13(1): 11183, 2023 Jul 11.
Article in English | MEDLINE | ID: mdl-37433870

ABSTRACT

Molecular representations are of fundamental importance for modeling and analyzing molecular systems. Molecular representation models have contributed greatly to successes in drug design and materials discovery. In this paper, we present a computational framework for molecular representation that is mathematically rigorous and based on the persistent Dirac operator. The properties of the discrete weighted and unweighted Dirac matrix are systematically discussed, and the biological meanings of both homological and non-homological eigenvectors are studied. We also evaluate the impact of various weighting schemes on the weighted Dirac matrix. Additionally, a set of physical persistent attributes that characterize the persistence and variation of spectrum properties of Dirac matrices during a filtration process is proposed as molecular fingerprints. Our persistent attributes are used to classify molecular configurations of nine different types of organic-inorganic halide perovskites. The combination of persistent attributes with a gradient boosting tree model has achieved great success in molecular solvation free energy prediction. The results show that our model is effective in characterizing molecular structures, demonstrating the power of our molecular representation and featurization approach.

7.
Methods Mol Biol ; 2671: 307-318, 2023.
Article in English | MEDLINE | ID: mdl-37308652

ABSTRACT

Recent experiments have shown that the vault molecular complex undergoes large conformational changes at its shoulder and cap regions in solution. A comparison of two configurations shows that the shoulder region can twist and move outward, while the cap region correspondingly rotates and pushes upward. To further understand these experimental results, in this paper, we study the vault dynamics for the first time. Since the vault is an extremely large structure with around 63,336 Cα atoms, the traditional normal mode method with the Cα coarse-grained representation falls short. We employ a newly invented multiscale virtual particle-based anisotropic network model (MVP-ANM). To reduce the complexity, the 39-fold vault structure is coarse-grained to about 6000 virtual particles, which significantly reduces the computational cost while still maintaining the basic structural information. Among the 14 low-frequency eigenmodes from Mode 7 to Mode 20, two eigenmodes, i.e., Mode 9 and Mode 20, are found to be directly associated with the experimental observations. In Mode 9, the shoulder region undergoes a significant expansion while the cap part is lifted upward. In Mode 20, a clear rotation of both shoulder and cap regions is observed. Our results are consistent with the experimental observations. More importantly, these low-frequency eigenmodes indicate that the vault waist, shoulder and lower cap regions are the most likely regions for the opening of the vault particle, and that the opening mechanism is highly likely to be rotation and expansion at these regions. As far as we know, this is the first work to provide a normal mode analysis of the vault complex.


Subject(s)
Anisotropy , Rotation
8.
Entropy (Basel) ; 25(6)2023 May 25.
Article in English | MEDLINE | ID: mdl-37372190

ABSTRACT

An important challenge in the study of complex systems is to identify appropriate effective variables at different times. In this paper, we explain why structures that are persistent with respect to changes in length and time scales are proper effective variables, and illustrate how persistent structures can be identified from the spectra and Fiedler vector of the graph Laplacian at different stages of the topological data analysis (TDA) filtration process for twelve toy models. We then investigate four market crashes, three of which were related to the COVID-19 pandemic. In all four crashes, a persistent gap opens up in the Laplacian spectra when we go from a normal phase to a crash phase. In the crash phase, the persistent structure associated with the gap remains distinguishable up to a characteristic length scale ϵ* where the first non-zero Laplacian eigenvalue changes most rapidly. Before ϵ*, the distribution of components in the Fiedler vector is predominantly bi-modal, and this distribution becomes uni-modal after ϵ*. Our findings hint at the possibility of understanding market crashes in terms of both continuous and discontinuous changes. Beyond the graph Laplacian, Hodge Laplacians of higher order can also be employed in future research.

9.
J Chem Inf Model ; 63(13): 4216-4227, 2023 07 10.
Article in English | MEDLINE | ID: mdl-37381769

ABSTRACT

Coronavirus disease 2019 (COVID-19) has affected people's lives and the development of the global economy. Biologically, protein-protein interactions between the SARS-CoV-2 surface spike (S) protein and the human ACE2 protein are the key mechanism behind COVID-19. In this study, we provide insights into interactions between the SARS-CoV-2 S-protein and ACE2, and propose topological indices to quantitatively characterize the impact of mutations on binding affinity changes (ΔΔG). In our model, a series of nested simplicial complexes and their related adjacency matrices at different scales are generated from a specially designed filtration process, based on the 3D structures of spike-ACE2 protein complexes. We develop, for the first time, a set of topological indices based on multiscale simplicial complexes. Unlike previous graph network models, which give only a qualitative analysis, our topological indices can provide a quantitative prediction of the binding affinity change caused by mutations and achieve great accuracy. In particular, for mutations at specific amino acids, such as polar amino acids or arginine, the Pearson correlation coefficient between our topological gravity model index and the binding affinity change can be higher than 0.8. As far as we know, this is the first time multiscale topological indices have been used in the quantitative analysis of protein-protein interactions.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , Angiotensin-Converting Enzyme 2/metabolism , Protein Binding , Mutation , Spike Glycoprotein, Coronavirus/metabolism
10.
J Chem Inf Model ; 63(10): 2928-2935, 2023 05 22.
Article in English | MEDLINE | ID: mdl-37167016

ABSTRACT

Artificial Intelligence (AI) techniques have great potential to fundamentally change the antibiotic discovery industry. Efficient and effective molecular featurization is key to all highly accurate learning models for antibiotic discovery. In this paper, we propose a fingerprint-enhanced graph attention network (FinGAT) model that combines sequence-based 2D fingerprints and structure-based graph representation. In our feature learning process, sequence information is transformed into a fingerprint vector, and structural information is encoded through a GAT module into another vector. These two vectors are concatenated and input into a multilayer perceptron (MLP) for antibiotic activity classification. Our model is extensively tested and compared with existing models. We find that our FinGAT can outperform various state-of-the-art GNN models in antibiotic discovery.


Subject(s)
Anti-Bacterial Agents , Artificial Intelligence , Anti-Bacterial Agents/pharmacology , Learning , Neural Networks, Computer
11.
Methods Mol Biol ; 2627: 211-229, 2023.
Article in English | MEDLINE | ID: mdl-36959450

ABSTRACT

Molecular representations are of great importance for machine learning models in RNA data analysis. Essentially, efficient molecular descriptors or fingerprints that characterize the intrinsic structural and interactional information of RNAs can significantly boost the performance of all learning models. In this paper, we introduce two persistent models, persistent homology and persistent spectral models, for RNA structure and interaction representations, and their applications in RNA data analysis. Different from traditional geometric and graph representations, persistent homology is built on simplicial complexes, which generalize graph models to higher-dimensional situations. Hypergraphs are a further generalization of simplicial complexes, and hypergraph-based embedded persistent homology has been proposed recently. Moreover, persistent spectral models, which combine a filtration process with spectral models, including spectral graph, spectral simplicial complex, and spectral hypergraph, are proposed for molecular representation. The persistent attributes for RNAs can be obtained from these two persistent models and further combined with machine learning models for RNA structure, flexibility, dynamics, and function analysis.


Subject(s)
Data Analysis , RNA , RNA/genetics
12.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36790858

ABSTRACT

Protein-protein interactions (PPIs) play crucial roles in almost all biological processes, from cell signaling and membrane transport to metabolism and immune systems. Efficient characterization of PPIs at the molecular level is key to the fundamental understanding of PPI mechanisms. Even with the large number of PPI models based on graphs, networks, geometry and topology, it remains a great challenge to design functional models that efficiently characterize the complicated multiphysical information within PPIs. Here we propose a persistent Tor-algebra (PTA) model for a unified algebraic representation of the multiphysical interactions. Mathematically, our PTA is inherently algebraic data analysis. In our PTA model, protein structures and interactions are described as a series of face rings and Tor modules, from which the PTA model is developed. The multiphysical information within and between biomolecules is implicitly characterized by PTA and further represented as PTA barcodes. To test our PTA models, we consider PTA-based ensemble learning for PPI binding affinity prediction. The two most commonly used datasets, i.e. SKEMPI and AB-Bind, are employed. We find that our model outperforms all the existing models as far as we know. Mathematically, our PTA model provides a highly efficient way to characterize molecular structures and interactions.


Subject(s)
Protein Interaction Mapping , Proteins , Proteins/chemistry , Protein Interaction Maps
13.
Acta Math Sin Engl Ser ; 38(10): 1901-1938, 2022.
Article in English | MEDLINE | ID: mdl-36407804

ABSTRACT

With the great advancement of experimental tools, a tremendous amount of biomolecular data has been generated and accumulated in various databases. The high dimensionality, structural complexity, nonlinearity, and entanglement of biomolecular data, ranging from DNA knots, RNA secondary structures, protein folding configurations, chromosomes, DNA origami, and molecular assembly to others at the macromolecular level, pose a severe challenge to their analysis and characterization. In the past few decades, mathematical concepts, models, algorithms, and tools from algebraic topology, combinatorial topology, computational topology, and topological data analysis have demonstrated great power and begun to play an essential role in tackling the biomolecular data challenge. In this work, we introduce biomolecular topology, which concerns the topological problems and models originating from biomolecular systems. More specifically, biomolecular topology encompasses topological structures, properties and relations that emerge from biomolecular structures, dynamics, interactions, and functions. We discuss the various types of biomolecular topology from structures (of proteins, DNAs, and RNAs), protein folding, and protein assembly. A brief discussion of databanks (and databases), theoretical models, and computational algorithms is presented. Further, we systematically review related topological models, including graphs, simplicial complexes, persistent homology, persistent Laplacians, de Rham-Hodge theory, Yau-Hausdorff distance, and topology-based machine learning models.

14.
ACS Nano ; 16(9): 13279-13293, 2022 09 27.
Article in English | MEDLINE | ID: mdl-36067337

ABSTRACT

Disease X is a hypothetical unknown disease that has the potential to cause an epidemic or pandemic outbreak in the future. Nanosensors are attractive portable devices that can swiftly screen disease biomarkers on site, reducing the reliance on laboratory-based analyses. However, conventional data analytics limit the progress of nanosensor research. In this Perspective, we highlight the integral role of machine learning (ML) algorithms in advancing nanosensing strategies toward Disease X detection. We first summarize recent progress in utilizing ML algorithms for the smart design and fabrication of custom nanosensor platforms as well as realizing rapid on-site prediction of infection statuses. Subsequently, we discuss promising prospects in further harnessing the potential of ML algorithms in other aspects of nanosensor development and biomarker detection.


Subject(s)
Algorithms , Machine Learning , Biomarkers
15.
J Chem Inf Model ; 62(17): 3961-3969, 2022 09 12.
Article in English | MEDLINE | ID: mdl-36040839

ABSTRACT

Protein-protein interactions (PPIs) are involved in almost all biological processes in the cell. Understanding protein-protein interactions holds the key for the understanding of biological functions, diseases and the development of therapeutics. Recently, artificial intelligence (AI) models have demonstrated great power in PPIs. However, a key issue for all AI-based PPI models is efficient molecular representations and featurization. Here, we propose Hom-complex-based PPI representation, and Hom-complex-based machine learning models for the prediction of PPI binding affinity changes upon mutation, for the first time. In our model, various Hom complexes Hom(G1, G) can be generated for the graph representation G of protein-protein complex by using different graphs G1, which reveal G1-related inner connections within the graph representation G of protein-protein complex. Further, for a specific graph G1, a series of nested Hom complexes are generated to give a multiscale characterization of the PPIs. Its persistent homology and persistent Euler characteristic are used as molecular descriptors and further combined with the machine learning model, in particular, gradient boosting tree (GBT). We systematically test our model on the two most-commonly used data sets, that is, SKEMPI and AB-Bind. It has been found that our model outperforms all the existing models as far as we know, which demonstrates the great potential of our model for the analysis of PPIs. Our model can be used for the analysis and design of efficient antibodies for SARS-CoV-2.
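
As a rough illustration of what a persistent Euler characteristic looks like, the sketch below tracks the Euler characteristic across a small filtration. It rests on several assumptions that are not in the abstract: a Vietoris-Rips complex on four hypothetical 2D points stands in for the Hom complexes, and the names `rips_simplices` and `euler_characteristic` are invented for this example.

```python
from itertools import combinations
from math import dist


def rips_simplices(points, eps, max_dim=3):
    """Vietoris-Rips complex: a vertex set is a simplex if all pairwise distances <= eps."""
    def close(i, j):
        return dist(points[i], points[j]) <= eps

    simplices = []
    for size in range(1, max_dim + 2):  # `size` vertices give a simplex of dimension size - 1
        for combo in combinations(range(len(points)), size):
            if all(close(i, j) for i, j in combinations(combo, 2)):
                simplices.append(combo)
    return simplices


def euler_characteristic(simplices):
    """Alternating sum of simplex counts: #vertices - #edges + #triangles - ..."""
    return sum((-1) ** (len(s) - 1) for s in simplices)


# Hypothetical toy input: the corners of a unit square (not data from the paper).
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
scales = [0.5, 1.0, 1.5]

# One Euler characteristic per filtration value; the resulting curve is the descriptor.
curve = [euler_characteristic(rips_simplices(points, eps)) for eps in scales]
print(curve)
```

At the smallest scale the complex is four isolated points, at scale 1.0 the four sides form a loop, and at scale 1.5 the diagonals fill the square in; the curve records these changes as one number per scale, which is the kind of vector that could then be passed to a learner such as a gradient boosting tree.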


Subject(s)
Artificial Intelligence , COVID-19 , Humans , Machine Learning , Mutation , Protein Binding , SARS-CoV-2/genetics
16.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35696650

ABSTRACT

Graph neural networks (GNNs) are the most promising deep learning models that can revolutionize non-Euclidean data analysis. However, their full potential is severely curtailed by poorly represented molecular graphs and features. Here, we propose a multiphysical graph neural network (MP-GNN) model based on the developed multiphysical molecular graph representation and featurization. All kinds of molecular interactions, between different atom types and at different scales, are systematically represented by a series of scale-specific and element-specific graphs with distance-related node features. From these graphs, graph convolution network (GCN) models are constructed with specially designed weight-sharing architectures. Base learners are constructed from GCN models from different elements at different scales, and further consolidated using both one-scale and multi-scale ensemble learning schemes. Our MP-GNN has two distinct properties. First, our MP-GNN incorporates multiscale interactions using more than one molecular graph. Atomic interactions at different scales are not modeled by one specific graph (as in traditional GNNs); instead, they are represented by a series of graphs at different scales. Second, it is free from the complicated feature generation process of conventional GNN methods. In our MP-GNN, various atom interactions are embedded into element-specific graph representations with only distance-related node features. A unique GNN architecture is designed to incorporate all the information into a consolidated model. Our MP-GNN has been extensively validated on the widely used benchmark test datasets from PDBbind, including PDBbind-v2007, PDBbind-v2013 and PDBbind-v2016. Our model can outperform all existing models as far as we know. Further, our MP-GNN is used in coronavirus disease 2019 drug design. Based on a dataset with 185 complexes of inhibitors for severe acute respiratory syndrome coronavirus (SARS-CoV/SARS-CoV-2), we evaluate their binding affinities using our MP-GNN. We find that our MP-GNN is highly accurate. This demonstrates the great potential of our MP-GNN for the screening of potential drugs for SARS-CoV-2. Availability: The Multiphysical graph neural network (MP-GNN) model can be found at https://github.com/Alibaba-DAMO-DrugAI/MGNN. Additional data or code will be available upon reasonable request.


Subject(s)
COVID-19 Drug Treatment , Data Analysis , Drug Design , Humans , Neural Networks, Computer , SARS-CoV-2
17.
Sci Rep ; 12(1): 9699, 2022 06 11.
Article in English | MEDLINE | ID: mdl-35690623

ABSTRACT

Hodge theory reveals the deep intrinsic relations of differential forms and provides a bridge between differential geometry, algebraic topology, and functional analysis. Here we use Hodge Laplacian and Hodge decomposition models to analyze biomolecular structures. Different from traditional graph-based methods, biomolecular structures are represented as simplicial complexes, which can be viewed as a generalization of graph models to their higher-dimensional counterparts. Hodge Laplacian matrices at different dimensions can be generated from the simplicial complex. The spectral information of these matrices can be used to study intrinsic topological information of biomolecular structures. Essentially, the multiplicity of the zero eigenvalue of the k-th dimensional Hodge Laplacian equals the k-th Betti number, i.e., the rank of the k-th homology group. The associated eigenvectors indicate the homological generators, i.e., circles or holes within the molecular-based simplicial complex. Furthermore, a Hodge decomposition-based HodgeRank model is used to characterize the folding or compactness of the molecular structures, in particular, the topologically associating domains (TADs) in high-throughput chromosome conformation capture (Hi-C) data. Mathematically, molecular structures are represented as simplicial complexes with certain edge flows. The HodgeRank-based average/total inconsistency (AI/TI) is used for the quantitative measurement of the folding or compactness of TADs. This is the first quantitative measurement for TAD regions, as far as we know.
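
The link between zero eigenvalues of the Hodge Laplacian and Betti numbers can be checked on a tiny example. The following is a minimal sketch, not the authors' code: it builds L_k = B_k^T B_k + B_{k+1} B_{k+1}^T from boundary matrices and counts the zero eigenvalues as the nullity of L_k, computed with exact rational arithmetic. The helper names and the two toy complexes (a hollow and a filled triangle) are assumptions made for illustration.

```python
from fractions import Fraction


def boundary_matrix(faces, simplices):
    """Rows: (k-1)-simplices `faces`; columns: k-simplices `simplices` (sorted vertex tuples)."""
    index = {f: i for i, f in enumerate(faces)}
    mat = [[0] * len(simplices) for _ in faces]
    for j, s in enumerate(simplices):
        for i in range(len(s)):
            face = s[:i] + s[i + 1:]          # drop the i-th vertex
            mat[index[face]][j] = (-1) ** i   # alternating orientation sign
    return mat


def hodge_laplacian(cx, k):
    """L_k = B_k^T B_k + B_{k+1} B_{k+1}^T for a complex given as {dim: [simplices]}."""
    cells = cx[k]
    n = len(cells)
    lap = [[0] * n for _ in range(n)]
    if k > 0:  # lower part: inner products of the columns of B_k
        cols = list(zip(*boundary_matrix(cx[k - 1], cells)))
        for i in range(n):
            for j in range(n):
                lap[i][j] += sum(x * y for x, y in zip(cols[i], cols[j]))
    rows = boundary_matrix(cells, cx.get(k + 1, []))  # upper part: rows of B_{k+1}
    for i in range(n):
        for j in range(n):
            lap[i][j] += sum(x * y for x, y in zip(rows[i], rows[j]))
    return lap


def rank(mat):
    """Matrix rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in mat]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def betti(cx, k):
    """Multiplicity of the zero eigenvalue of L_k, i.e. its nullity."""
    lap = hodge_laplacian(cx, k)
    return len(lap) - rank(lap)


vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]
hollow = {0: vertices, 1: edges}                   # triangle boundary: one loop
filled = {0: vertices, 1: edges, 2: [(0, 1, 2)]}   # solid triangle: loop filled in

print(betti(hollow, 0), betti(hollow, 1))
print(betti(filled, 0), betti(filled, 1))
```

Because L_k is symmetric and positive semidefinite, its nullity equals the multiplicity of the eigenvalue zero, so no numerical eigensolver or tolerance is needed here. For real molecular complexes a floating-point eigendecomposition would be the more practical choice.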


Subject(s)
Chromosomes , Data Analysis , Molecular Structure
18.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35536545

ABSTRACT

The three-dimensional (3D) chromosomal structure plays an essential role in all DNA-templated processes, including gene transcription, DNA replication and other cellular processes. Although chromosome conformation capture (3C) methods, such as Hi-C, can generate chromosomal contact data that characterize genome-wide chromosomal structural properties, an understanding of 3D genome organization based on Hi-C data is still lacking. Here, we propose a persistent spectral simplicial complex (PerSpectSC) model to describe Hi-C data for the first time. Specifically, a filtration process is introduced to generate a series of nested simplicial complexes at different scales. For each of these simplicial complexes, its spectral information can be calculated from the corresponding Hodge Laplacian matrix. The PerSpectSC model describes the persistence and variation of the spectral information of the nested simplicial complexes during the filtration process. Different from all previous models, our PerSpectSC-based features provide a quantitative global-scale characterization of chromosome structures and topology. Our descriptors can successfully classify cell types and also cellular differentiation stages for all the 24 types of chromosomes simultaneously. In particular, persistent minimum best characterizes cell types and Dim (1) persistent multiplicity best characterizes cellular differentiation. These results demonstrate the great potential of our PerSpectSC-based models in polymeric data analysis.


Subject(s)
Chromosomes , Genomics , Cell Differentiation , Chromosomes/genetics , Genomics/methods , Machine Learning , Molecular Conformation
19.
PLoS Comput Biol ; 18(4): e1009943, 2022 04.
Article in English | MEDLINE | ID: mdl-35385478

ABSTRACT

With the great advancements in experimental data, computational power and learning algorithms, artificial intelligence (AI) based drug design has begun to gain momentum recently. AI-based drug design has great promise to revolutionize pharmaceutical industries by significantly reducing the time and cost of drug discovery processes. However, a major issue remains for all AI-based learning models: efficient molecular representation. Here we propose Dowker complex (DC) based molecular interaction representations and Riemann zeta function based molecular featurization, for the first time. Molecular interactions between proteins and ligands (or others) are modeled as Dowker complexes. A multiscale representation is generated by using a filtration process, during which a series of DCs are generated at different scales. Combinatorial (Hodge) Laplacian matrices are constructed from these DCs, and the Riemann zeta functions from their spectral information can be used as molecular descriptors. To validate our models, we consider protein-ligand binding affinity prediction. Our DC-based machine learning (DCML) models, in particular, DC-based gradient boosting tree (DC-GBT), are tested on the three most commonly used datasets, i.e., PDBbind-2007, PDBbind-2013 and PDBbind-2016, and extensively compared with other existing state-of-the-art models. We find that our DC-based descriptors can achieve state-of-the-art results and perform better than all machine learning models with traditional molecular descriptors. Our Dowker complex based machine learning models can be used in other tasks in AI-based drug design and molecular data analysis.
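
To make the idea of zeta-function descriptors concrete, here is a minimal sketch under stated assumptions. It takes the spectral zeta function to be the sum of λ^(-s) over the positive eigenvalues of a Laplacian, which is one common definition and may differ in detail from the one used in the paper. The function names are hypothetical, and the input spectrum is that of the complete graph K_4 (eigenvalue 0 once and 4 three times), not a Dowker complex.

```python
def spectral_zeta(eigenvalues, s, tol=1e-9):
    """Sum of lambda**(-s) over eigenvalues above `tol`; zero modes are skipped."""
    return sum(lam ** (-s) for lam in eigenvalues if lam > tol)


def complete_graph_spectrum(n):
    """Graph Laplacian spectrum of K_n: a single 0 and the value n with multiplicity n - 1."""
    return [0.0] + [float(n)] * (n - 1)


# Toy spectrum standing in for one Laplacian from one filtration step.
spectrum = complete_graph_spectrum(4)

# Evaluating the zeta function at a few exponents gives a small feature vector.
exponents = (1, 2, 3)
features = [spectral_zeta(spectrum, s) for s in exponents]
print(features)
```

In a filtration-based pipeline, one such feature vector would be computed for each Laplacian at each scale and the vectors concatenated before being handed to a learner; the choice of exponents is a free design parameter in this sketch.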


Subject(s)
Artificial Intelligence , Machine Learning , Ligands , Protein Binding , Proteins/chemistry
20.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35189639

ABSTRACT

Protein-protein interactions (PPIs) play a significant role in nearly all cellular and biological activities. Data-driven machine learning models have demonstrated great power in PPIs. However, the design of efficient molecular featurization poses a great challenge for all learning models for PPIs. Here, we propose persistent spectral (PerSpect) based PPI representation and featurization, and PerSpect-based ensemble learning (PerSpect-EL) models for PPI binding affinity prediction, for the first time. In our model, a sequence of Hodge (or combinatorial) Laplacian (HL) matrices at different scales is generated from a specially designed filtration process. PerSpect attributes, which are statistical and combinatorial properties of spectrum information from these HL matrices, are used as features for PPI characterization. Each PerSpect attribute is input into a 1D convolutional neural network (CNN), and these CNNs are stacked together in our PerSpect-based ensemble learning models. We systematically test our model on the two most commonly used datasets, i.e. SKEMPI and AB-Bind. We find that our model can achieve state-of-the-art results and outperform all existing models to the best of our knowledge.


Subject(s)
Machine Learning , Neural Networks, Computer , Protein Binding