Results 1 - 20 of 20
1.
J Chem Inf Model ; 59(11): 4880-4892, 2019 11 25.
Article in English | MEDLINE | ID: mdl-31532656

ABSTRACT

We present a method for visualizing and navigating large screening datasets while also taking into account their activities and properties. Our approach is to annotate the data with all possible scaffolds contained within each molecule. We have developed a Spotfire visualization, coupled to a fuzzy clustering approach based on the scaffold decomposition of the screening deck, used to drive the hit triage process. Progression decisions can be made using aggregate scaffold parameters and data from multiple datasets merged at the scaffold level. This visualization reveals overlaps that help prioritize hits, highlight tractable series, and suggest ways to combine aspects of multiple hits. The structure-activity relationship of a large and complex hit is automatically mapped onto all constituent scaffolds, making it possible to navigate, via any shared scaffold, to all related hits. This scaffold "walking" helps counter the bias toward a handful of potent and ligand-efficient molecules at the expense of coverage of chemical space. We consider two scaffold generation methods and explore their similarities and differences both qualitatively and quantitatively. The workflow of a Spotfire visualization used in combination with fuzzy clustering and structure annotation provides an intuitive view of large and diverse screening datasets. This allows teams to navigate easily between structurally related molecules and enriches the population of leads considered and progressed in a manner complementary to established approaches.
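The scaffold-level annotation and "walking" described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Spotfire workflow: the molecule IDs, scaffold labels, and activity values are hypothetical placeholders, and scaffold generation itself (which would normally require a cheminformatics toolkit) is assumed to have been done already. Because a molecule can contain several scaffolds, it belongs to several groups at once, which is the sense of "fuzzy" membership assumed here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical input: molecule ID -> (set of scaffolds it contains, activity value).
# Scaffold decomposition is assumed to have been done upstream; labels are placeholders.
molecules = {
    "M1": ({"S_core", "S_big1"}, 6.0),
    "M2": ({"S_core", "S_big2"}, 7.0),
    "M3": ({"S_other"}, 5.0),
}

def scaffold_index(mols):
    """Invert molecule -> scaffolds into scaffold -> molecules (one molecule, many groups)."""
    index = defaultdict(set)
    for mol_id, (scaffolds, _activity) in mols.items():
        for s in scaffolds:
            index[s].add(mol_id)
    return index

def scaffold_summary(mols, index):
    """Aggregate per scaffold: (member count, mean activity, max activity)."""
    return {
        s: (len(members),
            mean(mols[m][1] for m in members),
            max(mols[m][1] for m in members))
        for s, members in index.items()
    }

def walk(mol_id, mols, index):
    """All other molecules reachable from mol_id through any shared scaffold."""
    neighbours = set()
    for s in mols[mol_id][0]:
        neighbours |= index[s]
    neighbours.discard(mol_id)
    return neighbours

idx = scaffold_index(molecules)
summary = scaffold_summary(molecules, idx)
print(sorted(walk("M1", molecules, idx)))  # ['M2']
print(summary["S_core"])                   # (2, 6.5, 7.0)
```

Inverting the molecule-to-scaffold map once makes each "walk" a handful of set unions, which is one simple way to keep navigation cheap on a large screening deck.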


Subject(s)
Drug Discovery , Small Molecule Libraries/chemistry , Cluster Analysis , Datasets as Topic , Drug Discovery/methods , Fuzzy Logic , Humans , Ligands , Small Molecule Libraries/pharmacology
2.
ACS Infect Dis ; 5(10): 1738-1753, 2019 10 11.
Article in English | MEDLINE | ID: mdl-31373203

ABSTRACT

Emerging resistance to current antimalarial medicines underscores the importance of identifying new drug targets and novel compounds. Malaria parasites are purine auxotrophic and import purines via the Plasmodium falciparum equilibrative nucleoside transporter type 1 (PfENT1). We previously showed that PfENT1 inhibitors block parasite proliferation in culture. Our goal was to identify additional, possibly more optimal chemical starting points for a drug discovery campaign. We performed a high throughput screen (HTS) of GlaxoSmithKline's 1.8 million compound library with a yeast-based assay to identify PfENT1 inhibitors. We used a parallel progression strategy for hit validation and expansion, with an emphasis on chemical properties in addition to potency. In one arm, the most active hits were tested for human cell toxicity; 201 had minimal toxicity. The second arm, hit expansion, used a scaffold-based substructure search with the HTS hits as templates to identify over 2000 compounds; 123 compounds had activity. Of these 324 compounds, 175 compounds inhibited proliferation of P. falciparum parasite strain 3D7 with IC50 values between 0.8 and ∼180 µM. One hundred forty-two compounds inhibited PfENT1 knockout (pfent1Δ) parasite growth, indicating they also hit secondary targets. Thirty-two hits inhibited growth of 3D7 but not pfent1Δ parasites. Thus, PfENT1 inhibition was sufficient to block parasite proliferation. Therefore, PfENT1 may be a viable target for antimalarial drug development. Six compounds with novel chemical scaffolds were extensively characterized in yeast-, parasite-, and human-erythrocyte-based assays. The inhibitors showed similar potencies against drug sensitive and resistant P. falciparum strains. They represent attractive starting points for development of novel antimalarial drugs.


Subject(s)
Antimalarials/pharmacology , Biological Transport/drug effects , Cell Proliferation/drug effects , Drug Discovery , Plasmodium falciparum/drug effects , Purines/metabolism , Antimalarials/chemistry , Erythrocytes/drug effects , Gene Knockout Techniques , Hep G2 Cells/drug effects , High-Throughput Screening Assays , Humans , Malaria/parasitology , Malaria, Falciparum/parasitology , Nucleobase, Nucleoside, Nucleotide, and Nucleic Acid Transport Proteins/drug effects , Nucleobase, Nucleoside, Nucleotide, and Nucleic Acid Transport Proteins/genetics , Plasmodium falciparum/genetics , Plasmodium falciparum/growth & development , Plasmodium falciparum/metabolism , Protozoan Proteins/drug effects , Protozoan Proteins/genetics , Transcriptome , Yeasts/drug effects
3.
J Med Chem ; 62(10): 5096-5110, 2019 05 23.
Article in English | MEDLINE | ID: mdl-31013427

ABSTRACT

RIP1 kinase regulates necroptosis and inflammation and may play an important role in contributing to a variety of human pathologies, including inflammatory and neurological diseases. Currently, RIP1 kinase inhibitors have advanced into early clinical trials for evaluation in inflammatory diseases such as psoriasis, rheumatoid arthritis, and ulcerative colitis and neurological diseases such as amyotrophic lateral sclerosis and Alzheimer's disease. In this paper, we report on the design of potent and highly selective dihydropyrazole (DHP) RIP1 kinase inhibitors starting from a high-throughput screen, and on the lead optimization of this series from a lead with minimal rat oral exposure to the identification of dihydropyrazole 77 with good pharmacokinetic profiles in multiple species. Additionally, we identified a potent murine RIP1 kinase inhibitor 76 as a valuable in vivo tool molecule suitable for evaluating the role of RIP1 kinase in chronic models of disease. DHP 76 showed efficacy in mouse models of both multiple sclerosis and human retinitis pigmentosa.


Subject(s)
Enzyme Inhibitors/chemical synthesis , Enzyme Inhibitors/pharmacology , Nuclear Pore Complex Proteins/antagonists & inhibitors , Pyrazoles/chemical synthesis , Pyrazoles/pharmacology , RNA-Binding Proteins/antagonists & inhibitors , Animals , Biological Availability , Cell Line , Chronic Disease , Drug Design , Encephalomyelitis, Autoimmune, Experimental/drug therapy , Enzyme Inhibitors/pharmacokinetics , Haplorhini , High-Throughput Screening Assays , Humans , Mice , Mice, Inbred C57BL , Models, Molecular , Multiple Sclerosis/drug therapy , Pyrazoles/pharmacokinetics , Rats , Retinitis Pigmentosa/drug therapy , Structure-Activity Relationship
4.
Eur J Pharmacol ; 818: 306-327, 2018 Jan 05.
Article in English | MEDLINE | ID: mdl-29050968

ABSTRACT

Despite the importance of the hERG channel in drug discovery and the sizable number of antagonist molecules discovered, only a few hERG agonists have been identified. Here we report a novel hERG agonist, SKF-32802, and a structural analog of the agonist NS3623, SB-335573. These were discovered through a similarity search of published hERG agonists. SKF-32802 incorporates an amide linker rather than NS3623's urea, resulting in a compound with a different mechanism of action. We find that both compounds decrease the time constant of open channel kinetics, increase the amplitude of the envelope of tails assay, mildly increase the amplitude of the IV curve, bind the hERG channel in either open or closed states, increase the plateau of the voltage dependence of activation, and modulate the effects of the hERG antagonist quinidine. Neither compound affects inactivation or deactivation kinetics, a property unique among hERG agonists. Additionally, SKF-32802 induces a leftward shift in the voltage dependence of activation. Our structural models show that both compounds make strong bridging interactions with multiple channel subunits and are stabilized by internal hydrogen bonding similar to NS3623, PD-307243 and RPR26024. While SB-335573 binds in a nearly identical fashion to NS3623, SKF-32802 makes an additional hydrogen bond with neighboring threonine 623. In summary, SB-335573 is a type 4 agonist, which increases open channel probability, while SKF-32802 is a type 3 agonist, which induces a leftward shift in the voltage dependence of activation.


Subject(s)
Aniline Compounds/chemistry , Aniline Compounds/pharmacology , Drug Discovery , Electrophysiological Phenomena/drug effects , Ether-A-Go-Go Potassium Channels/agonists , Tetrazoles/chemistry , Tetrazoles/pharmacology , Aniline Compounds/metabolism , Animals , CHO Cells , Cricetinae , Cricetulus , Dose-Response Relationship, Drug , Ether-A-Go-Go Potassium Channels/chemistry , Ether-A-Go-Go Potassium Channels/metabolism , Humans , Ion Channel Gating/drug effects , Kinetics , Molecular Docking Simulation , Protein Conformation , Tetrazoles/metabolism
5.
J Med Chem ; 60(4): 1247-1261, 2017 02 23.
Article in English | MEDLINE | ID: mdl-28151659

ABSTRACT

RIP1 regulates necroptosis and inflammation and may play an important role in contributing to a variety of human pathologies, including immune-mediated inflammatory diseases. Small-molecule inhibitors of RIP1 kinase that are suitable for advancement into the clinic have yet to be described. Herein, we report our lead optimization of a benzoxazepinone hit from a DNA-encoded library and the discovery and profile of clinical candidate GSK2982772 (compound 5), currently in phase 2a clinical studies for psoriasis, rheumatoid arthritis, and ulcerative colitis. Compound 5 potently binds to RIP1 with exquisite kinase specificity and has excellent activity in blocking many TNF-dependent cellular responses. Highlighting its potential as a novel anti-inflammatory agent, the inhibitor was also able to reduce spontaneous production of cytokines from human ulcerative colitis explants. The highly favorable physicochemical and ADMET properties of 5, combined with high potency, led to a predicted low oral dose in humans.


Subject(s)
Anti-Inflammatory Agents/chemistry , Anti-Inflammatory Agents/pharmacology , Colitis, Ulcerative/drug therapy , Inflammation/drug therapy , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Receptor-Interacting Protein Serine-Threonine Kinases/antagonists & inhibitors , Animals , Benzazepines/chemistry , Benzazepines/pharmacology , Colitis, Ulcerative/immunology , Cytokines/immunology , Dogs , Haplorhini , Humans , Inflammation/immunology , Mice , Molecular Docking Simulation , Rabbits , Rats , Receptor-Interacting Protein Serine-Threonine Kinases/immunology , Swine , Swine, Miniature , Tumor Necrosis Factor-alpha/immunology
6.
J Med Chem ; 59(5): 2163-78, 2016 Mar 10.
Article in English | MEDLINE | ID: mdl-26854747

ABSTRACT

The recent discovery of the role of receptor interacting protein 1 (RIP1) kinase in tumor necrosis factor (TNF)-mediated inflammation has led to its emergence as a highly promising target for the treatment of multiple inflammatory diseases. We screened RIP1 against GSK's DNA-encoded small-molecule libraries and identified a novel highly potent benzoxazepinone inhibitor series. We demonstrate that this template possesses complete monokinase selectivity for RIP1 plus unique species selectivity for primate versus nonprimate RIP1. We elucidate the conformation of RIP1 bound to this benzoxazepinone inhibitor driving its high kinase selectivity and design specific mutations in murine RIP1 to restore potency to levels similar to primate RIP1. This series differentiates itself from known RIP1 inhibitors in combining high potency and kinase selectivity with good pharmacokinetic profiles in rodents. The favorable developability profile of this benzoxazepinone template, as exemplified by compound 14 (GSK'481), makes it an excellent starting point for further optimization into a RIP1 clinical candidate.


Subject(s)
DNA/chemistry , Isoxazoles/pharmacology , Oxazepines/pharmacology , Protein Kinase Inhibitors/pharmacology , Receptor-Interacting Protein Serine-Threonine Kinases/antagonists & inhibitors , Small Molecule Libraries/pharmacology , Animals , Cell Line, Tumor , Crystallography, X-Ray , Dose-Response Relationship, Drug , HT29 Cells , Humans , Isoxazoles/chemical synthesis , Isoxazoles/chemistry , Mice , Models, Molecular , Molecular Structure , Oxazepines/chemical synthesis , Oxazepines/chemistry , Protein Kinase Inhibitors/chemical synthesis , Protein Kinase Inhibitors/chemistry , Receptor-Interacting Protein Serine-Threonine Kinases/metabolism , Small Molecule Libraries/chemical synthesis , Small Molecule Libraries/chemistry , Structure-Activity Relationship , U937 Cells
7.
J Biomol Screen ; 19(5): 749-57, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24518065

ABSTRACT

In this article, we describe two complementary data-mining approaches used to characterize the GlaxoSmithKline (GSK) natural-products set (NPS) based on information from the high-throughput screening (HTS) databases. Both methods rely on the aggregation and analysis of a large set of single-shot screening data for a number of biological assays, with the goal of revealing natural-product chemical motifs. One of them is an established method based on the data-driven clustering of compounds using a wide range of descriptors,(1) whereas the other partitions and hierarchically clusters the data to identify chemical cores.(2,3) Both methods successfully find structural scaffolds that hit different groups of discrete drug targets significantly more often than expected from their overall frequency of inhibitory activity across a large number of screens. We describe how these methods can be applied to unveil hidden information in large single-shot HTS data sets. Applied prospectively, this type of information could contribute to the design of new chemical templates for drug-target classes and guide synthetic efforts for lead optimization of tractable hits that are based on natural-product chemical motifs. Relevant findings for 7TM receptors (7TMRs), ion channels, class-7 transferases (protein kinases), hydrolases, and oxidoreductases are discussed.


Subject(s)
Biological Products/pharmacology , Data Mining/methods , Algorithms , Amino Acid Motifs , Chemistry, Pharmaceutical/methods , Cluster Analysis , Drug Design , High-Throughput Screening Assays/methods , Humans , Hydrolases/chemistry , Inhibitory Concentration 50 , Models, Molecular , Models, Statistical , Oxidoreductases/chemistry , Small Molecule Libraries/pharmacology , Structure-Activity Relationship
8.
ACS Med Chem Lett ; 4(12): 1238-43, 2013 Dec 12.
Article in English | MEDLINE | ID: mdl-24900635

ABSTRACT

Potent inhibitors of RIP1 kinase from three distinct series, 1-aminoisoquinolines, pyrrolo[2,3-b]pyridines, and furo[2,3-d]pyrimidines, all of the type II class recognizing a DLG-out inactive conformation, were identified by screening our in-house kinase-focused sets. An exemplar from the furo[2,3-d]pyrimidine series showed a dose-proportional response in protection from hypothermia in a mouse model of TNFα-induced lethal shock.

9.
J Biomol Screen ; 17(5): 555-71, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22392809

ABSTRACT

Epigenetic gene regulation is a critical process controlling differentiation and development, the malfunction of which may underpin a variety of diseases. In this article, we review the current landscape of small-molecule epigenetic modulators including drugs on the market, key compounds in clinical trials, and chemical probes being used in epigenetic mechanistic studies. Hit identification strategies for the discovery of small-molecule epigenetic modulators are summarized with respect to writers, erasers, and readers of histone marks. Perspectives are provided on opportunities for new hit discovery approaches, some of which may define the next generation of therapeutic intervention strategies for epigenetic processes.


Subject(s)
Drug Discovery , Epigenesis, Genetic , High-Throughput Screening Assays , Drug Discovery/methods , Epigenesis, Genetic/drug effects , Epigenomics/methods , Gene Expression Regulation/drug effects , Histones/metabolism , Humans , Protein Binding/drug effects , Small Molecule Libraries/pharmacology
10.
IEEE Trans Inf Technol Biomed ; 14(5): 1137-43, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20570776

ABSTRACT

We describe a new approach for inferring the functional relationships between nonhomologous protein families by looking at statistical enrichment of alternative function predictions in classification hierarchies such as Gene Ontology (GO) and Structural Classification of Proteins (SCOP). Protein structures are represented as robust graphs, and the fast frequent subgraph mining algorithm is applied to protein families to generate sets of family-specific packing motifs, i.e., amino acid residue-packing patterns shared by most family members but infrequent in other proteins. The function of a protein is inferred by identifying in it motifs characteristic of a known family. We employ these family-specific motifs to elucidate functional relationships between families in the GO and SCOP hierarchies. Specifically, we postulate that two families are functionally related if one family is statistically enriched by motifs characteristic of another family, i.e., if the number of proteins in a family containing a motif from another family is greater than expected by chance. This function-inference method can help annotate proteins of unknown function, establish functional neighbors of existing families, and help specify alternative functions for known proteins.
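The "greater than expected by chance" criterion in this abstract can be made concrete with a one-sided hypergeometric test. The abstract does not name the statistic, so the test, the function names, and the `alpha` threshold below are assumptions for illustration: N is the number of background proteins, K how many of them contain the motif, n the size of the family being tested, and k how many family members contain the motif.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    Exact integer arithmetic via math.comb; comb returns 0 when the
    lower index exceeds the upper, so impossible terms vanish.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def is_enriched(k, N, K, n, alpha=0.01):
    """Hypothetical decision rule: the family is enriched for the motif if p < alpha."""
    return hypergeom_sf(k, N, K, n) < alpha

# Illustrative numbers only: 1000 background proteins, 50 carry the motif,
# a family of 20 proteins of which 8 carry it (about 1 expected by chance).
print(hypergeom_sf(8, 1000, 50, 20))
print(is_enriched(8, 1000, 50, 20))  # True
```

In a real analysis many family pairs would be tested at once, so some multiple-testing correction on `alpha` would likely be needed; that is left out of this sketch.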


Subject(s)
Algorithms , Computational Biology/methods , Data Mining/methods , Protein Interaction Domains and Motifs , Proteins/chemistry , Genomics/methods , Models, Molecular , NADP/chemistry , Nuclear Proteins/chemistry , Phosphoprotein Phosphatases/chemistry , Protein Conformation , Proteins/classification
11.
Mol Inform ; 29(11): 758-70, 2010 Nov 15.
Article in English | MEDLINE | ID: mdl-27464266

ABSTRACT

Since its inception in 2002, the stochastic proximity embedding (SPE) algorithm and its variants have been applied to a wide range of problems in computational chemistry and biology with notable success. At its core, SPE attempts to generate Euclidean coordinates for a set of points so that they satisfy a prescribed set of geometric constraints. The algorithm's appeal rests on three factors: 1) its conceptual and programmatic simplicity; 2) its superior speed and scaling properties; and 3) its broad applicability. Here, we review some of the key applications, outline known limitations and ways to circumvent them, and highlight additional problem domains where the use of this technique could lead to significant breakthroughs.
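The core idea of SPE, generating coordinates that satisfy a set of distance constraints, can be sketched with the basic pairwise update: repeatedly pick a random constrained pair and move both points along their connecting line by a fraction of the error, with a learning rate that decreases over cycles. This is a minimal sketch of the basic scheme only; the parameter names and default values (`cycles`, `steps`, `lam0`, `lam_min`) are assumptions, and the variants reviewed in the paper (lower and upper bounds, pharmacophore terms, and so on) are not shown.

```python
import math
import random

def spe(targets, n_points, dim=2, cycles=100, steps=1000,
        lam0=1.0, lam_min=0.01, eps=1e-10, seed=0):
    """Basic stochastic proximity embedding sketch.

    targets: dict mapping a pair (i, j) to its target distance r_ij.
    Returns a list of n_points coordinate lists of length dim.
    """
    rng = random.Random(seed)
    x = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    pairs = list(targets)
    lam = lam0
    dlam = (lam0 - lam_min) / max(cycles - 1, 1)  # linear learning-rate decay
    for _ in range(cycles):
        for _ in range(steps):
            i, j = rng.choice(pairs)
            r = targets[(i, j)]
            d = math.dist(x[i], x[j])
            # Split the correction equally between the two points, scaled by lam.
            f = lam * 0.5 * (r - d) / (d + eps)
            for k in range(dim):
                delta = f * (x[i][k] - x[j][k])
                x[i][k] += delta
                x[j][k] -= delta
        lam -= dlam
    return x

# Example: embed a 3-4-5 right triangle in 2D from its pairwise distances alone.
tri = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}
pts = spe(tri, n_points=3)
for (i, j), r in tri.items():
    print((i, j), r, round(math.dist(pts[i], pts[j]), 3))
```

Each step touches only one pair, so the cost per step is independent of the number of points; this is the property behind the speed and scaling claims in the abstract.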

12.
J Comput Aided Mol Des ; 23(11): 785-97, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19548090

ABSTRACT

This paper describes several case studies concerning protein function inference from structure using our novel approach described in the accompanying paper. This approach employs family-specific motifs, i.e., three-dimensional amino acid packing patterns that are statistically prevalent within a protein family. For our case studies we have selected families from the SCOP and EC classifications and analyzed the discriminating power of the motifs in depth. We have devised several benchmarks to compare motifs mined from unweighted topological graph representations of protein structures with those from distance-labeled (weighted) representations, demonstrating the superiority of the latter for function inference in most families. We have tested the robustness of our motif library by inferring the function of new members added to SCOP families, and by discriminating between several families that are structurally similar but functionally divergent. Furthermore, we have applied our method to predict function for several proteins characterized in structural genomics projects, including orphan structures, and we discuss several selected predictions in depth. Some of our predictions have been corroborated by other computational methods, and some have been confirmed by independent experimental studies, supporting our approach for protein function inference from structure.


Subject(s)
Models, Molecular , Proteins/chemistry , Proteins/metabolism , Algorithms , Amino Acid Motifs , Catalytic Domain , Computational Biology , Databases, Protein , Proteins/classification , Sensitivity and Specificity
13.
J Comput Aided Mol Des ; 23(11): 773-84, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19543979

ABSTRACT

Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (with the residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well-characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullmann's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large nonredundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active-site templates fail to provide accurate annotation.
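The graph representation in this abstract (residue-labeled vertices joined by proximity edges) and the idea of querying a structure for a labeled motif can be illustrated with a toy example. This sketch is an assumption-laden simplification: the 8.5 Å cutoff, the use of one coordinate per residue, and the function names are hypothetical, and the query is a brute-force search for labeled triangles, not Ullmann's algorithm or the paper's graph index.

```python
import math
from itertools import combinations

def proximity_graph(residues, cutoff=8.5):
    """residues: list of (residue_name, (x, y, z)), one point per residue.

    Returns adjacency sets: i and j are linked if their distance <= cutoff
    (the cutoff value is an assumption for illustration).
    """
    adj = {i: set() for i in range(len(residues))}
    for i, j in combinations(range(len(residues)), 2):
        if math.dist(residues[i][1], residues[j][1]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def find_triangle_motif(residues, adj, labels):
    """Brute-force search for 3-cliques whose residue names match `labels` as a multiset."""
    target = sorted(labels)
    hits = []
    for i, j, k in combinations(range(len(residues)), 3):
        if j in adj[i] and k in adj[i] and k in adj[j]:
            if sorted(residues[n][0] for n in (i, j, k)) == target:
                hits.append((i, j, k))
    return hits

# Toy coordinates (not from any real structure): three residues close together, one far away.
res = [("SER", (0.0, 0.0, 0.0)), ("HIS", (5.0, 0.0, 0.0)),
       ("ASP", (0.0, 5.0, 0.0)), ("GLY", (20.0, 0.0, 0.0))]
g = proximity_graph(res)
print(find_triangle_motif(res, g, ["HIS", "ASP", "SER"]))  # [(0, 1, 2)]
```

Exhaustive enumeration scales poorly with motif size and protein length, which is why the abstract relies on an indexed subgraph isomorphism search for the query step.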


Subject(s)
Models, Molecular , Models, Statistical , Proteins/chemistry , Proteins/metabolism , Algorithms , Amino Acid Motifs , Computational Biology , Databases, Protein , Proteins/classification , Sensitivity and Specificity
14.
J Comput Chem ; 29(6): 965-82, 2008 Apr 30.
Article in English | MEDLINE | ID: mdl-17999384

ABSTRACT

We present a method for simultaneous three-dimensional (3D) structure generation and pharmacophore-based alignment using a self-organizing algorithm called Stochastic Proximity Embedding (SPE). Current flexible molecular alignment methods either start from a single low-energy structure for each molecule and tweak bonds or torsion angles, or choose from multiple conformations of each molecule. Methods that generate structures and align them iteratively (e.g., genetic algorithms) are often slow. In earlier work, we used SPE to generate good-quality 3D conformations by iteratively adjusting pairwise distances between atoms based on a set of geometric rules, and showed that it samples conformational space better and runs faster than earlier programs. In this work, we run SPE on the entire ensemble of molecules to be aligned. Additional information about which atoms or groups of atoms in each molecule correspond to points in the pharmacophore can come from an automatically generated hypothesis or be specified manually. We add distance terms to SPE to bring pharmacophore points from different molecules closer in space, and also to line up normal/direction vectors associated with these points. We also permit pharmacophore points to be constrained to lie near external coordinates from a binding site. The aligned 3D molecular structures are nearly correct if the pharmacophore hypothesis is chemically feasible; postprocessing by minimization of suitable distance and energy functions further improves the structures and weeds out infeasible hypotheses. The method can be used to test 3D pharmacophores for a diverse set of active ligands, starting from only a hypothesis about corresponding atoms or groups.


Subject(s)
Algorithms , Drug Design , Pharmaceutical Preparations/chemistry , Binding Sites , Ligands , Models, Molecular , Molecular Conformation , Pharmacology , Stochastic Processes , Structure-Activity Relationship
15.
Chem Biol Drug Des ; 70(2): 123-33, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17683373

ABSTRACT

Conformational sampling is a problem of central importance in computer-aided drug design. A good conformational search method must not exhibit any intrinsic bias, and must provide confidence that important regions of conformational space are not missed during the search. A recent study by Carta et al. showed that this is not always the case, and that several popular conformational search methods, such as Omega, are very sensitive to the relative ordering of atoms and bonds in the connection table. Here, we examine the performance of a newer method known as stochastic proximity embedding, or SPE, using five diverse bioactive ligands extracted from the PDB. Our results confirm that the conformational ensembles produced by SPE using different permuted inputs are statistically indistinguishable, and well within the range of variability that would be expected from the stochastic nature of the method itself. This, along with the results of a more comprehensive comparative study (Agrafiotis et al., J. Chem. Inf. Model., 2007, in press), provides further evidence that SPE is one of the most robust and competitive conformational search methods described to date.


Subject(s)
Drug Design , Molecular Conformation , Stochastic Processes , Combinatorial Chemistry Techniques , Pharmaceutical Preparations
16.
J Chem Inf Model ; 47(4): 1279-93, 2007.
Article in English | MEDLINE | ID: mdl-17511441

ABSTRACT

Chemoinformatics is a large scientific discipline that deals with the storage, organization, management, retrieval, analysis, dissemination, visualization, and use of chemical information. Chemoinformatics techniques are used extensively in drug discovery and development. Although many consider it a mature field, the advent of high-throughput experimental techniques and the need to analyze very large data sets have brought new life and challenges to it. Here, we review a selection of papers published in 2006 that caught our attention with regard to the novelty of the methodology that was presented. The field is seeing significant growth, which will be further catalyzed by the widespread availability of public databases to support the development and validation of new approaches.


Subject(s)
Informatics , Combinatorial Chemistry Techniques , Drug Industry , Genomics , Quantitative Structure-Activity Relationship
17.
J Chem Inf Model ; 47(1): 69-75, 2007.
Article in English | MEDLINE | ID: mdl-17238250

ABSTRACT

A new radial space-filling method for visualizing cluster hierarchies is presented. The method, referred to as a radial clustergram, arranges the clusters into a series of layers, each representing a different level of the tree. It uses adjacency of nodes instead of links to represent parent-child relationships and allocates sufficient screen real estate to each node to allow effective visualization of cluster properties through color-coding. Radial clustergrams combine the most appealing features of other cluster visualization techniques but avoid their pitfalls. Compared to classical dendrograms and hyperbolic trees, they make much more efficient use of space; compared to treemaps, they are more effective in conveying hierarchical structure and displaying properties of nodes higher in the tree. A fisheye lens is used to focus on areas of interest, without losing sight of the global context. The utility of the method is demonstrated using examples from the fields of molecular diversity and conformational analysis.
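The space-allocation idea behind a radial clustergram (one ring per tree level, each node drawn as an arc adjacent to its parent) can be sketched as a recursive angular layout. Giving each node an angular span proportional to its number of leaves is an assumption made here for illustration; the paper may allocate space differently, and the color-coding and fisheye lens are not shown. Node names in the example are hypothetical.

```python
def leaf_count(tree, node):
    """Number of leaves below `node`; tree maps node -> list of children."""
    children = tree.get(node, [])
    return 1 if not children else sum(leaf_count(tree, c) for c in children)

def radial_layout(tree, root, start=0.0, end=360.0, depth=0, out=None):
    """Assign each node (ring index, start angle, end angle) in degrees.

    Children subdivide their parent's arc in proportion to their leaf counts,
    so parent-child relationships are shown by adjacency rather than links.
    """
    if out is None:
        out = {}
    out[root] = (depth, start, end)
    children = tree.get(root, [])
    total = sum(leaf_count(tree, c) for c in children)
    angle = start
    for c in children:
        span = (end - start) * leaf_count(tree, c) / total
        radial_layout(tree, c, angle, angle + span, depth + 1, out)
        angle += span
    return out

tree = {"root": ["A", "B"], "A": ["a1", "a2", "a3"], "B": ["b1"]}
for node, arc in radial_layout(tree, "root").items():
    print(node, arc)
```

Because every node receives a full arc on its ring, nodes high in the tree get enough screen area to carry a property color, which is the advantage over dendrograms claimed in the abstract.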


Subject(s)
Computer Graphics , Molecular Conformation , Classification/methods , Cluster Analysis
18.
Protein Sci ; 15(6): 1537-43, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16731985

ABSTRACT

We describe a method to assign a protein structure to a functional family using family-specific fingerprints. Fingerprints represent amino acid packing patterns that occur in most members of a family but are rare in the background, a nonredundant subset of PDB; the information they carry is complementary to sequence alignments, sequence patterns, structural superposition, and active-site templates. Fingerprints were derived for 120 families in SCOP using Frequent Subgraph Mining. For a new structure, all occurrences of these family-specific fingerprints may be found by a fast algorithm for subgraph isomorphism; the structure can then be assigned to a family with a confidence value derived from the number of fingerprints found and their distribution in background proteins. In validation experiments, we infer the function of new members added to SCOP families and we discriminate between structurally similar but functionally divergent TIM barrel families. We then apply our method to predict function for several structural genomics proteins, including orphan structures. Some predictions have been corroborated by other computational methods and some validated by subsequent functional characterization.


Subject(s)
Proteins/chemistry , Proteins/metabolism , Structure-Activity Relationship , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/metabolism , Models, Molecular , Protein Conformation , Proteins/genetics , Reproducibility of Results , Software
19.
Article in English | MEDLINE | ID: mdl-17369641

ABSTRACT

Structure motifs are amino acid packing patterns that occur frequently within a set of protein structures. We define a labeled graph representation of protein structure in which vertices correspond to amino acid residues and edges connect pairs of residues and are labeled by (1) the Euclidean distance between the C(alpha) atoms of the two residues and (2) a boolean indicating whether the two residues are in physical/chemical contact. Using this representation, a structure motif corresponds to a labeled clique that occurs frequently among the graphs representing the protein structures. The pairwise distance constraints on each edge in a clique serve to limit the variation in geometry among different occurrences of a structure motif. We present an efficient constrained subgraph mining algorithm to discover structure motifs in this setting. Compared with contact graph representations, the number of spurious structure motifs is greatly reduced. Using this algorithm, structure motifs were located for several SCOP families including the Eukaryotic Serine Proteases, Nuclear Binding Domains, Papain-like Cysteine Proteases, and FAD/NAD-linked Reductases. For each family, we typically obtain a handful of motifs within seconds of processing time. The occurrences of these motifs throughout the PDB were strongly associated with the original SCOP family, as measured using a hypergeometric distribution. The motifs were found to cover functionally important sites like the catalytic triad for Serine Proteases and co-factor binding sites for Nuclear Binding Domains. The fact that many motifs are highly family-specific can be used to classify new proteins or to provide functional annotation in Structural Genomics Projects.


Subject(s)
Computational Biology/methods , Proteins/chemistry , Proteomics/methods , Algorithms , Amino Acid Motifs , Animals , Cysteine Endopeptidases/chemistry , Models, Molecular , Models, Statistical , Multigene Family , Oxidoreductases/chemistry , Protein Binding , Protein Structure, Tertiary
20.
J Comput Biol ; 12(6): 657-71, 2005.
Article in English | MEDLINE | ID: mdl-16108709

ABSTRACT

We find recurring amino-acid residue packing patterns, or spatial motifs, that are characteristic of protein structural families, by applying a novel frequent subgraph mining algorithm to graph representations of protein three-dimensional structure. Graph nodes represent amino acids, and edges are chosen in one of three ways: first, using a threshold for contact distance between residues; second, using Delaunay tessellation; and third, using the recently developed almost-Delaunay edges. For a set of graphs representing a protein family from the Structural Classification of Proteins (SCOP) database, subgraph mining typically identifies several hundred common subgraphs corresponding to spatial motifs that are frequently found in proteins in the family but rarely found outside of it. We find that some of the large motifs map onto known functional regions in two protein families explored in this study, i.e., serine proteases and kinases. We find that graphs based on almost-Delaunay edges significantly reduce the number of edges in the graph representation and hence offer a computational advantage, yet the patterns extracted from such graphs have a biological interpretation approximately equivalent to that of patterns extracted from distance-based graphs.


Subject(s)
Algorithms , Computational Biology , Computer Graphics , Proteins/chemistry , Proteins/classification , Structural Homology, Protein , Amino Acid Motifs , Databases, Protein , Models, Molecular , Models, Statistical , Molecular Structure