ALOHA: a novel probability fusion approach for scoring multi-parameter drug-likeness during the lead optimization stage of drug discovery.

Debe, Derek A; Mamidipaka, Ravindra B; Gregg, Robert J; Metz, James T; Gupta, Rishi R; Muchmore, Steven W.

J Comput Aided Mol Des ; 27(9): 771-82, 2013 Sep.

Article in English | MEDLINE | ID: mdl-24113765

ABSTRACT

Automated Lead Optimization Helper Application (ALOHA) is a novel fitness scoring approach for small molecule lead optimization. ALOHA employs a series of generalized Bayesian models trained on public and proprietary pharmacokinetic; absorption, distribution, metabolism, and excretion (ADME); and toxicology data to determine regions of chemical space that are likely to have excellent drug-like properties. The input to ALOHA is a list of molecules, and the output is a set of individual probabilities as well as an overall probability that each of the molecules will pass a panel of user-selected assays. In addition to providing a summary of how and when to apply ALOHA, this paper will discuss the validation of ALOHA's Bayesian models and probability fusion approach. Most notably, ALOHA is demonstrated to discriminate between members of the same chemical series with strong statistical significance, suggesting that ALOHA can be used effectively to select compound candidates for synthesis and progression at the lead optimization stage of drug discovery.

Subject(s)

Algorithms , Drug Design , Drug Discovery , Pharmaceutical Preparations/analysis , Software , Bayes Theorem , Blood Proteins/analysis , Cell Survival/drug effects , Drug Evaluation, Preclinical , Hep G2 Cells , Humans , Mutagenicity Tests , Prospective Studies
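
The ALOHA abstract above describes per-assay probabilities and an overall probability of passing a user-selected assay panel, but it does not state the fusion formula. The sketch below is therefore only an illustration under a strong assumption: the assays are treated as independent, so the overall probability is the product of the individual ones. The function name, assay names, and probability values are all hypothetical and are not taken from the paper.

```python
from math import prod


def overall_pass_probability(assay_probs):
    """Fuse per-assay pass probabilities into one overall probability.

    ASSUMPTION: assays are independent, so the joint probability is the
    product. This is an illustrative stand-in, not ALOHA's actual fusion.
    """
    for name, p in assay_probs.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {name!r} out of range: {p}")
    return prod(assay_probs.values())


# Hypothetical per-assay probabilities for two members of one series.
molecules = {
    "mol_A": {"solubility": 0.9, "hERG": 0.8, "microsomal_stability": 0.5},
    "mol_B": {"solubility": 0.7, "hERG": 0.9, "microsomal_stability": 0.9},
}

# Rank candidates by their fused probability, best first.
ranked = sorted(
    molecules,
    key=lambda m: overall_pass_probability(molecules[m]),
    reverse=True,
)
```

Under this assumption a single weak assay dominates the overall score, which is why a molecule with one poor prediction can rank below one with uniformly moderate predictions.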

A different look at the quality of modeled three-dimensional protein structures.

Poleksic, Aleksandar; Fienup, Mark; Danzer, Joseph F; Debe, Derek A.

J Bioinform Comput Biol ; 6(2): 335-45, 2008 Apr.

Article in English | MEDLINE | ID: mdl-18464326

ABSTRACT

Measuring the accuracy of protein three-dimensional structures is one of the most important problems in protein structure prediction. For structure-based drug design, the accuracy of the binding site is far more important than the accuracy of any other region of the protein. We have developed an automated method for assessing the quality of a protein model by focusing on the set of residues in the small molecule binding site. Small molecule binding sites typically involve multiple regions of the protein coming together in space, and their accuracy has been observed to be sensitive to even small alignment errors. In addition, ligand binding sites contain the critical information required for drug design, making their accuracy particularly important. We analyzed the accuracy of the binding sites on two sets of protein models: the predictions submitted by the top-performing CASP7 groups, and the models generated by four widely used homology modeling packages. The results of our CASP7 analysis significantly differ from the previous findings, implying that the binding site measure does not correlate with the traditional model quality measures used in the structure prediction benchmarks. For the modeling programs, the resolution of binding sites is extremely sensitive to the degree of sequence homology between the query and the template, even when the most accurate alignments are used in the homology modeling process.

Subject(s)

Protein Conformation , Proteins/chemistry , Animals , Computational Biology , Databases, Protein , Humans , Models, Molecular

Application of belief theory to similarity data fusion for use in analog searching and lead hopping.

Muchmore, Steven W; Debe, Derek A; Metz, James T; Brown, Scott P; Martin, Yvonne C; Hajduk, Philip J.

J Chem Inf Model ; 48(5): 941-8, 2008 May.

Article in English | MEDLINE | ID: mdl-18416545

ABSTRACT

A wide variety of computational algorithms have been developed that strive to capture the chemical similarity between two compounds for use in virtual screening and lead discovery. One limitation of such approaches is that, while a returned similarity value reflects the perceived degree of relatedness between any two compounds, there is no direct correlation between this value and the expectation or confidence that any two molecules will in fact be equally active. A lack of a common framework for interpretation of similarity measures also confounds the reliable fusion of information from different algorithms. Here, we present a probabilistic framework for interpreting similarity measures that directly correlates the similarity value to a quantitative expectation that two molecules will in fact be equipotent. The approach is based on extensive benchmarking of 10 different similarity methods (MACCS keys, Daylight fingerprints, maximum common subgraphs, rapid overlay of chemical structures (ROCS) shape similarity, and six connectivity-based fingerprints) against a database of more than 150,000 compounds with activity data against 23 protein targets. Given this unified and probabilistic framework for interpreting chemical similarity, principles derived from decision theory can then be applied to combine the evidence from different similarity measures in such a way that both capitalizes on the strengths of the individual approaches and maintains a quantitative estimate of the likelihood that any two molecules will exhibit similar biological activity.

Subject(s)

Algorithms , Drug Evaluation, Preclinical/methods , Pharmaceutical Preparations/chemistry , Probability
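
Once each similarity method's score has been mapped to a probability that two molecules are equipotent, as the abstract above describes, evidence from several methods can be combined. The abstract does not give the combination rule, so the sketch below uses a textbook special case from Dempster-Shafer belief theory as a stand-in: several simple support functions that all back the same hypothesis. The function name and input values are hypothetical.

```python
def combine_simple_support(beliefs):
    """Combine independent beliefs that all support the same hypothesis.

    For simple support functions focused on one hypothesis, Dempster's
    rule reduces to 1 - prod(1 - b_i). ASSUMPTION: the sources are
    independent; this is not claimed to be the paper's exact rule.
    """
    remaining_doubt = 1.0
    for b in beliefs:
        if not 0.0 <= b <= 1.0:
            raise ValueError(f"belief out of range: {b}")
        remaining_doubt *= 1.0 - b
    return 1.0 - remaining_doubt


# Hypothetical beliefs from two similarity methods for one compound pair.
fused = combine_simple_support([0.6, 0.5])
```

The fused value is never lower than the strongest single belief, which matches the abstract's goal of capitalizing on the strengths of the individual methods while keeping a quantitative estimate.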

Interrogating the druggable genome with structural informatics.

Hambly, Kevin; Danzer, Joseph; Muskal, Steven; Debe, Derek A.

Mol Divers ; 10(3): 273-81, 2006 Aug.

Article in English | MEDLINE | ID: mdl-17031532

ABSTRACT

Structural genomics projects are producing protein structure data at an unprecedented rate. In this paper, we present the Target Informatics Platform (TIP), a novel structural informatics approach for amplifying the rapidly expanding body of experimental protein structure information to enhance the discovery and optimization of small molecule protein modulators on a genomic scale. In TIP, existing experimental structure information is augmented using a homology modeling approach, and binding sites across multiple target families are compared using a clique detection algorithm. We report here a detailed analysis of the structural coverage for the set of druggable human targets, highlighting drug target families where the level of structural knowledge is currently quite high, as well as those areas where structural knowledge is sparse. Furthermore, we demonstrate the utility of TIP's intra- and inter-family binding site similarity analysis using a series of retrospective case studies. Our analysis underscores the utility of a structural informatics infrastructure for extracting drug discovery-relevant information from structural data, aiding researchers in the identification of lead discovery and optimization opportunities as well as potential "off-target" liabilities.

Subject(s)

Drug Design , Enzyme Inhibitors/pharmacology , Genome, Human , Informatics , Proteins/chemistry , Databases, Factual , Enzyme Inhibitors/chemistry , Genomics/methods , Humans , Models, Molecular , Molecular Structure , Proteins/genetics , Proteins/metabolism , Structure-Activity Relationship

SPINFAST: using structure alignment profiles to enhance the accuracy and assess the reliability of protein side-chain modeling.

Poleksic, Aleksandar; Danzer, Joseph F; Palmer, Brian A; Olafson, Barry D; Debe, Derek A.

Proteins ; 65(4): 953-8, 2006 Dec 01.

Article in English | MEDLINE | ID: mdl-17006949

ABSTRACT

We present a novel, knowledge-based method for the side-chain addition step in protein structure modeling. The foundation of the method is a conditional probability equation, which specifies the probability that a side-chain will occupy a specific rotamer state, given a set of evidence about the rotamer states adopted by the side-chains at aligned positions in structurally homologous crystal structures. We demonstrate that our method increases the accuracy of homology model side-chain addition when compared with the widely employed practice of preserving the side-chain conformation from the homology template to the target at conserved residue positions. Furthermore, we demonstrate that our method accurately estimates the probability that the correct rotamer state has been selected. This interesting result implies that our method can be used to understand the reliability of each and every side-chain in a protein homology model.

Subject(s)

Models, Molecular , Proteins/chemistry , Sequence Alignment/methods , Structural Homology, Protein , Amino Acid Sequence , Computer Simulation , Databases, Protein , Protein Conformation , Sequence Homology, Amino Acid

StructSorter: a method for continuously updating a comprehensive protein structure alignment database.

Palmer, Brian; Danzer, Joseph F; Hambly, Kevin; Debe, Derek A.

J Chem Inf Model ; 46(4): 1871-6, 2006.

Article in English | MEDLINE | ID: mdl-16859318

ABSTRACT

Advances in protein crystallography and homology modeling techniques are producing vast amounts of high resolution protein structure data at ever increasing rates. As such, the ability to quickly and easily extract structural similarities is a key tool in discovering important functional relationships. We report on an approach for creating and maintaining a database of pairwise structure alignments for a comprehensive database comprising the PDB and homology models for the human and select pathogen genomes. Our approach consists of a novel, multistage method for determining pairwise structural similarity coupled with an efficient clustering protocol that approximates a full NxN assessment in a fraction of the time. Since biologists are commonly interested in recently released structures, and the homology models built from them, an automatically updating database of structural alignments has great value. Our approach yields a querying system that allows scientists to retrieve databank-wide protein structure similarities as easily as retrieving protein sequence similarities via BLAST or PSI-BLAST. Basic, noncommercial access to the database can be requested at https://tip.eidogen-sertanty.com/.

Subject(s)

Databases, Protein , Protein Conformation , Models, Chemical

STRUCTFAST: protein sequence remote homology detection and alignment using novel dynamic programming and profile-profile scoring.

Debe, Derek A; Danzer, Joseph F; Goddard, William A; Poleksic, Aleksandar.

Proteins ; 64(4): 960-7, 2006 Sep 01.

Article in English | MEDLINE | ID: mdl-16786595

ABSTRACT

STRUCTFAST is a novel profile-profile alignment algorithm capable of detecting weak similarities between protein sequences. The increased sensitivity and accuracy of the STRUCTFAST method are achieved through several unique features. First, the algorithm utilizes a novel dynamic programming engine capable of incorporating important information from a structural family directly into the alignment process. Second, the algorithm employs a rigorous analytical formula for profile-profile scoring to overcome the limitations of ad hoc scoring functions that require adjustable parameter training. Third, the algorithm employs Convergent Island Statistics (CIS) to compute the statistical significance of alignment scores independently for each pair of sequences. STRUCTFAST routinely produces alignments that meet or exceed the quality obtained by an expert human homology modeler, as evidenced by its performance in the latest CAFASP4 and CASP6 blind prediction benchmark experiments.

Subject(s)

Proteins/chemistry , Sequence Alignment/methods , Sequence Homology, Amino Acid , Algorithms , Software

Convergent Island Statistics: a fast method for determining local alignment score significance.

Poleksic, Aleksandar; Danzer, Joseph F; Hambly, Kevin; Debe, Derek A.

Bioinformatics ; 21(12): 2827-31, 2005 Jun 15.

Article in English | MEDLINE | ID: mdl-15817690

ABSTRACT

MOTIVATION: Background distribution statistics for profile-based sequence alignment algorithms cannot be calculated analytically, and hence such algorithms must resort to measuring the significance of an alignment score by assessing its location among a distribution of background alignment scores. The Gumbel parameters that describe this background distribution are usually pre-computed for a limited number of scoring systems, gap schemes, and sequence lengths and compositions. The use of such look-ups is known to introduce errors, which compromise the significance assessment of a remote homology relationship. One solution is to estimate the background distribution for each pair of interest by generating a large number of sequence shuffles and use the distribution of their scores to approximate the parameters of the underlying extreme value distribution. This is computationally very expensive, as a large number of shuffles are needed to precisely estimate the score statistics.

RESULTS: Convergent Island Statistics (CIS) is a computationally efficient solution to the problem of calculating the Gumbel distribution parameters for an arbitrary pair of sequences and an arbitrary set of gap and scoring schemes. The basic idea behind our method is to recognize the lack of similarity for any pair of sequences early in the shuffling process and thus save on the search time. The method is particularly useful in the context of profile-profile alignment algorithms where the normalization of alignment scores has traditionally been a challenging task.

CONTACT: aleksandar@eidogen.com

SUPPLEMENTARY INFORMATION: http://www.eidogen-sertanty.com/Documents/convergent_island_stats_sup.pdf.

Subject(s)

Algorithms , Models, Chemical , Models, Statistical , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computer Simulation , Models, Molecular , Molecular Sequence Data , Sequence Homology, Amino Acid
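
The shuffle-based baseline that the Convergent Island Statistics abstract contrasts itself with can be sketched as follows: given alignment scores from shuffled sequences, fit the Gumbel location and scale, then convert an observed score into a P-value. This is a minimal sketch of that baseline using method-of-moments estimates, not of CIS itself. The function names and the background scores are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(background_scores):
    """Estimate Gumbel (mu, beta) from background scores by moments.

    For a Gumbel distribution: variance = (pi * beta)^2 / 6 and
    mean = mu + gamma * beta, so both parameters follow directly.
    """
    mean = statistics.mean(background_scores)
    stdev = statistics.stdev(background_scores)  # needs >= 2 scores
    beta = stdev * math.sqrt(6) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def gumbel_pvalue(score, mu, beta):
    """P(S >= score) under the fitted Gumbel: 1 - exp(-exp(-z))."""
    z = (score - mu) / beta
    return -math.expm1(-math.exp(-z))


# Hypothetical scores from aligning shuffled versions of one sequence pair.
shuffle_scores = [10, 12, 14, 16, 18]
mu, beta = fit_gumbel_moments(shuffle_scores)
p_high = gumbel_pvalue(30, mu, beta)
p_low = gumbel_pvalue(12, mu, beta)
```

The estimates only become precise with many shuffles, which is the computational cost the abstract identifies as the motivation for a faster method.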
