Search | VHL Regional Portal

1.

DNA sequence and structure: direct and indirect recognition in protein-DNA binding.

Steffen, N R; Murphy, S D; Tolleri, L; Hatfield, G W; Lathrop, R H.

Bioinformatics ; 18 Suppl 1: S22-30, 2002.

Article in English | MEDLINE | ID: mdl-12169527

ABSTRACT

MOTIVATION: Direct recognition, or direct readout, of DNA bases by a DNA-binding protein involves amino acids that interact directly with features specific to each base. Experimental evidence also shows that in many cases the protein achieves partial sequence specificity by indirect recognition, i.e., by recognizing structural properties of the DNA. (1) Could threading a DNA sequence onto a crystal structure of bound DNA help explain the indirect recognition component of sequence specificity? (2) Might the resulting pure-structure computational motif manifest itself in familiar sequence-based computational motifs? RESULTS: The starting structure motif was a crystal structure of DNA bound to the integration host factor protein (IHF) of E. coli. IHF is known to exhibit both direct and indirect recognition of its binding sites. (1) Threading DNA sequences onto the crystal structure showed statistically significant partial separation of 60 IHF binding sites from random and intragenic sequences and was positively correlated with binding affinity. (2) The crystal structure was shown to be equivalent to a linear Markov network, and so, to a joint probability distribution over sequences, computable in linear time. It was transformed algorithmically into several common pure-sequence representations, including (a) small sets of short exact strings, (b) weight matrices, (c) consensus regular patterns, (d) multiple sequence alignments, and (e) phylogenetic trees. In all cases the pure-sequence motifs retained statistically significant partial separation of the IHF binding sites from random and intragenic sequences. Most exhibited positive correlation with binding affinity. The multiple alignment showed some conserved columns, and the phylogenetic tree partially mixed low-energy sequences with IHF binding sites but separated high-energy sequences. 
The conclusion is that deformation energy explains part of indirect recognition, which explains part of IHF sequence-specific binding.
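
As a rough illustration of the "linear Markov network" idea in the abstract above, the sketch below scores a DNA sequence by summing one energy term per adjacent base-pair step, then normalizes with a forward recursion so that the result is a probability distribution over sequences. Everything concrete here is an assumption made for illustration: the energy tables from `make_toy_step_energies` are placeholder numbers, the Boltzmann form with unit temperature is assumed, and none of the function names come from the paper.

```python
import math

BASES = "ACGT"

def make_toy_step_energies(n_steps):
    """Build placeholder energy tables, one per dinucleotide step (hypothetical values)."""
    tables = []
    for k in range(n_steps):
        table = {}
        for i, a in enumerate(BASES):
            for j, b in enumerate(BASES):
                # Arbitrary non-negative numbers; not taken from any crystal structure.
                table[a + b] = ((i * 4 + j + k) % 5) * 0.5
        tables.append(table)
    return tables

def threading_energy(seq, step_energies):
    """Sum the energy of each adjacent base-pair step; linear in len(seq)."""
    if len(seq) != len(step_energies) + 1:
        raise ValueError("sequence length must be number of steps + 1")
    return sum(step_energies[k][seq[k:k + 2]] for k in range(len(step_energies)))

def log_partition(step_energies):
    """Log of the sum of exp(-energy) over all sequences, via a forward recursion."""
    # forward[b] = total weight of all prefixes that end in base b
    forward = {b: 1.0 for b in BASES}
    for table in step_energies:
        forward = {
            b: sum(forward[a] * math.exp(-table[a + b]) for a in BASES)
            for b in BASES
        }
    return math.log(sum(forward.values()))

def log_probability(seq, step_energies):
    """Log-probability of seq under the assumed Boltzmann chain (unit temperature)."""
    return -threading_energy(seq, step_energies) - log_partition(step_energies)
```

Because each term depends only on two neighbouring bases, both the energy and the normalizing constant cost time linear in the sequence length, which is the property the abstract highlights.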

Subject(s)

Algorithms , DNA, Bacterial/chemistry , DNA-Binding Proteins/chemistry , Escherichia coli Proteins/chemistry , Integration Host Factors/chemistry , Models, Chemical , Models, Molecular , Sequence Analysis, DNA/methods , Amino Acid Sequence , Binding Sites , Macromolecular Substances , Models, Statistical , Molecular Sequence Data , Protein Binding , Sequence Alignment/methods , Structure-Activity Relationship

2.

A multi-queue branch-and-bound algorithm for anytime optimal search with biological applications.

Lathrop, R H; Sazhin, A; Sun, Y; Steffin, N; Irani, S S.

Genome Inform ; 12: 73-82, 2001.

Article in English | MEDLINE | ID: mdl-11791226

ABSTRACT

Many practical biological problems involve an intractable (NP-hard) search through a large space of possibilities. This paper describes preliminary results from a multi-queue variant of branch-and-bound search that combines anytime and optimal search behavior. The algorithm applies to problems whose solutions may be described by an N-dimensional vector. It produces an approximate solution quickly, then iteratively improves the result over time until a global optimum is produced. A global optimum may be produced before producing its proof of global optimality. Local minima are never revisited. We describe preliminary applications to ab initio protein backbone prediction, small drug-like molecule conformations, and protein-DNA binding motif discovery. The results are encouraging, although still quite preliminary.
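
The anytime behaviour described in the abstract above (a quick approximate answer, then improvements, then a proof of optimality) can be sketched with an ordinary depth-first branch-and-bound. This is not the authors' multi-queue algorithm: it uses a single stack, and the toy objective (per-dimension costs `unary` plus a non-negative neighbour coupling `pair`) is an assumption chosen only so that a simple lower bound is valid.

```python
def anytime_branch_and_bound(unary, pair):
    """Yield (cost, vector, proven_optimal) tuples with strictly improving cost.

    unary[i][v] : non-negative cost of choosing value v in dimension i
    pair(u, v)  : non-negative cost coupling neighbouring dimensions
    Both are hypothetical stand-ins for a real objective function.
    """
    n = len(unary)
    # suffix_min[i] = cheapest possible unary cost of dimensions i..n-1
    suffix_min = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_min[i] = suffix_min[i + 1] + min(unary[i])

    best_cost, best_vec = float("inf"), None
    stack = [(0.0, [])]  # (cost so far, partial vector)
    while stack:
        cost, partial = stack.pop()
        i = len(partial)
        # Bound: pair costs are >= 0, so this is a valid lower bound.
        if cost + suffix_min[i] >= best_cost:
            continue
        if i == n:
            # A complete vector that beats the incumbent: report it immediately.
            best_cost, best_vec = cost, partial
            yield best_cost, list(best_vec), False
            continue
        children = []
        for v, c in enumerate(unary[i]):
            step = c + (pair(partial[-1], v) if partial else 0.0)
            children.append((cost + step, partial + [v]))
        # Push the worst child first so the most promising one is expanded next.
        children.sort(key=lambda t: t[0], reverse=True)
        stack.extend(children)
    if best_vec is not None:
        # The stack is empty, so every alternative was pruned or explored.
        yield best_cost, list(best_vec), True
```

A caller can stop consuming the generator at any time and keep the last vector seen; the final tuple, flagged `True`, arrives only once the search space is exhausted, so the optimum may be reported before its proof.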

Subject(s)

Algorithms , Computational Biology , DNA/metabolism , Molecular Conformation , Molecular Structure , Protein Binding , Proteins/chemistry

3.

An anytime local-to-global optimization algorithm for protein threading in Θ(m²ñ²) space.

Lathrop, R H.

J Comput Biol ; 6(3-4): 405-18, 1999.

Article in English | MEDLINE | ID: mdl-10582575

ABSTRACT

This paper describes a novel anytime branch-and-bound or best-first threading search algorithm for gapped block protein sequence-structure alignment with general sequence residue pair interactions. The new algorithm (1) returns a good approximate answer quickly, (2) iteratively improves that answer to the global optimum if allowed more time, (3) eventually produces a proof that the final answer found is indeed the global optimum, and (4) always terminates correctly within a bounded number of steps if allowed sufficient space and time. It runs in polynomial space, which is asymptotically dominated by the Θ(m²ñ²) space required by the lower bound computation. Using previously published data sets and the Bryant-Lawrence (1993) objective function, the algorithm found the true (proven) global optimum in less than 5 min in all search spaces size 10²⁵ or smaller (sequences to 478 residues), and a putative (not guaranteed) optimum in less than 5 hr in all search spaces size 10⁶⁰ or smaller (sequences to 793 residues, cores to 42 secondary structure segments). The threading in the largest case studied was eventually proven to be globally optimal; the corresponding search speed in that case was the equivalent of 1.5 × 10⁵⁶ threadings/sec, a speed-up exceeding 10²⁵ over previously published batch branch-and-bound speeds, and exceeding 10⁵⁰ over previously published exhaustive search speeds, using the same objective function and threading paradigm. Implementation-independent measures of search efficiency are defined for equivalent branching factor, depth, and probability of success per draw; empirical data on these measures are given. The general approach should apply to other alignment methodologies and search methods that use a divide-and-conquer strategy.

Subject(s)

Algorithms , Proteins/chemistry , Sequence Alignment/methods , Databases, Factual , Evaluation Studies as Topic , Protein Folding , Sequence Alignment/statistics & numerical data

4.

A Bayes-optimal sequence-structure theory that unifies protein sequence-structure recognition and alignment.

Lathrop, R H; Rogers, R G; Smith, T F; White, J V.

Bull Math Biol ; 60(6): 1039-71, 1998 Nov.

Article in English | MEDLINE | ID: mdl-9866450

ABSTRACT

A rigorous Bayesian analysis is presented that unifies protein sequence-structure alignment and recognition. Given a sequence, explicit formulae are derived to select (1) its globally most probable core structure from a structure library; (2) its globally most probable alignment to a given core structure; (3) its most probable joint core structure and alignment chosen globally across the entire library; and (4) its most probable individual segments, secondary structure, and super-secondary structures across the entire library. The computations involved are NP-hard in the general case (3D-3D). Fast exact recursions for the restricted sequence singleton-only (1D-3D) case are given. Conclusions include: (a) the most probable joint core structure and alignment is not necessarily the most probable alignment of the most probable core structure, but rather maximizes the product of core and alignment probabilities; (b) use of a sequence-independent linear or affine gap penalty may result in the highest-probability threading not having the lowest score; (c) selecting the most probable core structure from the library (core structure selection or fold recognition only) involves comparing probabilities summed over all possible alignments of the sequence to the core, and not comparing individual optimal (or near-optimal) sequence-structure alignments; and (d) assuming uninformative priors, core structure selection is equivalent to comparing the ratio of two global means.
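
Conclusions (a) and (c) in the abstract above can be checked on a tiny invented example. The table below is a hypothetical posterior P(core, alignment | sequence); the numbers and the names `core_A` and `core_B` are assumptions for illustration only and do not come from the paper.

```python
# Hypothetical posterior table P(core, alignment | sequence); values sum to 1.
joint = {
    "core_A": [0.20, 0.20, 0.20],  # three equally likely alignments
    "core_B": [0.35, 0.05],        # one dominant alignment
}

def most_probable_core(joint):
    """Fold recognition: compare probabilities summed over all alignments."""
    return max(joint, key=lambda core: sum(joint[core]))

def most_probable_joint(joint):
    """Pick the single (core, alignment index) pair with the highest probability."""
    pairs = ((core, i) for core, probs in joint.items() for i in range(len(probs)))
    return max(pairs, key=lambda ci: joint[ci[0]][ci[1]])

def best_alignment_of(joint, core):
    """Index of the most probable alignment to one fixed core."""
    probs = joint[core]
    return max(range(len(probs)), key=probs.__getitem__)
```

In this toy table `core_A` is the most probable core (total 0.60 against 0.40), yet the most probable joint choice is the first alignment of `core_B` (0.35), which beats the best single alignment of `core_A` (0.20). Ranking cores by their best individual alignment would therefore pick a different core than summing over alignments does.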

Subject(s)

Amino Acid Sequence , Bayes Theorem , Protein Structure, Tertiary , Sequence Alignment

5.

Modeling protein homopolymeric repeats: possible polyglutamine structural motifs for Huntington's disease.

Lathrop, R H; Casale, M; Tobias, D J; Marsh, J L; Thompson, L M.

Proc Int Conf Intell Syst Mol Biol ; 6: 105-14, 1998.

Article in English | MEDLINE | ID: mdl-9783215

ABSTRACT

We describe a prototype system (Poly-X) for assisting an expert user in modeling protein repeats. Poly-X reduces the large number of degrees of freedom required to specify a protein motif in complete atomic detail. The result is a small number of parameters that are easily understood by, and under the direct control of, a domain expert. The system was applied to the polyglutamine (poly-Q) repeat in the first exon of huntingtin, the gene implicated in Huntington's disease. We present four poly-Q structural motifs: two poly-Q beta-sheet motifs (parallel and antiparallel) that constitute plausible alternatives to a similar previously published poly-Q beta-sheet motif, and two novel poly-Q helix motifs (alpha-helix and pi-helix). To our knowledge, helical forms of polyglutamine have not been proposed before. The motifs suggest that there may be several plausible aggregation structures for the intranuclear inclusion bodies which have been found in diseased neurons, and may help in the effort to understand the structural basis for Huntington's disease.

Subject(s)

Computer Simulation , Huntington Disease/metabolism , Models, Molecular , Nerve Tissue Proteins/chemistry , Nuclear Proteins/chemistry , Amino Acid Sequence , Artificial Intelligence , Expert Systems , Humans , Huntingtin Protein , Huntington Disease/genetics , Molecular Sequence Data , Nerve Tissue Proteins/genetics , Nuclear Proteins/genetics , Peptides/chemistry , Protein Structure, Secondary , Repetitive Sequences, Amino Acid

6.

Global optimum protein threading with gapped alignment and empirical pair score functions.

Lathrop, R H; Smith, T F.

J Mol Biol ; 255(4): 641-65, 1996 Feb 02.

Article in English | MEDLINE | ID: mdl-8568903

ABSTRACT

We describe a branch-and-bound search algorithm for finding the exact global optimum gapped sequence-structure alignment ("threading") between a protein sequence and a protein core or structural model, using an arbitrary amino acid pair score function (e.g. contact potentials, knowledge-based potentials, potentials of mean force, etc.). The search method imposes minimal conditions on how structural environments are defined or the form of the score function, and allows arbitrary sequence-specific functions for scoring loops and active site residues. Consequently the search method can be used with many different score functions and threading methodologies; this paper illustrates five from the literature. On a desktop workstation running LISP, we have found the global optimum protein sequence-structure alignment in NP-hard search spaces as large as 9.6 × 10³¹, at rates ranging as high as 6.8 × 10²⁸ equivalent threadings per second (most of which are pruned before they ever are examined explicitly). Continuing the procedure past the global optimum enumerates successive candidate threadings in monotonically increasing score order. We give efficient algorithms for search space size, uniform random sampling, segment placement probabilities, mean, standard deviation and partition function. The method should prove useful for structure prediction, as well as for critical evaluation of new pair score functions.

Subject(s)

Algorithms , Proteins/chemistry , Sequence Alignment , Amino Acid Sequence , Models, Chemical , Molecular Sequence Data , Protein Folding

7.

A shape-based machine learning tool for drug design.

Jain, A N; Dietterich, T G; Lathrop, R H; Chapman, D; Critchlow, R E; Bauer, B E; Webster, T A; Lozano-Perez, T.

J Comput Aided Mol Des ; 8(6): 635-52, 1994 Dec.

Article in English | MEDLINE | ID: mdl-7738601

ABSTRACT

Building predictive models for iterative drug design in the absence of a known target protein structure is an important challenge. We present a novel technique, Compass, that removes a major obstacle to accurate prediction by automatically selecting conformations and alignments of molecules without the benefit of a characterized active site. The technique combines explicit representation of molecular shape with neural network learning methods to produce highly predictive models, even across chemically distinct classes of molecules. We apply the method to predicting human perception of musk odor and show how the resulting models can provide graphical guidance for chemical modifications.

Subject(s)

Computer-Aided Design , Drug Design , Software , Algorithms , Fatty Acids, Monounsaturated/chemistry , Humans , Models, Molecular , Molecular Conformation , Molecular Structure , Neural Networks, Computer , Odorants/analysis

8.

The protein threading problem with sequence amino acid interaction preferences is NP-complete.

Lathrop, R H.

Protein Eng ; 7(9): 1059-68, 1994 Sep.

Article in English | MEDLINE | ID: mdl-7831276

ABSTRACT

In recent protein structure prediction research there has been a great deal of interest in using amino acid interaction preferences (e.g. contact potentials or potentials of mean force) to align ('thread') a protein sequence to a known structural motif. An important open question is whether a polynomial time algorithm for finding the globally optimal threading is possible. We identify the two critical conditions governing this question: (i) variable-length gaps are admitted into the alignment, and (ii) interactions between amino acids from the sequence are admitted into the score function. We prove that if both these conditions are allowed then the protein threading decision problem (does there exist a threading with a score ≤ K?) is NP-complete (in the strong sense, i.e. is not merely a number problem) and the related problem of finding the globally optimal protein threading is NP-hard. Therefore, no polynomial time algorithm is possible (unless P = NP). This result augments existing proofs that the direct protein folding problem is NP-complete by providing the corresponding proof for the 'inverse' protein folding problem. It provides a theoretical basis for understanding algorithms currently in use and indicates that computational strategies from other NP-complete problems may be useful for predictive algorithms.

Subject(s)

Protein Engineering , Protein Folding , Proteins/chemistry , Algorithms , Amino Acid Sequence , Amino Acids/chemistry , Molecular Structure , Protein Engineering/methods , Protein Engineering/statistics & numerical data , Proteins/genetics

9.

Acid helix-turn activator motif.

Zhu, Q L; Smith, T F; Lathrop, R H; Figge, J.

Proteins ; 8(2): 156-63, 1990.

Article in English | MEDLINE | ID: mdl-2172962

ABSTRACT

A common sequence/structural motif pattern has been identified within the steroid/thyroid hormone receptors and other transcriptional activators using a new massively parallel symbolic learning assistant computer system. The pattern appears nearly diagnostic of transcription activation, including relative activation strength, among nuclear and DNA-binding prokaryotic proteins. In cases where mutation/deletion/chimeric studies have identified the activation domain, the pattern matches within that domain. These facts and the nature of the pattern itself strongly support the idea that the patterned domain is directly involved in a protein-protein transcription activation interaction.

Subject(s)

Trans-Activators/chemistry , Amino Acid Sequence , Bacteria/analysis , DNA-Binding Proteins/chemistry , Molecular Sequence Data , Protein Conformation , Receptors, Cell Surface/chemistry , Sequence Homology, Nucleic Acid , Viruses/analysis , Yeasts/analysis

10.

Potential structural motifs for reverse transcriptases.

Webster, T A; Patarca, R; Lathrop, R H; Smith, T F.

Mol Biol Evol ; 6(3): 317-20, 1989 May.

Article in English | MEDLINE | ID: mdl-2482917

Subject(s)

Biological Evolution , RNA-Directed DNA Polymerase/genetics , Amino Acid Sequence , DNA Transposable Elements , Molecular Sequence Data , Protein Conformation , Sequence Homology, Nucleic Acid

11.

Pattern descriptors and the unidentified reading frame 6 human mtDNA dinucleotide-binding site.

Webster, T A; Lathrop, R H; Smith, T F.

Proteins ; 3(2): 97-101, 1988.

Article in English | MEDLINE | ID: mdl-3165195

ABSTRACT

In an effort to identify the structural elements essential to a given protein function a new pattern-directed inference system has been developed. It has been employed to identify a potential dinucleotide-binding domain within the human mitochondrial unidentified reading frame 6 product, thereby supporting an earlier study that this gene may encode a NADH dehydrogenase subunit.

Subject(s)

DNA, Mitochondrial , Oligonucleotides , Binding Sites , DNA, Mitochondrial/genetics , Dinucleoside Phosphates , Humans , NADH Dehydrogenase/genetics , Nucleic Acid Conformation , Protein Conformation

12.

Prediction of a common structural domain in aminoacyl-tRNA synthetases through use of a new pattern-directed inference system.

Webster, T A; Lathrop, R H; Smith, T F.

Biochemistry ; 26(22): 6950-7, 1987 Nov 03.

Article in English | MEDLINE | ID: mdl-3322392

ABSTRACT

The aminoacyl-tRNA synthetases are united by a common function with little evidence of a common structural relationship. Outside of an 11 amino acid stretch called the "signature sequence", no global primary sequence similarity exists. The signature sequence matches 4-11 amino acids in several aminoacyl-tRNA synthetases. High-resolution X-ray data are available for two of these enzymes, revealing that their signature sequence regions are small segments of a common mononucleotide binding foldlike structure. A new methodology for the analysis of dissimilar primary sequences supports the expectation that all of the signature sequence regions form a common structure. In our analysis, two complex pattern descriptors were constructed to describe the synthetase mononucleotide binding fold. These were compared to primary sequences annotated with predicted secondary structures and hydropathy profiles. Regions in 8 out of 12 (67%) heterologous aminoacyl-tRNA synthetase groups (where each group is specific for the same amino acid) match the first descriptor, and 7 of these (58%) also match the second descriptor. In contrast, only 4 regions in a set of 54 control proteins (7.4%) match the first descriptor, and only 2 regions (3.7%) match both. Alignment of these 8 regions to the descriptor (1) positions all known signature sequence regions as the first loop of a mononucleotide binding foldlike structure, (2) extends the previous alignments by another 40-odd amino acids, and (3) identifies potential sites in 3 out of 6 heterologous aminoacyl-tRNA synthetases with no previous alignments. Potential sites are also proposed for two additional heterologous synthetases on the basis of matches to less specific descriptors.

Subject(s)

Amino Acyl-tRNA Synthetases/metabolism , Amino Acid Sequence , Escherichia coli/enzymology , Flavin-Adenine Dinucleotide/metabolism , Models, Molecular , NAD/metabolism , Oxidoreductases/metabolism , Protein Binding , Protein Conformation , Saccharomyces cerevisiae/enzymology , Structure-Activity Relationship , X-Ray Diffraction

13.

Consensus topography in the ATP binding site of the simian virus 40 and polyomavirus large tumor antigens.

Bradley, M K; Smith, T F; Lathrop, R H; Livingston, D M; Webster, T A.

Proc Natl Acad Sci U S A ; 84(12): 4026-30, 1987 Jun.

Article in English | MEDLINE | ID: mdl-3035562

ABSTRACT

The location and sequence composition of a consensus element of the nucleotide binding site in both simian virus 40 (SV40) and polyomavirus (PyV) large tumor antigens (T antigens) can be predicted with the assistance of a computer-based pattern-matching system, ARIADNE. The latter was used to optimally align elements of T antigen primary sequence and predicted secondary structure with a "descriptor" for a mononucleotide binding fold. Additional consensus elements of the nucleotide binding site in these two proteins were derived from comparisons of T antigen primary and predicted secondary structures with x-ray structures of the nucleotide binding sites in four otherwise unrelated proteins. Each of these elements was predicted to be encompassed within a 110-residue segment that is highly conserved between the two T antigens (residues 418-528 in SV40 T antigen and residues 565-675 in PyV). Results of biochemical and immunologic experiments on the nucleotide binding behavior of these proteins were found to be consistent with these predictions. Taken together, the latter have resulted in a topological model of the ATP binding site in these two oncogene products.

Subject(s)

Adenosine Triphosphate/metabolism , Antigens, Viral, Tumor/metabolism , Oncogene Proteins, Viral/metabolism , Polyomavirus/enzymology , Protein Kinases/metabolism , Simian virus 40/enzymology , Amino Acid Sequence , Antigens, Polyomavirus Transforming , Binding Sites , Cyanogen Bromide , Models, Molecular , Peptide Fragments/analysis , Protein Binding , Protein Conformation , Software
