1.
Pac Symp Biocomput ; : 399-410, 2004.
Article in English | MEDLINE | ID: mdl-14992520

ABSTRACT

We describe a novel approach to the problem of automatically clustering protein sequences and discovering protein families, subfamilies etc., based on the theory of infinite Gaussian mixture models. This method allows the data itself to dictate how many mixture components are required to model it, and provides a measure of the probability that two proteins belong to the same cluster. We illustrate our methods with application to three data sets: globin sequences, globin sequences with known three-dimensional structures and G-protein coupled receptor sequences. The consistency of the clusters indicates that our method is producing biologically meaningful results, which provide a very good indication of the underlying families and subfamilies. With the inclusion of secondary structure and residue solvent accessibility information, we obtain a classification of sequences of known structure which both reflects and extends their SCOP classifications. A supplementary web site containing larger versions of the figures is available at http://public.kgi.edu/~wid/PSB04/index.html


Subject(s)
Computational Biology , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , Cluster Analysis , Databases, Protein , Globins/chemistry , Globins/genetics , Models, Statistical , Normal Distribution , Proteins/classification , Receptors, G-Protein-Coupled/chemistry , Receptors, G-Protein-Coupled/genetics
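The abstract above mentions a measure of the probability that two proteins belong to the same cluster. One common way to estimate such a quantity from a sampling-based mixture model is to count how often two items share a label across posterior draws. The sketch below illustrates that idea only; it is not taken from the paper. The function name `coclustering_probabilities` and the label draws are hypothetical, and the draws are assumed to come from some sampler that is not shown.

```python
def coclustering_probabilities(samples):
    """Estimate pairwise same-cluster probabilities from label draws.

    samples: list of label lists, one list per posterior draw, where
    samples[s][i] is the cluster label of sequence i in draw s.
    Returns an n x n matrix whose entry [i][j] is the fraction of draws
    in which sequences i and j carry the same label.
    """
    if not samples:
        raise ValueError("need at least one draw")
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for labels in samples:
        if len(labels) != n:
            raise ValueError("every draw must label the same sequences")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    total = len(samples)
    return [[c / total for c in row] for row in counts]


# Hypothetical draws for four sequences (illustrative values only).
draws = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 2, 2],
    [0, 0, 1, 1],
]
probs = coclustering_probabilities(draws)
```

Label values need not be consistent between draws, because only equality within a draw is compared. This matters when the number of clusters varies from draw to draw.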
2.
Bioinformatics ; 18(6): 788-801, 2002 Jun.
Article in English | MEDLINE | ID: mdl-12075014

ABSTRACT

MOTIVATION: The Bayesian network approach is a framework that combines graphical representation with probability theory and includes hidden Markov models as a special case. Hidden Markov models trained on amino acid sequence or secondary structure data alone have been shown to have potential for addressing the problem of protein fold and superfamily classification. RESULTS: This paper describes a novel implementation of a Bayesian network which simultaneously learns amino acid sequence, secondary structure and residue accessibility for proteins of known three-dimensional structure. An awareness of the errors inherent in predicted secondary structure may be incorporated into the model by means of a confusion matrix. Training and validation data have been derived for a number of protein superfamilies from the Structural Classification of Proteins (SCOP) database. Cross-validation results using posterior probability classification demonstrate that the Bayesian network performs better in classifying proteins of known structural superfamily than a hidden Markov model trained on amino acid sequences alone.


Subject(s)
Bayes Theorem , Protein Folding , Amino Acid Sequence , Computational Biology , Markov Chains , Models, Molecular , Probability Theory , Protein Structure, Secondary
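Posterior probability classification, as mentioned in the abstract above, can be illustrated with Bayes' theorem applied to per-class log-likelihoods. The sketch below is a generic illustration, not the paper's implementation. The class names, scores and priors are hypothetical, and the log-likelihoods are assumed to come from one trained model per superfamily.

```python
import math


def posterior_over_classes(log_likelihoods, priors):
    """Return P(class | sequence) for each class via Bayes' theorem.

    log_likelihoods: dict mapping class name to log P(sequence | class).
    priors: dict mapping class name to P(class); values must be positive.
    The normalisation uses the log-sum-exp trick so that very negative
    log-likelihoods do not underflow to zero.
    """
    log_joint = {
        name: score + math.log(priors[name])
        for name, score in log_likelihoods.items()
    }
    largest = max(log_joint.values())
    log_norm = largest + math.log(
        sum(math.exp(value - largest) for value in log_joint.values())
    )
    return {name: math.exp(value - log_norm) for name, value in log_joint.items()}


# Hypothetical scores: model B explains the sequence three times better.
scores = {
    "superfamily_A": -1200.0,
    "superfamily_B": -1200.0 + math.log(3.0),
}
uniform = {"superfamily_A": 0.5, "superfamily_B": 0.5}
posterior = posterior_over_classes(scores, uniform)
predicted = max(posterior, key=posterior.get)
```

With uniform priors the posterior depends only on the likelihood ratio, so the sequence is assigned to the class whose model scores it highest.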
3.
Acta Crystallogr A ; 57(Pt 2): 163-75, 2001 Mar.
Article in English | MEDLINE | ID: mdl-11223503

ABSTRACT

An exponential modeling algorithm is developed for protein structure completion by X-ray crystallography and tested on experimental data from a 59-residue protein. An initial noisy difference Fourier map of missing residues of up to half of the protein is transformed by the algorithm into one that allows easy identification of the continuous tube of electron density associated with that polypeptide chain. The method incorporates the paradigm of phase hypothesis generation and cross validation within an automated scheme.


Subject(s)
Proteins/chemistry , Algorithms , Crystallography, X-Ray , Elapid Venoms/chemistry , Fourier Analysis , Models, Molecular , Protein Conformation
4.
Bioinformatics ; 15(6): 521-2, 1999 Jun.
Article in English | MEDLINE | ID: mdl-10383476

ABSTRACT

SUMMARY: Protein Analyst is a flexible tool for the analysis of protein sequences with emphasis on the integration of sequence and structural information. AVAILABILITY: The software will be available from the Oxford Molecular Biolib web site (http://www.oxmol.co.uk/biolib) and will be free to the academic research community.


Subject(s)
Proteins/chemistry , Software , Algorithms , Computational Biology , Evaluation Studies as Topic , Models, Molecular , Protein Structure, Secondary , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data , Sequence Analysis/methods , Sequence Analysis/statistics & numerical data , Sequence Homology, Amino Acid , Software Design
5.
J Mol Graph ; 13(5): 291-8, 299-300, 1995 Oct.
Article in English | MEDLINE | ID: mdl-8603058

ABSTRACT

This article describes the integration of programs from the widely used CCP4 macromolecular crystallography package into a modern data flow visualization environment (application visualization system [AVS]), which provides a simple graphical user interface, a visual programming paradigm, and a variety of 1-, 2-, and 3-D data visualization tools for the display of graphical information and the results of crystallographic calculations, such as electron density and Patterson maps. The CCP4 suite comprises a number of separate Fortran 77 programs, which communicate via common file formats. Each program is encapsulated into an AVS macro module, and may be linked to others in a data flow network, reflecting the nature of many crystallographic calculations. Named pipes are used to pass input parameters from a graphical user interface to the program module, and also to intercept line printer output, which can be filtered to extract graphical information and significant numerical parameters. These may be passed to downstream modules, permitting calculations to be automated if no user interaction is required, or giving the user the opportunity to make selections in an interactive manner.


Subject(s)
Computer Graphics , Crystallography, X-Ray/methods , Molecular Structure , User-Computer Interface , Base Sequence , Models, Molecular , Molecular Sequence Data , Nucleic Acid Conformation , Oligodeoxyribonucleotides/chemistry , Protein Conformation , Software
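The abstract above describes passing input parameters to a program through named pipes. The sketch below shows the general mechanism on a POSIX system, using only Python's standard library. It is not the AVS or CCP4 code: the parameter names are hypothetical, and `read_parameters` is a stand-in for the wrapped program that would normally sit at the reading end of the pipe.

```python
import os
import tempfile
import threading


def send_parameters(fifo_path, parameters):
    """Write 'key value' lines into the named pipe.

    Opening the pipe for writing blocks until a reader opens the other end.
    """
    with open(fifo_path, "w") as pipe:
        for key, value in parameters.items():
            pipe.write(f"{key} {value}\n")


def read_parameters(fifo_path):
    """Stand-in for the wrapped program: read lines until the writer closes."""
    received = {}
    with open(fifo_path) as pipe:
        for line in pipe:
            key, value = line.split(maxsplit=1)
            received[key] = value.strip()
    return received


def round_trip(parameters):
    """Create a named pipe, send the parameters through it and read them back."""
    with tempfile.TemporaryDirectory() as workdir:
        fifo_path = os.path.join(workdir, "params.fifo")
        os.mkfifo(fifo_path)  # POSIX only; not available on Windows
        writer = threading.Thread(
            target=send_parameters, args=(fifo_path, parameters)
        )
        writer.start()
        received = read_parameters(fifo_path)
        writer.join()
    return received


# Hypothetical parameter names, chosen for illustration only.
result = round_trip({"grid_spacing": "0.5", "label": "test map"})
```

The writer runs in a separate thread because each end of a named pipe blocks on open until the other end is opened as well. In a data flow network the two ends would usually belong to different processes.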
6.
Nature ; 274(5670): 433-7, 1978 Aug 03.
Article in English | MEDLINE | ID: mdl-672971

ABSTRACT

High resolution studies on the crystal structure of glycogen phosphorylase b have identified the catalytic site to which the substrate glucose-1-phosphate binds strongly with some local conformational changes. The site is situated 8 Å (phosphate-to-phosphate distance) from pyridoxal phosphate, an essential cofactor of all glycogen phosphorylases. The catalytic site is 33 Å from the site in the N-terminal portion of the molecule to which adenine nucleotides bind. In contrast to phosphorylase a (the active form of the enzyme which is phosphorylated at Ser 14), the positions of the first 19 residues of phosphorylase b are not well defined.


Subject(s)
Phosphorylases , Adenosine Monophosphate/metabolism , Adenosine Triphosphate/metabolism , Allosteric Site , Binding Sites , Glucosephosphates/metabolism , Glycogen/metabolism , Models, Molecular , Muscles/enzymology , Phosphorylases/metabolism , Protein Conformation , Pyridoxal Phosphate/metabolism , X-Ray Diffraction