1.
J Mol Biol ; 313(5): 1195-206, 2001 Nov 09.
Article in English | MEDLINE | ID: mdl-11700074

ABSTRACT

A structural survey of the Escherichia coli proteins occurring in metabolic networks in the KEGG database (release 19 of LIGAND) has been carried out. A measure of structural coverage of a network is defined and calculated for each network. Twenty-four networks have 50 % or more of the enzyme steps assigned in E. coli and of these 21 have a structural coverage of 50 % or more. Of those proteins that have a region matching a SCOP domain, 50 % fall on or below the 30 % sequence identity threshold and represent non-trivial comparative modelling targets, highlighting the need for experimental structure determination studies. The survey reveals the predominance of alpha/beta and alpha+beta folds for enzymes involved in metabolic pathways and that this general trend is maintained at the level of each pathway. The most popular superfamilies are coenzyme binding domains and are involved in the supply of energy to reactions. Although a few superfamilies are found in many pathways, in general there is a specificity of a particular superfamily for a particular pathway.


Subject(s)
Computational Biology/methods , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/metabolism , Escherichia coli/chemistry , Escherichia coli/metabolism , Binding Sites , Coenzymes/metabolism , Databases, Protein , Escherichia coli/enzymology , Escherichia coli/genetics , Escherichia coli Proteins/classification , Escherichia coli Proteins/genetics , Gene Duplication , Genes, Bacterial/genetics , Genes, Duplicate , Models, Molecular , Protein Binding , Protein Structure, Secondary , Protein Structure, Tertiary , Software , Structure-Activity Relationship , Substrate Specificity
2.
Bioinformatics ; 15(6): 521-2, 1999 Jun.
Article in English | MEDLINE | ID: mdl-10383476

ABSTRACT

SUMMARY: Protein Analyst is a flexible tool for the analysis of protein sequences with emphasis on the integration of sequence and structural information. AVAILABILITY: The software will be available from the Oxford Molecular Biolib web site (http://www.oxmol.co.uk/biolib) and will be free to the academic research community.


Subject(s)
Proteins/chemistry , Software , Algorithms , Computational Biology , Evaluation Studies as Topic , Models, Molecular , Protein Structure, Secondary , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data , Sequence Analysis/methods , Sequence Analysis/statistics & numerical data , Sequence Homology, Amino Acid , Software Design
3.
Protein Eng ; 11(8): 627-30, 1998 Aug.
Article in English | MEDLINE | ID: mdl-9749915

ABSTRACT

Although it is well known that significant sequence similarity between proteins is reflected at the structural level, it is commonly assumed that any misaligned regions, as judged by the correct structure based alignment, are those where the local sequence identity is lower than the global. Recent studies have shown that this is not always the case and there can exist short stretches of high local identity which is not reflected in the structure based alignment. An analysis is presented of 290 pairs of homologous proteins with a view to quantifying the occurrence of these misleading local sequence alignments (MLSAs). It is found that such MLSAs are likely if the global sequence identity is less than 40% and can occur even when it is greater than 60%. The results have implications for automated homology modelling and also for the inference of function made by comparison.
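
The comparison of local and global sequence identity described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the window length and the margin are assumptions chosen for illustration. The code only flags windows of a sequence-based alignment whose identity is well above the global value; deciding whether such a window is actually misleading would still require the structure-based alignment.

```python
def identity(a: str, b: str) -> float:
    """Fraction of gap-free aligned columns in which the residues match."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def high_identity_windows(a: str, b: str, window: int = 10, margin: float = 0.25):
    """Return (global_identity, hits) for two aligned, equal-length sequences.

    hits lists (start, local_identity) for every window whose identity
    is at least global_identity + margin (both values are assumptions).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    global_id = identity(a, b)
    hits = []
    for start in range(len(a) - window + 1):
        local = identity(a[start:start + window], b[start:start + window])
        if local >= global_id + margin:
            hits.append((start, local))
    return global_id, hits
```

For example, `high_identity_windows("AAAAAKKKKK", "AAAAACCCCC", window=5)` reports a global identity of 0.5 and flags the fully identical window starting at position 0.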


Subject(s)
Models, Theoretical , Proteins/chemistry , Sequence Homology, Amino Acid , Algorithms , Animals , Cytochrome Reductases/chemistry , Cytochrome-B(5) Reductase , Databases, Factual , Escherichia coli/enzymology , Evolution, Molecular , Superoxide Dismutase/chemistry , Swine
4.
Protein Eng ; 11(1): 1-9, 1998 Jan.
Article in English | MEDLINE | ID: mdl-9579654

ABSTRACT

Fold recognition methods aim to use the information in the known protein structures (the targets) to identify that the sequence of a protein of unknown structure (the probe) will adopt a known fold. This paper highlights that the structural similarities sought by these methods can be divided into two types: remote homologues and analogues. Homologues are the result of divergent evolution and often share a common function. We define remote homologues as those that are not easily detectable by sequence comparison methods alone. Analogues do not have a common ancestor and generally do not have a common function. Several sets of empirical matrices for residue substitution, secondary structure conservation and residue accessibility conservation have previously been derived from aligned pairs of remote homologues and analogues (Russell et al., J. Mol. Biol., 1997, 269, 423-439). Here a method for fold recognition, FOLDFIT, is introduced that uses these matrices to match the sequences, secondary structures and residue accessibilities of the probe and target. The approach is evaluated on distinct datasets of analogous and remotely homologous folds. The accuracy of FOLDFIT with the different matrices on the two datasets is contrasted to results from another fold recognition method (THREADER) and to searches using mutation matrices in the absence of any structural information. FOLDFIT identifies at top rank 12 out of 18 remotely homologous folds and five out of nine analogous folds. The average alignment accuracies for residue and secondary structure equivalencing are much higher for homologous folds (residue approximately 42%, secondary structure approximately 78%) than for analogous folds (approximately 12%, approximately 47%). Sequence searches alone can be successful for several homologues in the testing sets but nearly always fail for the analogues. These results suggest that the recognition of analogous and remotely homologous folds should be assessed separately. This study has implications for the development and comparative evaluation of fold recognition algorithms.


Subject(s)
Protein Folding , Sequence Alignment , Evolution, Molecular , Protein Structure, Secondary
5.
J Mol Biol ; 269(3): 423-39, 1997 Jun 13.
Article in English | MEDLINE | ID: mdl-9199410

ABSTRACT

An analysis was performed on 335 pairs of structurally aligned proteins derived from the structural classification of proteins (SCOP http://scop.mrc-lmb.cam.ac.uk/scop/) database. These similarities were divided into analogues, defined as proteins with similar three-dimensional structures (same SCOP fold classification) but generally with different functions and little evidence of a common ancestor (different SCOP superfamily classification). Homologues were defined as pairs of similar structures likely to be the result of evolutionary divergence (same superfamily) and were divided into remote, medium and close sub-divisions based on the percentage sequence identity. Particular attention was paid to the differences between analogues and remote homologues, since both types of similarities are generally undetectable by sequence comparison and their detection is the aim of fold recognition methods. Distributions of sequence identities and substitution matrices suggest a higher degree of sequence similarity in remote homologues than in analogues. Matrices for remote homologues show similarity to existing mutation matrices, providing some validity for their use in previously described fold recognition methods. In contrast, matrices derived from analogous proteins show little conservation of amino acid properties beyond broad conservation of hydrophobic or polar character. Secondary structure and accessibility were more conserved on average in remote homologues than in analogues, though there was no apparent difference in the root-mean-square deviation between these two types of similarities. Alignments of remote homologues and analogues show a similar number of gaps, openings (one or more sequential gaps) and inserted/deleted secondary structure elements, and both generally contain more gaps/openings/deleted secondary structure elements than medium and close homologues. These results suggest that gap parameters for fold recognition should be more lenient than those used in sequence comparison. Parameters were derived from the analogue and remote homologue datasets for potential use in fold recognition methods. Implications for protein fold recognition and evolution are discussed.


Subject(s)
Models, Molecular , Protein Folding , Proteins/chemistry , Sequence Analysis/methods , Sequence Homology, Amino Acid , Computer Simulation , DNA Transposable Elements , Databases, Factual , Mutation , Proteins/genetics , Sequence Alignment , Sequence Deletion
6.
Comput Appl Biosci ; 10(5): 545-6, 1994 Sep.
Article in English | MEDLINE | ID: mdl-7828071

ABSTRACT

A program PdbMotif, which automatically identifies protein motifs in a protein data bank file and generates a script file which can be read directly by the molecular rendering program RasMol, is described. PdbMotif accepts the standard PROSITE pattern syntax and will scan the PROSITE pattern database or a set of user defined patterns. Any motifs detected are automatically highlighted in the RasMol image.
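
As an illustration of scanning with PROSITE pattern syntax, the sketch below translates a common subset of that syntax into a Python regular expression and reports matches in a sequence. It is not PdbMotif's code: the function names `prosite_to_regex` and `find_motifs` are hypothetical, only the basic elements (`x`, `[..]`, `{..}`, `(n)`, `(n,m)`, `<`, `>`) are handled, and the generation of a RasMol script is left out.

```python
import re

# One PROSITE element: optional '<', a core (x, a residue, [..] or {..}),
# an optional repeat (n) or (n,m), and an optional '>'.
_ELEMENT = re.compile(
    r"^(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)$"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern (supported subset only) into a regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.match(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        start, core, lo, hi, end = m.groups()
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{"):           # excluded residues
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if lo:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


def find_motifs(sequence: str, pattern: str):
    """Return (start, matched_text) for every match, overlapping ones included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]
```

For example, the N-glycosylation site pattern `N-{P}-[ST]-{P}.` becomes `N[^P][ST][^P]`, and `find_motifs("MNGTANASA", "N-{P}-[ST]-{P}.")` reports matches at positions 1 and 5 (zero-based sequence positions, not PDB residue numbers).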


Subject(s)
Computer Graphics , Databases, Factual , Proteins/chemistry , Software , Algorithms , Amino Acid Sequence , Models, Molecular , Molecular Structure , Pattern Recognition, Automated , Sequence Analysis , User-Computer Interface
7.
Protein Eng ; 7(2): 165-71, 1994 Feb.
Article in English | MEDLINE | ID: mdl-8170920

ABSTRACT

The automatic identification of motifs associated with a given function is an important challenge for molecular sequence analysis. A method is presented for the extraction of such patterns from large sets of unaligned sequences with related but general function, for example, a set of heat shock proteins. In such a set of proteins there can often be several subfamilies each characterized by one or more distinct motifs. The aim is to develop computational tools to identify these motifs. The algorithm presented locates high frequency words of length k with a given number of positions, r, fixed. Statistics for a binomial distribution are used to assess the significance of the words. The high-frequency words are clustered and highly populated clusters retained. The composition of the clusters is displayed graphically. A set of motifs associated with the sequence family can automatically be extracted. The method is benchmarked on a set of 106 heat shock sequences and a set of 257 toxin sequences. It is shown to recover previously identified motifs.
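
The word-counting step can be sketched as follows. This is a simplified illustration, not the published algorithm: the abstract does not give the background model or the exact form of the binomial test, so the independent-residue background, the use of window counts as the number of trials, and all function names are assumptions. The clustering and graphical display steps are omitted.

```python
from collections import Counter
from itertools import combinations
from math import exp, lgamma, log, log1p


def count_words(seqs, k, r):
    """Count words of length k with r fixed positions ('.' marks a free position)."""
    masks = list(combinations(range(k), r))
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            for mask in masks:
                word = "".join(window[j] if j in mask else "." for j in range(k))
                counts[word] += 1
    return counts


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0 or p >= 1.0:
        return 1.0
    if p <= 0.0:
        return 0.0
    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log1p(-p))
    return sum(exp(log_pmf(i)) for i in range(k, n + 1))


def word_pvalue(word, count, seqs, k):
    """Assumed test: chance of seeing >= count hits over all windows,
    with residues drawn independently at their observed frequencies."""
    total = sum(len(s) for s in seqs)
    freqs = Counter("".join(seqs))
    p = 1.0
    for c in word:
        if c != ".":
            p *= freqs[c] / total
    n_windows = sum(max(len(s) - k + 1, 0) for s in seqs)
    return binom_tail(n_windows, p, count)
```

Words with a small p-value would be the candidates passed on to clustering.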


Subject(s)
Algorithms , Consensus Sequence , Models, Molecular , Protein Structure, Tertiary , Amino Acid Sequence , Heat-Shock Proteins/chemistry , Molecular Sequence Data , Sequence Homology, Amino Acid
8.
J Mol Biol ; 228(1): 170-87, 1992 Nov 05.
Article in English | MEDLINE | ID: mdl-1447780

ABSTRACT

A multiple alignment of five (beta/alpha)8-barrel enzymes has been derived from their structure. The eight beta-strands and eight alpha-helices of the (beta/alpha)8-barrel are correctly aligned and the equivalenced residues in these regions fulfil similar structural roles. Each beta-strand has a central core of usually four residues, two residues contribute side-chains to the barrel core and the other two residues are involved in beta-strand/alpha-helix contacts. However, the fold imposes no constraints on the volumes of the residues at either a local or global level: the volume of the beta-barrel core varies between 1088 Å³ in glycolate oxidase and 1571 Å³ in taka-amylase. Sequence motifs derived from the multiple alignment were scanned against a database of 124 protein sequences, including 17 (beta/alpha)8-barrel enzymes. The results were evaluated in terms of the discrimination of (beta/alpha)8-barrel sequences and the quality of the alignments obtained. One motif was able to identify the top 12% of high scoring sequences as forming (beta/alpha)8-barrels with 50% accuracy and the bottom 50% of sequences as not being (beta/alpha)8-barrel proteins with 100% accuracy. However, in most instances the alignments were poor. The reasons for this are discussed with reference to the (beta/alpha)8-barrel proteins and the sequence motif method in general.


Subject(s)
Aldose-Ketose Isomerases , Enzymes/chemistry , Protein Folding , Protein Structure, Secondary , Alcohol Oxidoreductases/chemistry , Amino Acid Sequence , Carbohydrate Epimerases/chemistry , Methods , Molecular Sequence Data , Ribulose-Bisphosphate Carboxylase/chemistry , Sequence Alignment , Templates, Genetic , Triose-Phosphate Isomerase/chemistry , alpha-Amylases/chemistry
9.
Protein Eng ; 5(4): 305-11, 1992 Jun.
Article in English | MEDLINE | ID: mdl-1409552

ABSTRACT

A major problem in predicting protein structure by homology modelling is that the sequence alignment from which the model is built may not be the best one in terms of the correct equivalencing of residues assessed by structural or functional criteria. A useful strategy is to generate and examine a number of suboptimal alignments as better alignments can often be found away from the optimal. A procedure to filter rapidly suboptimal alignments based on measurement of core volumes and packing pair potentials is investigated. The approach is benchmarked on three pairs of sequences which are non-trivial to align correctly, namely two immunoglobulin domains, plastocyanin with azurin and two distant globin sequences. It is shown to be useful to reduce a large ensemble of possible alignments down to a few which correspond more closely to the correct (structure based) alignment.


Subject(s)
Protein Conformation , Sequence Alignment/methods , Algorithms , Amino Acid Sequence , Animals , Humans , Molecular Sequence Data , Sequence Homology
10.
J Mol Biol ; 219(4): 727-32, 1991 Jun 20.
Article in English | MEDLINE | ID: mdl-1905360

ABSTRACT

A major problem in sequence alignments based on the standard dynamic programming method is that the optimal path does not necessarily yield the best equivalencing of residues assessed by structural or functional criteria. An algorithm is presented that finds suboptimal alignments of protein sequences by a simple modification to the standard dynamic programming method. The standard pairwise weight matrix elements are modified in order to penalize, but not eliminate, the equivalencing of residues obtained from previous alignments. The algorithm thereby yields a limited set of alternate alignments that can differ considerably from the optimal. The approach is benchmarked on the alignments of immunoglobulin domains. Without a priori knowledge of the optimal choice of gap penalty, one of the suboptimal alignments is shown to be more accurate than the optimal.
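
The idea of penalizing previously equivalenced residue pairs can be sketched with a plain global dynamic programming alignment. This is a minimal illustration under stated assumptions, not the published implementation: the match/mismatch weights, the linear gap penalty, the size of the penalty and the function names are all hypothetical.

```python
def align(a, b, weights, gap=-2.0):
    """Global alignment by dynamic programming; returns the aligned (i, j) pairs."""
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + weights[i - 1][j - 1],
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the corner, recording residue equivalences.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + weights[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def suboptimal_alignments(a, b, rounds=3, match=2.0, mismatch=-1.0,
                          gap=-2.0, penalty=1.5):
    """Run the alignment repeatedly, lowering the weight of every pair
    used so far so that later rounds are pushed towards other paths."""
    weights = [[match if x == y else mismatch for y in b] for x in a]
    results = []
    for _ in range(rounds):
        pairs = align(a, b, weights, gap)
        results.append(pairs)
        for i, j in pairs:
            weights[i][j] -= penalty  # penalize, but do not forbid, reuse
    return results
```

The first element of the returned list is the optimal alignment under the starting weights; later elements are the alternates. In practice the weights would come from a substitution matrix rather than a match/mismatch rule.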


Subject(s)
Immunoglobulin Fragments/chemistry , Proteins/chemistry , Sequence Alignment/methods , Algorithms , Amino Acid Sequence , Immunoglobulin Constant Regions , Immunoglobulin Heavy Chains/chemistry , Immunoglobulin Light Chains/chemistry , Immunoglobulin Variable Region/chemistry , Molecular Sequence Data , Software
11.
Protein Eng ; 3(5): 419-23, 1990 Apr.
Article in English | MEDLINE | ID: mdl-2112248

ABSTRACT

The estimation of free energy differences from computer simulation of macromolecular systems is important for rational strategies for drug design and for protein engineering. As an example of one mutation, we have studied the free energy change resulting from the conversion of a polar group (OH) to an apolar group (CH3) in aqueous solution. We have estimated the effect of various local environments on the magnitude of the free energy difference and find that the environmental effects are significant. We have also studied the reliability of the results in detail.


Subject(s)
Mutation , Proteins , Amino Acid Sequence , Computer Simulation , Ethane , Hydrogen Bonding , Methanol , Molecular Sequence Data , Monte Carlo Method , Protein Conformation , Protein Engineering , Proteins/genetics , Thermodynamics , Threonine , Valine