1.
BMC Bioinformatics ; 7: 357, 2006 Jul 25.
Article in English | MEDLINE | ID: mdl-16869963

ABSTRACT

BACKGROUND: Protein sequence alignment is one of the basic tools in bioinformatics. Correct alignments are required for a range of tasks including the derivation of phylogenetic trees and protein structure prediction. Numerous studies have shown that the incorporation of predicted secondary structure information into alignment algorithms improves their performance. Secondary structure predictors have to be trained on a set of somewhat arbitrarily defined states (e.g. helix, strand, coil), and it has been shown that the choice of these states has some effect on alignment quality. However, it is not unlikely that prediction of other structural features could also provide an improvement. In this study we use an unsupervised clustering method, the self-organizing map, to assign sequence profile windows to "structural states" and assess their use in sequence alignment. RESULTS: The addition of self-organizing map locations as inputs to a profile-profile scoring function slightly improves the alignment quality of distantly related proteins. The improvement is slightly smaller than that gained from the inclusion of predicted secondary structure. However, the information seems to be complementary, as the two prediction schemes can be combined to improve the alignment quality by a further small but significant amount. CONCLUSION: It has been observed in many studies that predicted secondary structure significantly improves the alignments. Here we have shown that the addition of self-organizing map locations can further improve the alignments, as the self-organizing map locations seem to contain some information that is not captured by the predicted secondary structure.
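
As a minimal sketch of how a profile window might be assigned to a "structural state", the lookup below returns the grid location of the closest unit in an already trained self-organizing map. The codebook layout, the function name best_matching_unit, the squared Euclidean distance, and the toy 2x2 map are all assumptions for illustration; the abstract does not state the map size, window length, or distance measure used in the study.

```python
def best_matching_unit(window, codebook):
    """Return the (row, col) grid location whose weight vector is closest to the window.

    window   -- flattened profile window as a list of floats (hypothetical encoding)
    codebook -- dict mapping (row, col) -> weight vector of the same length
    """
    def squared_distance(weights):
        # Squared Euclidean distance; an assumed choice, not taken from the abstract.
        return sum((w - x) ** 2 for w, x in zip(weights, window))

    return min(codebook, key=lambda loc: squared_distance(codebook[loc]))


# Toy 2x2 map with three-dimensional weight vectors (illustrative values only).
toy_codebook = {
    (0, 0): [0.9, 0.1, 0.0],
    (0, 1): [0.1, 0.9, 0.0],
    (1, 0): [0.0, 0.1, 0.9],
    (1, 1): [0.3, 0.3, 0.4],
}
location = best_matching_unit([0.8, 0.2, 0.0], toy_codebook)
```

The returned coordinates could then be passed to a scoring function as extra inputs, which is the role the abstract describes for map locations.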


Subject(s)
Evolution, Molecular , Protein Structure, Secondary , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Computational Biology/methods , Computational Biology/trends , Databases, Protein , Forecasting , Neural Networks, Computer , Sequence Alignment/trends , Sequence Analysis, Protein/trends , Sequence Homology, Amino Acid
2.
BMC Bioinformatics ; 6: 253, 2005 Oct 14.
Article in English | MEDLINE | ID: mdl-16225676

ABSTRACT

BACKGROUND: Profile-profile methods have been used for some years now to detect and align homologous proteins. The best such methods use information from the background distribution of amino acids and substitution tables either when constructing the profiles or in the scoring. This makes the methods dependent on the quality and choice of substitution table as well as the construction of the profiles. Here, we introduce a novel method called ProfNet that is used to derive a profile-profile scoring function. The method optimizes the discrimination between scores of related and unrelated residues and it is fast and straightforward to use. This new method derives a scoring function that is mainly dependent on the actual alignment of residues from a training set, and it does not use any additional information about the background distribution. RESULTS: It is shown that ProfNet improves the discrimination of related and unrelated residues. Further it can be used to improve the alignment of distantly related proteins. CONCLUSION: The best performance is obtained using superfamily related proteins in the training of ProfNet, and a classifier that is related to the distance between the structurally aligned residues. The main difference between the new scoring function and a traditional profile-profile scoring function is that conserved residues on average score higher with the new function.


Subject(s)
Models, Genetic , Proteins/classification , Sequence Alignment/methods , Neural Networks, Computer , ROC Curve , Sequence Alignment/instrumentation
3.
Proteins ; 57(1): 188-97, 2004 Oct 01.
Article in English | MEDLINE | ID: mdl-15326603

ABSTRACT

To improve the detection of related proteins, it is often useful to include evolutionary information for both the query and target proteins. One method to include this information is by the use of profile-profile alignments, where a profile from the query protein is compared with the profiles from the target proteins. Profile-profile alignments can be implemented in several fundamentally different ways. The similarity between two positions can be calculated using a dot-product, a probabilistic model, or an information theoretical measure. Here, we present a large-scale comparison of different profile-profile alignment methods. We show that the profile-profile methods perform at least 30% better than standard sequence-profile methods both in their ability to recognize superfamily-related proteins and in the quality of the obtained alignments. Although the performance of all methods is quite similar, profile-profile methods that use a probabilistic scoring function have an advantage as they can create good alignments and show a good fold recognition capacity using the same gap-penalties, while the other methods need to use different parameters to obtain comparable performances.
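
To make the difference between scoring schemes concrete, the sketch below compares two profile columns with a dot-product and with one possible information-theoretic measure. The function names, the use of Jensen-Shannon divergence, and the three-letter toy alphabet are assumptions for illustration; the abstract does not specify which formulas the compared methods use.

```python
import math


def dot_product_score(p, q):
    """Similarity of two profile columns given as amino acid frequency vectors."""
    return sum(a * b for a, b in zip(p, q))


def jensen_shannon_divergence(p, q):
    """Divergence in bits between two frequency vectors (0 means identical)."""
    midpoint = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, midpoint) + 0.5 * kl(q, midpoint)


# Toy columns over a three-letter alphabet (real profiles have 20 entries).
conserved = [1.0, 0.0, 0.0]
mixed = [0.5, 0.5, 0.0]

same_column_score = dot_product_score(mixed, mixed)
same_column_divergence = jensen_shannon_divergence(mixed, mixed)
```

Note that the dot-product is a similarity while the divergence is a distance, so an alignment program built on the latter would need to convert it to a score, for example by subtracting it from a constant.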


Subject(s)
Protein Conformation , Sequence Alignment/methods , Structural Homology, Protein , Models, Chemical , Probability
4.
Proteins ; 54(2): 342-50, 2004 Feb 01.
Article in English | MEDLINE | ID: mdl-14696196

ABSTRACT

In this study, we show that it is possible to increase the performance over PSI-BLAST by using evolutionary information for both query and target sequences. This information can be used in three different ways: by sequence linking, by profile-profile alignments, and by combining sequence-profile and profile-sequence searches. If only PSI-BLAST is used, 16% of superfamily-related protein domains can be detected at 90% specificity. If a sequence-profile and a profile-sequence search are combined, this increases to 20%; profile-profile searches detect 19%, whereas a linking procedure identifies 22% of these proteins. All three methods show equal performance, but the best combination of speed and accuracy seems to be obtained by the combined searches, because this method shows a good performance even at high specificity and the lowest computational cost. In addition, we show that the E-values reported by all these methods, including PSI-BLAST, underestimate the true rate of false positives. This behavior is seen even if a very strict E-value cutoff and a limited number of iterations are used. However, the difference is more pronounced with a looser E-value cutoff and more iterations.
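
A figure such as "16% detected at 90% specificity" can be read as the largest fraction of true relationships recovered while the ranked hit list stays sufficiently clean. The sketch below computes that number from scored hits. The function name, the input format, and the definition of specificity as TP / (TP + FP) are assumptions; the abstract does not define specificity, and TN / (TN + FP) would need a different calculation.

```python
def sensitivity_at_specificity(hits, min_specificity=0.9):
    """Largest fraction of true relations found while specificity stays high enough.

    hits -- list of (score, is_related) pairs; a higher score means more confident.
    Specificity is taken here as TP / (TP + FP), an assumed definition.
    Tied scores are not grouped; they are processed in sorted order.
    """
    total_true = sum(1 for _, is_related in hits if is_related)
    if total_true == 0:
        return 0.0

    best = 0.0
    true_positives = false_positives = 0
    for _, is_related in sorted(hits, key=lambda hit: hit[0], reverse=True):
        if is_related:
            true_positives += 1
        else:
            false_positives += 1
        specificity = true_positives / (true_positives + false_positives)
        if specificity >= min_specificity:
            best = max(best, true_positives / total_true)
    return best


# Toy ranked list: four true relations and two false hits.
toy_hits = [(10, True), (9, True), (8, True), (7, False), (6, True), (5, False)]
detected = sensitivity_at_specificity(toy_hits)
```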


Subject(s)
Computational Biology/methods , Evolution, Molecular , Protein Folding , Proteins/chemistry , Proteins/metabolism , Software , Proteins/classification , Sensitivity and Specificity , Time Factors
5.
Acta Crystallogr D Biol Crystallogr ; 58(Pt 5): 768-76, 2002 May.
Article in English | MEDLINE | ID: mdl-11976487

ABSTRACT

The main-chain conformations of 237 384 amino acids in 1042 protein subunits from the PDB were analyzed with Ramachandran plots. The populated areas of the empirical Ramachandran plot differed markedly from the classical plot in all regions. All amino acids in alpha-helices are found within a very narrow range of phi, psi angles. As many as 40% of all amino acids are found in this most populated region, covering only 2% of the Ramachandran plot. The beta-sheet region is clearly subdivided into two distinct regions. These do not arise from the parallel and antiparallel beta-strands, which have quite similar conformations. One beta region is mainly from amino acids in random coil. The third and smallest populated area of the Ramachandran plot, often denoted left-handed alpha-helix, has a different position than that originally suggested by Ramachandran. Each of the 20 amino acids has its own very characteristic Ramachandran plot. Most of the glycines have conformations that were considered to be less favoured. These results may be useful for checking secondary-structure assignments in the PDB and for predicting protein folding.
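
The phi and psi angles plotted in a Ramachandran plot are dihedral angles defined by four consecutive backbone atoms: C(i-1), N, CA, C for phi and N, CA, C, N(i+1) for psi. The sketch below computes such an angle from Cartesian coordinates using only the standard library. The helper names are hypothetical, the coordinates in the example are toy values rather than PDB data, and the sign is intended to follow the usual IUPAC convention, which should be checked against a trusted tool before relying on it.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond, in the range -180 to 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    # Unit vector along the central bond.
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)

    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))

    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy geometry: the fourth point is rotated a quarter turn about the z axis.
quarter_turn = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

For a residue, phi would be dihedral_degrees(prev_C, N, CA, C) and psi would be dihedral_degrees(N, CA, C, next_N), with each argument an (x, y, z) tuple.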


Subject(s)
Amino Acids/chemistry , Proteins/chemistry , Crystallography, X-Ray , Databases, Protein , Molecular Conformation , Protein Structure, Secondary , Sensitivity and Specificity , Temperature