ABSTRACT
UNLABELLED: X-ray crystallography is the most widely used method to determine the 3D structure of protein molecules. One of the most difficult steps in protein crystallography is model-building, which consists of building a backbone and then amino acid side chains into an electron density map. Interpretation of electron density maps is a major bottleneck in protein structure determination pipelines, so automated techniques for interpreting maps can greatly improve throughput. We have developed WebTex, a simple yet powerful web interface to TEXTAL, a program that automates this process of fitting atoms into electron density maps. TEXTAL can also be downloaded for local installation. AVAILABILITY: Web interface, downloadable binaries and documentation at http://textal.tamu.edu
Subject(s)
Crystallography/methods , Internet , Models, Chemical , Models, Molecular , Proteins/chemistry , Proteins/ultrastructure , Software , Algorithms , Computer Simulation , Protein Conformation , Sequence Analysis, Protein/methods
ABSTRACT
High-throughput computational methods in X-ray protein crystallography are indispensable to meet the goals of structural genomics. In particular, automated interpretation of electron density maps, especially those at mediocre resolution, can significantly speed up the protein structure determination process. TEXTAL(TM) is a software application that uses pattern recognition, case-based reasoning and nearest neighbor learning to produce reasonably refined molecular models, even with average-quality data. In this work, we discuss a key issue in enabling fast and accurate interpretation of typically noisy electron density data: which features should be used to characterize the density patterns, and how relevant are they? We discuss the challenges of constructing features in this domain, and describe SLIDER, an algorithm that determines the weights of these features. SLIDER searches a space of weights, using the ranking of matching patterns (relative to mismatching ones) as its evaluation function. Because exhaustive search is intractable, SLIDER adopts a greedy approach that restricts the search space to only those weight values that cause the ranking of good matches to change. We show that SLIDER contributes significantly to finding the similarity between density patterns, and discuss the sensitivity of feature relevance to the underlying similarity metric.
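To make the evaluation function concrete, the following is a minimal sketch, not TEXTAL or SLIDER code. It assumes a weighted squared Euclidean distance between feature vectors (the metric TEXTAL actually uses may differ) and assumes each training case is a query pattern, one matching pattern and a list of mismatching patterns. The helper names `weighted_sq_dist`, `match_rank` and `mean_rank` are hypothetical. A weight vector scores better when the matching pattern ranks closer to the top among all candidates.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical case layout: (query pattern, its matching pattern, mismatching patterns).
Case = Tuple[Vector, Vector, List[Vector]]


def weighted_sq_dist(w: Vector, a: Vector, b: Vector) -> float:
    """Weighted squared Euclidean distance (an assumed metric, not necessarily TEXTAL's)."""
    return sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b))


def match_rank(w: Vector, case: Case) -> int:
    """1-based rank of the matching pattern among all candidates (1 = nearest neighbor)."""
    query, match, mismatches = case
    d_match = weighted_sq_dist(w, query, match)
    # Mismatches strictly closer than the match push it down the ranking; ties favor the match.
    closer = sum(1 for x in mismatches if weighted_sq_dist(w, query, x) < d_match)
    return 1 + closer


def mean_rank(w: Vector, cases: List[Case]) -> float:
    """Evaluation function for a weight vector: average rank of the matches (lower is better)."""
    return sum(match_rank(w, c) for c in cases) / len(cases)
```

For example, with `cases = [([0.0, 0.0], [1.0, 0.0], [[0.5, 0.0], [3.0, 3.0]])]`, uniform weights `[1.0, 1.0]` give a mean rank of 2.0, while `[0.0, 1.0]` gives 1.0, because down-weighting the first feature moves the true match to the top.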
Subject(s)
Algorithms , Artificial Intelligence , Crystallography, X-Ray/methods , Models, Molecular , Pattern Recognition, Automated/methods , Proteins/chemistry , Software , Absorptiometry, Photon/methods , Computer Simulation , Electrons , Protein Conformation , Proteins/analysis , Proteins/ultrastructure
ABSTRACT
Feature selection and weighting are central problems in pattern recognition and instance-based learning. In this work, we discuss the challenges of constructing and weighting features to recognize 3D patterns of electron density in order to determine protein structures. We present SLIDER, a feature-weighting algorithm that adjusts weights iteratively so that patterns that match query instances are ranked higher than mismatching ones. Moreover, SLIDER makes judicious choices of the weight values to be considered in each iteration by examining the specific weights at which matching and mismatching patterns switch places as nearest neighbors of query instances. This approach reduces the space of weight vectors to be searched. We make two main observations: (1) SLIDER efficiently generates weights that contribute significantly to the retrieval of matching electron density patterns; (2) the optimal weight vector depends on the distance metric; that is, feature relevance can be, to a certain extent, sensitive to the underlying metric used to compare patterns.
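The switching weights can be illustrated with a short sketch, under stated assumptions rather than as the published SLIDER procedure. Assume a weighted squared Euclidean distance and vary a single weight `w[i]` while holding the others fixed. Each pattern's distance to the query is then a linear function `c + w[i] * g`, so a matching and a mismatching pattern swap order only where their two lines cross. Between consecutive crossovers the ranking cannot change, so one representative value per interval covers every distinct ranking. The function names below are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Case = Tuple[Vector, Vector, List[Vector]]  # (query, matching pattern, mismatching patterns)


def split_distance(w: Vector, i: int, q: Vector, x: Vector) -> Tuple[float, float]:
    """Write the weighted squared distance as c + w[i] * g, holding all other weights fixed."""
    c = sum(w[j] * (q[j] - x[j]) ** 2 for j in range(len(w)) if j != i)
    g = (q[i] - x[i]) ** 2
    return c, g


def crossover_weights(w: Vector, i: int, cases: List[Case]) -> List[float]:
    """Positive values of w[i] at which some mismatch becomes equidistant with the match."""
    points = set()
    for q, match, mismatches in cases:
        c_m, g_m = split_distance(w, i, q, match)
        for x in mismatches:
            c_n, g_n = split_distance(w, i, q, x)
            if g_m != g_n:  # parallel lines never swap order
                t = (c_n - c_m) / (g_m - g_n)
                if t > 0:
                    points.add(t)
    return sorted(points)


def candidate_weights(crossovers: List[float]) -> List[float]:
    """One representative w[i] per interval between crossovers; rankings are constant inside each."""
    if not crossovers:
        return []
    bounds = [0.0] + crossovers + [2.0 * crossovers[-1]]
    return [(lo + hi) / 2.0 for lo, hi in zip(bounds, bounds[1:])]
```

With the single case `([0.0, 0.0], [1.0, 0.0], [[0.0, 0.5]])` and weights `[1.0, 1.0]`, feature 0 has one crossover at 0.25, so only the two candidates 0.125 and 0.375 need to be evaluated instead of a dense grid. Restricting the search this way is the kind of reduction the abstract describes; a full greedy search would score each candidate with a ranking-based evaluation function and keep the best one.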
Subject(s)
Absorptiometry, Photon/methods , Algorithms , Crystallography, X-Ray/methods , Models, Chemical , Models, Molecular , Proteins/chemistry , Proteins/ultrastructure , Artificial Intelligence , Computer Simulation , Electrons , Imaging, Three-Dimensional/methods , Pattern Recognition, Automated/methods , Protein Conformation , Proteins/analysis , Software
ABSTRACT
A new software system called PHENIX (Python-based Hierarchical ENvironment for Integrated Xtallography) is being developed for the automation of crystallographic structure solution. It will provide the algorithms needed to proceed from reduced intensity data to a refined molecular model, and will facilitate structure solution for both novice and expert crystallographers. Here, the features of PHENIX are reviewed and recent advances in its infrastructure and algorithms are briefly described.