ABSTRACT
We present an efficient algorithm for generating a small set of coarse alignments between interacting proteins using meaningful features on their surfaces. The proteins are treated as rigid bodies, but the results are more generally useful because the produced configurations can serve as input to local improvement algorithms that allow for protein flexibility. We apply the algorithm to a diverse set of protein complexes from the Protein Data Bank and demonstrate its effectiveness for both bound and unbound protein docking problems.
Subject(s)
Proteins/chemistry , Algorithms , Amino Acid Sequence , Computational Biology/methods , Models, Molecular , Protein Conformation , Sensitivity and Specificity , Sequence Alignment , Software
ABSTRACT
Identification and size characterization of surface pockets and occluded cavities are initial steps in protein structure-based ligand design. A new program, CAST, for automatically locating and measuring protein pockets and cavities, is based on precise computational geometry methods, including alpha shape and discrete flow theory. CAST identifies and measures pockets and pocket mouth openings, as well as cavities. The program specifies the atoms lining pockets, pocket openings, and buried cavities; the volume and area of pockets and cavities; and the area and circumference of mouth openings. CAST analysis of over 100 proteins has been carried out; proteins examined include a set of 51 monomeric enzyme-ligand structures, several elastase-inhibitor complexes, the FK506 binding protein, 30 HIV-1 protease-inhibitor complexes, and a number of small and large protein inhibitors. Medium-sized globular proteins typically have 10-20 pockets/cavities. Most often, binding sites are pockets with 1-2 mouth openings; much less frequently they are cavities. Ligand binding pockets vary widely in size, most within the range 10²-10³ Å³. Statistical analysis reveals that the number of pockets and cavities is correlated with protein size, but there is no correlation between the size of the protein and the size of binding sites. Most frequently, the largest pocket/cavity is the active site, but there are a number of instructive exceptions. Ligand volume and binding site volume are somewhat correlated when binding site volume is ≤700 Å³, but the ligand seldom occupies the entire site. Auxiliary pockets near the active site have been suggested as additional binding surface for designed ligands (Mattos C et al., 1994, Nat Struct Biol 1:55-58). Analysis of elastase-inhibitor complexes suggests that CAST can identify ancillary pockets suitable for recruitment in ligand design strategies.
Analysis of the FK506 binding protein, and of compounds developed in SAR by NMR (Shuker SB et al., 1996, Science 274:1531-1534), indicates that CAST pocket computation may provide a priori identification of target proteins for linked-fragment design. CAST analysis of 30 HIV-1 protease-inhibitor complexes shows that the flexible active site pocket can vary over a range of 853-1,566 Å³, and that there are two pockets near or adjoining the active site that may be recruited for ligand design.
Subject(s)
Proteins/chemistry , Algorithms , Bacterial Proteins/chemistry , Binding Sites , Enzymes/chemistry , Ligands , Models, Molecular , Protein Structure, Tertiary , Software
ABSTRACT
The size and shape of macromolecules such as proteins and nucleic acids play an important role in their functions. Prior efforts to quantify these properties have been based on various discretization or tessellation procedures involving analytical or numerical computations. In this article, we present an analytically exact method for computing the metric properties of macromolecules based on alpha shape theory. This method uses the duality between the alpha complex and the weighted Voronoi decomposition of a molecule. We describe the intuitive ideas and concepts behind alpha shape theory and the algorithm for computing areas and volumes of macromolecules. We apply our method to compute areas and volumes of a number of protein systems. We also discuss several difficulties commonly encountered in molecular shape computations and outline methods to overcome these problems.
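As a minimal illustration of what "analytically exact" means here, the sketch below computes the volume of a union of two atoms (balls) by inclusion-exclusion, using the closed-form volume of the lens where two spheres overlap. Alpha-shape-based methods generalize this idea to many atoms by restricting the inclusion-exclusion terms to the simplices of the alpha complex; that generalization is not shown. The function names `ball_volume`, `lens_volume`, and `union_volume_two_balls` are hypothetical and are not taken from the article.

```python
import math


def ball_volume(r):
    """Volume of a ball of radius r."""
    return 4.0 / 3.0 * math.pi * r ** 3


def lens_volume(r1, r2, d):
    """Exact volume of the intersection of two balls whose centers are d apart."""
    if d >= r1 + r2:
        # The balls are disjoint (or touch at a single point).
        return 0.0
    if d <= abs(r1 - r2):
        # One ball lies entirely inside the other.
        return ball_volume(min(r1, r2))
    # Closed-form sphere-sphere intersection volume.
    return (math.pi * (r1 + r2 - d) ** 2
            * (d ** 2 + 2 * d * (r1 + r2) - 3 * (r1 - r2) ** 2)
            / (12 * d))


def union_volume_two_balls(r1, r2, d):
    """Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|."""
    return ball_volume(r1) + ball_volume(r2) - lens_volume(r1, r2, d)


if __name__ == "__main__":
    # Two unit balls whose centers are one radius apart.
    print(union_volume_two_balls(1.0, 1.0, 1.0))
```

No grid or sampling is involved, so the result has no discretization error; with more than two atoms, higher-order overlap terms are needed as well.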
Subject(s)
Algorithms , Nucleic Acid Conformation , Protein Conformation , Macromolecular Substances , Mathematical Computing , Models, Molecular
ABSTRACT
The structures of proteins are well-packed, yet they contain numerous cavities which play key roles in accommodating small molecules, or enabling conformational changes. From high-resolution structures it is possible to identify these cavities. We have developed a precise algorithm based on alpha shapes for measuring space-filling-based molecular models (such as van der Waals, solvent accessible, and molecular surface descriptions). We applied this method for accurate computation of the surface area and volume of cavities in several proteins. In addition, all of the atoms/residues lining the cavities are identified. We use this method to study the structure and the stability of proteins, as well as to locate cavities that could contain structural water molecules in the proton transport pathway in the membrane protein bacteriorhodopsin.
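To make the notion of a cavity concrete, the following is a rough two-dimensional sketch that finds empty space fully enclosed by a set of disks. It deliberately swaps in a different and much cruder technique than the alpha-shape algorithm described above: a grid flood fill, which is approximate and depends on the grid step. The function name `cavity_area` and the parameters `step` and `margin` are assumptions made for this illustration.

```python
from collections import deque
import math


def cavity_area(disks, step=0.25, margin=1.0):
    """Approximate the enclosed empty area of a union of 2D disks (x, y, r).

    Grid flood fill, NOT the alpha-shape method: empty cells reachable from
    the grid border count as outside; the remaining empty cells are cavity.
    """
    xmin = min(x - r for x, y, r in disks) - margin
    xmax = max(x + r for x, y, r in disks) + margin
    ymin = min(y - r for x, y, r in disks) - margin
    ymax = max(y + r for x, y, r in disks) + margin
    nx = int(math.ceil((xmax - xmin) / step))
    ny = int(math.ceil((ymax - ymin) / step))

    def is_empty(i, j):
        # A cell is empty if its center lies in no disk.
        cx = xmin + (i + 0.5) * step
        cy = ymin + (j + 0.5) * step
        return all((cx - x) ** 2 + (cy - y) ** 2 > r * r for x, y, r in disks)

    empty = [[is_empty(i, j) for j in range(ny)] for i in range(nx)]
    outside = [[False] * ny for _ in range(nx)]

    # Seed the flood fill with every empty cell on the grid border.
    queue = deque()
    for i in range(nx):
        for j in range(ny):
            if (i in (0, nx - 1) or j in (0, ny - 1)) and empty[i][j]:
                outside[i][j] = True
                queue.append((i, j))

    # Breadth-first search over 4-connected empty cells.
    while queue:
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < nx and 0 <= b < ny and empty[a][b] and not outside[a][b]:
                outside[a][b] = True
                queue.append((a, b))

    enclosed = sum(1 for i in range(nx) for j in range(ny)
                   if empty[i][j] and not outside[i][j])
    return enclosed * step * step


if __name__ == "__main__":
    # Eight overlapping unit disks arranged in a ring enclose a central void.
    ring = [(2 * math.cos(k * math.pi / 4), 2 * math.sin(k * math.pi / 4), 1.0)
            for k in range(8)]
    print(cavity_area(ring))
```

A single disk encloses nothing and yields zero, while the ring yields a positive area. Unlike the analytical approach in the abstract, the value changes with `step`, which is the kind of discretization error an exact method avoids.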
Subject(s)
Algorithms , Protein Conformation , Bacteriorhodopsins/chemistry , Macromolecular Substances , Mathematical Computing , Models, Molecular , Myoglobin/chemistry , Ribonucleases/chemistry
ABSTRACT
The shape of a protein is important for its functions. This includes the location and size of identifiable regions in its complement space. We formally define pockets as regions in the complement with limited accessibility from the outside. Pockets can be efficiently constructed by an algorithm based on alpha complexes. The algorithm is implemented and applied to proteins with known three-dimensional conformations.