Search | VHL Regional Portal

An approach to functionally relevant clustering of the protein universe: Active site profile-based clustering of protein structures and sequences.

Knutson, Stacy T; Westwood, Brian M; Leuthaeuser, Janelle B; Turner, Brandon E; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D; Harper, Angela F; Brown, Shoshana D; Morris, John H; Ferrin, Thomas E; Babbitt, Patricia C; Fetrow, Jacquelyn S.

Protein Sci ; 26(4): 677-699, 2017 04.

Article in English | MEDLINE | ID: mdl-28054422

ABSTRACT

Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow identification of mechanistic determinants, the amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well-curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results.
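
The divisive loop described in this abstract (propose candidate clusters, accept those that self-identify, keep splitting the rest until only accepted groups and singlets remain) can be sketched as follows. This is a minimal illustration, not the TuLIP implementation: the `self_identifies` and `split` callables are hypothetical placeholders for the DASP2 self-identification check and the profile-based splitting step, the toy data are invented, and the two-level aspect of the method is not modelled.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def divisive_clustering(
    items: Sequence[T],
    self_identifies: Callable[[List[T]], bool],
    split: Callable[[List[T]], List[List[T]]],
) -> Tuple[List[List[T]], List[T]]:
    """Split clusters until each item is in an accepted group or is a singlet."""
    groups: List[List[T]] = []
    singlets: List[T] = []
    pending: List[List[T]] = [list(items)] if items else []
    while pending:
        cluster = pending.pop()
        if len(cluster) == 1:
            singlets.append(cluster[0])      # cannot be divided further
        elif self_identifies(cluster):
            groups.append(cluster)           # accepted as a relevant group
        else:
            parts = [p for p in split(cluster) if p]
            if len(parts) < 2:
                raise ValueError("split() must return at least two non-empty parts")
            pending.extend(parts)
    return groups, singlets


# Toy stand-ins (assumptions for illustration only): each "structure" is
# (name, one numeric feature, known family label).
Structure = Tuple[str, float, str]


def same_family(cluster: List[Structure]) -> bool:
    # Placeholder for a DASP2 self-identification test.
    return len({family for _, _, family in cluster}) == 1


def split_at_largest_gap(cluster: List[Structure]) -> List[List[Structure]]:
    # Placeholder splitter: cut the sorted feature values at the widest gap.
    ordered = sorted(cluster, key=lambda s: s[1])
    gaps = [ordered[i + 1][1] - ordered[i][1] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return [ordered[:cut], ordered[cut:]]


items: List[Structure] = [
    ("a", 1.0, "X"), ("b", 1.2, "X"),
    ("c", 5.0, "Y"), ("d", 5.3, "Y"),
    ("e", 9.0, "Z"),
]
groups, singlets = divisive_clustering(items, same_family, split_at_largest_gap)
```

In this toy run the acceptance test uses known family labels, which a real search-based check would not have; it only serves to make the control flow visible.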

Subject(s)

Databases, Protein , Glutathione Transferase/chemistry , Glutathione Transferase/genetics , Phosphopyruvate Hydratase/chemistry , Phosphopyruvate Hydratase/genetics , Sequence Analysis, Protein/methods

Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity.

Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S.

Protein Sci ; 24(9): 1423-39, 2015 Sep.

Article in English | MEDLINE | ID: mdl-26073648

ABSTRACT

The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding on an appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods.
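
The idea of reading clusters off a similarity network at a single edge threshold can be illustrated with a small sketch. It assumes a generic pairwise score where higher means more similar (an E-value-style metric would need the comparison reversed); the protein names, scores, and the `threshold_clusters` helper are invented for illustration and are not taken from the paper.

```python
from typing import Dict, Iterable, List, Tuple


def threshold_clusters(
    nodes: Iterable[str],
    scores: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Connected components of the network that keeps edges scoring >= threshold."""
    nodes = list(nodes)
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)        # merge the two components

    components: Dict[str, List[str]] = {}
    for n in nodes:
        components.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in components.values())


# Hypothetical similarity scores between four proteins.
nodes = ["p1", "p2", "p3", "p4"]
scores = {("p1", "p2"): 0.9, ("p2", "p3"): 0.4, ("p3", "p4"): 0.8}

strict = threshold_clusters(nodes, scores, threshold=0.5)
loose = threshold_clusters(nodes, scores, threshold=0.3)
```

Running the same function with scores from different edge metrics and comparing the resulting components against curated family labels mirrors, in miniature, the kind of comparison the abstract describes.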

Subject(s)

Molecular Sequence Annotation/methods , Proteins/chemistry , Amino Acid Sequence , Catalytic Domain , Cellular Microenvironment , Cluster Analysis , Computational Biology/methods , Databases, Protein , Molecular Sequence Data , Protein Interaction Maps , Structure-Activity Relationship

Analysis of the peroxiredoxin family: using active-site structure and sequence information for global classification and residue analysis.

Nelson, Kimberly J; Knutson, Stacy T; Soito, Laura; Klomsiri, Chananat; Poole, Leslie B; Fetrow, Jacquelyn S.

Proteins ; 79(3): 947-64, 2011 Mar.

Article in English | MEDLINE | ID: mdl-21287625

ABSTRACT

Peroxiredoxins (Prxs) are a widespread and highly expressed family of cysteine-based peroxidases that react very rapidly with H2O2, organic peroxides, and peroxynitrite. Correct subfamily classification has been problematic because Prx subfamilies are frequently not correlated with phylogenetic distribution and diverge in their preferred reductant, oligomerization state, and tendency toward overoxidation. We have developed a method that uses the Deacon Active Site Profiler (DASP) tool to extract functional-site profiles from structurally characterized proteins to computationally define subfamilies and to identify new Prx subfamily members from GenBank(nr). For the 58 literature-defined Prx test proteins, 57 were correctly assigned, and none were assigned to the incorrect subfamily. The >3500 putative Prx sequences identified were then used to analyze residue conservation in the active site of each Prx subfamily. Our results indicate that the existence and location of the resolving cysteine vary in some subfamilies (e.g., Prx5) to a greater degree than previously appreciated and that interactions at the A interface (common to Prx5, Tpx, and higher order AhpC/Prx1 structures) are important for stabilization of the correct active-site geometry. Interestingly, this method also allows us to further divide the AhpC/Prx1 into four groups that are correlated with functional characteristics. The DASP method provides more accurate subfamily classification than PSI-BLAST for members of the Prx family and can now readily be applied to other large protein families.

Subject(s)

Peroxiredoxins/chemistry , Amino Acid Sequence , Catalytic Domain , Entropy , Models, Molecular , Molecular Sequence Data , Phylogeny , Sequence Homology, Amino Acid

PREX: PeroxiRedoxin classification indEX, a database of subfamily assignments across the diverse peroxiredoxin family.

Soito, Laura; Williamson, Chris; Knutson, Stacy T; Fetrow, Jacquelyn S; Poole, Leslie B; Nelson, Kimberly J.

Nucleic Acids Res ; 39(Database issue): D332-7, 2011 Jan.

Article in English | MEDLINE | ID: mdl-21036863

ABSTRACT

PREX (http://www.csb.wfu.edu/prex/) is a database of currently 3516 peroxiredoxin (Prx or PRDX) protein sequences unambiguously classified into one of six distinct subfamilies. Peroxiredoxins are a diverse and ubiquitous family of highly expressed, cysteine-dependent peroxidases that are important for antioxidant defense and for the regulation of cell signaling pathways in eukaryotes. Subfamily members were identified using the Deacon Active Site Profiler (DASP) bioinformatics tool to focus in on functionally relevant sequence fragments surrounding key residues required for protein activity. Searches of this database can be conducted by protein annotation, accession number, PDB ID, organism name or protein sequence. Output includes the subfamily to which each classified Prx belongs, accession and GI numbers, genus and species and the functional site signature used for classification. The query sequence is also presented aligned with a select group of Prxs for manual evaluation and interpretation by the user. A synopsis of the characteristics of members of each subfamily is also provided along with pertinent references.

Subject(s)

Databases, Protein , Peroxiredoxins/classification , Peroxiredoxins/chemistry , User-Computer Interface

Functional site profiling and electrostatic analysis of cysteines modifiable to cysteine sulfenic acid.

Salsbury, Freddie R; Knutson, Stacy T; Poole, Leslie B; Fetrow, Jacquelyn S.

Protein Sci ; 17(2): 299-312, 2008 Feb.

Article in English | MEDLINE | ID: mdl-18227433

ABSTRACT

Cysteine sulfenic acid (Cys-SOH), a reversible modification, is a catalytic intermediate at enzyme active sites, a sensor for oxidative stress, a regulator of some transcription factors, and a redox-signaling intermediate. This post-translational modification is not random: specific features near the cysteine control its reactivity. To identify features responsible for the propensity of cysteines to be modified to sulfenic acid, a list of 47 proteins (containing 49 known Cys-SOH sites) was compiled. Modifiable cysteines are found in proteins from most structural classes and many functional classes, but have no propensity for any one type of protein secondary structure. To identify features affecting cysteine reactivity, these sites were analyzed using both functional site profiling and electrostatic analysis. Overall, the solvent exposure of modifiable cysteines is not different from the average cysteine. The combined sequence, structure, and electrostatic approaches reveal mechanistic determinants not obvious from overall sequence comparison, including: (1) pKa values of some modifiable cysteines are affected by backbone features only; (2) charged residues are underrepresented in the structure near modifiable sites; (3) threonine and other polar residues can exert a large influence on the cysteine pKa; and (4) hydrogen bonding patterns are suggested to be important. This compilation of Cys-SOH modification sites and their features provides a quantitative assessment of previous observations and a basis for further analysis and prediction of these sites. Agreement with known experimental data indicates the utility of this combined approach for identifying mechanistic determinants at protein functional sites.

Subject(s)

Cysteine/analogs & derivatives , Cysteine/chemistry , Proteins/chemistry , Sulfenic Acids/chemistry , Amino Acid Sequence , Binding Sites , Catalysis , Cysteine/metabolism , Hydrogen Bonding , Proteins/metabolism , Sequence Alignment , Static Electricity , Sulfenic Acids/metabolism

Mutations in alpha-helical solvent-exposed sites of eglin c have long-range effects: evidence from molecular dynamics simulations.

Fetrow, Jacquelyn S; Knutson, Stacy T; Edgell, Marshall Hall.

Proteins ; 63(2): 356-72, 2006 May 01.

Article in English | MEDLINE | ID: mdl-16342264

ABSTRACT

Eglin c is a small protease inhibitor whose structural and thermodynamic properties have been well studied. Previous thermodynamic measurements on mutants at solvent-accessible positions in the protein's helix have shown the unexpected result that the data could be best fit by the inclusion of residue- and position-specific parameters to the model. To explore the origins of this surprising result, long molecular dynamics simulations in explicit solvent have been performed. These simulations indicate specific long-range interactions between the solvent-exposed residues in the eglin c alpha-helix and binding loop, an unexpected observation for such a small protein. The residues involved in the interaction are on opposite sides of the protein, about 25 Å apart. Simulations of alanine substitutions at the solvent-exposed helix positions, arginine 22, glutamic acid 23, threonine 26, and leucine 27, show both small and large perturbations of eglin c dynamics. Two mutations exhibit large impacts on the long-range helix-loop interactions. Previous stability measurements (Yi et al., Biochemistry 2003;42:7594-7603) had indicated that an alanine substitution at position 27 was less stabilizing than at other solvent-exposed positions in the helix. The L27A mutation effects observed in these simulations suggest that the position-dependent loss of stability measured in wet bench experiments is derived from changes in dynamics that involve long-range interactions; thus, these simulations support the hypothesis that solvent-exposed positions in helices are not always equivalent.

Subject(s)

Proteins/chemistry , Computational Biology , Computer Simulation , Models, Molecular , Mutation/genetics , Nuclear Magnetic Resonance, Biomolecular , Protein Structure, Secondary , Protein Structure, Tertiary , Proteins/genetics

Chemical and structural diversity in cyclooxygenase protein active sites.

Huff, Ryan G; Bayram, Ersin; Tan, Huan; Knutson, Stacy T; Knaggs, Michael H; Richon, Allen B; Santago, Peter; Fetrow, Jacquelyn S.

Chem Biodivers ; 2(11): 1533-52, 2005 Nov.

Article in English | MEDLINE | ID: mdl-17191953

ABSTRACT

A major pharmaceutical problem is designing diverse and selective lead compounds. The human genome sequence provides opportunities to discover compounds that are protein selective if we can develop methods to identify specificity determinants from sequence alone. We have analyzed sequence and structural diversity of sheep COX-1 and mouse COX-2 proteins by Active Site Profiling (ASP). Eleven residues that should serve as specificity determinants between COX-1 and COX-2 were identified; however, the literature suggests that only one has been utilized in structure-based discovery. ASP was used to create a position-specific scoring matrix, which was used to identify possible cross-reacting proteins from the human sequences. This method proved selective for cyclooxygenases, comparing well with results using BLAST. The methods identify a probable misannotation of a cyclooxygenase for which BLAST gives high sequence similarity scores, but ASP shows it does not contain the residues necessary for cyclooxygenase function. ASP analysis of human COX proteins suggests that some specificity determinants that distinguish COX-1 and COX-2 proteins are similar between sheep COX-1/mouse COX-2 and human COX-1/COX-2; however, residue identities at those positions are not necessarily conserved. Our results lay groundwork for development of family-specific pattern recognition methods to selectively match compounds with proteins.
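
As a rough illustration of how a position-specific scoring matrix built from aligned active-site fragments can rank candidate sequences, the sketch below uses log-odds scores with a uniform background and additive pseudocounts. These choices, the fragment strings, and the function names `build_pssm` and `score_fragment` are assumptions for illustration; the abstract does not specify how the ASP matrix is computed.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned: Sequence[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log2-odds matrix from equal-length fragments, assuming a uniform background."""
    if not aligned:
        raise ValueError("need at least one fragment")
    length = len(aligned[0])
    if any(len(fragment) != length for fragment in aligned):
        raise ValueError("all fragments must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm: List[Dict[str, float]] = []
    for i in range(length):
        counts = Counter(fragment[i] for fragment in aligned)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_fragment(pssm: List[Dict[str, float]], fragment: str) -> float:
    """Sum the per-position log-odds for a candidate fragment of matching length."""
    if len(fragment) != len(pssm):
        raise ValueError("fragment length must match the matrix")
    return sum(column[aa] for column, aa in zip(pssm, fragment))


# Invented three-residue fragments standing in for an active-site profile.
profile = ["HYS", "HYS", "HFS"]
pssm = build_pssm(profile)
matching = score_fragment(pssm, "HYS")
unrelated = score_fragment(pssm, "AAA")
```

A candidate that scores well by overall sequence similarity but poorly against such a matrix would be the kind of case the abstract flags as a probable misannotation.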

Subject(s)

Cyclooxygenase 1/chemistry , Cyclooxygenase 1/genetics , Cyclooxygenase 2/chemistry , Cyclooxygenase 2/genetics , Membrane Proteins/chemistry , Membrane Proteins/genetics , Amino Acid Sequence , Animals , Binding Sites/physiology , Cyclooxygenase 1/metabolism , Cyclooxygenase 2/metabolism , Cyclooxygenase 2 Inhibitors/chemistry , Cyclooxygenase 2 Inhibitors/metabolism , Cyclooxygenase Inhibitors/chemistry , Cyclooxygenase Inhibitors/metabolism , Humans , Membrane Proteins/metabolism , Mice , Molecular Sequence Data , Sequence Homology, Amino Acid , Sheep, Domestic
