Results 1 - 18 of 18
1.
medRxiv ; 2020 Dec 15.
Article in English | MEDLINE | ID: mdl-33354685

ABSTRACT

Disease dynamics, human mobility, and public policies co-evolve during a pandemic such as COVID-19. Understanding dynamic human mobility changes and spatial interaction patterns is crucial for understanding and forecasting COVID-19 dynamics. We introduce a novel graph-based neural network (GNN) to incorporate global aggregated mobility flows for a better understanding of the impact of human mobility on COVID-19 dynamics as well as better forecasting of disease dynamics. We propose a recurrent message passing graph neural network that embeds spatio-temporal disease dynamics and human mobility dynamics for daily state-level new confirmed cases forecasting. This work represents one of the early papers on the use of GNNs to forecast COVID-19 incidence dynamics and our methods are competitive with existing methods. We show that the spatial and temporal dynamic mobility graph leveraged by the graph neural network enables better long-term forecasting performance compared to baselines.

2.
medRxiv ; 2020 Aug 22.
Article in English | MEDLINE | ID: mdl-32577671

ABSTRACT

This work quantifies mobility changes observed during the different phases of the pandemic world-wide at multiple resolutions -- county, state, country -- using an anonymized aggregate mobility map that captures population flows between geographic cells of size 5 km². As we overlay the global mobility map with epidemic incidence curves and dates of government interventions, we observe that as case counts rose, mobility fell and has since then seen a slow but steady increase in flows. Further, in order to understand mixing within a region, we propose a new metric to quantify the effect of social distancing on the basis of mobility. Taking two very different countries sampled from the global spectrum, we analyze in detail the mobility patterns of the United States (US) and India. We then carry out a counterfactual analysis of delaying the lockdown and show that a one week delay would have doubled the reported number of cases in the US and India. Finally, we quantify the effect of college students returning to school for the fall semester on COVID-19 dynamics in the surrounding community. We employ the data from a recent university outbreak (reported on August 16, 2020) to infer possible R_eff values and mobility flows combined with daily prevalence data and census data to obtain an estimate of new cases that might arrive on a college campus. We find that maintaining social distancing at existing levels would be effective in mitigating the extra seeding of cases. However, potential behavioral change and increased social interaction amongst students (30% increase in R_eff) along with extra seeding can increase the number of cases by 20% over a period of one month in the encompassing county. To our knowledge, this work is the first to model, in near real time, the interplay of human mobility, epidemic dynamics and public policies across multiple spatial resolutions and at a global scale.

3.
ArXiv ; 2020 Dec 23.
Article in English | MEDLINE | ID: mdl-33398245

ABSTRACT

The Mumbai Suburban Railways, known as locals, are a key transit infrastructure of the city and are crucial for resuming normal economic activity. Due to high density during transit, the potential risk of disease transmission is high, and the government has taken a wait-and-see approach to resuming normal operations. To reduce disease transmission, policymakers can enforce reduced crowding and mandate wearing of masks. Cohorting, that is, forming groups of travelers that always travel together, is an additional policy to reduce disease transmission on locals without severe restrictions. Cohorting allows us to: (i) form traveler bubbles, thereby decreasing the number of distinct interactions over time; (ii) potentially quarantine an entire cohort if a single case is detected, making contact tracing more efficient; and (iii) target cohorts for testing and early detection of symptomatic as well as asymptomatic cases. Studying the impact of cohorts using compartmental models is challenging because of the ensuing representational complexity. Agent-based models provide a natural way to represent cohorts along with the representation of the cohort members with the larger social network. This paper describes a novel multi-scale agent-based model to study the impact of cohorting strategies on COVID-19 dynamics in Mumbai. We achieve this by modeling the Mumbai urban region using a detailed agent-based model comprising 12.4 million agents. Individual cohorts and their inter-cohort interactions as they travel on locals are modeled using local mean field approximations. The resulting multi-scale model in conjunction with a detailed disease transmission and intervention simulator is used to assess various cohorting strategies. The results provide a quantitative trade-off between cohort size and its impact on disease dynamics and well being. The results show that cohorts can provide significant benefit in terms of reduced transmission without significantly impacting ridership and/or economic and social activity.

4.
Bioinformatics ; 28(17): 2285-7, 2012 Sep 01.
Article in English | MEDLINE | ID: mdl-22789588

ABSTRACT

MOTIVATION: The exponential growth of scientific literature has resulted in a massive amount of unstructured natural language data that cannot be directly handled by means of bioinformatics tools. Such tools generally require structured data, often generated through a cumbersome process of manual literature curation. Herein, we present MyMiner, a free and user-friendly text annotation tool aimed to assist in carrying out the main biocuration tasks and to provide labelled data for the development of text mining systems. MyMiner allows easy classification and labelling of textual data according to user-specified classes as well as predefined biological entities. The usefulness and efficiency of this application have been tested for a range of real-life annotation scenarios of various research topics. AVAILABILITY: http://myminer.armi.monash.edu.au.


Subject(s)
Data Mining , Software , Information Storage and Retrieval/methods , Internet
5.
J Mol Biol ; 416(3): 346-66, 2012 Feb 24.
Article in English | MEDLINE | ID: mdl-22244851

ABSTRACT

GNL1, a putative nucleolar GTPase, belongs to the MMR1-HSR1 family of large GTPases that are emerging as crucial coordinators of signaling cascades in different cellular compartments. Members of this family share very closely related G-domains, but the signals and pathways regulating their subcellular localization with respect to cell growth remain unknown. To understand the nuclear transport mechanism of GNL1, we have identified a novel arginine/lysine-rich nucleolar localization signal in the NH(2)-terminus that is shown to translocate GNL1 and a heterologous protein to the nucleus/nucleolus in a pathway that is independent of importin-α and importin-β. In addition, the present investigation provided evidence that GNL1 localized to the nucleus and the nucleolus only in G2 stage, in contrast to its cytoplasmic localization in the G1 and S phases of the cell cycle. Using heterokaryon assay, we have demonstrated that GNL1 shuttles between the nucleus and the cytoplasm and that the motif between amino acids 201 and 225 is essential for its export from the nucleus by a signal-mediated CRM1-independent pathway. Alanine-scanning mutagenesis of conserved residues within G-domains suggests that the G2 motif is critical for guanine nucleotide triphosphate (GTP) binding of GNL1 and further showed that nucleolar retention of GNL1 is regulated by a GTP-gating-mediated mechanism. Expression of wild-type GNL1 promotes G2/M transition, in contrast to the G-domain mutant (G2m), which fails to localize to the nucleolus. These data suggest that nucleolar translocation during G2 phase may be critical for faster M-phase transition during cell proliferation. Replacement of conserved residues within the G5 motif alters the stability of GNL1 without changing GTP binding activity. Finally, our data suggest that ongoing transcription is essential for the efficient localization of GNL1 to the nucleolus. Overall, the results reported here demonstrate that multiple mechanisms are involved in the translocation of GNL1 to the nucleolus in a cell cycle-dependent manner to regulate cell growth and proliferation.


Subject(s)
Cell Cycle , Cell Nucleolus/enzymology , GTP-Binding Proteins/metabolism , Active Transport, Cell Nucleus , Amino Acid Sequence , Arginine/chemistry , Cell Line , GTP-Binding Proteins/chemistry , Humans , Lysine/chemistry , Molecular Sequence Data , Nuclear Localization Signals/chemistry , Nuclear Localization Signals/metabolism , Protein Structure, Tertiary
6.
BMC Bioinformatics ; 12 Suppl 8: S3, 2011 Oct 03.
Article in English | MEDLINE | ID: mdl-22151929

ABSTRACT

BACKGROUND: Determining usefulness of biomedical text mining systems requires realistic task definition and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, trying to address aspects including how the end user would oversee the generated output, for instance by providing ranked results, textual evidence for human interpretation or measuring time savings by using automated systems. Detecting articles describing complex biological events like PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. Therefore the BCIII-ACT corpus was provided, which includes a training, development and test set of over 12,000 PPI relevant and non-relevant PubMed abstracts labeled manually by domain experts and recording also the human classification times. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full text articles and interaction detection method ontology concepts that had been applied to detect the PPIs reported in them. RESULTS: A total of 11 teams participated in at least one of the two PPI tasks (10 in ACT and 8 in the IMT) and a total of 62 persons were involved either as participants or in preparing data sets/evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. From the 52 runs submitted for the ACT, the highest Matthews Correlation Coefficient (MCC) score measured was 0.55 at an accuracy of 89% and the best AUC iP/R was 68%. Most ACT teams explored machine learning methods, some of them also used lexical resources like MeSH terms, PSI-MI concepts or particular lists of verbs and nouns, some integrated NER approaches. For the IMT, a total of 42 runs were evaluated by comparing systems against manually generated annotations done by curators from the BioGRID and MINT databases. The highest AUC iP/R achieved by any run was 53%, the best MCC score 0.55. In case of competitive systems with an acceptable recall (above 35%) the macro-averaged precision ranged between 50% and 80%, with a maximum F-Score of 55%. CONCLUSIONS: The results of the ACT task of BioCreative III indicate that classification of large unbalanced article collections reflecting the real class imbalance is still challenging. Nevertheless, text-mining tools that report ranked lists of relevant articles for manual selection can potentially reduce the time needed to identify half of the relevant articles to less than 1/4 of the time when compared to unranked results. Detecting associations between full text articles and interaction detection method PSI-MI terms (IMT) is more difficult than might be anticipated. This is due to the variability of method term mentions, errors resulting from pre-processing of articles provided as PDF files, and the heterogeneity and different granularity of method term concepts encountered in the ontology. However, combining the sophisticated techniques developed by the participants with supporting evidence strings derived from the articles for human interpretation could result in practical modules for biological annotation workflows.
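
As an illustration of why an evaluation on unbalanced collections might report MCC alongside accuracy, the following minimal sketch computes both from confusion-matrix counts. The function names and the counts are hypothetical and are not taken from the BioCreative III evaluation; they only show how a classifier can reach high accuracy on a skewed collection while its MCC stays at zero.

```python
import math


def accuracy(tp, fp, tn, fn):
    """Fraction of all items that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here), e.g. when one class is never predicted.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical unbalanced collection: 100 relevant and 900 non-relevant abstracts.
# A trivial classifier that labels everything "non-relevant":
trivial = dict(tp=0, fp=0, tn=900, fn=100)
print(accuracy(**trivial))           # 0.9
print(matthews_corrcoef(**trivial))  # 0.0

# A hypothetical classifier that actually finds half of the relevant abstracts:
useful = dict(tp=50, fp=40, tn=860, fn=50)
print(accuracy(**useful), matthews_corrcoef(**useful))
```

The two classifiers have nearly the same accuracy, but only the second has a clearly positive MCC, which is the kind of distinction a metric for imbalanced data is meant to expose.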


Subject(s)
Algorithms , Data Mining , Proteins/metabolism , Animals , Databases, Protein , Humans , Periodicals as Topic , PubMed
7.
J Mol Biol ; 410(4): 681-97, 2011 Jul 22.
Article in English | MEDLINE | ID: mdl-21762808

ABSTRACT

The characteristic event that follows infection of a cell by retroviruses including human immunodeficiency virus (HIV)/simian immunodeficiency virus (SIV) is the formation of a reverse transcription complex in which viral nucleic acids are synthesized. Nuclear transport of newly synthesized viral DNA requires phosphorylation of proteins in the reverse transcription complex by virion-associated cellular kinases. Recently, we demonstrated that disruption of cellular mitogen-activated protein kinase (MAPK)/extracellular signal-regulated kinase 2 (ERK-2) incorporation into SIV virions inhibits virus replication in nonproliferating target cells, indicating that MAPK/ERK-2 plays an important role in HIV/SIV replication. The mechanism of incorporation of MAPK/ERK-2 into virus particles is not defined. In this regard, we hypothesized that a likely interaction of MAPK/ERK-2 with Gag(p55) may enable its packaging into virus particles. In the present investigation, we provided evidence for the first time that MAPK/ERK-2 interacts with the structural Gag polyprotein p55 using a combination of mutagenesis and protein-protein interaction analysis. We further show that MAPK/ERK-2 interacts specifically with the poly-proline motif present in the capsid region of Gag(p55). Utilizing virus-like particles directed by Gag, we have shown that the exchange of conserved proline residues within capsid of Gag(p55) resulted in impaired incorporation of MAPK/ERK-2. In addition, the deletion of a domain comprising amino acids 201 to 255 within host cell MAPK/ERK-2 abrogates its interaction with Gag(p55). The relevance of the poly-proline motif is further evident by its conservation in diverse retroviruses, as noted from the sequence analysis and structural modeling studies of predicted amino acid sequences of the corresponding Gag proteins. Collectively, these data suggest that the interaction of MAPK/ERK-2 with Gag polyprotein results in its incorporation into virus particles and may be essential for retroviral replication.


Subject(s)
Capsid/chemistry , Gene Products, gag/chemistry , Gene Products, gag/metabolism , Lentivirus/metabolism , Mitogen-Activated Protein Kinase 1/metabolism , Proline-Rich Protein Domains , Virion/metabolism , Amino Acid Sequence , Conserved Sequence/genetics , DNA Replication , HIV-1/physiology , HeLa Cells , Humans , Molecular Sequence Data , Mutation/genetics , Protein Binding , Simian Immunodeficiency Virus/physiology , Structure-Activity Relationship , Virus Assembly/physiology
8.
Bioinformatics ; 27(13): i61-8, 2011 Jul 01.
Article in English | MEDLINE | ID: mdl-21685102

ABSTRACT

MOTIVATION: With rapidly expanding protein structure databases, efficiently retrieving structures similar to a given protein is an important problem. It involves two major issues: (i) effective protein structure representation that captures inherent relationship between fragments and facilitates efficient comparison between the structures and (ii) effective framework to address different retrieval requirements. Recently, researchers proposed a vector space model of proteins using a bag of fragments representation (FragBag), which corresponds to the basic information retrieval model. RESULTS: In this article, we propose an improved representation of protein structures using a latent Dirichlet allocation topic model. Another important requirement is to retrieve proteins, whether they are close or remote homologs. In order to meet diverse objectives, we propose a multi-viewpoint based framework that combines multiple representations and retrieval techniques. We compare the proposed representation and retrieval framework on the benchmark dataset developed by Kolodny and co-workers. The results indicate that the proposed techniques outperform state-of-the-art methods. AVAILABILITY: http://www.cse.iitm.ac.in/~ashishvt/research/protein-lda/. CONTACT: ashishvt@cse.iitm.ac.in.
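
The bag-of-fragments idea mentioned above can be sketched roughly as follows: each protein is reduced to counts of library fragment identifiers, and proteins are ranked against a query by a vector-space similarity. This is only an illustrative sketch, not the FragBag or LDA implementation; the fragment IDs, protein names, and the choice of cosine similarity are assumptions made for the example.

```python
import math
from collections import Counter


def bag_of_fragments(fragment_ids):
    """Represent a protein as counts of the library fragments it contains."""
    return Counter(fragment_ids)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, database):
    """Return database keys ordered from most to least similar to the query."""
    return sorted(
        database,
        key=lambda name: cosine_similarity(query, database[name]),
        reverse=True,
    )


# Hypothetical fragment assignments (integers stand for library fragment IDs).
query = bag_of_fragments([1, 1, 2, 3])
database = {
    "protein_A": bag_of_fragments([1, 1, 2, 3, 3]),
    "protein_B": bag_of_fragments([4, 5, 5, 6]),
    "protein_C": bag_of_fragments([1, 2, 7]),
}
print(rank_by_similarity(query, database))
```

A topic model such as latent Dirichlet allocation would replace the raw count vectors with lower-dimensional topic mixtures before the comparison step; that part is not shown here.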


Subject(s)
Proteins/chemistry , Structural Homology, Protein , Algorithms , Databases, Protein , Information Storage and Retrieval , Models, Molecular , Natural Language Processing
9.
Bioinformation ; 5(8): 341-9, 2011 Jan 22.
Article in English | MEDLINE | ID: mdl-21383922

ABSTRACT

The functional sites were predicted for Nudix enzymes from pathogenic microorganisms such as Streptococcus pneumoniae (2B06) and Enterococcus faecalis (2AZW). Their structures are already determined; however, no data is reported about their functional sites, substrates and inhibitors. Therefore, we report prediction of functional sites in these Nudix enzymes via Geometric Invariant (GI) technique (Construct different geometries of peptides which remain unchanged). The GI method enumerated 2B06: RA57, EA58, EA61, EA62 and 2AZW: RA62, EA63, EA66, EA67 as putative functional sites in these Nudix enzymes. In addition, the substrate was predicted via Molecular docking (Docking of substrates against whole structure of Nudix enzymes). The substrate ADP-Ribose was docked with the Nudix enzymes, 2B06 (Docking energy -15.68 kcal/mol) and 2AZW (Docking energy -10.86 kcal/mol) with the higher affinity and the lower docking energy as compared to other substrates. The residues EA62 in 2B06 and RA62 in 2AZW make hydrogen bonds with the ADP-ribose. Furthermore, we screened 51 inhibitor compounds against structures of 2B06 and 2AZW. The inhibitor compounds AMPCPR and CID14258187 were docked well as compared to other compounds. The compound CID14258187 was also in agreement with Lipinski rule of 5 for drug likeness properties. Therefore, our findings of functional sites, substrates and inhibitors for these Nudix enzymes may help in structure based drug designing against Streptococcus pneumoniae and Enterococcus faecalis.

10.
BMC Bioinformatics ; 12 Suppl 1: S20, 2011 Feb 15.
Article in English | MEDLINE | ID: mdl-21342550

ABSTRACT

BACKGROUND: The structure conservation in various α-helix subclasses reveals the sequence and context dependent factors causing distortions in the α-helix. The sequence-structure relationship in these subclasses can be used to predict structural variations in α-helix purely based on its sequence. We train a support vector machine (SVM) with a dot product kernel function to discriminate between regular α-helix and non-regular α-helices purely based on the sequences, which are represented with various overall and position specific propensities of amino acids. RESULTS: We characterize the structural distortions in five α-helix subclasses. The sequence structure correlation in the subclasses reveals that the increased propensity of proline, histidine, serine, aspartic acid and aromatic amino acids is responsible for the distortions in regular α-helix. The N-terminus of regular α-helix prefers neutral and acidic polar amino acids, while the C-terminus prefers basic polar amino acid. Proline is preferred in the first turn of regular α-helix, while it is preferred to produce kinked and curved subclasses. The SVM discriminates between regular α-helix and the rest with precision of 80.97% and recall of 88.05%. CONCLUSIONS: The correlation between structural variation in helices and their sequences is manifested by the performance of SVM based on sequence features. The results presented here are useful for computational design of helices. The results are also useful for prediction of structural perturbations in helix sequence purely based on its sequence.


Subject(s)
Amino Acids/chemistry , Protein Structure, Secondary , Proteins/chemistry , Software , Algorithms , Computational Biology/methods
11.
PLoS One ; 5(3): e9679, 2010 Mar 18.
Article in English | MEDLINE | ID: mdl-20305778

ABSTRACT

BACKGROUND: FragKB (Fragment Knowledgebase) is a repository of clusters of structurally similar fragments from proteins. Fragments are annotated with information at the level of sequence, structure and function, integrating biological descriptions derived from multiple existing resources and text mining. METHODOLOGY: FragKB contains approximately 400,000 conserved fragments from 4,800 representative proteins from PDB. Literature annotations are extracted from more than 1,700 articles and are available for over 12,000 fragments. The underlying systematic annotation workflow of FragKB ensures efficient update and maintenance of this database. The information in FragKB can be accessed through a web interface that facilitates sequence and structural visualization of fragments together with known literature information on the consequences of specific residue mutations and functional annotations of proteins and fragment clusters. FragKB is accessible online at http://ubio.bioinfo.cnio.es/biotools/fragkb/. SIGNIFICANCE: The information presented in FragKB can be used for modeling protein structures, for designing novel proteins and for functional characterization of related fragments. The current release is focused on functional characterization of proteins through inspection of conservation of the fragments.


Subject(s)
Computational Biology/methods , Peptides/chemistry , Animals , Cluster Analysis , Data Mining , Databases, Protein , Humans , Information Storage and Retrieval , Internet , Protein Conformation , Proteins/chemistry , Software
12.
Nucleic Acids Res ; 37(Web Server issue): W160-5, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19520768

ABSTRACT

There is an increasing interest in using literature mining techniques to complement information extracted from annotation databases or generated by bioinformatics applications. Here we present PLAN2L, a web-based online search system that integrates text mining and information extraction techniques to access systematically information useful for analyzing genetic, cellular and molecular aspects of the plant model organism Arabidopsis thaliana. Our system facilitates a more efficient retrieval of information relevant to heterogeneous biological topics, from implications in biological relationships at the level of protein interactions and gene regulation, to sub-cellular locations of gene products and associations to cellular and developmental processes, i.e. cell cycle, flowering, root, leaf and seed development. Beyond single entities, also predefined pairs of entities can be provided as queries for which literature-derived relations together with textual evidences are returned. PLAN2L does not require registration and is freely accessible at http://zope.bioinfo.cnio.es/plan2l.


Subject(s)
Arabidopsis/physiology , Information Storage and Retrieval/methods , Software , AGAMOUS Protein, Arabidopsis/metabolism , Arabidopsis/genetics , Arabidopsis/growth & development , Arabidopsis Proteins/metabolism , Internet , Systems Integration , Transcription Factors/metabolism
13.
J Biosci ; 32(5): 899-908, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17914232

ABSTRACT

The classical approaches for protein structure prediction rely either on homology of the protein sequence with a template structure or on ab initio calculations for energy minimization. These methods suffer from disadvantages such as the lack of availability of homologous template structures or intractably large conformational search space, respectively. The recently proposed fragment library based approaches first predict the local structures, which can be used in conjunction with the classical approaches of protein structure prediction. The accuracy of the predictions is dependent on the quality of the fragment library. In this work, we have constructed a library of local conformation classes purely based on geometric similarity. The local conformations are represented using Geometric Invariants, properties that remain unchanged under transformations such as translation and rotation, followed by dimension reduction via principal component analysis. The local conformations are then modeled as a mixture of Gaussian probability distribution functions (PDF). Each one of the Gaussian PDFs corresponds to a conformational class with the centroid representing the average structure of that class. We find 46 classes when we use an octapeptide as a unit of local conformation. The protein 3-D structure can now be described as a sequence of local conformational classes. Further, it was of interest to see whether the local conformations can be predicted from the amino acid sequences. To that end, we have analyzed the correlation between sequence features and the conformational classes.
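
To make the notion of a geometric invariant concrete, the sketch below uses pairwise distances between backbone points as one simple example of a quantity that is unchanged by translation and rotation. The abstract does not state which invariants the authors used, so the choice of pairwise distances, the helper names, and the coordinates are all assumptions made for illustration.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """All inter-point distances, in a fixed order; invariant to rigid motion."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def rotate_z_then_translate(points, angle, shift):
    """Rotate points about the z-axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in points
    ]


# Made-up coordinates standing in for four consecutive backbone atoms.
fragment = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.5), (8.1, 4.2, 2.4)]
moved = rotate_z_then_translate(fragment, angle=1.1, shift=(10.0, -4.0, 2.5))

before = pairwise_distances(fragment)
after = pairwise_distances(moved)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after)))
```

Because the feature vector is the same before and after the rigid motion, fragments can be compared or clustered on these features without first superimposing their structures.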


Subject(s)
Models, Chemical , Normal Distribution , Protein Conformation , Sequence Analysis, Protein , Amino Acid Sequence , Computational Biology/methods , Computer Simulation , Molecular Sequence Data , Oligopeptides/chemistry , Peptide Library , Sequence Analysis, Protein/methods
14.
Pac Symp Biocomput ; : 291-302, 2006.
Article in English | MEDLINE | ID: mdl-17094247

ABSTRACT

Classification of helical structures and identification of class specific sequence features is of interest for protein structure modeling. We use a geometric invariant-based method to first select helix-like local conformations. These conformations are mapped in a principal component space and subjected to Gaussian mixture modeling. The largest Gaussian corresponds to the regular alpha-helix. Kinked helix and curved helix appear as separate Gaussians. Class conditional, position specific amino acid propensity analysis reveals striking difference among the three classes. In regular helix, proline propensity is significant only in the beginning and low in the rest of the region regardless of length of the helix. In kinked helix, the proline propensity has a sharp peak at the helix center, while in the curved helix, the proline propensity has a broad peak in the middle region.


Subject(s)
Proteins/chemistry , Amino Acids/chemistry , Computational Biology , Databases, Protein , Genetic Variation , Models, Molecular , Oligopeptides/chemistry , Oligopeptides/genetics , Protein Structure, Secondary , Proteins/genetics
15.
Bioinformatics ; 21(18): 3622-8, 2005 Sep 15.
Article in English | MEDLINE | ID: mdl-16096349

ABSTRACT

MOTIVATION: Characterization of the restricted nature of the protein local conformational space has remained a challenge, thereby necessitating a computationally expensive conformational search in protein modeling. Moreover, owing to the lack of unilateral structural descriptors, conventional data mining techniques, such as clustering and classification, have not been applied in protein structure analysis. RESULTS: We first map the local conformations in a fixed dimensional space by using a carefully selected suite of geometric invariants (GIs) and then reduce the number of dimensions via principal component analysis (PCA). Distribution of the conformations in the space spanned by the first four PCs is visualized as a set of conditional bivariate probability distribution plots, where the peaks correspond to the preferred conformations. The locations of the different canonical structures in the PC-space have been interpreted in the context of the weights of the GIs to the first four PCs. Clustering of the available conformations reveals that the number of preferred local conformations is several orders of magnitude smaller than that suggested previously. SUPPLEMENTARY INFORMATION: www.it.iitb.ac.in/~ashish/bioinfo2005/.


Subject(s)
Computational Biology/methods , Protein Conformation , Sequence Analysis, Protein , Algorithms , Cluster Analysis , Computer Simulation , Internet , Models, Theoretical , Molecular Conformation , Peptides/chemistry , Principal Component Analysis , Probability , Sequence Alignment
16.
J Mol Biol ; 338(3): 611-29, 2004 Apr 30.
Article in English | MEDLINE | ID: mdl-15081817

ABSTRACT

Structures of peptide fragments drawn from a protein can potentially occupy a vast conformational continuum. We co-ordinatize this conformational space with the help of geometric invariants and demonstrate that the peptide conformations of the currently available protein structures are heavily biased in favor of a finite number of conformational types or structural building blocks. This is achieved by representing a peptide's backbone structure with geometric invariants and then clustering peptides based on closeness of the geometric invariants. This results in 12,903 clusters, of which 2207 are made up of peptides drawn from functionally and/or structurally related proteins. These are termed "functional" clusters and provide clues about potential functional sites. The rest of the clusters, including the largest few, are made up of peptides drawn from unrelated proteins and are termed "structural" clusters. The largest clusters are of regular secondary structures such as helices and beta strands as well as of beta hairpins. Several categories of helices and strands are discovered based on geometric differences. In addition to the known classes of loops, we discover several new classes, which will be useful in protein structure modeling. Our algorithm does not require assignment of secondary structure and, therefore, overcomes the limitations in loop classification due to ambiguity in secondary structure assignment at loop boundaries.


Subject(s)
Peptides/chemistry , Proteins/genetics , Algorithms , Computational Biology , Evolution, Molecular , Peptides/classification , Peptides/metabolism , Phylogeny , Protein Structure, Secondary , Proteins/classification , Proteins/metabolism
17.
J Mol Biol ; 334(1): 157-72, 2003 Nov 14.
Article in English | MEDLINE | ID: mdl-14596807

ABSTRACT

We present a scheme for the classification of 3487 non-redundant protein structures into 1207 non-hierarchical clusters by using recurring structural patterns of three to six amino acids as keys of classification. This results in several signature patterns, which seem to decide membership of a protein in a functional category. The patterns provide clues to the key residues involved in functional sites as well as in protein-protein interaction. The discovered patterns include a "glutamate double bridge" of superoxide dismutase, the functional interface of the serine protease and inhibitor, interface of homo/hetero dimers, and functional sites of several enzyme families. We use geometric invariants to decide superimposability of structural patterns. This allows the parameterization of patterns and discovery of recurring patterns via clustering. The geometric invariant-based approach eliminates the computationally explosive step of pair-wise comparison of structures. The results provide a vast resource for the biologists for experimental validation of the proposed functional sites, and for the design of synthetic enzymes, inhibitors and drugs.


Subject(s)
Protein Structure, Tertiary , Proteins/chemistry , Proteins/classification , Algorithms , Amino Acids , Binding Sites , Evolution, Molecular , Models, Molecular , Models, Theoretical , Proteins/metabolism
18.
J Mol Biol ; 326(3): 955-78, 2003 Feb 21.
Article in English | MEDLINE | ID: mdl-12581652

ABSTRACT

We report a method for detection of recurring side-chain patterns (DRESPAT) using an unbiased and automated graph theoretic approach. We first list all structural patterns as sub-graphs where the protein is represented as a graph. The patterns from proteins are compared pair-wise to detect patterns common to a protein pair based on content and geometry criteria. The recurring pattern is then detected using an automated search algorithm from the all-against-all pair-wise comparison data of proteins. Intra-protein pattern comparison data are used to enable detection of patterns recurring within a protein. A method has been proposed for empirical calculation of statistical significance of recurring pattern. The method was tested on 17 protein sets of varying size, composed of non-redundant representatives from SCOP superfamilies. Recurring patterns in serine proteases, cysteine proteases, lipases, cupredoxin, ferredoxin, ferritin, cytochrome c, aspartoyl proteases, peroxidases, phospholipase A2, endonuclease, SH3 domain, EF-hand and lectins show additional residues conserved in the vicinity of the known functional sites. On the basis of the recurring patterns in ferritin, EF-hand and lectins, we could separate proteins or domains that are structurally similar yet different in metal ion-binding characteristics. In addition, novel recurring patterns were observed in glutathione-S-transferase, phospholipase A2 and ferredoxin with potential structural/functional roles. The results are discussed in relation to the known functional sites in each family. Between 2000 and 50,000 patterns were enumerated from each protein with between ten and 500 patterns detected as common to an evolutionarily related protein pair. Our results show that unbiased extraction of functional site pattern is not feasible from an evolutionarily related protein pair but is feasible from protein sets comprising five or more proteins. The DRESPAT method does not require a user-defined pattern, size or location of the pattern and therefore, has the potential to uncover new functional sites in protein families.


Subject(s)
Proteins/metabolism , Algorithms , Automation , Models, Molecular , Proteins/chemistry