1.
Sci Rep ; 11(1): 8746, 2021 04 22.
Article in English | MEDLINE | ID: mdl-33888741

ABSTRACT

Genome sequencing projects reveal the sequences of all proteins encoded in a genome. As a first step, homology detection is employed to obtain clues to the structure and function of these proteins. However, high evolutionary divergence between homologous proteins challenges our ability to detect distant relationships. In the past, an approach involving multiple Position Specific Scoring Matrices (PSSMs) was found to be more effective than traditional single PSSMs. Cascaded search is another successful approach, in which the hits of a search are themselves used as queries to detect more homologues. We propose a protocol, 'Master Blaster', which combines the principles of these two approaches to further enhance our ability to detect remote homologues. The approach was assessed using known relationships available in the SCOP70 database, and the results were compared against those of PSI-BLAST and HHblits, a hidden Markov model-based method. Compared to PSI-BLAST, Master Blaster gave a 10% improvement in the detection of cross-superfamily connections, nearly 35% improvement in cross-family connections, and more than 80% improvement in intra-family connections. The results showed that HHblits is more sensitive than Master Blaster in detecting remote homologues. However, there are true hits from 46 folds for which Master Blaster reported homologues that HHblits did not report even with optimal parameters, indicating that using multiple methods that combine different approaches can be more effective in detecting remote homologues. Master Blaster stand-alone code is available for download in the supplementary archive.
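The cascaded search idea mentioned in this abstract (hits of one search become the queries of the next) can be illustrated with a minimal sketch. This is not the Master Blaster code: the function `cascaded_search`, the `search` callable, and the toy hit table below are all hypothetical stand-ins, and a real pipeline would call a tool such as PSI-BLAST where `toy_search` appears.

```python
from typing import Callable, Dict, Iterable, List, Set


def cascaded_search(
    query: str,
    search: Callable[[str], Iterable[str]],
    max_rounds: int = 3,
) -> Set[str]:
    """Collect hits by re-querying with every newly found hit.

    `search` is a hypothetical callable that returns the direct hits
    for one query identifier; it stands in for a real search tool.
    """
    found: Set[str] = set()
    frontier: List[str] = [query]
    for _ in range(max_rounds):
        next_frontier: List[str] = []
        for q in frontier:
            for hit in search(q):
                # Skip the original query and anything already collected.
                if hit != query and hit not in found:
                    found.add(hit)
                    next_frontier.append(hit)
        if not next_frontier:
            break  # no new homologues, so further rounds cannot help
        frontier = next_frontier
    return found


# Toy hit table (assumption): each ID maps to the hits a single search returns.
TOY_HITS: Dict[str, List[str]] = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}


def toy_search(seq_id: str) -> List[str]:
    return TOY_HITS.get(seq_id, [])
```

In this toy table a single search from "A" reaches only "B", while later rounds reach "C" and "D" through intermediate hits, which is the effect the cascaded approach relies on.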


Subject(s)
Proteins/chemistry , Sequence Analysis, Protein/methods , Algorithms , Databases, Protein , Position-Specific Scoring Matrices
2.
Gene ; 723: 144134, 2020 Jan 10.
Article in English | MEDLINE | ID: mdl-31589960

ABSTRACT

Viral kinases are known to undergo autophosphorylation and also to phosphorylate viral and host substrates. Viral kinases have been implicated in various diseases and are also known to acquire host kinases to mimic cellular functions and exhibit virulence. Although substantial analyses of the diversity of viral kinases have been reported in the literature, there is a gap in the understanding of sequence and structural similarity among kinases from different classes of viruses. In this study, we performed a comprehensive analysis of protein kinases encoded in viral genomes. Homology search methods were used to identify kinases from 104,282 viral genomic datasets. Serine/threonine and tyrosine kinases were identified in only 390 viral genomes. Of the seven viral classes defined by the nature of their genetic material, only double-stranded DNA viruses and single-stranded RNA retroviruses are found to encode kinases. The 716 identified protein kinases are classified into 63 subfamilies based on their sequence similarity within each cluster, and sequence signatures have been identified for each subfamily. Eleven clusters are well represented, each with at least 10 members. Kinases from the dsDNA viruses Phycodnaviridae, which infect green algae, and Herpesvirales, which infect vertebrates including humans, form a major group. Our analysis shows that the protein kinases in viruses belonging to the same taxonomic lineages form discrete clusters and that the kinases encoded in alphaherpesviruses form host-specific clusters. A comprehensive sequence and structure-based analysis enabled us to identify the conserved residues or motifs in kinase catalytic domain regions across all viral kinases. Conserved sequence regions that are specific to a particular viral kinase cluster, and the kinases that show close similarity to eukaryotic kinases, were identified by using sequence and three-dimensional structural regions of eukaryotic kinases as reference. The regions specific to each viral kinase cluster can be used in the future as signatures for classifying uncharacterized viral kinases. We note that kinases from the giant viruses Marseilleviridae have close similarity to viral oncogenes in the functional regions and in putative substrate binding regions, indicating their possible role in cancer.


Subject(s)
Protein Kinases/chemistry , Protein Kinases/genetics , Viruses/classification , Catalytic Domain , Computational Biology/methods , Databases, Protein , Genetic Variation , Phosphorylation , Phylogeny , Protein Kinases/metabolism , Sequence Homology, Amino Acid , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism , Virulence Factors/chemistry , Virulence Factors/genetics , Virulence Factors/metabolism , Viruses/enzymology , Viruses/pathogenicity
3.
Methods Mol Biol ; 1415: 301-13, 2016.
Article in English | MEDLINE | ID: mdl-27115639

ABSTRACT

With the advent of genome sequencing projects in the recent past, several kinases have come to light as regulators of different signaling pathways. These kinases are generally classified into different subfamilies based on their sequence similarity with members of known subfamilies of kinases. A functional association is then assigned to the kinase based on the subfamily into which it has been classified. However, one of the key factors that gives identity to a kinase in a subfamily is its ability to phosphorylate a given set of substrates. Substrate specificity of a kinase is largely determined by the residues at the substrate binding site. Although sequence similarity-based classification generally gives a preliminary idea of the subfamily, understanding the molecular basis of kinase substrate recognition could further refine the classification scheme for kinases and provide a better understanding of their functional role. In this analysis we emphasize the possibility of using putative substrate binding information in the classification of a given kinase into a particular subfamily.


Subject(s)
Protein Kinases/chemistry , Protein Kinases/classification , Amino Acid Motifs , Amino Acid Sequence , Binding Sites , Databases, Genetic , Humans , Models, Molecular , Phosphorylation , Protein Binding , Protein Conformation , Protein Kinases/metabolism , Sequence Alignment , Substrate Specificity
4.
In Silico Biol ; 4(2): 149-61, 2004.
Article in English | MEDLINE | ID: mdl-15107020

ABSTRACT

Genes with similar expression profiles are considered to have common regulatory mechanisms and to be controlled by the binding of transcription factors to the regulatory elements present in their upstream regions. The detection of cis-regulatory elements can help in further understanding the co-expression of genes. This paper deals with the detection of motifs in the upstream regions of genes involved in diurnal rhythms of Arabidopsis and with the correlation of expression data with sequence information. We detected motifs in the upstream regions of genes involved in diurnal cycles and checked for their presence in circadian-regulated, dark-induced, and light-induced genes of Arabidopsis. Ten motifs are reported in this study, five of which were already reported in available transcription factor databases as elements involved in light responsiveness. The significance of the ten motifs was assessed using random sets of the same size. One of the ten motifs, GGCCCA, which was found without any base variations in 62 genes, was studied further by analyzing the expression profiles of its respective genes within the set of diurnally regulated genes using the SOM clustering method. The genes clustered into two major groups: one group contained glycine-rich proteins, and the second contained genes belonging to the dehydrogenase and oxidoreductase family.
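Finding genes whose upstream region contains a motif "without any base variations", as described for GGCCCA, amounts to an exact substring search. The sketch below is only an illustration under stated assumptions: the function name `genes_with_motif` and the toy sequences are hypothetical, only the given strand is scanned, and the abstract does not say how the original study handled strands or region lengths.

```python
from typing import Dict, List


def genes_with_motif(upstream: Dict[str, str], motif: str = "GGCCCA") -> List[str]:
    """Return the sorted IDs of genes whose upstream sequence contains
    the motif exactly (no base variations), ignoring letter case.

    Assumption: only the given strand is searched.
    """
    motif = motif.upper()
    return sorted(
        gene for gene, seq in upstream.items() if motif in seq.upper()
    )


# Toy upstream regions (hypothetical, for illustration only).
toy_upstream = {
    "gene1": "ATGGCCCATT",  # contains GGCCCA
    "gene2": "ATATATATAT",  # no match
    "gene3": "ggcccaGGTT",  # lower-case match
}

matching = genes_with_motif(toy_upstream)
```

The same function applied to random gene sets of equal size would give a rough background rate against which the observed count could be compared.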


Subject(s)
Arabidopsis/genetics , Computational Biology/methods , Gene Expression Regulation, Plant , Models, Genetic , Algorithms , Amino Acid Motifs , Base Sequence , Computers , Databases as Topic , Gene Expression Regulation , Genetic Variation , Molecular Sequence Data , Multigene Family , Oligonucleotide Array Sequence Analysis , Time Factors
5.
In Silico Biol ; 3(4): 429-40, 2003.
Article in English | MEDLINE | ID: mdl-12954086

ABSTRACT

In the past decade there has been an increase in the number of completely sequenced genomes owing to the race among multibillion-dollar genome-sequencing projects. The enormous volume of biological sequence data flooding into the sequence databases necessitates the development of efficient tools for comparative genome sequence analysis. The information deduced by such analysis has various applications, viz. structural and functional annotation of novel genes and proteins, finding gene order in the genome, gene fusion studies, constructing metabolic pathways, etc. Such studies also prove invaluable to the pharmaceutical industry, for example in in silico drug target identification and new drug discovery. Various sequence analysis tools are available for mining such information, of which the FASTA and Smith-Waterman algorithms are widely used. However, analyzing large datasets of genome sequences using these codes is impractical on uniprocessor machines. Hence there is a need to improve the performance of these popular sequence analysis tools on parallel cluster computers. The performance of the Smith-Waterman (SSEARCH) and FASTA programs was studied on PARAM 10000, a parallel cluster of workstations designed and developed in-house. The FASTA and SSEARCH programs, which are available from the University of Virginia, were ported to PARAM and optimized. In this era of high-performance computing, where the paradigm is shifting from conventional supercomputers to cost-effective general-purpose clusters of workstations and PCs, this study is highly relevant. Good performance of sequence analysis tools on a cluster of workstations was demonstrated, which is important for accelerating the identification of novel genes and drug targets by screening large databases.
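For readers unfamiliar with the Smith-Waterman algorithm named in this abstract, the following is a minimal sketch of its local-alignment score recurrence. It is not the SSEARCH implementation: the function name and the scoring values (match, mismatch, linear gap penalty) are illustrative assumptions, whereas production tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b.

    Uses a linear gap penalty and simple match/mismatch scores
    (illustrative assumptions). Runs in O(len(a) * len(b)) time and
    keeps only two rows of the dynamic-programming table.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because the cost grows with the product of the two sequence lengths, scanning a whole database one query at a time is slow on a single processor. Each query-versus-database comparison is independent, which is why such searches lend themselves to being split across the nodes of a cluster.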


Subject(s)
Computing Methodologies , Genomics/statistics & numerical data , Algorithms , Databases, Genetic , Sequence Analysis/statistics & numerical data , Software