Results 1 - 12 of 12
1.
Gene ; 705: 113-126, 2019 Jul 15.
Article in English | MEDLINE | ID: mdl-31009682

ABSTRACT

Identification of splice sites is imperative for prediction of gene structure. Machine learning-based approaches (MLAs) have been reported to be more successful than rule-based methods for identification of splice sites. However, the nucleotide strings must be transformed into numeric features through sequence encoding before they can be used as input to MLAs. In this study, we evaluated the performance of 8 sequence encoding schemes, i.e., Bayes kernel, density and sparse (DS), distribution of tri-nucleotide and 1st order Markov model (DM), frequency difference distance measure (FDDM), paired-nucleotide frequency difference between true and false sites (FDTF), 1st order Markov model (MM1), combination of 1st and 2nd order Markov models (MM1 + MM2) and 2nd order Markov model (MM2), for predicting donor and acceptor splice sites using 5 supervised learning methods (ANN, Bagging, Boosting, RF and SVM). The encoding schemes and machine learning methods were first evaluated in 4 species, i.e., A. thaliana, C. elegans, D. melanogaster and H. sapiens, and the performances were then validated with another four species, i.e., Ciona intestinalis, Dictyostelium discoideum, Phaeodactylum tricornutum and Trypanosoma brucei. In terms of ROC (receiver operating characteristic) and PR (precision-recall) curves, the FDTF encoding approach achieved the highest accuracy, followed by either MM2 or FDDM. Further, SVM achieved the highest accuracy (in terms of ROC and PR curves), followed by RF, across encoding schemes and species. In terms of prediction accuracy across species, the SVM-FDTF combination outperformed the other combinations of classifiers and encoding schemes. Further, splice site prediction accuracies were higher for species with low intron density. To our knowledge, this is the first comprehensive evaluation of sequence encoding schemes for prediction of splice sites. We have also developed an R package, EncDNA (https://cran.r-project.org/web/packages/EncDNA/index.html), for encoding splice site motifs with different encoding schemes, which is expected to supplement the existing nucleotide sequence encoding approaches. This study is expected to be useful to computational biologists for predicting different functional elements in genomic DNA.
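To make the idea of sequence encoding concrete, the following is a minimal Python sketch of a sparse (one-hot) encoding, in which each nucleotide becomes four binary features. It is an illustration only: the function name `sparse_encode` and the example window are assumptions, and it is not the EncDNA implementation or any of the eight schemes exactly as evaluated in the study.

```python
# Illustrative sparse (one-hot) encoding of a nucleotide window.
# Assumption: each base maps to 4 binary features in A, C, G, T order.
SPARSE_CODE = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def sparse_encode(seq):
    """Return a flat list of 4 * len(seq) zeros and ones."""
    features = []
    for base in seq.upper():
        if base not in SPARSE_CODE:
            raise ValueError(f"unexpected nucleotide: {base!r}")
        features.extend(SPARSE_CODE[base])
    return features


if __name__ == "__main__":
    # Hypothetical 4-nt window ending in the donor dinucleotide GT.
    print(sparse_encode("AGGT"))
```

A window of n nucleotides yields 4n features, which can then be passed to any of the supervised learners named in the abstract.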


Subject(s)
Computational Biology/methods , RNA Splice Sites , RNA, Messenger/metabolism , Algorithms , Animals , Arabidopsis , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Machine Learning , RNA Splicing , ROC Curve , Trypanosoma brucei brucei/genetics
2.
Sci Rep ; 9(1): 778, 2019 01 28.
Article in English | MEDLINE | ID: mdl-30692561

ABSTRACT

Herbicide resistance (HR) is a major concern for agricultural producers as well as environmentalists. Resistance to commonly used herbicides is conferred by mutation(s) in the genes encoding herbicide target sites/proteins (GETS). Identification of these genes through wet-lab experiments is time consuming and expensive. Thus, a supervised learning-based computational model is proposed in this study, which is the first of its kind for the prediction of seven classes of GETS. The cDNA sequences of the genes were first transformed into numeric features based on k-mer compositions and then supplied as input to a support vector machine (SVM). In the proposed SVM-based model, prediction occurs in two stages: a binary classifier in the first stage discriminates the genes involved in conferring resistance to herbicides from other genes, and a multi-class classifier in the second stage assigns each gene predicted as herbicide resistant in the first stage to one of the seven resistance classes. Overall classification accuracies were observed to be ~89% and >97% for binary and multi-class classification, respectively. The proposed model achieved higher accuracy than the homology-based algorithms, viz., BLAST and Hidden Markov Model. In addition, the developed computational model achieved ~87% accuracy when tested with an independent dataset. An online prediction server, HRGPred ( http://cabgrid.res.in:8080/hrgpred ), has also been established to facilitate the prediction of GETS by the scientific community.
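The k-mer composition step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `kmer_composition`, the default k = 3 and the choice to skip windows containing ambiguous symbols are not taken from the article, which does not specify these details here.

```python
from itertools import product


def kmer_composition(seq, k=3, alphabet="ACGT"):
    """Relative frequency of every possible k-mer, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows with N or other symbols
            counts[word] += 1
            total += 1
    # A sequence shorter than k gives an all-zero vector.
    return [counts[w] / total if total else 0.0 for w in kmers]


if __name__ == "__main__":
    vec = kmer_composition("ATGGCGTACGT", k=2)
    print(len(vec), round(sum(vec), 6))
```

The resulting fixed-length vector (4^k entries) is the kind of numeric input a support vector machine expects, regardless of the length of the original cDNA sequence.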


Subject(s)
Computational Biology/methods , Herbicide Resistance , Plant Proteins/genetics , Plants/genetics , Algorithms , Gene Expression Regulation, Plant , Models, Genetic , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid , Support Vector Machine
3.
BMC Genet ; 20(1): 2, 2019 01 07.
Article in English | MEDLINE | ID: mdl-30616524

ABSTRACT

BACKGROUND: Identification of unknown fungal species aids the conservation of fungal diversity. As many fungal species cannot be cultured, morphological identification of those species is almost impossible. However, DNA barcoding can be employed for identification of such species. For fungal taxonomy prediction, the ITS (internal transcribed spacer) region of rDNA (ribosomal DNA) is used as the barcode. Though computational prediction of fungal species has become feasible with the availability of a large volume of barcode sequences in the public domain, prediction of fungal species is challenging due to the high degree of variability among ITS regions within species. RESULTS: A Random Forest (RF)-based predictor was built for identification of unknown fungal species. The reference and query sequences were mapped onto numeric features based on gapped base pair compositions, and then used as training and test sets, respectively, for prediction of fungal species using RF. More than 85% accuracy was obtained when 4 sequences per species in the reference set were used, whereas accuracy stabilized at ~88% when ≥7 sequences per species in the reference set were used for training the model. The proposed model achieved comparable accuracy when evaluated against existing methods through cross-validation. The proposed model also outperformed several existing models used for identification of species other than fungi. CONCLUSIONS: An online prediction server, "funbarRF", is established at http://cabgrid.res.in:8080/funbarrf/ for fungal species identification. In addition, an R package, funbarRF ( https://cran.r-project.org/web/packages/funbarRF/ ), is available for prediction using high-throughput sequence data. This work is expected to supplement future efforts in fungal taxonomy assignment based on DNA barcodes.
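One plausible reading of "gapped base pair composition" is the frequency of ordered nucleotide pairs separated by a fixed number of intervening positions. The sketch below implements that reading in Python; the function name `gapped_pair_composition`, the `gap` parameter and the normalization are assumptions made for illustration and may differ from the funbarRF package.

```python
from itertools import product


def gapped_pair_composition(seq, gap=1, alphabet="ACGT"):
    """Frequency of each ordered base pair with `gap` positions between them."""
    seq = seq.upper()
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    step = gap + 1  # distance between the two bases of a pair
    for i in range(len(seq) - step):
        pair = seq[i] + seq[i + step]
        if pair in counts:  # ignore pairs with ambiguous symbols
            counts[pair] += 1
            total += 1
    return {p: (counts[p] / total if total else 0.0) for p in pairs}


if __name__ == "__main__":
    comp = gapped_pair_composition("ACGTAC", gap=1)
    print({p: f for p, f in comp.items() if f > 0})
```

With `gap=0` the function reduces to ordinary dinucleotide composition; concatenating the 16-element outputs for several gap values gives one fixed-length feature vector per barcode sequence.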


Subject(s)
Computational Biology/methods , DNA Barcoding, Taxonomic/methods , Fungi/classification , Fungi/genetics , Supervised Machine Learning , DNA, Fungal/genetics , Software
4.
Genomics ; 111(3): 297-309, 2019 05.
Article in English | MEDLINE | ID: mdl-29522800

ABSTRACT

Nematodes are responsible for causing severe diseases in plants, humans and other animals. Infection is associated with the release of Excretory/Secretory (ES) proteins into the host cytoplasm and interference with the host immune system, which makes these proteins attractive therapeutic targets. The identification of ES proteins through bioinformatics approaches is cost- and time-effective and could be used to screen potential targets for parasitic diseases for further experimental studies. Here, we identified and functionally annotated 93,949 ES proteins in the genomes of 73 nematodes by integrating various bioinformatics tools. Of these ES proteins, 30.6% were supported at the RNA level. The predicted ES proteins, annotated by Gene Ontology terms, domains, metabolic pathways, proteases and enzyme class analysis, were enriched in the molecular functions of proteases, protease inhibitors, c-type lectins and hydrolases, which are strongly associated with typical functions of ES proteins. We identified a total of 452 ES proteins from human and plant parasitic nematodes that are homologous to DrugBank-approved targets and C. elegans RNA interference phenotype genes; these could represent potential targets for parasite control and provide a valuable resource for further experimental studies to understand host-pathogen interactions.


Subject(s)
Antinematodal Agents/pharmacology , Helminth Proteins/genetics , Host-Pathogen Interactions , Nematoda/genetics , Secretory Pathway , Animals , Genome, Helminth , Helminth Proteins/chemistry , Helminth Proteins/metabolism , Humans , Nematoda/drug effects , Nematoda/pathogenicity , Plants/parasitology
5.
Front Microbiol ; 9: 1100, 2018.
Article in English | MEDLINE | ID: mdl-29896173

ABSTRACT

As inorganic nitrogen compounds are essential for the basic building blocks of life (e.g., nucleotides and amino acids), the role of biological nitrogen fixation (BNF) is indispensable. All nitrogen-fixing microbes rely on the same nitrogenase enzyme for nitrogen reduction, which is in fact an enzyme complex involving as many as 20 genes. However, the occurrence of six genes, viz., nifB, nifD, nifE, nifH, nifK, and nifN, has been proposed to be essential for a functional nitrogenase enzyme. Therefore, identification of these genes is important to understand the mechanism of BNF as well as to explore the possibilities for improving BNF from an agricultural sustainability point of view. Further, though computational tools are available for the annotation and phylogenetic analysis of nifH gene sequences alone, to the best of our knowledge no tool is available for the computational prediction of the above-mentioned six categories of nitrogen-fixation (nif) genes or proteins. Thus, we propose an approach, the first of its kind, for the computational identification of nif proteins encoded by the six categories of nif genes. Sequence-derived features were employed to map the input sequences into vectors of numeric observations that were subsequently fed to a support vector machine as input. Two types of classifier were constructed: (i) a binary classifier for classification of nif and non-nitrogen-fixation (non-nif) proteins, and (ii) a multi-class classifier for classification of the six categories of nif proteins. Higher accuracies were observed for the combination of the composition-transition-distribution (CTD) feature set and the radial kernel than for the other feature-kernel combinations. The overall accuracies were >90% in both binary and multi-class classification. The developed approach further achieved >92% accuracy when evaluated with blind (independent) test datasets. The developed approach also produced higher accuracy in identifying nif proteins when evaluated using proteome-wide datasets of several species. Furthermore, we established a prediction server, nifPred (http://webapp.cabgrid.res.in/nifPred), to assist the scientific community in proteome-wide identification of the six categories of nif proteins. The source code of nifPred is also available at https://github.com/PrabinaMeher/nifPred. The developed web server is expected to supplement transcriptional profiling and comparative genomics studies for the identification and functional annotation of genes related to BNF.

6.
Comput Biol Chem ; 67: 225-233, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28187376

ABSTRACT

The root-knot nematode, Meloidogyne incognita, causes significant damage to various economically important crops. Infection is associated with secretion of effector proteins into the host cytoplasm and interference with host innate immunity. To combat this infection, the identification and functional annotation of Excretory/Secretory (ES) proteins serve as a key to producing durable control measures. The identification of ES proteins through experimental methods is expensive and time consuming, whereas bioinformatics approaches are cost-effective because they prioritize potential drug targets for parasitic diseases for experimental analysis. In this study, we predicted and functionally annotated 1889 ES proteins in the M. incognita genome by integrating several bioinformatics tools. Of these 1889 ES proteins, 473 (25%) had orthologues in the free-living nematode Caenorhabditis elegans and 825 (67.8%) in parasitic nematodes, whereas 561 (29.7%) appeared to be novel and M. incognita-specific molecules. Of the C. elegans homologues, 17 ES proteins had a "loss of function phenotype" by RNA interference and could represent potential drug targets for parasite intervention and control. We could functionally annotate 429 (22.7%) ES proteins using Gene Ontology (GO) terms, assign 672 (35.5%) proteins to protein domains, and establish pathway associations for 223 (11.8%) sequences using the Kyoto Encyclopaedia of Genes and Genomes (KEGG). In addition, 162 (8.5%) ES proteins were mapped to several important plant cell-wall degrading CAZyme families, including chitinase, cellulase, xylanase, pectate lyase and endo-β-1,4-xylanase. Our comprehensive analysis of the M. incognita secretome provides functional information for further experimental study.


Subject(s)
Genome, Helminth/genetics , Helminth Proteins/classification , Helminth Proteins/genetics , Proteome/classification , Proteome/genetics , Animals , Computational Biology , Female , Gene Ontology , Male , Protein Domains , Tylenchoidea/genetics
7.
Front Genet ; 8: 235, 2017.
Article in English | MEDLINE | ID: mdl-29379521

ABSTRACT

Heat shock proteins (HSPs) play a pivotal role in cell growth and variability. Since conventional approaches are expensive and voluminous protein sequence information is available in the post-genomic era, development of an automated and accurate computational tool is highly desirable for prediction of HSPs, their families and sub-types. Thus, we propose a computational approach for reliable prediction of all these components in a single framework and with higher accuracy. The proposed approach achieved an overall accuracy of ~84% in predicting HSPs, ~97% in predicting six different families of HSPs, and ~94% in predicting four types of DnaJ proteins, with benchmark datasets. The developed approach also achieved higher accuracy than most of the existing approaches. For easy prediction of HSPs by experimental scientists, a user-friendly web server, ir-HSP, is made freely accessible at http://cabgrid.res.in:8080/ir-hsp. The ir-HSP server was further evaluated for proteome-wide identification of HSPs using proteome datasets of eight different species, and ~50% of the predicted HSPs in each species were found to be annotated with InterPro HSP families/domains. Thus, the developed computational method is expected to supplement the currently available approaches for prediction of HSPs, to the extent of their families and sub-types.

8.
BMC Genomics ; 17: 166, 2016 Mar 01.
Article in English | MEDLINE | ID: mdl-26931371

ABSTRACT

BACKGROUND: Nematodes are the most numerous animals in the soil. Insect parasitic nematodes of the genus Heterorhabditis are capable of selectively seeking, infecting and killing their insect hosts in the soil. The infective juvenile (IJ) stage of the Heterorhabditis nematodes is analogous to the Caenorhabditis elegans dauer juvenile stage, which remains in 'arrested development' until it finds and infects a new insect host in the soil. H. indica is the most prevalent species of Heterorhabditis in India. To understand the genes and molecular processes that govern the biology of the IJ stage, and to create a resource to facilitate functional genomics and genetic exploration, we sequenced the transcriptome of H. indica IJs. RESULTS: The de novo sequence assembly using the Velvet-Oases pipeline resulted in 13,593 unique transcripts at an N50 of 1,371 bp, of which 53% were annotated by blastx. H. indica transcripts showed higher orthology with parasitic nematodes than with free-living nematodes. In silico expression analysis showed 30% of transcripts expressed at ≥100 FPKM. All four canonical dauer formation pathways (cGMP-PKG, insulin, dafachronic acid and TGF-β) were active in the IJ stage. Several other signaling pathways were highly represented in the transcriptome. Twenty-four orthologs of C. elegans RNAi pathway effector genes were discovered in H. indica, including nrde-3, which is reported for the first time in any parasitic nematode. An ortholog of C. elegans tol-1 was also identified. Further, 272 kinases belonging to 137 groups, and several previously unidentified members of important gene classes, were identified. CONCLUSIONS: We generated high-quality transcriptome sequence data from H. indica IJs for the first time. The transcripts showed high similarity with the parasitic nematodes M. hapla and A. suum as opposed to C. elegans, a species to which H. indica is more closely related. The high representation of transcripts from several signaling pathways in the IJs indicates that, despite being a developmentally arrested stage, IJs are a hotbed of signaling and are actively interacting with their environment.
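The N50 statistic quoted for the assembly can be computed from a list of transcript lengths as shown in this short Python sketch. It uses the common definition (the length L such that sequences of length ≥ L account for at least half of the total assembled bases); the function name `n50` and the example lengths are illustrative and are not data from the study.

```python
def n50(lengths):
    """Length L such that sequences >= L cover at least half of all bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical transcript lengths in bp.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```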


Subject(s)
Insecta/parasitology , Nematoda/genetics , Transcriptome , Animals , Gene Ontology , Genes, Helminth , Life Cycle Stages , Metabolic Networks and Pathways , RNA Interference , Signal Transduction
9.
Bioinformation ; 12(12): 412-415, 2016.
Article in English | MEDLINE | ID: mdl-28356679

ABSTRACT

Insulin-like (ins) peptides play an important role in development and metabolism across the metazoa. In nematodes, they are also required for dauer formation and longevity and are expressed in different types of neurons across various life stages, which demonstrates their role in parasites and makes them possible targets for parasite control. To date, many nematode genomes are publicly available. However, a systematic screening of ins peptides across different nematode groups has not been reported. In the present study, we systematically identified ins peptides in the secretomes of 73 nematodes with fully sequenced genomes covering five different groups, viz. plant parasitic, animal parasitic, human parasitic, entomopathogenic and free-living nematodes. From a total of 93,949 secretory proteins, 176 proteins were uniquely mapped to 40 identified C. elegans ins families. The results showed that 74.15% of the identified ins proteins were represented in free-living nematodes only, and the remaining 25.84% were identified collectively in all other nematode groups. The ins-1, ins-17 and ins-18 families were the only ins families detected in all the studied nematode groups. Out of 176 proteins, 96 ins proteins were predicted to be hydrophilic in nature and 39 proteins were found to be stable using ProtParam analysis. Our study provides insight into the distribution of ins peptides across different groups of nematodes, and this information could be useful for further experimental study.

10.
Bioinformation ; 9(18): 937-40, 2013.
Article in English | MEDLINE | ID: mdl-24307773

ABSTRACT

Designed degenerate primers, unlike conventional primers, are superior in matching and amplifying large numbers of genes from related gene families. The DPPrimer tool was designed to predict primers for PCR amplification of homologous genes from related or diverse plant species. The key features of this tool include platform independence and user-friendliness in primer design. Embedded features such as search for functional domains, similarity score selection and phylogenetic tree further enhance the user-friendliness of the DPPrimer tool. Performance of the DPPrimer tool was evaluated by successful PCR amplification of ADP-glucose phosphorylase genes from wheat, barley and rice. AVAILABILITY: DPPrimer is freely accessible at http://202.141.12.147/DGEN_tool/index.html.

11.
Curr Microbiol ; 66(5): 507-14, 2013 May.
Article in English | MEDLINE | ID: mdl-23325033

ABSTRACT

Proteome analysis of Enterobacter ludwigii PAS1 provides a powerful set of tools to study cold shock proteins, and combining it with bioinformatics is useful for interpreting comparative results from many species. There is considerable interest in the use of psychrotrophic bacteria for nitrogen fixation, especially in hilly regions, and thus in a better understanding of cold adaptation mechanisms. The psychrotrophic E. ludwigii PAS1, isolated from Himalayan soil, was grown at 30 and 4 °C to study its proteomic responses under optimal and cold shock conditions. Comparative proteomic analyses using two-dimensional gel electrophoresis (2-DE) and MALDI-TOF/TOF MS revealed the presence of Cold shock protein E (CspE). The three-dimensional structure of CspE of E. ludwigii PAS1 revealed five antiparallel β-sheets forming a β-barrel structure with surface-exposed aromatic and basic residues that were responsible for nucleic acid binding, as well as the highly conserved nucleic acid-binding motifs RNP1 and RNP2 of the Csp family.


Subject(s)
Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Catalytic Domain , Conserved Sequence , Enterobacter/genetics , Gene Expression , Soil Microbiology , Amino Acid Sequence , Computer Simulation , Enterobacter/isolation & purification , Models, Molecular , Molecular Sequence Annotation , Molecular Sequence Data , Phylogeny , Protein Conformation , Proteomics , Sequence Alignment
12.
J Biomol Struct Dyn ; 31(1): 30-43, 2013.
Article in English | MEDLINE | ID: mdl-22804492

ABSTRACT

The group of antigen 85 proteins of Mycobacterium tuberculosis is responsible for converting trehalose monomycolate to trehalose dimycolate, which contributes to cell wall stability. Here, we have used a serial enrichment approach to identify new potential inhibitors by searching compound libraries using both 2D atom pair descriptors and binary fingerprints, followed by molecular docking. Three docking programs (AutoDock, GOLD, and LigandFit) were used for docking calculations. In addition, we selected compounds with binding efficiency close to that of the starting known inhibitor and with the potential to form hydrogen bonds with the active site amino acid residues. The starting inhibitor was ethyl-3-phenoxybenzyl-butylphosphonate, which had an IC(50) value of 2.0 µM in a mycolyltransferase inhibition assay. Our search of more than 34 million compounds from public libraries yielded 49 compounds. Subsequently, selection was restricted to compounds conforming to the Lipinski rule of five and exhibiting hydrogen bonding to any of the amino acid residues in the active site pocket of all three proteins, antigen 85A, 85B, and 85C. Finally, we selected those ligands that ranked at the top alongside other known decoys in all the docking results. The compound NIH415032 from the tuberculosis antimicrobial acquisition and coordinating facility was further examined using molecular dynamics simulations for 10 ns. These results showed that the binding is stable, although some of the hydrogen bond atom pairs varied over the course of the simulation. NIH415032 has antitubercular properties with an IC(90) of 20 µg/ml (53.023 µM). These results will be helpful to medicinal chemists developing new antitubercular molecules for testing.
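The Lipinski rule of five filter mentioned above can be expressed as a simple check on four molecular properties: molecular weight ≤ 500 g/mol, logP ≤ 5, at most 5 hydrogen bond donors and at most 10 hydrogen bond acceptors. The Python sketch below assumes these properties have already been computed by a cheminformatics tool; the function names and the example property values are hypothetical and do not describe any compound from the study.

```python
def lipinski_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        mol_weight > 500,   # molecular weight in g/mol
        log_p > 5,          # octanol-water partition coefficient
        h_donors > 5,       # hydrogen bond donors
        h_acceptors > 10,   # hydrogen bond acceptors
    ]
    return sum(checks)


def passes_rule_of_five(mol_weight, log_p, h_donors, h_acceptors,
                        max_violations=0):
    """True if the compound has at most `max_violations` violations."""
    n = lipinski_violations(mol_weight, log_p, h_donors, h_acceptors)
    return n <= max_violations


if __name__ == "__main__":
    # Hypothetical property values, for illustration only.
    print(passes_rule_of_five(350.0, 3.2, 2, 5))
    print(lipinski_violations(620.0, 6.1, 2, 5))
```

The strict default (`max_violations=0`) is an assumption; some screening workflows tolerate one violation, which can be expressed by passing `max_violations=1`.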


Subject(s)
Acyltransferases/chemistry , Antigens, Bacterial/chemistry , Antitubercular Agents/chemistry , Bacterial Proteins/chemistry , Mycobacterium tuberculosis/enzymology , Acyltransferases/metabolism , Antigens, Bacterial/metabolism , Antitubercular Agents/metabolism , Bacterial Proteins/metabolism , Binding Sites , Drug Design , Hydrogen Bonding , Ligands , Molecular Docking Simulation