Search | VHL Regional Portal

1.

Protein feature engineering framework for AMPylation site prediction.

Prabhu, Hardik; Bhosale, Hrushikesh; Sane, Aamod; Dhadwal, Renu; Ramakrishnan, Vigneshwar; Valadi, Jayaraman.

Sci Rep ; 14(1): 8695, 2024 04 15.

Article in English | MEDLINE | ID: mdl-38622194

ABSTRACT

AMPylation is a biologically significant yet understudied post-translational modification in which an adenosine monophosphate (AMP) group is added primarily to tyrosine and threonine residues. While recent work has illuminated the prevalence and functional impacts of AMPylation, experimental identification of AMPylation sites remains challenging. Computational prediction techniques provide a faster alternative approach. The predictive performance of machine learning models is highly dependent on the features used to represent the raw amino acid sequences. In this work, we introduce a novel feature extraction pipeline to encode the key properties relevant to AMPylation site prediction. We utilize a recently published dataset of curated AMPylation sites to develop our feature generation framework. We demonstrate the utility of our extracted features by training various machine learning classifiers on various numerical representations of the raw sequences extracted with the help of our framework. Tenfold cross-validation is used to evaluate the model's capability to distinguish between AMPylated and non-AMPylated sites. The top-performing set of extracted features achieved an MCC score of 0.58, an accuracy of 0.8, an AUC-ROC of 0.85 and an F1 score of 0.73. Further, we elucidate the behaviour of the model on the set of features consisting of monogram and bigram counts for various representations using SHapley Additive exPlanations.

Subject(s)

Protein Processing, Post-Translational , Tyrosine , Tyrosine/metabolism , Amino Acid Sequence , Adenosine Monophosphate/metabolism , Threonine/metabolism
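
The monogram and bigram count features mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `ngram_counts` and the example peptide window are hypothetical, and the sketch counts raw amino acid letters only, whereas the paper applies the counts to several sequence representations.

```python
from collections import Counter


def ngram_counts(sequence: str, n: int) -> Counter:
    """Count overlapping n-grams in a sequence (n=1: monograms, n=2: bigrams)."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


# Hypothetical peptide window, chosen for illustration only (not from the dataset).
window = "MKTAYIAKQR"

monograms = ngram_counts(window, 1)  # single-residue counts
bigrams = ngram_counts(window, 2)    # counts of adjacent residue pairs

print(monograms.most_common(3))
print(bigrams.most_common(3))
```

Counts of this kind form a fixed-length numeric vector once a vocabulary of possible n-grams is fixed, which is what lets standard classifiers consume variable sequence windows.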

2.

Identification of Plausible Candidates in Prostate Cancer Using Integrated Machine Learning Approaches.

Kour, Bhumandeep; Shukla, Nidhi; Bhargava, Harshita; Sharma, Devendra; Sharma, Amita; Singh, Anjuvan; Valadi, Jayaraman; Sadasukhi, Trilok Chand; Vuree, Sugunakar; Suravajhala, Prashanth.

Curr Genomics ; 24(5): 287-306, 2023 Dec 20.

Article in English | MEDLINE | ID: mdl-38235353

ABSTRACT

Background: Currently, prostate-specific antigen (PSA) is commonly used as a prostate cancer (PCa) biomarker. PSA is linked to some factors that frequently lead to erroneous positive results or even needless biopsies of elderly people. Objectives: In this pilot study, we mined the potential genes and mutations from several databases and checked whether or not any putative prognostic biomarkers are central to the annotation. The aim of the study was to develop a risk prediction model that could help in clinical decision-making. Methods: An extensive literature review was conducted, and clinical parameters for related comorbidities, such as diabetes and obesity, as well as PCa, were collected. Such parameters were chosen with the understanding that variations in their threshold values could hasten the complicated process of carcinogenesis, more particularly PCa. The gathered data were converted to semi-binary data (-1, -0.5, 0, 0.5, and 1), on which machine learning (ML) methods were applied. First, we cross-checked various publicly available datasets, some published RNA-seq datasets, and our whole-exome sequencing data to find common role players in PCa, diabetes, and obesity. To narrow down their common interacting partners, interactome networks were analysed using GeneMANIA and visualised using Cytoscape, and later cBioportal was used (to compare expression levels based on Z-score values), wherein various types of mutation with respect to their expression and mRNA expression (RNA-seq FPKM) plots are available. The GEPIA 2 tool was used to compare the expression of resulting similarities between the normal tissue and TCGA databases of PCa. Later, top-ranking genes were chosen to demonstrate striking clustering coefficients using the Cytoscape-cytoHubba module, and GEPIA 2 was applied again to ascertain survival plots.
Results: Comparing various publicly available datasets, it was found that BLM is a frequent player in all three diseases, whereas comparing publicly available datasets, GWAS datasets, and published sequencing findings, SPFTPC and PPIMB were found to be the most common. With the assistance of GeneMANIA, TMPO and FOXP1 were found as common interacting partners, and they were also seen participating with BLM. Conclusion: A probabilistic machine learning model was achieved to identify key candidates between diabetes, obesity, and PCa. This, we believe, would herald precision scale modeling for easy prognosis.

3.

Machine Learning Heuristics on Gingivobuccal Cancer Gene Datasets Reveals Key Candidate Attributes for Prognosis.

Singh, Tanvi; Malik, Girik; Someshwar, Saloni; Le, Hien Thi Thu; Polavarapu, Rathnagiri; Chavali, Laxmi N; Melethadathil, Nidheesh; Sundararajan, Vijayaraghava Seshadri; Valadi, Jayaraman; Kavi Kishor, P B; Suravajhala, Prashanth.

Genes (Basel) ; 13(12), 2022 12 16.

Article in English | MEDLINE | ID: mdl-36553647

ABSTRACT

Delayed cancer detection is one of the common causes of poor prognosis in the case of many cancers, including cancers of the oral cavity. Despite the improvement and development of new and efficient gene therapy treatments, very little has been carried out to algorithmically assess the impedance of these carcinomas. In this work, from attributes of NCBI's oral cancer datasets, viz. (i) name, (ii) gene(s), (iii) protein change, (iv) condition(s), and clinical significance (last reviewed), we sought to train the number of instances emerging from them. Further, we attempt to annotate viable attributes in oral cancer gene datasets for the identification of gingivobuccal cancer (GBC). We further apply supervised and unsupervised machine learning methods to the gene datasets, revealing key candidate attributes for GBC prognosis. Our work highlights the importance of automated identification of key genes responsible for GBC that could perhaps be easily replicated in other forms of oral cancer detection.

Subject(s)

Heuristics , Mouth Neoplasms , Humans , Machine Learning , Prognosis , Oncogenes , Mouth Neoplasms/diagnosis , Mouth Neoplasms/genetics

4.

Editorial: Integrated systems genomic approaches for characterizing uncharacterized proteins.

Valadi, Jayaraman; Sundararajan, Vijayaraghava Seshadri; Bandapalli, Obul Reddy; Benso, Alfredo; Suravajhala, Prashanth.

Front Genet ; 13: 1000825, 2022.

Article in English | MEDLINE | ID: mdl-36176288

5.

In Silico Characterization of Uncharacterized Proteins From Multiple Strains of Clostridium Difficile.

Abbasi, Bilal Ahmed; Dharan, Aishwarya; Mishra, Astha; Saraf, Devansh; Ahamad, Irsad; Suravajhala, Prashanth; Valadi, Jayaraman.

Front Genet ; 13: 878012, 2022.

Article in English | MEDLINE | ID: mdl-36035185

ABSTRACT

Clostridium difficile (C. difficile) is a multi-strain, spore-forming, Gram-positive, opportunistic enteropathogenic bacterium, mainly associated with nosocomial infections, resulting in severe diarrhoea and colon inflammation. Several antibiotics including penicillin, tetracycline, and clindamycin have been employed to control C. difficile infection, but studies have suggested that injudicious use of antibiotics has led to the development of resistance in C. difficile strains. However, many proteins from its genome are still considered uncharacterized proteins that might serve crucial functions and assist in the biological understanding of the organism. In this study, we aimed to annotate and characterise 6 C. difficile strains using in silico approaches. We first analysed the complete genomes of the 6 C. difficile strains using standardised approaches and analysed hypothetical proteins (HPs) employing a combination of bioinformatics approaches, including identifying contigs, coding sequences, phage sequences, CRISPR-Cas9 systems, antimicrobial resistance determination, membrane helices, instability index, secretory nature, conserved domain, and vaccine target properties like comparative homology analysis, allergenicity, and antigenicity determination, along with structure prediction and binding-site analysis. This study provides crucial supporting information about the functional characterization of the HPs involved in the pathophysiology of the disease. Moreover, this information aims to assist in understanding mechanisms associated with bacterial pathogenesis and in further designing candidate inhibitors and bona fide pharmaceutical targets.

6.

The Omic Insights on Unfolding Saga of COVID-19.

Kaur, Arvinpreet; Chopra, Mehak; Bhushan, Mahak; Gupta, Sonal; Kumari P, Hima; Sivagurunathan, Narmadhaa; Shukla, Nidhi; Rajagopal, Shalini; Bhalothia, Purva; Sharma, Purnima; Naravula, Jalaja; Suravajhala, Renuka; Gupta, Ayam; Abbasi, Bilal Ahmed; Goswami, Prittam; Singh, Harpreet; Narang, Rahul; Polavarapu, Rathnagiri; Medicherla, Krishna Mohan; Valadi, Jayaraman; Kumar S, Anil; Chaubey, Gyaneshwer; Singh, Keshav K; Bandapalli, Obul Reddy; Kavi Kishor, Polavarapu Bilhan; Suravajhala, Prashanth.

Front Immunol ; 12: 724914, 2021.

Article in English | MEDLINE | ID: mdl-34745097

ABSTRACT

The year 2019 has seen an emergence of the novel coronavirus named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causing coronavirus disease of 2019 (COVID-19). Since the onset of the pandemic, biological and interdisciplinary research is being carried out across the world at a rapid pace to beat the pandemic. There is an increased need to comprehensively understand various aspects of the virus from detection to treatment options including drugs and vaccines for effective global management of the disease. In this review, we summarize the salient findings pertaining to SARS-CoV-2 biology, including symptoms, hosts, epidemiology, SARS-CoV-2 genome, and its emerging variants, viral diagnostics, host-pathogen interactions, alternative antiviral strategies and application of machine learning heuristics and artificial intelligence for effective management of COVID-19 and future pandemics.

Subject(s)

COVID-19/immunology , SARS-CoV-2/physiology , Artificial Intelligence , COVID-19/epidemiology , Comorbidity , Heuristics , Host-Pathogen Interactions , Humans , Pandemics , Proteomics , Transcriptome

7.

Antibody Class(es) Predictor for Epitopes (AbCPE): A Multi-Label Classification Algorithm.

Kadam, Kiran; Peerzada, Noor; Karbhal, Rajiv; Sawant, Sangeeta; Valadi, Jayaraman; Kulkarni-Kale, Urmila.

Front Bioinform ; 1: 709951, 2021.

Article in English | MEDLINE | ID: mdl-36303781

ABSTRACT

Development of vaccines and therapeutic antibodies to deal with infectious and other diseases are the most perceptible scientific interventions that have had huge impact on public health including that in the current Covid-19 pandemic. From inactivation methodologies to reverse vaccinology, vaccine development strategies of 21st century have undergone several transformations and are moving towards rational design approaches. These developments are driven by data as the combinatorials involved in antigenic diversity of pathogens and immune repertoire of hosts are enormous. The computational prediction of epitopes is central to these developments and numerous B-cell epitope prediction methods developed over the years in the field of immunoinformatics have contributed enormously. Most of these methods predict epitopes that could potentially bind to an antibody regardless of its type and only a few account for antibody class specific epitope prediction. Recent studies have provided evidence of more than one class of antibodies being associated with a particular disease. Therefore, it is desirable to predict and prioritize 'peptidome' representing B-cell epitopes that can potentially bind to multiple classes of antibodies, as an open problem in immunoinformatics. To address this, AbCPE, a novel algorithm based on multi-label classification approach has been developed for prediction of antibody class(es) to which an epitope can potentially bind. The epitopes binding to one or more antibody classes (IgG, IgE, IgA and IgM) have been used as a knowledgebase to derive features for prediction. Multi-label algorithms, Binary Relevance and Label Powerset were applied along with Random Forest and AdaBoost. Classifier performance was assessed using evaluation measures like Hamming Loss, Precision, Recall and F1 score. 
The Binary Relevance model based on dipeptide composition, Random Forest and AdaBoost achieved the best results with Hamming Loss of 0.1121 and 0.1074 on training and test sets respectively. The results obtained by AbCPE are promising. To the best of our knowledge, this is the first multi-label method developed for prediction of antibody class(es) for sequential B-cell epitopes and is expected to bring a paradigm shift in the field of immunoinformatics and immunotherapeutic developments in synthetic biology. The AbCPE web server is available at http://bioinfo.unipune.ac.in/AbCPE/Home.html.
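
The Hamming Loss reported in the abstract above is the fraction of individual label assignments that are wrong, averaged over all samples and labels. A minimal sketch follows; the function name `hamming_loss` and the toy label matrices are assumptions for illustration and do not come from the AbCPE data.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (sample, label) pairs where the prediction differs from the truth."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


# Toy multi-label data (hypothetical). Columns stand for IgG, IgE, IgA, IgM;
# 1 means the epitope is labelled as binding that antibody class.
y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 1, 0]]

print(hamming_loss(y_true, y_pred))  # 3 wrong labels out of 12 -> 0.25
```

Lower values are better, and a value of 0 means every label of every sample was predicted correctly. Under Binary Relevance each column is predicted by its own binary classifier, so the loss can be read as the average per-label error rate.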

8.

Quantitative Structure Activity Relationship study of the Anti-Hepatitis Peptides employing Random Forests and Extra-trees regressors.

Mishra, Gunjan; Sehgal, Deepak; Valadi, Jayaraman K.

Bioinformation ; 13(3): 60-62, 2017.

Article in English | MEDLINE | ID: mdl-28584444

ABSTRACT

Antimicrobial peptides are host defense peptides viewed as a replacement for broad-spectrum antibiotics due to their varied advantages. Hepatitis is the commonest infectious disease of the liver, affecting 500 million people globally, with reported adverse side effects in treatment therapy. Antimicrobial peptides active against hepatitis are called anti-hepatitis peptides (AHP). In the current work, we present Extra-trees and Random Forests based Quantitative Structure Activity Relationship (QSAR) regression modeling using extracted sequence-based descriptors for prediction of the anti-hepatitis activity. The Extra-trees regression model yielded a very high performance in terms of the coefficient of determination (R²): 0.95 for the test set and 0.7 for the independent dataset. We hypothesize that the developed model can further be used to identify potentially active anti-hepatitis peptides with a high level of reliability.
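
The coefficient of determination cited in the abstract above compares a regression model's squared error with that of a baseline that always predicts the mean. A minimal sketch, assuming the standard definition (one minus the ratio of residual to total sum of squares); the function name `r_squared` and the toy activity values are hypothetical and unrelated to the study's data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1 - ss_res / ss_tot


# Toy measured vs. predicted activities (hypothetical values).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]

print(r_squared(measured, predicted))
```

A value of 1 means perfect prediction, and a value near 0 means the model does no better than predicting the mean, which is why a drop from the test set to an independent dataset is worth reporting separately.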

9.

Recent trends in antimicrobial peptide prediction using machine learning techniques.

Shah, Yash; Sehgal, Deepak; Valadi, Jayaraman K.

Bioinformation ; 13(12): 415-416, 2017.

Article in English | MEDLINE | ID: mdl-29379261

ABSTRACT

The importance of developing effective alternatives to known antibiotics due to increased microbial resistance has been gaining momentum in recent years. Therefore, it is of interest to predict, design and computationally model Antimicrobial Peptides (AMPs). AMPs are oligopeptides of varying size (from 5 to over 100 residues) having a key role in innate immunity. Thus, the potential exploitation of AMPs as novel therapeutic agents is evident. They act by causing cell death by disrupting the microbial membrane, by inhibiting extracellular polymer synthesis, or by altering intracellular polymer functions. AMPs have broad-spectrum activity and act as the first line of defense against all types of microorganisms including viruses, bacteria, parasites and fungi, as well as cancer (uncontrolled cell division) progression. Large-scale identification and extraction of AMPs is often non-trivial, expensive and time consuming. Hence, there is a need to develop models to predict AMPs as therapeutics. We document recent trends and advancements in the prediction of AMPs.

10.

Data Mining and Pattern Recognition Models for Identifying Inherited Diseases: Challenges and Implications.

Iddamalgoda, Lahiru; Das, Partha S; Aponso, Achala; Sundararajan, Vijayaraghava S; Suravajhala, Prashanth; Valadi, Jayaraman K.

Front Genet ; 7: 136, 2016.

Article in English | MEDLINE | ID: mdl-27559342

ABSTRACT

Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how the genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately prioritizing the single nucleotide polymorphisms (SNPs) associated with the disease. In this commentary, we review the state-of-the-art data mining and pattern recognition models for identifying inherited diseases and deliberate on the need for binary classification- and scoring-based prioritization methods in determining causal variants. While we discuss the pros and cons associated with these known methods, we argue that the gene prioritization methods and the protein-protein interaction (PPI) methods in conjunction with K nearest neighbors could be used in accurately categorizing the genetic factors in disease causation.

11.

Editorial: Annotation and curation of uncharacterized proteins: systems biology approaches.

Suravajhala, Prashanth; Benso, Alfredo; Valadi, Jayaraman K.

Front Genet ; 6: 224, 2015.

Article in English | MEDLINE | ID: mdl-26175751
