1.
Protein Sci ; 33(6): e5015, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38747369

ABSTRACT

Prokaryotic DNA-binding proteins (DBPs) play pivotal roles in governing gene regulation, DNA replication, and various cellular functions. Accurate computational models for predicting prokaryotic DBPs hold immense promise in accelerating the discovery of novel proteins, fostering a deeper understanding of prokaryotic biology, and facilitating the development of targeted therapeutics for potential disease interventions. However, existing generic prediction models often exhibit lower accuracy in predicting prokaryotic DBPs. To address this gap, we introduce ProkDBP, a novel machine learning-driven computational model for prediction of prokaryotic DBPs. For prediction, a total of nine shallow learning algorithms and five deep learning models were utilized, with the shallow learning models demonstrating higher performance metrics than their deep learning counterparts. The light gradient boosting machine (LGBM), coupled with evolutionarily significant features selected via the random forest variable importance measure (RF-VIM), yielded the highest five-fold cross-validation accuracy. The model achieved the highest auROC (0.9534) and auPRC (0.9575) among the 14 machine learning models evaluated. Additionally, ProkDBP demonstrated substantial performance with an independent dataset, achieving an auROC of 0.9332 and an auPRC of 0.9371. Notably, when benchmarked against several cutting-edge existing models, ProkDBP showed superior predictive accuracy. Furthermore, to promote accessibility and usability, ProkDBP (https://iasri-sg.icar.gov.in/prokdbp/) is available as an online prediction tool with free access for interested users. This tool is a significant contribution, enhancing the repertoire of resources for accurate and efficient prediction of prokaryotic DBPs.
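
The two headline metrics in this abstract, auROC and auPRC, can be illustrated with a minimal pure-Python sketch. This is not the ProkDBP code: the function names are hypothetical, auROC is computed through the pairwise-ranking identity, and auPRC is approximated by average precision, which is only one of several common estimators. The abstract does not say which estimator the authors used.

```python
def au_roc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Fraction of positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def au_prc(labels, scores):
    """Average precision, used here as an estimate of the area under the PR curve.

    Assumption: tied scores are ordered arbitrarily by the sort.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            total += true_pos / rank  # precision at each recovered positive
    return total / n_pos


if __name__ == "__main__":
    y = [1, 0, 1, 0]              # toy labels: 1 = DBP, 0 = non-DBP
    s = [0.9, 0.8, 0.7, 0.1]      # toy classifier scores
    print(au_roc(y, s), au_prc(y, s))
```

Both functions take plain lists of 0/1 labels and real-valued scores, so they can be applied to the output of any classifier.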


Subjects
Bacterial Proteins; DNA-Binding Proteins; Machine Learning; Algorithms; Bacterial Proteins/chemistry; Bacterial Proteins/metabolism; Bacterial Proteins/genetics; Computational Biology/methods; DNA-Binding Proteins/chemistry; DNA-Binding Proteins/metabolism
2.
Comput Struct Biotechnol J ; 23: 1631-1640, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38660008

ABSTRACT

RNA-binding proteins (RBPs) are central to key functions such as post-transcriptional regulation, mRNA stability, and adaptation to varied environmental conditions in prokaryotes. While the majority of research has concentrated on eukaryotic RBPs, recent developments underscore the crucial involvement of prokaryotic RBPs. Although computational methods have emerged in recent years to identify RBPs, they have fallen short in accurately identifying prokaryotic RBPs due to their generic nature. To bridge this gap, we introduce RBProkCNN, a novel machine learning-driven computational model meticulously designed for the accurate prediction of prokaryotic RBPs. The prediction process involves the utilization of eight shallow learning algorithms and four deep learning models, incorporating PSSM-based evolutionary features. By leveraging a convolutional neural network (CNN) and evolutionarily significant features selected through extreme gradient boosting variable importance measure, RBProkCNN achieved the highest accuracy in five-fold cross-validation, yielding 98.04% auROC and 98.19% auPRC. Furthermore, RBProkCNN demonstrated robust performance with an independent dataset, showcasing a commendable 95.77% auROC and 95.78% auPRC. Noteworthy is its superior predictive accuracy when compared to several state-of-the-art existing models. RBProkCNN is available as an online prediction tool (https://iasri-sg.icar.gov.in/rbprokcnn/), offering free access to interested users. This tool represents a substantial contribution, enriching the array of resources available for the accurate and efficient prediction of prokaryotic RBPs.
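
The five-fold cross-validation mentioned above can be sketched as follows. This is a generic illustration, not the RBProkCNN pipeline: the function names, the stratification by class, and the fixed random seed are all assumptions made for the example.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, keeping class proportions roughly equal."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a near-equal share of each class.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


if __name__ == "__main__":
    toy_labels = [1] * 10 + [0] * 10  # hypothetical RBP / non-RBP labels
    for train, test in cross_validation_splits(toy_labels):
        print(len(train), len(test))
```

Each sample appears in exactly one test fold, so averaging a metric over the five folds uses every sequence for evaluation once.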

3.
Biochim Biophys Acta Gen Subj ; 1868(6): 130597, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38490467

ABSTRACT

BACKGROUND: Abiotic stresses pose serious threat to the growth and yield of crop plants. Several studies suggest that in plants, transcription factors (TFs) are important regulators of gene expression, especially when it comes to coping with abiotic stresses. Therefore, it is crucial to identify TFs associated with abiotic stress response for breeding of abiotic stress tolerant crop cultivars. METHODS: Based on a machine learning framework, a computational model was envisaged to predict TFs associated with abiotic stress response in plants. To numerically encode TF sequences, four distinct sequence derived features were generated. The prediction was performed using ten shallow learning and four deep learning algorithms. For prediction using more pertinent and informative features, feature selection techniques were also employed. RESULTS: Using the features chosen by the light-gradient boosting machine-variable importance measure (LGBM-VIM), the LGBM achieved the highest cross-validation performance metrics (accuracy: 86.81%, auROC: 92.98%, and auPRC: 94.03%). Further evaluation of the proposed model (LGBM prediction method + LGBM-VIM selected features) was also done using an independent test dataset, where the accuracy, auROC and auPRC were observed 81.98%, 90.65% and 91.30%, respectively. CONCLUSIONS: To facilitate the adoption of the proposed strategy by users, the approach was implemented as a prediction server called ASPTF, accessible at https://iasri-sg.icar.gov.in/asptf/. The developed approach and the corresponding web application are anticipated to supplement experimental methods in the identification of transcription factors (TFs) responsive to abiotic stress in plants.


Subjects
Machine Learning; Stress, Physiological; Transcription Factors; Transcription Factors/metabolism; Transcription Factors/genetics; Algorithms; Gene Expression Regulation, Plant; Computational Biology/methods; Plant Proteins/genetics; Plant Proteins/metabolism; Plants/metabolism; Plants/genetics
4.
Plant Genome ; 16(4): e20332, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37122189

ABSTRACT

In wheat, genomic prediction accuracy (GPA) was assessed for three micronutrient traits (grain iron, grain zinc, and β-carotenoid concentrations) using eight Bayesian regression models. For this purpose, data on 246 accessions, each genotyped with 17,937 DArT markers, were utilized. The phenotypic data on traits were available for 2013-2014 from Powerkheda (Madhya Pradesh) and for 2014-2015 from Meerut (Uttar Pradesh), India. The accuracy of the models was measured in terms of reliability, which was computed following a repeated cross-validation approach. The predictions were obtained independently for each of the two environments after adjusting for the local effects and across environments after adjusting for the environmental effects. The Bayes ridge regression (BayesRR) model outperformed the other seven models, whereas BayesLASSO (BayesL) was the least efficient. The GPA increased with an increase in the size of the training set as well as with an increase in marker density. The GPA values differed for the three traits and were higher for the best linear unbiased estimate (BLUE) (obtained after adjusting for the environmental effects) relative to those for the two environments. The GPA also remained unaffected after accounting for the population structure. The results of the present study suggest that only the best model should be used for the estimations of genomic estimated breeding values (GEBVs) before their use for genomic selection to improve the grain micronutrient contents.


Subjects
Micronutrients; Triticum; Triticum/genetics; Bayes Theorem; Reproducibility of Results; Bread; Plant Breeding; Genomics/methods; Edible Grain/genetics
5.
Funct Integr Genomics ; 23(2): 113, 2023 Mar 31.
Article in English | MEDLINE | ID: mdl-37000299

ABSTRACT

Abiotic stresses are detrimental to plant growth and development and have a major negative impact on crop yields. A growing body of evidence indicates that a large number of long non-coding RNAs (lncRNAs) are key to many abiotic stress responses. Thus, identifying abiotic stress-responsive lncRNAs is essential in crop breeding programs in order to develop crop cultivars resistant to abiotic stresses. In this study, we have developed the first machine learning-based computational model for predicting abiotic stress-responsive lncRNAs. The lncRNA sequences which were responsive and non-responsive to abiotic stresses served as the two classes of the dataset for binary classification using the machine learning algorithms. The training dataset was created using 263 stress-responsive and 263 non-stress-responsive sequences, whereas the independent test set consists of 101 sequences from both classes. As the machine learning model can adopt only the numeric data, the Kmer features ranging from sizes 1 to 6 were utilized to represent lncRNAs in numeric form. To select important features, four different feature selection strategies were utilized. Among the seven learning algorithms, the support vector machine (SVM) achieved the highest cross-validation accuracy with the selected feature sets. The observed 5-fold cross-validation accuracy, AU-ROC, and AU-PRC were found to be 68.84, 72.78, and 75.86%, respectively. Furthermore, the robustness of the developed model (SVM with the selected feature) was evaluated using an independent test dataset, where the overall accuracy, AU-ROC, and AU-PRC were found to be 76.23, 87.71, and 88.49%, respectively. The developed computational approach was also implemented in an online prediction tool ASLncR accessible at https://iasri-sg.icar.gov.in/aslncr/ . The proposed computational model and the developed prediction tool are believed to supplement the existing effort for the identification of abiotic stress-responsive lncRNAs in plants.
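
The k-mer encoding described above (sizes 1 to 6) can be sketched in a few lines. This is an illustrative implementation, not the ASLncR code: the function name is hypothetical, and normalising counts to frequencies within each k is an assumption, since the abstract does not say whether raw counts or frequencies were used. With a four-letter alphabet, sizes 1 to 6 give 4 + 16 + 64 + 256 + 1024 + 4096 = 5460 features.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_features(sequence, k_min=1, k_max=6):
    """Encode a nucleotide sequence as concatenated k-mer frequency vectors."""
    seq = sequence.upper().replace("U", "T")  # treat RNA and DNA input alike
    features = []
    for k in range(k_min, k_max + 1):
        kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        for i in range(max(len(seq) - k + 1, 0)):
            word = seq[i:i + k]
            if word in counts:  # skip windows containing ambiguous bases such as N
                counts[word] += 1
        total = sum(counts.values())
        # Assumption: frequencies are normalised separately for each k.
        features.extend(counts[w] / total if total else 0.0 for w in kmers)
    return features


if __name__ == "__main__":
    vector = kmer_features("AUGGCUACGUAGCUAGCUAACG")  # toy sequence
    print(len(vector))
```

Because the vector length depends only on the alphabet and the range of k, sequences of different lengths map to inputs of the same dimension, which is what a classifier such as an SVM requires.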


Subjects
RNA, Long Noncoding; RNA, Long Noncoding/genetics; Computational Biology; Plant Breeding; Algorithms; Plants/genetics; Stress, Physiological/genetics
6.
Funct Integr Genomics ; 23(2): 92, 2023 Mar 20.
Article in English | MEDLINE | ID: mdl-36939943

ABSTRACT

Abiotic stresses have become a major challenge in recent years due to their pervasive nature and shocking impacts on plant growth, development, and quality. MicroRNAs (miRNAs) play a significant role in plant response to different abiotic stresses. Thus, identification of specific abiotic stress-responsive miRNAs holds immense importance in crop breeding programmes to develop cultivars resistant to abiotic stresses. In this study, we developed a machine learning-based computational model for prediction of miRNAs associated with four specific abiotic stresses such as cold, drought, heat and salt. The pseudo K-tuple nucleotide compositional features of Kmer size 1 to 5 were used to represent miRNAs in numeric form. Feature selection strategy was employed to select important features. With the selected feature sets, support vector machine (SVM) achieved the highest cross-validation accuracy in all four abiotic stress conditions. The highest cross-validated prediction accuracies in terms of area under precision-recall curve were found to be 90.15, 90.09, 87.71, and 89.25% for cold, drought, heat and salt respectively. Overall prediction accuracies for the independent dataset were respectively observed 84.57, 80.62, 80.38 and 82.78%, for the abiotic stresses. The SVM was also seen to outperform different deep learning models for prediction of abiotic stress-responsive miRNAs. To implement our method with ease, an online prediction server "ASmiR" has been established at https://iasri-sg.icar.gov.in/asmir/ . The proposed computational model and the developed prediction tool are believed to supplement the existing effort for identification of specific abiotic stress-responsive miRNAs in plants.


Subjects
MicroRNAs; MicroRNAs/genetics; Plant Breeding; Plants/genetics; Machine Learning; Sodium Chloride; Stress, Physiological/genetics; Gene Expression Regulation, Plant
7.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36416116

ABSTRACT

DNA-binding proteins (DBPs) play crucial roles in numerous cellular processes including nucleotide recognition, transcriptional control and the regulation of gene expression. The majority of the existing computational techniques for identifying DBPs are mainly applicable to human and mouse datasets. Even though some models have been tested on Arabidopsis, they produce poor accuracy when applied to other plant species. Therefore, it is imperative to develop an effective computational model for predicting plant DBPs. In this study, we developed a comprehensive computational model for the identification of plant-specific DBPs. Five shallow learning and six deep learning models were initially used for prediction, where shallow learning methods outperformed deep learning algorithms. In particular, the support vector machine achieved the highest repeated 5-fold cross-validation accuracy, with 94.0% area under the receiver operating characteristic curve (AUC-ROC) and 93.5% area under the precision-recall curve (AUC-PR). With an independent dataset, the developed approach secured 93.8% AUC-ROC and 94.6% AUC-PR. When compared with state-of-the-art existing tools on an independent dataset, the proposed model achieved much higher accuracy. Overall, the results suggest that the developed computational model is more efficient and reliable than the existing models for the prediction of DBPs in plants. For the convenience of the majority of experimental scientists, the developed prediction server PlDBPred is publicly accessible at https://iasri-sg.icar.gov.in/pldbpred/. The source code is also provided at https://iasri-sg.icar.gov.in/pldbpred/source_code.php for prediction using a large-size dataset.


Subjects
Arabidopsis; DNA-Binding Proteins; Algorithms; Arabidopsis/genetics; Arabidopsis/metabolism; Computational Biology/methods; Computer Simulation; DNA-Binding Proteins/genetics; DNA-Binding Proteins/metabolism; ROC Curve; Software
8.
Plant Genome ; : e20259, 2022 Sep 13.
Article in English | MEDLINE | ID: mdl-36098562

ABSTRACT

One of the thrust areas of research in plant breeding is to develop crop cultivars with enhanced tolerance to abiotic stresses. Thus, identifying abiotic stress-responsive genes (SRGs) and proteins is important for plant breeding research. However, identifying such genes via established genetic approaches is laborious and resource intensive. Although transcriptome profiling has remained a reliable method of SRG identification, it is species specific. Additionally, identifying multistress-responsive genes using gene expression studies is cumbersome. This underscores the need to develop a computational method for identifying the genes associated with different abiotic stresses. In this work, we aimed to develop a computational model for identifying genes responsive to six abiotic stresses: cold, drought, heat, light, oxidative, and salt. The predictions were performed using support vector machine (SVM), random forest, adaptive boosting (ADB), and extreme gradient boosting (XGB), where the autocross covariance (ACC) and K-mer compositional features were used as input. With ACC, K-mer, and ACC + K-mer compositional features, overall accuracies of ∼60-77, ∼75-86, and ∼61-78%, respectively, were obtained using the SVM algorithm with fivefold cross-validation. The SVM also achieved higher accuracy than the other three algorithms. The proposed model was also assessed with an independent dataset and obtained an accuracy consistent with cross-validation. The proposed model is the first of its kind and is expected to serve the requirement of experimental biologists; however, the prediction accuracy was modest. Given its importance for the research community, the online prediction application, ASRpro, is made freely available (https://iasri-sg.icar.gov.in/asrpro/) for predicting abiotic SRGs and proteins.

9.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35998895

ABSTRACT

Linear B-cell epitopes have a prominent role in the development of peptide-based vaccines and disease diagnosis. High variability in the length of these epitopes is a major reason for low accuracy in their prediction. Most of the B-cell epitope prediction methods considered fixed length of epitope sequences and achieved good accuracy. Though a number of tools are available for the prediction of flexible length linear B-cell epitopes with reasonable accuracy, further improvement in the prediction performance is still expected. Thus, here we made an attempt to analyze the performance of machine learning approaches (MLA) with 18 different amino acid encoding schemes in the prediction of flexible length linear B-cell epitopes. We considered B-cell epitope sequences of variable lengths (11-56 amino acids) from well-established public resources. The performances of machine learning algorithms with the encoded epitope sequence datasets were evaluated. Besides, the feasible combinations of encoding schemes were also explored and analyzed. The results revealed that amino-acid composition (AC) and distribution component of composition-transition-distribution encoding schemes are suitable for heterogeneous epitope data, whereas amino-acid-anchoring-pair-composition (APC), dipeptide-composition and amino-acids-pair-propensity-scale (APP) are more appropriate for homogeneous data. Further, two combinations of peptide encoding schemes, i.e. APC + AC and APC + APP with random forest classifier were identified to have improved performance over the state-of-the-art tools for flexible length linear B-cell epitope prediction. The study also revealed better performance of random forest over other considered MLAs in the prediction of flexible length linear B-cell epitopes.
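
Two of the encoding schemes named above, amino-acid composition (AC) and dipeptide composition, can be sketched as follows to show how epitopes of any length become fixed-length vectors. The function names are hypothetical, and the concatenation in `encode_peptide` is only an illustration of combining schemes. It is not one of the combinations (APC + AC, APC + APP) that the study reports as best.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(peptide):
    """Fraction of each standard residue in the peptide (20 values)."""
    residues = [a for a in peptide.upper() if a in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(peptide):
    """Fraction of each ordered residue pair among adjacent pairs (400 values)."""
    pep = peptide.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(pep) - 1):
        pair = pep[i:i + 2]
        if pair in counts:  # ignore pairs containing non-standard symbols
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in pairs]


def encode_peptide(peptide):
    """Illustrative combined encoding: AC followed by dipeptide composition."""
    return amino_acid_composition(peptide) + dipeptide_composition(peptide)


if __name__ == "__main__":
    # Two toy peptides of different lengths yield vectors of equal length.
    print(len(encode_peptide("ACDEFGHIKLM")), len(encode_peptide("ACDEFGHIKLMNPQRSTVWY")))
```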


Subjects
Epitopes, B-Lymphocyte; Vaccines; Amino Acids/genetics; Dipeptides; Peptides/chemistry
10.
Heredity (Edinb) ; 128(6): 519-530, 2022 06.
Article in English | MEDLINE | ID: mdl-35508540

ABSTRACT

We evaluated the performances of three BLUP and five Bayesian methods for genomic prediction by using nine actual and 54 simulated datasets. The genomic prediction accuracy was measured using Pearson's correlation coefficient between the genomic estimated breeding value (GEBV) and the observed phenotypic data using a fivefold cross-validation approach with 100 replications. The Bayesian alphabets performed better for the traits governed by a few genes/QTLs with relatively larger effects. On the contrary, the BLUP alphabets (GBLUP and CBLUP) exhibited higher genomic prediction accuracy for the traits controlled by several small-effect QTLs. Additionally, Bayesian methods performed better for the highly heritable traits and, for other traits, performed at par with the BLUP methods. Further, genomic BLUP (GBLUP) was identified as the least biased method for the GEBV estimation. Among the Bayesian methods, the Bayesian ridge regression and Bayesian LASSO were less biased than other Bayesian alphabets. Nonetheless, genomic prediction accuracy increased with an increase in trait heritability, irrespective of the sample size, marker density, and the QTL type (major/minor effect). In sum, this study provides valuable information regarding the choice of the selection method for genomic prediction in different breeding programs.
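
The accuracy measure used above, Pearson's correlation between GEBVs and observed phenotypes, is simple to compute. The sketch below is a minimal pure-Python version with a hypothetical function name and toy numbers. In the study this statistic would be computed on held-out folds and averaged over replications, which is not shown here.

```python
from math import sqrt


def pearson_correlation(x, y):
    """Pearson's r between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return covariance / sqrt(var_x * var_y)


if __name__ == "__main__":
    gebv = [1.2, 0.4, 2.1, 1.7, 0.9]        # toy predicted breeding values
    phenotype = [10.5, 9.1, 12.0, 11.2, 9.8]  # toy observed phenotypes
    print(pearson_correlation(gebv, phenotype))
```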


Subjects
Genomics; Models, Genetic; Bayes Theorem; Genomics/methods; Genotype; Phenotype; Polymorphism, Single Nucleotide; Quantitative Trait Loci
11.
Physiol Mol Biol Plants ; 28(3): 651-668, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35465203

ABSTRACT

In the present study in wheat, GWAS was conducted for identification of marker trait associations (MTAs) for the following six grain morphology traits: (1) grain cross-sectional area (GCSA), (2) grain perimeter (GP), (3) grain length (GL), (4) grain width (GWid), (5) grain length-width ratio (GLWR) and (6) grain form-density (GFD). The data were recorded on a subset of spring wheat reference set (SWRS) comprising 225 diverse genotypes, which were genotyped using 10,904 SNPs and phenotyped for two consecutive years (2017-2018, 2018-2019). GWAS was conducted using five different models including two single-locus models (CMLM, SUPER), one multi-locus model (FarmCPU), one multi-trait model (mvLMM) and a model for Q x Q epistatic interactions. False discovery rate (FDR) [P value -log10(p) ≥ 5] and Bonferroni correction [P value -log10(p) ≥ 6] (corrected p value < 0.05) were applied to eliminate false positives due to multiple testing. This exercise gave 88 main effect and 29 epistatic MTAs after FDR and 13 main effect and 6 epistatic MTAs after Bonferroni corrections. MTAs obtained after Bonferroni corrections were further utilized for identification of 55 candidate genes (CGs). In silico expression analysis of CGs in different tissues at different parts of the seed at different developmental stages was also carried out. MTAs and CGs identified during the present study are useful addition to available resources for MAS to supplement wheat breeding programmes after due validation and also for future strategic basic research. Supplementary Information: The online version contains supplementary material available at 10.1007/s12298-022-01164-w.
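
The two multiple-testing corrections named above can be sketched generically. The functions below are textbook versions of the Bonferroni threshold and the Benjamini-Hochberg step-up procedure, with hypothetical names. They are not the authors' pipeline, and the abstract does not state how its cutoffs of -log10(p) ≥ 5 and ≥ 6 were derived. For reference, a plain Bonferroni threshold of 0.05 over 10,904 SNPs is about 4.6e-6, that is, -log10(p) of roughly 5.34.

```python
from math import log10


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under its step-up threshold.
        if p_values[index] <= rank * alpha / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


if __name__ == "__main__":
    threshold = bonferroni_threshold(0.05, 10904)
    print(threshold, -log10(threshold))
    print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.9]))
```

Bonferroni is the stricter of the two, which is consistent with the smaller number of MTAs the abstract reports after Bonferroni correction than after FDR.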

12.
Physiol Mol Biol Plants ; 28(1): 1-16, 2022 Jan.
Article in English | MEDLINE | ID: mdl-35221569

ABSTRACT

In plants, GIGANTEA (GI) protein plays different biological functions including carbon and sucrose metabolism, cell wall deposition, transpiration and hypocotyl elongation. This suggests that GI is an important class of proteins. So far, the resource-intensive experimental methods have been mostly utilized for identification of GI proteins. Thus, we made an attempt in this study to develop a computational model for fast and accurate prediction of GI proteins. Ten different supervised learning algorithms i.e., SVM, RF, JRIP, J48, LMT, IBK, NB, PART, BAGG and LGB were employed for prediction, where the amino acid composition (AAC), FASGAI features and physico-chemical (PHYC) properties were used as numerical inputs for the learning algorithms. Higher accuracies i.e., 96.75% of AUC-ROC and 86.7% of AUC-PR were observed for SVM coupled with AAC + PHYC feature combination, while evaluated with five-fold cross validation. With leave-one-out cross validation, 97.29% of AUC-ROC and 87.89% of AUC-PR were respectively achieved. While the performance of the model was evaluated with an independent dataset of 18 GI sequences, 17 were observed as correctly predicted. We have also performed proteome-wide identification of GI proteins in wheat, followed by functional annotation using Gene Ontology terms. A prediction server "GIpred" is freely accessible at http://cabgrid.res.in:8080/gipred/ for proteome-wide recognition of GI proteins. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s12298-022-01130-6.

13.
Int J Mol Sci ; 23(3)2022 Jan 30.
Article in English | MEDLINE | ID: mdl-35163534

ABSTRACT

MicroRNAs (miRNAs) play a significant role in plant response to different abiotic stresses. Thus, identification of abiotic stress-responsive miRNAs holds immense importance in crop breeding programmes to develop cultivars resistant to abiotic stresses. In this study, we developed a machine learning-based computational method for prediction of miRNAs associated with abiotic stresses. Three types of datasets were used for prediction, i.e., miRNA, Pre-miRNA, and Pre-miRNA + miRNA. The pseudo K-tuple nucleotide compositional features were generated for each sequence to transform the sequence data into numeric feature vectors. Support vector machine (SVM) was employed for prediction. The area under receiver operating characteristics curve (auROC) of 70.21, 69.71, 77.94 and area under precision-recall curve (auPRC) of 69.96, 65.64, 77.32 percentages were obtained for miRNA, Pre-miRNA, and Pre-miRNA + miRNA datasets, respectively. Overall prediction accuracies for the independent test set were 62.33, 64.85, 69.21 percentages, respectively, for the three datasets. The SVM also achieved higher accuracy than other learning methods such as random forest, extreme gradient boosting, and adaptive boosting. To implement our method with ease, an online prediction server "ASRmiRNA" has been developed. The proposed approach is believed to supplement the existing effort for identification of abiotic stress-responsive miRNAs and Pre-miRNAs.


Subjects
Computational Biology/methods; MicroRNAs/genetics; Plants/genetics; Algorithms; Area Under Curve; Gene Expression Regulation, Plant; RNA, Plant/genetics; Stress, Physiological; Support Vector Machine
14.
3 Biotech ; 11(11): 484, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34790508

ABSTRACT

Identification of splice sites is an important aspect of gene structure prediction. In most of the existing splice site prediction studies, machine learning algorithms coupled with sequence-derived features have been successfully employed for splice site recognition. However, splice site identification that incorporates secondary structure information is lacking, particularly in plant species. Thus, we made an attempt in this study to evaluate the effect of structural features on splice site prediction accuracy in Arabidopsis thaliana. Prediction accuracies were evaluated with the sequence-derived features alone as well as by incorporating the structural features into the sequence-derived features, where support vector machine (SVM) was employed as the prediction algorithm. Both short (40 base pairs) and long (105 base pairs) sequence datasets were considered for evaluation. After incorporating the secondary structure features, improvements in accuracies were observed only for the longer sequence dataset, and the improvement was found to be higher with the sequence-derived features that accounted for nucleotide dependencies. On the other hand, little or no improvement in accuracies was found for the short sequence dataset. The performance of SVM was further compared with that of the LogitBoost, Random Forest (RF), AdaBoost and XGBoost machine learning methods. The prediction accuracies of SVM, AdaBoost and XGBoost were observed to be at par and higher than those of the RF and LogitBoost algorithms. When prediction was performed by taking all the sequence-derived features along with the structural features, a small improvement in accuracies was found as compared to the combination of individual sequence-based features and structural features. To the best of our knowledge, this is the first attempt concerning the computational prediction of splice sites using machine learning methods by incorporating the secondary structure information into the sequence-derived features. All the source codes are available at https://github.com/meher861982/SSFeature. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s13205-021-03036-8.

15.
BMC Bioinformatics ; 22(1): 342, 2021 Jun 24.
Article in English | MEDLINE | ID: mdl-34167457

ABSTRACT

BACKGROUND: Localization of messenger RNAs (mRNAs) plays a crucial role in the growth and development of cells. Particularly, it plays a major role in regulating spatio-temporal gene expression. In situ hybridization is a promising experimental technique used to determine the localization of mRNAs, but it is costly and laborious. It is also known that a single mRNA can be present in more than one location, whereas the existing computational tools are capable of predicting only a single location for such mRNAs. Thus, the development of a high-end computational tool is required for reliable and timely prediction of multiple subcellular locations of mRNAs. Hence, we developed the present computational model to predict the multiple localizations of mRNAs. RESULTS: The mRNA sequences from 9 different localizations were considered. Each sequence was first transformed to a numeric feature vector of size 5460, based on the k-mer features of sizes 1-6. Out of 5460 k-mer features, 1812 important features were selected by the Elastic Net statistical model. The Random Forest supervised learning algorithm was then employed for predicting the localizations with the selected features. Five-fold cross-validation accuracies of 70.87, 68.32, 68.36, 68.79, 96.46, 73.44, 70.94, 97.42 and 71.77% were obtained for the cytoplasm, cytosol, endoplasmic reticulum, exosome, mitochondrion, nucleus, pseudopodium, posterior and ribosome, respectively. With an independent test set, accuracies of 65.33, 73.37, 75.86, 72.99, 94.26, 70.91, 65.53, 93.60 and 73.45% were obtained for the respective localizations. The developed approach also achieved higher accuracies than the existing localization prediction tools. CONCLUSIONS: This study presents a novel computational tool for predicting the multiple localizations of mRNAs. Based on the proposed approach, an online prediction server "mLoc-mRNA" is accessible at http://cabgrid.res.in:8080/mlocmrna/ . The developed approach is believed to supplement the existing tools and techniques for the localization prediction of mRNAs.


Subjects
Algorithms; Computational Biology; Cell Nucleus; RNA, Messenger/genetics; Ribosomes
16.
Plant Methods ; 17(1): 46, 2021 Apr 26.
Article in English | MEDLINE | ID: mdl-33902670

ABSTRACT

BACKGROUND: Circadian rhythms regulate several physiological and developmental processes of plants. Hence, the identification of genes with the underlying circadian rhythmic features is pivotal. Though computational methods have been developed for the identification of circadian genes, all these methods are based on gene expression datasets. In other words, we could not find any sequence-based model, and that motivated us to develop the present computational method to identify the proteins encoded by the circadian genes. RESULTS: Support vector machine (SVM) with seven kernels, i.e., linear, polynomial, radial, sigmoid, hyperbolic, Bessel and Laplace, was utilized for prediction by employing compositional, transitional and physico-chemical features. A higher accuracy of 62.48% was achieved with the Laplace kernel, following the fivefold cross-validation approach. The developed model further secured 62.96% accuracy with an independent dataset. The SVM also outperformed other state-of-the-art machine learning algorithms, i.e., Random Forest, Bagging, AdaBoost, XGBoost and LASSO. We also performed proteome-wide identification of circadian proteins in two cereal crops, namely Oryza sativa and Sorghum bicolor, followed by the functional annotation of the predicted circadian proteins with Gene Ontology (GO) terms. CONCLUSIONS: To the best of our knowledge, this is the first computational method to identify the circadian genes with the sequence data. Based on the proposed method, we have developed an R-package PredCRG ( https://cran.r-project.org/web/packages/PredCRG/index.html ) for the scientific community for proteome-wide identification of circadian genes. The present study supplements the existing computational methods as well as wet-lab experiments for the recognition of circadian genes.
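
The Laplace kernel singled out above can be written down in a few lines. The sketch assumes the common form k(x, y) = exp(-sigma * ||x - y||) with the Euclidean norm. Some references define it with the L1 norm instead, and the abstract does not specify which form or which sigma the authors used. Function names are hypothetical, and this is not code from the PredCRG package.

```python
from math import exp, sqrt


def laplace_kernel(x, y, sigma=1.0):
    """Laplace kernel exp(-sigma * ||x - y||), assuming the Euclidean norm."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    distance = sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return exp(-sigma * distance)


def gram_matrix(vectors, sigma=1.0):
    """Pairwise kernel matrix, the input a kernel SVM solver works with."""
    return [[laplace_kernel(u, v, sigma) for v in vectors] for u in vectors]


if __name__ == "__main__":
    toy_features = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]  # hypothetical feature vectors
    for row in gram_matrix(toy_features, sigma=0.5):
        print(row)
```

Compared with the radial (Gaussian) kernel, the distance enters unsquared, so similarity decays more slowly for distant points.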

18.
Mol Breed ; 41(7): 46, 2021 Jul.
Article in English | MEDLINE | ID: mdl-37309385

ABSTRACT

A genome-wide association study (GWAS) for 10 yield and yield component traits was conducted using an association panel comprising 225 diverse spring wheat genotypes. The panel was genotyped using 10,904 SNPs and evaluated for three years (2016-2019), which constituted three environments (E1, E2 and E3). Heritability for different traits ranged from 29.21 to 97.69%. Marker-trait associations (MTAs) were identified for each trait using data from each environment separately and also using BLUP values. Four different models were used, which included three single trait models (CMLM, FarmCPU, SUPER) and one multi-trait model (mvLMM). Hundreds of MTAs were obtained using each model, but after Bonferroni correction, only 6 MTAs for 3 traits were available using CMLM, and 21 MTAs for 4 traits were available using FarmCPU; none of the 525 MTAs obtained using SUPER could qualify after Bonferroni correction. Using BLUP, 20 MTAs were available, five of which also figured among MTAs identified for individual environments. Using mvLMM model, after Bonferroni correction, 38 multi-trait MTAs, for 15 different trait combinations were available. Epistatic interactions involving 28 pairs of MTAs were also available for seven of the 10 traits; no epistatic interactions were available for GNPS, PH, and BYPP. As many as 164 putative candidate genes (CGs) were identified using all the 50 MTAs (CMLM, 3; FarmCPU, 9; mvLMM, 6, epistasis, 21 and BLUP, 11 MTAs), which ranged from 20 (CMLM) to 66 (epistasis) CGs. In-silico expression analysis of CGs was also conducted in different tissues at different developmental stages. The information generated through the present study proved useful for developing a better understanding of the genetics of each of the 10 traits; the study also provided novel markers for marker-assisted selection (MAS) to be utilized for the development of wheat cultivars with improved agronomic traits. 
Supplementary Information: The online version contains supplementary material available at 10.1007/s11032-021-01240-1.
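
The Bonferroni filtering mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a family-wise alpha of 0.05 and one test per SNP (10,904 tests); the abstract does not state the exact threshold the authors used, and the SNP names and p-values below are hypothetical.

```python
import math

N_SNPS = 10_904   # number of SNPs reported in the abstract
ALPHA = 0.05      # assumed family-wise error rate (not stated in the abstract)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test p-value cutoff under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant(p_values, alpha=ALPHA, n_tests=N_SNPS):
    """Keep only (name, p) pairs whose p-value passes the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(alpha, n_tests)
    return [(name, p) for name, p in p_values if p <= cutoff]


threshold = bonferroni_threshold(ALPHA, N_SNPS)
neg_log10_threshold = -math.log10(threshold)

# Hypothetical marker names and p-values, for illustration only.
example = [("snp_a", 1.2e-7), ("snp_b", 3.0e-5), ("snp_c", 4.0e-6)]
passed = significant(example)
```

Under these assumptions the cutoff is 0.05 / 10,904, so only markers with very small p-values survive, which is consistent with hundreds of raw MTAs shrinking to a handful after correction.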

19.
Sci Rep ; 10(1): 21593, 2020 12 09.
Article in English | MEDLINE | ID: mdl-33299096

ABSTRACT

Foot-and-mouth disease (FMD), a highly contagious viral infection of wild and domestic cloven-hoofed animals, endangers livestock populations across the globe. It adversely affects the socioeconomic status of millions of households. Vaccination has been used to protect animals against FMD virus (FMDV) to some extent, but the effectiveness of available vaccines has decreased due to high genetic variability in the FMDV genome. Another reason the current vaccines are not favored is that they do not allow infected animals to be differentiated from vaccinated ones. RNA interference (RNAi), a potential strategy for controlling virus replication, has thus opened up a new avenue for controlling viral transmission. Hence, an attempt has been made here to establish the role of RNAi in therapeutic developments for FMD by computationally identifying (i) microRNA (miRNA) targets in FMDV using target prediction algorithms, (ii) targetable genomic regions in FMDV based on their dissimilarity with the host genome, and (iii) plausible anti-FMDV miRNA-like simulated nucleotide sequences (SNSs). The results revealed 12 mature host miRNAs that have 284 targets in 98 distinct FMDV genomic sequences. Wet-lab validation of the anti-FMDV properties of 8 host miRNAs was carried out, and all were observed to confer antiviral effects of varying magnitude. In addition, 14 miRBase miRNAs were found with better target accessibility in FMDV than in Bos taurus. Further, 8 putative targetable regions having sense strand properties of siRNAs were identified on FMDV genes that are highly dissimilar to the host genome. A total of 16 SNSs having > 90% identity with mature miRNAs were also identified that have targets in FMDV genes. The information generated from this study is available at http://bioinformatics.iasri.res.in/fmdisc/ to serve the needs of biologists, veterinarians and animal scientists working on FMD.


Subjects
Cattle Diseases/therapy , Foot-and-Mouth Disease/therapy , RNAi Therapeutics , Algorithms , Animals , Cattle , Cattle Diseases/genetics , Computational Biology , Foot-and-Mouth Disease/genetics , Foot-and-Mouth Disease Virus/genetics
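
The "> 90% identity" criterion for simulated nucleotide sequences in the record above can be illustrated with a minimal sketch. It assumes a simple ungapped, position-by-position comparison of equal-length sequences; the abstract does not say which alignment method the authors used, and both sequences below are made up for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for ungapped comparison")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# Hypothetical 22-nt sequences (not real miRNAs or SNSs from the study).
mature_mirna = "AUGCAUGCAUGCAUGCAUGCAU"
candidate_sns = "AUCCAUGCAUGCAUGCAUGGAU"  # differs at two positions

identity = percent_identity(mature_mirna, candidate_sns)
passes_cutoff = identity > 90.0
```

With two mismatches over 22 nucleotides the identity is 20/22, just above the 90% cutoff; a third mismatch would drop a sequence of this length below it.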
20.
Sci Rep ; 10(1): 14557, 2020 09 03.
Article in English | MEDLINE | ID: mdl-32884018

ABSTRACT

MicroRNAs (miRNAs), a class of non-coding RNA, play a vital role in regulating several physiological and developmental processes. The subcellular localization of miRNAs and their abundance in the native cell are central to maintaining physiological homeostasis. In addition, the RNA silencing activity of miRNAs is influenced by their localization and stability. A computational method for predicting the subcellular localization of miRNAs is therefore desirable. In this work, we propose a computational method for predicting subcellular localizations of miRNAs based on principal component scores of thermodynamic and structural properties and pseudo compositions of di-nucleotides. Prediction accuracy was assessed by fivefold cross-validation, where AUC-ROC of ~ 63-71% and AUC-PR of ~ 69-76% were observed. When evaluated on an independent test set, > 50% of localizations were correctly predicted. Moreover, the developed computational model achieved higher accuracy than the existing methods. A user-friendly prediction server, "miRNALoc", is freely accessible at https://cabgrid.res.in:8080/mirnaloc/ , with which users can predict localizations of miRNAs.


Subjects
Algorithms , Computational Biology/methods , MicroRNAs/analysis , Nucleotides/chemistry , Principal Component Analysis/methods , RNA Precursors/chemistry , Subcellular Fractions/metabolism , Humans , MicroRNAs/chemistry , MicroRNAs/genetics , RNA Precursors/genetics , Thermodynamics
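
As a rough illustration of the di-nucleotide features mentioned in the record above, the sketch below computes plain di-nucleotide frequencies for an RNA sequence. This is a simplification and not the authors' method: pseudo di-nucleotide composition additionally includes sequence-order correlation terms, and the thermodynamic and structural properties and the principal component step are not shown. The function name and example sequence are hypothetical.

```python
from itertools import product

ALPHABET = "ACGU"
# The 16 possible di-nucleotides in a fixed order: AA, AC, AG, AU, CA, ...
DINUCLEOTIDES = ["".join(pair) for pair in product(ALPHABET, repeat=2)]


def dinucleotide_composition(seq: str) -> list[float]:
    """Return the 16 overlapping di-nucleotide frequencies of an RNA sequence."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing ambiguous bases such as N
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no valid di-nucleotides")
    return [counts[d] / total for d in DINUCLEOTIDES]


# Hypothetical sequence, for illustration only.
features = dinucleotide_composition("AUGCAUGCAU")
```

Each sequence becomes a fixed-length vector of 16 frequencies that sum to 1, which is the kind of numeric input a dimensionality-reduction step such as principal component analysis would then operate on.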
...