1.
Comput Biol Med ; 151(Pt A): 106311, 2022 12.
Article in English | MEDLINE | ID: mdl-36410097

ABSTRACT

Antimicrobial peptides (AMPs) are gaining considerable attention as cutting-edge treatments for many infectious disorders. Their long-standing effectiveness against bacteria, fungi, and viruses makes them a leading option for addressing the growing problem of antibiotic resistance. Due to their wide-ranging actions, AMPs have become more prominent, particularly in therapeutic applications. The explosive increase of AMPs documented in databases has made their prediction a difficult task for researchers. Wet-lab investigations to find antimicrobial peptides are exceedingly costly, time-consuming, and even impossible for some species. Therefore, an efficient computational method must be developed to choose the optimal AMP candidates before in-vitro trials. In this study, an effort was made to develop a machine learning-based classification system that is effective and accurate in identifying antimicrobial peptides. The position-specific scoring matrix (PSSM), pseudo amino acid composition, dipeptide composition, and the combination of these three were used in the proposed scheme to extract salient features from AMP sequences. The classification techniques K-nearest neighbor (KNN), Random Forest (RF), and Support Vector Machine (SVM) were employed. On the independent dataset and training dataset, the accuracy levels achieved by the proposed predictor (Target-AMP) are 97.07% and 95.71%, respectively. The results show that, compared with other techniques currently reported in the literature, Target-AMP had the best success rate.


Subject(s)
Amino Acids , Antimicrobial Peptides , Cluster Analysis , Databases, Factual
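The dipeptide composition named in the abstract above can be illustrated with a minimal sketch. It is not the Target-AMP implementation: the function name, the handling of non-standard residues, and the example peptide are assumptions made here for illustration only.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def dipeptide_composition(sequence: str) -> list[float]:
    """Return a 400-dimensional vector of dipeptide frequencies."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # assumption: skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(pairs)
    return [counts[p] / total for p in pairs]

# Hypothetical example peptide, used only to exercise the function.
vector = dipeptide_composition("GIGKFLHSAKKFGKAFVGEIMNS")
print(len(vector))
```

Each sequence, whatever its length, maps to a fixed-length vector, which is what lets classifiers such as KNN, RF, or SVM consume peptides of different lengths.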
2.
Artif Intell Med ; 131: 102349, 2022 09.
Article in English | MEDLINE | ID: mdl-36100346

ABSTRACT

Cancer is a serious health concern worldwide; it arises when cellular modifications cause irregular growth and division of human cells. Several traditional approaches, such as conventional therapies and wet laboratory-based methods, have been applied to treat cancer cells. However, these methods are considered less effective due to their high cost and diverse side effects. Recently, peptide-based therapies have attracted the attention of scientists because of their high selectivity. Peptide therapy can efficiently treat the targeted cells without affecting normal cells. Due to the rapid increase in peptide sequences, building an accurate prediction model has become a challenging task. Given the significance of anticancer peptides (ACPs) in cancer treatment, an intelligent and reliable prediction model is indispensable. In this paper, a FastText-based word embedding strategy is employed to represent each peptide sample via a skip-gram model. After extracting the peptide embedding descriptors, a deep neural network (DNN) model was applied to discriminate the ACPs. With optimized parameters, the DNN achieved accuracies of 96.94%, 93.41%, and 94.02% on the training, alternate, and independent samples, respectively. The proposed cACP-DeepGram model outperformed existing predictors, reporting about 10% higher prediction accuracy. It is suggested that the cACP-DeepGram model will be a reliable tool for scientists and might play a valuable role in academic research and drug discovery. The source code and the datasets are publicly available at https://github.com/shahidakbarcs/cACP-DeepGram.


Subject(s)
Neural Networks, Computer , Peptides , Amino Acid Sequence , Humans , Software
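To make the skip-gram idea in the abstract above concrete, the sketch below shows how a peptide might be split into overlapping k-mer "words" and turned into (center, context) training pairs. The k-mer size, window width, function names, and example peptide are assumptions; the sketch does not train FastText embeddings and is not taken from the cACP-DeepGram code.

```python
def kmer_tokens(sequence: str, k: int = 3) -> list[str]:
    """Split a peptide into overlapping k-mers ("words")."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def skipgram_pairs(tokens: list[str], window: int = 2) -> list[tuple[str, str]]:
    """Build (center, context) pairs within a symmetric window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is never its own context
                pairs.append((center, tokens[j]))
    return pairs

# Hypothetical peptide used only for illustration.
tokens = kmer_tokens("FLPIIAKLL", k=3)
pairs = skipgram_pairs(tokens, window=1)
print(tokens)
print(len(pairs))
```

A skip-gram model is then trained to predict the context word from the center word, and the learned word vectors serve as the numerical descriptors of the peptide.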
3.
Comput Biol Med ; 149: 105962, 2022 10.
Article in English | MEDLINE | ID: mdl-36049412

ABSTRACT

Plasmodium falciparum causes malaria, an infectious and potentially fatal disease. In the early days, malaria-infected cells were diagnosed using a microscope. Owing to the huge number of instances to analyze and the time required, this may lead to false detection. Automated parasite detection technologies are therefore in high demand. To create effective cures and treatments, it is critical to use an accurate approach for predicting the malaria parasite. Here, several protein sequence formulation techniques, namely discrete, biochemical, physicochemical, and natural language processing methods, are applied to transform protein sequences into numerical descriptors. Four classification algorithms are used, and the predictions of these classifiers are then fused to build an ensemble classification model via simple majority voting and a genetic algorithm. In addition, a BCH error-correcting code is incorporated with a support vector machine using all the feature spaces. The simulated results demonstrate the remarkable performance of the proposed model compared to previous models. Thus, the proposed model may be an effective tool for discriminating the secretory and non-secretory proteins of the malaria parasite.


Subject(s)
Malaria , Parasites , Algorithms , Animals , Computer Simulation , Humans , Plasmodium falciparum
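The simple-majority fusion mentioned in the abstract above can be sketched as follows. The three label lists are made-up illustrative predictions (the study used four classifiers), and the tie-breaking behavior is a property of this sketch, not something stated in the source.

```python
from collections import Counter

def majority_vote(predictions: list[list[int]]) -> list[int]:
    """Fuse per-classifier label lists by simple majority for each sample."""
    fused = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest count; on a tie,
        # the label encountered first among the votes is returned.
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused

# Hypothetical predictions from three classifiers on four samples.
clf_a = [1, 0, 1, 1]
clf_b = [1, 1, 0, 1]
clf_c = [0, 0, 1, 1]
fused = majority_vote([clf_a, clf_b, clf_c])
print(fused)
```

With an even number of classifiers, ties become possible, which is one reason weighted schemes such as a genetic-algorithm-tuned ensemble are sometimes preferred.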
4.
Comput Biol Med ; 137: 104778, 2021 10.
Article in English | MEDLINE | ID: mdl-34481183

ABSTRACT

Tuberculosis (TB) is a worldwide illness caused by the bacterium Mycobacterium tuberculosis. Owing to the high prevalence of multidrug-resistant tuberculosis, numerous traditional strategies for developing novel alternative therapies have been presented. The effectiveness and dependability of these procedures are not always consistent. Peptide-based therapy has recently been regarded as a preferable alternative due to its excellent selectivity in targeting specific cells without affecting normal cells. However, due to the rapid growth in the number of peptide samples, accurately predicting antitubercular peptides has become a challenging task. To effectively identify antitubercular peptides, an intelligent and reliable prediction model is indispensable. An ensemble learning approach was used in this study to improve the predicted results by compensating for the shortcomings of individual classification algorithms. Initially, three distinct representation approaches were used to formulate the training samples: k-space amino acid composition, composite physicochemical properties, and one-hot encoding. The feature vectors of the applied feature extraction methods are then combined to generate a heterogeneous vector. Finally, using the individual and heterogeneous vectors, five classification models of distinct nature were used to evaluate prediction rates. In addition, a genetic algorithm-based ensemble model was used to improve the proposed model's prediction and training capabilities. Using the training and independent datasets, the proposed ensemble model achieved accuracies of 94.47% and 92.68%, respectively. The proposed "iAtbP-Hyb-EnC" model outperformed existing predictors, reporting about 10% higher training accuracy. The "iAtbP-Hyb-EnC" model is suggested to be a reliable tool for scientists and might play a valuable role in academic research and drug discovery. The source code and all datasets are publicly available at https://github.com/Farman335/iAtbP-Hyb-EnC.


Subject(s)
Algorithms , Peptides , Amino Acids , Machine Learning , Software
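One common reading of the "k-space amino acid composition" in the abstract above is the frequency of residue pairs separated by k intervening residues. The sketch below follows that reading as an assumption; the normalization, the gap values, and the concatenation into a single "heterogeneous" vector are illustrative choices and are not taken from the iAtbP-Hyb-EnC code.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def k_spaced_pair_composition(sequence: str, k: int) -> dict[str, float]:
    """Frequency of residue pairs separated by exactly k intervening residues."""
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    windows = len(sequence) - k - 1  # number of (i, i + k + 1) positions
    if windows <= 0:
        return {pair: 0.0 for pair in counts}
    for i in range(windows):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:  # assumption: ignore non-standard residues
            counts[pair] += 1
    return {pair: n / windows for pair, n in counts.items()}

features = k_spaced_pair_composition("ACDAC", k=1)

# Assumed fusion step: concatenate the vectors for gaps 0, 1, and 2.
hybrid = [v for k in range(3)
          for v in k_spaced_pair_composition("ACDAC", k).values()]
print(len(hybrid))
```

Concatenating per-gap vectors is one simple way to combine feature spaces; the same pattern extends to appending physicochemical or one-hot features.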
5.
Sci Rep ; 10(1): 19747, 2020 11 12.
Article in English | MEDLINE | ID: mdl-33184369

ABSTRACT

Heart disease is a fatal human disease that is rapidly increasing globally in both developed and undeveloped countries and, consequently, causes death. In this disease, the heart typically fails to supply a sufficient amount of blood to other parts of the body for them to perform their normal functions. Early and timely diagnosis of this problem is essential for protecting patients from further damage and saving their lives. Among the conventional invasive techniques, angiography is considered the most well-known technique for diagnosing heart problems, but it has some limitations. On the other hand, non-invasive methods, such as intelligent learning-based computational techniques, are found to be more reliable and effective for heart disease diagnosis. Here, an intelligent computational predictive system is introduced for the identification and diagnosis of cardiac disease. In this study, various machine learning classification algorithms are investigated. To remove irrelevant and noisy data from the extracted feature space, four distinct feature selection algorithms are applied, and the results of each feature selection algorithm along with the classifiers are analyzed. Several performance metrics, namely accuracy, sensitivity, specificity, AUC, F1-score, MCC, and ROC curve, are used to assess the effectiveness and strength of the developed model. The classification rates of the developed system are examined on both the full and optimal feature spaces; the performance of the developed model improves on the highly varied optimal feature space. In addition, P-value and Chi-square are also computed for the ET classifier along with each feature selection technique. It is anticipated that the proposed system will help physicians diagnose heart disease accurately and effectively.


Subject(s)
Artificial Intelligence , Computational Biology/methods , Heart Diseases/diagnosis , Machine Learning , Models, Statistical , Adult , Aged , Aged, 80 and over , Female , Follow-Up Studies , Humans , Male , Middle Aged , Prognosis , ROC Curve
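Several of the metrics listed in the abstract above follow directly from a binary confusion matrix. The sketch below computes them with their standard definitions; the counts are invented for illustration and are not results from the study.

```python
import math

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, sensitivity, specificity, F1, and MCC from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on positives
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1, "mcc": mcc}

# Hypothetical confusion counts, chosen only to exercise the formulas.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

MCC uses all four cells of the confusion matrix, so it tends to be more informative than accuracy when the classes are imbalanced.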
6.
Neural Netw ; 129: 385-391, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32593932

ABSTRACT

N6-methyladenosine (m6A) is a well-studied and the most common internal messenger RNA (mRNA) modification, and it plays an important role in cell development. m6A is found in all kingdoms of life and is involved in many cellular processes such as RNA splicing, immune tolerance, regulatory functions, RNA processing, and cancer. Despite the crucial role of m6A in cells, earlier computational attempts to target it produced unsatisfactory results. It is imperative to develop an efficient computational model that can truly represent m6A sites. In this regard, an intelligent and highly discriminative computational model, m6A-word2vec, is introduced for the discrimination of m6A sites. Here, a natural language processing concept in the form of word2vec is used to represent the motif of the target class automatically. These motifs (numerical descriptors) are extracted automatically from the human genome without any explicit definition. The extracted feature space is then forwarded to a convolutional neural network model as input for prediction. The developed computational model obtained 83.17%, 92.69%, and 90.50% accuracy for benchmark datasets S1, S2, and S3, respectively, using a 10-fold cross-validation test. The predictive outcomes confirm that the developed model showed better performance than existing computational models. It is thus anticipated that the introduced computational model "m6A-word2vec" may be a supportive and practical tool for basic and pharmaceutical research, such as drug design, as well as for academia.


Subject(s)
Adenosine/analogs & derivatives , Natural Language Processing , Neural Networks, Computer , RNA Caps , Adenosine/genetics , Humans , RNA Caps/genetics
7.
Genomics ; 112(2): 1565-1574, 2020 03.
Article in English | MEDLINE | ID: mdl-31526842

ABSTRACT

Bacteriophage virion proteins (BVPs) are structural proteins of bacterial viruses and have a great impact on different biological functions of bacteria. They are widely used in genetic engineering and phage therapy applications. Correct identification of BVPs through conventional methods is slow and expensive. Thus, a bioinformatics predictor is urgently needed to accelerate the correct identification of BVPs within a huge volume of proteins. However, the performance of available prediction tools is inadequate due to the lack of useful feature representations and a severe class imbalance. In the present study, we propose an intelligent model called Pred-BVP-Unb for discrimination of BVPs that employs three sequence-driven descriptors, i.e. Bi-PSSM evolutionary information, composition & translation, and split amino acid composition. The imbalance between classes was handled with the synthetic minority oversampling technique. The essential attributes are selected by a robust algorithm called recursive feature elimination. Finally, the optimal feature space is provided to a support vector machine classifier with a radial basis kernel to train the model. Our predictor clearly outperforms existing approaches in the literature, achieving the highest accuracies of 92.54% and 83.06% on the benchmark and independent datasets, respectively. We expect that the Pred-BVP-Unb tool can provide useful hints for designing antibacterial drugs and help to expedite large-scale discovery of new bacteriophage virion proteins. The source code and all datasets are publicly available at https://github.com/Muhammad-Arif-NUST/BVP_Pred_Unb.


Subject(s)
Sequence Analysis, Protein/methods , Software , Viral Structural Proteins/genetics , Bacteriophages/genetics , Evolution, Molecular , Support Vector Machine , Viral Structural Proteins/chemistry , Virion/genetics
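The synthetic minority oversampling idea used in the abstract above can be sketched in a simplified form: each synthetic sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. The function name, parameters, and toy data are assumptions, and this is a teaching sketch rather than the Pred-BVP-Unb implementation or a full SMOTE library routine.

```python
import random

def smote_like(minority: list[list[float]], n_new: int, k: int = 2,
               seed: int = 0) -> list[list[float]]:
    """Create n_new synthetic points by interpolating toward near neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy two-dimensional minority class, for illustration only.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_like(minority, n_new=4)
print(new_points)
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region the minority class already occupies.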
8.
Genomics ; 112(1): 276-285, 2020 01.
Article in English | MEDLINE | ID: mdl-30779939

ABSTRACT

Nuclear receptor proteins (NRPs) perform a vital role in regulating gene expression. With the rapid growth of NRPs in the post-genomic era, it is highly desirable to identify NRPs and their sub-families accurately from their primary sequences. Several conventional methods have been used for discrimination of NRPs and their sub-families, but they did not achieve satisfactory results. In response, a new two-level computational model, "iNR-2L", is developed. Two discrete methods, namely Dipeptide Composition and Tripeptide Composition, were used to formulate NRP sequences. Both descriptor spaces were then merged to construct a hybrid space. Furthermore, the minimum redundancy and maximum relevance feature selection technique was employed to select salient features and to reduce noise and redundancy. The experimental outcomes showed that the proposed model iNR-2L achieved outstanding results. It is anticipated that the proposed computational model might be a practical and effective tool for academia and the research community.


Subject(s)
Receptors, Cytoplasmic and Nuclear/chemistry , Receptors, Cytoplasmic and Nuclear/classification , Sequence Analysis, Protein/methods , Computational Biology/methods , Dipeptides/chemistry , Neural Networks, Computer , Oligopeptides/chemistry , Support Vector Machine
9.
Genomics ; 111(6): 1325-1332, 2019 12.
Article in English | MEDLINE | ID: mdl-30196077

ABSTRACT

The emergence of numerous genome projects has made the experimental classification of protein localization almost impossible due to the exponential increase in the number of protein samples. However, most of the existing applications were developed only for single-location proteins and completely ignore the presence of one protein at two or more locations in a cell. Few attempts have been made to target multi-label protein localization, and those achieved unsatisfactory accuracies. This paper presents a novel approach in which a discrete feature extraction method is fused with physicochemical properties of amino acids by using Chou's general form of Pseudo Amino Acid Composition. The technique is tested on two benchmark datasets, namely Gpos-mPLoc and Virus-mPLoc. The empirical results demonstrate that the proposed method yields better results with the two examined classifiers, i.e. ML-KNN and Rank-SVM. The proposed model shows improved values in all performance measures considered for the comparison.


Subject(s)
Proteins/analysis , Sequence Analysis, Protein/methods , Algorithms , Bacterial Proteins/analysis , Cells/chemistry , Computational Biology/methods , Viral Proteins/analysis
10.
Mol Genet Genomics ; 294(1): 199-210, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30291426

ABSTRACT

The nucleosome is a central element of eukaryotic chromatin, composed of histone proteins and DNA. It plays vital roles in many eukaryotic intra-nuclear processes, for instance chromatin structure formation and transcriptional regulation. Identification of nucleosome positioning in the wet lab is difficult, so attention has turned towards accurate, intelligent automated prediction. In this regard, a novel intelligent automated model, "iNuc-ext-PseTNC", is developed to identify nucleosome positioning in genomes accurately. In this predictor, DNA sequences are mathematically represented by two different discrete feature extraction techniques, namely pseudo-tri-nucleotide composition (PseTNC) and pseudo-di-nucleotide composition. Several contemporary machine learning algorithms were examined. Further, the predictions of individual classifiers were integrated through an evolutionary genetic algorithm. The success rates of the ensemble model are higher than those of the individual classifiers. The iNuc-ext-PseTNC model achieved its best performance with the PseTNC feature space, with accuracies of 94.3%, 93.14%, and 88.60% using a six-fold cross-validation test for the three benchmark datasets S1, S2, and S3, respectively. These outcomes show that the results of the iNuc-ext-PseTNC model compare favorably with the existing methods reported in the literature so far. It is anticipated that the proposed model might be a fruitful and practical tool for basic academic study and research.


Subject(s)
Caenorhabditis elegans/genetics , Computational Biology/methods , Drosophila melanogaster/genetics , Nucleosomes/genetics , Algorithms , Animals , Base Composition , Humans , Support Vector Machine
11.
J Theor Biol ; 463: 99-109, 2019 02 21.
Article in English | MEDLINE | ID: mdl-30562500

ABSTRACT

Automatic identification of protein subcellular localization has gained much popularity in the last few decades. Subcellular localizations are useful in the diagnosis of different diseases as well as in drug development. Golgi is a vital type of protein, which provides means of transportation to several other proteins destined for the lysosome, plasma membrane, secretion, etc. Cis-Golgi and trans-Golgi are two ends of the Golgi protein, meant for reception and transmission of various substances. Dysfunction in Golgi proteins may lead to different types of diseases, especially inheritable and neurodegenerative ones. Due to the significance of Golgi proteins, it is indispensable to identify them correctly. In this paper, a novel, high-throughput computational model is proposed that can identify sub-Golgi proteins precisely. Discrete and evolutionary feature extraction schemes are applied so that all the salient, noiseless, and relevant information from protein sequences can be captured. Unfortunately, the publicly available benchmark dataset is quite imbalanced: trans-Golgi sequences constitute 72% of the whole dataset, which introduces bias, redundancy, and poor generalization of the learned hypothesis. To address the limitations of imbalanced data, the Synthetic Minority Over-sampling Technique is used to balance the number of instances in the different classes of the dataset. In addition, a condensed feature space is formed by fusing the high-ranked features of eleven different feature selection techniques. The high-ranked features are selected through a majority voting algorithm; consequently, the feature space is reduced by 85%. The experimental results demonstrate that the kNN classifier obtained promising results in combination with the hybrid feature space. It yielded an accuracy of 98% in jackknife cross-validation, 94% on independent data, and 96% in a 10-fold cross-validation test. It is concluded that the proposed model is reliable and consistent and serves as a valuable tool for the research community.


Subject(s)
Algorithms , Golgi Apparatus/chemistry , Models, Biological , Proteins/classification , Amino Acid Sequence , Computational Biology/methods , Datasets as Topic , Reproducibility of Results , Statistics, Nonparametric , Support Vector Machine
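The jackknife (leave-one-out) evaluation of a kNN classifier reported in the abstract above can be sketched as follows. The two-dimensional toy features, the labels, and the function names are assumptions for illustration; real inputs would be the fused sequence descriptors.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority label among the k training items nearest to query.

    train is a list of (features, label) tuples.
    """
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def jackknife_accuracy(samples, k=3):
    """Hold each sample out once, predict it from the rest, and average."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if knn_predict(rest, features, k) == label:
            correct += 1
    return correct / len(samples)

# Toy data with two well-separated classes (hypothetical values).
samples = [
    ([0.0, 0.1], "cis"), ([0.1, 0.0], "cis"), ([0.2, 0.1], "cis"),
    ([1.0, 1.1], "trans"), ([1.1, 1.0], "trans"), ([0.9, 1.2], "trans"),
]
acc = jackknife_accuracy(samples, k=1)
print(acc)
```

The jackknife test is deterministic for a given dataset, which is why it is often reported alongside k-fold cross-validation, whose result depends on how the folds are drawn.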
12.
J Theor Biol ; 455: 205-211, 2018 10 14.
Article in English | MEDLINE | ID: mdl-30031793

ABSTRACT

N6-methyladenosine (m6A) is a vital post-transcriptional modification, which adds another layer of epigenetic regulation at the RNA level. It chemically modifies mRNA, which affects protein expression. An RNA sequence contains many GAC genetic code motifs. Among these, identifying whether a GAC motif is methylated or not is highly important. However, with the large number of RNA sequences generated in the post-genomic era, characterizing these sequences accurately and quickly has become a challenging task. In view of this, machine intelligence is incorporated into a computational model that faithfully and quickly reflects the motif of the desired classes. An intelligent computational model, "iMethyl-STTNC", is proposed for identification of methyladenosine sites in RNA. In the proposed study, four feature extraction techniques, namely pseudo-dinucleotide composition, pseudo-trinucleotide composition, split-trinucleotide composition, and split-tetra-nucleotide composition (STTNC), are used to obtain numerical descriptors. Three different classification algorithms, namely probabilistic neural network, support vector machine (SVM), and K-nearest neighbor, are adopted for prediction. After examining the outcomes of the prediction model on each feature space, SVM using the STTNC feature space reported the highest accuracies of 69.84% and 91.84% on dataset1 and dataset2, respectively. The reported results show that the proposed predictor achieved encouraging results compared to the approaches reported so far. It is anticipated that the developed model might be beneficial for in-depth analysis of genomes and drug development.


Subject(s)
Adenosine/analogs & derivatives , Base Sequence , Neural Networks, Computer , RNA/genetics , Sequence Analysis, RNA , Support Vector Machine , Adenosine/chemistry , Adenosine/genetics , RNA/chemistry
13.
Comput Methods Programs Biomed ; 157: 205-215, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29477429

ABSTRACT

BACKGROUND AND OBJECTIVE: Discriminative and informative feature extraction is the core requirement for accurate and efficient classification of protein subcellular localization images so that drug development could be more effective. The objective of this paper is to propose a novel modification in the Threshold Adjacency Statistics technique and enhance its discriminative power. METHODS: In this work, we utilized Threshold Adjacency Statistics from a novel perspective to enhance its discrimination power and efficiency. In this connection, we utilized seven threshold ranges to produce seven distinct feature spaces, which are then used to train seven SVMs. The final prediction is obtained through the majority voting scheme. The proposed ETAS-SubLoc system is tested on two benchmark datasets using 5-fold cross-validation technique. RESULTS: We observed that our proposed novel utilization of TAS technique has improved the discriminative power of the classifier. The ETAS-SubLoc system has achieved 99.2% accuracy, 99.3% sensitivity and 99.1% specificity for Endogenous dataset outperforming the classical Threshold Adjacency Statistics technique. Similarly, 91.8% accuracy, 96.3% sensitivity and 91.6% specificity values are achieved for Transfected dataset. CONCLUSIONS: Simulation results validated the effectiveness of ETAS-SubLoc that provides superior prediction performance compared to the existing technique. The proposed methodology aims at providing support to pharmaceutical industry as well as research community towards better drug designing and innovation in the fields of bioinformatics and computational biology. The implementation code for replicating the experiments presented in this paper is available at: https://drive.google.com/file/d/0B7IyGPObWbSqRTRMcXI2bG5CZWs/view?usp=sharing.


Subject(s)
Computational Biology/methods , Computer Simulation , Proteins/metabolism , Support Vector Machine , Microscopy, Fluorescence , Reproducibility of Results , Subcellular Fractions/metabolism
14.
J Theor Biol ; 442: 11-21, 2018 04 07.
Article in English | MEDLINE | ID: mdl-29337263

ABSTRACT

Membrane proteins play significant roles in cellular processes of living organisms, ranging from cell signaling to cell adhesion. As they are a major part of a cell, the identification of membrane proteins and their functional types has been a challenging job in bioinformatics and proteomics for the last few decades. Traditional experimental procedures are of limited applicability due to the lack of recognized structures and their enormous time and space requirements. In this regard, the demand for fast, accurate, and intelligent computational methods is increasing day by day. In this paper, a two-tier intelligent automated predictor called iMem-2LSAAC has been developed, which classifies a protein sequence as membrane or non-membrane in the first tier (phase 1) and, for membrane proteins, identifies the functional type in the second tier (phase 2). Quantitative attributes were extracted from protein sequences by applying three discrete feature extraction schemes, namely amino acid composition, pseudo amino acid composition, and split amino acid composition (SAAC). Various learning algorithms were investigated using the jackknife test to select the best one for the predictor. Experimental results showed that the highest predictive outcomes were yielded by SVM in conjunction with the SAAC feature space on all examined datasets. The true classification rate of the iMem-2LSAAC predictor is significantly higher than that of other state-of-the-art methods reported in the literature so far. Finally, it is expected that the proposed predictor will provide a solid framework for pharmaceutical drug discovery and might be useful for researchers and academia.


Subject(s)
Algorithms , Computational Biology/methods , Membrane Proteins/metabolism , Neural Networks, Computer , Support Vector Machine , Amino Acid Sequence , Databases, Protein , Membrane Proteins/genetics , Reproducibility of Results , Sequence Analysis, Protein/methods
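Split amino acid composition, as named in the abstract above, is usually described as computing the amino acid composition separately for the N-terminal, middle, and C-terminal parts of a sequence and concatenating the results. The sketch below follows that description; the terminal segment length of 25 residues and the function names are assumptions, not values confirmed by the source.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(segment: str) -> list[float]:
    """20-dimensional amino acid frequency vector for one segment."""
    if not segment:
        return [0.0] * len(AMINO_ACIDS)
    return [segment.count(aa) / len(segment) for aa in AMINO_ACIDS]

def split_aac(sequence: str, terminal_len: int = 25) -> list[float]:
    """Concatenate compositions of the N-terminal, middle, and C-terminal parts.

    terminal_len must be positive; it is an assumed segment length.
    """
    n_term = sequence[:terminal_len]
    c_term = sequence[-terminal_len:]
    middle = sequence[terminal_len:-terminal_len]
    return composition(n_term) + composition(middle) + composition(c_term)

# Artificial sequence whose three segments each contain a single residue type.
features = split_aac("A" * 25 + "C" * 30 + "D" * 25)
print(len(features))
```

Splitting the sequence preserves some positional information that a single whole-sequence composition would average away, for example signal-like regions near the termini.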
15.
J Theor Biol ; 435: 116-124, 2017 12 21.
Article in English | MEDLINE | ID: mdl-28927812

ABSTRACT

Mycobacterium is a pathogenic bacterium and the causative agent of tuberculosis (TB) and leprosy. These diseases are very serious and cause the death of millions of people worldwide every year. Characterizing the structure of the membrane proteins of this bacterium therefore plays a vital role in drug discovery because, without knowledge of Mycobacterium's membrane proteins and their types, scientists are unable to treat this pathogen. An accurate and competitive computational model is thus needed to characterize these uncharacterized structures of Mycobacterium. A series of attempts has been made in this connection. Split amino acid composition, unbiased dipeptide composition (Unb-DPC), over-represented tripeptide composition, and composition & translation are among the recent encoding techniques used by different researchers in their publications. Although considerable results have been achieved by these models, a gap remains, which this study aims to fill. In this study, an evolutionary feature extraction technique, the position-specific scoring matrix (PSSM), is applied to extract evolutionary information from protein sequences. Consequently, 99.6% accuracy was achieved by the learning algorithms. The experimental results suggest that the proposed computational model will lead to the development of a powerful tool for anti-mycobacterium drugs and will play a promising role in proteomics and bioinformatics.


Subject(s)
Artificial Intelligence , Bacterial Proteins/analysis , Membrane Proteins/analysis , Mycobacterium/chemistry , Position-Specific Scoring Matrices , Amino Acid Sequence , Computational Biology/methods , Evolution, Molecular
16.
Artif Intell Med ; 78: 14-22, 2017 05.
Article in English | MEDLINE | ID: mdl-28764869

ABSTRACT

Golgi is one of the core proteins of a cell, present in both plants and animals, and is involved in protein synthesis. Golgi is responsible for receiving and processing macromolecules and trafficking newly processed proteins to their intended destinations. Dysfunction in Golgi protein is expected to cause many neurodegenerative and inherited diseases, which may be treated well if they are detected effectively and in time. Golgi protein is categorized into two parts, cis-Golgi and trans-Golgi. Identifying Golgi proteins via direct methods is very hard due to the limited number of recognized structures. Therefore, researchers have shifted their attention from structures toward sequences. However, owing to technological advancement, a huge number of sequences has been reported in the databases, so recognizing this large amount of unprocessed data using conventional methods is very difficult. Therefore, the concept of intelligence was incorporated into computational models. Intelligence-based computational models obtained reasonable results, but there is still room for improvement. In this regard, an intelligent automatic recognition model is developed to enhance the true classification rate of sub-Golgi proteins. In this approach, discrete and evolutionary feature extraction methods are applied to the benchmark Golgi protein datasets to extract salient, profound, and variant numerical descriptors. After that, an oversampling technique, the Synthetic Minority Over-sampling Technique, is employed to balance the data. Hybrid spaces are also generated by combining these feature spaces. Further, the Fisher feature selection method is used to remove noisy and redundant features from the feature vector. Finally, the k-nearest neighbor algorithm is used as the learning hypothesis. Three distinct cross-validation tests are used to examine the stability and efficiency of the proposed model. The predicted outcomes of the proposed model are better than those of the existing models in the literature so far. Finally, it is anticipated that the proposed model will provide a foundation for the pharmaceutical industry in drug design and for the research community to develop new ideas in computational biology and bioinformatics.


Subject(s)
Algorithms , Golgi Apparatus , Animals , Computational Biology , Proteins
17.
Artif Intell Med ; 78: 61-71, 2017 05.
Article in English | MEDLINE | ID: mdl-28764874

ABSTRACT

Proteins are the central constituents of a cell or biological system. Proteins execute their functions by interacting with other molecules such as RNA, DNA, and other proteins. The major function of protein-protein interactions (PPIs) is the execution of biochemical activities in living species. Therefore, accurate identification of PPIs has been a challenging and demanding task for investigators over the last few decades. Various traditional and computational methods have been applied, but they have not achieved very encouraging results. To extend the computational approach by incorporating intelligence, contemporary machine learning algorithms have been used for identification of PPIs. In this prediction model, protein sequences are represented using two distinct feature extraction methods, namely physicochemical properties of amino acids and an evolutionary profile method, the position-specific scoring matrix (PSSM). The jackknife test and several performance parameters, namely specificity, recall, accuracy, MCC, precision, and F-measure, were employed to assess the predictive quality of the proposed model. After empirical analysis, it was determined that the proposed prediction model yielded encouraging predictive outcomes compared to existing state-of-the-art models. This achievement is attributed to PSSM because it clearly discerned a motif of PPIs. It is expected that the proposed prediction model will be a practical and very useful tool for the research community.


Subject(s)
Machine Learning , Protein Interaction Mapping , Support Vector Machine , Algorithms , Computational Biology , Proteins
18.
Comput Methods Programs Biomed ; 146: 69-75, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28688491

ABSTRACT

BACKGROUND AND OBJECTIVES: Enhancers are pivotal DNA elements that are widely used in eukaryotes to activate gene transcription. On the basis of their strength, enhancers are further classified into two groups: strong enhancers and weak enhancers. Owing to the availability of a huge amount of DNA sequences, a fast, reliable, and robust intelligent computational method is needed that not only identifies enhancers but also determines their strength. Considerable progress has been achieved in this regard; however, timely and precise identification of enhancers is still a challenging task. METHODS: A two-level intelligent computational model for the identification of enhancers and their subgroups is proposed. Two feature extraction techniques, di-nucleotide composition and tri-nucleotide composition, were adopted to extract numerical descriptors. Four classification methods, namely probabilistic neural network, support vector machine, k-nearest neighbor, and random forest, were utilized for classification. RESULTS: Using the jackknife cross-validation test, the proposed method yielded an accuracy of 77.25% on dataset S1, which contains enhancers and non-enhancers, and 64.70% on dataset S2, which comprises strong and weak enhancer sequences. CONCLUSION: The predictive results validate that the proposed method performs better than existing approaches reported in the literature. The developed method is thus expected to be useful for basic research and academia.
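Di-nucleotide and tri-nucleotide composition are both instances of k-mer composition (k = 2 and k = 3). The sketch below shows one common way to compute such a descriptor: count every overlapping window of length k and normalize by the number of counted windows. It assumes a plain A/C/G/T alphabet and skips windows containing other symbols; the function name `kmer_composition` is hypothetical, and the paper's exact normalization may differ.

```python
from itertools import product


def kmer_composition(seq, k):
    """Normalized frequency of every DNA k-mer in `seq`.

    Returns a list of 4**k values ordered lexicographically over "ACGT"
    (AA, AC, AG, ... for k=2). Windows that contain a symbol outside
    A/C/G/T are ignored; this is an assumption of the sketch.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide an overlapping window of length k along the sequence.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


# Hypothetical sequence: a 16-dimensional and a 64-dimensional descriptor.
dinucleotide = kmer_composition("ACGTAC", 2)
trinucleotide = kmer_composition("ACGTAC", 3)
```

The two vectors can be concatenated into a single 80-dimensional descriptor before being passed to a classifier.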


Subject(s)
Computational Biology/methods , Enhancer Elements, Genetic , Neural Networks, Computer , Nucleotides/analysis , Support Vector Machine , Algorithms , Computer Simulation
19.
Artif Intell Med ; 79: 62-70, 2017 06.
Article in English | MEDLINE | ID: mdl-28655440

ABSTRACT

Cancer is a fatal disease, responsible for one-quarter of all deaths in developed countries. Traditional anticancer therapies, such as chemotherapy and radiation, are highly expensive, susceptible to errors, and ineffective. These conventional techniques induce severe side effects on human cells. Owing to the perilous impact of cancer, the development of an accurate and highly efficient intelligent computational model is desirable for the identification of anticancer peptides. In this paper, an evolutionary intelligent genetic algorithm-based ensemble model, 'iACP-GAEnsC', is proposed for the identification of anticancer peptides. In this model, the protein sequences are formulated using three different discrete feature representation methods, i.e., amphiphilic pseudo amino acid composition, g-gap dipeptide composition, and reduced amino acid alphabet composition. The performance of the extracted feature spaces is investigated separately, and the feature spaces are then merged to exhibit the significance of hybridization. In addition, the predicted results of individual classifiers are combined using an optimized genetic algorithm and the simple majority technique in order to enhance the true classification rate. It is observed that genetic algorithm-based ensemble classification outperforms the individual classifiers as well as the simple majority voting-based ensemble. The highest performance of genetic algorithm-based ensemble classification is reported on the hybrid feature space, with an accuracy of 96.45%. In comparison to the existing techniques, the 'iACP-GAEnsC' model has achieved remarkable improvement in terms of various performance metrics. Based on the simulation results, the 'iACP-GAEnsC' model might be a leading tool in the field of drug design and proteomics for researchers.
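To make one of the three feature representations concrete, the sketch below computes a g-gap dipeptide composition in its commonly used form: the normalized frequency of every ordered pair of residues separated by exactly g intervening positions. It assumes the 20 standard amino acids and ignores pairs containing any other symbol; the name `g_gap_dipeptide_composition` is hypothetical, and the abstract does not give the paper's exact formulation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def g_gap_dipeptide_composition(seq, g=0):
    """400-dimensional g-gap dipeptide composition of a peptide.

    A pair is formed from residue i and residue i + g + 1, so g = 0
    gives the ordinary (adjacent) dipeptide composition. Pairs with a
    non-standard residue are skipped; this is an assumption of the sketch.
    """
    seq = seq.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]


# Hypothetical peptide: pairs one residue apart (g = 1) are A_D and C_A.
features = g_gap_dipeptide_composition("ACDA", g=1)
```

Varying g lets the descriptor capture correlations between residues that are close in the folded peptide but not adjacent in the sequence.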


Subject(s)
Algorithms , Computational Biology , Peptides/therapeutic use , Amino Acids , Antineoplastic Agents , Computer Simulation , Humans , Neoplasms/therapy , Sequence Analysis, Protein
20.
J Theor Biol ; 415: 13-19, 2017 02 21.
Article in English | MEDLINE | ID: mdl-27939596

ABSTRACT

This study investigates an efficient and accurate computational method for predicting mycobacterial membrane proteins. Mycobacterium is a pathogenic bacterium which is the causative agent of tuberculosis and leprosy. Existing feature encoding algorithms for protein sequence representation, such as composition and translation and split amino acid composition, cannot suitably express mycobacterial membrane proteins and their types because of bias among the different types. Therefore, in this study a novel un-biased dipeptide composition (Unb-DPC) method is proposed. The proposed encoding scheme has two advantages. First, it avoids bias among the different mycobacterial membrane proteins and their types. Second, the method is fast and preserves protein sequence structure information. Using the jackknife cross-validation test, the experiments yield an SVM-based classification accuracy of 97.1% for membrane protein types and 95.0% for discriminating mycobacterial membrane proteins from non-membrane proteins. The results show that the proposed model achieved significant predictive performance compared to the existing algorithms and could lead to the development of a powerful tool for anti-mycobacterium drugs.
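The jackknife test used for the reported accuracies is leave-one-out cross-validation: each sample is held out once, the classifier is trained on all remaining samples, and the held-out sample is then predicted. The sketch below illustrates the procedure only. It substitutes a simple 1-nearest-neighbor rule for the SVM used in the study, and the names `nearest_neighbor_predict` and `jackknife_accuracy`, as well as the toy data, are hypothetical.

```python
def nearest_neighbor_predict(train_x, train_y, query):
    """Label of the training vector closest to `query` (squared Euclidean).

    A stand-in classifier for illustration; the study itself used an SVM.
    """
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


def jackknife_accuracy(features, labels, predict=nearest_neighbor_predict):
    """Leave-one-out accuracy over lists of feature vectors and labels."""
    correct = 0
    for i in range(len(features)):
        # Hold out sample i and train on everything else.
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy data with two well-separated classes.
toy_x = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
toy_y = [0, 0, 1, 1]
accuracy = jackknife_accuracy(toy_x, toy_y)
```

Because every sample is tested exactly once and no random split is involved, the jackknife test gives a single deterministic result for a given dataset, at the cost of training the model once per sample.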


Subject(s)
Dipeptides/chemistry , Membrane Proteins/chemistry , Models, Theoretical , Mycobacteriaceae/chemistry , Algorithms , Amino Acid Sequence , Bias , Computational Biology/methods , Membrane Proteins/classification , Mycobacteriaceae/ultrastructure