Results 1 - 19 of 19
1.
PeerJ Comput Sci ; 9: e1622, 2023.
Article in English | MEDLINE | ID: mdl-37869456

ABSTRACT

In recent years, the increased population has led to an increase in the demand for various industrially processed edibles and other consumable products. These industries regularly alter the proteins found in raw materials to generate more commercially viable end-products in order to keep up with consumer demand. These modifications result in a substance that may cause allergic reactions in consumers, thereby creating a protein allergen. The detection of such proteins in various substances is essential for the prevention, diagnosis and treatment of allergic conditions. Bioinformatics and computational methods can be used to analyze the information contained in amino-acid sequences to detect possible allergens. The article presents a deep learning based ensemble approach to identify protein allergens using Extra Tree, Deep Belief Network (DBN), and CatBoost models. The proposed ensemble model achieves higher detection accuracy by combining the prediction results of the three models using majority voting. The evaluation of the proposed model was carried out on the benchmark protein allergen dataset, and the performance analysis revealed that the proposed model outperforms the other state-of-the-art literature techniques with a protein allergen detection accuracy of 89.16%.
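The majority-voting step described in this abstract can be illustrated with a minimal sketch. The three prediction lists below are invented placeholders standing in for the outputs of the Extra Tree, DBN, and CatBoost models; they are not data from the paper, and the function name is hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by majority vote."""
    combined = []
    # zip(*predictions) groups the votes of all models for the same sample.
    for sample_votes in zip(*predictions):
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical binary predictions (1 = allergen, 0 = non-allergen).
extra_tree_pred = [1, 0, 1, 1]
dbn_pred = [1, 1, 0, 1]
catboost_pred = [0, 0, 1, 1]

ensemble_pred = majority_vote([extra_tree_pred, dbn_pred, catboost_pred])
print(ensemble_pred)
```

With three voters and two classes a tie cannot occur, which is one practical reason to combine an odd number of models.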

2.
Artif Intell Rev ; : 1-93, 2023 Mar 12.
Article in English | MEDLINE | ID: mdl-37362891

ABSTRACT

Machine learning (ML) and deep learning (DL) models are popular in many areas, including business, medicine, industry, healthcare, transportation, and smart cities. However, conventional centralized training techniques may not apply to upcoming distributed applications, which require high accuracy and quick response times. This is mainly due to limited storage and performance bottlenecks on the centralized servers during the execution of various ML and DL-based models. Federated learning (FL), in contrast, is a developing approach to training ML models in a collaborative and distributed manner. It allows the full potential exploitation of these models with unlimited data and distributed computing power. In FL, edge computing devices collaborate to train a global model on their private data and computational power without sharing their private data on the network, thereby offering privacy preservation by default. But the distributed nature of FL faces various challenges related to data heterogeneity, client mobility, scalability, and seamless data aggregation. Moreover, the communication channels, clients, and central servers are also vulnerable to attacks that may pose various security threats. Thus, a structured vulnerability and risk assessment is needed to deploy FL successfully in real-life scenarios. Furthermore, the scope of FL is expanding in terms of its application areas, with each area facing different threats. In this paper, we analyze various vulnerabilities present in the FL environment and design a literature survey of possible threats from the perspective of different application areas. Also, we review the most recent defensive algorithms and strategies used to guard against security and privacy threats in those areas. For a systematic coverage of the topic, we considered various applications under four main categories: space, air, ground, and underwater communications.
We also compared the proposed methodologies regarding the underlying approach, base model, datasets, evaluation metrics, and achievements. Lastly, the future directions and existing drawbacks of the various approaches are discussed in detail.
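The collaborative training described above, in which clients keep their data and share only model parameters, can be sketched with a size-weighted parameter average. The abstract does not name an aggregation rule, so the weighted averaging below (commonly called federated averaging) is an assumption, and all names and numbers are illustrative.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical parameters from two clients; raw data never leaves the clients.
client_weights = [[1.0, 2.0], [3.0, 4.0]]
client_sizes = [1, 3]  # number of local training samples per client (assumed)

global_model = federated_average(client_weights, client_sizes)
print(global_model)
```

Weighting by dataset size is only one possible choice; under the data heterogeneity the abstract mentions, other weightings may behave differently.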

3.
Comput Intell Neurosci ; 2022: 4559219, 2022.
Article in English | MEDLINE | ID: mdl-36238666

ABSTRACT

Arm Venous Segmentation plays a crucial role in smart venipuncture. The difficulties faced in locating veins for intravenous procedures can be diminished using computer vision for vein imaging. To facilitate this, a high-resolution dataset consisting of arm images was curated and has been presented in this study. Leveraging the ability of Near Infrared Imaging to easily detect veins, ambient lighting conditions were created inside a small enclosure to capture the images. The acquired images were annotated to create the corresponding masks for the dataset. To extend the scope and assert the usability of the dataset, the images, and corresponding masks were used to train an image segmentation model. In addition to using basic preprocessing and image augmentation based techniques, a U-Net based algorithmic architecture has been used to facilitate the task of segmentation. Subsequently, the results of performing image segmentation after applying the preprocessing methods have been compared using various evaluation metrics and have been visualised in the study. Furthermore, the possible applications of the presented dataset have been investigated in the study.
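The abstract does not name the evaluation metrics used to compare segmentation results. As an assumption, the sketch below shows two overlap metrics often used for binary masks, the Dice coefficient and intersection over union (IoU), on flattened toy masks that are not taken from the dataset.

```python
def dice_and_iou(pred_mask, true_mask):
    """Return (Dice, IoU) for two flattened binary masks of equal length."""
    intersection = sum(p & t for p, t in zip(pred_mask, true_mask))
    pred_area = sum(pred_mask)
    true_area = sum(true_mask)
    union = pred_area + true_area - intersection
    # Two empty masks are treated as a perfect match by convention.
    dice = 2 * intersection / (pred_area + true_area) if pred_area + true_area else 1.0
    iou = intersection / union if union else 1.0
    return dice, iou


# Hypothetical flattened masks (1 = vein pixel, 0 = background).
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]

dice, iou = dice_and_iou(pred_mask, true_mask)
print(f"Dice={dice:.3f}, IoU={iou:.3f}")
```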


Subject(s)
Deep Learning , Arm , Image Processing, Computer-Assisted/methods , Phlebotomy
4.
Multimed Tools Appl ; 81(28): 40013-40042, 2022.
Article in English | MEDLINE | ID: mdl-35528282

ABSTRACT

With the outbreak of the Coronavirus Disease in 2019, life seemed to have come to a standstill. To combat the transmission of the virus, the World Health Organization (WHO) announced the wearing of face masks as an imperative way to limit the spread of the virus. However, manually ensuring whether people are wearing face masks or not in a public area is a cumbersome task. The exigency of monitoring people wearing face masks necessitated building an automatic system. Currently, distinct methods using machine learning and deep learning can be used effectively. In this paper, all the essential requirements for such a model have been reviewed. The need and the structural outline of the proposed model have been discussed extensively, followed by a comprehensive study of various available techniques and their respective comparative performance analysis. Further, the pros and cons of each method have been analyzed in depth. Subsequently, sources of multiple datasets are mentioned. The software tools needed for the implementation are also discussed. Finally, the various use cases, limitations, and observations for the system are discussed, and the paper concludes with several directions for future research.

5.
Curr Med Imaging ; 19(1): 27-36, 2022.
Article in English | MEDLINE | ID: mdl-35260061

ABSTRACT

Big data has been a topic of interest for many researchers and industries for the past few decades. Due to the exponential growth of technology, a tremendous amount of data is generated every minute. This article provides a strategic review of big data in the healthcare sector. In particular, this article highlights various applications and issues faced by the healthcare industry using big data by evaluating various journal articles published between 2016 and 2021. It covers multiple issues related to mining, storing, analyzing, and sharing big data in healthcare, and briefly summarizes the deep-learning-based tools available for big data analytics. This article aims to benefit the research community by summarizing various research tools and processes available today to manage big data in healthcare.


Subject(s)
Big Data , Humans , Data Mining
6.
Clin Transl Imaging ; 10(4): 355-389, 2022.
Article in English | MEDLINE | ID: mdl-35261910

ABSTRACT

Objective: Glioblastoma multiforme (GBM) is a grade IV brain tumour with very low life expectancy. Physicians and oncologists urgently require automated techniques in clinics for brain tumour segmentation (BTS) and survival prediction (SP) of GBM patients to perform precise surgery followed by chemotherapy treatment. Methods: This study aims at examining the recent methodologies developed using automated learning and radiomics to automate the process of SP. Automated techniques use pre-operative raw magnetic resonance imaging (MRI) scans and clinical data related to GBM patients. All SP methods submitted for the multimodal brain tumour segmentation (BraTS) challenge are examined to extract the generic workflow for SP. Results: The maximum accuracies achieved by 21 state-of-the-art different SP techniques reviewed in this study are 65.5 and 61.7% using the validation and testing subsets of the BraTS dataset, respectively. The comparisons based on segmentation architectures, SP models, training parameters and hardware configurations have been made. Conclusion: The limited accuracies achieved in the literature led us to review the various automated methodologies and evaluation metrics to find out the research gaps and other findings related to the survival prognosis of GBM patients so that these accuracies can be improved in future. Finally, the paper provides the most promising future research directions to improve the performance of automated SP techniques and increase their clinical relevance.

7.
Multimed Tools Appl ; 81(13): 18129-18153, 2022.
Article in English | MEDLINE | ID: mdl-35282403

ABSTRACT

The COVID-19 pandemic has affected all the countries in the world with its droplet spread mode. The colossal number of cases has strained all the healthcare systems due to the serious nature of infections, especially for people with comorbidities. A very high specificity Reverse Transcriptase-Polymerase Chain Reaction (RT-PCR) test is the principal technique in use for diagnosing COVID-19 patients. Also, CT scans have helped medical professionals in patient severity estimation and progression tracking of the COVID-19 virus. In this study, we present our own extensible COVID-19 viral infection tracking prognosis technique. It uses an annotated dataset of CT chest scan slice images created with the help of medical professionals. The annotated dataset contains bounding box coordinates of different features for COVID-19 detection, such as ground glass opacities, crazy paving pattern, consolidations, and lesions. We qualitatively identify the severity of the patient for later prognosis stages in our study to assist medical staff with patient prioritization. First, we detected COVID-19 positive patients with a pre-trained Siamese Neural Network (SNN), which obtained 87.6% accuracy, 87.1% F1-score and 95.1% AUC. These metrics were achieved after removal of 40% quantitatively highly similar images from the COVID-CT dataset. This reduced dataset was further medically annotated with COVID-19 features for bounding box detection. After this, we assigned severity scores to detected COVID-19 features and calculated the cumulative severity score for COVID-19 patients. For qualitative patient prioritization with prognosis clinical assistance information, we finally converted this score into a multi-classification problem, which obtained a 47% weighted-average F1-score.

8.
Neural Comput Appl ; 33(22): 15807-15814, 2021.
Article in English | MEDLINE | ID: mdl-34230771

ABSTRACT

The escalating transmission intensity of the COVID-19 pandemic is straining healthcare systems worldwide. Due to the unavailability of effective pharmaceutical treatment and vaccines, monitoring social distancing is the only viable tool to strive against asymptomatic transmission. To address the need for monitoring social distancing in populated areas, a novel bird's-eye-view computer-vision-based framework implementing deep learning and utilizing surveillance video is proposed. The proposed method employs the YOLO v3 object detection model and uses a key point regressor to detect the key feature points. Additionally, when a massive crowd is detected, bounding boxes are drawn on the detected objects, and red boxes are shown where social distancing is violated. When empirically tested on real-time data, the proposed method is shown to be more efficacious than the existing approaches in terms of inference time and frame rate.
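Once detections have been projected onto a bird's-eye-view ground plane, the violation check reduces to a pairwise distance test. The sketch below assumes that projection has already been done and that each person is a single 2D point; the coordinates, the threshold, and the function name are hypothetical, and the detector itself (YOLO v3 in the abstract) is not reproduced here.

```python
from itertools import combinations
from math import dist


def distancing_violations(points, min_distance):
    """Return index pairs of people standing closer than min_distance."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(points), 2)
        if dist(a, b) < min_distance
    ]


# Hypothetical ground-plane positions in metres, one point per detected person.
people = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]

violations = distancing_violations(people, min_distance=2.0)
print(violations)
```

Boxes belonging to any index in a returned pair would be the ones drawn in red.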

9.
IEEE/ACM Trans Comput Biol Bioinform ; 18(4): 1524-1534, 2021.
Article in English | MEDLINE | ID: mdl-31567100

ABSTRACT

Life-threatening diseases like adult T-cell leukemia, neurodegenerative diseases, and demyelinating diseases such as HTLV-1 based myelopathy/tropical spastic paraparesis (HAM/TSP), hypocalcaemia, and bone lesions are caused by a group of human retroviruses known as Human T-cell Lymphotropic virus (HTLV). Out of the four different types of HTLVs, HTLV-1 is the most prominent, afflicting over 20 million people around the world, and still not much effort has been made in understanding the epidemiology and controlling the prevalence of this virus. This condition further worsens when most of the infected cases remain asymptomatic throughout their lifetime due to the limited diagnostic methods, which are most of the time unavailable for timely detection of infected individuals. Moreover, at present, there is no licensed vaccination for HTLV-1 infection. Therefore, there is a need to develop a faster and more efficient diagnostic method for the detection of HTLV-1. Motivated by the outcomes of machine learning techniques in the field of bioinformatics, this is the first study in which 64 hybrid machine learning techniques have been proposed for the prediction of different types of HTLVs (HTLV-1, HTLV-2, and HTLV-3). The hybrid techniques are built by permutation and combination of four classification methods, four feature weighting techniques, and four feature selection techniques. The proposed hybrid models, when evaluated on the basis of various model evaluation parameters, are found to be capable of efficiently predicting the type of HTLVs. The best hybrid model has been identified as having an accuracy, an AUROC value, and an F1 score of 99.85 percent, 0.99, and 0.99, respectively.
This kind of system can assist the current diagnostic system for the detection of HTLV-1, as after the molecular diagnostics of HTLV by various screening tests like enzyme-linked immunoassay or particle agglutination assays there is always a need for confirmatory tests like western blotting, immuno-fluorescence assay, or radio-immuno-precipitation assay for distinguishing HTLV-1 from HTLV-2. These confirmatory tests are indeed very complex analytical techniques involving various steps. The proposed hybrid techniques can be used to support and verify the results of confirmatory tests from the protein mixture. Furthermore, better insights about the virus can be obtained by exploring the physicochemical properties of the protein sequences of HTLVs.
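The figure of 64 hybrid techniques follows from combining four options at each of three stages (4 x 4 x 4). The sketch below enumerates such a grid; the abstract does not list the individual methods, so every name here is a placeholder.

```python
from itertools import product

# Placeholder names; the abstract does not say which methods were used.
classifiers = ["classifier_A", "classifier_B", "classifier_C", "classifier_D"]
feature_weightings = ["weighting_A", "weighting_B", "weighting_C", "weighting_D"]
feature_selectors = ["selector_A", "selector_B", "selector_C", "selector_D"]

# Each hybrid is one (classifier, weighting, selector) triple.
hybrids = list(product(classifiers, feature_weightings, feature_selectors))
print(len(hybrids))
```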


Subject(s)
Computational Biology/methods , HTLV-I Infections , Human T-lymphotropic virus 1 , Machine Learning , Algorithms , HTLV-I Infections/epidemiology , HTLV-I Infections/virology , Humans , Models, Statistical , Viral Proteins/chemistry , Viral Proteins/genetics
10.
IET Syst Biol ; 14(1): 1-7, 2020 02.
Article in English | MEDLINE | ID: mdl-31931475

ABSTRACT

The major intent of peptide vaccine design, immunodiagnosis and antibody production is to accurately identify linear B-cell epitopes. The determination of epitopes through experimental analysis is highly expensive. Therefore, it is desirable to develop a reliable model with significant improvement over existing prediction models. In this study, a hybrid model has been designed by using the stacked generalisation ensemble technique for prediction of linear B-cell epitopes. The goal of using the stacked generalisation ensemble approach is to refine the predictions of the base classifiers and to discard the worse predictions. In this study, six machine learning models are fused to predict variable length epitopes (6-49 mers). The proposed ensemble model achieves 76.6% accuracy, and the average accuracy of repeated 10-fold cross-validation is 73.14%. The trained ensemble model has been tested on the benchmark dataset and compared with existing sequential B-cell epitope prediction techniques including APCpred, ABCpred, BCpred and [inline-formula removed].


Subject(s)
Computational Biology/methods , Epitopes, B-Lymphocyte , Machine Learning , Algorithms , Amino Acid Sequence , Antibodies/metabolism , Drug Design , Epitopes, B-Lymphocyte/chemistry , Epitopes, B-Lymphocyte/genetics , Epitopes, B-Lymphocyte/immunology , Epitopes, B-Lymphocyte/metabolism , Humans , Models, Statistical , Support Vector Machine , Vaccines/chemistry , Vaccines/genetics , Vaccines/immunology , Vaccines/metabolism
11.
J Bioinform Comput Biol ; 17(5): 1950033, 2019 10.
Article in English | MEDLINE | ID: mdl-31744364

ABSTRACT

In this study, efforts are made to develop a quantitative structure-activity relationship (QSAR)-based model, which is used for the prediction of toxicities to reduce testing in animals, time, and money in the early stages of drug development. An efficient machine learning model is developed to predict the toxicity of those drug molecules which bind to the androgen receptor (AR). Toxicity prediction is performed in terms of their activity, activity score, potency, and efficacy by using various physicochemical properties. A multilevel ensemble model is proposed, whose first level performs ensemble-based classification of activity, and whose second level performs ensemble-based regression of activity score, potency, and efficacy for only those drug molecules which have been found active at the classification level. The AR dataset has 10,273 drug molecules, of which 461 are active and 9812 are inactive, and each drug molecule has 1444 features. Therefore, our dataset is highly imbalanced and has a very large number of features. Initially, feature selection is performed, and then the class imbalance problem is resolved. K-fold cross-validation is performed to measure the consistency of the model. Finally, our proposed multilevel ensemble model has been validated and compared with some existing models.
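The two-level structure described above (classify first, then regress only on molecules found active) can be sketched as a simple gate. The class counts come from the abstract; the toy classifier and regressor are hypothetical stand-ins for the ensemble models, which the abstract does not specify.

```python
def two_level_predict(molecules, classify_active, regress_scores):
    """Level 1 classifies activity; level 2 regresses scores for actives only."""
    results = []
    for features in molecules:
        if classify_active(features):
            results.append({"active": True, **regress_scores(features)})
        else:
            results.append({"active": False})
    return results


# Class counts reported in the abstract for the AR dataset.
n_active, n_inactive = 461, 9812
imbalance_ratio = n_inactive / n_active  # inactive molecules per active one


# Hypothetical stand-ins for the trained level-1 and level-2 models.
def toy_classifier(features):
    return sum(features) > 1.0


def toy_regressor(features):
    return {
        "activity_score": 10.0 * sum(features),
        "potency": max(features),
        "efficacy": min(features),
    }


predictions = two_level_predict([[0.9, 0.8], [0.1, 0.2]], toy_classifier, toy_regressor)
print(round(imbalance_ratio, 2), predictions)
```

Gating the regressors on the classifier output means the second level is trained and evaluated only on the much smaller active subset.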


Subject(s)
Quantitative Structure-Activity Relationship , Receptors, Androgen , Small Molecule Libraries/toxicity , Toxicity Tests/methods , Computer Simulation , Humans , Linear Models , Machine Learning , Neural Networks, Computer , Random Allocation , Receptors, Androgen/drug effects , Reproducibility of Results , Small Molecule Libraries/chemistry
12.
IET Syst Biol ; 13(5): 243-250, 2019 10.
Article in English | MEDLINE | ID: mdl-31538958

ABSTRACT

In humans, oxidative stress is involved in the development of diabetes, cancer, hypertension, Alzheimer's disease, and heart failure. One of the mechanisms in the cellular defence against oxidative stress is the activation of the Nrf2-antioxidant response element (ARE) signalling pathway. The main aim of the study is to compute the activity, efficacy, and potency scores of the ARE signalling pathway and to propose a multi-level prediction scheme for them, as this contributes substantially to mitigating oxidative stress in humans. Applying the process of knowledge discovery from data, the required knowledge is gathered, and then machine learning techniques are applied to propose a multi-level scheme. The proposed scheme is validated using the K-fold cross-validation method, and an accuracy of 90% is achieved for prediction of the activity score of ARE molecules, which determines their power to mitigate oxidative stress.


Subject(s)
Computational Biology/methods , Oxidative Stress/drug effects , Signal Transduction/drug effects , Antioxidant Response Elements/drug effects , Models, Statistical , ROC Curve
13.
Front Neurol ; 10: 781, 2019.
Article in English | MEDLINE | ID: mdl-31379730

ABSTRACT

Multiple sclerosis (MS) is a neurodegenerative disease characterized by lesions in the central nervous system (CNS). Inflammation and demyelination are the leading causes of neuronal death and brain lesion formation. Immune reactivity is believed to be essential in the neuronal damage in MS. Cytokines play an important role in the differentiation of Th cells and the recruitment of auto-reactive B and T lymphocytes, which leads to neuron demyelination and death. Several cytokines have been found to be linked with MS pathogenesis. In the present study, serum levels of eight cytokines (IL-1β, IL-2, IL-4, IL-8, IL-10, IL-13, IFN-γ, and TNF-α) were analyzed in USA and Russian MS cohorts to identify predictors for the disease. Further, the model was extended to classify MS into remitting and non-remitting by including age, gender, disease duration, Expanded Disability Status Scale (EDSS) and Multiple Sclerosis Severity Score (MSSS) in the cytokine datasets of the Russian cohorts. The individual serum cytokine data for the USA cohort were generated by the Z score percentile method using RStudio, while serum cytokines of the Russian cohort were analyzed using multiplex immunoassay. Datasets were divided into training (70%) and testing (30%) sets. These datasets were used as input to four machine learning models (support vector machine, decision tree, random forest, and neural networks) available in the R programming language. The random forest (RF) model was identified as the best model for diagnosis of MS, as it performed remarkably well on all the considered criteria, i.e., Gini, accuracy, specificity, AUC, and sensitivity. The RF model also performed best in predicting remitting and non-remitting MS. The present study suggests that the concentration of serum cytokines could be used as a prognostic marker for the prediction of MS.

14.
IET Syst Biol ; 13(3): 147-158, 2019 06.
Article in English | MEDLINE | ID: mdl-31170694

ABSTRACT

The authors have proposed an efficient multilevel prediction model for better activity assessment to test whether certain chemical compounds can disrupt processes in the human body in ways that may create negative health effects. Here, a computational (in-silico) method is proposed for the quality prediction of drugs in terms of their activity, activity score, potency, and efficacy for estrogen receptors (ERs) by using various physicochemical properties (molecular descriptors). PaDEL-Descriptor is used for feature extraction. The ER dataset has 8481 drug molecules, of which 1084 are active and 7397 are inactive, and each drug molecule has 1444 features. This dataset is highly imbalanced and has a substantial number of features. Initially, the class imbalance problem is resolved through the synthetic minority oversampling technique (SMOTE) algorithm, and feature selection is done using the FSelector library of R. A machine learning based multilevel prediction model is developed, where classification is performed at its first level and regression at its second level. By using all these strategies together, higher accuracy is achieved in comparison to many other computational approaches. K-fold cross-validation is performed to measure the consistency of the model for all the target classes. Finally, the validity of the proposed method is demonstrated on some AIDS therapy drug molecules.
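The oversampling step can be illustrated with a simplified SMOTE-style routine: each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is a minimal sketch of the general idea, not the implementation used in the paper; the toy 2D points, the parameter values, and the function name are assumptions.

```python
import random
from math import dist


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class samples described by two features.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority, n_new=5)
print(new_points)
```

Because every synthetic point is an interpolation, it stays inside the region spanned by existing minority samples, which is why this kind of oversampling is usually applied to the training folds only.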


Subject(s)
Computer Simulation , Receptors, Estrogen/metabolism , Small Molecule Libraries/pharmacology , Machine Learning , Models, Molecular , Models, Statistical , Molecular Targeted Therapy , Protein Conformation , Quantitative Structure-Activity Relationship , Receptors, Estrogen/chemistry , Regression Analysis , Small Molecule Libraries/chemistry
15.
IET Syst Biol ; 13(1): 24-29, 2019 02.
Article in English | MEDLINE | ID: mdl-30774113

ABSTRACT

Prediction of drug synergy score is an ill-posed problem. It plays an important role in the medical field for inhibiting specific cancer agents. An efficient regression-based machine learning technique has the ability to minimise drug synergy prediction errors. Therefore, in this study, an efficient machine learning technique for drug synergy prediction is designed by using ensemble based differential evolution (DE) to optimise the support vector machine (SVM), because the tuning of the attributes of the SVM kernel regulates the prediction precision. The ensemble based DE employs two trial vector generation techniques and two control attribute settings. The first generation technique uses the best solution and the other does not. The proposed and existing competitive machine learning techniques are applied to drug synergy data. The extensive analysis demonstrates that the proposed technique outperforms the others in terms of accuracy, root mean square error and coefficient of correlation.
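The two trial vector generation techniques, one built around the best solution and one not, resemble the classic DE mutation schemes often written as DE/best/1 and DE/rand/1. That mapping is an assumption, as is treating each individual as a pair of SVM hyperparameters; the sketch below only shows how the two mutant vectors differ.

```python
import random


def de_mutant(population, fitness, target, scale, use_best, rng):
    """Build one DE mutant vector for the individual at index `target`."""
    others = [j for j in range(len(population)) if j != target]
    if use_best:
        # Base vector is the current best individual (lowest fitness = best).
        base = min(range(len(population)), key=lambda j: fitness[j])
        r1, r2 = rng.sample(others, 2)
    else:
        # Base vector is a random individual distinct from the target.
        base, r1, r2 = rng.sample(others, 3)
    return [
        population[base][d] + scale * (population[r1][d] - population[r2][d])
        for d in range(len(population[base]))
    ]


rng = random.Random(0)
# Hypothetical individuals: (C, gamma) pairs for an SVM, with made-up errors.
population = [[1.0, 0.1], [10.0, 0.5], [100.0, 0.01], [5.0, 1.0]]
fitness = [0.30, 0.12, 0.45, 0.20]

mutant_best = de_mutant(population, fitness, target=0, scale=0.5, use_best=True, rng=rng)
mutant_rand = de_mutant(population, fitness, target=0, scale=0.5, use_best=False, rng=rng)
print(mutant_best, mutant_rand)
```

A full optimiser would follow each mutation with crossover and selection, which are omitted here.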


Subject(s)
Computational Biology/methods , Drug Synergism , Support Vector Machine , Time Factors
16.
Interdiscip Sci ; 11(4): 611-627, 2019 Dec.
Article in English | MEDLINE | ID: mdl-30406342

ABSTRACT

Development of an effective machine-learning model for T-cell Mycobacterium tuberculosis (M. tuberculosis) epitopes is beneficial for saving biologists' time and effort in identifying epitopes in a targeted antigen. The existing servers NetMHC 2.2, NetMHC 2.3, NetMHC 3.0 and NetMHC 4.0 estimate the binding capacity of a peptide. It is still a challenge for those servers to predict whether a given peptide is an M. tuberculosis epitope or non-epitope. One of the servers, CTLpred, works in this category but is limited to a peptide length of 9-mers. Therefore, in this work a direct method of predicting M. tuberculosis epitopes or non-epitopes has been proposed, which also overcomes the limitations of the above servers. The proposed method is able to work with variable length epitopes, including sizes greater than 9-mers. Identification of T-cell or B-cell epitopes in the targeted antigen is the main goal in designing epitope-based vaccines, immune-diagnostic tests and antibody production. Therefore, it is important to introduce a reliable system which may help in the diagnosis of M. tuberculosis. In the present study, computational intelligence methods are used to classify T-cell M. tuberculosis epitopes. The caret feature selection approach is used to find the set of relevant features. The ensemble model is designed by combining three models and is used to predict M. tuberculosis epitopes of variable length (7-40-mers). The proposed ensemble model achieves 82.0% accuracy, 0.89 specificity, and 0.77 sensitivity, with repeated k-fold cross-validation giving an average accuracy of 80.61%. The proposed ensemble model has been validated and compared with the NetMHC 2.3 and NetMHC 4.0 servers and the CTLpred T-cell prediction server.
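Accuracy, sensitivity, and specificity as reported above are all derived from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts are invented for illustration and are not the paper's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate (epitopes found)
        "specificity": tn / (tn + fp),  # true negative rate (non-epitopes rejected)
    }


# Hypothetical counts: epitope = positive class, non-epitope = negative class.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)
```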


Subject(s)
Epitopes, T-Lymphocyte/chemistry , Mycobacterium tuberculosis/chemistry , T-Lymphocytes/immunology , Algorithms , Alleles , Area Under Curve , Artificial Intelligence , Computational Biology , Diagnostic Tests, Routine , Epitopes, B-Lymphocyte/chemistry , Humans , Machine Learning , Peptides/chemistry , Reproducibility of Results , Sensitivity and Specificity , Tuberculosis/microbiology
17.
Immunol Lett ; 184: 51-60, 2017 04.
Article in English | MEDLINE | ID: mdl-28214535

ABSTRACT

Identification of antigens that induce a specific class of antibody is a prime objective in peptide-based vaccine design, immunodiagnosis, and antibody production. There is an urgent need to introduce a reliable system with high accuracy and efficiency for prediction. In the present study, a novel multilevel ensemble model is developed for prediction of the antibodies IgG and IgA. Epitope length is important in training the model, and it is efficient to use epitopes of variable length. In this ensemble approach, seven different machine learning models are combined to predict epitopes of variable length (4 to 50). The proposed model achieves 94.43% accuracy for IgG specific epitopes and 97.56% accuracy for IgA specific epitopes with repeated 10-fold cross validation. The proposed model is compared with the existing IgPred model, and the outcome of the proposed model is improved.
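Repeated 10-fold cross validation reshuffles the data and re-partitions it into ten folds several times, so that each sample is tested once per repetition. The sketch below generates the index splits only; the number of repetitions and the sample count are assumptions, since the abstract does not state them.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=3, seed=0):
    """Yield (train_indices, test_indices) for each fold of each repetition."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal shuffled indices into k folds of near-equal size.
        folds = [indices[i::k] for i in range(k)]
        for held_out in range(k):
            test = folds[held_out]
            train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
            yield train, test


splits = list(repeated_kfold(n_samples=25, k=10, repeats=3))
print(len(splits))
```

The reported accuracy would then be the mean of the per-fold accuracies over all splits.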


Subject(s)
Immunoglobulin A/chemistry , Immunoglobulin A/immunology , Immunoglobulin G/chemistry , Immunoglobulin G/immunology , Algorithms , Amino Acid Sequence , Area Under Curve , Epitopes/chemistry , Epitopes/immunology , Humans , Machine Learning , Models, Molecular , Protein Binding , Reproducibility of Results
18.
J Bioinform Comput Biol ; 13(2): 1550005, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25524475

ABSTRACT

Physicochemical properties of proteins help determine the quality of a protein structure; therefore, they have been rigorously used to distinguish native or native-like structures from other predicted structures. In this work, we explore nine machine learning methods with six physicochemical properties to predict the Root Mean Square Deviation (RMSD), Template Modeling (TM-score), and Global Distance Test (GDT_TS-score) of a modeled protein structure in the absence of its true native state. The physicochemical properties used are total surface area, Euclidean distance (ED), total empirical energy, secondary structure penalty (SS), sequence length (SL), and pair number (PN). There are a total of 95,091 modeled structures of 4896 native targets. A real coded Self-adaptive Differential Evolution algorithm (SaDE) is used to determine the feature importance. K-fold cross validation is used to measure the robustness of the best predictive method. Through intensive experiments, it is found that the Random Forest method outperforms the other machine learning methods. This work makes the prediction faster and less expensive. The performance results for the prediction of RMSD, TM-score, and GDT_TS-score show Root Mean Square Errors (RMSE) of 1.20, 0.06, and 0.06 respectively; correlation scores of 0.96, 0.92, and 0.91 respectively; R² values of 0.92, 0.85, and 0.84 respectively; and accuracies of 78.82% (with ± 0.1 err), 86.56% (with ± 0.1 err), and 87.37% (with ± 0.1 err) respectively on the testing data set. The data set used in the study is available as a supplement at http://bit.ly/RF-PCP-DataSets.
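The four reported quantities (RMSE, correlation, R², and accuracy within an error band) can be computed from paired true and predicted values as sketched below. Reading "accuracy (with ± err)" as the fraction of predictions that fall within a tolerance of the true value is an assumption about the paper's definition, and the toy values are not from the study.

```python
from math import sqrt


def regression_report(y_true, y_pred, tolerance):
    """Return RMSE, Pearson correlation, R^2 and within-tolerance accuracy."""
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    return {
        "rmse": sqrt(ss_res / n),
        "correlation": cov / sqrt(ss_tot * ss_pred),
        "r2": 1 - ss_res / ss_tot,
        # Assumed reading of "accuracy (with +/- err)": share within tolerance.
        "accuracy": sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred)) / n,
    }


# Toy values for illustration only.
report = regression_report([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0], tolerance=0.5)
print(report)
```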


Subject(s)
Models, Molecular , Proteins/chemistry , Algorithms , Chemical Phenomena , Computational Biology , Computer Simulation , Databases, Protein/statistics & numerical data , Machine Learning , Protein Conformation , Quality Control
19.
Biochim Biophys Acta ; 1844(10): 1798-807, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25062912

ABSTRACT

Root-mean-square-deviation (RMSD) of computationally derived protein structures from experimentally determined structures is a critical index for assessing protein-structure-prediction-algorithms (PSPAs). The development of PSPAs to obtain 0 Å RMSD from native structures is considered central to computational biology. However, to date it has been quite challenging to measure how far a predicted protein structure is from its native in the absence of a known experimental/native structure. In this work, we report the development of a metric "D2N" (distance to the native) that predicts the RMSD of any structure without actually knowing the native structure. By combining physico-chemical properties and known universalities in the spatial organization of soluble proteins to develop D2N, we demonstrate the ability to predict the distance of a proposed structure to within ±1.5 Å error with a remarkable average accuracy of 93.6% for structures below 5 Å from the native. We believe that this work opens up a completely new avenue towards assigning reliable structures to whole proteomes even in the absence of experimentally determined native structures. The D2N tool is freely available at http://www.scfbio-iitd.res.in/software/d2n.jsp.
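For reference, the quantity that D2N aims to predict is the ordinary RMSD between matched atoms of a model and its native structure. The sketch below computes it directly when the native is known; it assumes the two structures are already superimposed (no optimal alignment is performed), and the coordinates are made up. It does not reproduce the D2N method itself.

```python
from math import dist, sqrt


def rmsd(model_coords, native_coords):
    """RMSD in the input units over matched atoms of two superimposed structures."""
    if len(model_coords) != len(native_coords):
        raise ValueError("structures must have the same number of atoms")
    squared = sum(dist(a, b) ** 2 for a, b in zip(model_coords, native_coords))
    return sqrt(squared / len(model_coords))


# Hypothetical coordinates in Å: the model is the native shifted by 1 Å along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

print(rmsd(model, native))
```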
