1.
Comput Biol Med ; 170: 108081, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38295475

ABSTRACT

DNA-binding and RNA-binding proteins are essential to an organism's normal life cycle. These proteins have diverse functions in various biological processes. DNA-binding proteins are crucial for DNA replication, transcription, repair, packaging, and gene expression. Likewise, RNA-binding proteins are essential for the post-transcriptional control of RNAs and RNA metabolism. Identifying DNA- and RNA-binding residues is essential for biological research and understanding the pathogenesis of many diseases. However, most DNA-binding and RNA-binding proteins still need to be discovered. This research explored various properties of the protein sequences, such as amino acid composition type, Position-Specific Scoring Matrix (PSSM) values of amino acids, Hidden Markov model (HMM) profiles, physicochemical properties, structural properties, torsion angles, and disorder regions. We utilized a sliding window technique to extract more information from a target residue's neighbors. We proposed an optimized Light Gradient Boosting Machine (LightGBM) method, named DRBpred, to predict DNA-binding and RNA-binding residues from the protein sequence. Compared to the state-of-the-art method, DRBpred improves Sensitivity, Matthews Correlation Coefficient (MCC), and AUC by 112.00 %, 33.33 %, and 6.49 %, respectively, on the DNA-binding test set and by 112.50 %, 16.67 %, and 7.46 %, respectively, on the RNA-binding test set.
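
The sliding-window step mentioned in this abstract can be illustrated with a minimal Python sketch. The window half-width, the zero padding at the sequence ends, and the function name `window_features` are assumptions made for illustration; the abstract does not specify them, and this is not DRBpred's own code.

```python
from typing import List


def window_features(per_residue: List[List[float]], half_width: int) -> List[List[float]]:
    """Concatenate each residue's features with those of its sequence neighbours.

    Positions that fall off either end of the sequence are zero-padded
    (an assumption; other padding schemes are possible).
    """
    n_feat = len(per_residue[0]) if per_residue else 0
    pad = [0.0] * n_feat
    rows = []
    for i in range(len(per_residue)):
        row: List[float] = []
        for j in range(i - half_width, i + half_width + 1):
            row.extend(per_residue[j] if 0 <= j < len(per_residue) else pad)
        rows.append(row)
    return rows


# Toy input: 4 residues with 2 hypothetical features each.
toy = [[1.0, 0.1], [2.0, 0.2], [3.0, 0.3], [4.0, 0.4]]
windows = window_features(toy, half_width=1)
print(len(windows), len(windows[0]))  # 4 rows, 3 positions x 2 features = 6 columns
```

Each output row could then be passed to a residue-level classifier; a wider window adds context at the cost of a larger feature vector.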


Subject(s)
Algorithms , Machine Learning , Amino Acids/chemistry , Amino Acids/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , DNA/genetics , DNA/chemistry , RNA/genetics , RNA/chemistry , RNA/metabolism , RNA-Binding Proteins/genetics , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/metabolism , Computational Biology/methods , Databases, Protein
2.
Biology (Basel) ; 12(7)2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37508449

ABSTRACT

Protein molecules show varying degrees of flexibility throughout their three-dimensional structures. The flexibility is determined by the fluctuations in torsion angles, specifically phi (φ) and psi (ψ), which define the protein backbone. These angle fluctuations are derived from variations in backbone torsion angles observed in different models. By analyzing the fluctuations in Cartesian coordinate space, we can understand the structural flexibility of proteins. Predicting torsion angle fluctuations is valuable for determining protein function and structure when these angles act as constraints. In this study, a machine learning method called TAFPred is developed to predict torsion angle fluctuations using protein sequences directly. The method incorporates various features, such as disorder probability, position-specific scoring matrix profiles, secondary structure probabilities, and more. TAFPred, employing an optimized Light Gradient Boosting Machine Regressor (LightGBM), achieved high accuracy with correlation coefficients of 0.746 and 0.737 and mean absolute errors of 0.114 and 0.123 for the φ and ψ angles, respectively. Compared to the state-of-the-art method, TAFPred demonstrated significant improvements of 10.08% in MAE and 24.83% in PCC for the phi angle and 9.93% in MAE, and 22.37% in PCC for the psi angle.
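
The two evaluation metrics quoted in this abstract, mean absolute error (MAE) and the Pearson correlation coefficient (PCC), can be computed as in the sketch below. The toy values are invented for illustration and are not TAFPred results.

```python
import math
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_cc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical observed vs. predicted fluctuation values for four residues.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.41]
print(mean_absolute_error(observed, predicted))
print(pearson_cc(observed, predicted))
```

A lower MAE and a higher PCC both indicate better agreement, which is why improvements in the two metrics are reported separately.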

3.
Bioinform Adv ; 3(1): vbad032, 2023.
Article in English | MEDLINE | ID: mdl-37038446

ABSTRACT

Motivation: Biological processes are regulated by underlying genes and their interactions that form gene regulatory networks (GRNs). Dysregulation of these GRNs can cause complex diseases such as cancer, Alzheimer's and diabetes. Hence, accurate GRN inference is critical for elucidating gene function, allowing for the faster identification and prioritization of candidate genes for functional investigation. Several statistical and machine learning-based methods have been developed to infer GRNs based on biological and synthetic datasets. Here, we developed a method named AGRN that infers GRNs by employing an ensemble of machine learning algorithms. Results: From the idea that a single method may not perform well on all datasets, we calculate the gene importance scores using three machine learning methods: random forest, extra tree and support vector regressors. We calculate the importance scores from Shapley Additive Explanations, a recently published method to explain machine learning models. We have found that the importance scores from Shapley values perform better than the traditional importance scoring methods on almost all the benchmark datasets. We have analyzed the performance of AGRN using the datasets from the DREAM4 and DREAM5 challenges for GRN inference. The proposed method, AGRN, an ensemble machine learning method with Shapley values, outperforms the existing methods on both the DREAM4 and DREAM5 datasets. With improved accuracy, we believe that AGRN-inferred GRNs would enhance our mechanistic understanding of biological processes in health and disease. Availability and implementation: https://github.com/DuaaAlawad/AGRN. Supplementary information: Supplementary data are available at Bioinformatics online.

4.
PLoS One ; 17(1): e0261613, 2022.
Article in English | MEDLINE | ID: mdl-35061733

ABSTRACT

Completing the genotype-to-phenotype map requires rigorous measurement of the entire multivariate organismal phenotype. However, phenotyping on a large scale is not feasible for many kinds of traits, resulting in missing data that can also cause problems for comparative analyses and the assessment of evolutionary trends across species. Measuring the multivariate performance phenotype is especially logistically challenging, and our ability to predict several performance traits from a given morphology is consequently poor. We developed a machine learning model to accurately estimate multivariate performance data from morphology alone by training it on a dataset containing performance and morphology data from 68 lizard species. Our final, stacked model predicts missing performance data accurately at the level of the individual from simple morphological measures. This model performed exceptionally well, even for performance traits that were missing values for >90% of the sampled individuals. Furthermore, incorporating phylogeny did not improve model fit, indicating that the phenotypic data alone preserved sufficient information to predict the performance based on morphological information. This approach can both significantly increase our understanding of performance evolution and act as a bridge to incorporate performance into future work on phenomics.


Subject(s)
Biological Evolution
5.
Bioinformatics ; 37(17): 2529-2536, 2021 Sep 09.
Article in English | MEDLINE | ID: mdl-33682878

ABSTRACT

MOTIVATION: Transposable Elements (TEs) or jumping genes are DNA sequences that have an intrinsic capability to move within a host genome from one genomic location to another. Studies show that the presence of a TE within or adjacent to a functional gene may alter its expression. TEs can also cause an increase in the rate of mutation and can even mediate duplications and large insertions and deletions in the genome, promoting gross genetic rearrangements. The proper classification of identified jumping genes is important for analyzing their genetic and evolutionary effects. An effective classifier, which can explain the role of TEs in germline and somatic evolution more accurately, is needed. In this study, we examine the performance of a variety of machine learning (ML) techniques and propose a robust method, ClassifyTE, for the hierarchical classification of TEs with high accuracy, using a stacking-based ML method. RESULTS: We propose a stacking-based approach for the hierarchical classification of TEs. When trained on three different benchmark datasets, our proposed system achieved 4%, 10.68% and 10.13% average percentage improvement (using the hF measure) compared to several state-of-the-art methods. We developed an end-to-end automated hierarchical classification tool based on the proposed approach, ClassifyTE, to classify TEs up to the super-family level. We further evaluated our method on a new TE library generated by a homology-based classification method and found relatively high concordance at higher taxonomic levels. Thus, ClassifyTE paves the way for a more accurate analysis of the role of TEs. AVAILABILITY AND IMPLEMENTATION: The source code and data are available at https://github.com/manisa/ClassifyTE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

6.
Comput Biol Chem ; 91: 107436, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33550156

ABSTRACT

The protein disulfide bond is a covalent bond that forms during post-translational modification by the oxidation of a pair of cysteines. In proteins, the disulfide bond is the most frequent covalent link between amino acids after the peptide bond. It plays a significant role in three-dimensional (3D) ab initio protein structure prediction (aiPSP), stabilizing protein conformation, post-translational modification, and protein folding. In aiPSP, the location of disulfide bonds can strongly reduce the conformational search space by imposing geometrical constraints. Existing experimental techniques for the determination of disulfide bonds are time-consuming and expensive. Thus, developing sequence-based computational methods for disulfide bond prediction becomes indispensable. This study proposed a stacking-based machine learning approach for disulfide bond prediction (diSBPred). Various useful sequence and structure-based features are extracted for effective training, including conservation profile, residue solvent accessibility, torsion angle flexibility, disorder probability, a sequential distance between cysteines, and more. The prediction of disulfide bonds is carried out in two stages: first, individual cysteines are predicted as either bonding or non-bonding; second, the cysteine-pairs are predicted as either bonding or non-bonding by including the results from cysteine bonding prediction as a feature. A comparison of the features employed in this study with those utilized in the existing nearest neighbor algorithm (NNA) method shows that the features used in this study yield an improvement of about 7.39 % in jackknife validation balanced accuracy. Moreover, for individual cysteine bonding prediction and cysteine-pair bonding prediction, diSBPred provides a 10-fold cross-validation balanced accuracy of 82.29 % and 94.20 %, respectively. Altogether, our predictor achieves an improvement of 43.25 % based on balanced accuracy compared to the existing NNA-based approach. Thus, diSBPred can be utilized to annotate the cysteine bonding residues of protein sequences whose structures are unknown as well as improve the accuracy of the aiPSP method, which can further aid in experimental studies of the disulfide bond and structure determination.
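
Two ingredients named in this abstract, the candidate cysteine pairs with their sequential distance and the balanced accuracy metric, can be sketched as follows. The function names and toy inputs are assumptions for illustration; this is not the diSBPred implementation.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def cysteine_pairs(sequence: str) -> List[Tuple[int, int, int]]:
    """Return (i, j, sequential distance) for every pair of cysteines (0-based)."""
    positions = [i for i, aa in enumerate(sequence) if aa == "C"]
    return [(i, j, j - i) for i, j in combinations(positions, 2)]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity; assumes both classes occur in y_true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


pairs = cysteine_pairs("ACDCGC")  # toy sequence with three cysteines
score = balanced_accuracy([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
print(pairs, score)
```

Balanced accuracy is a natural choice here because most candidate cysteine pairs are non-bonding, so plain accuracy would reward always predicting the negative class.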


Subject(s)
Disulfides/chemistry , Proteins/chemistry , Support Vector Machine , Algorithms , Amino Acid Sequence , Protein Conformation
7.
Carbohydr Res ; 486: 107857, 2019 Dec 01.
Article in English | MEDLINE | ID: mdl-31683069

ABSTRACT

Carbohydrate-binding proteins play vital roles in many important biological processes. The study of these protein-carbohydrate interactions, at residue level, is useful in treating many critical diseases. Analyzing the local sequential environments of the binding and non-binding regions to predict the protein-carbohydrate binding sites is one of the challenging problems in molecular and computational biology. Existing experimental methods for identifying protein-carbohydrate binding sites are laborious and expensive. Thus, prediction of such binding sites, directly from sequences, using computational methods, can be useful to fast annotate the binding sites and guide the experimental process. Because the number of carbohydrate-binding residues is significantly lower than the number of non-carbohydrate-binding residues, most of the methods developed for the prediction of protein-carbohydrate binding sites are biased towards over predicting the negative class (or non-carbohydrate-binding). Here, we propose a balanced predictor, called StackCBPred, which utilizes features, extracted from evolution-driven sequence profile, called the position-specific scoring matrix (PSSM) and several predicted structural properties of amino acids to effectively train a Stacking-based machine learning method for the accurate prediction of protein-carbohydrate binding sites (https://bmll.cs.uno.edu/).


Subject(s)
Carbohydrate Metabolism , Models, Molecular , Proteins/metabolism , Binding Sites , Protein Binding
8.
Methods Mol Biol ; 1958: 101-122, 2019.
Article in English | MEDLINE | ID: mdl-30945215

ABSTRACT

Supersecondary structure (SSS) refers to specific geometric arrangements of several secondary structure (SS) elements that are connected by loops. The SSS can provide useful information about the spatial structure and function of a protein. As such, the SSS is a bridge between the secondary structure and tertiary structure. In this chapter, we propose a stacking-based machine learning method for the prediction of two types of SSSs, namely, β-hairpins and β-α-β, from the protein sequence based on comprehensive feature encoding. To encode protein residues, we utilize key features such as solvent accessibility, conservation profile, half surface exposure, torsion angle fluctuation, disorder probabilities, and more. The usefulness of the proposed approach is assessed using a widely used threefold cross-validation technique. The obtained empirical result shows that the proposed approach is useful and prediction can be improved further.


Subject(s)
Amino Acid Motifs , Computational Biology/methods , Proteins/chemistry , Algorithms , Amino Acid Sequence/genetics , Databases, Protein , Models, Molecular , Proteins/genetics , Software
9.
Bioinformatics ; 35(3): 433-441, 2019 02 01.
Article in English | MEDLINE | ID: mdl-30032213

ABSTRACT

Motivation: Identification of DNA-binding proteins from only sequence information is one of the most challenging problems in the field of genome annotation. DNA-binding proteins play an important role in various biological processes such as DNA replication, repair, transcription and splicing. Existing experimental techniques for identifying DNA-binding proteins are time-consuming and expensive. Thus, prediction of DNA-binding proteins from sequences alone using computational methods can be useful to quickly annotate and guide the experimental process. Most of the methods developed for predicting DNA-binding proteins use only the information from the evolutionary profile, called the position-specific scoring matrix (PSSM) profile, and the accuracies of such methods have been limited. Here, we propose a method, called StackDPPred, which utilizes features extracted from PSSM and residue-specific contact-energy to help train a stacking-based machine learning method for the effective prediction of DNA-binding proteins. Results: Based on benchmark sequences of 1063 (518 DNA-binding and 545 non-DNA-binding) proteins and using jackknife validation, StackDPPred achieved an ACC of 89.96%, MCC of 0.799 and AUC of 94.50%. This outcome outperforms several state-of-the-art approaches. Furthermore, when tested on two recently designed independent test datasets, StackDPPred outperforms existing approaches consistently. The proposed StackDPPred can be used for effective prediction of DNA-binding proteins from sequence alone. Availability and implementation: Online server is at http://bmll.cs.uno.edu/add and code-data is at http://cs.uno.edu/~tamjid/Software/StackDPPred/code_data.zip. Supplementary information: Supplementary data are available at Bioinformatics online.
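
Jackknife (leave-one-out) validation, the protocol named in this abstract, can be sketched in a few lines of Python. The one-dimensional toy data and the nearest-neighbour classifier are stand-ins chosen for brevity; they are assumptions, not the stacking model or features used by StackDPPred.

```python
from typing import List


def nearest_neighbour_predict(train_x: List[float], train_y: List[int], query: float) -> int:
    """Stand-in classifier: return the label of the closest training point."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return train_y[best]


def jackknife_accuracy(xs: List[float], ys: List[int]) -> float:
    """Hold out each sample once, train on the rest, and score the held-out prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_neighbour_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical single-feature dataset with two well-separated classes.
features = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
labels = [0, 0, 0, 1, 1, 1]
print(jackknife_accuracy(features, labels))
```

Because every protein is held out exactly once, the procedure needs as many training runs as there are samples, which is affordable for a benchmark of about a thousand sequences.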


Subject(s)
DNA-Binding Proteins/genetics , Position-Specific Scoring Matrices , Sequence Analysis, Protein , Software , Computational Biology
10.
Bioinformatics ; 34(19): 3289-3299, 2018 10 01.
Article in English | MEDLINE | ID: mdl-29726965

ABSTRACT

Motivation: Machine learning plays a substantial role in bioscience owing to the explosive growth in sequence data and the challenging application of computational methods. Peptide-recognition domains (PRDs) are critical as they promote coupled-binding with short peptide-motifs of functional importance through transient interactions. It is challenging to build a reliable predictor of peptide-binding residues in proteins with diverse types of PRDs from protein sequence alone. On the other hand, it is vital to keep pace with the sequencing speed and to broaden the scope of study. Results: In this paper, we propose a machine-learning-based tool, named PBRpredict, to predict residues in peptide-binding domains from protein sequence alone. To develop a generic predictor, we train the models on peptide-binding residues of diverse types of domains. As inputs to the models, we use a high-dimensional feature set of chemical, structural and evolutionary information extracted from protein sequence. We carefully investigate six different state-of-the-art classification algorithms for this application. Finally, we use the stacked generalization approach to non-linearly combine a set of complementary base-level learners using a meta-level learner, which outperformed the winner-takes-all approach. The proposed predictor is found competitive based on statistical evaluation. Availability and implementation: PBRpredict-Suite software: http://cs.uno.edu/~tamjid/Software/PBRpredict/pbrpredict-suite.zip. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Amino Acid Sequence , Peptides/chemistry , Proteins/chemistry , Sequence Analysis, Protein , Software , Algorithms , Computational Biology
11.
PLoS One ; 11(9): e0161452, 2016.
Article in English | MEDLINE | ID: mdl-27588752

ABSTRACT

A set of features computed from the primary amino acid sequence of proteins is crucial in the process of inducing a machine learning model that is capable of accurately predicting three-dimensional protein structures. Solutions for existing protein structure prediction problems are in need of features that can capture the complexity of molecular level interactions. With a view to this, we propose a novel approach to estimate position specific estimated energy (PSEE) of a residue using contact energy and predicted relative solvent accessibility (RSA). Furthermore, we demonstrate PSEE can be reasonably estimated based on sequence information alone. PSEE is useful in identifying the structured as well as the unstructured, or intrinsically disordered, regions of a protein by computing favorable and unfavorable energy, respectively, characterized by an appropriate threshold. The most intriguing finding, verified empirically, is the indication that the PSEE feature can effectively classify disordered versus ordered residues and can segregate different secondary structure type residues by computing the constituent energies. PSEE values for each amino acid strongly correlate with the hydrophobicity value of the corresponding amino acid. Further, PSEE can be used to detect the existence of critical binding regions that essentially undergo disorder-to-order transitions to perform crucial biological functions. Towards an application of disorder prediction using the PSEE feature, we have rigorously tested and found that a support vector machine model informed by a set of features including PSEE consistently outperforms a model with an identical set of features with PSEE removed. In addition, the new disorder predictor, DisPredict2, shows competitive performance in predicting protein disorder when compared with six existing disordered protein predictors.


Subject(s)
Models, Molecular , Protein Conformation , Proteins/metabolism , Amino Acid Sequence , Binding Sites , Databases, Protein , Protein Structure, Secondary
12.
J Theor Biol ; 398: 112-21, 2016 06 07.
Article in English | MEDLINE | ID: mdl-27029514

ABSTRACT

The success of solving the protein folding and structure prediction problems in molecular and structural biology relies on an accurate energy function. With the rapid advancement in the computational biology and bioinformatics fields, there is a growing need of solving unknown fold and structure faster and thus an accurate energy function is indispensable. To address this need, we develop a new potential function, namely 3DIGARS3.0, which is a linearly weighted combination of 3DIGARS, mined accessible surface area (ASA) and ubiquitously computed Phi (uPhi) and Psi (uPsi) energies - optimized by a Genetic Algorithm (GA). We use a dataset of 4332 protein-structures to generate uPhi and uPsi based score libraries to be used within the core 3DIGARS method. The optimized weight of each component is obtained by applying Genetic Algorithm based optimization on three challenging decoy sets. The improved 3DIGARS3.0 outperformed state-of-the-art methods significantly based on a set of independent test datasets.
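
The idea of a linearly weighted combination of energy terms can be sketched as below. The term names, weights, and decoy values are hypothetical placeholders; in the work summarized here the weights are obtained by a Genetic Algorithm, which this sketch does not implement.

```python
from typing import Dict


def combined_energy(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of energy terms; a lower total means a more favourable structure."""
    return sum(weights[name] * value for name, value in terms.items())


# Hypothetical weights for a base potential plus ASA, uPhi and uPsi terms.
weights = {"base": 1.0, "asa": 0.5, "uphi": 0.25, "upsi": 0.25}

# Hypothetical per-term scores for two candidate structures (decoys).
decoys = {
    "native_like": {"base": -10.0, "asa": -2.0, "uphi": -4.0, "upsi": -4.0},
    "decoy_a": {"base": -9.0, "asa": 2.0, "uphi": 0.0, "upsi": 4.0},
}

totals = {name: combined_energy(terms, weights) for name, terms in decoys.items()}
best = min(totals, key=totals.get)
print(totals, best)
```

An optimizer would adjust the entries of `weights` so that, across a training set of decoys, the lowest combined energy falls on the native or near-native structure as often as possible.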


Subject(s)
Protein Conformation , Proteins/chemistry , Research Design , Databases, Protein , Solvents , Thermodynamics
13.
J Comput Chem ; 37(12): 1119-24, 2016 May 05.
Article in English | MEDLINE | ID: mdl-26849026

ABSTRACT

An important unsolved problem in molecular and structural biology is the protein folding and structure prediction problem. One major bottleneck for solving this is the lack of an accurate energy to discriminate near-native conformations against other possible conformations. Here we have developed sDFIRE energy function, which is an optimized linear combination of DFIRE (the Distance-scaled Finite Ideal gas Reference state based Energy), the orientation dependent (polar-polar and polar-nonpolar) statistical potentials, and the matching scores between predicted and model structural properties including predicted main-chain torsion angles and solvent accessible surface area. The weights for these scoring terms are optimized by three widely used decoy sets consisting of a total of 134 proteins. Independent tests on CASP8 and CASP9 decoy sets indicate that sDFIRE outperforms other state-of-the-art energy functions in selecting near native structures and in the Pearson's correlation coefficient between the energy score and structural accuracy of the model (measured by TM-score).


Subject(s)
Computational Biology , Proteins/chemistry , Thermodynamics , Algorithms , Protein Conformation , Protein Folding
14.
Comput Biol Chem ; 61: 162-77, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26878130

ABSTRACT

Protein structure prediction is considered one of the most challenging and computationally intractable combinatorial problems. Thus, the efficient modeling of the convoluted search space, the clever use of energy functions, and more importantly, the use of effective sampling algorithms become crucial to address this problem. For protein structure modeling, an off-lattice model provides limited scope to exercise and evaluate algorithmic developments due to its astronomically large set of data-points. In contrast, an on-lattice model widens the scope and permits studying relatively larger proteins because of its finite set of data-points. In this work, we took full advantage of an on-lattice model by using a face-centered-cube lattice that has the highest packing density with the maximum degree of freedom. We proposed a genetic algorithm (GA) for conformational search based on a graded energy that strategically mixes the Miyazawa-Jernigan (MJ) energy with the hydrophobic-polar (HP) energy. In our application, we introduced a 2 × 2 HP energy guided macro-mutation operator within the GA to explore the best possible local changes exhaustively. Conversely, the 20 × 20 MJ energy model (the ultimate objective function of our GA that needs to be minimized) considers the impacts amongst the 20 different amino acids and allows searching the globally acceptable conformations. On a set of benchmark proteins, our proposed approach outperformed state-of-the-art approaches in terms of the free energy levels and the root-mean-square deviations.


Subject(s)
Algorithms , Proteins/chemistry , Models, Theoretical , Protein Conformation
15.
PLoS One ; 10(10): e0141551, 2015.
Article in English | MEDLINE | ID: mdl-26517719

ABSTRACT

Intrinsically disordered proteins or regions perform important biological functions through their dynamic conformations during binding. Thus, accurate identification of these disordered regions has significant implications for proper annotation of function, induced fold prediction and drug design to combat critical diseases. We introduce DisPredict, a disorder predictor that employs a single support vector machine with RBF kernel and novel features for reliable characterization of protein structure. DisPredict yields effective performance. In addition to 10-fold cross validation, training and testing of DisPredict was conducted with independent test datasets. The results were consistent, with both the training and test error minimal. The use of multiple data sources makes the predictor generic. The datasets used in developing the model include disordered regions of various lengths, categorized as short and long, with different compositions and different types of disorder, ranging from fully to partially disordered regions as well as completely ordered regions. Through comparison with other state-of-the-art approaches and case studies, DisPredict is found to be a useful tool with competitive performance. DisPredict is available at https://github.com/tamjidul/DisPredict_v1.0.


Subject(s)
Algorithms , Intrinsically Disordered Proteins/chemistry , Support Vector Machine , Amino Acid Sequence , Area Under Curve , Crystallography, X-Ray , Datasets as Topic , Probability , Protein Structure, Secondary , ROC Curve
16.
J Theor Biol ; 380: 380-91, 2015 Sep 07.
Article in English | MEDLINE | ID: mdl-26092374

ABSTRACT

An accurate prediction of real value accessible surface area (ASA) from protein sequence alone has wide application in the field of bioinformatics and computational biology. ASA has been helpful in understanding the 3-dimensional structure and function of a protein, acting as a high impact feature in secondary structure prediction, disorder prediction, binding region identification and fold recognition applications. To enhance and support broad applications of ASA, we have made an attempt to improve the prediction accuracy of absolute accessible surface area by developing a new predictor paradigm, namely REGAd(3)p, for real value prediction through classical Exact Regression with Regularization and polynomial kernel of degree 3, which was further optimized using a Genetic Algorithm. Because ASA assists effective energy functions, we were motivated to enhance the accuracy of predicted ASA for better energy function application. Our ASA prediction paradigm was trained and tested using a new benchmark dataset, proposed in this work, consisting of 1001 and 298 protein chains, respectively. We achieved a maximum Pearson Correlation Coefficient (PCC) of 0.76 and a 1.45% improved PCC when compared with the existing top performing predictor, SPINE-X, in ASA prediction on an independent test set. Furthermore, we modeled the error between actual and predicted ASA in terms of energy and combined this energy linearly with the energy function 3DIGARS, which resulted in an effective energy function, namely 3DIGARS2.0, outperforming all the state-of-the-art energy functions. Based on Rosetta and Tasser decoy-sets, 3DIGARS2.0 resulted in 80.78%, 73.77%, 141.24%, 16.52%, and 32.32% improvement over DFIRE, RWplus, dDFIRE, GOAP and 3DIGARS, respectively.


Subject(s)
Models, Theoretical , Surface Properties , Amino Acids/chemistry , Molecular Structure
17.
Adv Bioinformatics ; 2014: 985968, 2014.
Article in English | MEDLINE | ID: mdl-24744779

ABSTRACT

Protein structure prediction is computationally a very challenging problem. A large number of existing search algorithms attempt to solve the problem by exploring possible structures and finding the one with the minimum free energy. However, these algorithms perform poorly on large sized proteins due to an astronomically wide search space. In this paper, we present a multipoint spiral search framework that uses parallel processing techniques to expedite exploration by starting from different points. In our approach, a set of random initial solutions are generated and distributed to different threads. We allow each thread to run for a predefined period of time. The improved solutions are stored threadwise. When the threads finish, the solutions are merged together and the duplicates are removed. A selected distinct set of solutions are then split to different threads again. In our ab initio protein structure prediction method, we use the three-dimensional face-centred-cubic lattice for structure-backbone mapping. We use both the low resolution hydrophobic-polar energy model and the high-resolution 20 × 20 energy model for search guiding. The experimental results show that our new parallel framework significantly improves the results obtained by the state-of-the-art single-point search approaches for both energy models on three-dimensional face-centred-cubic lattice. We also experimentally show the effectiveness of mixing energy models within parallel threads.
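
The distribute, search, merge and deduplicate loop described in this abstract can be sketched with Python threads. The one-dimensional toy energy, the hill-climbing step, and the fixed step budget (used here instead of a time limit) are assumptions for illustration only; they are not the spiral search or the lattice energy models of the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def toy_energy(x: int) -> int:
    """Hypothetical stand-in for a conformation's energy; minimum at x = 3."""
    return (x - 3) ** 2


def local_search(start: int, steps: int) -> int:
    """Greedy hill climbing over integer neighbours for at most `steps` moves."""
    current = start
    for _ in range(steps):
        best = min((current - 1, current, current + 1), key=toy_energy)
        if best == current:
            break
        current = best
    return current


def multipoint_round(starts: List[int], steps: int, n_threads: int) -> List[int]:
    """Run one search per start point in parallel, then merge and deduplicate."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        improved = list(pool.map(lambda s: local_search(s, steps), starts))
    return sorted(set(improved), key=toy_energy)


round_one = multipoint_round([-10, 0, 8, 20], steps=5, n_threads=2)
round_two = multipoint_round(round_one, steps=5, n_threads=2)
print(round_one, round_two)
```

Feeding the merged, distinct solutions back in as the next round's start points mirrors the re-splitting step of the abstract: duplicates are dropped so that threads do not repeat the same search.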

18.
Biomed Res Int ; 2013: 924137, 2013.
Article in English | MEDLINE | ID: mdl-24224180

ABSTRACT

Protein structure prediction (PSP) is computationally a very challenging problem. The challenge largely comes from the fact that the energy function that needs to be minimised in order to obtain the native structure of a given protein is not clearly known. A high resolution 20 × 20 energy model could better capture the behaviour of the actual energy function than a low resolution energy model such as hydrophobic-polar (HP). However, the fine grained details of the high resolution interaction energy matrix are often not very informative for guiding the search. In contrast, a low resolution energy model could effectively bias the search towards certain promising directions. In this paper, we develop a genetic algorithm that mainly uses a high resolution energy model for protein structure evaluation but uses a low resolution HP energy model in focussing the search towards exploring structures that have hydrophobic cores. We experimentally show that this mixing of energy models leads to significantly lower energy structures compared to the state-of-the-art results.


Subject(s)
Computational Biology/methods , Models, Molecular , Protein Conformation , Proteins/chemistry , Algorithms , Amino Acid Sequence , Hydrophobic and Hydrophilic Interactions , Protein Folding
19.
PLoS One ; 8(11): e79865, 2013.
Article in English | MEDLINE | ID: mdl-24278197

ABSTRACT

Three-dimensional (3D) in vitro cell based assays for Prostate Cancer (PCa) research are rapidly becoming the preferred alternative to conventional 2D monolayer cultures. 3D assays more precisely mimic the microenvironment found in vivo, and thus are ideally suited to evaluate compounds and their suitability for progression in the drug discovery pipeline. To achieve the desired high throughput needed for most screening programs, automated quantification of 3D cultures is required. Towards this end, this paper reports on the development of a prototype analysis module for an automated high-content-analysis (HCA) system, which allows for accurate and fast investigation of in vitro 3D cell culture models for PCa. The Java based program, which we have named PCaAnalyser, uses novel algorithms that allow accurate and rapid quantitation of protein expression in 3D cell culture. As currently configured, the PCaAnalyser can quantify a range of biological parameters including: nuclei-count, nuclei-spheroid membership prediction, various function based classification of peripheral and non-peripheral areas to measure expression of biomarkers and protein constituents known to be associated with PCa progression, as well as defining segregate cellular-objects effectively for a range of signal-to-noise ratios. In addition, the PCaAnalyser architecture is highly flexible, operating as a single independent analysis as well as in batch mode, which is essential for High-Throughput-Screening (HTS). The PCaAnalyser provides accurate and rapid analysis in an automated high throughput manner, and we demonstrate reproducible analysis of the distribution and intensity of well-established markers associated with PCa progression in a range of metastatic PCa cell-lines (DU145 and PC3) in a 3D model.


Subject(s)
Algorithms , Image Processing, Computer-Assisted/methods , Prostatic Neoplasms/pathology , Cell Line, Tumor , Humans , Immunohistochemistry , Male , Signal-To-Noise Ratio , Spheroids, Cellular/cytology
20.
BMC Bioinformatics ; 14 Suppl 2: S16, 2013.
Article in English | MEDLINE | ID: mdl-23368706

ABSTRACT

BACKGROUND: Protein structure prediction is an important but unsolved problem in biological science. Predicted structures vary much with energy functions and structure-mapping spaces. In our simplified ab initio protein structure prediction methods, we use hydrophobic-polar (HP) energy model for structure evaluation, and 3-dimensional face-centred-cubic lattice for structure mapping. For HP energy model, developing a compact hydrophobic-core (H-core) is essential for the progress of the search. The H-core helps find a stable structure with the lowest possible free energy. RESULTS: In order to build H-cores, we present a new Spiral Search algorithm based on tabu-guided local search. Our algorithm uses a novel H-core directed guidance heuristic that squeezes the structure around a dynamic hydrophobic-core centre. We applied random walks to break premature H-cores and thus to avoid early convergence. We also used a novel relay-restart technique to handle stagnation. CONCLUSIONS: We have tested our algorithms on a set of benchmark protein sequences. The experimental results show that our spiral search algorithm outperforms the state-of-the-art local search algorithms for simplified protein structure prediction. We also experimentally show the effectiveness of the relay-restart.
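
The hydrophobic-polar (HP) energy on a face-centred-cubic lattice can be sketched as follows, using the common convention of −1 per contact between non-consecutive H residues on neighbouring lattice points. The toy sequence and coordinates are assumptions for illustration, and the sketch covers only energy evaluation, not the spiral search itself.

```python
from itertools import combinations
from typing import List, Tuple

Point = Tuple[int, int, int]


def are_fcc_neighbours(a: Point, b: Point) -> bool:
    """Two FCC lattice points are neighbours if they differ by +/-1 in exactly two axes."""
    diffs = sorted(abs(a[k] - b[k]) for k in range(3))
    return diffs == [0, 1, 1]


def hp_energy(sequence: str, coords: List[Point]) -> int:
    """HP energy: -1 for each pair of non-consecutive H residues in lattice contact."""
    energy = 0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i > 1 and sequence[i] == "H" and sequence[j] == "H":
            if are_fcc_neighbours(coords[i], coords[j]):
                energy -= 1
    return energy


# Toy 4-residue chain; consecutive residues sit on neighbouring lattice points.
sequence = "HPHH"
coords = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (2, 0, 0)]
print(hp_energy(sequence, coords))
```

Under this model, packing H residues into a compact core maximizes the number of H-H contacts, which is why building a hydrophobic core drives the energy down.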


Subject(s)
Algorithms , Models, Theoretical , Protein Conformation , Proteins/chemistry , Amino Acid Sequence , Hydrophobic and Hydrophilic Interactions