Search | VHL Regional Portal

From Optimization to Mapping: An Evolutionary Algorithm for Protein Energy Landscapes.

Sapin, Emmanuel; De Jong, Kenneth A; Shehu, Amarda.

IEEE/ACM Trans Comput Biol Bioinform ; 15(3): 719-731, 2018.

Article in English | MEDLINE | ID: mdl-28113951

ABSTRACT

Stochastic search is often the only viable option to address complex optimization problems. Recently, evolutionary algorithms have been shown to handle challenging continuous optimization problems related to protein structure modeling. Building on recent work in our laboratories, we propose an evolutionary algorithm for efficiently mapping the multi-basin energy landscapes of dynamic proteins that switch between thermodynamically stable or semi-stable structural states to regulate their biological activity in the cell. The proposed algorithm balances computational resources between exploration and exploitation of the nonlinear, multimodal landscapes that characterize multi-state proteins via a novel combination of global and local search to generate a dynamically-updated, information-rich map of a protein's energy landscape. This new mapping-oriented EA is applied to several dynamic proteins and their disease-implicated variants to illustrate its ability to map complex energy landscapes in a computationally feasible manner. We further show that, given the availability of such maps, comparison between the maps of wildtype and variants of a protein allows for the formulation of a structural and thermodynamic basis for the impact of sequence mutations on dysfunction that may prove useful in guiding further wet-laboratory investigations of dysfunction and molecular interventions.
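
The interplay of global search, local search, and a continually updated map described in this abstract can be sketched on a toy problem. This is a minimal illustration only, not the authors' algorithm: the one-dimensional two-basin `energy` function, the `local_descent` helper, the grid-cell map keyed by `cell`, and all parameter values are hypothetical choices made for the example.

```python
import random

def energy(x):
    # Hypothetical two-basin landscape with minima near x = -1 and x = +1.
    return (x * x - 1.0) ** 2

def local_descent(x, step=0.05, iters=20):
    # Local search (exploitation): greedy fixed-step descent on the energy.
    for _ in range(iters):
        best = min((x - step, x, x + step), key=energy)
        if best == x:
            break
        x = best
    return x

def map_landscape(generations, pop_size, rng, cell=0.1):
    # The map keeps the lowest-energy sample seen in each grid cell.
    population = [rng.uniform(-2.0, 2.0) for _ in range(pop_size)]
    landscape_map = {}
    for _ in range(generations):
        offspring = []
        for parent in population:
            # Global search (exploration): Gaussian mutation, clamped to range.
            child = min(2.0, max(-2.0, parent + rng.gauss(0.0, 0.5)))
            child = local_descent(child)
            offspring.append(child)
            key = round(child / cell)
            if key not in landscape_map or energy(child) < energy(landscape_map[key]):
                landscape_map[key] = child
        # Truncation selection over parents and offspring.
        population = sorted(population + offspring, key=energy)[:pop_size]
    return landscape_map

landscape = map_landscape(generations=30, pop_size=20, rng=random.Random(0))
```

Because the map is updated independently of selection, it can retain samples from both basins even if the population itself converges to one of them.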

Subject(s)

Algorithms , Computational Biology/methods , Protein Conformation , Proteins/chemistry , Proteins/genetics , Humans , Models, Molecular , Thermodynamics

Computing energy landscape maps and structural excursions of proteins.

Sapin, Emmanuel; Carr, Daniel B; De Jong, Kenneth A; Shehu, Amarda.

BMC Genomics ; 17 Suppl 4: 546, 2016 08 18.

Article in English | MEDLINE | ID: mdl-27535545

ABSTRACT

BACKGROUND: Structural excursions of a protein at equilibrium are key to biomolecular recognition and function modulation. Protein modeling research is driven by the need to aid wet laboratories in characterizing equilibrium protein dynamics. In principle, structural excursions of a protein can be directly observed via simulation of its dynamics, but the disparate temporal scales involved in such excursions make this approach computationally impractical. On the other hand, an informative representation of the structure space available to a protein at equilibrium can be obtained efficiently via stochastic optimization, but this approach does not directly yield information on equilibrium dynamics.

METHODS: We present here a novel methodology that first builds a multi-dimensional map of the energy landscape that underlies the structure space of a given protein and then queries the computed map for energetically-feasible excursions between structures of interest. An evolutionary algorithm builds such maps with a practical computational budget. Graphical techniques analyze a computed multi-dimensional map and expose interesting features of an energy landscape, such as basins and barriers. A path searching algorithm then queries a nearest-neighbor graph representation of a computed map for energetically-feasible basin-to-basin excursions.

RESULTS: Evaluation is conducted on intrinsically-dynamic proteins of importance in human biology and disease. Visual statistical analysis of the maps of energy landscapes computed by the proposed methodology reveals features already captured in the wet laboratory, as well as new features indicative of interesting, unknown thermodynamically-stable and semi-stable regions of the equilibrium structure space. Comparison of maps and structural excursions computed by the proposed methodology on sequence variants of a protein sheds light on the role of equilibrium structure and dynamics in the sequence-function relationship.

CONCLUSIONS: Applications show that the proposed methodology is effective at locating basins in complex energy landscapes and computing basin-basin excursions of a protein with a practical computational budget. While the actual temporal scales spanned by a structural excursion cannot be directly obtained due to the foregoing of simulation of dynamics, hypotheses can be formulated regarding the impact of sequence mutations on protein function. These hypotheses are valuable in instigating further research in wet laboratories.
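
The querying step described under METHODS, a path search over a nearest-neighbor graph of the computed map, might look roughly like the sketch below. The abstract does not state the path cost used, so this example assumes one plausible reading of "energetically-feasible": the path whose highest-energy sample (its barrier) is as low as possible. The function names `knn_graph` and `lowest_barrier_path` are hypothetical.

```python
import heapq
import math

def knn_graph(points, k):
    # Undirected k-nearest-neighbor graph over sampled structures (as coordinates).
    graph = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in dists[:k]:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def lowest_barrier_path(graph, energies, start, goal):
    # Dijkstra variant: the cost of a path is the maximum energy along it.
    best = {start: energies[start]}
    prev = {}
    heap = [(energies[start], start)]
    while heap:
        barrier, node = heapq.heappop(heap)
        if node == goal:
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return barrier, path[::-1]
        if barrier > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt in graph[node]:
            cand = max(barrier, energies[nxt])
            if cand < best.get(nxt, math.inf):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(heap, (cand, nxt))
    return math.inf, []  # goal not reachable from start

# Two routes from node 0 to node 4: via nodes 1 and 2 (barrier 5) or via node 3 (barrier 2).
toy_graph = {0: {1, 3}, 1: {0, 2}, 2: {1, 4}, 3: {0, 4}, 4: {2, 3}}
toy_energies = [0, 5, 1, 2, 0]
barrier, path = lowest_barrier_path(toy_graph, toy_energies, 0, 4)
```

In the toy graph the search prefers the route through node 3, since its highest-energy sample is lower than that of the route through nodes 1 and 2.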

Subject(s)

Computational Biology/methods , Protein Conformation , Proteins/chemistry , Algorithms , Cluster Analysis , Humans , Models, Molecular , Thermodynamics

An evolutionary algorithm approach for feature generation from sequence data and its application to DNA splice site prediction.

Kamath, Uday; Compton, Jack; Islamaj-Dogan, Rezarta; De Jong, Kenneth A; Shehu, Amarda.

IEEE/ACM Trans Comput Biol Bioinform ; 9(5): 1387-98, 2012.

Article in English | MEDLINE | ID: mdl-22508909

ABSTRACT

Associating functional information with biological sequences remains a challenge for machine learning methods. The performance of these methods often depends on deriving predictive features from the sequences sought to be classified. Feature generation is a difficult problem, as the connection between the sequence features and the sought property is not known a priori. It is often the task of domain experts or exhaustive feature enumeration techniques to generate a few features whose predictive power is then tested in the context of classification. This paper proposes an evolutionary algorithm to effectively explore a large feature space and generate predictive features from sequence data. The effectiveness of the algorithm is demonstrated on an important component of the gene-finding problem, DNA splice site prediction. This application is chosen due to the complexity of the features needed to obtain high classification accuracy and precision. Our results test the effectiveness of the obtained features in the context of classification by Support Vector Machines and show significant improvement in accuracy and precision over state-of-the-art approaches.

Subject(s)

Algorithms , Computational Biology/methods , DNA/chemistry , Sequence Analysis, DNA/methods , Pattern Recognition, Automated/methods , RNA Splicing

A two-stage evolutionary approach for effective classification of hypersensitive DNA sequences.

Kamath, Uday; Shehu, Amarda; De Jong, Kenneth A.

J Bioinform Comput Biol ; 9(3): 399-413, 2011 Jun.

Article in English | MEDLINE | ID: mdl-21714132

ABSTRACT

Hypersensitive (HS) sites in genomic sequences are reliable markers of DNA regulatory regions that control gene expression. Annotation of regulatory regions is important in understanding phenotypical differences among cells and diseases linked to pathologies in protein expression. Several computational techniques are devoted to mapping out regulatory regions in DNA by initially identifying HS sequences. Statistical learning techniques like Support Vector Machines (SVM), for instance, are employed to classify DNA sequences as HS or non-HS. This paper proposes a method to automate the basic steps in designing an SVM that improves the accuracy of such classification. The method proceeds in two stages and makes use of evolutionary algorithms. An evolutionary algorithm first designs optimal sequence motifs to associate explicit discriminating feature vectors with input DNA sequences. A second evolutionary algorithm then designs SVM kernel functions and parameters that optimally separate the HS and non-HS classes. Results show that this two-stage method significantly improves SVM classification accuracy. The method promises to be generally useful in automating the analysis of biological sequences, and we post its source code on our website.

Subject(s)

Algorithms , DNA/genetics , Sequence Analysis, DNA/statistics & numerical data , Artificial Intelligence , Computational Biology , DNA/chemistry , DNA/classification , Deoxyribonuclease I , Evolution, Molecular , Models, Genetic , Regulatory Elements, Transcriptional , Software

On the choice of the offspring population size in evolutionary algorithms.

Jansen, Thomas; De Jong, Kenneth A; Wegener, Ingo.

Evol Comput ; 13(4): 413-40, 2005.

Article in English | MEDLINE | ID: mdl-16297278

ABSTRACT

Evolutionary algorithms (EAs) generally come with a large number of parameters that have to be set before the algorithm can be used. Finding appropriate settings is a difficult task. The influence of these parameters on the efficiency of the search performed by an evolutionary algorithm can be very high. But there is still a lack of theoretically justified guidelines to help the practitioner find good values for these parameters. One such parameter is the offspring population size. Using a simplified but still realistic evolutionary algorithm, a thorough analysis of the effects of the offspring population size is presented. The result is a much better understanding of the role of offspring population size in an EA and suggests a simple way to dynamically adapt this parameter when necessary.
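
To make the offspring-population-size parameter concrete, the sketch below runs a (1+λ) evolutionary algorithm on the OneMax benchmark and reports both generations and fitness evaluations. The abstract does not name the algorithm or test function it analyzes, so the (1+λ) scheme, OneMax, the 1/n bit-flip mutation rate, and the function names here are assumptions chosen for illustration.

```python
import random

def one_max(bits):
    # Fitness: number of 1-bits; the optimum is the all-ones string.
    return sum(bits)

def one_plus_lambda_ea(n, lam, rng, max_generations=100_000):
    parent = [rng.randint(0, 1) for _ in range(n)]
    parent_fit = one_max(parent)
    generations = 0
    while parent_fit < n and generations < max_generations:
        best_child, best_fit = None, -1
        for _ in range(lam):
            # Standard bit mutation: flip each bit independently with probability 1/n.
            child = [b ^ 1 if rng.random() < 1.0 / n else b for b in parent]
            fit = one_max(child)
            if fit > best_fit:
                best_child, best_fit = child, fit
        # Elitist replacement: the best offspring replaces the parent if not worse.
        if best_fit >= parent_fit:
            parent, parent_fit = best_child, best_fit
        generations += 1
    # Each generation costs lam fitness evaluations.
    return generations, generations * lam, parent_fit

gens, evals, fit = one_plus_lambda_ea(n=20, lam=4, rng=random.Random(0))
```

Running this for several values of `lam` shows the trade-off in question: larger offspring populations tend to need fewer generations but spend more evaluations per generation.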

Subject(s)

Algorithms , Biological Evolution , Computer Simulation , Models, Theoretical , Population Density , Mutation/genetics , Selection, Genetic
