Fast and Efficient Black Box Optimization Using the Parameter-less Population Pyramid.

Goldman, B W; Punch, W F.

Evol Comput; 23(3): 451-79, 2015.

Article in English | MEDLINE | ID: mdl-25781724

ABSTRACT

The parameter-less population pyramid (P3) is a recently introduced method for performing evolutionary optimization without requiring any user-specified parameters. P3's primary innovation is to replace the generational model with a pyramid of multiple populations that are iteratively created and expanded. In combination with local search and advanced crossover, P3 scales to problem difficulty, exploiting previously learned information before adding more diversity. Across seven problems, each tested using on average 18 problem sizes, P3 outperformed all five advanced comparison algorithms. This improvement includes requiring fewer evaluations to find the global optimum and better fitness when using the same number of evaluations. Using both algorithm analysis and comparison, we find P3's effectiveness is due to its ability to properly maintain, add, and exploit diversity. Unlike the best comparison algorithms, P3 was able to achieve this quality without any problem-specific tuning. Thus, unlike previous parameter-less methods, P3 does not sacrifice quality for applicability. Therefore we conclude that P3 is an efficient, general, parameter-less approach to black box optimization which is more effective than existing state-of-the-art techniques.

Subject(s)

Algorithms; Models, Theoretical; Bayes Theorem; Genetic Linkage
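The pyramid idea described in the abstract above can be illustrated with a heavily simplified sketch. It is not the authors' implementation: the abstract does not specify the crossover operator, so plain uniform mixing stands in for it here, and the names `hill_climb`, `uniform_mix`, and `pyramid_search` are hypothetical. The sketch only shows the control flow: each iteration creates one locally optimized solution, mixes it with every level of the pyramid in turn, and promotes it one level up whenever mixing improved it.

```python
import random

def hill_climb(bits, fitness, rng):
    """First-improvement bit-flip local search; returns (solution, fitness)."""
    bits = list(bits)
    best = fitness(bits)
    improved = True
    while improved:
        improved = False
        order = list(range(len(bits)))
        rng.shuffle(order)
        for i in order:
            bits[i] ^= 1                  # try flipping one bit
            value = fitness(bits)
            if value > best:
                best, improved = value, True
            else:
                bits[i] ^= 1              # undo a non-improving flip
    return bits, best

def uniform_mix(solution, donor, rng):
    """Assumed stand-in crossover: copy each gene from the donor with p = 0.5."""
    return [d if rng.random() < 0.5 else s for s, d in zip(solution, donor)]

def pyramid_search(fitness, n_bits, iterations, seed=0):
    rng = random.Random(seed)
    pyramid = []                          # pyramid[k] is the population at level k
    seen = set()                          # never store the same solution twice
    best_bits, best_fit = None, float("-inf")

    def add(level, bits):
        key = tuple(bits)
        if key in seen:
            return
        seen.add(key)
        if level == len(pyramid):         # open a new level on demand
            pyramid.append([])
        pyramid[level].append(list(bits))

    for _ in range(iterations):
        start = [rng.randint(0, 1) for _ in range(n_bits)]
        sol, fit = hill_climb(start, fitness, rng)
        add(0, sol)
        level = 0
        while level < len(pyramid):
            before = fit
            donors = list(pyramid[level])
            rng.shuffle(donors)
            for donor in donors:
                child = uniform_mix(sol, donor, rng)
                child_fit = fitness(child)
                if child_fit >= fit:      # keep changes that do not hurt
                    sol, fit = child, child_fit
            if fit > before:              # strict improvement: promote upward
                add(level + 1, sol)
            level += 1
        if fit > best_fit:
            best_bits, best_fit = sol, fit
    return best_bits, best_fit

# Toy usage: maximize the number of ones in a 20-bit string.
best, score = pyramid_search(sum, n_bits=20, iterations=5)
```

Note that no population size appears anywhere: levels and their members are created only when a solution earns a place, which is the sense in which the approach needs no user-set parameters.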

The Liga algorithm for ab initio determination of nanostructure.

Juhás, P; Granlund, L; Duxbury, P M; Punch, W F; Billinge, S J L.

Acta Crystallogr A; 64(Pt 6): 631-40, 2008 Nov.

Article in English | MEDLINE | ID: mdl-18931419

ABSTRACT

Computational techniques for nanostructure determination of substances that resist standard crystallographic methods are often laborious processes starting from initial guess solutions not derived from experimental data. The Liga algorithm can create nanostructures using only lists of lengths or distances between atom pairs, providing an experimental basis for starting structures. These distance lists may be extracted from a variety of experimental probes, and we illustrate the procedure with distances determined from the pair distribution function. Candidate subclusters, each a subset of a structure's atoms, compete based on adherence to the length list. Atoms are added to well-performing candidates and removed from poor ones, until a complete structure with sufficient agreement with the length list emerges. The Liga algorithm is shown to reliably recreate Lennard-Jones clusters from ideal length lists and the C60 structure from neutron-scattering data. The correct fullerene structure was obtained with experimental data which missed several distances and had loosened constraints on distance multiplicity. This suggests that the Liga algorithm may have robust applicability for a wide range of nanostructures even in the absence of ideal data.
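To make "adherence to the length list" concrete, here is a minimal sketch of one way a candidate subcluster could be scored. The abstract does not give the actual cost function, so the greedy nearest-length matching and the squared-error sum below are assumptions, and the function names `pair_distances` and `length_list_cost` are hypothetical.

```python
import bisect
import itertools
import math

def pair_distances(coords):
    """Sorted distances between all atom pairs of a candidate cluster."""
    return sorted(math.dist(a, b) for a, b in itertools.combinations(coords, 2))

def length_list_cost(coords, target_lengths):
    """Assumed cost: match each candidate distance to the nearest unused
    target length and sum the squared mismatches (lower is better)."""
    remaining = sorted(target_lengths)
    cost = 0.0
    for d in pair_distances(coords):
        if not remaining:
            raise ValueError("candidate has more pair distances than the length list")
        i = bisect.bisect_left(remaining, d)
        # The nearest unused target sits just below or just above d.
        neighbors = [j for j in (i - 1, i) if 0 <= j < len(remaining)]
        j = min(neighbors, key=lambda k: abs(remaining[k] - d))
        cost += (remaining.pop(j) - d) ** 2   # each target length is used once
    return cost

# Length list of a unit square: four sides and two diagonals.
square_lengths = [1.0, 1.0, 1.0, 1.0, math.sqrt(2), math.sqrt(2)]
good = length_list_cost([(0, 0), (1, 0), (1, 1)], square_lengths)  # a corner of the square
bad = length_list_cost([(0, 0), (1, 0), (2, 0)], square_lengths)   # three atoms in a line
```

A three-atom corner of the square matches the list exactly, while the collinear candidate is penalized for its distance of 2, which the list does not contain. A score of this kind is what would let partial candidates compete before the full structure is assembled.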

Ab initio determination of solid-state nanostructure.

Juhás, P; Cherba, D M; Duxbury, P M; Punch, W F; Billinge, S J L.

Nature; 440(7084): 655-8, 2006 Mar 30.

Article in English | MEDLINE | ID: mdl-16572167

ABSTRACT

Advances in materials science and molecular biology followed rapidly from the ability to characterize atomic structure using single crystals. Structure determination is more difficult if single crystals are not available. Many complex inorganic materials that are of interest in nanotechnology have no periodic long-range order and so their structures cannot be solved using crystallographic methods. Here we demonstrate that ab initio structure solution of these nanostructured materials is feasible using diffraction data in combination with distance geometry methods. Precise, sub-ångström resolution distance data are experimentally available from the atomic pair distribution function (PDF). Current PDF analysis consists of structure refinement from reasonable initial structure guesses and it is not clear, a priori, that sufficient information exists in the PDF to obtain a unique structural solution. Here we present and validate two algorithms for structure reconstruction from precise unassigned interatomic distances for a range of clusters. We then apply the algorithms to find a unique, ab initio, structural solution for C60 from PDF data alone. This opens the door to sub-ångström resolution structure solution of nanomaterials, even when crystallographic methods fail.

Knowledge discovery in medical and biological datasets using a hybrid Bayes classifier/evolutionary algorithm.

Raymer, M L; Doom, T E; Kuhn, L A; Punch, W F.

IEEE Trans Syst Man Cybern B Cybern; 33(5): 802-13, 2003.

Article in English | MEDLINE | ID: mdl-18238233

ABSTRACT

A key element of bioinformatics research is the extraction of meaningful information from large experimental data sets. Various approaches, including statistical and graph theoretical methods, data mining, and computational pattern recognition, have been applied to this task with varying degrees of success. Using a novel classifier based on the Bayes discriminant function, we present a hybrid algorithm that employs feature selection and extraction to isolate salient features from large medical and other biological data sets. We have previously shown that a genetic algorithm coupled with a k-nearest-neighbors classifier performs well in extracting information about protein-water binding from X-ray crystallographic protein structure data. We demonstrate that the hybrid EC-Bayes classifier effectively distinguishes the features of this data set that are the most statistically relevant and weights these features appropriately to aid in the prediction of solvation sites.

Comparisons of likelihood and machine learning methods of individual classification.

Guinand, B; Topchy, A; Page, K S; Burnham-Curtis, M K; Punch, W F; Scribner, K T.

J Hered; 93(4): 260-9, 2002.

Article in English | MEDLINE | ID: mdl-12407212

ABSTRACT

Classification methods used in machine learning (e.g., artificial neural networks, decision trees, and k-nearest neighbor clustering) are rarely used with population genetic data. We compare different nonparametric machine learning techniques with parametric likelihood estimations commonly employed in population genetics for purposes of assigning individuals to their population of origin ("assignment tests"). Classifier accuracy was compared across simulated data sets representing different levels of population differentiation (low and high F(ST)), number of loci surveyed (5 and 10), and allelic diversity (average of three or eight alleles per locus). Empirical data for the lake trout (Salvelinus namaycush) exhibiting levels of population differentiation comparable to those used in simulations were examined to further evaluate and compare classification methods. Classification error rates associated with artificial neural networks and likelihood estimators were lower for simulated data sets than those of k-nearest neighbor and decision tree classifiers over the entire range of parameters considered. Artificial neural networks only marginally outperformed the likelihood method for simulated data (0-2.8% lower error rates). The performance of each machine learning classifier improved relative to likelihood estimators for empirical data sets, suggesting an ability to "learn" and utilize properties of empirical genotypic arrays intrinsic to each population. Likelihood-based estimation methods provide a more accessible option for reliable assignment of individuals to the population of origin due to the intricacies in development and evaluation of artificial neural networks.

Subject(s)

Artificial Intelligence; Genetics, Population/methods; Likelihood Functions; Animals; Data Interpretation, Statistical
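A likelihood-based assignment test of the kind compared in the abstract above can be sketched as follows. The abstract does not state the exact estimator, so this assumes the textbook form: Hardy-Weinberg genotype probabilities, independent loci, and a small frequency floor for alleles unseen in a population. The function names and the allele frequencies in the usage example are hypothetical.

```python
import math

def genotype_log_likelihood(genotype, allele_freqs, floor=1e-3):
    """Log-likelihood of a diploid multilocus genotype in one population.

    genotype: list of (allele1, allele2) tuples, one per locus.
    allele_freqs: list of {allele: frequency} dicts, one per locus.
    Assumes Hardy-Weinberg proportions and independence among loci.
    """
    total = 0.0
    for (a1, a2), freqs in zip(genotype, allele_freqs):
        p = max(freqs.get(a1, 0.0), floor)   # floor avoids log(0) for unseen alleles
        q = max(freqs.get(a2, 0.0), floor)
        prob = p * p if a1 == a2 else 2.0 * p * q
        total += math.log(prob)
    return total

def assign(genotype, populations):
    """Assign the individual to the population with the highest likelihood."""
    return max(populations,
               key=lambda name: genotype_log_likelihood(genotype, populations[name]))

# Hypothetical allele frequencies for two populations at two loci.
populations = {
    "A": [{"a": 0.9, "b": 0.1}, {"c": 0.8, "d": 0.2}],
    "B": [{"a": 0.2, "b": 0.8}, {"c": 0.3, "d": 0.7}],
}
origin = assign([("a", "a"), ("c", "d")], populations)
```

For this genotype the likelihood under population A is 0.81 × 0.32 and under B is 0.04 × 0.42, so the individual is assigned to A. The machine learning classifiers in the study replace this explicit model with one learned from labeled genotypes.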

Predicting conserved water-mediated and polar ligand interactions in proteins using a K-nearest-neighbors genetic algorithm.

Raymer, M L; Sanschagrin, P C; Punch, W F; Venkataraman, S; Goodman, E D; Kuhn, L A.

J Mol Biol; 265(4): 445-64, 1997 Jan 31.

Article in English | MEDLINE | ID: mdl-9034363

ABSTRACT

Water-mediated ligand interactions are essential to biological processes, from product displacement in thymidylate synthase to DNA recognition by Trp repressor, yet the structural chemistry influencing whether bound water is displaced or participates in ligand binding is not well characterized. Consolv, employing a hybrid k-nearest-neighbors classifier/genetic algorithm, predicts bound water molecules conserved between free and ligand-bound protein structures by examining the environment of each water molecule in the free structure. Four environmental features are used: the water molecule's crystallographic temperature factor, the number of hydrogen bonds between the water molecule and protein, and the density and hydrophilicity of neighboring protein atoms. After training on 13 non-homologous proteins, Consolv predicted the conservation of active-site water molecules upon ligand binding with 75% accuracy (Matthews coefficient Cm = 0.41) for seven new proteins. Mispredictions typically involved water molecules predicted to be conserved that were displaced by a polar ligand atom, indicating that Consolv correctly assesses polar binding sites; 90% accuracy (Cm = 0.78) was achieved for predicting conserved active-site water or polar ligand atom binding. Consolv thus provides an accurate means for optimizing ligand design by identifying sites favored to be occupied by either a mediating water molecule or a polar ligand atom, as well as water molecules likely to be displaced by the ligand. Accuracy for predicting first-shell water conservation between independently determined structures was 61% (Cm = 0.23). The ability to predict water-mediated and polar interactions from the free protein structure indicates the surprising extent to which the conservation or displacement of active-site bound water is independent of the ligand, and shows that the protein micro-environment of each water molecule is the dominant influence.

Subject(s)

Algorithms; Ligands; Models, Molecular; Proteins/chemistry; Water/chemistry; Animals; Binding Sites; Humans; Solvents/chemistry; Statistics as Topic
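The abstract above reports a Matthews coefficient (Cm) alongside each accuracy figure. As a reminder of how the two differ, the sketch below computes both from a binary confusion matrix using the standard Matthews correlation formula. The counts in the usage example are hypothetical and are not taken from the paper.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when a row or column of the
    matrix is empty, a common convention for the undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts: conserved waters are the positive class.
acc = accuracy(tp=6, tn=9, fp=3, fn=2)
cm = matthews_coefficient(tp=6, tn=9, fp=3, fn=2)
```

Unlike accuracy, the coefficient accounts for all four cells of the confusion matrix, so it stays near zero for a classifier that simply predicts the majority class. That is why a moderate Cm can accompany a fairly high accuracy.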
