Results 1 - 20 of 25
1.
Environ Sci Technol ; 57(46): 17818-17830, 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-37315216

ABSTRACT

The toxicological information needed for risk assessments of chemical compounds is often sparse. Unfortunately, gathering new toxicological information experimentally often involves animal testing. Simulated alternatives, e.g., quantitative structure-activity relationship (QSAR) models, are preferred to infer the toxicity of new compounds. Aquatic toxicity data collections consist of many related tasks, each predicting the toxicity of new compounds on a given species. Since many of these tasks are inherently low-resource, i.e., involve few associated compounds, modeling them is challenging. Meta-learning is a subfield of artificial intelligence that can lead to more accurate models by enabling the utilization of information across tasks. In our work, we benchmark various state-of-the-art meta-learning techniques for building QSAR models, focusing on knowledge sharing between species. Specifically, we employ and compare transformational machine learning, model-agnostic meta-learning, fine-tuning, and multi-task models. Our experiments show that established knowledge-sharing techniques outperform single-task approaches. We recommend the use of multi-task random forest models for aquatic toxicity modeling, which matched or exceeded the performance of other approaches and robustly produced good results in the low-resource settings we studied. This model functions on a species level, predicting toxicity for multiple species across various phyla, with flexible exposure duration and on a large chemical applicability domain.
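The recommended multi-task random forest can be approximated by pooling all species into a single model and encoding the species identity as extra input features, so the forest shares chemical structure across tasks. A minimal sketch with scikit-learn on synthetic data; the descriptor count, species effect, and all values are invented, and the authors' actual multi-task formulation may differ.

```python
# Multi-task sketch: one random forest shared across species, with the
# species one-hot encoded as extra inputs so information flows between tasks.
# All data and effect sizes below are synthetic/invented.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_samples, n_species = 300, 3
X_chem = rng.normal(size=(n_samples, 8))         # chemical descriptors
species = rng.integers(0, n_species, n_samples)  # task id per compound
# Toxicity = shared chemical signal + a species-specific shift.
y = X_chem[:, 0] - 0.5 * X_chem[:, 1] + 0.3 * species

X = np.hstack([X_chem, np.eye(n_species)[species]])  # shared design matrix
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
r2 = model.score(X, y)  # training-set fit
```

Because the species columns are ordinary features, the forest can reuse splits on chemistry across all species while still fitting per-species offsets, which is the knowledge-sharing effect the abstract describes.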


Subject(s)
Artificial Intelligence , Quantitative Structure-Activity Relationship , Animals , Fishes
2.
J Chem Inf Model ; 60(9): 4283-4295, 2020 09 28.
Article in English | MEDLINE | ID: mdl-32343143

ABSTRACT

Kinases are frequently studied in the context of anticancer drugs. Their involvement in cell responses, such as proliferation, differentiation, and apoptosis, makes them interesting subjects in multitarget drug design. In this study, a workflow is presented that models the bioactivity spectra for two panels of kinases: (1) inhibition of RET, BRAF, SRC, and S6K, while avoiding inhibition of MKNK1, TTK, ERK8, PDK1, and PAK3, and (2) inhibition of AURKA, PAK1, FGFR1, and LKB1, while avoiding inhibition of PAK3, TAK1, and PIK3CA. Both statistical and structure-based models were included, which were thoroughly benchmarked and optimized. A virtual screening was performed to test the workflow for one of the main targets, RET kinase. This resulted in 5 novel and chemically dissimilar RET inhibitors with remaining RET activity of <60% (at a concentration of 10 µM) and similarities to known RET inhibitors ranging from 0.18 to 0.29 (Tanimoto, ECFP6). The four most potent inhibitors were assessed over a concentration range and proved to be modestly active, with a pIC50 value of 5.1 for the most active compound. The experimental validation of inhibitors for RET strongly indicates that the multitarget workflow is able to detect novel inhibitors for kinases, and hence, this workflow can potentially be applied in polypharmacology modeling. We conclude that this approach can identify new chemical matter for existing targets. Moreover, this workflow can easily be applied to other targets as well.
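The Tanimoto similarities quoted above (0.18-0.29 on ECFP6 fingerprints) are bit-set overlap ratios. A minimal sketch on plain Python sets; in practice a cheminformatics toolkit such as RDKit would generate the actual ECFP6 bits, and the fingerprints below are made-up stand-ins.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

# Hypothetical on-bit indices standing in for real ECFP6 fingerprints.
known_inhibitor = {3, 17, 42, 101, 250, 511}
candidate = {3, 99, 101, 384, 640, 777, 901}
sim = tanimoto(known_inhibitor, candidate)  # 2 shared bits / 11 total bits
```

A value this low (about 0.18) is what the abstract means by "chemically dissimilar": the candidate shares few substructure bits with the known inhibitor.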


Subject(s)
Antineoplastic Agents , Proto-Oncogene Proteins c-ret , Antineoplastic Agents/pharmacology , Drug Design , Polypharmacology , Protein Kinase Inhibitors/pharmacology
3.
Evol Comput ; 27(1): 3-45, 2019.
Article in English | MEDLINE | ID: mdl-30475672

ABSTRACT

It has long been observed that for practically any computational problem that has been intensely studied, different instances are best solved using different algorithms. This is particularly pronounced for computationally hard problems, where in most cases, no single algorithm defines the state of the art; instead, there is a set of algorithms with complementary strengths. This performance complementarity can be exploited in various ways, one of which is based on the idea of selecting, from a set of given algorithms, for each problem instance to be solved the one expected to perform best. The task of automatically selecting an algorithm from a given set is known as the per-instance algorithm selection problem and has been intensely studied over the past 15 years, leading to major improvements in the state of the art in solving a growing number of discrete combinatorial problems, including propositional satisfiability and AI planning. Per-instance algorithm selection also shows much promise for boosting performance in solving continuous and mixed discrete/continuous optimisation problems. This survey provides an overview of research in automated algorithm selection, ranging from early and seminal works to recent and promising application areas. Unlike earlier work, it covers applications to both discrete and continuous problems, and discusses algorithm selection in the context of conceptually related approaches, such as algorithm configuration, scheduling, or portfolio selection. Since informative and cheaply computable problem instance features provide the basis for effective per-instance algorithm selection systems, we also provide an overview of such features for discrete and continuous problems. Finally, we provide perspectives on future work in the area and discuss a number of open research challenges.
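The core mapping a per-instance selector learns, from cheap instance features to the algorithm expected to perform best, can be sketched as a 1-nearest-neighbour rule. The features, solver names, and runtimes below are invented for illustration; real selectors use richer models and cost-sensitive training.

```python
import math

# (instance features, observed runtime per algorithm in seconds) -- invented.
train = [
    ((0.1, 0.9), {"solver_a": 2.0, "solver_b": 9.0}),
    ((0.8, 0.2), {"solver_a": 8.0, "solver_b": 1.5}),
    ((0.2, 0.7), {"solver_a": 3.0, "solver_b": 7.0}),
]

def select(features):
    """1-nearest-neighbour selector over cheap instance features."""
    _, runtimes = min(train, key=lambda t: math.dist(t[0], features))
    return min(runtimes, key=runtimes.get)  # fastest algorithm on that neighbour

choice = select((0.75, 0.25))
```

The feature computation must be cheap relative to solving, since its cost is paid on every instance before the chosen algorithm even starts.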


Subject(s)
Algorithms , Computer Simulation , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Decision Support Techniques , Humans , Surveys and Questionnaires
4.
Evol Comput ; 27(1): 147-171, 2019.
Article in English | MEDLINE | ID: mdl-30407875

ABSTRACT

Automatic algorithm configuration (AAC) is becoming a key ingredient in the design of high-performance solvers for challenging optimisation problems. However, most existing work on AAC deals with configuration procedures that optimise a single performance metric of a given, single-objective algorithm. Of course, these configurators can also be used to optimise the performance of multi-objective algorithms, as measured by a single performance indicator. In this work, we demonstrate that better results can be obtained by using a native, multi-objective algorithm configuration procedure. Specifically, we compare three AAC approaches: one considering only the hypervolume indicator, a second optimising the weighted sum of hypervolume and spread, and a third that simultaneously optimises these complementary indicators, using a genuinely multi-objective approach. We assess these approaches by applying them to a highly parametric local search framework for two widely studied multi-objective optimisation problems, the bi-objective permutation flowshop and travelling salesman problems. Our results show that multi-objective algorithms are indeed best configured using a multi-objective configurator.
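The hypervolume indicator used as a configuration objective measures, in the bi-objective case, the area dominated by a Pareto front up to a reference point. A minimal sketch for minimisation; the front and reference point are illustrative.

```python
def hypervolume_2d(front, ref):
    """Area dominated by a 2-D minimisation front, bounded by the ref point."""
    area, prev_f2 = 0.0, ref[1]
    for f1, f2 in sorted(front):      # sweep in ascending first objective
        if f2 < prev_f2:              # only non-dominated slices add area
            area += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return area

front = [(1.0, 4.0), (2.0, 2.0), (3.0, 1.0)]
hv = hypervolume_2d(front, ref=(5.0, 5.0))  # 4 + 6 + 2 = 12 area units
```

A scalar indicator like this is exactly what a single-objective configurator would optimise; the paper's point is that jointly tracking hypervolume and spread as separate objectives works better.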


Subject(s)
Algorithms , Computer Simulation , Information Storage and Retrieval/methods , Models, Theoretical , Pattern Recognition, Automated/methods , Problem Solving , Humans
5.
Evol Comput ; 26(4): 597-620, 2018.
Article in English | MEDLINE | ID: mdl-28836836

ABSTRACT

The Travelling Salesperson Problem (TSP) is one of the best-studied NP-hard problems. Over the years, many different solution approaches and solvers have been developed. For the first time, we directly compare five state-of-the-art inexact solvers (namely, LKH, EAX, restart variants of those, and MAOS) on a large set of well-known benchmark instances and demonstrate complementary performance, in that different instances may be solved most effectively by different algorithms. We leverage this complementarity to build an algorithm selector, which selects the best TSP solver on a per-instance basis and thus achieves significantly improved performance compared to the single best solver, representing an advance in the state of the art in solving the Euclidean TSP. Our in-depth analysis of the selectors provides insight into what drives this performance improvement.
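The gap a selector can close is conventionally measured between the single best solver (one solver for all instances) and the virtual best solver (an oracle picking the fastest solver per instance). A sketch computing both from a runtime matrix; the solver names echo the abstract, but the runtimes are invented.

```python
# Single-best vs. virtual-best solver from a runtime matrix (seconds).
# Runtimes below are invented for illustration.
runtimes = {
    "LKH":  [10.0, 1.0, 30.0],
    "EAX":  [2.0, 20.0, 5.0],
    "MAOS": [8.0, 6.0, 4.0],
}
n_instances = len(next(iter(runtimes.values())))

sbs = min(runtimes, key=lambda s: sum(runtimes[s]))  # best single solver
sbs_total = sum(runtimes[sbs])
vbs_total = sum(min(runtimes[s][i] for s in runtimes)  # oracle per instance
                for i in range(n_instances))
```

Here the oracle needs only 7 seconds against the single best solver's 18, which is the kind of complementarity gap the paper's selector exploits.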

6.
IEEE Trans Vis Comput Graph ; 20(11): 1507-18, 2014 Nov.
Article in English | MEDLINE | ID: mdl-26355330

ABSTRACT

Our homes and workspaces are filled with collections of dozens of artifacts laid out on surfaces such as shelves, counters, and mantles. The content and layout of these arrangements reflect both context, e.g., kitchen or living room, and style, e.g., neat or messy. Manually assembling such arrangements in virtual scenes is highly time-consuming, especially when one needs to generate multiple diverse arrangements for numerous support surfaces and living spaces. We present a data-driven method especially designed for artifact arrangement which automatically populates empty surfaces with diverse, believable arrangements of artifacts in a given style. The input to our method is an annotated photograph or a 3D model of an exemplar arrangement that reflects the desired context and style. Our method leverages this exemplar to generate diverse arrangements reflecting the exemplar style for arbitrary furniture setups and layout dimensions. To simultaneously achieve scalability, diversity, and style preservation, we define a valid solution space of arrangements that reflect the input style. We obtain solutions within this space using barrier functions and stochastic optimization.
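The "valid solution space + barrier functions + stochastic optimization" recipe can be pictured in one dimension: a style cost with log-barrier terms that keep a placement strictly on the support surface, minimised by random sampling. The shelf bounds, target position, and the sampling optimiser are all simplified stand-ins for the paper's actual formulation.

```python
import math
import random

def placement_cost(x, lo=0.0, hi=10.0, target=7.0, mu=0.1):
    """Style cost (distance from an exemplar-derived target position) plus
    log-barriers keeping the object strictly inside the shelf [lo, hi]."""
    if not lo < x < hi:
        return float("inf")  # outside the valid solution space
    return (x - target) ** 2 - mu * (math.log(x - lo) + math.log(hi - x))

# Stochastic optimisation by plain random sampling (illustrative only).
rng = random.Random(3)
best = min((rng.uniform(0.0, 10.0) for _ in range(5000)), key=placement_cost)
```

The barriers blow up near the shelf edges, so sampled solutions are pushed toward the interior of the valid region while the quadratic term pulls them toward the style target.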

7.
BMC Bioinformatics ; 14: 139, 2013 Apr 24.
Article in English | MEDLINE | ID: mdl-23617269

ABSTRACT

BACKGROUND: Accurate structure prediction methods play an important role for the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that has been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches. RESULTS: In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures. CONCLUSIONS: Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach.
In addition, AveRNA allows an intuitive and effective control of the trade-off between false negative and false positive base pair predictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and can therefore be used to leverage improvements in prediction accuracy offered by algorithms and energy models developed in the future. Our data, MATLAB software and a web-based version of AveRNA are publicly available at http://www.cs.ubc.ca/labs/beta/Software/AveRNA.
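AveRNA's control over the false-negative/false-positive trade-off can be pictured as per-base-pair voting across component predictors with an adjustable threshold. The predicted pair sets below are invented, and AveRNA's actual combination scheme (weighted averaging of predictions) may differ from this plain vote.

```python
from collections import Counter

def ensemble_pairs(predictions, threshold):
    """Keep base pairs predicted by at least `threshold` fraction of methods."""
    votes = Counter(p for pred in predictions for p in pred)
    need = threshold * len(predictions)
    return {pair for pair, n in votes.items() if n >= need}

# Hypothetical base-pair sets (i, j) from three component predictors.
preds = [{(1, 20), (2, 19), (5, 15)},
         {(1, 20), (2, 19), (6, 14)},
         {(1, 20), (3, 18), (5, 15)}]

strict = ensemble_pairs(preds, threshold=1.0)   # fewer false positives
lenient = ensemble_pairs(preds, threshold=0.5)  # fewer false negatives
```

Raising the threshold keeps only consensus pairs (high precision), while lowering it admits pairs any method found (high sensitivity), which is the trade-off dial the abstract describes.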


Subject(s)
Algorithms , RNA/chemistry , Nucleic Acid Conformation , Software
8.
Cytometry A ; 81(12): 1022-30, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23044634

ABSTRACT

Analysis of high-dimensional flow cytometry datasets can reveal novel cell populations with poorly understood biology. Following discovery, characterization of these populations in terms of the critical markers involved is an important step, as this can help to both better understand the biology of these populations and aid in designing simpler marker panels to identify them on simpler instruments and with fewer reagents (i.e., in resource-poor or highly regulated clinical settings). However, current panel-design tools work exclusively from technical parameters (e.g., instrument configurations, spectral overlap, and reagent availability), not from the biological characteristics of the target cell populations. To address this shortcoming, we developed RchyOptimyx (cellular hieraRCHY OPTIMization), a computational tool that constructs cellular hierarchies by combining automated gating with dynamic programming and graph theory to provide the best gating strategies to identify a target population to a desired level of purity or correlation with a clinical outcome, using the simplest possible marker panels. RchyOptimyx can assess and graphically present the trade-offs between marker choice and population specificity in high-dimensional flow or mass cytometry datasets. We present three proof-of-concept use cases for RchyOptimyx that involve 1) designing a panel of surface markers for identifying rare populations that are primarily characterized by their intracellular signature; 2) simplifying the gating strategy for identifying a target cell population; and 3) identifying a non-redundant marker set for a target cell population.
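The marker-count-versus-purity trade-off RchyOptimyx exposes can be reduced to its simplest form: among marker panels whose gated purity meets a target, prefer the smallest. The purity values and marker names below are invented, and the real tool uses dynamic programming over a full gating hierarchy rather than this brute-force scan.

```python
# Hypothetical purity of a target population after gating on each marker set.
purity = {
    frozenset(): 0.05,
    frozenset({"CD4"}): 0.20,
    frozenset({"KI67"}): 0.30,
    frozenset({"CD127"}): 0.10,
    frozenset({"CD4", "KI67"}): 0.85,
    frozenset({"CD4", "CD127"}): 0.40,
    frozenset({"CD4", "KI67", "CD127"}): 0.90,
}

def simplest_panel(purity, threshold):
    """Smallest marker set reaching the desired purity, or None."""
    feasible = [s for s, p in purity.items() if p >= threshold]
    return min(feasible, key=len) if feasible else None

panel = simplest_panel(purity, threshold=0.8)
```

Here two markers already reach 0.85 purity, so the third marker is redundant at this threshold even though it nudges purity to 0.90.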


Subject(s)
Bone Marrow Cells/cytology , Flow Cytometry/methods , Software , Algorithms , Antigens, CD/analysis , Antigens, CD/immunology , Biomarkers/analysis , Bone Marrow Cells/immunology , Computational Biology/methods , HIV Infections/immunology , Humans , Immunophenotyping/methods , Interleukin-7/immunology , Lipopolysaccharides/immunology , Phenotype , Staining and Labeling , T-Lymphocytes/cytology , T-Lymphocytes/immunology
9.
Bioinformatics ; 28(7): 1009-16, 2012 Apr 01.
Article in English | MEDLINE | ID: mdl-22383736

ABSTRACT

MOTIVATION: Polychromatic flow cytometry (PFC) has enormous power as a tool to dissect complex immune responses (such as those observed in HIV disease) at a single-cell level. However, analysis tools are severely lacking. Although high-throughput systems allow rapid data collection from large cohorts, manual data analysis can take months. Moreover, identification of cell populations can be subjective, and analysts rarely examine the entirety of the multidimensional dataset (focusing instead on a limited number of subsets, the biology of which has usually already been well-described). Thus, the value of PFC as a discovery tool is largely wasted. RESULTS: To address this problem, we developed a computational approach that automatically reveals all possible cell subsets. From tens of thousands of subsets, those that correlate strongly with clinical outcome are selected and grouped. Within each group, markers that have minimal relevance to the biological outcome are removed, thereby distilling the complex dataset into the simplest, most clinically relevant subsets. This allows complex information from PFC studies to be translated into clinical or resource-poor settings, where multiparametric analysis is less feasible. We demonstrate the utility of this approach in a large (n=466), retrospective, 14-parameter PFC study of early HIV infection, where we identify three T-cell subsets that strongly predict progression to AIDS (only one of which was identified by an initial manual analysis). AVAILABILITY: The 'flowType: Phenotyping Multivariate PFC Assays' package is available through Bioconductor. Additional documentation and examples are available at: www.terryfoxlab.ca/flowsite/flowType/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: rbrinkman@bccrc.ca.
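Enumerating "all possible cell subsets" over a marker panel amounts to assigning each marker one of three states: positive, negative, or ignored. A minimal sketch with hypothetical marker names; for the 14-marker study above this yields 3^14 (nearly 4.8 million) candidate phenotypes, which is why automated selection is needed.

```python
from itertools import product

def all_phenotypes(markers):
    """Every cell subset: each marker positive (+), negative (-), or ignored."""
    phenotypes = []
    for states in product(["+", "-", ""], repeat=len(markers)):
        # join only the markers that are not ignored in this phenotype
        phenotypes.append("".join(m + s for m, s in zip(markers, states) if s))
    return phenotypes

# Hypothetical 3-marker panel: 3**3 = 27 subsets, including the empty
# phenotype "" in which every marker is ignored (i.e., all cells).
subsets = all_phenotypes(["CD4", "CD8", "KI67"])
```

Each phenotype string such as "CD4+CD8-" then defines a gated subset whose frequency can be tested for correlation with clinical outcome.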


Subject(s)
Computational Biology/methods , Flow Cytometry , HIV Infections/immunology , T-Lymphocyte Subsets/immunology , Biomarkers/analysis , Humans , Immunophenotyping/methods , Predictive Value of Tests , Proportional Hazards Models , Retrospective Studies , T-Lymphocyte Subsets/cytology
10.
BMC Bioinformatics ; 13: 22, 2012 Feb 01.
Article in English | MEDLINE | ID: mdl-22296803

ABSTRACT

BACKGROUND: RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives leverage partition function calculations to find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA). Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. RESULTS: We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms.
Second, on our large datasets, the algorithm with best overall accuracy is a pseudo-MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). CONCLUSIONS: Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large datasets.
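The bootstrap percentile method used above resamples the dataset with replacement and reads the confidence interval directly off the empirical percentiles of the resampled statistic. A minimal sketch for the mean F-measure; the scores below are invented, not from the paper.

```python
import random

def bootstrap_percentile_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(scores, k=len(scores))) / len(scores)
        for _ in range(n_boot)  # resample with replacement, recompute mean
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

f_measures = [0.61, 0.70, 0.66, 0.72, 0.59, 0.75, 0.68, 0.64]  # invented
lo, hi = bootstrap_percentile_ci(f_measures)
```

With only eight scores the interval is wide; the paper's point is that on 2000+ RNAs it tightens to about a 2% range, while small classes like the 89 Group I introns stay too wide for method comparisons.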


Subject(s)
Algorithms , RNA/chemistry , Nucleic Acid Conformation , Predictive Value of Tests , Ribonuclease P/chemistry , Thermodynamics
11.
Cytometry A ; 79(1): 6-13, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21182178

ABSTRACT

We have developed flowMeans, a time-efficient and accurate method for automated identification of cell populations in flow cytometry (FCM) data based on K-means clustering. Unlike traditional K-means, flowMeans can identify concave cell populations by modelling a single population with multiple clusters. flowMeans uses a change point detection algorithm to determine the number of sub-populations, enabling the method to be used in high throughput FCM data analysis pipelines. Our approach compares favorably to manual analysis by human experts and current state-of-the-art automated gating algorithms. flowMeans is freely available as an open source R package through Bioconductor.
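flowMeans's change-point step, locating where the cluster-merging cost curve bends to pick the number of sub-populations, can be approximated with a maximum-distance-to-chord elbow heuristic. The merge costs below are invented, and the published method uses a different (segmented-regression style) change-point detector.

```python
def elbow_index(values):
    """Index of the point farthest from the line joining the curve's
    endpoints: a simple change-point heuristic for a merge-cost curve."""
    n = len(values)
    x1, y1, x2, y2 = 0, values[0], n - 1, values[-1]

    def chord_dist(i):
        # perpendicular distance from (i, values[i]) to the endpoint chord
        num = abs((y2 - y1) * i - (x2 - x1) * values[i] + x2 * y1 - y2 * x1)
        return num / ((y2 - y1) ** 2 + (x2 - x1) ** 2) ** 0.5

    return max(range(n), key=chord_dist)

# Illustrative within-cluster merge costs as the cluster count shrinks:
merge_costs = [9.0, 4.0, 2.0, 1.3, 1.0, 0.85, 0.75, 0.7]
k_change = elbow_index(merge_costs)  # index where the curve bends
```

Merging K-means clusters until this change point is what lets a single concave population be represented by several merged spherical clusters.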


Subject(s)
Flow Cytometry/methods , Flow Cytometry/statistics & numerical data , Algorithms , Automation , Cluster Analysis , Graft vs Host Disease/blood , Humans , Lymphoma, Large B-Cell, Diffuse/pathology , Models, Statistical
12.
RNA ; 16(12): 2304-18, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20940338

ABSTRACT

Methods for efficient and accurate prediction of RNA structure are increasingly valuable, given the current rapid advances in understanding the diverse functions of RNA molecules in the cell. To enhance the accuracy of secondary structure predictions, we developed and refined optimization techniques for the estimation of energy parameters. We build on two previous approaches to RNA free-energy parameter estimation: (1) the Constraint Generation (CG) method, which iteratively generates constraints that enforce known structures to have energies lower than other structures for the same molecule; and (2) the Boltzmann Likelihood (BL) method, which infers a set of RNA free-energy parameters that maximize the conditional likelihood of a set of reference RNA structures. Here, we extend these approaches in two main ways: We propose (1) a max-margin extension of CG, and (2) a novel linear Gaussian Bayesian network that models feature relationships, which effectively makes use of sparse data by sharing statistical strength between parameters. We obtain significant improvements in the accuracy of RNA minimum free-energy pseudoknot-free secondary structure prediction when measured on a comprehensive set of 2518 RNA molecules with reference structures. Our parameters can be used in conjunction with software that predicts RNA secondary structures, RNA hybridization, or ensembles of structures. Our data, software, results, and parameter sets in various formats are freely available at http://www.cs.ubc.ca/labs/beta/Projects/RNA-Params.


Subject(s)
Computational Biology/methods , Energy Metabolism/physiology , RNA/chemistry , RNA/metabolism , Statistics as Topic/methods , Algorithms , Animals , Base Composition , Base Sequence , Computational Biology/statistics & numerical data , Humans , Models, Theoretical , Molecular Sequence Data , Nucleic Acid Conformation , Reproducibility of Results , Sensitivity and Specificity , Sequence Analysis, RNA
13.
BMC Bioinformatics ; 9: 340, 2008 Aug 13.
Article in English | MEDLINE | ID: mdl-18700982

ABSTRACT

BACKGROUND: The ability to access, search and analyse secondary structures of a large set of known RNA molecules is very important for deriving improved RNA energy models, for evaluating computational predictions of RNA secondary structures and for a better understanding of RNA folding. Currently there is no database that can easily provide these capabilities for almost all RNA molecules with known secondary structures. RESULTS: In this paper we describe RNA STRAND - the RNA secondary STRucture and statistical ANalysis Database, a curated database containing known secondary structures of any type and organism. Our new database provides a wide collection of known RNA secondary structures drawn from public databases, searchable and downloadable in a common format. Comprehensive statistical information on the secondary structures in our database is provided using the RNA Secondary Structure Analyser, a new tool we have developed to analyse RNA secondary structures. The information thus obtained is valuable for understanding to what extent, and with what probability, certain structural motifs can appear. We outline several ways in which the data provided in RNA STRAND can facilitate research on RNA structure, including the improvement of RNA energy models and evaluation of secondary structure prediction programs. In order to keep up-to-date with new RNA secondary structure experiments, we offer the necessary tools to add solved RNA secondary structures to our database and invite researchers to contribute to RNA STRAND. CONCLUSION: RNA STRAND is a carefully assembled database of trusted RNA secondary structures, with easy on-line tools for searching, analyzing and downloading user selected entries, and is publicly available at http://www.rnasoft.ca/strand.


Subject(s)
Database Management Systems , Databases, Genetic , Models, Chemical , Models, Molecular , RNA/chemistry , RNA/ultrastructure , User-Computer Interface , Computer Graphics , Computer Simulation , Information Storage and Retrieval/methods , Nucleic Acid Conformation
14.
BMC Genomics ; 9: 355, 2008 Jul 29.
Article in English | MEDLINE | ID: mdl-18664289

ABSTRACT

BACKGROUND: Secondary structure interactions within introns have been shown to be essential for efficient splicing of several yeast genes. The nature of these base-pairing interactions and their effect on splicing efficiency were most extensively studied in the ribosomal protein gene RPS17B (previously known as RP51B). It was determined that complementary pairing between two sequence segments located downstream of the 5' splice site and upstream of the branchpoint sequence promotes efficient splicing of the RPS17B pre-mRNA, presumably by shortening the branchpoint distance. However, no attempts were made to compute a shortened, 'structural' branchpoint distance, and thus the functional relationship between this distance and splicing efficiency remains unknown. RESULTS: In this paper we use computational RNA secondary structure prediction to analyze the secondary structure of the RPS17B intron. We show that it is necessary to consider suboptimal structure predictions and to compute the structural branchpoint distances in order to explain previously published splicing efficiency results. Our study reveals that there is a tight correlation between this distance and splicing efficiency levels of intron mutants described in the literature. We experimentally test this correlation on additional RPS17B mutants and intron mutants within two other yeast genes. CONCLUSION: The proposed model of secondary structure requirements for efficient splicing is the first attempt to specify the functional relationship between pre-mRNA secondary structure and splicing. Our findings provide further insights into the role of pre-mRNA secondary structure in gene splicing in yeast and also offer a basis for improving computational methods for splice site identification and gene finding.
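A "tight correlation" between structural branchpoint distance and splicing efficiency is the kind of claim a Pearson coefficient quantifies. A self-contained sketch; the distances and efficiencies below are invented mutant data, NOT values from the paper.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented mutants: structural branchpoint distance (nt) vs. splicing efficiency.
distance = [45, 60, 75, 95, 120, 150]
efficiency = [0.95, 0.90, 0.80, 0.55, 0.30, 0.10]
r = pearson_r(distance, efficiency)  # strongly negative
```

A strongly negative r on such data would correspond to the paper's observation that longer structural branchpoint distances depress splicing efficiency.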


Subject(s)
Introns , RNA Precursors/genetics , RNA Splicing , RNA, Messenger/genetics , Saccharomyces cerevisiae/genetics , Algorithms , Base Pairing , Computational Biology , Genes, Fungal , Genome, Fungal , Mutation , Nucleic Acid Conformation , RNA, Fungal/genetics , Ribosomal Proteins/genetics , Saccharomyces cerevisiae Proteins/genetics
15.
BMC Bioinformatics ; 8: 342, 2007 Sep 17.
Article in English | MEDLINE | ID: mdl-17875212

ABSTRACT

BACKGROUND: The ab initio protein folding problem consists of predicting protein tertiary structure from a given amino acid sequence by minimizing an energy function; it is one of the most important and challenging problems in biochemistry, molecular biology and biophysics. The ab initio protein folding problem is computationally challenging and has been shown to be NP-hard even when conformations are restricted to a lattice. In this work, we implement and evaluate the replica exchange Monte Carlo (REMC) method, which has already been applied very successfully to more complex protein models and other optimization problems with complex energy landscapes, in combination with the highly effective pull move neighbourhood in two widely studied Hydrophobic Polar (HP) lattice models. RESULTS: We demonstrate that REMC is highly effective for solving instances of the square (2D) and cubic (3D) HP protein folding problem. When using the pull move neighbourhood, REMC outperforms current state-of-the-art algorithms for most benchmark instances. Additionally, we show that this new algorithm provides a larger ensemble of ground-state structures than the existing state-of-the-art methods. Furthermore, it scales well with sequence length, and it finds significantly better conformations on long biological sequences and sequences with a provably unique ground-state structure, which is believed to be a characteristic of real proteins. We also present evidence that our REMC algorithm can fold sequences which exhibit significant interaction between termini in the hydrophobic core relatively easily. CONCLUSION: We demonstrate that REMC utilizing the pull move neighbourhood significantly outperforms current state-of-the-art methods for protein structure prediction in the HP model on 2D and 3D lattices.
This is particularly noteworthy, since so far, the state-of-the-art methods for 2D and 3D HP protein folding, in particular the pruned-enriched Rosenbluth method (PERM) and, to some extent, Ant Colony Optimisation (ACO), were based on chain growth mechanisms. To the best of our knowledge, this is the first application of REMC to HP protein folding on the cubic lattice, and the first extension of the pull move neighbourhood to a 3D lattice.
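The heart of REMC is the Metropolis criterion for exchanging configurations between replicas at adjacent temperatures: the swap is accepted with probability min(1, exp((β_cold − β_hot)(E_cold − E_hot))). A minimal sketch; the HP-lattice energy function and pull moves themselves are not reproduced here, and the energies and inverse temperatures below are illustrative.

```python
import math
import random

def swap_accepted(e_cold, e_hot, beta_cold, beta_hot, rng):
    """Replica-exchange Metropolis criterion: accept the swap with
    probability min(1, exp((beta_cold - beta_hot) * (e_cold - e_hot)))."""
    delta = (beta_cold - beta_hot) * (e_cold - e_hot)
    return delta >= 0 or rng.random() < math.exp(delta)

# The hot replica found a lower-energy conformation than the cold one,
# so exchanging them is always accepted (values are illustrative).
accepted = swap_accepted(e_cold=-2.0, e_hot=-5.0,
                         beta_cold=2.0, beta_hot=0.5,
                         rng=random.Random(0))
```

These swaps let a low-temperature replica escape local minima by inheriting conformations that the high-temperature replica explored, which is why REMC copes well with the rugged HP energy landscape.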


Subject(s)
Models, Chemical , Models, Molecular , Monte Carlo Method , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computer Simulation , Models, Statistical , Molecular Sequence Data , Protein Conformation , Protein Folding
16.
Bioinformatics ; 23(13): i19-28, 2007 Jul 01.
Article in English | MEDLINE | ID: mdl-17646296

ABSTRACT

MOTIVATION: Accurate prediction of RNA secondary structure from the base sequence is an unsolved computational challenge. The accuracy of predictions made by free energy minimization is limited by the quality of the energy parameters in the underlying free energy model. The most widely used model, the Turner99 model, has hundreds of parameters, and so a robust parameter estimation scheme should efficiently handle large data sets with thousands of structures. Moreover, the estimation scheme should also be trained using available experimental free energy data in addition to structural data. RESULTS: In this work, we present constraint generation (CG), the first computational approach to RNA free energy parameter estimation that can be efficiently trained on large sets of structural as well as thermodynamic data. Our CG approach employs a novel iterative scheme, whereby the energy values are first computed as the solution to a constrained optimization problem. Then the newly computed energy parameters are used to update the constraints on the optimization function, so as to better optimize the energy parameters in the next iteration. Using our method on biologically sound data, we obtain revised parameters for the Turner99 energy model. We show that by using our new parameters, we obtain significant improvements in prediction accuracy over current state-of-the-art methods. AVAILABILITY: Our CG implementation is available at http://www.rnasoft.ca/CG/.
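The spirit of constraint generation, for a linear energy model E(s) = w·φ(s), is: whenever a reference structure fails to have strictly lower energy than an alternative, a violated constraint is generated and the parameters are refit. The sketch below uses a perceptron-style nudge on invented feature vectors; the real CG solves a constrained optimization over structural and thermodynamic data rather than this update rule.

```python
def energy(w, phi):
    """Linear energy model: dot product of parameters and structure features."""
    return sum(wi * fi for wi, fi in zip(w, phi))

# (reference-structure features, [alternative-structure features, ...]);
# all feature vectors are invented stand-ins for real RNA features.
data = [
    ([1.0, 0.0, 0.0], [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]),
    ([2.0, 1.0, 0.0], [[0.0, 0.0, 3.0]]),
]

w = [0.0, 0.0, 0.0]
for _ in range(100):  # refit / constraint-generation loop
    violated = False
    for ref, alts in data:
        for alt in alts:
            if energy(w, ref) >= energy(w, alt):  # constraint violated
                violated = True
                # nudge parameters to lower the reference energy vs. alt
                w = [wi - 0.1 * (r - a) for wi, r, a in zip(w, ref, alt)]
    if not violated:
        break

separated = all(energy(w, ref) < energy(w, alt)
                for ref, alts in data for alt in alts)
```

Each pass only adds (and corrects) constraints that the current parameters actually violate, which is what keeps the approach tractable on thousands of structures.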


Subject(s)
Algorithms , Models, Chemical , Models, Molecular , RNA/chemistry , RNA/ultrastructure , Sequence Analysis, RNA/methods , Base Sequence , Computer Simulation , Molecular Sequence Data , Nucleic Acid Conformation
17.
BMC Bioinformatics ; 8: 136, 2007 Apr 24.
Article in English | MEDLINE | ID: mdl-17451609

ABSTRACT

BACKGROUND: The problem of protein structure prediction consists of predicting the functional or native structure of a protein given its linear sequence of amino acids. This problem has played a prominent role in the fields of biomolecular physics and algorithm design for over 50 years. Additionally, its importance increases continually as a result of an exponential growth over time in the number of known protein sequences in contrast to a linear increase in the number of determined structures. Our work focuses on the problem of searching an exponentially large space of possible conformations as efficiently as possible, with the goal of finding a global optimum with respect to a given energy function. This problem plays an important role in the analysis of systems with complex search landscapes, and particularly in the context of ab initio protein structure prediction. RESULTS: In this work, we introduce a novel approach for solving this conformation search problem based on the use of a bin framework for adaptively storing and retrieving promising locally optimal solutions. Our approach provides a rich and general framework within which a broad range of adaptive or reactive search strategies can be realized. Here, we introduce adaptive mechanisms for choosing which conformations should be stored, based on the set of conformations already stored in memory, and for biasing choices when retrieving conformations from memory in order to overcome search stagnation. CONCLUSION: We show that our bin framework combined with a widely used optimization method, Monte Carlo search, achieves significantly better performance than state-of-the-art generalized ensemble methods for a well-known protein-like homopolymer model on the face-centered cubic lattice.
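The bin framework described above stores promising locally optimal conformations and retrieves them to restart a stagnating search. A minimal sketch of such a memory, assuming energy-indexed bins of bounded capacity and a retrieval rule biased toward the lowest-energy bin (the class and its parameters are illustrative, not the paper's exact mechanisms):

```python
import random

class BinMemory:
    """Toy adaptive bin memory: stores promising local optima in energy
    bins and retrieves a restart conformation from a low-energy bin."""

    def __init__(self, bin_width=1.0, capacity=5):
        self.bins = {}              # bin index -> list of (energy, conformation)
        self.bin_width = bin_width
        self.capacity = capacity

    def store(self, energy, conf):
        idx = int(energy // self.bin_width)
        entries = self.bins.setdefault(idx, [])
        entries.append((energy, conf))
        entries.sort(key=lambda e: e[0])
        del entries[self.capacity:]  # keep only the best entries per bin

    def retrieve(self):
        # bias retrieval toward the lowest-energy bin
        idx = min(self.bins)
        return random.choice(self.bins[idx])[1]
```

A Monte Carlo search would call `store` whenever it reaches a new local optimum and `retrieve` when progress stalls, which is the store/retrieve adaptivity the abstract describes in general terms.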


Subject(s)
Databases, Protein , Models, Chemical , Models, Molecular , Proteins/chemistry , Proteins/ultrastructure , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computer Simulation , Information Storage and Retrieval/methods , Molecular Sequence Data , Protein Conformation
18.
BMC Bioinformatics ; 8: 34, 2007 Jan 31.
Article in English | MEDLINE | ID: mdl-17266771

ABSTRACT

BACKGROUND: We investigate the empirical complexity of the RNA secondary structure design problem, that is, the scaling of the typical difficulty of the design task for various classes of RNA structures as the size of the target structure is increased. The purpose of this work is to understand better the factors that make RNA structures hard to design for existing, high-performance algorithms. Such understanding provides the basis for improving the performance of one of the best algorithms for this problem, RNA-SSD, and for characterising its limitations. RESULTS: To gain insights into the practical complexity of the problem, we present a scaling analysis on random and biologically motivated structures using an improved version of the RNA-SSD algorithm, and also the RNAinverse algorithm from the Vienna package. Since primary structure constraints are relevant for designing RNA structures, we also investigate the correlation between the number and the location of the primary structure constraints when designing structures and the performance of the RNA-SSD algorithm. The scaling analysis on random and biologically motivated structures supports the hypothesis that the running time of both algorithms scales polynomially with the size of the structure. We also found that the algorithms are in general faster when constraints are placed only on paired bases in the structure. Furthermore, we prove that, according to the standard thermodynamic model, for some structures that the RNA-SSD algorithm was unable to design, there exists no sequence whose minimum free energy structure is the target structure. CONCLUSION: Our analysis helps to better understand the strengths and limitations of both the RNA-SSD and RNAinverse algorithms, and suggests ways in which the performance of these algorithms can be further improved.
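A polynomial-scaling claim of the kind made in the results above is typically backed by fitting a line in log-log space to empirical run times; the slope estimates the exponent. A short sketch of that fit (function name is illustrative):

```python
import numpy as np

def scaling_exponent(sizes, times):
    """Estimate the exponent b in t ~ a * n**b from empirical run times
    by least-squares fitting a line to (log n, log t)."""
    b, log_a = np.polyfit(np.log(sizes), np.log(times), 1)
    return b
```

If the fitted exponent stays bounded as larger instances are added, the data are consistent with polynomial rather than exponential scaling.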


Subject(s)
Algorithms , Models, Chemical , Models, Molecular , RNA/chemistry , RNA/ultrastructure , Sequence Analysis, RNA/methods , Base Sequence , Computer Simulation , Molecular Sequence Data , Nucleic Acid Conformation
19.
Nucleic Acids Res ; 33(15): 4965-77, 2005.
Article in English | MEDLINE | ID: mdl-16284197

ABSTRACT

An algorithm is presented for the generation of sets of non-interacting DNA sequences, employing existing thermodynamic models for the prediction of duplex stabilities and secondary structures. A DNA 'word' structure is employed in which individual DNA 'words' of a given length (e.g. 12mer and 16mer) may be concatenated into longer sequences (e.g. four tandem words and six tandem words). This approach, where multiple word variants are used at each tandem word position, allows very large sets of non-interacting DNA strands to be assembled from combinations of the individual words. Word sets were generated and their figures of merit are compared to sets as described previously in the literature (e.g. 4, 8, 12, 15 and 16mer). The predicted hybridization behavior was experimentally verified on selected members of the sets using standard UV hyperchromism measurements of duplex melting temperatures (T(m)s). Additional experimental validation was obtained by using the sequences in formulating and solving a small example of a DNA computing problem.
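Generating a set of mutually non-interacting DNA words requires rejecting candidates that could hybridize with already-accepted words. The paper uses thermodynamic duplex-stability and secondary-structure predictions; the sketch below substitutes a much cruder proxy (longest contiguous Watson-Crick complementary run between two strands), so the threshold and function names are illustrative assumptions, not the published criterion:

```python
def revcomp(s):
    """Reverse complement of a DNA strand."""
    comp = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(comp[b] for b in reversed(s))

def max_complementary_run(w1, w2):
    """Length of the longest contiguous stretch of w1 that is Watson-Crick
    complementary (antiparallel) to some stretch of w2."""
    rc = revcomp(w2)
    best = 0
    for i in range(len(w1)):
        for j in range(len(rc)):
            k = 0
            while i + k < len(w1) and j + k < len(rc) and w1[i + k] == rc[j + k]:
                k += 1
            best = max(best, k)
    return best

def admit(word, accepted, max_run=5):
    """Greedy filter: admit a candidate word only if neither it nor its
    reverse complement forms a long complementary run with any accepted word."""
    return all(max_complementary_run(word, w) <= max_run and
               max_complementary_run(revcomp(word), w) <= max_run
               for w in accepted)
```

Accepted words can then be concatenated into tandem multi-word strands, with the pairwise filter keeping cross-hybridization between any two strands weak.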


Subject(s)
Algorithms , DNA/chemistry , Sequence Analysis, DNA/methods , Thermodynamics , Base Sequence , Computational Biology/methods , Cytosine/chemistry , Guanine/chemistry , Nucleic Acid Conformation , Nucleic Acid Denaturation , Nucleic Acid Heteroduplexes/chemistry , Nucleic Acid Hybridization , Temperature
20.
RNA ; 11(10): 1494-504, 2005 Oct.
Article in English | MEDLINE | ID: mdl-16199760

ABSTRACT

We present HotKnots, a new heuristic algorithm for the prediction of RNA secondary structures including pseudoknots. Based on the simple idea of iteratively forming stable stems, our algorithm explores many alternative secondary structures, using a free energy minimization algorithm for pseudoknot free secondary structures to identify promising candidate stems. In an empirical evaluation of the algorithm with 43 sequences taken from the Pseudobase database and from the literature on pseudoknotted structures, we found that overall, in terms of the sensitivity and specificity of predictions, HotKnots outperforms the well-known Pseudoknots algorithm of Rivas and Eddy and the NUPACK algorithm of Dirks and Pierce, both based on dynamic programming approaches for limited classes of pseudoknotted structures. It also outperforms the heuristic Iterated Loop Matching algorithm of Ruan and colleagues, and in many cases gives better results than the genetic algorithm from the STAR package of van Batenburg and colleagues and the recent pknotsRG-mfe algorithm of Reeder and Giegerich. The HotKnots algorithm has been implemented in C/C++ and is available from http://www.cs.ubc.ca/labs/beta/Software/HotKnots.
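The "iteratively forming stable stems" idea behind HotKnots can be sketched in its simplest greedy form: repeatedly commit the most stable remaining candidate stem whose bases are still free, without forbidding crossing stems (which is what admits pseudoknots). This is a toy reduction, assuming stem energies have already been computed by some external routine, not the algorithm's actual branching exploration of alternative structures:

```python
def build_structure(candidate_stems):
    """Greedily assemble a structure from candidate stems.
    candidate_stems: list of (energy, set_of_base_indices); lower (more
    negative) energy means a more stable stem. Stems that cross are
    allowed, so the result may be pseudoknotted; only base reuse is
    forbidden."""
    used = set()
    chosen = []
    for energy, bases in sorted(candidate_stems, key=lambda s: s[0]):
        if not bases & used:        # all bases of this stem still free
            chosen.append((energy, bases))
            used |= bases
    return chosen
```

HotKnots itself explores many alternative structures rather than committing to a single greedy sequence, but each branch follows this add-the-next-promising-stem pattern.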


Subject(s)
Algorithms , Nucleic Acid Conformation , RNA/chemistry , Base Pairing , Base Sequence , Computational Biology , Molecular Sequence Data , Predictive Value of Tests , Sensitivity and Specificity , Sequence Analysis, RNA , Sequence Homology, Nucleic Acid , Software