Contemporary Symbolic Regression Methods and their Relative Performance.

La Cava, William; Burlacu, Bogdan; Virgolin, Marco; Kommenda, Michael; Orzechowski, Patryk; de França, Fabrício Olivetti; Jin, Ying; Moore, Jason H.

Adv Neural Inf Process Syst ; 2021(DB1): 1-16, 2021 Dec.

Article in English | MEDLINE | ID: mdl-38715933

ABSTRACT

Many promising approaches to symbolic regression have been presented in recent years, yet progress in the field continues to suffer from a lack of uniform, robust, and transparent benchmarking standards. We address this shortcoming by introducing an open-source, reproducible benchmarking platform for symbolic regression. We assess 14 symbolic regression methods and 7 machine learning methods on a set of 252 diverse regression problems. Our assessment includes both real-world datasets with no known model form as well as ground-truth benchmark problems. For the real-world datasets, we benchmark the ability of each method to learn models with low error and low complexity relative to state-of-the-art machine learning methods. For the synthetic problems, we assess each method's ability to find exact solutions in the presence of varying levels of noise. Under these controlled experiments, we conclude that the best performing methods for real-world regression combine genetic algorithms with parameter estimation and/or semantic search drivers. When tasked with recovering exact equations in the presence of noise, we find that several approaches perform similarly. We provide a detailed guide to reproducing this experiment and contributing new methods, and encourage other researchers to collaborate with us on a common and living symbolic regression benchmark.

Extended Regression Models for Predicting the Pumping Capability and Viscous Dissipation of Two-Dimensional Flows in Single-Screw Extrusion.

Roland, Wolfgang; Kommenda, Michael; Marschik, Christian; Miethlinger, Jürgen.

Polymers (Basel) ; 11(2), 2019 Feb 14.

Article in English | MEDLINE | ID: mdl-30960318

ABSTRACT

Generally, numerical methods are required to model the non-Newtonian flow of polymer melts in single-screw extruders. Existing approximation equations for modeling the throughput-pressure relationship and viscous dissipation are limited in their scope of application, particularly when it comes to special screw designs. Maximum dimensionless throughputs of $\Pi_V < 2.0$, implying minimum dimensionless pressure gradients $\Pi_{p,z} \geq -0.5$ for low power-law exponents are captured. We present analytical approximation models for predicting the pumping capability and viscous dissipation of metering channels for an extended range of influencing parameters ($\Pi_{p,z} \geq -1.0$ and $t/D_b \leq 2.4$) required to model wave- and energy-transfer screws. We first rewrote the governing equations in dimensionless form, identifying three independent influencing parameters: (i) the dimensionless down-channel pressure gradient $\Pi_{p,z}$, (ii) the power-law exponent $n$, and (iii) the screw-pitch ratio $t/D_b$. We then carried out a parametric design study covering an extended range of the dimensionless influencing parameters. Based on this data set, we developed regression models for predicting the dimensionless throughput-pressure relationship and the viscous dissipation. Finally, the accuracy of all three models was proven using an independent data set for evaluation. We demonstrate that our approach provides excellent approximation. Our models allow fast, stable, and accurate prediction of both throughput-pressure behavior and viscous dissipation.

PRIMOS: an integrated database of reassessed protein-protein interactions providing web-based access to in silico validation of experimentally derived data.

Rid, Raphaela; Strasser, Wolfgang; Siegl, Doris; Frech, Christian; Kommenda, Michael; Kern, Thomas; Hintner, Helmut; Bauer, Johann W; Önder, Kamil.

Assay Drug Dev Technol ; 11(5): 333-46, 2013 Jun.

Article in English | MEDLINE | ID: mdl-23772554

ABSTRACT

Steady improvements in proteomics present a bioinformatic challenge to retrieve, store, and process the accumulating and often redundant amount of information. In particular, a large-scale comparison and analysis of protein-protein interaction (PPI) data requires tools for data interpretation as well as validation. At this juncture, the Protein Interaction and Molecule Search (PRIMOS) platform represents a novel web portal that unifies six primary PPI databases (BIND, Biomolecular Interaction Network Database; DIP, Database of Interacting Proteins; HPRD, Human Protein Reference Database; IntAct; MINT, Molecular Interaction Database; and MIPS, Munich Information Center for Protein Sequences) into a single consistent repository, which currently includes more than 196,700 redundancy-removed PPIs. PRIMOS supports three advanced search strategies centering on disease-relevant PPIs, on inter- and intra-organismal crosstalk relations (e.g., pathogen-host interactions), and on highly connected protein nodes analysis ("hub" identification). The main novelties distinguishing PRIMOS from other secondary PPI databases are the reassessment of known PPIs, and the capacity to validate personal experimental data by our peer-reviewed, homology-based validation. This article focuses on definite PRIMOS use cases (presentation of embedded biological concepts, example applications) to demonstrate its broad functionality and practical value. PRIMOS is publicly available at http://primos.fh-hagenberg.at.

Subject(s)

Database Management Systems , Databases, Protein , Information Storage and Retrieval/methods , Internet , Protein Interaction Mapping/methods , Proteome/chemistry , Proteome/metabolism , Systems Integration , User-Computer Interface

Improved homology-driven computational validation of protein-protein interactions motivated by the evolutionary gene duplication and divergence hypothesis.

Frech, Christian; Kommenda, Michael; Dorfer, Viktoria; Kern, Thomas; Hintner, Helmut; Bauer, Johann W; Onder, Kamil.

BMC Bioinformatics ; 10: 21, 2009 Jan 19.

Article in English | MEDLINE | ID: mdl-19152684

ABSTRACT

BACKGROUND: Protein-protein interaction (PPI) data sets generated by high-throughput experiments are contaminated by large numbers of erroneous PPIs. Therefore, computational methods for PPI validation are necessary to improve the quality of such data sets. Against the background of the theory that most extant PPIs arose as a consequence of gene duplication, the sensitive search for homologous PPIs, i.e. for PPIs descending from a common ancestral PPI, should be a successful strategy for PPI validation. RESULTS: To validate an experimentally observed PPI, we combine FASTA and PSI-BLAST to perform a sensitive sequence-based search for pairs of interacting homologous proteins within a large, integrated PPI database. A novel scoring scheme that incorporates both quality and quantity of all observed matches allows us (1) to consider also tentative paralogs and orthologs in this analysis and (2) to combine search results from more than one homology detection method. ROC curves illustrate the high efficacy of this approach and its improvement over other homology-based validation methods. CONCLUSION: New PPIs are primarily derived from preexisting PPIs and not invented de novo. Thus, the hallmark of true PPIs is the existence of homologous PPIs. The sensitive search for homologous PPIs within a large body of known PPIs is an efficient strategy to separate biologically relevant PPIs from the many spurious PPIs reported by high-throughput experiments.

Subject(s)

Evolution, Molecular , Gene Duplication , Genetic Variation , Protein Interaction Mapping/methods , Sequence Homology, Amino Acid , Computational Biology , Databases, Protein
