1.
Artif Life ; 26(1): 23-37, 2020.
Article in English | MEDLINE | ID: mdl-32027528

ABSTRACT

Susceptibility to common human diseases such as cancer is influenced by many genetic and environmental factors that work together in a complex manner. The state of the art is to perform a genome-wide association study (GWAS) that measures millions of single-nucleotide polymorphisms (SNPs) throughout the genome followed by a one-SNP-at-a-time statistical analysis to detect univariate associations. This approach has identified thousands of genetic risk factors for hundreds of diseases. However, the genetic risk factors detected have very small effect sizes and collectively explain very little of the overall heritability of the disease. Nonetheless, it is assumed that the genetic component of risk is due to many independent risk factors that contribute additively. The fact that many genetic risk factors with small effects can be detected is taken as evidence to support this notion. It is our working hypothesis that the genetic architecture of common diseases is partly driven by non-additive interactions. To test this hypothesis, we developed a heuristic simulation-based method for conducting experiments about the complexity of genetic architecture. We show that a genetic architecture driven by complex interactions is highly consistent with the magnitude and distribution of univariate effects seen in real data. We compare our results with measures of univariate and interaction effects from two large-scale GWASs of sporadic breast cancer and find evidence to support our hypothesis that is consistent with the results of our computational experiment.


Subject(s)
Computational Biology , Disease/genetics , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Computer Simulation , Humans
2.
Stud Health Technol Inform ; 264: 684-688, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31438011

ABSTRACT

Falls are the leading cause of injuries among older adults, particularly in the more vulnerable home health care (HHC) population. Existing standardized fall risk assessments often require supplemental data collection and tend to have low specificity. We applied a random forest algorithm on readily available HHC data from the mandated Outcomes and Assessment Information Set (OASIS) with over 100 items from 59,006 HHC patients to identify factors that predict and quantify fall risks. Our ultimate goal is to build clinical decision support for fall prevention. Our model achieves higher precision and balanced accuracy than the commonly used multifactorial Missouri Alliance for Home Care fall risk assessment. This is the first known attempt to determine fall risk factors from the extensive OASIS data from a large sample. Our quantitative prediction of fall risks can aid clinical discussions of risk factors and prevention strategies for lowering fall incidence.


Subject(s)
Accidental Falls , Home Care Services , Machine Learning , Humans , Missouri , Risk Assessment , Risk Factors
3.
Genet Program Evolvable Mach ; 20(1): 127-137, 2019 Mar.
Article in English | MEDLINE | ID: mdl-31105467

ABSTRACT

The process of developing new test statistics is laborious, requiring the manual development and evaluation of mathematical functions that satisfy several theoretical properties. Automating this process, hitherto not done, would greatly accelerate the discovery of much-needed, new test statistics. This automation is a challenging problem because it requires the discovery method to know something about the desirable properties of a good test statistic in addition to having an engine that can develop and explore candidate mathematical solutions with an intuitive representation. In this paper we describe a genetic programming-based system for the automated discovery of new test statistics. Specifically, our system was able to discover test statistics as powerful as the t-test for comparing sample means from two distributions with equal variances.
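The baseline against which the discovered statistics were judged is the pooled-variance two-sample t statistic. As a minimal sketch (the function name and example data are illustrative, not from the paper), it can be computed as:

```python
import math

def pooled_t(a, b):
    """Two-sample Student t statistic with pooled variance -- the
    equal-variance baseline the evolved statistics were measured against."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    # Unbiased sample variances of each group.
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    # Pool the variances, weighted by degrees of freedom.
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

t = pooled_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A genetic programming system searching over expression trees of sample means and variances would score a candidate expression by its power on simulated two-group data and compare it against this statistic.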

4.
J Biomed Inform ; 85: 168-188, 2018 09.
Article in English | MEDLINE | ID: mdl-30030120

ABSTRACT

Modern biomedical data mining requires feature selection methods that can (1) be applied to large scale feature spaces (e.g. 'omics' data), (2) function in noisy problems, (3) detect complex patterns of association (e.g. gene-gene interactions), (4) be flexibly adapted to various problem domains and data types (e.g. genetic variants, gene expression, and clinical data) and (5) are computationally tractable. To that end, this work examines a set of filter-style feature selection algorithms inspired by the 'Relief' algorithm, i.e. Relief-Based algorithms (RBAs). We implement and expand these RBAs in an open source framework called ReBATE (Relief-Based Algorithm Training Environment). We apply a comprehensive genetic simulation study comparing existing RBAs, a proposed RBA called MultiSURF, and other established feature selection methods, over a variety of problems. The results of this study (1) support the assertion that RBAs are particularly flexible, efficient, and powerful feature selection methods that differentiate relevant features having univariate, multivariate, epistatic, or heterogeneous associations, (2) confirm the efficacy of expansions for classification vs. regression, discrete vs. continuous features, missing data, multiple classes, or class imbalance, (3) identify previously unknown limitations of specific RBAs, and (4) suggest that while MultiSURF∗ performs best for explicitly identifying pure 2-way interactions, MultiSURF yields the most reliable feature selection performance across a wide range of problem types.


Subject(s)
Computational Biology/methods , Data Mining/methods , Algorithms , Benchmarking , Computational Biology/standards , Computer Simulation , Data Mining/standards , Databases, Genetic , Epistasis, Genetic , Humans
5.
J Biomed Inform ; 85: 189-203, 2018 09.
Article in English | MEDLINE | ID: mdl-30031057

ABSTRACT

Feature selection plays a critical role in biomedical data mining, driven by increasing feature dimensionality in target problems and growing interest in advanced but computationally expensive methodologies able to model complex associations. Specifically, there is a need for feature selection methods that are computationally efficient, yet sensitive to complex patterns of association, e.g. interactions, so that informative features are not mistakenly eliminated prior to downstream modeling. This paper focuses on Relief-based algorithms (RBAs), a unique family of filter-style feature selection algorithms that have gained appeal by striking an effective balance between these objectives while flexibly adapting to various data characteristics, e.g. classification vs. regression. First, this work broadly examines types of feature selection and defines RBAs within that context. Next, we introduce the original Relief algorithm and associated concepts, emphasizing the intuition behind how it works, how feature weights generated by the algorithm can be interpreted, and why it is sensitive to feature interactions without evaluating combinations of features. Lastly, we include an expansive review of RBA methodological research beyond Relief and its popular descendant, ReliefF. In particular, we characterize branches of RBA research, and provide comparative summaries of RBA algorithms including contributions, strategies, functionality, time complexity, adaptation to key data characteristics, and software availability.


Subject(s)
Algorithms , Computational Biology/methods , Data Mining/methods , Humans , Models, Statistical , Regression Analysis , Software
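The nearest-hit/nearest-miss weighting that this review describes can be sketched in a few lines. This is a simplified pure-Python rendering of the original Relief idea (not the ReBATE implementation); the function name and toy data are illustrative:

```python
def relief(X, y):
    """Original Relief idea: for each instance, reward features that
    differ from the nearest miss (other class) and penalize features
    that differ from the nearest hit (same class)."""
    n, d = len(X), len(X[0])
    # Normalize differences by each feature's observed range.
    span = [max(max(r[j] for r in X) - min(r[j] for r in X), 1e-12)
            for j in range(d)]
    diff = lambda a, b, j: abs(a[j] - b[j]) / span[j]
    w = [0.0] * d
    for i in range(n):
        dist = [sum(diff(X[i], X[k], j) for j in range(d)) if k != i
                else float("inf") for k in range(n)]
        hit = min((k for k in range(n) if k != i and y[k] == y[i]),
                  key=dist.__getitem__)
        miss = min((k for k in range(n) if y[k] != y[i]),
                   key=dist.__getitem__)
        for j in range(d):
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return w

X = [[0.0, 0.5], [0.1, 0.9], [1.0, 0.4], [0.9, 0.8]]  # feature 0 tracks class
y = [0, 0, 1, 1]
weights = relief(X, y)
```

Because the hit/miss comparison is made in the full feature space, an interacting feature can receive a high weight even when its univariate association with the class is nil, which is the sensitivity to interactions the abstract emphasizes.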
6.
Pac Symp Biocomput ; 23: 192-203, 2018.
Article in English | MEDLINE | ID: mdl-29218881

ABSTRACT

As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems.


Subject(s)
Algorithms , Computational Biology/methods , Supervised Machine Learning/statistics & numerical data , Databases, Factual/statistics & numerical data , Humans , Linear Models , Nonlinear Dynamics
7.
Pac Symp Biocomput ; 23: 259-267, 2018.
Article in English | MEDLINE | ID: mdl-29218887

ABSTRACT

A central challenge of developing and evaluating artificial intelligence and machine learning methods for regression and classification is access to data that illuminates the strengths and weaknesses of different methods. Open data plays an important role in this process by making it easy for computational researchers to access real data for this purpose. Genomics has in some cases taken a leading role in the open data effort, starting with DNA microarrays. While real data from experimental and observational studies is necessary for developing computational methods, it is not sufficient, because it is not possible to know what the ground truth is in real data. It must be accompanied by simulated data in which the balance between signal and noise is known and can be directly evaluated. Unfortunately, there is a lack of methods and software for simulating data with the kind of complexity found in real biological and biomedical systems. We present here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating complex biological and biomedical data. Further, we introduce new methods for developing simulation models that generate data that specifically allows discrimination between different machine learning methods.


Subject(s)
Machine Learning , Algorithms , Artificial Intelligence , Computational Biology/methods , Computer Simulation , Databases, Genetic/statistics & numerical data , Genetic Association Studies/statistics & numerical data , Heuristics , Humans , Machine Learning/statistics & numerical data , Models, Genetic , Software , Stochastic Processes , Systems Biology
8.
Pac Symp Biocomput ; 23: 460-471, 2018.
Article in English | MEDLINE | ID: mdl-29218905

ABSTRACT

With the maturation of metabolomics science and the proliferation of biobanks, clinical metabolic profiling is an increasingly opportune frontier for advancing translational clinical research. Automated Machine Learning (AutoML) approaches provide an exciting opportunity to guide feature selection in agnostic metabolic profiling endeavors, where potentially thousands of independent data points must be evaluated. In previous research, AutoML using high-dimensional data of varying types has been demonstrably robust, outperforming traditional approaches. However, considerations for application in clinical metabolic profiling remain to be evaluated, particularly the robustness of AutoML in identifying and adjusting for common clinical confounders. In this study, we present a focused case study regarding AutoML considerations for using the Tree-Based Optimization Tool (TPOT) in metabolic profiling of exposure to metformin in a biobank cohort. First, we propose a tandem rank-accuracy measure to guide agnostic feature selection and corresponding threshold determination in clinical metabolic profiling endeavors. Second, because AutoML with default parameters can lack sensitivity to low-effect confounding clinical covariates, we demonstrate residual training and adjustment of metabolite features as an easily applicable approach to ensure AutoML adjustment for potential confounding characteristics. Finally, we present increased homocysteine with long-term exposure to metformin as a potentially novel, non-replicated metabolite association suggested by TPOT; an association not identified in parallel clinical metabolic profiling endeavors. While warranting independent replication, our tandem rank-accuracy measure suggests homocysteine to be the metabolite feature with the largest effect, and a corresponding priority for further translational clinical research. Residual training and adjustment for a potential confounding effect by BMI only slightly modified the suggested association. Increased homocysteine is thought to be associated with vitamin B12 deficiency, so evaluation for potential clinical relevance is suggested. While considerations for clinical metabolic profiling are recommended, including adjustment approaches for clinical confounders, AutoML presents an exciting tool to enhance clinical metabolic profiling and advance translational research endeavors.



Subject(s)
Homocysteine/blood , Hypoglycemic Agents/adverse effects , Metabolome , Metformin/adverse effects , Supervised Machine Learning/statistics & numerical data , Bias , Body Mass Index , Case-Control Studies , Computational Biology/methods , Diabetes Mellitus, Type 2/blood , Diabetes Mellitus, Type 2/drug therapy , Humans , Metabolomics/statistics & numerical data , Risk Factors , Translational Research, Biomedical
9.
Sci Rep ; 7(1): 17141, 2017 12 07.
Article in English | MEDLINE | ID: mdl-29215023

ABSTRACT

Physiological function, disease expression and drug effects vary by time-of-day. Clock disruption in mice results in cardio-metabolic, immunological and neurological dysfunction; circadian misalignment using forced desynchrony increases cardiovascular risk factors in humans. Here we integrated data from remote sensors, physiological and multi-omics analyses to assess the feasibility of detecting time dependent signals - the chronobiome - despite the "noise" attributable to the behavioral differences of free-living human volunteers. The majority (62%) of sensor readouts showed time-specific variability including the expected variation in blood pressure, heart rate, and cortisol. While variance in the multi-omics is dominated by inter-individual differences, temporal patterns are evident in the metabolome (5.4% in plasma, 5.6% in saliva) and in several genera of the oral microbiome. This demonstrates, despite a small sample size and limited sampling, the feasibility of characterizing at scale the human chronobiome "in the wild". Such reference data at scale are a prerequisite to detect and mechanistically interpret discordant data derived from patients with temporal patterns of disease expression, to develop time-specific therapeutic strategies and to refine existing treatments.


Subject(s)
Circadian Rhythm , Metabolome , Microbiota , Proteome , Transcriptome , Adult , Blood Pressure , Blood Proteins/metabolism , Heart Rate , Humans , Hydrocortisone/metabolism , Male , Mouth/metabolism , Pilot Projects , Saliva/metabolism , Time Factors
10.
BioData Min ; 10: 36, 2017.
Article in English | MEDLINE | ID: mdl-29238404

ABSTRACT

BACKGROUND: The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. RESULTS: The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and there are several gaps in benchmarking problems that still need to be considered. CONCLUSIONS: This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.

12.
BioData Min ; 10: 19, 2017.
Article in English | MEDLINE | ID: mdl-28572842

ABSTRACT

BACKGROUND: Large-scale genetic studies of common human diseases have focused almost exclusively on the independent main effects of single-nucleotide polymorphisms (SNPs) on disease susceptibility. These studies have had some success, but much of the genetic architecture of common disease remains unexplained. Attention is now turning to detecting SNPs that impact disease susceptibility in the context of other genetic factors and environmental exposures. These context-dependent genetic effects can manifest themselves as non-additive interactions, which are more challenging to model using parametric statistical approaches. The dimensionality that results from a multitude of genotype combinations, which results from considering many SNPs simultaneously, renders these approaches underpowered. We previously developed the multifactor dimensionality reduction (MDR) approach as a nonparametric and genetic model-free machine learning alternative. Approaches such as MDR can improve the power to detect gene-gene interactions but are limited in their ability to exhaustively consider SNP combinations in genome-wide association studies (GWAS), due to the combinatorial explosion of the search space. We introduce here a stochastic search algorithm called Crush for the application of MDR to modeling high-order gene-gene interactions in genome-wide data. The Crush-MDR approach uses expert knowledge to guide probabilistic searches within a framework that capitalizes on the use of biological knowledge to filter gene sets prior to analysis. Here we evaluated the ability of Crush-MDR to detect hierarchical sets of interacting SNPs using a biology-based simulation strategy that assumes non-additive interactions within genes and additivity in genetic effects between sets of genes within a biochemical pathway. RESULTS: We show that Crush-MDR is able to identify genetic effects at the gene or pathway level significantly better than a baseline random search with the same number of model evaluations. We then applied the same methodology to a GWAS for Alzheimer's disease and showed base-level validation that Crush-MDR was able to identify a set of interacting genes with biological ties to Alzheimer's disease. CONCLUSIONS: We discuss the role of stochastic search and cloud computing for detecting complex genetic effects in genome-wide data.
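The dimensionality-reduction step at the heart of MDR can be sketched briefly. This is a simplified illustration of the published MDR idea, not the authors' Crush-MDR code; the function name and toy data are hypothetical:

```python
from collections import Counter

def mdr_labels(genotypes, is_case):
    """MDR's core reduction: pool each multi-locus genotype combination
    into 'high risk' (True) when its case:control ratio exceeds the
    overall case:control ratio, collapsing many SNPs into one attribute."""
    cases, controls = Counter(), Counter()
    for g, c in zip(genotypes, is_case):
        (cases if c else controls)[g] += 1
    n_case = sum(is_case)
    overall = n_case / max(len(is_case) - n_case, 1)
    return {g: cases[g] / max(controls[g], 1) > overall
            for g in set(cases) | set(controls)}

# XOR-style epistasis: risk is high only when exactly one of the two SNPs
# carries the variant, so neither SNP shows a marginal (one-at-a-time) effect.
combos = [(0, 0)] * 4 + [(0, 1)] * 4 + [(1, 0)] * 4 + [(1, 1)] * 4
status = [1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0]
labels = mdr_labels(combos, status)
```

The resulting binary attribute can then be scored by cross-validated classification accuracy; Crush's contribution is the stochastic, knowledge-guided search over which SNP sets to feed into this reduction, in place of an exhaustive scan.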

13.
Artif Life ; 22(3): 299-318, 2016.
Article in English | MEDLINE | ID: mdl-27139941

ABSTRACT

Animal grouping behaviors have been widely studied due to their implications for understanding social intelligence, collective cognition, and potential applications in engineering, artificial intelligence, and robotics. An important biological aspect of these studies is discerning which selection pressures favor the evolution of grouping behavior. In the past decade, researchers have begun using evolutionary computation to study the evolutionary effects of these selection pressures in predator-prey models. The selfish herd hypothesis states that concentrated groups arise because prey selfishly attempt to place their conspecifics between themselves and the predator, thus causing an endless cycle of movement toward the center of the group. Using an evolutionary model of a predator-prey system, we show that how predators attack is critical to the evolution of the selfish herd. Following this discovery, we show that density-dependent predation provides an abstraction of Hamilton's original formulation of domains of danger. Finally, we verify that density-dependent predation provides a sufficient selective advantage for prey to evolve the selfish herd in response to predation by coevolving predators. Thus, our work corroborates Hamilton's selfish herd hypothesis in a digital evolutionary model, refines the assumptions of the selfish herd hypothesis, and generalizes the domain of danger concept to density-dependent predation.


Subject(s)
Behavior, Animal , Models, Biological , Social Behavior , Animals , Biological Evolution , Movement , Predatory Behavior
14.
R Soc Open Sci ; 2(9): 150135, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26473039

ABSTRACT

Even though grouping behaviour has been actively studied for over a century, the relative importance of the numerous proposed fitness benefits of grouping remains unclear. We use a digital model of evolving prey under simulated predation to directly explore the evolution of gregarious foraging behaviour according to one such benefit, the 'many eyes' hypothesis. According to this hypothesis, collective vigilance allows prey in large groups to detect predators more efficiently by making alarm signals or behavioural cues to each other, thereby allowing individuals within the group to spend more time foraging. Here, we find that collective vigilance is sufficient to select for gregarious foraging behaviour as long as there is not a direct cost for grouping (e.g. competition for limited food resources), even when controlling for confounding factors such as the dilution effect. Furthermore, we explore the role of the genetic relatedness and reproductive strategy of the prey and find that highly related groups of prey with a semelparous reproductive strategy are the most likely to evolve gregarious foraging behaviour mediated by the benefit of vigilance. These findings, combined with earlier studies with evolving digital organisms, further sharpen our understanding of the factors favouring grouping behaviour.

15.
Sci Rep ; 5: 8242, 2015 Feb 04.
Article in English | MEDLINE | ID: mdl-25649757

ABSTRACT

Risk aversion is a common behavior universal to humans and animals alike. Economists have traditionally defined risk preferences by the curvature of the utility function. Psychologists and behavioral economists also make use of concepts such as loss aversion and probability weighting to model risk aversion. Neurophysiological evidence suggests that loss aversion has its origins in relatively ancient neural circuitries (e.g., ventral striatum). Could there thus be an evolutionary origin to risk aversion? We study this question by evolving strategies that adapt to play the equivalent mean payoff gamble. We hypothesize that risk aversion in this gamble is beneficial as an adaptation to living in small groups, and find that a preference for risk averse strategies only evolves in small populations of less than 1,000 individuals, or in populations segmented into groups of 150 individuals or fewer - numbers thought to be comparable to what humans encountered in the past. We observe that risk aversion only evolves when the gamble is a rare event that has a large impact on the individual's fitness. As such, we suggest that rare, high-risk, high-payoff events such as mating and mate competition could have driven the evolution of risk averse behavior in humans living in small groups.


Subject(s)
Adaptation, Physiological , Biological Evolution , Decision Making , Risk , Animals , Humans , Models, Theoretical
16.
J R Soc Interface ; 10(85): 20130305, 2013 Aug 06.
Article in English | MEDLINE | ID: mdl-23740485

ABSTRACT

Swarming behaviours in animals have been extensively studied owing to their implications for the evolution of cooperation, social cognition and predator-prey dynamics. An important goal of these studies is discerning which evolutionary pressures favour the formation of swarms. One hypothesis is that swarms arise because the presence of multiple moving prey in swarms causes confusion for attacking predators, but it remains unclear how important this selective force is. Using an evolutionary model of a predator-prey system, we show that predator confusion provides a sufficient selection pressure to evolve swarming behaviour in prey. Furthermore, we demonstrate that the evolutionary effect of predator confusion on prey could in turn exert pressure on the structure of the predator's visual field, favouring the frontally oriented, high-resolution visual systems commonly observed in predators that feed on swarming animals. Finally, we provide evidence that when prey evolve swarming in response to predator confusion, there is a change in the shape of the functional response curve describing the predator's consumption rate as prey density increases. Thus, we show that a relatively simple perceptual constraint - predator confusion - could have pervasive evolutionary effects on prey behaviour, predator sensory mechanisms and the ecological interactions between predators and prey.


Subject(s)
Behavior, Animal , Biological Evolution , Models, Biological , Predatory Behavior , Animals , Food Chain