1.
J Comput Biol ; 30(3): 323-336, 2023 03.
Article in English | MEDLINE | ID: mdl-36322888

ABSTRACT

Information theory-based measures of variable dependency (previously published) have been implemented into a software package, MIST. The design of the software and its potential uses are described, and a demonstration is presented in the discovery of modifier alleles of the ApoE gene in affecting Alzheimer's disease (AD) by analyzing the UK Biobank dataset. The modifier genes uncovered overlap strongly with genes found to be associated with AD. Others include many known to influence AD. We discuss a range of uses of the dependency calculations using MIST that can uncover additional genetic effects in similar complex datasets, like higher degrees of interaction and phenotypic pleiotropy.


Subject(s)
Alzheimer Disease , Humans , Alleles , Alzheimer Disease/genetics , Information Theory , Apolipoproteins E/genetics , Genotype
2.
Sci Data ; 9(1): 216, 2022 05 17.
Article in English | MEDLINE | ID: mdl-35581201

ABSTRACT

Baker's yeast (Saccharomyces cerevisiae) is a model organism for studying the morphology that emerges at the scale of multi-cell colonies. To look at how morphology develops, we collect a dataset of time-lapse photographs of the growth of different strains of S. cerevisiae. We discuss the general statistical challenges that arise when using time-lapse photographs to extract time-dependent features. In particular, we show how texture-based feature engineering and representative clustering can be successfully applied to categorize the development of yeast colony morphology using our dataset. The local binary pattern (LBP) from image processing is used to score the surface texture of colonies. This texture score develops along a smooth trajectory during growth. The path taken depends on how the morphology emerges. A hierarchical clustering of the colonies is performed according to their texture development trajectories. The clustering method is designed for practical interpretability; it obtains the best representative colony image for any hierarchical cluster.


Subject(s)
Saccharomyces cerevisiae , Image Processing, Computer-Assisted , Time-Lapse Imaging
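
The local binary pattern mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the basic 3x3, 8-neighbour variant on a grayscale image stored as a list of lists; the abstract does not say which LBP variant the authors use or how the texture score is derived from it, and the function names here are hypothetical.

```python
def lbp_code(img, r, c):
    """Basic 8-neighbour LBP code for the interior pixel (r, c).

    Assumption: a neighbour contributes a 1 bit when it is >= the centre.
    """
    center = img[r][c]
    # Neighbours visited clockwise, starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Normalised histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    rows, cols = len(img), len(img[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist]


# A perfectly flat patch and a patch with a single bright peak.
flat = [[5] * 4 for _ in range(4)]
peak = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
```

A histogram of this kind is one common way to summarise texture; tracking a summary of it per frame would give a trajectory over time, although the score used in the paper may be defined differently.
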
3.
BMC Bioinformatics ; 22(1): 180, 2021 Apr 07.
Article in English | MEDLINE | ID: mdl-33827420

ABSTRACT

BACKGROUND: Permutation testing is often considered the "gold standard" for multi-test significance analysis, as it is an exact test requiring few assumptions about the distribution being computed. However, it can be computationally very expensive, particularly in its naive form in which the full analysis pipeline is re-run after permuting the phenotype labels. This can become intractable in multi-locus genome-wide association studies (GWAS), in which the number of potential interactions to be tested is combinatorially large. RESULTS: In this paper, we develop an approach for permutation testing in multi-locus GWAS, specifically focusing on SNP-SNP-phenotype interactions using multivariable measures that can be computed from frequency count tables, such as those based on information theory. We find that the computational bottleneck in this process is the construction of the count tables themselves, and that this step can be eliminated at each iteration of the permutation testing by transforming the count tables directly. This leads to a speed-up by a factor of over 10^3 for a typical permutation test compared to the naive approach. Additionally, this approach is insensitive to the number of samples, making it suitable for datasets with a large number of samples. CONCLUSIONS: The proliferation of large-scale datasets with genotype data for hundreds of thousands of individuals enables new and more powerful approaches for the detection of multi-locus genotype-phenotype interactions. Our approach significantly improves the computational tractability of permutation testing for these studies. Moreover, our approach is insensitive to the large number of samples in these modern datasets. The code for performing these computations and replicating the figures in this paper is freely available at https://github.com/kunert/permute-counts .


Subject(s)
Epistasis, Genetic , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genotype , Humans , Phenotype
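
For orientation, the naive baseline described in the abstract above (rebuild the count table after every shuffle of the phenotype labels) might look like the following sketch. It is simplified to a single SNP and one phenotype rather than SNP-SNP-phenotype triples, it uses mutual information as the count-table measure, and it is not the optimized method from the paper or the code in the linked repository. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits, estimated from the joint count table."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # the frequency count table
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def naive_permutation_pvalue(genotypes, phenotypes, n_perm=1000, seed=0):
    """Naive permutation test: reshuffle labels, then recount from scratch."""
    rng = random.Random(seed)
    observed = mutual_information(genotypes, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # The count table is rebuilt on every iteration; this is the
        # step the abstract identifies as the bottleneck.
        if mutual_information(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one estimate so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Each iteration costs time proportional to the number of samples, which is why the naive form scales poorly on large cohorts.
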
4.
J Comput Biol ; 28(6): 527-559, 2021 06.
Article in English | MEDLINE | ID: mdl-33395537

ABSTRACT

Quantitative genetics has evolved dramatically in the past century, and the proliferation of genetic data, in quantity as well as type, enables the characterization of complex interactions and mechanisms beyond the scope of its theoretical foundations. In this article, we argue that revisiting the framework for analysis is important and we begin to lay the foundations of an alternative formulation of quantitative genetics based on information theory. Information theory can provide sensitive and unbiased measures of statistical dependencies among variables, and it provides a natural mathematical language for an alternative view of quantitative genetics. In the previous work, we examined the information content of discrete functions and applied this approach and methods to the analysis of genetic data. In this article, we present a framework built around a set of relationships that both unifies the information measures for the discrete functions and uses them to express key quantitative genetic relationships. Information theory measures of variable interdependency are used to identify significant interactions, and a general approach is described for inferring functional relationships in genotype and phenotype data. We present information-based measures of the genetic quantities: penetrance, heritability, and degrees of statistical epistasis. Our scope here includes the consideration of both two- and three-variable dependencies and independently segregating variants, which captures additive effects, genetic interactions, and two-phenotype pleiotropy. This formalism and the theoretical approach naturally apply to higher multivariable interactions and complex dependencies, and can be adapted to account for population structure, linkage, and nonrandomly segregating markers. This article thus focuses on presenting the initial groundwork for a full formulation of quantitative genetics based on information theory.


Subject(s)
Information Theory , Models, Genetic , Databases, Genetic , Genome, Fungal , Genome-Wide Association Study/methods , Genomics/methods , Polymorphism, Single Nucleotide , Saccharomyces cerevisiae
5.
Entropy (Basel) ; 22(12), 2020 Nov 24.
Article in English | MEDLINE | ID: mdl-33266517

ABSTRACT

Information theory provides robust measures of multivariable interdependence, but classically does little to characterize the multivariable relationships it detects. The Partial Information Decomposition (PID) characterizes the mutual information between variables by decomposing it into unique, redundant, and synergistic components. This has been usefully applied, particularly in neuroscience, but there is currently no generally accepted method for its computation. Independently, the Information Delta framework characterizes non-pairwise dependencies in genetic datasets. This framework has developed an intuitive geometric interpretation for how discrete functions encode information, but lacks some important generalizations. This paper shows that the PID and Delta frameworks are largely equivalent. We equate their key expressions, allowing for results in one framework to apply towards open questions in the other. For example, we find that the approach of Bertschinger et al. is useful for the open Information Delta question of how to deal with linkage disequilibrium. We also show how PID solutions can be mapped onto the space of delta measures. Using Bertschinger et al. as an example solution, we identify a specific plane in delta-space on which this approach's optimization is constrained, and compute it for all possible three-variable discrete functions of a three-letter alphabet. This yields a clear geometric picture of how a given solution decomposes information.
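
A small worked example may help show why non-pairwise (synergistic) dependence needs its own treatment. The sketch below uses XOR on two uniform binary inputs, a standard textbook case: each input alone shares zero bits with the output, while the pair shares one full bit. It does not compute a PID or the delta measures, and it is not the approach of Bertschinger et al.; the helper names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def entropy(samples):
    """Shannon entropy in bits of the empirical distribution of samples."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B), all estimated from the samples."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


# Truth table of Z = X xor Y, with every input pair equally likely.
rows = [(x, y, x ^ y) for x, y in product([0, 1], repeat=2)]
xs, ys, zs = (list(col) for col in zip(*rows))

mi_x = mutual_information(xs, zs)                   # X alone vs Z
mi_y = mutual_information(ys, zs)                   # Y alone vs Z
mi_xy = mutual_information(list(zip(xs, ys)), zs)   # (X, Y) jointly vs Z
```

Here `mi_x` and `mi_y` are both 0 bits and `mi_xy` is 1 bit, so all of the information about the output is carried jointly by the two inputs.
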

6.
Front Comput Neurosci ; 13: 75, 2019.
Article in English | MEDLINE | ID: mdl-31736734

ABSTRACT

Resting state networks (RSNs) extracted from functional magnetic resonance imaging (fMRI) scans are believed to reflect the intrinsic organization and network structure of brain regions. Most traditional methods for computing RSNs typically assume these functional networks are static throughout the duration of a scan lasting 5-15 min. However, they are known to vary on timescales ranging from seconds to years; in addition, the dynamic properties of RSNs are affected in a wide variety of neurological disorders. Recently, there has been a proliferation of methods for characterizing RSN dynamics, yet it remains a challenge to extract reproducible time-resolved networks. In this paper, we develop a novel method based on dynamic mode decomposition (DMD) to extract networks from short windows of noisy, high-dimensional fMRI data, allowing RSNs from single scans to be resolved robustly at a temporal resolution of seconds. After validating the method on a synthetic dataset, we analyze data from 120 individuals from the Human Connectome Project and show that unsupervised clustering of DMD modes discovers RSNs at both the group (gDMD) and the single subject (sDMD) levels. The gDMD modes closely resemble canonical RSNs. Compared to established methods, sDMD modes capture individualized RSN structure that both better resembles the population RSN and better captures subject-level variation. We further leverage this time-resolved sDMD analysis to infer occupancy and transitions among RSNs with high reproducibility. This automated DMD-based method is a powerful tool to characterize spatial and temporal structures of RSNs in individual subjects.
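
For readers unfamiliar with dynamic mode decomposition, the standard (exact) DMD construction can be summarised as follows. This is the generic textbook formulation, with snapshots $x_k$ and truncation rank $r$ as notation introduced here; the windowing, clustering, and any modifications used in the paper are not described in the abstract and are not represented.

```latex
\begin{align}
  X &= [\,x_1, \dots, x_{m-1}\,], \qquad
  X' = [\,x_2, \dots, x_m\,], \qquad
  X' \approx A X \\
  X &\approx U \Sigma V^{*} \quad \text{(SVD truncated to rank } r\text{)} \\
  \tilde{A} &= U^{*} X' V \Sigma^{-1}, \qquad
  \tilde{A} W = W \Lambda \\
  \Phi &= X' V \Sigma^{-1} W
\end{align}
```

The columns of $\Phi$ are the DMD modes (spatial patterns), and the diagonal entries of $\Lambda$ give each mode's growth or decay rate and oscillation frequency per time step.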

7.
J Comput Biol ; 24(12): 1153-1178, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29028175

ABSTRACT

The complex of central problems in data analysis consists of three components: (1) detecting the dependence of variables using quantitative measures, (2) defining the significance of these dependence measures, and (3) inferring the functional relationships among dependent variables. We have argued previously that an information theory approach allows separation of the detection problem from the inference of functional form problem. We approach here the third component of inferring functional forms based on information encoded in the functions. We present here a direct method for classifying the functional forms of discrete functions of three variables represented in data sets. Discrete variables are frequently encountered in data analysis, both as the result of inherently categorical variables and from the binning of continuous numerical variables into discrete alphabets of values. The fundamental question of how much information is contained in a given function is answered for these discrete functions, and their surprisingly complex relationships are illustrated. The all-important effect of noise on the inference of function classes is found to be highly heterogeneous and reveals some unexpected patterns. We apply this classification approach to an important area of biological data analysis: the inference of genetic interactions. Genetic analysis provides a rich source of real and complex biological data analysis problems, and our general methods provide an analytical basis and tools for characterizing genetic problems and for analyzing genetic data. We illustrate the functional description and the classes of a number of common genetic interaction modes and also show how different modes vary widely in their sensitivity to noise.


Subject(s)
Algorithms , Computational Biology/methods , Data Interpretation, Statistical , Epistasis, Genetic , Information Theory , Humans , Signal-To-Noise Ratio
8.
Front Comput Neurosci ; 11: 53, 2017.
Article in English | MEDLINE | ID: mdl-28659783

ABSTRACT

The neural dynamics of the nematode Caenorhabditis elegans are experimentally low-dimensional and may be understood as long-timescale transitions between multiple low-dimensional attractors. Previous modeling work has found that dynamic models of the worm's full neuronal network are capable of generating reasonable dynamic responses to certain inputs, even when all neurons are treated as identical save for their connectivity. This study investigates such a model of C. elegans neuronal dynamics, finding that a wide variety of multistable responses are generated in response to varied inputs. Specifically, we generate bifurcation diagrams for all possible single-neuron inputs, showing the existence of fixed points and limit cycles for different input regimes. The nature of the dynamical response is seen to vary according to the type of neuron receiving input; for example, input into sensory neurons is more likely to drive a bifurcation in the system than input into motor neurons. As a specific example we consider compound input into the neuron pairs PLM and ASK, discovering bistability of a limit cycle and a fixed point. The transient timescales in approaching each of these states are much longer than any intrinsic timescales of the system. This suggests consistency of our model with the characterization of dynamics in neural systems as long-timescale transitions between discrete, low-dimensional attractors corresponding to behavioral states.
