Results 1 - 20 of 77,150
1.
Genet Sel Evol ; 56(1): 33, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38698321

ABSTRACT

BACKGROUND: Recursive models are a category of structural equation models that propose a causal relationship between traits. These models are more parameterized than multiple trait models, and they require imposing restrictions on the parameter space to ensure statistical identification. Nevertheless, in certain situations, the likelihood of recursive models and multiple trait models are equivalent. Consequently, the estimates of variance components derived from the multiple trait mixed model can be converted into estimates under several recursive models through LDL' or block-LDL' transformations. RESULTS: The procedure was employed on a dataset comprising five traits (birth weight-BW, weight at 90 days-W90, weight at 210 days-W210, cold carcass weight-CCW and conformation-CON) from the Pirenaica beef cattle breed. These phenotypic records were unequally distributed among 149,029 individuals and had a high percentage of missing data. The pedigree used consisted of 343,753 individuals. A Bayesian approach involving a multiple-trait mixed model was applied using a Gibbs sampler. The variance components obtained at each iteration of the Gibbs sampler were subsequently used to estimate the variance components within three distinct recursive models. CONCLUSIONS: The LDL' or block-LDL' transformations applied to the variance component estimates achieved from a multiple trait mixed model enabled inference across multiple sets of recursive models, with the sole prerequisite of being likelihood equivalent. Furthermore, the aforementioned transformations simplify the handling of missing data when conducting inference within the realm of recursive models.
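The conversion from multiple-trait variance components to recursive-model parameters can be illustrated with a small numerical sketch. It assumes a fully recursive model in which each trait depends on all traits before it in a chosen order: y = Λy + e, with Λ strictly lower triangular and Cov(e) = D diagonal. The implied covariance is then G = (I − Λ)⁻¹ D (I − Λ)⁻ᵀ, which is the LDL' factorization of G with L = (I − Λ)⁻¹. The function name `ldl_recursive` and the 3 × 3 matrix are illustrative assumptions, not values or code from the study.

```python
import numpy as np

def ldl_recursive(G):
    """Return (L, D, Lam) such that G = L D L' and Lam = I - inv(L).

    G is a symmetric positive-definite covariance matrix whose rows and
    columns follow the assumed causal order of the traits.
    """
    G = np.asarray(G, dtype=float)
    C = np.linalg.cholesky(G)          # G = C C', C lower triangular
    d = np.diag(C)
    L = C / d                          # scale column j by 1/d[j]: unit diagonal
    D = np.diag(d ** 2)                # recursive-model (residual) variances
    Lam = np.eye(G.shape[0]) - np.linalg.inv(L)  # structural coefficients
    return L, D, Lam

# Hypothetical covariance matrix for three traits (illustration only).
G = np.array([[4.0, 2.0, 1.0],
              [2.0, 3.0, 0.5],
              [1.0, 0.5, 2.0]])
L, D, Lam = ldl_recursive(G)
print(np.round(Lam, 3))       # strictly lower-triangular coefficients
print(np.round(np.diag(D), 3))
```

In a Bayesian analysis the same function could be applied to the covariance matrix sampled at each Gibbs iteration, giving a posterior sample of Λ and D without refitting the recursive model.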


Subject(s)
Models, Genetic , Animals , Cattle/genetics , Bayes Theorem , Phenotype , Breeding/methods , Breeding/standards , Birth Weight/genetics , Pedigree , Quantitative Trait, Heritable
2.
Genet Sel Evol ; 56(1): 35, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38698347

ABSTRACT

BACKGROUND: The theory of "metafounders" proposes a unified framework for relationships across base populations within breeds (e.g. unknown parent groups), and base populations across breeds (crosses) together with a sensible compatibility with genomic relationships. Considering metafounders might be advantageous in pedigree best linear unbiased prediction (BLUP) or single-step genomic BLUP. Existing methods to estimate relationships across metafounders Γ are not well adapted to highly unbalanced data, genotyped individuals far from base populations, or many unknown parent groups (within breed per year of birth). METHODS: We derive likelihood methods to estimate Γ. For a single metafounder, summary statistics of pedigree and genomic relationships allow deriving a cubic equation with the real root being the maximum likelihood (ML) estimate of Γ. This equation is tested with Lacaune sheep data. For several metafounders, we split the first derivative of the complete likelihood in a term related to Γ, and a second term related to Mendelian sampling variances. Approximating the first derivative by its first term results in a pseudo-EM algorithm that iteratively updates the estimate of Γ by the corresponding block of the H-matrix. The method extends to complex situations with groups defined by year of birth, modelling the increase of Γ using estimates of the rate of increase of inbreeding (ΔF), resulting in an expanded Γ and in a pseudo-EM+ΔF algorithm. We compare these methods with the generalized least squares (GLS) method using simulated data: complex crosses of two breeds in equal or unsymmetrical proportions; and in two breeds, with 10 groups per year of birth within breed. We simulate genotyping in all generations or in the last ones. RESULTS: For a single metafounder, the ML estimates of the Lacaune data corresponded to the maximum. For simulated data, when genotypes were spread across all generations, both GLS and pseudo-EM(+ΔF) methods were accurate. With genotypes only available in the most recent generations, the GLS method was biased, whereas the pseudo-EM(+ΔF) approach yielded more accurate and unbiased estimates. CONCLUSIONS: We derived ML, pseudo-EM and pseudo-EM+ΔF methods to estimate Γ in many realistic settings. Estimates are accurate in real and simulated data and have a low computational cost.


Subject(s)
Breeding , Models, Genetic , Pedigree , Animals , Likelihood Functions , Breeding/methods , Algorithms , Sheep/genetics , Genomics/methods , Computer Simulation , Male , Female , Genotype
3.
Genet Sel Evol ; 56(1): 34, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38698373

ABSTRACT

Metafounders are a useful concept to characterize relationships within and across populations, and to help genetic evaluations because they help modelling the means and variances of unknown base population animals. Current definitions of metafounder relationships are sensitive to the choice of reference alleles and have not been compared to their counterparts in population genetics, namely heterozygosities, FST coefficients, and genetic distances. We redefine the relationships across populations with an arbitrary base of a maximum heterozygosity population in Hardy-Weinberg equilibrium. Then, the relationship between or within populations is a cross-product of the form Γ_{b,b'} = (2/n)(2p_b − 1)(2p_{b'} − 1)', with p_b and p_{b'} being vectors of allele frequencies at n markers in populations b and b'. This is simply the genomic relationship of two pseudo-individuals whose genotypes are equal to twice the allele frequencies. We also show that this coding is invariant to the choice of reference alleles. In addition, standard population genetics metrics (inbreeding coefficients of various forms; FST differentiation coefficients; segregation variance; and Nei's genetic distance) can be obtained from elements of matrix Γ.
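The cross-product definition of Γ can be checked numerically. The sketch below computes Γ from a matrix of allele frequencies (one row per population, one column per marker) and verifies that switching the reference allele at some markers leaves Γ unchanged. The function name `gamma_matrix` and the allele frequencies are made up for illustration.

```python
import numpy as np

def gamma_matrix(P):
    """Gamma[b, b'] = (2/n) * (2 p_b - 1)(2 p_b' - 1)' for all population pairs.

    P has shape (populations, markers) and holds allele frequencies in [0, 1].
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[1]                 # number of markers
    Z = 2.0 * P - 1.0              # centred "pseudo-genotypes"
    return (2.0 / n) * Z @ Z.T

# Hypothetical frequencies: 3 populations, 4 markers.
P = np.array([[0.5, 0.5, 0.5, 0.5],    # maximum heterozygosity
              [1.0, 0.0, 1.0, 0.0],    # fixed at every marker
              [0.2, 0.7, 0.9, 0.4]])
Gamma = gamma_matrix(P)

# Switch the reference allele at markers 0 and 2 (p becomes 1 - p).
P_flipped = P.copy()
P_flipped[:, [0, 2]] = 1.0 - P_flipped[:, [0, 2]]
print(np.allclose(Gamma, gamma_matrix(P_flipped)))
```

Flipping a marker changes the sign of that column of 2p − 1 in every population at once, so each product term, and hence Γ, is unaffected.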


Subject(s)
Gene Frequency , Genetics, Population , Models, Genetic , Animals , Genetics, Population/methods , Heterozygote , Alleles , Genomics/methods , Genotype , Genome
4.
Chaos ; 34(5), 2024 May 01.
Article in English | MEDLINE | ID: mdl-38717409

ABSTRACT

In the evolution of species, the karyotype changes with a timescale of tens to hundreds of thousands of years. In the development of cancer, the karyotype is often modified in cancerous cells over the lifetime of an individual. Characterizing these changes and understanding the mechanisms leading to them has been of interest in a broad range of disciplines including evolution, cytogenetics, and cancer genetics. A central issue relates to the relative roles of random vs deterministic mechanisms in shaping the changes. Although it is possible that all changes result from random events followed by selection, many results point to other non-random factors that play a role in karyotype evolution. In cancer, chromosomal instability leads to characteristic changes in the karyotype, in which different individuals with a specific type of cancer display similar changes in karyotype structure over time. Statistical analyses of chromosome lengths in different species indicate that the length distribution of chromosomes is not consistent with models in which the lengths of chromosomes are random or evolve solely by simple random processes. A better understanding of the mechanisms underlying karyotype evolution should enable the development of quantitative theoretical models that combine the random and deterministic processes that can be compared to experimental determinations of the karyotype in diverse settings.


Subject(s)
Karyotype , Humans , Animals , Evolution, Molecular , Models, Genetic , Neoplasms/genetics , Biological Evolution
5.
Sci Adv ; 10(19): eadn1547, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38718117

ABSTRACT

Pre-mRNA splicing is a fundamental step in gene expression, conserved across eukaryotes, in which the spliceosome recognizes motifs at the 3' and 5' splice sites (SSs), excises introns, and ligates exons. SS recognition and pairing is often influenced by protein splicing factors (SFs) that bind to splicing regulatory elements (SREs). Here, we describe SMsplice, a fully interpretable model of pre-mRNA splicing that combines models of core SS motifs, SREs, and exonic and intronic length preferences. We learn models that predict SS locations with 83 to 86% accuracy in fish, insects, and plants and about 70% in mammals. Learned SRE motifs include both known SF binding motifs and unfamiliar motifs, and both motif classes are supported by genetic analyses. Our comparisons across species highlight similarities between non-mammals, increased reliance on intronic SREs in plant splicing, and a greater reliance on SREs in mammalian splicing.


Subject(s)
Exons , Introns , RNA Precursors , RNA Splice Sites , RNA Splicing , RNA Precursors/genetics , RNA Precursors/metabolism , Animals , Introns/genetics , Exons/genetics , Genes, Plant , Models, Genetic , Spliceosomes/metabolism , Spliceosomes/genetics , Plants/genetics , Humans , RNA Splicing Factors/genetics , RNA Splicing Factors/metabolism
6.
Phys Rev E ; 109(4-1): 044407, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38755817

ABSTRACT

All the cells of a multicellular organism are the product of cell divisions that trace out a single binary tree, the so-called cell lineage tree. Because cell divisions are accompanied by replication errors, the shape of the cell lineage tree is a key determinant of how somatic evolution, which can potentially lead to cancer, proceeds. Carcinogenesis requires the accumulation of a certain number of driver mutations. By mapping the accumulation of mutations into a graph theoretical problem, we present an exact numerical method to calculate the probability of collecting a given number of mutations and show that for low mutation rates it can be approximated with a simple analytical formula, which depends only on the distribution of the lineage lengths, and is dominated by the longest lineages. Our results are crucial in understanding how natural selection can shape the cell lineage trees of multicellular organisms and curtail somatic evolution.
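Why the longest lineages dominate at low mutation rates can be seen in a deliberately simplified single-lineage calculation. This is not the paper's graph-theoretical method, which handles the whole tree. It assumes that each division along one lineage independently adds a driver mutation with probability `mu`, so the number of mutations after `length` divisions is binomial. All names and numbers are illustrative assumptions.

```python
from math import comb

def p_at_least(length, m, mu):
    """Exact binomial probability of >= m mutations after `length` divisions."""
    return sum(comb(length, k) * mu**k * (1.0 - mu)**(length - k)
               for k in range(m, length + 1))

def leading_term(length, m, mu):
    """Low-mutation-rate approximation: C(length, m) * mu**m."""
    return comb(length, m) * mu**m

mu, m = 1e-4, 2
for length in (30, 60):
    print(length, p_at_least(length, m, mu), leading_term(length, m, mu))
```

Because the leading term grows like length**m, doubling the lineage length multiplies the probability by roughly 2**m in this toy setting, which is the sense in which long lineages carry most of the risk.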


Subject(s)
Cell Lineage , Models, Genetic , Mutation Accumulation , Mutation
7.
Stat Appl Genet Mol Biol ; 23(1), 2024 Jan 01.
Article in English | MEDLINE | ID: mdl-38753402

ABSTRACT

Somatic mutations in cancer can be viewed as a mixture distribution of several mutational signatures, which can be inferred using non-negative matrix factorization (NMF). Mutational signatures have previously been parametrized using either simple mono-nucleotide interaction models or general tri-nucleotide interaction models. We describe a flexible and novel framework for identifying biologically plausible parametrizations of mutational signatures, and in particular for estimating di-nucleotide interaction models. Our novel estimation procedure is based on the expectation-maximization (EM) algorithm and regression in the log-linear quasi-Poisson model. We show that di-nucleotide interaction signatures are statistically stable and sufficiently complex to fit the mutational patterns. Di-nucleotide interaction signatures often strike the right balance between appropriately fitting the data and avoiding over-fitting. They provide a better fit to data and are biologically more plausible than mono-nucleotide interaction signatures, and the parametrization is more stable than the parameter-rich tri-nucleotide interaction signatures. We illustrate our framework in a large simulation study where we compare to state-of-the-art methods, and show results for three data sets of somatic mutation counts from patients with cancer in the breast, liver and urinary tract.
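As background for the NMF step mentioned above, here is a minimal sketch of unparametrized NMF using the classical Lee-Seung multiplicative updates for the squared-error objective. It is not the EM/quasi-Poisson procedure of the paper, and it places no interaction structure on the signatures. The layout (96 mutation types × samples), the function name `nmf`, and the simulated counts are assumptions made for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (types x samples) as W @ H.

    W (types x rank) plays the role of signatures, H (rank x samples)
    the role of exposures. Uses multiplicative updates, which keep
    both factors non-negative.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Simulated data: two hypothetical signatures over 96 types, 10 samples.
rng = np.random.default_rng(1)
V = rng.random((96, 2)) @ rng.random((2, 10))
W, H = nmf(V, rank=2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative fit error
```

A parametrized approach such as the one in the abstract would replace the free columns of W with signatures constrained by a mono-, di- or tri-nucleotide interaction model.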


Subject(s)
Algorithms , Mutation , Neoplasms , Humans , Neoplasms/genetics , Models, Genetic , Computer Simulation , Models, Statistical
8.
Elife ; 12, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38717010

ABSTRACT

Interacting molecules create regulatory architectures that can persist despite turnover of molecules. Although epigenetic changes occur within the context of such architectures, there is limited understanding of how they can influence the heritability of changes. Here, I develop criteria for the heritability of regulatory architectures and use quantitative simulations of interacting regulators parsed as entities, their sensors, and the sensed properties to analyze how architectures influence heritable epigenetic changes. Information contained in regulatory architectures grows rapidly with the number of interacting molecules and its transmission requires positive feedback loops. While these architectures can recover after many epigenetic perturbations, some resulting changes can become permanently heritable. Architectures that are otherwise unstable can become heritable through periodic interactions with external regulators, which suggests that mortal somatic lineages with cells that reproducibly interact with the immortal germ lineage could make a wider variety of architectures heritable. Differential inhibition of the positive feedback loops that transmit regulatory architectures across generations can explain the gene-specific differences in heritable RNA silencing observed in the nematode Caenorhabditis elegans. More broadly, these results provide a foundation for analyzing the inheritance of epigenetic changes within the context of the regulatory architectures implemented using diverse molecules in different living systems.


Subject(s)
Caenorhabditis elegans , Epigenesis, Genetic , Caenorhabditis elegans/genetics , Animals , Models, Genetic , Gene Regulatory Networks , Inheritance Patterns
9.
Theor Appl Genet ; 137(6): 134, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38753078

ABSTRACT

The standard approach to variance component estimation in linear mixed models for alpha designs is the residual maximum likelihood (REML) method. One drawback of the REML method in the context of incomplete block designs is that the block variance may be estimated as zero, which can compromise the recovery of inter-block information and hence reduce the accuracy of treatment effects estimation. Due to the development of statistical and computational methods, there is an increasing interest in adopting hierarchical approaches to analysis. In order to increase the precision of the analysis of individual trials laid out as alpha designs, we here make a proposal to create an objectively informed prior distribution for variance components for replicates, blocks and plots, based on the results of previous (historical) trials. We propose different modelling approaches for the prior distributions and evaluate the effectiveness of the hierarchical approach compared to the REML method, which is classically used for analysing individual trials in two-stage approaches for multi-environment trials.


Subject(s)
Models, Genetic , Likelihood Functions , Linear Models , Computer Simulation , Models, Statistical
10.
PLoS Biol ; 22(5): e3002594, 2024 May.
Article in English | MEDLINE | ID: mdl-38754362

ABSTRACT

The standard genetic code defines the rules of translation for nearly every life form on Earth. It also determines the amino acid changes accessible via single-nucleotide mutations, thus influencing protein evolvability-the ability of mutation to bring forth adaptive variation in protein function. One of the most striking features of the standard genetic code is its robustness to mutation, yet it remains an open question whether such robustness facilitates or frustrates protein evolvability. To answer this question, we use data from massively parallel sequence-to-function assays to construct and analyze 6 empirical adaptive landscapes under hundreds of thousands of rewired genetic codes, including those of codon compression schemes relevant to protein engineering and synthetic biology. We find that robust genetic codes tend to enhance protein evolvability by rendering smooth adaptive landscapes with few peaks, which are readily accessible from throughout sequence space. However, the standard genetic code is rarely exceptional in this regard, because many alternative codes render smoother landscapes than the standard code. By constructing low-dimensional visualizations of these landscapes, which each comprise more than 16 million mRNA sequences, we show that such alternative codes radically alter the topological features of the network of high-fitness genotypes. Whereas the genetic codes that optimize evolvability depend to some extent on the detailed relationship between amino acid sequence and protein function, we also uncover general design principles for engineering nonstandard genetic codes for enhanced and diminished evolvability, which may facilitate directed protein evolution experiments and the bio-containment of synthetic organisms, respectively.


Subject(s)
Evolution, Molecular , Genetic Code , Proteins , Proteins/genetics , Proteins/metabolism , Mutation/genetics , Codon/genetics , Models, Genetic , Synthetic Biology/methods , Protein Biosynthesis , Protein Engineering/methods
11.
BMC Med Genomics ; 17(1): 132, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755654

ABSTRACT

BACKGROUND: Polygenic risk scores (PRS) quantify an individual's genetic predisposition for different traits and are expected to play an increasingly important role in personalized medicine. A crucial challenge in clinical practice is the generalizability and transferability of PRS models to populations with different ancestries. When assessing the generalizability of PRS models for continuous traits, the R² is a commonly used measure to evaluate prediction accuracy. While the R² is a well-defined goodness-of-fit measure for statistical linear models, there exist different definitions for its application on test data, which complicates interpretation and comparison of results. METHODS: Based on large-scale genotype data from the UK Biobank, we compare three definitions of the R² on test data for evaluating the generalizability of PRS models to different populations. Polygenic models for several phenotypes, including height, BMI and lipoprotein A, are derived based on training data with European ancestry using state-of-the-art regression methods and are evaluated on various test populations with different ancestries. RESULTS: Our analysis shows that the choice of the R² definition can lead to considerably different results on test data, making the comparison of R² values from the literature problematic. While the definition as the squared correlation between predicted and observed phenotypes solely addresses the discriminative performance and always yields values between 0 and 1, definitions of the R² based on the mean squared prediction error (MSPE) with reference to intercept-only models assess both discrimination and calibration. These MSPE-based definitions can yield negative values indicating miscalibrated predictions for out-of-target populations. We argue that the choice of the most appropriate definition depends on the aim of PRS analysis - whether it primarily serves for risk stratification or also for individual phenotype prediction. Moreover, both correlation-based and MSPE-based definitions of R² can provide valuable complementary information. CONCLUSIONS: Awareness of the different definitions of the R² on test data is necessary to facilitate the reporting and interpretation of results on PRS generalizability. It is recommended to explicitly state which definition was used when reporting R² values on test data. Further research is warranted to develop and evaluate well-calibrated polygenic models for diverse populations.
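The difference between the correlation-based and MSPE-based definitions is easy to see on a toy example. The sketch below is one reading of the definitions named in the abstract: the squared correlation, and 1 − MSPE / MSPE₀, where MSPE₀ is the error of an intercept-only model. Whether the intercept is the training mean or the test mean is exposed as a parameter, because the abstract does not spell this out. Function names and data are illustrative assumptions.

```python
import numpy as np

def r2_corr(y, pred):
    """Squared Pearson correlation between observed and predicted values."""
    return np.corrcoef(y, pred)[0, 1] ** 2

def r2_mspe(y, pred, ref_mean):
    """1 - MSPE(model) / MSPE(intercept-only model predicting ref_mean)."""
    y, pred = np.asarray(y, float), np.asarray(pred, float)
    return 1.0 - np.mean((y - pred) ** 2) / np.mean((y - ref_mean) ** 2)

# Toy test set: predictions rank individuals perfectly but are shifted by +5,
# mimicking a score that is miscalibrated in a new population.
y_test = np.array([1.0, 2.0, 3.0, 4.0])
pred = y_test + 5.0

print(r2_corr(y_test, pred))                   # discrimination only
print(r2_mspe(y_test, pred, y_test.mean()))    # discrimination + calibration
```

The first measure rewards the perfect ranking, while the second is strongly negative because the shifted predictions are worse than simply predicting the mean.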


Subject(s)
Models, Genetic , Multifactorial Inheritance , Humans , Phenotype , Genetic Predisposition to Disease
12.
PLoS Biol ; 22(5): e3002627, 2024 May.
Article in English | MEDLINE | ID: mdl-38758732

ABSTRACT

The relationship between genetic code robustness and protein evolvability is unknown. A new study in PLOS Biology using in silico rewiring of genetic codes and functional protein data identified a positive correlation between code robustness and protein evolvability that is protein-specific.


Subject(s)
Evolution, Molecular , Genetic Code , Proteins , Proteins/genetics , Proteins/metabolism , Models, Genetic
13.
Genet Sel Evol ; 56(1): 38, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38750427

ABSTRACT

BACKGROUND: The accuracy of genomic prediction is partly determined by the size of the reference population. In Atlantic salmon breeding programs, four parallel populations often exist, thus offering the opportunity to increase the size of the reference set by combining these populations. By allowing a reduction in the number of records per population, multi-population prediction can potentially reduce cost and welfare issues related to the recording of traits, particularly for diseases. In this study, we evaluated the accuracy of multi- and across-population prediction of breeding values for resistance to amoebic gill disease (AGD) using all single nucleotide polymorphisms (SNPs) on a 55K chip or a selected subset of SNPs based on the signs of allele substitution effect estimates across populations, using both linear and nonlinear genomic prediction (GP) models in Atlantic salmon populations. In addition, we investigated genetic distance, genetic correlation estimated based on genomic relationships, and persistency of linkage disequilibrium (LD) phase across these populations. RESULTS: The genetic distance between populations ranged from 0.03 to 0.07, while the genetic correlation ranged from 0.19 to 0.99. Nonetheless, compared to within-population prediction, there was limited or no impact of combining populations for multi-population prediction across the various models used or when using the selected subset of SNPs. The estimates of across-population prediction accuracy were low and to some extent proportional to the genetic correlation estimates. The persistency of LD phase between adjacent markers across populations using all SNP data ranged from 0.51 to 0.65, indicating that LD is poorly conserved across the studied populations. CONCLUSIONS: Our results show that a high genetic correlation and a high genetic relationship between populations do not guarantee a higher prediction accuracy from multi-population genomic prediction in Atlantic salmon.
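Persistency of LD phase is commonly quantified as the Pearson correlation, across marker pairs, of the signed LD correlation r measured in two populations. The sketch below assumes that definition and phased 0/1 haplotypes (rows are haplotypes, columns are markers in map order). The study's exact pipeline is not given in the abstract, so the function names and simulated data are illustrative only.

```python
import numpy as np

def adjacent_r(H):
    """Signed LD correlation r between each pair of adjacent markers."""
    H = np.asarray(H, dtype=float)
    Hc = H - H.mean(axis=0)                       # centre each marker
    num = (Hc[:, :-1] * Hc[:, 1:]).sum(axis=0)
    den = np.sqrt((Hc[:, :-1] ** 2).sum(axis=0) * (Hc[:, 1:] ** 2).sum(axis=0))
    return num / den                              # NaN if a marker is monomorphic

def ld_phase_persistency(H_pop1, H_pop2):
    """Correlation of adjacent-marker r values between two populations."""
    return np.corrcoef(adjacent_r(H_pop1), adjacent_r(H_pop2))[0, 1]

# Simulated haplotypes: 40 haplotypes x 6 markers (illustration only).
rng = np.random.default_rng(0)
H1 = rng.integers(0, 2, size=(40, 6))
H2 = H1.copy()
H2[:, 1::2] = 1 - H2[:, 1::2]    # recode every second marker: phase reversed
print(ld_phase_persistency(H1, H1), ld_phase_persistency(H1, H2))
```

Values near 1 mean that the same alleles tend to be coupled in both populations, which is what allows marker effects estimated in one population to carry over to another.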


Subject(s)
Linkage Disequilibrium , Polymorphism, Single Nucleotide , Salmo salar , Animals , Salmo salar/genetics , Genomics/methods , Fish Diseases/genetics , Genetics, Population/methods , Models, Genetic , Breeding/methods , Genome , Disease Resistance/genetics
14.
Mol Biol Evol ; 41(5), 2024 May 03.
Article in English | MEDLINE | ID: mdl-38696269

ABSTRACT

This perspective article offers a meditation on FST and other quantities developed by Sewall Wright to describe the population structure, defined as any departure from reproduction through random union of gametes. Concepts related to the F-statistics draw from studies of the partitioning of variation, identity coefficients, and diversity measures. Relationships between the first two approaches have recently been clarified and unified. This essay addresses the third pillar of the discussion: Nei's GST and related measures. A hierarchy of probabilities of identity-by-state provides a description of the relationships among levels of a structured population with respect to genetic diversity. Explicit expressions for the identity-by-state probabilities are determined for models of structured populations undergoing regular inbreeding and recurrent mutation. Levels of genetic diversity within and between subpopulations reflect mutation as well as migration. Accordingly, indices of the population structure are inherently locus-specific, contrary to the intentions of Wright. Some implications of this locus-specificity are explored.


Subject(s)
Genetic Variation , Genetics, Population , Models, Genetic , Genetics, Population/methods , Mutation , Inbreeding
15.
Theor Appl Genet ; 137(6): 138, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38771334

ABSTRACT

KEY MESSAGE: Residual neural network genomic selection is the first GS algorithm to reach 35 layers, and its prediction accuracy surpasses previous algorithms. With the decrease in DNA sequencing costs and the development of deep learning, phenotype prediction accuracy by genomic selection (GS) continues to improve. Residual networks, a widely validated deep learning technique, are introduced to deep learning for GS. Since each locus has a different weighted impact on the phenotype, strided convolutions are more suitable for GS problems than pooling layers. Through the above technological innovations, we propose a GS deep learning algorithm, residual neural network for genomic selection (ResGS). ResGS is the first neural network to reach 35 layers in GS. In 15 cases from four public datasets, the prediction accuracy of ResGS is higher than that of ridge-regression best linear unbiased prediction, support vector regression, random forest, gradient boosting regressor, and deep neural network genomic prediction in most cases. ResGS performs well in dealing with gene-environment interaction. Phenotypes from other environments are imported into ResGS along with genetic data. The prediction results are much better than just providing genetic data as input, which demonstrates the effectiveness of GS multi-modal learning. Standard deviation is recommended as an auxiliary GS evaluation metric, which could improve the distribution of predicted results. Deep learning for GS, such as ResGS, is becoming more accurate in phenotype prediction.


Subject(s)
Algorithms , Genomics , Neural Networks, Computer , Phenotype , Genomics/methods , Models, Genetic , Deep Learning , Gene-Environment Interaction , Selection, Genetic
16.
Yi Chuan ; 46(5): 421-430, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38763776

ABSTRACT

The Inner Mongolia cashmere goat is an excellent livestock breed formed through long-term natural selection and artificial breeding, and is currently a world-class dual-purpose breed producing cashmere and meat. The multi-trait animal model is considered to significantly improve the accuracy of genetic evaluation in livestock and poultry, enabling indirect selection between traits. In this study, the pedigree, genotype, environment, and phenotypic records of early growth traits of Inner Mongolia cashmere goats were used to build a multi-trait animal model. Then three methods (ABLUP, GBLUP, and ssGBLUP) were used to estimate the genetic parameters and genomic breeding values of early growth traits (birth weight, weaning weight, average daily weight gain before weaning, and yearling weight). The accuracy and reliability of the genomic estimated breeding values were further evaluated using five-fold cross-validation. The results showed that the heritability of birth weight estimated by the three methods was 0.13-0.15, the heritability of weaning weight was 0.13-0.20, the heritability of daily weight gain before weaning was 0.11-0.14, and the heritability of yearling weight was 0.09-0.14, all of which belonged to moderate to low heritability. There was a strong positive genetic correlation between weaning weight and daily weight gain before weaning, and between daily weight gain before weaning and yearling weight, with correlation coefficients of 0.77-0.79 and 0.56-0.67, respectively. The same pattern was found in the phenotypic correlations among traits. The accuracy of the breeding values estimated by the ABLUP, GBLUP, and ssGBLUP methods was 0.5047, 0.6694, and 0.7156, respectively, for birth weight; 0.6207, 0.6456, and 0.7254, respectively, for weaning weight; 0.6110, 0.6855, and 0.7357, respectively, for daily weight gain before weaning; and 0.6209, 0.7155, and 0.7756, respectively, for yearling weight. In summary, the early growth traits of Inner Mongolia cashmere goats belong to moderate to low heritability, and the speed of genetic improvement is relatively slow. The genetic improvement of other growth traits can be achieved through the selection of weaning weight. The ssGBLUP method has the highest accuracy and reliability in estimating genomic breeding values of early growth traits in Inner Mongolia cashmere goats, significantly higher than that of the ABLUP method, indicating that it is the best method for genomic breeding of early growth weight in Inner Mongolia cashmere goats.


Subject(s)
Breeding , Goats , Animals , Goats/genetics , Goats/growth & development , Phenotype , Genomics/methods , Female , Male , Birth Weight/genetics , Models, Genetic
17.
Sci Rep ; 14(1): 11314, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38760507

ABSTRACT

This paper focuses on the maximum speed at which biological evolution can occur. I derive inequalities that limit the rate of evolutionary processes driven by natural selection, mutations, or genetic drift. These rate limits link the variability in a population to evolutionary rates. In particular, high variances in the fitness of a population and of a quantitative trait allow for fast changes in the trait's average. In contrast, low variability makes a trait less susceptible to random changes due to genetic drift. The results in this article generalize Fisher's fundamental theorem of natural selection to dynamics that allow for mutations and genetic drift, via trade-off relations that constrain the evolutionary rates of arbitrary traits. The rate limits can be used to probe questions in various evolutionary biology and ecology settings. They apply, for instance, to trait dynamics within or across species or to the evolution of bacteria strains. They apply to any quantitative trait, e.g., from species' weights to the lengths of DNA strands.
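The link between variability and the speed of change can be illustrated with a discrete-generation analogue, which is not the paper's own derivation. Under selection alone, the Price equation gives the change in a trait's mean as Cov(w, z) / w̄, and the Cauchy-Schwarz inequality then bounds its magnitude by σ_w σ_z / w̄. The function name, type frequencies, fitnesses and trait values below are made-up assumptions.

```python
import numpy as np

def selection_response(freq, fitness, trait):
    """Return (delta, price, bound) for one generation of selection.

    delta : change in the mean trait computed from the updated frequencies
    price : Cov(w, z) / mean fitness (selection term of the Price equation)
    bound : sd(w) * sd(z) / mean fitness (Cauchy-Schwarz upper limit on |delta|)
    """
    p = np.asarray(freq, float)
    w = np.asarray(fitness, float)
    z = np.asarray(trait, float)
    w_bar, z_bar = p @ w, p @ z
    delta = (p * w) @ z / w_bar - z_bar           # mean after minus mean before
    cov_wz = p @ (w * z) - w_bar * z_bar
    sd_w = np.sqrt(p @ w**2 - w_bar**2)
    sd_z = np.sqrt(p @ z**2 - z_bar**2)
    return delta, cov_wz / w_bar, sd_w * sd_z / w_bar

delta, price, bound = selection_response(
    freq=[0.25, 0.25, 0.25, 0.25],
    fitness=[1.0, 1.5, 0.5, 2.0],
    trait=[1.0, 2.0, 3.0, 4.0],
)
print(delta, price, bound)
```

If either the fitness variance or the trait variance is zero, the bound is zero and selection cannot move the mean, which matches the qualitative statement that high variances permit fast change.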


Subject(s)
Biological Evolution , Genetic Drift , Selection, Genetic , Mutation , Models, Genetic , Phenotype , Quantitative Trait, Heritable , Evolution, Molecular
18.
Bull Math Biol ; 86(6): 70, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38717656

ABSTRACT

Practical limitations of quality and quantity of data can limit the precision of parameter identification in mathematical models. Model-based experimental design approaches have been developed to minimise parameter uncertainty, but the majority of these approaches have relied on first-order approximations of model sensitivity at a local point in parameter space. Practical identifiability approaches such as profile-likelihood have shown potential for quantifying parameter uncertainty beyond linear approximations. This research presents a genetic algorithm approach to optimise sample timing across various parameterisations of a demonstrative PK-PD model with the goal of aiding experimental design. The optimisation relies on a chosen metric of parameter uncertainty that is based on the profile-likelihood method. Additionally, the approach considers cases where multiple parameter scenarios may require simultaneous optimisation. The genetic algorithm approach was able to locate near-optimal sampling protocols for a wide range of sample number (n = 3-20), and it reduced the parameter variance metric by 33-37% on average. The profile-likelihood metric also correlated well with an existing Monte Carlo-based metric (with a worst-case r > 0.89), while reducing computational cost by an order of magnitude. The combination of the new profile-likelihood metric and the genetic algorithm demonstrate the feasibility of considering the nonlinear nature of models in optimal experimental design at a reasonable computational cost. The outputs of such a process could allow for experimenters to either improve parameter certainty given a fixed number of samples, or reduce sample quantity while retaining the same level of parameter certainty.


Subject(s)
Algorithms , Computer Simulation , Mathematical Concepts , Models, Biological , Monte Carlo Method , Likelihood Functions , Humans , Dose-Response Relationship, Drug , Research Design/statistics & numerical data , Models, Genetic , Uncertainty
19.
Cladistics ; 40(3): 242-281, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38728134

ABSTRACT

Although simulations have shown that implied weighting (IW) outperforms equal weighting (EW) in phylogenetic parsimony analyses, weighting against homoplasy lacks extensive usage in palaeontology. Iterative modifications of several phylogenetic matrices in the last decades resulted in extensive genealogies of datasets that allow the evaluation of differences in the stability of results for alternative character weighting methods directly on empirical data. Each generation was compared against the most recent generation in each genealogy because the latter is assumed to be the most comprehensive (higher sampling), revised (fewer misscorings) and complete (lower amount of missing data) matrix of the genealogy. The analyses were conducted on six different genealogies under EW and IW and extended implied weighting (EIW) with a range of concavity constant values (k) between 3 and 30. Pairwise comparisons between trees were conducted using Robinson-Foulds distances normalized by the total number of groups, distortion coefficient, subtree pruning and regrafting moves, and the proportional sum of group dissimilarities. The results consistently show that IW and EIW produce results more similar to those of the last dataset than EW in the vast majority of genealogies and for all comparative measures. This is significant because almost all of these matrices were originally analysed only under EW. Neither IW nor EIW unambiguously outperforms the other. Euclidean distances based on a principal components analysis of the comparative measures show that different ranges of k-values retrieve the most similar results to the last generation in different genealogies. There is a significant positive linear correlation between the optimal k-values and the number of terminals of the last generations. This could be employed to inform the range of k-values to be used in phylogenetic analyses based on matrix size, with the caveat that this emergent relationship still relies on a low sample size of genealogies.


Subject(s)
Paleontology , Phylogeny , Animals , Models, Genetic , Computer Simulation , Fossils
20.
PLoS Genet ; 20(5): e1011245, 2024 May.
Article in English | MEDLINE | ID: mdl-38728360

ABSTRACT

Joint analysis of multiple correlated phenotypes for genome-wide association studies (GWAS) can identify and interpret pleiotropic loci, which are essential to understanding pleiotropy in diseases and complex traits. Meanwhile, constructing a network based on associations between phenotypes and genotypes provides a new way to analyze multiple phenotypes, which can explore whether phenotypes and genotypes might be related to each other at a higher level of cellular and organismal organization. In this paper, we first develop a bipartite signed network by linking phenotypes and genotypes into a Genotype and Phenotype Network (GPN). The GPN can be constructed from a mixture of quantitative and qualitative phenotypes and is applicable to binary phenotypes with extremely unbalanced case-control ratios in large-scale biobank datasets. We then apply a powerful community detection method to partition phenotypes into disjoint network modules based on GPN. Finally, we jointly test the association between multiple phenotypes in a network module and a single nucleotide polymorphism (SNP). Simulations and analyses of 72 complex traits in the UK Biobank show that multiple phenotype association tests based on network modules detected by GPN are much more powerful than those without considering network modules. The newly proposed GPN provides a new way to investigate the genetic architecture among different types of phenotypes. Multiple-phenotype association studies based on GPN are improved by incorporating the genetic information into the phenotype clustering. Notably, it might broaden the understanding of genetic architecture that exists between diagnoses, genes, and pleiotropy.


Subject(s)
Genome-Wide Association Study , Genotype , Phenotype , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide/genetics , Models, Genetic , Genetic Pleiotropy , Genetic Association Studies/methods , Quantitative Trait Loci/genetics