2.
Sci Rep ; 13(1): 11662, 2023 07 19.
Article in English | MEDLINE | ID: mdl-37468507

ABSTRACT

In this paper we characterize the performance of linear models trained via widely used sparse machine learning algorithms. We build polygenic scores and examine performance as a function of training set size, genetic ancestral background, and training method. We show that predictor performance depends most strongly on the size of the training data, with smaller gains from algorithmic improvements. We find that LASSO generally performs as well as the best methods, judged by a variety of metrics. We also investigate the performance of predictors trained on one genetic ancestry group when applied to another. Using LASSO, we develop a novel method for projecting AUC and correlation as a function of data size (i.e., for new biobanks) and characterize the asymptotic limit of performance. Additionally, for LASSO (compressed sensing) we show that performance metrics and predictor sparsity agree with theoretical predictions from the Donoho-Tanner phase transition. For example, we project that a future predictor trained in the Taiwan Precision Medicine Initiative can achieve, for a Taiwanese population, an AUC of [Formula: see text] for asthma and a correlation of [Formula: see text] for height. These values exceed the measured values of [Formula: see text] and [Formula: see text], respectively, for UK Biobank-trained predictors applied to a European population.


Subject(s)
Asthma , Biological Specimen Banks , Humans , Machine Learning , Forecasting , Algorithms
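To make the kind of sparse linear predictor discussed above concrete, here is a minimal LASSO fit by cyclic coordinate descent in pure Python. It is a textbook sketch under stated assumptions, not the authors' training pipeline: the toy genotype dosages, the three causal SNPs, their effect sizes, and the penalty lam are all hypothetical, and a biobank-scale fit would use an optimized solver.

```python
import random
from typing import List, Tuple


def soft_threshold(z: float, lam: float) -> float:
    """Proximal operator of the L1 penalty: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float,
             max_iter: int = 1000, tol: float = 1e-8) -> Tuple[float, List[float]]:
    """Minimise (1/2n)||y - b0 - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Genotype columns and y are centred, so the intercept b0 is recovered at the end.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Column-major, centred genotype matrix.
    cols = [[X[i][j] - col_means[j] for i in range(n)] for j in range(p)]
    sq = [sum(v * v for v in c) / n for c in cols]
    beta = [0.0] * p
    resid = [yi - y_mean for yi in y]  # centred y minus X_c @ beta, with beta = 0
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if sq[j] == 0.0:
                continue  # monomorphic SNP carries no information
            c = cols[j]
            # Inner product of column j with the partial residual (beta_j added back).
            rho = sum(c[i] * resid[i] for i in range(n)) / n + sq[j] * beta[j]
            new = soft_threshold(rho, lam) / sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= c[i] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    intercept = y_mean - sum(m * b for m, b in zip(col_means, beta))
    return intercept, beta


# Toy demonstration on simulated dosages (hypothetical data, noise-free phenotype).
rng = random.Random(0)
n_ind, n_snp = 200, 20
X_demo = [[float(rng.choice([0, 1, 2])) for _ in range(n_snp)] for _ in range(n_ind)]
true_effects = {0: 1.0, 5: -0.8, 12: 0.5}
y_demo = [sum(b * row[j] for j, b in true_effects.items()) for row in X_demo]
b0, beta = lasso_cd(X_demo, y_demo, lam=0.01)
print("nonzero SNPs:", [j for j, b in enumerate(beta) if b != 0.0])
```

Raising lam drives more coefficients exactly to zero, which is the sparsity property that links LASSO to compressed sensing.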
3.
Sci Rep ; 13(1): 376, 2023 01 07.
Article in English | MEDLINE | ID: mdl-36611071

ABSTRACT

We use UK Biobank data and a unique IVF family dataset (including genotyped embryos) to investigate sibling variation in both phenotype and genotype. We compare phenotype (disease status, height, blood biomarkers) and genotype (polygenic scores, polygenic health index) distributions among siblings to those in the general population. As expected, the between-sibling standard deviation in polygenic scores is [Formula: see text] times smaller than in the general population, but the variation is still significant. As previously demonstrated, this variation allows substantial benefit from polygenic screening in IVF. Differences in sibling genotypes result from distinct recombination patterns in sexual reproduction. We develop a novel sibling-pair method for detecting recombination breaks via statistical discontinuities and use it to construct a dataset of 1.44 million recombination events, which may be useful in further study of meiosis.


Subject(s)
Multifactorial Inheritance , Siblings , Humans , Multifactorial Inheritance/genetics , Biological Specimen Banks , Genotype , Phenotype , Recombination, Genetic , United Kingdom/epidemiology , DNA , Fertilization in Vitro , Genome-Wide Association Study
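The abstract does not spell out the sibling-pair detection method, so the sketch below swaps in a simple, generic change-point heuristic to illustrate the idea of finding recombination breaks as discontinuities. It assumes a hypothetical input: a per-marker 0/1 indicator of whether two siblings inherited the same haplotype from one parent, with occasional genotyping errors. A majority vote in a sliding window suppresses isolated errors, and a break is reported wherever the smoothed state flips.

```python
from typing import List


def smooth_states(shared: List[int], window: int) -> List[int]:
    """Majority vote of the 0/1 sharing indicator in a centred window (window should be odd)."""
    n = len(shared)
    half = window // 2
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        seg = shared[lo:hi]
        out.append(1 if 2 * sum(seg) > len(seg) else 0)
    return out


def find_breaks(shared: List[int], window: int = 11) -> List[int]:
    """Marker indices where the smoothed sharing state changes (candidate recombination breaks)."""
    states = smooth_states(shared, window)
    return [i for i in range(1, len(states)) if states[i] != states[i - 1]]


# Toy example (hypothetical): sharing switches at markers 50 and 100,
# plus three isolated genotyping errors that the smoothing should ignore.
demo = [1] * 50 + [0] * 50 + [1] * 50
for k in (10, 70, 120):
    demo[k] = 1 - demo[k]
breaks = find_breaks(demo, window=11)
print("estimated breakpoints:", breaks)
```

A switch in the sharing state means one of the two siblings carries a crossover at that position; attributing it to a specific sibling requires additional family data, which this sketch does not attempt.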
5.
Sci Rep ; 12(1): 18173, 2022 10 28.
Article in English | MEDLINE | ID: mdl-36307513

ABSTRACT

We construct a polygenic health index as a weighted sum of polygenic risk scores for 20 major disease conditions, including coronary artery disease, type 1 and type 2 diabetes, and schizophrenia. Individual weights are determined by population-level estimates of each condition's impact on life expectancy. We validate this index using odds ratios and selection experiments on unrelated individuals and siblings (pairs and trios) from the UK Biobank. Individuals with higher index scores have lower disease risk across almost all 20 diseases (with no significant risk increases) and longer calculated life expectancy. When estimated Disability Adjusted Life Years (DALYs) are used as the performance metric, the gain from selection among ten individuals (highest index score vs. average) is roughly 4 DALYs. We find no statistical evidence for antagonistic trade-offs in risk reduction across these diseases. Correlations between genetic disease risks are mostly positive and generally mild. These results have important implications for public health and for fundamental issues such as pleiotropy and the genetic architecture of human disease conditions.


Subject(s)
Diabetes Mellitus, Type 1 , Diabetes Mellitus, Type 2 , Humans , Siblings , Multifactorial Inheritance , Life Expectancy , Risk Reduction Behavior , Risk Factors
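The index construction and the "highest vs. average" selection gain can be sketched in a few lines. This is a minimal illustration, not the published index: only three conditions are used, the weights are hypothetical placeholders (life-years per 1-SD increase in a standardized PRS), and the sign is chosen so that a higher index means lower predicted risk, matching the abstract's description.

```python
from typing import Dict, List

# Hypothetical weights: life-years lost per 1-SD increase in each disease PRS.
# Placeholder values for illustration only, NOT the published weights.
WEIGHTS: Dict[str, float] = {"CAD": 0.8, "T2D": 0.5, "schizophrenia": 0.3}


def health_index(prs: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Negative weighted sum of standardised PRS: higher index = lower predicted risk."""
    return -sum(w * prs[d] for d, w in weights.items())


def select_best(candidates: List[Dict[str, float]]) -> int:
    """Position of the candidate with the highest index."""
    return max(range(len(candidates)), key=lambda i: health_index(candidates[i]))


def selection_gain(candidates: List[Dict[str, float]]) -> float:
    """Index of the selected candidate minus the mean index of all candidates."""
    scores = [health_index(c) for c in candidates]
    return max(scores) - sum(scores) / len(scores)


# Hypothetical standardised PRS values for three candidates.
cands = [
    {"CAD": 1.0, "T2D": 0.0, "schizophrenia": 0.0},
    {"CAD": -1.0, "T2D": 0.5, "schizophrenia": 0.0},
    {"CAD": 0.0, "T2D": -0.5, "schizophrenia": 1.0},
]
print(select_best(cands), selection_gain(cands))
```

With ten candidates instead of three, selection_gain is the quantity analogous to the "highest index score vs. average" comparison in the abstract, expressed here in the units of the hypothetical weights.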
6.
Methods Mol Biol ; 2467: 421-446, 2022.
Article in English | MEDLINE | ID: mdl-35451785

ABSTRACT

Decoding the genome makes it possible to predict characteristics of the organism (phenotype) from DNA (genotype). We describe the present status and future prospects of genomic prediction of complex traits in humans. Some highly heritable complex phenotypes such as height and other quantitative traits can already be predicted with reasonable accuracy from DNA alone. For many diseases, including important common conditions such as coronary artery disease, breast cancer, type I and II diabetes, individuals with outlier polygenic scores (e.g., top few percent) have been shown to have 5 or even 10 times higher risk than average. Several psychiatric conditions such as schizophrenia and autism also fall into this category. We discuss related topics such as the genetic architecture of complex traits, sibling validation of polygenic scores, and applications to adult health, in vitro fertilization (embryo selection), and genetic engineering.


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Genomics , Genotype , Humans , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide
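Why outlier polygenic scores can carry several-fold risk is easiest to see under the standard liability-threshold model, which is swapped in here for illustration and is not necessarily the model used in the chapter. The sketch assumes a standard-normal liability, a disease prevalence K, and a PGS explaining a fraction r2 of liability variance (r2 < 1); the parameter values in any call are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def risk_given_score(s: float, prevalence: float, r2: float) -> float:
    """Absolute risk for standardised PGS value s under a liability-threshold model.

    Liability = sqrt(r2) * s + sqrt(1 - r2) * noise; disease if liability exceeds the threshold.
    """
    threshold = STD.inv_cdf(1.0 - prevalence)
    return 1.0 - STD.cdf((threshold - r2 ** 0.5 * s) / (1.0 - r2) ** 0.5)


def relative_risk_at_percentile(pct: float, prevalence: float, r2: float) -> float:
    """Risk at a given PGS percentile (0 < pct < 1) relative to the population prevalence."""
    s = STD.inv_cdf(pct)
    return risk_given_score(s, prevalence, r2) / prevalence


for r2 in (0.05, 0.10, 0.20):
    print(r2, round(relative_risk_at_percentile(0.99, prevalence=0.05, r2=r2), 2))
```

Relative risk at a fixed top percentile grows as the score explains more liability variance, and it is larger for rarer diseases, which is why outlier scores for common conditions can reach the multi-fold risks quoted above.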
7.
Genes (Basel) ; 12(7)2021 06 29.
Article in English | MEDLINE | ID: mdl-34209487

ABSTRACT

We use UK Biobank data to train predictors from SNP genotype for 65 blood and urine markers, such as HDL, LDL, lipoprotein A, and glycated haemoglobin. For example, our Polygenic Score (PGS) predictor correlates ∼0.76 with lipoprotein A level, which is highly heritable and an independent risk factor for heart disease. This may be the most accurate genomic prediction of a quantitative trait yet produced (specifically, for European ancestry groups). We also train predictors of common disease risk using blood and urine biomarkers alone (no DNA information); we call these predictors biomarker risk scores (BMRS). Individuals at high risk (e.g., odds ratio >5× the population average) can be identified using biomarkers alone for conditions such as coronary artery disease (AUC∼0.75), diabetes (AUC∼0.95), hypertension, liver and kidney problems, and cancer. Our atherosclerotic cardiovascular disease (ASCVD) predictor uses ∼10 biomarkers and, in UKB evaluation, performs as well as or better than the American College of Cardiology ASCVD Risk Estimator, which uses quite different inputs (age, diagnostic history, BMI, smoking status, statin usage, etc.). We compare polygenic risk scores (risk conditional on genotype: PRS) for common diseases to the risk predictors that result from concatenating the learned functions BMRS and PGS, i.e., applying the BMRS predictors to the PGS output.


Subject(s)
Atherosclerosis/epidemiology , Biomarkers/blood , Biomarkers/urine , Cardiovascular Diseases/epidemiology , Lipoprotein(a)/blood , Adult , Atherosclerosis/blood , Atherosclerosis/urine , Biological Specimen Banks , Calcium/blood , Calcium/urine , Cardiovascular Diseases/blood , Female , Heart Disease Risk Factors , Hemoglobins/genetics , Humans , Lipoproteins, HDL/blood , Lipoproteins, LDL/blood , Machine Learning , Male , Middle Aged , Multifactorial Inheritance/genetics , Risk Assessment , United Kingdom/epidemiology , United States/epidemiology
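The BMRS-after-PGS construction is function composition: genotype goes to predicted biomarker levels, which then go to a disease probability. The sketch below assumes a linear form for each PGS and a logistic form for the BMRS; every biomarker name, weight, and bias is a hypothetical placeholder, not a trained model. A Mann-Whitney style AUC is included because that is the metric quoted for the biomarker predictors.

```python
import math
from typing import Callable, Dict, List, Sequence

Genotype = Sequence[int]       # SNP dosages (0, 1, 2)
Biomarkers = Dict[str, float]  # biomarker name -> level


def make_pgs(weights: Dict[str, List[float]],
             intercepts: Dict[str, float]) -> Callable[[Genotype], Biomarkers]:
    """Linear PGS predictors (assumed form): one weight vector per biomarker."""
    def pgs(g: Genotype) -> Biomarkers:
        return {m: intercepts[m] + sum(w * x for w, x in zip(ws, g))
                for m, ws in weights.items()}
    return pgs


def make_bmrs(coefs: Dict[str, float], bias: float) -> Callable[[Biomarkers], float]:
    """Logistic BMRS (assumed form): disease probability from biomarker levels."""
    def bmrs(b: Biomarkers) -> float:
        z = bias + sum(c * b[m] for m, c in coefs.items())
        return 1.0 / (1.0 + math.exp(-z))
    return bmrs


def compose(bmrs: Callable[[Biomarkers], float],
            pgs: Callable[[Genotype], Biomarkers]) -> Callable[[Genotype], float]:
    """Genetic risk predictor obtained by applying the BMRS to the PGS output."""
    return lambda g: bmrs(pgs(g))


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Probability that a random case outscores a random control (ties count one half)."""
    wins = 0.0
    for a in case_scores:
        for b in control_scores:
            wins += 1.0 if a > b else (0.5 if a == b else 0.0)
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical three-SNP models for illustration only.
pgs = make_pgs({"LDL": [0.2, 0.1, 0.0], "HbA1c": [0.0, 0.1, 0.3]},
               {"LDL": 3.0, "HbA1c": 5.0})
bmrs = make_bmrs({"LDL": 0.5, "HbA1c": 0.4}, bias=-4.0)
genetic_risk = compose(bmrs, pgs)
risk_low, risk_high = genetic_risk((0, 0, 0)), genetic_risk((2, 2, 2))
print(risk_low, risk_high)
```

Because the composed predictor only sees genotype, it can be compared directly with a PRS trained on disease status, which is the comparison the abstract describes.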