1.
J Am Med Inform Assoc ; 31(4): 846-854, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38263490

ABSTRACT

IMPORTANCE: Knowledge gained from cohort studies has dramatically advanced both public and precision health. The All of Us Research Program seeks to enroll 1 million diverse participants who share multiple sources of data, providing unique opportunities for research. It is important to understand the phenomic profiles of its participants to conduct research in this cohort. OBJECTIVES: More than 280 000 participants have shared their electronic health records (EHRs) in the All of Us Research Program. We aim to understand the phenomic profiles of this cohort through comparisons with those in the US general population and a well-established nation-wide cohort, UK Biobank, and to test whether association results of selected commonly studied diseases in the All of Us cohort were comparable to those in UK Biobank. MATERIALS AND METHODS: We included participants with EHRs in All of Us and participants with health records from UK Biobank. The estimates of prevalence of diseases in the US general population were obtained from the Global Burden of Diseases (GBD) study. We conducted phenome-wide association studies (PheWAS) of 9 commonly studied diseases in both cohorts. RESULTS: This study included 287 012 participants from the All of Us EHR cohort and 502 477 participants from the UK Biobank. A total of 314 diseases curated by the GBD were evaluated in All of Us, 80.9% (N = 254) of which were more common in All of Us than in the US general population [prevalence ratio (PR) >1.1, P < 2 × 10⁻⁵]. Among 2515 diseases and phenotypes evaluated in both All of Us and UK Biobank, 85.6% (N = 2152) were more common in All of Us (PR >1.1, P < 2 × 10⁻⁵). The Pearson correlation coefficients of effect sizes from PheWAS between All of Us and UK Biobank were 0.61, 0.50, 0.60, 0.57, 0.40, 0.53, 0.46, 0.47, and 0.24 for ischemic heart diseases, lung cancer, chronic obstructive pulmonary disease, dementia, colorectal cancer, lower back pain, multiple sclerosis, lupus, and cystic fibrosis, respectively. DISCUSSION: Despite the differences in prevalence of diseases in All of Us compared to the US general population or the UK Biobank, our study supports that All of Us can facilitate rapid investigation of a broad range of diseases. CONCLUSION: Most diseases were more common in All of Us than in the general US population or the UK Biobank. Results of disease-disease association tests from All of Us are comparable to those estimated in another well-studied national cohort.
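
Illustrative note (not part of the record): the two summary quantities this abstract reports, a prevalence ratio between cohorts and a Pearson correlation between paired effect sizes, can be sketched as below. The function names and the toy inputs in the usage note are hypothetical, and the sketch makes no attempt to reproduce the study's adjustment or significance testing.

```python
import math


def prevalence_ratio(cases_a, n_a, cases_b, n_b):
    """Prevalence in cohort A divided by prevalence in cohort B."""
    return (cases_a / n_a) / (cases_b / n_b)


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences of effect sizes."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

For example, under these assumptions `prevalence_ratio(30, 1000, 20, 1000)` is approximately 1.5, which would exceed the PR > 1.1 cutoff quoted in the abstract.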


Subject(s)
Phenomics , Population Health , Humans , Biological Specimen Banks , UK Biobank , Phenotype , United Kingdom/epidemiology
2.
J Am Med Inform Assoc ; 30(3): 427-437, 2023 02 16.
Article in English | MEDLINE | ID: mdl-36474423

ABSTRACT

OBJECTIVE: The aim of this study was to analyze a publicly available sample of rule-based phenotype definitions to characterize and evaluate the variability of logical constructs used. MATERIALS AND METHODS: A sample of 33 preexisting phenotype definitions used in research that are represented using Fast Healthcare Interoperability Resources and Clinical Quality Language (CQL) was analyzed using automated analysis of the computable representation of the CQL libraries. RESULTS: Most of the phenotype definitions include narrative descriptions and flowcharts, while few provide pseudocode or executable artifacts. Most use 4 or fewer medical terminologies. The number of codes used ranges from 5 to 6865, and value sets from 1 to 19. We found that the most common expressions used were literal, data, and logical expressions. Aggregate and arithmetic expressions are the least common. Expression depth ranges from 4 to 27. DISCUSSION: Despite the range of conditions, we found that all of the phenotype definitions consisted of logical criteria, representing both clinical and operational logic, and tabular data, consisting of codes from standard terminologies and keywords for natural language processing. The total number and variety of expressions are low, which may be to simplify implementation, or authors may limit complexity due to data availability constraints. CONCLUSIONS: The phenotype definitions analyzed show significant variation in specific logical, arithmetic, and other operators but are all composed of the same high-level components, namely tabular data and logical expressions. A standard representation for phenotype definitions should support these formats and be modular to support localization and shared logic.


Subject(s)
Electronic Health Records , Language , Phenotype , Narration
3.
Genet Med ; 24(5): 1130-1138, 2022 05.
Article in English | MEDLINE | ID: mdl-35216901

ABSTRACT

PURPOSE: The goal of Electronic Medical Records and Genomics (eMERGE) Phase III Network was to return actionable sequence variants to 25,084 consenting participants from 10 different health care institutions across the United States. The purpose of this study was to evaluate system-based issues relating to the return of results (RoR) disclosure process for clinical grade research genomic tests to eMERGE3 participants. METHODS: RoR processes were developed and approved by each eMERGE institution's internal review board. Investigators at each eMERGE3 site were surveyed for RoR processes related to the participant's disclosure of pathogenic or likely pathogenic variants and engagement with genetic counseling. Standard statistical analysis was performed. RESULTS: Of the 25,084 eMERGE participants, 1444 had a pathogenic or likely pathogenic variant identified on the eMERGEseq panel of 67 genes and 14 single nucleotide variants. Of these, 1077 (74.6%) participants had results disclosed, with 562 (38.9%) participants provided with variant-specific genetic counseling. Site-specific processes that either offered or required genetic counseling in their RoR process had an effect on whether a participant ultimately engaged with genetic counseling (P = .0052). CONCLUSION: The real-life experience of the multiarm eMERGE3 RoR study for returning actionable genomic results to consented research participants showed the impact of consent, method of disclosure, and genetic counseling on RoR.


Subject(s)
Genome , Genomics , Disclosure , Genetic Counseling , Humans , Population Groups
4.
Hum Genet ; 141(11): 1739-1748, 2022 Nov.
Article in English | MEDLINE | ID: mdl-35226188

ABSTRACT

Uterine fibroids (UF) are common, heritable pelvic tumors in women, and genome-wide association studies (GWAS) have identified ~30 loci associated with increased risk of UF. Using summary statistics from a previously published UF GWAS performed in a non-Hispanic European Ancestry (NHW) female subset from the Electronic Medical Records and Genomics (eMERGE) Network, we constructed a polygenic risk score (PRS) for UF. UF-PRS was developed using PRSice and optimized in the separate clinical population of BioVU. PRS was validated using parallel methods of 10-fold cross-validation logistic regression and phenome-wide association study (PheWAS) in a separate subset of eMERGE NHW females (validation set), excluding samples used in GWAS. PRSice determined a p-value threshold of pT < 0.001; after linkage disequilibrium pruning (r² < 0.2), 4458 variants were included in the PRS, which was significant (pseudo-R² = 0.0018, p = 0.041). 10-fold cross-validation logistic regression modeling of the validation set revealed that the model had an area under the curve (AUC) value of 0.60 (95% confidence interval [CI] 0.58-0.62) when plotted as a receiver operating characteristic (ROC) curve. PheWAS identified six phecodes associated with the PRS, with the most significant phenotypes being 218 'benign neoplasm of uterus' and 218.1 'uterine leiomyoma' (p = 1.94 × 10⁻²³, OR 1.31 [95% CI 1.26-1.37] and p = 3.50 × 10⁻²³, OR 1.32 [95% CI 1.26-1.37]). We have developed and validated the first PRS for UF. We find our PRS has predictive ability for UF and captures genetic architecture of increased risk for UF that can be used in further studies.
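
Illustrative note (not part of the record): a polygenic risk score is, at its core, a weighted sum of allele dosages, and an AUC can be read as the probability that a randomly chosen case scores higher than a randomly chosen control. The sketch below shows only those two steps. It is not PRSice: it performs no thresholding or linkage disequilibrium pruning, and the variant IDs and weights used in the usage note are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Sum of effect-size weights times allele dosages (0, 1, or 2).

    Variants without a weight are ignored, as if they had been
    filtered out before scoring.
    """
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


def auc(case_scores, control_scores):
    """Share of case/control pairs in which the case scores higher.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

With hypothetical weights `{"rsA": 0.5, "rsB": -0.25}` and dosages `{"rsA": 2, "rsB": 1, "rsC": 2}`, the score is 0.75. An AUC of 0.5 corresponds to no discrimination, which gives a reference point for the 0.60 reported above.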


Subject(s)
Genome-Wide Association Study , Leiomyoma , Female , Genetic Predisposition to Disease , Genomics , Humans , Leiomyoma/genetics , Linkage Disequilibrium , Risk Factors
5.
Nat Commun ; 12(1): 168, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33420026

ABSTRACT

Increasingly, clinical phenotypes with matched genetic data from bio-bank linked electronic health records (EHRs) have been used for pleiotropy analyses. Thus far, pleiotropy analysis using individual-level EHR data has been limited to data from one site. However, it is desirable to integrate EHR data from multiple sites to improve the detection power and generalizability of the results. Due to privacy concerns, individual-level patients' data are not easily shared across institutions. As a result, we introduce Sum-Share, a method designed to efficiently integrate EHR and genetic data from multiple sites to perform pleiotropy analysis. Sum-Share requires only summary-level data and one round of communication from each site, yet it produces identical test statistics compared with that of pooled individual-level data. Consequently, Sum-Share can achieve lossless integration of multiple datasets. Using real EHR data from eMERGE, Sum-Share is able to identify 1734 potential pleiotropic SNPs for five cardiovascular diseases.
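
Illustrative note (not part of the record): the idea that summary-level data can reproduce a pooled analysis exactly can be seen in a far simpler setting than the one this abstract addresses. The sketch below is not the Sum-Share algorithm. It assumes ordinary least squares with one predictor, where each site shares only five sums in a single round, and the `SiteSummary` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Sufficient statistics for y ~ x; no individual-level rows."""
    n: int
    sum_x: float
    sum_y: float
    sum_xx: float
    sum_xy: float


def summarize(xs, ys):
    """Computed locally at each site."""
    return SiteSummary(
        n=len(xs),
        sum_x=sum(xs),
        sum_y=sum(ys),
        sum_xx=sum(x * x for x in xs),
        sum_xy=sum(x * y for x, y in zip(xs, ys)),
    )


def combine(summaries):
    """Computed centrally: sufficient statistics simply add up."""
    return SiteSummary(
        n=sum(s.n for s in summaries),
        sum_x=sum(s.sum_x for s in summaries),
        sum_y=sum(s.sum_y for s in summaries),
        sum_xx=sum(s.sum_xx for s in summaries),
        sum_xy=sum(s.sum_xy for s in summaries),
    )


def slope_intercept(s):
    """Least-squares fit recovered from the summed statistics alone."""
    denom = s.n * s.sum_xx - s.sum_x ** 2
    slope = (s.n * s.sum_xy - s.sum_x * s.sum_y) / denom
    intercept = (s.sum_y - slope * s.sum_x) / s.n
    return slope, intercept
```

Combining the per-site summaries gives the same fit as summarizing the pooled rows, because every term in the estimator is a sum over individuals.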


Subject(s)
Electronic Health Records/statistics & numerical data , Genetic Pleiotropy , Communication , Databases, Factual , Genome-Wide Association Study/statistics & numerical data , Humans , Models, Biological , Phenotype , Polymorphism, Single Nucleotide , Privacy
6.
Circulation ; 142(17): 1633-1646, 2020 10 27.
Article in English | MEDLINE | ID: mdl-32981348

ABSTRACT

BACKGROUND: Abdominal aortic aneurysm (AAA) is an important cause of cardiovascular mortality; however, its genetic determinants remain incompletely defined. In total, 10 previously identified risk loci explain a small fraction of AAA heritability. METHODS: We performed a genome-wide association study in the Million Veteran Program testing ≈18 million DNA sequence variants with AAA (7642 cases and 172 172 controls) in veterans of European ancestry with independent replication in up to 4972 cases and 99 858 controls. We then used mendelian randomization to examine the causal effects of blood pressure on AAA. We examined the association of AAA risk variants with aneurysms in the lower extremity, cerebral, and iliac arterial beds, and derived a genome-wide polygenic risk score (PRS) to identify a subset of the population at greater risk for disease. RESULTS: Through a genome-wide association study, we identified 14 novel loci, bringing the total number of known significant AAA loci to 24. In our mendelian randomization analysis, we demonstrate that a genetic increase of 10 mm Hg in diastolic blood pressure (odds ratio, 1.43 [95% CI, 1.24-1.66]; P=1.6×10⁻⁶), as opposed to systolic blood pressure (odds ratio, 1.06 [95% CI, 0.97-1.15]; P=0.2), likely has a causal relationship with AAA development. We observed that 19 of 24 AAA risk variants associate with aneurysms in at least 1 other vascular territory. A 29-variant PRS was strongly associated with AAA (odds ratio for the PRS, 1.26 [95% CI, 1.18-1.36]; P=2.7×10⁻¹¹ per SD increase in PRS), independent of family history and smoking risk factors (odds ratio for the PRS with family history and smoking in the model, 1.24 [95% CI, 1.14-1.35]; P=1.27×10⁻⁶). Using this PRS, we identified a subset of the population with AAA prevalence greater than that observed in screening trials informing current guidelines.
CONCLUSIONS: We identify novel AAA genetic associations with therapeutic implications and identify a subset of the population at significantly increased genetic risk of AAA independent of family history. Our data suggest that extending current screening guidelines to include testing to identify those with high polygenic AAA risk, once the cost of genotyping becomes comparable with that of screening ultrasound, would significantly increase the yield of current screening at reasonable cost.


Subject(s)
Aortic Aneurysm, Abdominal/genetics , Humans , Veterans
7.
Sci Rep ; 9(1): 6077, 2019 04 15.
Article in English | MEDLINE | ID: mdl-30988330

ABSTRACT

Benign prostatic hyperplasia (BPH) results in a significant public health burden due to the morbidity caused by the disease and many of the available remedies. As many as 70% of men over 70 will develop BPH. Few studies have been conducted to discover the genetic determinants of BPH risk. Understanding the biological basis for this condition may provide necessary insight for development of novel pharmaceutical therapies or risk prediction. We have evaluated SNP-based heritability of BPH in two cohorts and conducted a genome-wide association study (GWAS) of BPH risk using 2,656 cases and 7,763 controls identified from the Electronic Medical Records and Genomics (eMERGE) network. SNP-based heritability estimates suggest that roughly 60% of the phenotypic variation in BPH is accounted for by genetic factors. We used logistic regression to model BPH risk as a function of principal components of ancestry, age, and imputed genotype data, with meta-analysis performed using METAL. The top result was on chromosome 22 in SYN3 at rs2710383 (p-value = 4.6 × 10⁻⁷; Odds Ratio = 0.69, 95% confidence interval = 0.55-0.83). Other suggestive signals were near genes GLGC, UNCA13, SORCS1 and between BTBD3 and SPTLC3. We also evaluated genetically-predicted gene expression in prostate tissue. The most significant result was with increasing predicted expression of ETV4 (chr17; p-value = 0.0015). Overexpression of this gene has been associated with poor prognosis in prostate cancer. In conclusion, although there were no genome-wide significant variants identified for BPH susceptibility, we present evidence supporting the heritability of this phenotype, have identified suggestive signals, and evaluated the association between BPH and genetically-predicted gene expression in prostate.


Subject(s)
Genetic Predisposition to Disease , Inheritance Patterns , Prostatic Hyperplasia/genetics , Aged , Aged, 80 and over , Biomarkers/metabolism , Case-Control Studies , Electronic Health Records/statistics & numerical data , Gene Expression Profiling , Genome-Wide Association Study , Genotyping Techniques , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide , Prostate/pathology , Prostatic Hyperplasia/epidemiology , Prostatic Hyperplasia/pathology
8.
Circulation ; 138(22): 2469-2481, 2018 11 27.
Article in English | MEDLINE | ID: mdl-30571344

ABSTRACT

BACKGROUND: Proteomic approaches allow measurement of thousands of proteins in a single specimen, which can accelerate biomarker discovery. However, applying these technologies to massive biobanks is not currently feasible because of the practical barriers and costs of implementing such assays at scale. To overcome these challenges, we used a "virtual proteomic" approach, linking genetically predicted protein levels to clinical diagnoses in >40 000 individuals. METHODS: We used genome-wide association data from the Framingham Heart Study (n=759) to construct genetic predictors for 1129 plasma protein levels. We validated the genetic predictors for 268 proteins and used them to compute predicted protein levels in 41 288 genotyped individuals in the Electronic Medical Records and Genomics (eMERGE) cohort. We tested associations for each predicted protein with 1128 clinical phenotypes. Lead associations were validated with directly measured protein levels and either low-density lipoprotein cholesterol or subclinical atherosclerosis in the MDCS (Malmö Diet and Cancer Study; n=651). RESULTS: In the virtual proteomic analysis in eMERGE, 55 proteins were associated with 89 distinct diagnoses at a false discovery rate q<0.1. Among these, 13 associations involved lipid (n=7) or atherosclerosis (n=6) phenotypes. We tested each association for validation in MDCS using directly measured protein levels. At Bonferroni-adjusted significance thresholds, levels of apolipoprotein E isoforms were associated with hyperlipidemia, and circulating C-type lectin domain family 1 member B and platelet-derived growth factor receptor-β predicted subclinical atherosclerosis. Odds ratios for carotid atherosclerosis were 1.31 (95% CI, 1.08-1.58; P=0.006) per 1-SD increment in C-type lectin domain family 1 member B and 0.79 (0.66-0.94; P=0.008) per 1-SD increment in platelet-derived growth factor receptor-β.
CONCLUSIONS: We demonstrate a biomarker discovery paradigm to identify candidate biomarkers of cardiovascular and other diseases.


Subject(s)
Biomarkers/blood , Carotid Artery Diseases/diagnosis , Genome-Wide Association Study , Proteome/analysis , Adult , Aged , Aged, 80 and over , Carotid Artery Diseases/genetics , Female , Genotype , Humans , Lectins, C-Type/analysis , Male , Middle Aged , Odds Ratio , Phenotype , Polymorphism, Single Nucleotide , Proteomics , Receptor, Platelet-Derived Growth Factor beta/blood
9.
Nat Commun ; 9(1): 3522, 2018 08 30.
Article in English | MEDLINE | ID: mdl-30166544

ABSTRACT

Defining the full spectrum of human disease associated with a biomarker is necessary to advance the biomarker into clinical practice. We hypothesize that associating biomarker measurements with electronic health record (EHR) populations based on shared genetic architectures would establish the clinical epidemiology of the biomarker. We use Bayesian sparse linear mixed modeling to calculate SNP weightings for 53 biomarkers from the Atherosclerosis Risk in Communities study. We use the SNP weightings to compute predicted biomarker values in an EHR population and test associations with 1139 diagnoses. Here we report 116 associations meeting a Bonferroni level of significance. A false discovery rate (FDR)-based significance threshold reveals more known and undescribed associations across a broad range of biomarkers, including biometric measures, plasma proteins and metabolites, functional assays, and behaviors. We confirm an inverse association between LDL-cholesterol level and septicemia risk in an independent epidemiological cohort. This approach efficiently discovers biomarker-disease associations.


Subject(s)
Biomarkers/analysis , Electronic Health Records , Genome-Wide Association Study/methods , Bayes Theorem , Biomarkers/blood , Cholesterol, LDL/blood , Humans , Prospective Studies , Risk Factors
10.
Nat Commun ; 9(1): 2904, 2018 07 25.
Article in English | MEDLINE | ID: mdl-30046033

ABSTRACT

Electrocardiographic PR interval measures atrio-ventricular depolarization and conduction, and abnormal PR interval is a risk factor for atrial fibrillation and heart block. Our genome-wide association study of over 92,000 European-descent individuals identifies 44 PR interval loci (34 novel). Examination of these loci reveals known and previously not-yet-reported biological processes involved in cardiac atrial electrical activity. Genes in these loci are over-represented in cardiac disease processes including heart block and atrial fibrillation. Variants in over half of the 44 loci were associated with atrial or blood transcript expression levels, or were in high linkage disequilibrium with missense variants. Six additional loci were identified either by meta-analysis of ~105,000 African and European-descent individuals and/or by pleiotropic analyses combining PR interval with heart rate, QRS interval, and atrial fibrillation. These findings implicate developmental pathways, and identify transcription factors, ion-channel genes, and cell-junction/cell-signaling proteins in atrio-ventricular conduction, identifying potential targets for drug development.


Subject(s)
Atrial Function/physiology , Atrioventricular Node/physiology , Electrophysiological Phenomena/genetics , Genome-Wide Association Study , Electrocardiography , Female , Humans , Linkage Disequilibrium/genetics , Male , Mutation, Missense/genetics , Risk Factors
11.
Am J Cardiol ; 121(12): 1552-1557, 2018 06 15.
Article in English | MEDLINE | ID: mdl-29627106

ABSTRACT

Diastolic dysfunction (DD), an abnormality in cardiac left ventricular (LV) chamber compliance, is associated with increased morbidity and mortality. Although DD has been extensively studied in older populations, co-morbidity patterns are less well characterized in middle-aged subjects. We screened 156,434 subjects with transthoracic echocardiogram reports available through Vanderbilt's electronic health record and identified 6,612 subjects 40 to 55 years old with an LV ejection fraction ≥50% and diastolic function staging. We tested 452 incident and prevalent clinical diagnoses for associations with early-stage DD (n = 1,676) versus normal function. There were 44 co-morbid diagnoses associated with grade 1 DD including hypertension (odds ratio [OR] = 2.02, 95% confidence interval [CI] 1.78 to 2.28, p < 5.3 × 10⁻²⁹), type 2 diabetes (OR 1.96, 95% CI 1.68 to 2.29, p = 2.1 × 10⁻¹⁷), tachycardia (OR 1.38, 95% CI 0.53 to 2.19, p = 2.9 × 10⁻⁶), obesity (OR 1.76, 95% CI 1.51 to 2.06, p = 1.7 × 10⁻¹²), and clinical end points, including end-stage renal disease (OR 3.29, 95% CI 2.19 to 4.96, p = 1.2 × 10⁻⁸) and stroke (OR 1.5, 95% CI 1.12 to 2.02, p = 6.9 × 10⁻³). Among the 60 incident diagnoses associated with DD, heart failure with preserved ejection fraction (OR 4.63, 95% CI 3.39 to 6.32, p = 6.3 × 10⁻²²) had the most significant association. Among subjects with normal diastolic function and blood pressure at baseline, a blood pressure measurement in the hypertensive range at the time of the second echocardiogram was associated with progression to stage 1 DD (p = 0.04). In conclusion, DD was common among subjects 40 to 55 years old and was associated with a heavy burden of co-morbid disease.


Subject(s)
Heart Failure, Diastolic/diagnostic imaging , Heart Failure, Diastolic/epidemiology , Stroke Volume/physiology , Ventricular Dysfunction, Left/diagnostic imaging , Ventricular Dysfunction, Left/epidemiology , Adult , Age Distribution , Cohort Studies , Databases, Factual , Diabetes Mellitus, Type 2/diagnosis , Diabetes Mellitus, Type 2/epidemiology , Echocardiography/methods , Female , Heart Failure, Diastolic/physiopathology , Humans , Hypertension/diagnosis , Hypertension/epidemiology , Incidence , Kaplan-Meier Estimate , Linear Models , Male , Middle Aged , Multivariate Analysis , Prognosis , Retrospective Studies , Risk Assessment , Severity of Illness Index , Sex Distribution , Survival Analysis , United States/epidemiology , Ventricular Dysfunction, Left/physiopathology
12.
J Clin Endocrinol Metab ; 103(6): 2234-2243, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29659871

ABSTRACT

Context: Mutations in alkaline phosphatase (AlkP), liver/bone/kidney (ALPL), which encodes tissue-nonspecific isozyme AlkP, cause hypophosphatasia (HPP). HPP is suspected by a low-serum AlkP. We hypothesized that some patients with bone or dental disease have undiagnosed HPP, caused by ALPL variants. Objective: Our objective was to discover the prevalence of these gene variants in the Vanderbilt University DNA Biobank (BioVU) and to assess phenotypic associations. Design: We identified subjects in BioVU, a repository of DNA, that had at least one of three known, rare HPP disease-causing variants in ALPL: rs199669988, rs121918007, and/or rs121918002. To evaluate for phenotypic associations, we conducted a sequential phenome-wide association study of ALPL variants and then performed a de-identified manual record review to refine the phenotype. Results: Out of 25,822 genotyped individuals, we identified 52 women and 53 men with HPP disease-causing variants in ALPL, 7/1000. None had a clinical diagnosis of HPP. For patients with ALPL variants, the average serum AlkP levels were in the lower range of normal or lower. Forty percent of men and 62% of women had documented bone and/or dental disease, compatible with the diagnosis of HPP. Forty percent of the female patients had ovarian pathology or other gynecological abnormalities compared with 15% seen in controls. Conclusions: Variants in the ALPL gene cause bone and dental disease in patients with and without the standard biomarker, low plasma AlkP. ALPL gene variants are more prevalent than currently reported and underdiagnosed. Gynecologic disease appears to be associated with HPP-causing variants in ALPL.


Subject(s)
Alkaline Phosphatase/genetics , Hypophosphatasia/genetics , Ovarian Diseases/genetics , Polymorphism, Single Nucleotide , Uterine Diseases/genetics , Adult , Aged , Aged, 80 and over , Alleles , DNA Mutational Analysis , Female , Gene Frequency , Genetic Predisposition to Disease , Genotype , Humans , Male , Middle Aged , Mutation , Phenotype
13.
J Am Med Inform Assoc ; 25(3): 275-288, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-29036387

ABSTRACT

OBJECTIVE: Birth month and climate impact lifetime disease risk, while the underlying exposures remain largely elusive. We seek to uncover distal risk factors underlying these relationships by probing the relationship between global exposure variance and disease risk variance by birth season. MATERIAL AND METHODS: This study utilizes electronic health record data from 6 sites representing 10.5 million individuals in 3 countries (United States, South Korea, and Taiwan). We obtained birth month-disease risk curves from each site in a case-control manner. Next, we correlated each birth month-disease risk curve with each exposure. A meta-analysis was then performed of correlations across sites. This allowed us to identify the most significant birth month-exposure relationships supported by all 6 sites while adjusting for multiplicity. We also successfully distinguish relative age effects (a cultural effect) from environmental exposures. RESULTS: Attention deficit hyperactivity disorder was the only identified relative age association. Our methods identified several culprit exposures that correspond well with the literature in the field. These include a link between first-trimester exposure to carbon monoxide and increased risk of depressive disorder (R = 0.725, confidence interval [95% CI], 0.529-0.847), first-trimester exposure to fine air particulates and increased risk of atrial fibrillation (R = 0.564, 95% CI, 0.363-0.715), and decreased exposure to sunlight during the third trimester and increased risk of type 2 diabetes mellitus (R = -0.816, 95% CI, -0.5767, -0.929). CONCLUSION: A global study of birth month-disease relationships reveals distal risk factors involved in causal biological pathways that underlie them.

14.
AMIA Jt Summits Transl Sci Proc ; 2017: 185-192, 2017.
Article in English | MEDLINE | ID: mdl-28815128

ABSTRACT

Electronic health records (EHRs) linked with biobanks have been recognized as valuable data sources for pharmacogenomic studies, which require identification of patients with certain adverse drug reactions (ADRs) from a large population. Since manual chart review is costly and time-consuming, automatic methods to accurately identify patients with ADRs have been called for. In this study, we developed and compared different informatics approaches to identify ADRs from EHRs, using clopidogrel-induced bleeding as our case study. Three different types of methods were investigated: 1) rule-based methods; 2) machine learning-based methods; and 3) scoring function-based methods. Our results show that both machine learning and scoring methods are effective and the scoring method can achieve a high precision with a reasonable recall. We also analyzed the contributions of different types of features and found that the temporality information between clopidogrel and bleeding events, as well as textual evidence from physicians' assertion of the adverse events are helpful. We believe that our findings are valuable in advancing EHR-based pharmacogenomic studies.

15.
Circ Cardiovasc Genet ; 10(2)2017 Apr.
Article in English | MEDLINE | ID: mdl-28416512

ABSTRACT

BACKGROUND: One potential use for the PR interval is as a biomarker of disease risk. We hypothesized that quantifying the shared genetic architectures of the PR interval and a set of clinical phenotypes would identify genetic mechanisms contributing to PR variability and identify diseases associated with a genetic predictor of PR variability. METHODS AND RESULTS: We used ECG measurements from the ARIC study (Atherosclerosis Risk in Communities; n=6731 subjects) and 63 genetically modulated diseases from the eMERGE network (Electronic Medical Records and Genomics; n=12 978). We measured pairwise genetic correlations (rG) between PR phenotypes (PR interval, PR segment, P-wave duration) and each of the 63 phenotypes. The PR segment was genetically correlated with atrial fibrillation (rG=-0.88; P=0.0009). An analysis of metabolic phenotypes in ARIC also showed that the P wave was genetically correlated with waist circumference (rG=0.47; P=0.02). A genetically predicted PR interval phenotype based on 645 714 single-nucleotide polymorphisms was associated with atrial fibrillation (odds ratio=0.89 per SD change; 95% confidence interval, 0.83-0.95; P=0.0006). The differing pattern of associations among the PR phenotypes is consistent with analyses that show that the genetic correlation between the P wave and PR segment was not significantly different from 0 (rG=-0.03 [0.16]). CONCLUSIONS: The genetic architecture of the PR interval comprises modulators of atrial fibrillation risk and obesity.


Subject(s)
Atrial Fibrillation/physiopathology , Electrocardiography , Adolescent , Adult , Aged , Atrial Fibrillation/diagnostic imaging , Atrial Fibrillation/genetics , Body Mass Index , Case-Control Studies , Female , Genotype , Humans , Male , Metabolic Syndrome/complications , Middle Aged , Odds Ratio , Phenotype , Polymorphism, Single Nucleotide , Risk Factors , Waist Circumference , Young Adult
16.
J Am Coll Cardiol ; 69(7): 823-836, 2017 Feb 21.
Article in English | MEDLINE | ID: mdl-28209224

ABSTRACT

BACKGROUND: Genome-wide association studies have so far identified 56 loci associated with risk of coronary artery disease (CAD). Many CAD loci show pleiotropy; that is, they are also associated with other diseases or traits. OBJECTIVES: This study sought to systematically test if genetic variants identified for non-CAD diseases/traits also associate with CAD and to undertake a comprehensive analysis of the extent of pleiotropy of all CAD loci. METHODS: In discovery analyses involving 42,335 CAD cases and 78,240 control subjects we tested the association of 29,383 common (minor allele frequency >5%) single nucleotide polymorphisms available on the exome array, which included a substantial proportion of known or suspected single nucleotide polymorphisms associated with common diseases or traits as of 2011. Suggestive association signals were replicated in an additional 30,533 cases and 42,530 control subjects. To evaluate pleiotropy, we tested CAD loci for association with cardiovascular risk factors (lipid traits, blood pressure phenotypes, body mass index, diabetes, and smoking behavior), as well as with other diseases/traits through interrogation of currently available genome-wide association study catalogs. RESULTS: We identified 6 new loci associated with CAD at genome-wide significance: on 2q37 (KCNJ13-GIGYF2), 6p21 (C2), 11p15 (MRVI1-CTR9), 12q13 (LRP1), 12q24 (SCARB1), and 16q13 (CETP). Risk allele frequencies ranged from 0.15 to 0.86, and odds ratio per copy of the risk allele ranged from 1.04 to 1.09. Of 62 new and known CAD loci, 24 (38.7%) showed statistical association with a traditional cardiovascular risk factor, with some showing multiple associations, and 29 (47%) showed associations at p < 1 × 10⁻⁴ with a range of other diseases/traits. CONCLUSIONS: We identified 6 loci associated with CAD at genome-wide significance. Several CAD loci show substantial pleiotropy, which may help us understand the mechanisms by which these loci affect CAD risk.


Subject(s)
Coronary Artery Disease/genetics , Genetic Loci , Genetic Pleiotropy , Case-Control Studies , Coronary Artery Disease/epidemiology , Female , Gene Frequency , Genome-Wide Association Study , Humans , Male , Odds Ratio , Polymorphism, Single Nucleotide
17.
Pac Symp Biocomput ; 22: 348-355, 2017.
Article in English | MEDLINE | ID: mdl-27896988

ABSTRACT

The major goal of precision medicine is to improve human health. A feature that unites much research in the field is the use of large datasets such as genomic data and electronic health records. Research in this field includes examination of variation in the core bases of DNA and their methylation status, through variations in metabolic and signaling molecules, all the way up to broader systems level changes in physiology and disease presentation. Intermediate goals include understanding the individual drivers of disease that differentiate the cause of disease in each individual. To match this development of approaches to physical and activity-based measurements, computational approaches to using these new streams of data to better understand and improve human health are being rapidly developed by the thriving biomedical informatics research community. This session of the 2017 Pacific Symposium on Biocomputing presents some of the latest advances in the capture, analysis and use of diverse biomedical data in precision medicine.


Subject(s)
Precision Medicine/statistics & numerical data , Computational Biology , Databases, Factual , Electronic Health Records , Genomics , Humans
18.
Circ Cardiovasc Genet ; 9(6): 521-530, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27780847

ABSTRACT

BACKGROUND: Continued reductions in morbidity and mortality attributable to ischemic heart disease (IHD) require an understanding of the changing epidemiology of this disease. We hypothesized that we could use genetic correlations, which quantify the shared genetic architectures of phenotype pairs, and extant risk factors from a historical prospective study to define the risk profile of a contemporary IHD phenotype. METHODS AND RESULTS: We used 37 phenotypes measured in the ARIC study (Atherosclerosis Risk in Communities; n=7716, European ancestry subjects) and clinical diagnoses from an electronic health record (EHR) data set (n=19 093). All subjects had genome-wide single-nucleotide polymorphism genotyping. We measured pairwise genetic correlations (rG) between the ARIC and EHR phenotypes using linear mixed models. The genetic correlation estimates between the ARIC risk factors and the EHR IHD were modestly linearly correlated with hazard ratio estimates for incident IHD in ARIC (Pearson correlation [r]=0.62), indicating that the 2 IHD phenotypes had differing risk profiles. For comparison, this correlation was 0.80 when comparing EHR and ARIC type 2 diabetes mellitus phenotypes. The EHR IHD phenotype was most strongly correlated with ARIC metabolic phenotypes, including total:high-density lipoprotein cholesterol ratio (rG=-0.44, P=0.005), high-density lipoprotein (rG=-0.48, P=0.005), systolic blood pressure (rG=0.44, P=0.02), and triglycerides (rG=0.38, P=0.02). EHR phenotypes related to type 2 diabetes mellitus, atherosclerotic, and hypertensive diseases were also genetically correlated with these ARIC risk factors. CONCLUSIONS: The EHR IHD risk profile differed from ARIC and indicates that treatment and prevention efforts in this population should target hypertensive and metabolic disease.


Subject(s)
Myocardial Ischemia/genetics , Polymorphism, Single Nucleotide , Aged , Aged, 80 and over , Atherosclerosis/epidemiology , Atherosclerosis/genetics , Blood Pressure , Case-Control Studies , Chi-Square Distribution , Cross-Sectional Studies , Diabetes Mellitus, Type 2/epidemiology , Diabetes Mellitus, Type 2/genetics , Electronic Health Records , Female , Genetic Markers , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Hypertension/epidemiology , Hypertension/genetics , Incidence , Linear Models , Lipids/blood , Male , Middle Aged , Molecular Epidemiology , Myocardial Ischemia/diagnosis , Myocardial Ischemia/epidemiology , Phenotype , Prevalence , Prognosis , Proportional Hazards Models , Risk Assessment , Risk Factors , Time Factors , United States/epidemiology
19.
Appl Clin Inform ; 7(3): 693-706, 2016 07 20.
Article in English | MEDLINE | ID: mdl-27452794

ABSTRACT

OBJECTIVE: The objective of this study is to develop an algorithm to accurately identify children with severe early onset childhood obesity (ages 1-5.99 years) using structured and unstructured data from the electronic health record (EHR). INTRODUCTION: Childhood obesity increases risk factors for cardiovascular morbidity and vascular disease. Accurate definition of a high precision phenotype through a standardized tool is critical to the success of large-scale genomic studies and validating rare monogenic variants causing severe early onset obesity. DATA AND METHODS: Rule-based and machine learning-based algorithms were developed using structured and unstructured data from two EHR databases from Boston Children's Hospital (BCH) and Cincinnati Children's Hospital and Medical Center (CCHMC). Exclusion criteria including medications or comorbid diagnoses were defined. Machine learning algorithms were developed using cross-site training and testing in addition to experimenting with natural language processing features. RESULTS: Precision was emphasized for a high fidelity cohort. The rule-based algorithm performed the best overall, 0.895 (CCHMC) and 0.770 (BCH). The best feature set for machine learning employed Unified Medical Language System (UMLS) concept unique identifiers (CUIs), ICD-9 codes, and RxNorm codes. CONCLUSIONS: Detecting severe early childhood obesity is essential for the intervention potential in children at the highest long-term risk of developing comorbidities related to obesity, and excluding patients with underlying pathological and non-syndromic causes of obesity assists in developing a high-precision cohort for genetic study. Further, such phenotyping efforts inform future practical application in health care environments utilizing clinical decision support.


Subject(s)
Machine Learning , Pediatric Obesity/diagnosis , Tertiary Healthcare , Child , Child, Preschool , Comorbidity , Early Diagnosis , Female , Humans , Infant , Male , Pediatric Obesity/epidemiology
20.
N Engl J Med ; 374(12): 1134-44, 2016 03 24.
Article in English | MEDLINE | ID: mdl-26934567

ABSTRACT

BACKGROUND: The discovery of low-frequency coding variants affecting the risk of coronary artery disease has facilitated the identification of therapeutic targets. METHODS: Through DNA genotyping, we tested 54,003 coding-sequence variants covering 13,715 human genes in up to 72,868 patients with coronary artery disease and 120,770 controls who did not have coronary artery disease. Through DNA sequencing, we studied the effects of loss-of-function mutations in selected genes. RESULTS: We confirmed previously observed significant associations between coronary artery disease and low-frequency missense variants in the genes LPA and PCSK9. We also found significant associations between coronary artery disease and low-frequency missense variants in the genes SVEP1 (p.D2702G; minor-allele frequency, 3.60%; odds ratio for disease, 1.14; P=4.2×10⁻¹⁰) and ANGPTL4 (p.E40K; minor-allele frequency, 2.01%; odds ratio, 0.86; P=4.0×10⁻⁸), which encodes angiopoietin-like 4. Through sequencing of ANGPTL4, we identified 9 carriers of loss-of-function mutations among 6924 patients with myocardial infarction, as compared with 19 carriers among 6834 controls (odds ratio, 0.47; P=0.04); carriers of ANGPTL4 loss-of-function alleles had triglyceride levels that were 35% lower than the levels among persons who did not carry a loss-of-function allele (P=0.003). ANGPTL4 inhibits lipoprotein lipase; we therefore searched for mutations in LPL and identified a loss-of-function variant that was associated with an increased risk of coronary artery disease (p.D36N; minor-allele frequency, 1.9%; odds ratio, 1.13; P=2.0×10⁻⁴) and a gain-of-function variant that was associated with protection from coronary artery disease (p.S447*; minor-allele frequency, 9.9%; odds ratio, 0.94; P=2.5×10⁻⁷). CONCLUSIONS: We found that carriers of loss-of-function mutations in ANGPTL4 had triglyceride levels that were lower than those among noncarriers; these mutations were also associated with protection from coronary artery disease. (Funded by the National Institutes of Health and others.).
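
As a reading aid, the odds ratio reported for ANGPTL4 loss-of-function carriers can be approximately reproduced from the carrier counts given in the abstract. The sketch below assumes a plain, unadjusted 2x2 cross-product ratio; the function name `odds_ratio` is illustrative, and the abstract does not state how the authors computed their estimate or its P value.

```python
def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Unadjusted odds ratio of being a case for carriers versus noncarriers.

    Assumption: a simple cross-product ratio of the 2x2 table; the study
    itself may have used an adjusted or otherwise different estimator.
    """
    case_noncarriers = case_total - case_carriers
    control_noncarriers = control_total - control_carriers
    # Cross-product ratio of the carrier-by-disease-status table.
    return (case_carriers * control_noncarriers) / (control_carriers * case_noncarriers)


# Counts as given in the abstract: 9 carriers among 6924 patients,
# 19 carriers among 6834 controls.
print(round(odds_ratio(9, 6924, 19, 6834), 2))
```

Under that assumption, the rounded result agrees with the odds ratio of 0.47 stated in the abstract.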


Subject(s)
Angiopoietins/genetics , Cell Adhesion Molecules/genetics , Coronary Artery Disease/genetics , Lipoprotein Lipase/genetics , Mutation , Triglycerides/blood , Aged , Angiopoietin-Like Protein 4 , Female , Genotyping Techniques , Humans , Lipoprotein Lipase/antagonists & inhibitors , Lipoprotein Lipase/metabolism , Male , Middle Aged , Mutation, Missense , Risk Factors , Sequence Analysis, DNA , Triglycerides/genetics