Results 1 - 20 of 33
1.
Article in English | MEDLINE | ID: mdl-38723657

ABSTRACT

The progress of precision medicine research hinges on the gathering and analysis of extensive and diverse clinical datasets. With the continued expansion of modalities, scales, and sources of clinical datasets, it becomes imperative to devise methods for aggregating information from these varied sources to achieve a comprehensive understanding of diseases. In this review, we describe two important approaches for the analysis of diverse clinical datasets, namely the centralized model and federated model. We compare and contrast the strengths and weaknesses inherent in each model and present recent progress in methodologies and their associated challenges. Finally, we present an outlook on the opportunities that both models hold for the future analysis of clinical data.

2.
J Am Med Inform Assoc ; 31(6): 1303-1312, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38713006

ABSTRACT

OBJECTIVES: Racial disparities in kidney transplant access and posttransplant outcomes exist between non-Hispanic Black (NHB) and non-Hispanic White (NHW) patients in the United States, with the site of care being a key contributor. Using multi-site data to examine the effect of site of care on racial disparities, the key challenge is the dilemma in sharing patient-level data due to regulations for protecting patients' privacy. MATERIALS AND METHODS: We developed a federated learning framework, named dGEM-disparity (decentralized algorithm for Generalized linear mixed Effect Model for disparity quantification). Consisting of 2 modules, dGEM-disparity first provides accurately estimated common effects and calibrated hospital-specific effects by requiring only aggregated data from each center and then adopts a counterfactual modeling approach to assess whether the graft failure rates differ if NHB patients had been admitted at transplant centers in the same distribution as NHW patients were admitted. RESULTS: Utilizing United States Renal Data System data from 39 043 adult patients across 73 transplant centers over 10 years, we found that if NHB patients had followed the distribution of NHW patients in admissions, there would be 38 fewer deaths or graft failures per 10 000 NHB patients (95% CI, 35-40) within 1 year of receiving a kidney transplant on average. DISCUSSION: The proposed framework facilitates efficient collaborations in clinical research networks. Additionally, the framework, by using counterfactual modeling to calculate the event rate, allows us to investigate contributions to racial disparities that may occur at the level of site of care. CONCLUSIONS: Our framework is broadly applicable to other decentralized datasets and disparities research related to differential access to care. Ultimately, our proposed framework will advance equity in human health by identifying and addressing hospital-level racial disparities.


Subject(s)
Algorithms , Black or African American , Healthcare Disparities , Kidney Transplantation , White People , Humans , United States , Healthcare Disparities/ethnology , Adult , Male , Female , Graft Rejection/ethnology , Middle Aged
3.
Res Sq ; 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38496631

ABSTRACT

Background: Preeclampsia (PE) is a severe pregnancy complication characterized by hypertension and end-organ damage such as proteinuria. PE poses a significant threat to women's long-term health, including an increased risk of cardiovascular and renal diseases. Most previous studies have been hypothesis-based, potentially overlooking certain significant complications. This study conducts a comprehensive, non-hypothesis-based analysis of PE-complicated diagnoses after pregnancies using multiple large-scale electronic health records (EHR) datasets. Method: From the University of Michigan (UM) Healthcare System, we collected 4,348 PE patients for the cases and 27,377 patients with pregnancies not complicated by PE or related conditions for the controls. We first conducted a non-hypothesis-based analysis to identify any long-term adverse health conditions associated with PE using logistic regression with adjustments to demographics, social history, and medical history. We confirmed the identified complications with UK Biobank data which contain 443 PE cases and 14,870 non-PE controls. We then conducted a survival analysis on complications that exhibited significance in more than 5 consecutive years post-PE. We further examined the potential racial disparities of identified complications between Caucasian and African American patients. Findings: Uncomplicated hypertension, complicated diabetes, congestive heart failure, renal failure, and obesity exhibited significantly increased risks whereas hypothyroidism showed decreased risks, in 5 consecutive years after PE in the UM discovery data. UK Biobank data confirmed the increased risks of uncomplicated hypertension, complicated diabetes, congestive heart failure, renal failure, and obesity. 
Further survival analysis using UM data indicated significantly increased risks in uncomplicated hypertension, complicated diabetes, congestive heart failure, renal failure, and obesity, and significantly decreased risks in hypothyroidism. There exist racial differences in the risks of developing hypertension and hypothyroidism after PE. PE protects against hypothyroidism in African American postpartum women but not Caucasians; it also increases the risks of uncomplicated hypertension but less severely in African American postpartum women as compared to Caucasians. Interpretation: This study addresses the lack of a comprehensive examination of PE's long-term effects utilizing large-scale EHR and advanced statistical methods. Our findings underscore the need for long-term monitoring and interventions for women with a history of PE, emphasizing the importance of personalized postpartum care. Notably, the racial disparities observed in the impact of PE on hypertension and hypothyroidism highlight the necessity of tailored aftercare based on race.

4.
medRxiv ; 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38405849

ABSTRACT

Background: Preeclampsia (PE) is a severe pregnancy complication characterized by hypertension and end-organ damage such as proteinuria. PE poses a significant threat to women's long-term health, including an increased risk of cardiovascular and renal diseases. Most previous studies have been hypothesis-based, potentially overlooking certain significant complications. This study conducts a comprehensive, non-hypothesis-based analysis of PE-complicated diagnoses after pregnancies using multiple large-scale electronic health records (EHR) datasets. Method: From the University of Michigan (UM) Healthcare System, we collected 4,348 PE patients for the cases and 27,377 patients with pregnancies not complicated by PE or related conditions for the controls. We first conducted a non-hypothesis-based analysis to identify any long-term adverse health conditions associated with PE using logistic regression with adjustments to demographics, social history, and medical history. We confirmed the identified complications with UK Biobank data which contain 443 PE cases and 14,870 non-PE controls. We then conducted a survival analysis on complications that exhibited significance in more than 5 consecutive years post-PE. We further examined the potential racial disparities of identified complications between Caucasian and African American patients. Findings: Uncomplicated hypertension, complicated diabetes, congestive heart failure, renal failure, and obesity exhibited significantly increased risks whereas hypothyroidism showed decreased risks, in 5 consecutive years after PE in the UM discovery data. UK Biobank data confirmed the increased risks of uncomplicated hypertension, complicated diabetes, congestive heart failure, renal failure, and obesity. 
Further survival analysis using UM data indicated significantly increased risks in uncomplicated hypertension, complicated diabetes, congestive heart failure, renal failure, and obesity, and significantly decreased risks in hypothyroidism. There exist racial differences in the risks of developing hypertension and hypothyroidism after PE. PE protects against hypothyroidism in African American postpartum women but not Caucasians; it also increases the risks of uncomplicated hypertension but less severely in African American postpartum women as compared to Caucasians. Interpretation: This study addresses the lack of a comprehensive examination of PE's long-term effects utilizing large-scale EHR and advanced statistical methods. Our findings underscore the need for long-term monitoring and interventions for women with a history of PE, emphasizing the importance of personalized postpartum care. Notably, the racial disparities observed in the impact of PE on hypertension and hypothyroidism highlight the necessity of tailored aftercare based on race.

5.
medRxiv ; 2024 Jan 10.
Article in English | MEDLINE | ID: mdl-38260403

ABSTRACT

Genome-wide association studies (GWAS) have been instrumental in identifying genetic associations for various diseases and traits. However, uncovering genetic underpinnings among traits beyond univariate phenotype associations remains a challenge. Multi-phenotype associations (MPA), or genetic pleiotropy, offer important insights into shared genes and pathways among traits, enhancing our understanding of genetic architectures of complex diseases. GWAS of biobank-linked electronic health record (EHR) data are increasingly being utilized to identify MPA among various traits and diseases. However, methodologies that can efficiently take advantage of distributed EHR to detect MPA are still lacking. Here, we introduce mixWAS, a novel algorithm that efficiently and losslessly integrates multiple EHRs via summary statistics, allowing the detection of MPA among mixed phenotypes while accounting for heterogeneities across EHRs. Simulations demonstrate that mixWAS outperforms the widely used MPA detection method, Phenome-wide association study (PheWAS), across diverse scenarios. Applying mixWAS to data from seven EHRs in the US, we identified 4,534 MPA among blood lipids, BMI, and circulatory diseases. Validation in an independent EHR data from UK confirmed 97.7% of the associations. mixWAS fundamentally improves the detection of MPA and is available as a free, open-source software.

6.
J Am Med Inform Assoc ; 31(4): 809-819, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38065694

ABSTRACT

OBJECTIVES: COVID-19, since its emergence in December 2019, has globally impacted research. Over 360 000 COVID-19-related manuscripts have been published on PubMed and preprint servers like medRxiv and bioRxiv, with preprints comprising about 15% of all manuscripts. Yet, the role and impact of preprints on COVID-19 research and evidence synthesis remain uncertain. MATERIALS AND METHODS: We propose a novel data-driven method for assigning weights to individual preprints in systematic reviews and meta-analyses. This weight termed the "confidence score" is obtained using the survival cure model, also known as the survival mixture model, which takes into account the time elapsed between posting and publication of a preprint, as well as metadata such as the number of first 2-week citations, sample size, and study type. RESULTS: Using 146 preprints on COVID-19 therapeutics posted from the beginning of the pandemic through April 30, 2021, we validated the confidence scores, showing an area under the curve of 0.95 (95% CI, 0.92-0.98). Through a use case on the effectiveness of hydroxychloroquine, we demonstrated how these scores can be incorporated practically into meta-analyses to properly weigh preprints. DISCUSSION: It is important to note that our method does not aim to replace existing measures of study quality but rather serves as a supplementary measure that overcomes some limitations of current approaches. CONCLUSION: Our proposed confidence score has the potential to improve systematic reviews of evidence related to COVID-19 and other clinical conditions by providing a data-driven approach to including unpublished manuscripts.


Subject(s)
COVID-19 , Humans , Systematic Reviews as Topic , Research Design , PubMed , Pandemics
7.
Pac Symp Biocomput ; 29: 650-653, 2024.
Article in English | MEDLINE | ID: mdl-38160314

ABSTRACT

The following sections are included: Introduction to the workshop; Workshop Presenters.

9.
Sci Rep ; 13(1): 19078, 2023 11 04.
Article in English | MEDLINE | ID: mdl-37925516

ABSTRACT

In response to the escalating global obesity crisis and its associated health and financial burdens, this paper presents a novel methodology for analyzing longitudinal weight loss data and assessing the effectiveness of financial incentives. Drawing from the Keep It Off trial, a three-arm randomized controlled study with 189 participants, we examined the potential impact of financial incentives on weight loss maintenance. Some participants choose not to weigh themselves because of small weight changes or weight gains, a common phenomenon in weight-loss studies. As a result, traditional methods such as Generalized Estimating Equations (GEE) tend to overestimate the effect size because they assume that data are missing completely at random. To address this challenge, we proposed a framework that can identify evidence of data missing not at random and conduct bias correction using the estimating equation derived from pairwise composite likelihood. By analyzing the Keep It Off data, we found that the data in this trial are most likely characterized by non-random missingness. Notably, we also found that the enrollment time (i.e., duration time) was positively associated with weight loss maintenance after adjusting for baseline participant characteristics (e.g., age, sex). Moreover, the lottery-based intervention was found to be more effective in weight loss maintenance than the direct payment intervention, though the difference was not statistically significant. This framework's significance extends beyond weight loss research, offering a semi-parametric approach to assess missing data mechanisms and robustly explore associations between exposures (e.g., financial incentives) and key outcomes (e.g., weight loss maintenance). 
In essence, the proposed methodology provides a powerful toolkit for analyzing real-world longitudinal data, particularly in scenarios with data missing not at random, enriching comprehension of intricate dataset dynamics.


Subject(s)
Research Design , Weight Loss , Humans , Bias , Longitudinal Studies , Self Report , Randomized Controlled Trials as Topic
10.
BioData Min ; 16(1): 20, 2023 Jul 13.
Article in English | MEDLINE | ID: mdl-37443040

ABSTRACT

The introduction of large language models (LLMs) that allow iterative "chat" in late 2022 is a paradigm shift that enables generation of text often indistinguishable from that written by humans. LLM-based chatbots have immense potential to improve academic work efficiency, but the ethical implications of their fair use and inherent bias must be considered. In this editorial, we discuss this technology from the academic's perspective with regard to its limitations and utility for academic writing, education, and programming. We end with our stance with regard to using LLMs and chatbots in academia, which is summarized as (1) we must find ways to effectively use them, (2) their use does not constitute plagiarism (although they may produce plagiarized text), (3) we must quantify their bias, (4) users must be cautious of their poor accuracy, and (5) the future is bright for their application to research and as an academic tool.

11.
BioData Min ; 16(1): 14, 2023 Apr 10.
Article in English | MEDLINE | ID: mdl-37038201

ABSTRACT

BACKGROUND: Quantitative Trait Locus (QTL) analysis and Genome-Wide Association Studies (GWAS) have the power to identify variants that capture significant levels of phenotypic variance in complex traits. However, effort and time are required to select the best methods and optimize parameters and pre-processing steps. Although machine learning approaches have been shown to greatly assist in optimization and data processing, applying them to QTL analysis and GWAS is challenging due to the complexity of large, heterogenous datasets. Here, we describe proof-of-concept for an automated machine learning approach, AutoQTL, with the ability to automate many complicated decisions related to analysis of complex traits and generate solutions to describe relationships that exist in genetic data. RESULTS: Using a publicly available dataset of 18 putative QTL from a large-scale GWAS of body mass index in the laboratory rat, Rattus norvegicus, AutoQTL captures the phenotypic variance explained under a standard additive model. AutoQTL also detects evidence of non-additive effects including deviations from additivity and 2-way epistatic interactions in simulated data via multiple optimal solutions. Additionally, feature importance metrics provide different insights into the inheritance models and predictive power of multiple GWAS-derived putative QTL. CONCLUSIONS: This proof-of-concept illustrates that automated machine learning techniques can complement standard approaches and have the potential to detect both additive and non-additive effects via various optimal solutions and feature importance metrics. In the future, we aim to expand AutoQTL to accommodate omics-level datasets with intelligent feature selection and feature engineering strategies.

12.
bioRxiv ; 2023 Jan 13.
Article in English | MEDLINE | ID: mdl-36711526

ABSTRACT

Background: Quantitative Trait Locus (QTL) analysis and Genome-Wide Association Studies (GWAS) have the power to identify variants that capture significant levels of phenotypic variance in complex traits. However, effort and time are required to select the best methods and optimize parameters and pre-processing steps. Although machine learning approaches have been shown to greatly assist in optimization and data processing, applying them to QTL analysis and GWAS is challenging due to the complexity of large, heterogenous datasets. Here, we describe proof-of-concept for an automated machine learning approach, AutoQTL, with the ability to automate many complex decisions related to analysis of complex traits and generate diverse solutions to describe relationships that exist in genetic data. Results: Using a dataset of 18 putative QTL from a large-scale GWAS of body mass index in the laboratory rat, Rattus norvegicus, AutoQTL captures the phenotypic variance explained under a standard additive model while also providing evidence of non-additive effects including deviations from additivity and 2-way epistatic interactions from simulated data via multiple optimal solutions. Additionally, feature importance metrics provide different insights into the inheritance models and predictive power of multiple GWAS-derived putative QTL. Conclusions: This proof-of-concept illustrates that automated machine learning techniques can be applied to genetic data and have the potential to detect both additive and non-additive effects via various optimal solutions and feature importance metrics. In the future, we aim to expand AutoQTL to accommodate omics-level datasets with intelligent feature selection strategies.

13.
Pac Symp Biocomput ; 28: 546-548, 2023.
Article in English | MEDLINE | ID: mdl-36541009

ABSTRACT

The primary efforts of disease and epidemiological research can be divided into two areas: identifying the causal mechanisms and utilizing important variables for risk prediction. The latter is generally perceived as a more obtainable goal due to the vast number of readily available tools and the faster pace of obtaining results. However, the lower barrier of entry in risk prediction means that it is easy to make predictions, yet it is incredibly more difficult to make sound predictions. As an ever-growing amount of data is being generated, developing risk prediction models and turning them into clinically actionable findings is crucial as the next step. However, there are still sizable gaps before risk prediction models can be implemented clinically. While clinicians are eager to embrace new ways to improve patients' care, they are overwhelmed by a plethora of prediction methods. Thus, the next generation of prediction models will need to shift from making simple predictions towards interpretable, equitable, explainable and ultimately, causal predictions.


Subject(s)
Computational Biology , Humans , Risk Assessment , Patient Care , Forecasting
14.
Nat Commun ; 12(1): 168, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33420026

ABSTRACT

Increasingly, clinical phenotypes with matched genetic data from bio-bank linked electronic health records (EHRs) have been used for pleiotropy analyses. Thus far, pleiotropy analysis using individual-level EHR data has been limited to data from one site. However, it is desirable to integrate EHR data from multiple sites to improve the detection power and generalizability of the results. Due to privacy concerns, individual-level patients' data are not easily shared across institutions. As a result, we introduce Sum-Share, a method designed to efficiently integrate EHR and genetic data from multiple sites to perform pleiotropy analysis. Sum-Share requires only summary-level data and one round of communication from each site, yet it produces identical test statistics compared with that of pooled individual-level data. Consequently, Sum-Share can achieve lossless integration of multiple datasets. Using real EHR data from eMERGE, Sum-Share is able to identify 1734 potential pleiotropic SNPs for five cardiovascular diseases.
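
Illustrative note (not part of the abstract): the idea that summary-level data can reproduce a pooled analysis exactly can be sketched with a much simpler model than the one Sum-Share targets. The example below assumes ordinary least squares with a single covariate, where each site shares only five sums; it is not the Sum-Share algorithm itself, and the function names and toy data are hypothetical.

```python
import math


def site_summary(xs, ys):
    """Summary statistics a site would share instead of patient-level rows."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }


def combine(summaries):
    """Add the per-site sums; no individual-level record is needed."""
    keys = ("n", "sx", "sy", "sxx", "sxy")
    return {k: sum(s[k] for s in summaries) for k in keys}


def slope_intercept(s):
    """Closed-form simple linear regression from the five sums."""
    denom = s["n"] * s["sxx"] - s["sx"] ** 2
    slope = (s["n"] * s["sxy"] - s["sx"] * s["sy"]) / denom
    intercept = (s["sy"] - slope * s["sx"]) / s["n"]
    return slope, intercept


# Hypothetical toy data held at two separate sites.
site_a = ([0.0, 1.0, 2.0], [1.0, 3.1, 4.9])
site_b = ([1.0, 3.0, 4.0, 5.0], [2.9, 7.2, 9.0, 11.1])

# One round of communication: each site sends its summary once.
federated = slope_intercept(
    combine([site_summary(*site_a), site_summary(*site_b)])
)

# Reference analysis on pooled individual-level data.
pooled = slope_intercept(
    site_summary(site_a[0] + site_b[0], site_a[1] + site_b[1])
)

# The two estimates agree up to floating-point rounding.
print(all(math.isclose(f, p) for f, p in zip(federated, pooled)))
```

In this simplified setting the agreement is exact in principle because the estimator depends on the data only through sums, which add across sites.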


Subject(s)
Electronic Health Records/statistics & numerical data , Genetic Pleiotropy , Communication , Databases, Factual , Genome-Wide Association Study/statistics & numerical data , Humans , Models, Biological , Phenotype , Polymorphism, Single Nucleotide , Privacy
15.
Nat Rev Genet ; 21(8): 493-502, 2020 08.
Article in English | MEDLINE | ID: mdl-32235907

ABSTRACT

Accurate prediction of disease risk based on the genetic make-up of an individual is essential for effective prevention and personalized treatment. Nevertheless, to date, individual genetic variants from genome-wide association studies have achieved only moderate prediction of disease risk. The aggregation of genetic variants under a polygenic model shows promising improvements in prediction accuracies. Increasingly, electronic health records (EHRs) are being linked to patient genetic data in biobanks, which provides new opportunities for developing and applying polygenic risk scores in the clinic, to systematically examine and evaluate patient susceptibilities to disease. However, the heterogeneous nature of EHR data brings forth many practical challenges along every step of designing and implementing risk prediction strategies. In this Review, we present the unique considerations for using genotype and phenotype data from biobank-linked EHRs for polygenic risk prediction.
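
Illustrative note (not part of the abstract): aggregating variants under a polygenic model is commonly written as a weighted sum of risk-allele counts. The sketch below assumes per-variant effect sizes are already available (for example from a GWAS); the weights, dosages, and function name are hypothetical and are not taken from the Review.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


# Hypothetical per-variant effect sizes (e.g., log odds ratios).
weights = [0.12, -0.05, 0.30]

# Hypothetical genotype for one person: risk-allele count at each variant.
person = [2, 1, 0]

score = polygenic_risk_score(person, weights)
print(round(score, 2))
```

A raw score like this is usually interpreted relative to a reference population rather than on its own, which is one of the places where the heterogeneity of EHR-linked cohorts matters.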


Subject(s)
Electronic Health Records , Genetic Association Studies , Genetic Predisposition to Disease , Multifactorial Inheritance , Algorithms , Computational Biology/methods , Genome-Wide Association Study , Genomics/methods , Genotype , Humans , Phenotype , Reproducibility of Results , Risk Assessment , Risk Factors
16.
AMIA Annu Symp Proc ; 2020: 1383-1391, 2020.
Article in English | MEDLINE | ID: mdl-33936514

ABSTRACT

Large-scale biobank cohorts coupled with electronic health records offer unprecedented opportunities to study genotype-phenotype relationships. Genome-wide association studies uncovered disease-associated loci through univariate methods, with the focus on one trait at a time. With genetic variants being identified for thousands of traits, researchers found that 90% of human genetic loci are associated with more than one trait, highlighting the ubiquity of pleiotropy. Recently, multivariate methods have been proposed to effectively identify pleiotropy. However, the statistical performance in natural biomedical data, which often have unbalanced case-control sample sizes, is largely unknown. In this work, we designed 21 scenarios of real-data informed simulations to thoroughly evaluate the statistical characteristics of univariate and multivariate methods. Our results can serve as a reference guide for the application of multivariate methods. We also investigated potential pleiotropy across type II diabetes, Alzheimer's disease, atherosclerosis of arteries, depression, and atherosclerotic heart disease in the UK Biobank.


Subject(s)
Biological Specimen Banks , Biostatistics , Case-Control Studies , Computer Simulation , Diabetes Mellitus, Type 2 , Genome-Wide Association Study , Humans , Multivariate Analysis , Phenotype , Sample Size , United Kingdom
17.
Pac Symp Biocomput ; 25: 695-706, 2020.
Article in English | MEDLINE | ID: mdl-31797639

ABSTRACT

Electronic Health Records (EHR) contain extensive patient data on various health outcomes and risk predictors, providing an efficient and wide-reaching source for health research. Integrated EHR data can provide a larger sample size of the population to improve estimation and prediction accuracy. To overcome the obstacle of sharing patient-level data, distributed algorithms were developed to conduct statistical analyses across multiple clinical sites through sharing only aggregated information. However, the heterogeneity of data across sites is often ignored by existing distributed algorithms, which leads to substantial bias when studying the association between the outcomes and exposures. In this study, we propose a privacy-preserving and communication-efficient distributed algorithm which accounts for the heterogeneity caused by a small number of the clinical sites. We evaluated our algorithm through a systematic simulation study motivated by real-world scenarios and applied our algorithm to multiple claims datasets from the Observational Health Data Sciences and Informatics (OHDSI) network. The results showed that the proposed method performed better than the existing distributed algorithm ODAL and a meta-analysis method.


Subject(s)
Computational Biology , Electronic Health Records , Information Dissemination , Machine Learning , Algorithms , Computer Simulation , Humans , Medical Informatics
18.
J Am Med Inform Assoc ; 26(10): 1083-1090, 2019 10 01.
Article in English | MEDLINE | ID: mdl-31529123

ABSTRACT

OBJECTIVE: Pleiotropy, where 1 genetic locus affects multiple phenotypes, can offer significant insights in understanding the complex genotype-phenotype relationship. Although individual genotype-phenotype associations have been thoroughly explored, seemingly unrelated phenotypes can be connected genetically through common pleiotropic loci or genes. However, current analyses of pleiotropy have been challenged by both methodologic limitations and a lack of available suitable data sources. MATERIALS AND METHODS: In this study, we propose to utilize a new regression framework, reduced rank regression, to simultaneously analyze multiple phenotypes and genotypes to detect pleiotropic effects. We used a large-scale biobank linked electronic health record data from the Penn Medicine BioBank to select 5 cardiovascular diseases (hypertension, cardiac dysrhythmias, ischemic heart disease, congestive heart failure, and heart valve disorders) and 5 mental disorders (mood disorders; anxiety, phobic and dissociative disorders; alcohol-related disorders; neurological disorders; and delirium dementia) to validate our framework. RESULTS: Compared with existing methods, reduced rank regression showed a higher power to distinguish known associated single-nucleotide polymorphisms from random single-nucleotide polymorphisms. In addition, genome-wide gene-based investigation of pleiotropy showed that reduced rank regression was able to identify candidate genetic variants with novel pleiotropic effects compared to existing methods. CONCLUSION: The proposed regression framework offers a new approach to account for the phenotype and genotype correlations when identifying pleiotropic effects. By jointly modeling multiple phenotypes and genotypes together, the method has the potential to distinguish confounding from causal genotype and phenotype associations.


Subject(s)
Cardiovascular Diseases/genetics , Electronic Health Records , Genetic Pleiotropy , Mental Disorders/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide
19.
J Am Med Inform Assoc ; 26(10): 1056-1063, 2019 10 01.
Article in English | MEDLINE | ID: mdl-31329892

ABSTRACT

OBJECTIVE: Clinical data of patients' measurements and treatment history stored in electronic health record (EHR) systems are starting to be mined for better treatment options and disease associations. A primary challenge associated with utilizing EHR data is the considerable amount of missing data. Failure to address this issue can introduce significant bias in EHR-based research. Currently, imputation methods rely on correlations among the structured phenotype variables in the EHR. However, genetic studies have shown that many EHR-based phenotypes have a heritable component, suggesting that measured genetic variants might be useful for imputing missing data. In this article, we developed a computational model that incorporates patients' genetic information to perform EHR data imputation. MATERIALS AND METHODS: We used the individual single nucleotide polymorphism's association with phenotype variables in the EHR as input to construct a genetic risk score that quantifies the genetic contribution to the phenotype. Multiple approaches to constructing the genetic risk score were evaluated for optimal performance. The genetic score, along with phenotype correlation, is then used as a predictor to impute the missing values. RESULTS: To demonstrate the method performance, we applied our model to impute missing cardiovascular related measurements including low-density lipoprotein, heart failure, and aortic aneurysm disease in the electronic Medical Records and Genomics data. The integration method improved imputation's area-under-the-curve for binary phenotypes and decreased root-mean-square error for continuous phenotypes. CONCLUSION: Compared with standard imputation approaches, incorporating genetic information offers a novel approach that can utilize more of the EHR data for better performance in missing data imputation.


Subject(s)
Computational Biology , Electronic Health Records , Genomics , Genotype , Information Storage and Retrieval/methods , Genome-Wide Association Study , Heart Disease Risk Factors , Humans , Polymorphism, Single Nucleotide
20.
Pharmacogenomics J ; 19(2): 178-190, 2019 04.
Article in English | MEDLINE | ID: mdl-29795408

ABSTRACT

Identifying genetic variants associated with chemotherapeutic induced toxicity is an important step towards personalized treatment of cancer patients. However, annotating and interpreting the associated genetic variants remains challenging because each associated variant is a surrogate for many other variants in the same region. The issue is further complicated when investigating patterns of associated variants with multiple drugs. In this study, we used biological knowledge to annotate and compare genetic variants associated with cellular sensitivity to mechanistically distinct chemotherapeutic drugs, including platinating agents (cisplatin, carboplatin), capecitabine, cytarabine, and paclitaxel. The most significantly associated SNPs from genome wide association studies of cellular sensitivity to each drug in lymphoblastoid cell lines derived from populations of European (CEU) and African (YRI) descent were analyzed for their enrichment in biological pathways and processes. We annotated genetic variants using higher-level biological annotations in efforts to group variants into more interpretable biological modules. Using the higher-level annotations, we observed distinct biological modules associated with cell line populations as well as classes of chemotherapeutic drugs. We also integrated genetic variants and gene expression variables to build predictive models for chemotherapeutic drug cytotoxicity and prioritized the network models based on the enrichment of DNA regulatory data. Several biological annotations, often encompassing different SNPs, were replicated in independent datasets. By using biological knowledge and DNA regulatory information, we propose a novel approach for jointly analyzing genetic variants associated with multiple chemotherapeutic drugs.


Subject(s)
Genetic Variation/genetics , Genome-Wide Association Study/methods , Neoplasms/drug therapy , Pharmacogenetics/methods , Black People/genetics , Capecitabine/adverse effects , Capecitabine/therapeutic use , Carboplatin/adverse effects , Carboplatin/therapeutic use , Cell Line , Cisplatin/adverse effects , Cisplatin/therapeutic use , Gene Expression Regulation, Neoplastic/drug effects , Genome, Human/genetics , Humans , Molecular Sequence Annotation , Neoplasms/genetics , Paclitaxel/adverse effects , Paclitaxel/therapeutic use , Polymorphism, Single Nucleotide/genetics , White People/genetics