1.
Cancer Inform ; 22: 11769351231183847, 2023.
Article in English | MEDLINE | ID: mdl-37426052

ABSTRACT

Background: In recent years, interest in prognostic calculators for predicting patient health outcomes has grown with the popularity of personalized medicine. These calculators, which can inform treatment decisions, employ many different methods, each of which has advantages and disadvantages. Methods: We present a comparison of a multistate model (MSM) and a random survival forest (RSF) through a case study of prognostic predictions for patients with oropharyngeal squamous cell carcinoma. The MSM is highly structured and takes into account some aspects of the clinical context and knowledge about oropharyngeal cancer, while the RSF can be thought of as a black-box non-parametric approach. Key in this comparison are the high rate of missing values within these data and the different approaches used by the MSM and RSF to handle missingness. Results: We compare the accuracy (discrimination and calibration) of survival probabilities predicted by both approaches and use simulation studies to better understand how predictive accuracy is influenced by the approach to (1) handling missing data and (2) modeling structural/disease progression information present in the data. We conclude that both approaches have similar predictive accuracy, with a slight advantage going to the MSM. Conclusions: Although the MSM shows slightly better predictive ability than the RSF, consideration of other differences is key when selecting the best approach for addressing a specific research question. These key differences include the methods' ability to incorporate domain knowledge and to handle missing data, as well as their interpretability and ease of implementation. Ultimately, selecting the statistical method that has the most potential to aid in clinical decisions requires thoughtful consideration of the specific goals.

2.
EBioMedicine ; 91: 104534, 2023 May.
Article in English | MEDLINE | ID: mdl-37004335

ABSTRACT

BACKGROUND: Throughout the COVID-19 pandemic, the SARS-CoV-2 virus has continued to evolve, with new variants outcompeting existing variants and often leading to different dynamics of disease spread. METHODS: In this paper, we performed a retrospective analysis using longitudinal sequencing data to characterize differences in the speed, calendar timing, and magnitude of 16 SARS-CoV-2 variant waves/transitions for 230 countries and sub-country regions, between October 2020 and January 2023. We then clustered geographic locations in terms of their variant behavior across several Omicron variants, allowing us to identify groups of locations exhibiting similar variant transitions. Finally, we explored relationships between heterogeneity in these variant waves and time-varying factors, including vaccination status of the population, governmental policy, and the number of variants in simultaneous competition. FINDINGS: This work demonstrates associations between the behavior of an emerging variant and the number of co-circulating variants as well as the demographic context of the population. We also observed an association between high vaccination rates and variant transition dynamics prior to the Mu and Delta variant transitions. INTERPRETATION: These results suggest the behavior of an emergent variant may be sensitive to the immunologic and demographic context of its location. Additionally, this work represents the most comprehensive characterization of variant transitions globally to date. FUNDING: Laboratory Directed Research and Development (LDRD), Los Alamos National Laboratory.


Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , COVID-19/prevention & control , Pandemics , Retrospective Studies
3.
Cancer Epidemiol Biomarkers Prev ; 32(6): 748-759, 2023 06 01.
Article in English | MEDLINE | ID: mdl-36626383

ABSTRACT

BACKGROUND: Studies have shown an increased risk of severe SARS-CoV-2-related (COVID-19) disease outcome and mortality for patients with cancer, but it is not well understood whether associations vary by cancer site, cancer treatment, and vaccination status. METHODS: Using electronic health record data from an academic medical center, we identified a retrospective cohort of 260,757 individuals tested for or diagnosed with COVID-19 from March 10, 2020, to August 1, 2022. Of these, 52,019 tested positive for COVID-19 of whom 13,752 had a cancer diagnosis. We conducted Firth-corrected logistic regression to assess the association between cancer status, site, treatment, vaccination, and four COVID-19 outcomes: hospitalization, intensive care unit admission, mortality, and a composite "severe COVID" outcome. RESULTS: Cancer diagnosis was significantly associated with higher rates of severe COVID, hospitalization, and mortality. These associations were driven by patients whose most recent initial cancer diagnosis was within the past 3 years. Chemotherapy receipt, colorectal cancer, hematologic malignancies, kidney cancer, and lung cancer were significantly associated with higher rates of worse COVID-19 outcomes. Vaccinations were significantly associated with lower rates of worse COVID-19 outcomes regardless of cancer status. CONCLUSIONS: Patients with colorectal cancer, hematologic malignancies, kidney cancer, or lung cancer or who receive chemotherapy for treatment should be cautious because of their increased risk of worse COVID-19 outcomes, even after vaccination. IMPACT: Additional COVID-19 precautions are warranted for people with certain cancer types and treatments. Significant benefit from vaccination is noted for both cancer and cancer-free patients.


Subjects
COVID-19 , Colorectal Neoplasms , Hematologic Neoplasms , Kidney Neoplasms , Lung Neoplasms , Humans , COVID-19/epidemiology , SARS-CoV-2 , Retrospective Studies , Hospitalization , Vaccination
4.
PLoS One ; 18(1): e0279894, 2023.
Article in English | MEDLINE | ID: mdl-36603015

ABSTRACT

The COVID-19 pandemic has highlighted a need for better understanding of countries' vulnerability and resilience to not only pandemics but also disasters, climate change, and other systemic shocks. A comprehensive characterization of vulnerability can inform efforts to improve infrastructure and guide disaster response in the future. In this paper, we propose a data-driven framework for studying countries' vulnerability and resilience to incident disasters across multiple dimensions of society. To illustrate this methodology, we leverage the rich data landscape surrounding the COVID-19 pandemic to characterize observed resilience for several countries (USA, Brazil, India, Sweden, New Zealand, and Israel) as measured by pandemic impacts across a variety of social, economic, and political domains. We also assess how observed responses and outcomes (i.e., resilience) of the COVID-19 pandemic are associated with pre-pandemic characteristics or vulnerabilities, including (1) prior risk for adverse pandemic outcomes due to population density and age and (2) the systems in place prior to the pandemic that may impact the ability to respond to the crisis, including health infrastructure and economic capacity. Our work demonstrates the importance of viewing vulnerability and resilience in a multi-dimensional way, where a country's resources and outcomes related to vulnerability and resilience can differ dramatically across economic, political, and social domains. This work also highlights key gaps in our current understanding about vulnerability and resilience and a need for data-driven, context-specific assessments of disaster vulnerability in the future.


Subjects
COVID-19 , Disasters , Humans , COVID-19/epidemiology , Pandemics , Brazil/epidemiology , India
5.
Stat Med ; 41(28): 5501-5516, 2022 12 10.
Article in English | MEDLINE | ID: mdl-36131394

ABSTRACT

Electronic health records (EHR) are not designed for population-based research, but they provide easy and quick access to longitudinal health information for a large number of individuals. Many statistical methods have been proposed to account for selection bias, missing data, phenotyping errors, or other problems that arise in EHR data analysis. However, addressing multiple sources of bias simultaneously is challenging. We developed a methodological framework (R package, SAMBA) for jointly handling both selection bias and phenotype misclassification in the EHR setting that leverages external data sources. These methods assume factors related to selection and misclassification are fully observed, but these factors may be poorly understood and partially observed in practice. As a follow-up to the methodological work, we demonstrate how to apply these methods for two real-world case studies, and we evaluate their performance. In both examples, we use individual patient-level data collected through the University of Michigan Health System and various external population-based data sources. In case study (a), we explore the impact of these methods on estimated associations between gender and cancer diagnosis. In case study (b), we compare corrected associations between previously identified genetic loci and age-related macular degeneration with gold standard external summary estimates. These case studies illustrate how to utilize diverse auxiliary information to achieve less biased inference in EHR-based research.


Subjects
Electronic Health Records , Information Storage and Retrieval , Selection Bias , Bias , Phenotype
6.
PLoS Comput Biol ; 18(6): e1010115, 2022 06.
Article in English | MEDLINE | ID: mdl-35658007

ABSTRACT

Infectious disease forecasting is of great interest to the public health community and policymakers, since forecasts can provide insight into disease dynamics in the near future and inform interventions. Due to delays in case reporting, however, forecasting models may often underestimate the current and future disease burden. In this paper, we propose a general framework for addressing reporting delay in disease forecasting efforts with the goal of improving forecasts. We propose strategies for leveraging either historical data on case reporting or external internet-based data to estimate the amount of reporting error. We then describe several approaches for adapting general forecasting pipelines to account for under- or over-reporting of cases. We apply these methods to address reporting delay in data on dengue fever cases in Puerto Rico from 1990 to 2009 and to reports of influenza-like illness (ILI) in the United States between 2010 and 2019. Through a simulation study, we compare method performance and evaluate robustness to assumption violations. Our results show that forecasting accuracy and prediction coverage almost always increase when correction methods are implemented to address reporting delay. Some of these methods required knowledge about the reporting error or high quality external data, which may not always be available. Provided alternatives include excluding recently-reported data and performing sensitivity analysis. This work provides intuition and guidance for handling delay in disease case reporting and may serve as a useful resource to inform practical infectious disease forecasting efforts.


Subjects
Communicable Diseases , Influenza, Human , Communicable Diseases/epidemiology , Computer Simulation , Forecasting , Humans , Influenza, Human/epidemiology , Models, Statistical , Public Health , United States
7.
Stat Med ; 41(13): 2317-2337, 2022 06 15.
Article in English | MEDLINE | ID: mdl-35224743

ABSTRACT

False negative rates of severe acute respiratory syndrome coronavirus 2 diagnostic tests, together with selection bias due to prioritized testing, can result in inaccurate modeling of COVID-19 transmission dynamics based on reported "case" counts. We propose an extension of the widely used Susceptible-Exposed-Infected-Removed (SEIR) model that accounts for misclassification error and selection bias, and derive an analytic expression for the basic reproduction number R0 as a function of false negative rates of the diagnostic tests and selection probabilities for getting tested. Analyzing data from the first two waves of the pandemic in India, we show that correcting for misclassification and selection leads to more accurate prediction in a test sample. We provide estimates of undetected infections and deaths between April 1, 2020 and August 31, 2021. At the end of the first wave in India, the estimated under-reporting factor for cases was at 11.1 (95% CI: 10.7, 11.5) and for deaths at 3.58 (95% CI: 3.5, 3.66) as of February 1, 2021, while they changed to 19.2 (95% CI: 17.9, 19.9) and 4.55 (95% CI: 4.32, 4.68) as of July 1, 2021. Equivalently, 9.0% (95% CI: 8.7%, 9.3%) and 5.2% (95% CI: 5.0%, 5.6%) of total estimated infections were reported on these two dates, while 27.9% (95% CI: 27.3%, 28.6%) and 22% (95% CI: 21.4%, 23.1%) of estimated total deaths were reported. Extensive simulation studies demonstrate the effect of misclassification and selection on estimation of R0 and prediction of future infections. An R package, SEIRfansy, is developed for broader dissemination.


Subjects
COVID-19 , Basic Reproduction Number , COVID-19/diagnosis , COVID-19/epidemiology , Humans , India/epidemiology , Pandemics , SARS-CoV-2
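
As a quick check on the figures quoted in abstract 7, the share of events that was reported is the reciprocal of the under-reporting factor (estimated total divided by reported count). The short Python sketch below reproduces the quoted percentages from the quoted point estimates. The function name and the dictionary layout are illustrative assumptions and are not taken from the SEIRfansy package.

```python
def reported_fraction(underreporting_factor: float) -> float:
    """Fraction of estimated total events that were reported.

    Assumes the under-reporting factor is (estimated total) / (reported).
    """
    if underreporting_factor <= 0:
        raise ValueError("under-reporting factor must be positive")
    return 1.0 / underreporting_factor


# Point estimates quoted in the abstract (India, two reference dates).
factors = {
    ("cases", "2021-02-01"): 11.1,
    ("cases", "2021-07-01"): 19.2,
    ("deaths", "2021-02-01"): 3.58,
    ("deaths", "2021-07-01"): 4.55,
}

# Percentage of estimated infections or deaths that were reported.
percent_reported = {
    key: round(100 * reported_fraction(value), 1)
    for key, value in factors.items()
}
```

The rounded values agree with the abstract: about 9.0% and 5.2% of estimated infections, and about 27.9% and 22% of estimated deaths, were reported on the two dates.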
8.
Biometrics ; 78(1): 214-226, 2022 03.
Article in English | MEDLINE | ID: mdl-33179768

ABSTRACT

Health research using electronic health records (EHR) has gained popularity, but misclassification of EHR-derived disease status and lack of representativeness of the study sample can result in substantial bias in effect estimates and can impact power and type I error. In this paper, we develop new strategies for handling disease status misclassification and selection bias in EHR-based association studies. We first focus on each type of bias separately. For misclassification, we propose three novel likelihood-based bias correction strategies. A distinguishing feature of the EHR setting is that misclassification may be related to patient-varying factors, and the proposed methods leverage data in the EHR to estimate misclassification rates without gold standard labels. For addressing selection bias, we describe how calibration and inverse probability weighting methods from the survey sampling literature can be extended and applied to the EHR setting. Addressing misclassification and selection biases simultaneously is a more challenging problem than dealing with each on its own, and we propose several new strategies. For all methods proposed, we derive valid standard error estimators and provide software for implementation. We provide a new suite of statistical estimation and inference strategies for addressing misclassification and selection bias simultaneously that is tailored to problems arising in EHR data analysis. We apply these methods to data from The Michigan Genomics Initiative, a longitudinal EHR-linked biorepository.


Subjects
Electronic Health Records , Bias , Humans , Likelihood Functions , Michigan , Selection Bias
9.
J Comput Graph Stat ; 31(4): 1063-1075, 2022.
Article in English | MEDLINE | ID: mdl-36644406

ABSTRACT

Penalized regression methods are used in many biomedical applications for variable selection and simultaneous coefficient estimation. However, missing data complicates the implementation of these methods, particularly when missingness is handled using multiple imputation. Applying a variable selection algorithm on each imputed dataset will likely lead to different sets of selected predictors. This paper considers a general class of penalized objective functions which, by construction, force selection of the same variables across imputed datasets. By pooling objective functions across imputations, optimization is then performed jointly over all imputed datasets rather than separately for each dataset. We consider two objective function formulations that exist in the literature, which we will refer to as "stacked" and "grouped" objective functions. Building on existing work, we (a) derive and implement efficient cyclic coordinate descent and majorization-minimization optimization algorithms for continuous and binary outcome data, (b) incorporate adaptive shrinkage penalties, (c) compare these methods through simulation, and (d) develop an R package miselect. Simulations demonstrate that the "stacked" approaches are more computationally efficient and have better estimation and selection properties. We apply these methods to data from the University of Michigan ALS Patients Biorepository aiming to identify the association between environmental pollutants and ALS risk. Supplementary materials are available online.

10.
Stat Methods Med Res ; 30(12): 2685-2700, 2021 12.
Article in English | MEDLINE | ID: mdl-34643465

ABSTRACT

Multiple imputation is a well-established general technique for analyzing data with missing values. A convenient way to implement multiple imputation is sequential regression multiple imputation, also called chained equations multiple imputation. In this approach, we impute missing values using regression models for each variable, conditional on the other variables in the data. This approach, however, assumes that the missingness mechanism is missing at random, and it is not well justified under not-at-random missingness without additional modification. In this paper, we describe how we can generalize the sequential regression multiple imputation procedure to handle missingness not at random in the setting where missingness may depend on other variables that are also missing but not on the missing variable itself, conditioning on fully observed variables. We provide algebraic justification for several generalizations of standard sequential regression multiple imputation using Taylor series and other approximations of the target imputation distribution under missingness not at random. Resulting regression model approximations include indicators for missingness, interactions, or other functions of the not-at-random missingness model and observed data. In a simulation study, we demonstrate that the proposed sequential regression multiple imputation modifications result in reduced bias in the final analysis compared to standard sequential regression multiple imputation, with an approximation strategy involving inclusion of an offset in the imputation model performing the best overall. The method is illustrated in a breast cancer study, where the goal is to estimate the prevalence of a specific genetic pathogenic variant.


Subjects
Breast Neoplasms , Research Design , Bias , Breast Neoplasms/genetics , Computer Simulation , Data Interpretation, Statistical , Female , Humans
11.
Rev Sci Instrum ; 92(9): 093505, 2021 Sep 01.
Article in English | MEDLINE | ID: mdl-34598501

ABSTRACT

Proton imaging is a powerful technique for imaging electromagnetic fields within an experimental volume, in which spatial variations in proton fluence are a result of deflections to proton trajectories due to interaction with the fields. When deflections are large, proton trajectories can overlap, and this nonlinearity creates regions of greatly increased proton fluence on the image, known as caustics. The formation of caustics has been a persistent barrier to reconstructing the underlying fields from proton images. We have developed a new method for reconstructing the path-integrated magnetic fields, which begins to address the problem posed by caustics. Our method uses multiple proton images of the same object, each image at a different energy, to fill in the information gaps and provide some uniqueness when reconstructing caustic features. We use a differential evolution algorithm to iteratively estimate the underlying deflection function, which accurately reproduces the observed proton fluence at multiple proton energies simultaneously. We test this reconstruction method using synthetic proton images generated for three different, cylindrically symmetric field geometries at various field amplitudes and levels of proton statistics and present reconstruction results from a set of experimental images. The method we propose requires no assumption of deflection linearity and can reliably solve for fields underlying linear, nonlinear, and caustic proton image features for the selected geometries and is shown to be fairly robust to noise in the input proton intensity.

13.
Stat Med ; 40(27): 6118-6132, 2021 11 30.
Article in English | MEDLINE | ID: mdl-34459011

ABSTRACT

Not-at-random missingness presents a challenge in addressing missing data in many health research applications. In this article, we propose a new approach to account for not-at-random missingness after multiple imputation through weighted analysis of stacked multiple imputations. The weights are easily calculated as a function of the imputed data and assumptions about the not-at-random missingness. We demonstrate through simulation that the proposed method has excellent performance when the missingness model is correctly specified. In practice, the missingness mechanism will not be known. We show how we can use our approach in a sensitivity analysis framework to evaluate the robustness of model inference to different assumptions about the missingness mechanism, and we provide R package StackImpute to facilitate implementation as part of routine sensitivity analyses. We apply the proposed method to account for not-at-random missingness in human papillomavirus test results in a study of survival for patients diagnosed with oropharyngeal cancer.


Subjects
Models, Statistical , Research Design , Computer Simulation , Data Interpretation, Statistical , Humans
14.
JAMA Netw Open ; 4(8): e2120055, 2021 08 02.
Article in English | MEDLINE | ID: mdl-34369988

ABSTRACT

Importance: Recent insights into the biologic characteristics and treatment of oropharyngeal cancer may help inform improvements in prognostic modeling. A bayesian multistate model incorporates sophisticated statistical techniques to provide individualized predictions of survival and recurrence outcomes for patients with newly diagnosed oropharyngeal cancer. Objective: To develop a model for individualized survival, locoregional recurrence, and distant metastasis prognostication for patients with newly diagnosed oropharyngeal cancer, incorporating clinical, oncologic, and imaging data. Design, Setting, and Participants: In this prognostic study, a data set was used comprising 840 patients with newly diagnosed oropharyngeal cancer treated at a National Cancer Institute-designated center between January 2003 and August 2016; analysis was performed between January 2019 and June 2020. Using these data, a bayesian multistate model was developed that can be used to obtain individualized predictions. The prognostic performance of the model was validated using data from 447 patients treated for oropharyngeal cancer at Erasmus Medical Center in the Netherlands. Exposures: Clinical/oncologic factors and imaging biomarkers collected at or before initiation of first-line therapy. Main Outcomes and Measures: Overall survival, locoregional recurrence, and distant metastasis after first-line cancer treatment. Results: Of the 840 patients included in the National Cancer Institute-designated center, 715 (85.1%) were men and 268 (31.9%) were current smokers. The Erasmus Medical Center cohort comprised 300 (67.1%) men, with 350 (78.3%) current smokers. Model predictions for 5-year overall survival demonstrated good discrimination, with area under the curve values of 0.81 for the model with and 0.78 for the model without imaging variables. Application of the model without imaging data in the independent Dutch validation cohort resulted in an area under the curve of 0.75. 
This model possesses good calibration and stratifies patients well in terms of likely outcomes among many competing events. Conclusions and Relevance: In this prognostic study, a multistate model of oropharyngeal cancer incorporating imaging biomarkers appeared to estimate and discriminate locoregional recurrence from distant metastases. Providing personalized predictions of multiple outcomes increases the information available for patients and clinicians. The web-based application designed in this study may serve as a useful tool for generating predictions and visualizing likely outcomes for a specific patient.


Subjects
Biomarkers, Tumor/blood , Oropharyngeal Neoplasms/psychology , Oropharyngeal Neoplasms/therapy , Prognosis , Survival Analysis , Bayes Theorem , Female , Forecasting , Humans , Male , Michigan , Middle Aged , Models, Theoretical , Netherlands , Oropharyngeal Neoplasms/epidemiology , Treatment Outcome , United States/epidemiology
15.
Sci Rep ; 11(1): 9748, 2021 05 07.
Article in English | MEDLINE | ID: mdl-33963259

ABSTRACT

Susceptible-Exposed-Infected-Removed (SEIR)-type epidemiologic models, modeling unascertained infections latently, can predict unreported cases and deaths assuming perfect testing. We apply a method we developed to account for the high false negative rates of diagnostic RT-PCR tests for detecting an active SARS-CoV-2 infection in a classic SEIR model. The number of unascertained cases and false negatives being unobservable in a real study, population-based serosurveys can help validate model projections. Applying our method to training data from Delhi, India, during March 15-June 30, 2020, we estimate the underreporting factor for cases at 34-53 (deaths: 8-13) on July 10, 2020, largely consistent with the findings of the first round of serosurveys for Delhi (done during June 27-July 10, 2020) with an estimated 22.86% IgG antibody prevalence, yielding estimated underreporting factors of 30-42 for cases. Together, these imply that approximately 96-98% of cases in Delhi remained unreported (July 10, 2020). Updated calculations using training data during March 15-December 31, 2020 yield an estimated underreporting factor for cases at 13-22 (deaths: 3-7) on January 23, 2021, which is again consistent with the latest (fifth) round of serosurveys for Delhi (done during January 15-23, 2021) with an estimated 56.13% IgG antibody prevalence, yielding an estimated range for the underreporting factor for cases at 17-21. Together, these updated estimates imply that approximately 92-96% of cases in Delhi remained unreported (January 23, 2021). Such model-based estimates, updated with the latest data, provide a viable alternative to repeated resource-intensive serosurveys for tracking unreported cases and deaths and gauging the true extent of the pandemic.


Subjects
COVID-19/diagnosis , COVID-19/epidemiology , SARS-CoV-2/isolation & purification , Adolescent , Adult , Antibodies, Viral/immunology , COVID-19/immunology , COVID-19/transmission , COVID-19 Testing , Child , Child, Preschool , False Negative Reactions , Female , Humans , Immunoglobulin G/immunology , India/epidemiology , Male , SARS-CoV-2/immunology , Seroepidemiologic Studies , Young Adult
16.
Biometrics ; 77(4): 1342-1354, 2021 12.
Article in English | MEDLINE | ID: mdl-32920819

ABSTRACT

Multiple imputation by chained equations (MICE) has emerged as a popular approach for handling missing data. A central challenge for applying MICE is determining how to incorporate outcome information into covariate imputation models, particularly for complicated outcomes. Often, we have a particular analysis model in mind, and we would like to ensure congeniality between the imputation and analysis models. We propose a novel strategy for directly incorporating the analysis model into the handling of missing data. In our proposed approach, multiple imputations of missing covariates are obtained without using outcome information. We then utilize the strategy of imputation stacking, where multiple imputations are stacked on top of each other to create a large data set. The analysis model is then incorporated through weights. Instead of applying Rubin's combining rules, we obtain parameter estimates by fitting a weighted version of the analysis model on the stacked data set. We propose a novel estimator for obtaining standard errors for this stacked and weighted analysis. Our estimator is based on the observed data information principle in Louis' work and can be applied for analyzing stacked multiple imputations more generally. Our approach for analyzing stacked multiple imputations is the first method that can be easily applied (using R package StackImpute) for a wide variety of standard analysis models and missing data settings.


Subjects
Models, Statistical , Research Design
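
The stacking idea in abstract 16 can be illustrated with a toy sketch: impute the missing covariate without using the outcome, stack the imputed data sets, and bring the analysis model back in through weights. The Python code below is a minimal sketch under stated assumptions, not the StackImpute implementation. It assumes a linear analysis model with normal errors, a deliberately crude imputation model (draws from the observed covariate distribution), and weights proportional to the outcome density evaluated at a preliminary complete-case fit. That weighting choice, the toy data, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 500, 20  # subjects and number of imputations (toy values)

# Toy data: outcome y depends on covariate x; about 30% of x is missing.
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
missing = rng.random(n) < 0.3
x_obs = np.where(missing, np.nan, x)


def wls(X, y, w):
    """Weighted least squares: solve (X'WX) beta = X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)


# Preliminary complete-case fit of the analysis model (assumed weight basis).
cc = ~missing
X_cc = np.column_stack([np.ones(cc.sum()), x_obs[cc]])
beta_cc = wls(X_cc, y[cc], np.ones(cc.sum()))
sigma_cc = np.sqrt(np.mean((y[cc] - X_cc @ beta_cc) ** 2))

# Impute missing x WITHOUT using y: draw from the observed-x distribution.
x_imp = np.tile(x_obs, (M, 1))  # shape (M, n)
x_imp[:, missing] = rng.normal(
    np.nanmean(x_obs), np.nanstd(x_obs), size=(M, missing.sum())
)

# Weights proportional to the normal outcome density, normalized per subject.
resid = y[None, :] - (beta_cc[0] + beta_cc[1] * x_imp)
dens = np.exp(-0.5 * (resid / sigma_cc) ** 2)
w = dens / dens.sum(axis=0, keepdims=True)

# Stack the M imputed data sets and fit the weighted analysis model once.
X_stack = np.column_stack([np.ones(M * n), x_imp.ravel()])
y_stack = np.tile(y, M)
beta_stack = wls(X_stack, y_stack, w.ravel())
```

Subjects with an observed covariate contribute M identical rows that each receive weight 1/M, so they count once in total. The sketch produces point estimates only; standard errors from a naive weighted fit on stacked data are not valid, which is the gap the estimator described in the abstract is meant to fill.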
17.
J Biomed Inform ; 113: 103652, 2021 01.
Article in English | MEDLINE | ID: mdl-33279681

ABSTRACT

BACKGROUND: Traditional methods for disease risk prediction and assessment, such as diagnostic tests using serum, urine, blood, saliva or imaging biomarkers, have been important for identifying high-risk individuals for many diseases, leading to early detection and improved survival. For pancreatic cancer, traditional methods for screening have been largely unsuccessful in identifying high-risk individuals in advance of disease progression leading to high mortality and poor survival. Electronic health records (EHR) linked to genetic profiles provide an opportunity to integrate multiple sources of patient information for risk prediction and stratification. We leverage a constellation of temporally associated diagnoses available in the EHR to construct a summary risk score, called a phenotype risk score (PheRS), for identifying individuals at high-risk for having pancreatic cancer. The proposed PheRS approach incorporates the time with respect to disease onset into the prediction framework. We combine and contrast the PheRS with more well-known measures of inherited susceptibility, namely, the polygenic risk scores (PRS) for prediction of pancreatic cancer. METHODOLOGY: We first calculated pairwise, unadjusted associations between pancreatic cancer diagnosis and all possible other diagnoses across the medical phenome. We call these pairwise associations co-occurrences. After accounting for cross-phenotype correlations, the multivariable association estimates from a subset of relatively independent diagnoses were used to create a weighted sum PheRS. We constructed time-restricted risk scores using data from 38,359 participants in the Michigan Genomics Initiative (MGI) based on the diagnoses contained in the EHR at 0, 1, 2, and 5 years prior to the target pancreatic cancer diagnosis. The PheRS was assessed for predictability in the UK Biobank (UKB). 
We tested the relative contribution of PheRS when added to a model containing a summary measure of inherited genetic susceptibility (PRS) plus other covariates like age, sex, smoking status, drinking status, and body mass index (BMI). RESULTS: Our exploration of co-occurrence patterns identified expected associations while also revealing unexpected relationships that may warrant closer attention. Solely using the pancreatic cancer PheRS at 5 years before the target diagnoses yielded an AUC of 0.60 (95% CI = [0.58, 0.62]) in UKB. A larger predictive model including PheRS, PRS, and the covariates at the 5-year threshold achieved an AUC of 0.74 (95% CI = [0.72, 0.76]) in UKB. We note that PheRS does contribute independently in the joint model. Finally, scores at the top percentiles of the PheRS distribution demonstrated promise in terms of risk stratification. Scores in the top 2% were 10.20 (95% CI = [9.34, 12.99]) times more likely to identify cases than those in the bottom 98% in UKB at the 5-year threshold prior to pancreatic cancer diagnosis. CONCLUSIONS: We developed a framework for creating a time-restricted PheRS from EHR data for pancreatic cancer using the rich information content of a medical phenome. In addition to identifying hypothesis-generating associations for future research, this PheRS demonstrates a potentially important contribution in identifying high-risk individuals, even after adjusting for PRS for pancreatic cancer and other traditional epidemiologic covariates. The methods are generalizable to other phenotypic traits.


Subjects
Electronic Health Records , Pancreatic Neoplasms , Biological Specimen Banks , Genome-Wide Association Study , Humans , Michigan , Pancreatic Neoplasms/genetics , Phenotype , Risk Factors
18.
medRxiv ; 2020 Sep 25.
Article in English | MEDLINE | ID: mdl-32995829

ABSTRACT

The false negative rate of the diagnostic RT-PCR test for SARS-CoV-2 has been reported to be high. Due to limited availability of testing, only a non-random subset of the population can get tested. Hence, the reported test counts are subject to a large degree of selection bias. We consider an extension of the Susceptible-Exposed-Infected-Removed (SEIR) model under both selection bias and misclassification. We derive a closed-form expression for the basic reproduction number under such data anomalies using the next generation matrix method. We conduct extensive simulation studies to quantify the effect of misclassification and selection on the resultant estimation and prediction of future case counts. Finally, we apply the methods to reported case-death-recovery count data from India, a nation with more than 5 million cases reported over the last seven months. We show that correcting for misclassification and selection can lead to more accurate prediction of case counts (and death counts) using the observed data as a beta tester. The model also provides an estimate of undetected infections and thus an under-reporting factor. For India, the estimated under-reporting factor for cases is around 21 and for deaths is around 6. We develop an R package (SEIRfansy) for broader dissemination of the methods.
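
For orientation, here is a minimal sketch of the standard SEIR model that the abstract extends. It is not the authors' extended model and does not use the SEIRfansy package: it assumes a closed population, constant rates, and simple Euler integration. In this basic setting the next generation matrix method gives a basic reproduction number of `beta / gamma`. The under-reporting factor is taken here to mean the ratio of estimated true counts to reported counts, which is an assumed reading of the abstract. All function and parameter names are illustrative.

```python
from typing import List, Tuple

State = Tuple[float, float, float, float]  # (S, E, I, R)


def simulate_seir(beta: float, sigma: float, gamma: float,
                  s0: float, e0: float, i0: float, removed0: float,
                  days: int, steps_per_day: int = 10) -> List[State]:
    """Euler integration of the standard SEIR equations; one state per day."""
    n = s0 + e0 + i0 + removed0
    dt = 1.0 / steps_per_day
    s, e, i, r = s0, e0, i0, removed0
    trajectory = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_removed = gamma * i * dt          # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        trajectory.append((s, e, i, r))
    return trajectory


def basic_reproduction_number(beta: float, gamma: float) -> float:
    """R0 of the standard SEIR model without births or deaths."""
    return beta / gamma


def underreporting_factor(estimated_true: float, reported: float) -> float:
    """Ratio of estimated true counts to reported counts (assumed definition)."""
    return estimated_true / reported
```

The extended model in the paper adds compartments for tested and untested individuals and for false negatives, so its reproduction number has a different closed form from the one shown here.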

19.
Am J Hum Genet ; 107(5): 815-836, 2020 11 05.
Article in English | MEDLINE | ID: mdl-32991828

ABSTRACT

To facilitate scientific collaboration on polygenic risk score (PRS) research, we created an extensive online PRS repository for 35 common cancer traits, integrating freely available genome-wide association study (GWAS) summary statistics from three sources: published GWASs, the NHGRI-EBI GWAS Catalog, and UK Biobank-based GWASs. Our framework condenses these summary statistics into PRSs using various approaches such as linkage disequilibrium pruning/p value thresholding (fixed or data-adaptively optimized thresholds) and penalized, genome-wide effect size weighting. We evaluated the PRSs in two biobanks: the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort at Michigan Medicine, and the population-based UK Biobank (UKB). For each PRS construct, we provide measures of predictive performance and discrimination. Besides PRS evaluation, the Cancer-PRSweb platform features construct downloads and phenome-wide PRS association study results (PRS-PheWAS) for predictive PRSs. We expect this integrated platform to accelerate PRS-related cancer research.
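
The p value thresholding approach mentioned above can be sketched as follows. This is a minimal illustration, not the Cancer-PRSweb pipeline: it assumes the variants have already been pruned for linkage disequilibrium, that genotypes are coded as risk-allele dosages (0 to 2), and that weights are GWAS effect sizes. The variant identifiers, effect sizes, and the function name `polygenic_risk_score` are hypothetical.

```python
from typing import Dict, Tuple


def polygenic_risk_score(dosages: Dict[str, float],
                         summary_stats: Dict[str, Tuple[float, float]],
                         p_threshold: float) -> float:
    """Sum effect size * dosage over variants whose GWAS p value passes the threshold.

    `summary_stats` maps a variant ID to (effect size, p value). Variants that
    are missing from `dosages` are skipped rather than imputed.
    """
    score = 0.0
    for variant, (effect_size, p_value) in summary_stats.items():
        if p_value <= p_threshold and variant in dosages:
            score += effect_size * dosages[variant]
    return score


# Hypothetical, already LD-pruned summary statistics: variant -> (beta, p value).
example_stats = {
    "var_a": (0.10, 1e-9),
    "var_b": (0.25, 0.03),
    "var_c": (-0.05, 1e-6),
}
example_dosages = {"var_a": 2.0, "var_b": 1.0, "var_c": 1.0}

strict_score = polygenic_risk_score(example_dosages, example_stats, 5e-8)
lenient_score = polygenic_risk_score(example_dosages, example_stats, 1e-5)
```

A data-adaptive threshold, as the abstract describes, would repeat this calculation over a grid of `p_threshold` values and keep the one with the best predictive performance in a tuning sample.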


Subjects
Biological Specimen Banks/statistics & numerical data , Genetic Predisposition to Disease , Genome, Human , Genomics/methods , Multifactorial Inheritance , Neoplasms/genetics , Adult , Aged , Female , Genome-Wide Association Study , Humans , Internet , Linkage Disequilibrium , Male , Middle Aged , Neoplasms/classification , Neoplasms/diagnosis , Neoplasms/epidemiology , Phenotype , Quantitative Trait, Heritable , Risk Factors , United Kingdom/epidemiology , United States/epidemiology
20.
Stat Med ; 39(14): 1965-1979, 2020 06 30.
Article in English | MEDLINE | ID: mdl-32198773

ABSTRACT

Large-scale association analyses based on observational health care databases such as electronic health records have been a topic of increasing interest in the scientific community. However, challenges due to nonprobability sampling and phenotype misclassification associated with the use of these data sources are often ignored in standard analyses. The extent of the bias introduced by ignoring these factors is not well characterized. In this paper, we develop an analytic framework for characterizing the bias expected in disease-gene association studies based on electronic health records when disease status misclassification and the sampling mechanism are ignored. Through a sensitivity analysis approach, this framework can be used to obtain plausible values for parameters of interest given summary results from a standard analysis. We develop an online tool for performing this sensitivity analysis. Simulations demonstrate promising properties of the proposed method. We apply our approach to study bias in disease-gene association studies using electronic health record data from the Michigan Genomics Initiative, a longitudinal biorepository effort within the University of Michigan health system.


Subjects
Electronic Health Records , Genome-Wide Association Study , Bias , Michigan , Phenotype , Polymorphism, Single Nucleotide