Results 1 - 2 of 2
1.
Biometrics; 79(1): 190-202, 2023 Mar.
Article in English | MEDLINE | ID: mdl-34747010

ABSTRACT

Readily available proxies for the time of disease onset, such as the time of the first diagnostic code, can lead to substantial risk prediction error when analyses are based on poor proxies. Because detailed documentation is lacking and manual annotation is labor intensive, it is often feasible to ascertain, for only a small subset, the current status of the disease at a follow-up time rather than the exact onset time. In this paper, we aim to develop risk prediction models for the onset time that efficiently leverage both a small number of labels on the current status and a large number of unlabeled observations on imperfect proxies. Under a semiparametric transformation model for onset and a highly flexible measurement error model for the proxy onset time, we propose a semisupervised risk prediction method that efficiently combines information from the proxies and the limited labels. Starting from an initial estimator based solely on the labeled subset, we perform a one-step correction with the full data, augmenting against a mean-zero rank correlation score derived from the proxies. We establish the consistency and asymptotic normality of the proposed semisupervised estimator and provide a resampling procedure for interval estimation. Simulation studies demonstrate that the proposed estimator performs well in finite samples. We illustrate the proposed estimator by developing a genetic risk prediction model for obesity using data from the Mass General Brigham Healthcare Biobank.


Subjects
Algorithms, Electronic Health Records, Computer Simulation, Risk Factors
2.
Stat Methods Med Res; 29(2): 455-465, 2020 Feb.
Article in English | MEDLINE | ID: mdl-30943854

ABSTRACT

Electronic medical records data are valuable resources for discovery research. They contain detailed phenotypic information on individual patients, opening opportunities for studying multiple phenotypes simultaneously. A useful tool for such simultaneous assessment is the phenome-wide association study, which relates a genomic or biological marker of interest to a wide spectrum of disease phenotypes, typically defined by diagnostic billing codes. One challenge arises when the biomarker of interest is expensive to measure on the entire electronic medical record cohort. Performing a phenome-wide association study based on supervised estimation, using only subjects who have marker measurements, may yield limited power. In this paper, we focus on the setting where the marker is measured on a small fraction of the patients, while a few surrogate markers, such as historical measurements of the biomarker, are available on a large number of patients. We propose an efficient semi-supervised estimation procedure to estimate the covariance between the biomarker and the billing code, leveraging the surrogate marker information. We employ surrogate marker values to impute the missing outcome via a two-step semi-non-parametric approach and demonstrate that our proposed estimator is always more efficient than its supervised counterpart, without requiring the imputation model to be correct. We illustrate the proposed procedure by assessing the association between C-reactive protein and several inflammatory diseases in an electronic medical record study of inflammatory bowel disease. The study was performed with the Partners HealthCare electronic medical record database, where C-reactive protein was measured for only a small fraction of the patients due to budget constraints.


Subjects
Statistical Data Interpretation, Electronic Health Records, Genome-Wide Association Study, Algorithms, Bias, Biomarkers, Inflammatory Bowel Diseases
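
Illustrative note on entry 2: the abstract describes imputing a sparsely measured biomarker from surrogate markers and then estimating its covariance with a billing code on the full cohort. The Python sketch below is not the authors' two-step semi-non-parametric procedure; it swaps in a much simpler stand-in (a linear working imputation model plus a residual correction from the labeled subjects) to show the general idea. All function names, the simulated data, and the assumption that labeled subjects are a simple random sample of the cohort are hypothetical choices made for this example.

```python
import numpy as np


def supervised_cov(y_lab, d_lab):
    """Covariance between biomarker and billing code from labeled subjects only."""
    return float(np.mean((y_lab - y_lab.mean()) * (d_lab - d_lab.mean())))


def semi_supervised_cov(y_lab, s_lab, d_lab, s_all, d_all):
    """Simplified semi-supervised covariance estimate (illustrative stand-in).

    y_lab: biomarker values, observed on labeled subjects only
    s_lab, d_lab: surrogate and billing code for the labeled subjects
    s_all, d_all: surrogate and billing code for the whole cohort
    """
    # Step 1: fit a linear working imputation model for Y given (S, D)
    # on the labeled subjects. The model is only a working assumption.
    x_lab = np.column_stack([np.ones_like(s_lab), s_lab, d_lab])
    beta, *_ = np.linalg.lstsq(x_lab, y_lab, rcond=None)

    # Step 2: impute Y for every subject in the cohort.
    x_all = np.column_stack([np.ones_like(s_all), s_all, d_all])
    y_hat_all = x_all @ beta
    y_hat_lab = x_lab @ beta

    # Center the billing code with the full-cohort mean.
    d_bar = d_all.mean()

    # Plug-in term from the full cohort, plus a correction built from the
    # labeled residuals so that the two terms together target Cov(Y, D).
    plug_in = np.mean(y_hat_all * (d_all - d_bar))
    correction = np.mean((y_lab - y_hat_lab) * (d_lab - d_bar))
    return float(plug_in + correction)


def simulate_cohort(rng, n_total, n_labeled, effect=0.8, prevalence=0.3):
    """Hypothetical toy cohort; true Cov(Y, D) = effect * prevalence * (1 - prevalence)."""
    d = rng.binomial(1, prevalence, size=n_total).astype(float)  # billing code
    y = 1.0 + effect * d + rng.normal(size=n_total)              # biomarker
    s = y + rng.normal(scale=0.3, size=n_total)                  # noisy surrogate
    labeled = rng.choice(n_total, size=n_labeled, replace=False) # random labeling
    return {"d": d, "y": y, "s": s, "labeled": labeled}
```

To try it, one could call `simulate_cohort(np.random.default_rng(0), 20000, 500)`, pass the labeled rows and the full-cohort surrogate and billing code to `semi_supervised_cov`, and compare the result with `supervised_cov` on the labeled rows alone. In this toy setup the gain comes from estimating the billing-code moments on the full cohort rather than on the small labeled subset.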
...