Results 1 - 13 of 13
1.
BMC Med Res Methodol ; 23(1): 178, 2023 08 02.
Article in English | MEDLINE | ID: mdl-37533017

ABSTRACT

BACKGROUND: The Targeted Learning roadmap provides a systematic guide for generating and evaluating real-world evidence (RWE). From a regulatory perspective, RWE arises from diverse sources such as randomized controlled trials that make use of real-world data, observational studies, and other study designs. This paper illustrates a principled approach to assessing the validity and interpretability of RWE. METHODS: We applied the roadmap to a published observational study of the dose-response association between ritodrine hydrochloride and pulmonary edema among women pregnant with twins in Japan. The goal was to identify barriers to causal effect estimation beyond unmeasured confounding reported by the study's authors, and to explore potential options for overcoming the barriers that robustify results. RESULTS: Following the roadmap raised issues that led us to formulate alternative causal questions that produced more reliable, interpretable RWE. The process revealed a lack of information in the available data to identify a causal dose-response curve. However, under explicit assumptions the effect of treatment with any amount of ritodrine versus none, albeit a less ambitious parameter, can be estimated from data. CONCLUSIONS: Before RWE can be used in support of clinical and regulatory decision-making, its quality and reliability must be systematically evaluated. The TL roadmap prescribes how to carry out a thorough, transparent, and realistic assessment of RWE. We recommend this approach be a routine part of any decision-making process.


Subject(s)
Research Design , Female , Humans , Reproducibility of Results , Japan , Randomized Controlled Trials as Topic
2.
Ann Epidemiol ; 86: 34-48.e28, 2023 10.
Article in English | MEDLINE | ID: mdl-37343734

ABSTRACT

PURPOSE: The targeted maximum likelihood estimation (TMLE) statistical data analysis framework integrates machine learning, statistical theory, and statistical inference to provide a least biased, efficient, and robust strategy for estimation and inference of a variety of statistical and causal parameters. We describe and evaluate the epidemiological applications that have benefited from recent methodological developments. METHODS: We conducted a systematic literature review in PubMed for articles that applied any form of TMLE in observational studies. We summarized the epidemiological discipline, geographical location, expertise of the authors, and TMLE methods over time. We used the Roadmap of Targeted Learning and Causal Inference to extract key methodological aspects of the publications. We showcase the contributions to the literature of these TMLE results. RESULTS: Of the 89 publications included, 33% originated from the University of California at Berkeley, where the framework was first developed by Professor Mark van der Laan. By 2022, 59% of the publications originated from outside the United States and explored up to seven different epidemiological disciplines in 2021-2022. Double-robustness, bias reduction, and model misspecification were the main motivations that drew researchers toward the TMLE framework. Through time, a wide variety of methodological, tutorial, and software-specific articles were cited, owing to the constant growth of methodological developments around TMLE. CONCLUSIONS: There is a clear dissemination trend of the TMLE framework to various epidemiological disciplines and to increasing numbers of geographical areas. The availability of R packages, publication of tutorial papers, and involvement of methodological experts in applied publications have contributed to an exponential increase in the number of studies that recognized the benefits of TMLE and adopted it.


Subject(s)
Models, Statistical , Public Health , Humans , Likelihood Functions , Bias , Epidemiologic Studies
3.
Int J Epidemiol ; 52(4): 1276-1285, 2023 08 02.
Article in English | MEDLINE | ID: mdl-36905602

ABSTRACT

Common tasks encountered in epidemiology, including disease incidence estimation and causal inference, rely on predictive modelling. Constructing a predictive model can be thought of as learning a prediction function (a function that takes as input covariate data and outputs a predicted value). Many strategies for learning prediction functions from data (learners) are available, from parametric regressions to machine learning algorithms. It can be challenging to choose a learner, as it is impossible to know in advance which one is the most suitable for a particular dataset and prediction task. The super learner (SL) is an algorithm that alleviates concerns over selecting the one 'right' learner by providing the freedom to consider many, such as those recommended by collaborators, used in related research or specified by subject-matter experts. Also known as stacking, SL is an entirely prespecified and flexible approach for predictive modelling. To ensure the SL is well specified for learning the desired prediction function, the analyst does need to make a few important choices. In this educational article, we provide step-by-step guidelines for making these decisions, walking the reader through each of them and providing intuition along the way. In doing so, we aim to empower the analyst to tailor the SL specification to their prediction task, thereby ensuring their SL performs as well as possible. A flowchart provides a concise, easy-to-follow summary of key suggestions and heuristics, based on our accumulated experience and guided by SL optimality theory.
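The stacking idea described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a single numeric covariate, two hypothetical candidate learners (`fit_mean` and `fit_line`), squared-error loss, and a metalearner that picks a convex weight by grid search over the cross-validated predictions. All function names are illustrative.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Candidate 1 (hypothetical): ignore x and always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2 (hypothetical): simple least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def cv_predictions(xs, ys, learners, n_folds=5):
    """Out-of-fold predictions: row i is predicted by fits that never saw row i."""
    n = len(xs)
    preds = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(n_folds):
        valid = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for j, fit in enumerate(learners):
            f = fit(tx, ty)
            for i in valid:
                preds[i][j] = f(xs[i])
    return preds

def super_learner(xs, ys, n_folds=5, grid=101):
    """Stack two candidates with a convex weight chosen to minimize CV squared error."""
    learners = [fit_mean, fit_line]
    z = cv_predictions(xs, ys, learners, n_folds)

    def cv_risk(w):
        return mean((y - (w * zi[0] + (1 - w) * zi[1])) ** 2 for zi, y in zip(z, ys))

    # Metalearner: grid search over the weight given to the first candidate.
    w = min((k / (grid - 1) for k in range(grid)), key=cv_risk)
    # Refit every candidate on all the data, then combine with the chosen weights.
    fits = [fit(xs, ys) for fit in learners]
    return (lambda x: w * fits[0](x) + (1 - w) * fits[1](x)), (w, 1 - w)
```

Calling `super_learner(xs, ys)` returns a prediction function and the weight assigned to each candidate. In practice the library would hold many more learners and the metalearner would be a constrained regression rather than a grid search.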


Subject(s)
Algorithms , Machine Learning , Humans
4.
Stat Med ; 42(7): 1013-1044, 2023 03 30.
Article in English | MEDLINE | ID: mdl-36897184

ABSTRACT

In this work we introduce the personalized online super learner (POSL), an online personalizable ensemble machine learning algorithm for streaming data. POSL optimizes predictions with respect to baseline covariates, so personalization can vary from completely individualized, that is, optimization with respect to subject ID, to many individuals, that is, optimization with respect to common baseline covariates. As an online algorithm, POSL learns in real time. As a super learner, POSL is grounded in statistical optimality theory and can leverage a diversity of candidate algorithms, including online algorithms with different training and update times, fixed/offline algorithms that are not updated during POSL's fitting procedure, pooled algorithms that learn from many individuals' time series, and individualized algorithms that learn from within a single time series. POSL's ensembling of the candidates can depend on the amount of data collected, the stationarity of the time series, and the mutual characteristics of a group of time series. Depending on the underlying data-generating process and the information available in the data, POSL is able to adapt to learning across samples, through time, or both. For a range of simulations that reflect realistic forecasting scenarios and in a medical application, we examine the performance of POSL relative to other current ensembling and online learning methods. We show that POSL is able to provide reliable predictions for both short and long time series, and it's able to adjust to changing data-generating environments. We further cultivate POSL's practicality by extending it to settings where time series dynamically enter and exit.


Subject(s)
Algorithms , Machine Learning , Humans
6.
Epidemics ; 41: 100640, 2022 12.
Article in English | MEDLINE | ID: mdl-36274569

ABSTRACT

We investigated the initial outbreak rates and subsequent social distancing behaviour over the initial phase of the COVID-19 pandemic across 29 Combined Statistical Areas (CSAs) of the United States. We used the Numerus Model Builder Data and Simulation Analysis (NMB-DASA) web application to fit the exponential phase of a SCLAIV+D (Susceptible, Contact, Latent, Asymptomatic infectious, symptomatic Infectious, Vaccinated, Dead) disease classes model to outbreaks, thereby allowing us to obtain an estimate of the basic reproductive number R0 for each CSA. Values of R0 ranged from 1.9 to 9.4, with a mean and standard deviation of 4.5±1.8. Fixing the parameters from the exponential fit, we again used NMB-DASA to estimate a set of social distancing behaviour parameters to compute an epidemic flattening index cflatten. Finally, we applied hierarchical clustering methods using this index to divide CSA outbreaks into two clusters: those presenting a social distancing response that was either weaker or stronger. We found cflatten to be more influential in the clustering process than R0. Thus, our results suggest that the behavioural response after a short initial exponential growth phase is likely to be more determinative of the rise of an epidemic than R0 itself.


Subject(s)
COVID-19 , Humans , COVID-19/epidemiology , Pandemics/prevention & control , Physical Distancing , Basic Reproduction Number , Disease Outbreaks/prevention & control
7.
Epigenetics ; 17(13): 2259-2277, 2022 12.
Article in English | MEDLINE | ID: mdl-36017556

ABSTRACT

Sufficient evidence supports a relationship between certain myeloid neoplasms and exposure to benzene or formaldehyde. DNA methylation could underlie benzene- and formaldehyde-induced health outcomes, but data in exposed human populations are limited. We conducted two cross-sectional epigenome-wide association studies (EWAS), one in workers exposed to benzene and another in workers exposed to formaldehyde. Using HumanMethylation450 BeadChips, we investigated differences in blood cell DNA methylation among 50 benzene-exposed subjects and 48 controls, and among 31 formaldehyde-exposed subjects and 40 controls. We performed CpG-level and regional-level analyses. In the benzene EWAS, we found genome-wide significant alterations, i.e., FWER-controlled P-values <0.05, in the mean and variance of methylation at 22 and 318 CpG sites, respectively, and in mean methylation of a large genomic region. Pathway analysis of genes corresponding to benzene-associated differential methylation sites revealed an impact on the AMPK signalling pathway. In formaldehyde-exposed subjects compared to controls, 9 CpGs in the DUSP22 gene promoter had genome-wide significant decreased methylation variability and a large region of the HOXA5 promoter with 44 CpGs was hypomethylated. Our findings suggest that DNA methylation may contribute to the pathogenesis of diseases related to benzene and formaldehyde exposure. Aberrant expression and methylation of HOXA5 previously has been shown to be clinically significant in myeloid leukaemias. The tumour suppressor gene DUSP22 is a potential biomarker of exposure to formaldehyde, and irregularities have been associated with multiple exposures and diseases.


Subject(s)
Benzene , Occupational Exposure , Humans , Benzene/toxicity , Benzene/analysis , DNA Methylation , Epigenome , Cross-Sectional Studies , Occupational Exposure/adverse effects , Formaldehyde/toxicity , Genome-Wide Association Study , CpG Islands
8.
NPJ Digit Med ; 5(1): 66, 2022 May 31.
Article in English | MEDLINE | ID: mdl-35641814

ABSTRACT

Machine learning (ML) and artificial intelligence (AI) algorithms have the potential to derive insights from clinical data and improve patient outcomes. However, these highly complex systems are sensitive to changes in the environment and liable to performance decay. Even after their successful integration into clinical practice, ML/AI algorithms should be continuously monitored and updated to ensure their long-term safety and effectiveness. To bring AI into maturity in clinical care, we advocate for the creation of hospital units responsible for quality assurance and improvement of these algorithms, which we refer to as "AI-QI" units. We discuss how tools that have long been used in hospital quality assurance and quality improvement can be adapted to monitor static ML algorithms. On the other hand, procedures for continual model updating are still nascent. We highlight key considerations when choosing between existing methods and opportunities for methodological innovation.

9.
Am J Epidemiol ; 191(9): 1640-1651, 2022 08 22.
Article in English | MEDLINE | ID: mdl-35512316

ABSTRACT

Inverse probability weighting (IPW) and targeted maximum likelihood estimation (TMLE) are methodologies that can adjust for confounding and selection bias and are often used for causal inference. Both estimators rely on the positivity assumption that within strata of confounders there is a positive probability of receiving treatment at all levels under consideration. Practical applications of IPW require finite inverse probability (IP) weights. TMLE requires that propensity scores (PS) be bounded away from 0 and 1. Although truncation can improve variance and finite sample bias, this artificial distortion of the IP weights and PS distribution introduces asymptotic bias. As sample size grows, truncation-induced bias eventually swamps variance, rendering nominal confidence interval coverage and hypothesis tests invalid. We present a simple truncation strategy based on the sample size, n, that sets the upper bound on IP weights at $\sqrt{n}\,\ln n/5$. For TMLE, the lower bound on the PS should be set to $5/(\sqrt{n}\,\ln n/5)$. Our strategy was designed to optimize the mean squared error of the parameter estimate. It naturally extends to data structures with missing outcomes. Simulation studies and a data analysis demonstrate our strategy's ability to minimize both bias and mean squared error in comparison with other common strategies, including the popular but flawed quantile-based heuristic.
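As a rough illustration of the sample-size-based rule for IP weights stated in this abstract, the sketch below computes the upper bound (square root of n, times the natural log of n, divided by 5) and caps the weights at it. The function names are hypothetical, the example covers only the IP-weight bound, and it assumes n is large enough for the bound to exceed 1.

```python
import math

def ip_weight_upper_bound(n):
    """Upper bound on IP weights as stated in the abstract: sqrt(n) * ln(n) / 5."""
    return math.sqrt(n) * math.log(n) / 5

def truncated_ip_weights(propensity_scores):
    """Inverse probability weights 1/ps, capped at the sample-size-based bound.

    Assumes `propensity_scores` holds the estimated probability of the
    treatment each unit actually received, one entry per unit.
    """
    n = len(propensity_scores)
    bound = ip_weight_upper_bound(n)
    return [min(1.0 / ps, bound) for ps in propensity_scores]
```

For n = 1000 the bound is roughly 43.7, so a unit with an estimated propensity of 0.001 would receive a weight of about 43.7 rather than 1000.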


Subject(s)
Propensity Score , Bias , Causality , Computer Simulation , Humans , Likelihood Functions
10.
Stat Med ; 41(12): 2132-2165, 2022 05 30.
Article in English | MEDLINE | ID: mdl-35172378

ABSTRACT

Several recently developed methods have the potential to harness machine learning in the pursuit of target quantities inspired by causal inference, including inverse weighting, doubly robust estimating equations and substitution estimators like targeted maximum likelihood estimation. There are even more recent augmentations of these procedures that can increase robustness, by adding a layer of cross-validation (cross-validated targeted maximum likelihood estimation and double machine learning, as applied to substitution and estimating equation approaches, respectively). While these methods have been evaluated individually on simulated and experimental data sets, a comprehensive analysis of their performance across real-data-based simulations has yet to be conducted. In this work, we benchmark multiple widely used methods for estimation of the average treatment effect using data from ten different nutrition intervention studies. A nonparametric regression method, undersmoothed highly adaptive lasso, is used to generate the simulated distribution that preserves important features from the observed data and reproduces a set of true target parameters. For each simulated dataset, we apply the methods above to estimate the average treatment effects as well as their standard errors and resulting confidence intervals. Based on the analytic results, a general recommendation is put forth for use of the cross-validated variants of both substitution and estimating equation estimators. We conclude that the additional layer of cross-validation helps in avoiding unintentional over-fitting of nuisance parameter functionals and leads to more robust inferences.
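The "additional layer of cross-validation" can be illustrated with a cross-fitted doubly robust (AIPW) estimate of the average treatment effect. This is a simplified sketch, not the benchmark code from the paper: it assumes one binary covariate, uses stratum means as stand-ins for the machine-learning nuisance estimators, and bounds the propensity score at an arbitrary 0.05 and 0.95. All names are hypothetical.

```python
from statistics import mean

def stratum_mean(rows, keep, value, default):
    """Mean of value(r) over rows where keep(r) is true; `default` if none match."""
    vals = [value(r) for r in rows if keep(r)]
    return mean(vals) if vals else default

def cross_fitted_aipw(data, n_folds=2):
    """data: list of (w, a, y) with binary covariate w, binary treatment a, outcome y.

    Nuisance functions are fitted on the training folds only and evaluated on
    the held-out fold, so no unit is scored by a model that saw its own outcome.
    """
    n = len(data)
    scores = []
    for fold in range(n_folds):
        train = [data[i] for i in range(n) if i % n_folds != fold]
        valid = [data[i] for i in range(n) if i % n_folds == fold]
        for w, a, y in valid:
            # Propensity score P(A=1 | W=w), bounded away from 0 and 1 (arbitrary bounds).
            g = stratum_mean(train, lambda r: r[0] == w, lambda r: r[1], 0.5)
            g = min(max(g, 0.05), 0.95)
            # Outcome regressions E[Y | A=1, W=w] and E[Y | A=0, W=w].
            q1 = stratum_mean(train, lambda r: r[0] == w and r[1] == 1, lambda r: r[2], 0.0)
            q0 = stratum_mean(train, lambda r: r[0] == w and r[1] == 0, lambda r: r[2], 0.0)
            # Doubly robust score for the average treatment effect.
            scores.append(q1 - q0 + a * (y - q1) / g - (1 - a) * (y - q0) / (1 - g))
    ate = mean(scores)
    se = (sum((s - ate) ** 2 for s in scores) / (n - 1)) ** 0.5 / n ** 0.5
    return ate, se
```

The returned standard error is based on the sample variance of the scores, which is the usual plug-in choice for this type of estimator.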


Subject(s)
Machine Learning , Research Design , Causality , Computer Simulation , Humans , Likelihood Functions , Models, Statistical , Regression Analysis
11.
Environ Int ; 158: 106871, 2022 01.
Article in English | MEDLINE | ID: mdl-34560324

ABSTRACT

Epigenetic aging biomarkers are associated with increased morbidity and mortality. We evaluated if occupational exposure to three established chemical carcinogens is associated with acceleration of epigenetic aging. We studied workers in China occupationally exposed to benzene, trichloroethylene (TCE) or formaldehyde by measuring personal air exposures prior to blood collection. Unexposed controls matched by age and sex were selected from nearby factories. We measured leukocyte DNA methylation (DNAm) in peripheral white blood cells using the Infinium HumanMethylation450 BeadChip to calculate five epigenetic aging clocks and DNAmTL, a biomarker associated with leukocyte telomere length and cell replication. We tested associations between exposure intensity and epigenetic age acceleration (EAA), defined as the residuals of regressing the DNAm aging biomarker on chronological age, matching factors and potential confounders. Median differences in EAA between exposure groups were tested using a permutation test with exact p-values. Epigenetic clocks were strongly correlated with age (Spearman r > 0.8) in all three occupational studies. There was a positive exposure-response relationship between benzene and the Skin-Blood Clock EAA biomarker: median EAA was -0.91 years in controls (n = 44), 0.78 years in workers exposed to <10 ppm (n = 41; mean benzene = 1.35 ppm; p = 0.034 vs. controls), and 2.10 years in workers exposed to ≥10 ppm (n = 9; mean benzene = 27.3 ppm; p = 0.019 vs. controls; ptrend = 0.0021). In the TCE study, control workers had a median Skin-Blood Clock EAA of -0.54 years (n = 71) compared to 1.63 years among workers exposed to <10 ppm of TCE (n = 27; mean TCE = 4.22 ppm; p = 0.035). We observed no evidence of EAA associations with formaldehyde exposure (39 controls, 31 exposed). Occupational benzene and TCE exposure were associated with increased epigenetic age acceleration measured by the Skin-Blood Clock. 
For TCE, there was some evidence of epigenetic age acceleration for lower exposures compared to controls. Our results suggest that some chemical carcinogens may accelerate epigenetic aging.
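The two analysis steps named in this abstract, computing epigenetic age acceleration as regression residuals and comparing group medians with a permutation test, can be sketched as follows. This is an illustration under simplifying assumptions: the regression adjusts for chronological age only (the study also adjusted for matching factors and potential confounders), the permutation test enumerates every split and so is practical only for small groups, and all function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, median

def age_acceleration(dnam_age, chron_age):
    """Residuals from a least-squares regression of DNAm age on chronological age."""
    mx, my = mean(chron_age), mean(dnam_age)
    sxx = sum((x - mx) ** 2 for x in chron_age)
    slope = sum((x - mx) * (y - my) for x, y in zip(chron_age, dnam_age)) / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(chron_age, dnam_age)]

def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in medians."""
    pooled = list(group_a) + list(group_b)
    observed = abs(median(group_a) - median(group_b))
    idx = range(len(pooled))
    hits = total = 0
    # Enumerate every way of relabelling len(group_a) of the pooled values as group A.
    for chosen in combinations(idx, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in idx if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(median(a) - median(b)) >= observed - 1e-12:
            hits += 1
    return hits / total
```

A typical use would compute `age_acceleration` once over all subjects, split the residuals by exposure group, and pass two groups to `exact_permutation_pvalue`. For groups of the size reported in the study, a Monte Carlo sample of permutations would replace the full enumeration.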


Subject(s)
Occupational Exposure , Trichloroethylene , Aging , Benzene/toxicity , Biomarkers , Epigenesis, Genetic , Formaldehyde/toxicity , Humans , Occupational Exposure/analysis , Trichloroethylene/toxicity
12.
Am J Physiol Endocrinol Metab ; 318(5): E667-E677, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32045263

ABSTRACT

The global prevalence of type 2 diabetes (T2D) has doubled since 1980. Human epidemiological studies support arsenic exposure as a risk factor for T2D, although the precise mechanism is unclear. We hypothesized that chronic arsenic ingestion alters glucose homeostasis by impairing adaptive thermogenesis, i.e., body heat production in cold environments. Arsenic is a pervasive environmental contaminant, with more than 200 million people worldwide currently exposed to arsenic-contaminated drinking water. Male C57BL/6J mice exposed to sodium arsenite in drinking water at 300 µg/L for 9 wk experienced significantly decreased metabolic heat production when acclimated to chronic cold tolerance testing, as evidenced by indirect calorimetry, despite no change in physical activity. Arsenic exposure increased total fat mass and subcutaneous inguinal white adipose tissue (iWAT) mass. RNA sequencing analysis of iWAT indicated that arsenic dysregulated mitochondrial processes, including fatty acid metabolism. Western blotting in WAT confirmed that arsenic significantly decreased TOMM20, a correlate of mitochondrial abundance; PGC1A, a master regulator of mitochondrial biogenesis; and, CPT1B, the rate-limiting step of fatty acid oxidation (FAO). Our findings show that chronic arsenic exposure impacts the mitochondrial proteins of thermogenic tissues involved in energy expenditure and substrate regulation, providing novel mechanistic evidence for arsenic's role in T2D development.


Subject(s)
Adipose Tissue, Brown/drug effects , Arsenites/pharmacology , Sodium Compounds/pharmacology , Thermogenesis/drug effects , Adipose Tissue, Brown/metabolism , Adipose Tissue, White/drug effects , Adipose Tissue, White/metabolism , Animals , Energy Metabolism/drug effects , Male , Membrane Transport Proteins/metabolism , Methacrylates , Mice , Mice, Inbred C57BL , Mitochondrial Precursor Protein Import Complex Proteins , Peroxisome Proliferator-Activated Receptor Gamma Coactivator 1-alpha/metabolism , Receptors, Cell Surface/metabolism , Siloxanes , Subcutaneous Fat/drug effects , Subcutaneous Fat/metabolism
13.
Epigenetics ; 14(11): 1112-1124, 2019 11.
Article in English | MEDLINE | ID: mdl-31241004

ABSTRACT

Human exposure to trichloroethylene (TCE) is linked to kidney cancer, autoimmune diseases, and probably non-Hodgkin lymphoma. Additionally, TCE exposed mice and cell cultures show altered DNA methylation. To evaluate associations between TCE exposure and DNA methylation in humans, we conducted an epigenome-wide association study (EWAS) in TCE exposed workers using the HumanMethylation450 BeadChip. Across individual CpG probes, genomic regions, and globally (i.e., the 450K methylome), we investigated differences in mean DNA methylation and differences in variability of DNA methylation between 73 control (< 0.005 ppm TCE), 30 lower exposed (< 10 ppm TCE), and 37 higher exposed ( ≥ 10 ppm TCE) subjects' white blood cells. We found that TCE exposure increased methylation variation globally (Kruskal-Wallis p-value = 3.75e-3) and in 25 CpG sites at a genome-wide significance level (Bonferroni p-value < 0.05). We identified a 609 basepair region in the TRIM68 gene promoter that exhibited hypomethylation with increased exposure to TCE (FWER = 1.20e-2). Also, genes that matched to differentially variable CpGs were enriched in the 'focal adhesion' biological pathway (p-value = 2.80e-2). All in all, human exposure to TCE was associated with epigenetic alterations in genes involved in cell-matrix adhesions and interferon subtype expression, which are important in the development of autoimmune diseases; and in genes related to cancer development. These results suggest that DNA methylation may play a role in the pathogenesis of TCE exposure-related diseases and that TCE exposure may contribute to epigenetic drift.


Subject(s)
Autoimmune Diseases/genetics , DNA Methylation , Genetic Variation , Neoplasms/genetics , Trichloroethylene/pharmacology , Adult , Autoantigens/genetics , CpG Islands , Female , Genetic Loci , Genetic Predisposition to Disease , Humans , Male , Tripartite Motif Proteins/genetics , Ubiquitin-Protein Ligases/genetics