1.
BMC Med Res Methodol ; 23(1): 144, 2023 Jun 19.
Article in English | MEDLINE | ID: mdl-37337173

ABSTRACT

BACKGROUND: Machine learning tools such as random forests provide important opportunities for modeling large, complex modern data generated in medicine. Unfortunately, when it comes to understanding why machine learning models are predictive, applied research continues to rely on 'out of bag' (OOB) variable importance metrics (VIMPs) that are known to have considerable shortcomings within the statistics community. After explaining the limitations of OOB VIMPs - including bias towards correlated features and limited interpretability - we describe a modern approach called 'knockoff VIMPs' and explain its advantages.
METHODS: We first evaluate current VIMP practices through an in-depth literature review of 50 recent random forest manuscripts. Next, we recommend organized and interpretable strategies for analysis with knockoff VIMPs, including computing them for groups of features and considering multiple model performance metrics. To demonstrate methods, we develop a random forest to predict 5-year incident stroke in the Sleep Heart Health Study and compare results based on OOB and knockoff VIMPs.
RESULTS: Nearly all papers in the literature review contained substantial limitations in their use of VIMPs. In our demonstration, using OOB VIMPs for individual variables suggested two highly correlated lung function variables (forced expiratory volume, forced vital capacity) as the best predictors of incident stroke, followed by age and height. Using an organized analytic approach that considered knockoff VIMPs of both groups of features and individual features, the largest contributions to model sensitivity were medications (especially cardiovascular) and measured medical risk factors, while the largest contributions to model specificity were age, diastolic blood pressure, self-reported medical risk factors, polysomnography features, and pack-years of smoking. Thus, we reach very different conclusions about stroke risk factors using OOB VIMPs versus knockoff VIMPs.
CONCLUSIONS: The near-ubiquitous reliance on OOB VIMPs may provide misleading results for researchers who use such methods to guide their research. Given the rapid pace of scientific inquiry using machine learning, it is essential to bring modern knockoff VIMPs that are interpretable and unbiased into widespread applied practice to steer researchers using random forest machine learning toward more meaningful results.
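
The idea of scoring a group of features against more than one performance metric can be illustrated with a small sketch. This is not the authors' knockoff procedure: it swaps in a simpler grouped permutation importance, in which all columns of a group are shuffled together across rows, and reports the resulting drop in sensitivity and in specificity separately. The function names (`group_importance`, `toy_predict`), the toy data, and the threshold rule are all hypothetical and exist only for illustration.

```python
import random


def sensitivity(y_true, y_pred):
    """Fraction of true positives among all actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    pos = sum(1 for t in y_true if t == 1)
    return tp / pos if pos else 0.0


def specificity(y_true, y_pred):
    """Fraction of true negatives among all actual negatives."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    neg = sum(1 for t in y_true if t == 0)
    return tn / neg if neg else 0.0


def group_importance(predict, rows, y_true, group, n_repeats=20, seed=0):
    """Mean drop in (sensitivity, specificity) when the columns in `group`
    are jointly shuffled across rows.

    Assumption: this is grouped permutation importance, a stand-in for
    knockoff VIMPs, which instead replace features with synthetic copies.
    """
    rng = random.Random(seed)
    base_pred = [predict(r) for r in rows]
    base_sens = sensitivity(y_true, base_pred)
    base_spec = specificity(y_true, base_pred)

    sens_drop = 0.0
    spec_drop = 0.0
    for _ in range(n_repeats):
        order = list(range(len(rows)))
        rng.shuffle(order)
        shuffled = []
        for i, row in enumerate(rows):
            new_row = dict(row)
            # Take every column of the group from the same donor row, so the
            # correlation structure inside the group is preserved.
            for col in group:
                new_row[col] = rows[order[i]][col]
            shuffled.append(new_row)
        pred = [predict(r) for r in shuffled]
        sens_drop += base_sens - sensitivity(y_true, pred)
        spec_drop += base_spec - specificity(y_true, pred)
    return sens_drop / n_repeats, spec_drop / n_repeats


def toy_predict(row):
    """Hypothetical classifier: predicts an event when age exceeds 60."""
    return 1 if row["age"] > 60 else 0


# Hypothetical toy data: the outcome depends on age only, never on noise.
ages = [40, 45, 50, 55, 65, 70, 75, 80]
rows = [{"age": a, "noise": (7 * i) % 5} for i, a in enumerate(ages)]
y_true = [0, 0, 0, 0, 1, 1, 1, 1]

age_importance = group_importance(toy_predict, rows, y_true, ["age"])
noise_importance = group_importance(toy_predict, rows, y_true, ["noise"])
print("age   (sensitivity drop, specificity drop):", age_importance)
print("noise (sensitivity drop, specificity drop):", noise_importance)
```

Because the toy classifier never reads `noise`, shuffling that column leaves every prediction unchanged and both drops are exactly zero. Reporting the two metrics separately is what lets a feature group matter for sensitivity while contributing little to specificity, or the reverse.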


Subject(s)
Random Forest , Stroke , Humans , Benchmarking , Machine Learning , Stroke/diagnosis , Stroke/epidemiology , Sleep
2.
Elife ; 2: e00426, 2013 Apr 09.
Article in English | MEDLINE | ID: mdl-23580255

ABSTRACT

Genetic and molecular approaches have been critical for elucidating the mechanism of the mammalian circadian clock. Here, we demonstrate that the ClockΔ19 mutant behavioral phenotype is significantly modified by mouse strain genetic background. We map a suppressor of the ClockΔ19 mutation to a ∼900 kb interval on mouse chromosome 1 and identify the transcription factor, Usf1, as the responsible gene. A SNP in the promoter of Usf1 causes elevation of its transcript and protein in strains that suppress the Clock mutant phenotype. USF1 competes with the CLOCK:BMAL1 complex for binding to E-box sites in target genes. Saturation binding experiments demonstrate reduced affinity of the CLOCKΔ19:BMAL1 complex for E-box sites, thereby permitting increased USF1 occupancy on a genome-wide basis. We propose that USF1 is an important modulator of molecular and behavioral circadian rhythms in mammals. DOI:http://dx.doi.org/10.7554/eLife.00426.001.


Subject(s)
ARNTL Transcription Factors/metabolism , CLOCK Proteins/metabolism , Circadian Clocks , Circadian Rhythm , DNA/metabolism , Mutation , Upstream Stimulatory Factors/metabolism , ARNTL Transcription Factors/genetics , Animals , Binding Sites , Binding, Competitive , CLOCK Proteins/genetics , Circadian Clocks/genetics , Circadian Rhythm/genetics , E-Box Elements , Gene Expression Regulation , Genotype , Mice , Mice, Inbred BALB C , Mice, Inbred C57BL , Mice, Transgenic , Phenotype , Polymorphism, Single Nucleotide , Promoter Regions, Genetic , Protein Interaction Domains and Motifs , RNA, Messenger/metabolism , Signal Transduction , Species Specificity , Time Factors , Transcription, Genetic , Transcriptional Activation , Upstream Stimulatory Factors/genetics