Results 1 - 2 of 2
1.
Big Data ; 11(3): 199-214, 2023 06.
Article in English | MEDLINE | ID: mdl-34612727

ABSTRACT

Although confirmatory modeling has dominated much of applied research in medical, business, and behavioral sciences, modeling large data sets with the goal of accurate prediction has become more widely accepted. The current practice for fitting predictive models is guided by heuristic-based modeling frameworks that lead researchers to make a series of often isolated decisions regarding data preparation and cleaning that may result in substandard predictive performance. In this article, we use an experimental design to evaluate the impact of six factors related to data preparation and model selection (techniques for numerical imputation, categorical imputation, encoding, subsampling for unbalanced data, feature selection, and machine learning algorithm) and their interactions on the predictive accuracy of models applied to a large, publicly available heart transplantation database. Our factorial experiment includes 10,800 models evaluated on 5 independent test partitions of the data. Results confirm that some decisions made early in the modeling process interact with later decisions to affect predictive performance; therefore, the current practice of making these decisions independently can negatively affect predictive outcomes. A key result of this case study is to highlight the need for improved rigor in applied predictive research. By using the scientific method to inform predictive modeling, we can work toward a framework for applied predictive modeling and a standard for reproducibility in predictive research.
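As a rough illustration of the full factorial layout described in the abstract, the sketch below crosses every level of six factors and pairs each resulting configuration with five test partitions. The factor names follow the abstract, but the levels listed here are hypothetical placeholders (the abstract does not state them), so the counts do not reproduce the study's 10,800 models.

```python
from itertools import product

# Hypothetical levels: the abstract names the six factors but not their levels.
factors = {
    "numerical_imputation": ["mean", "median"],
    "categorical_imputation": ["mode", "missing_level"],
    "encoding": ["one_hot", "ordinal"],
    "subsampling": ["none", "undersample", "oversample"],
    "feature_selection": ["none", "filter"],
    "algorithm": ["logistic_regression", "random_forest"],
}


def full_factorial(factors):
    """Return one dict per combination of factor levels (full crossing)."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


design = full_factorial(factors)

# Each configuration is evaluated on every test partition (5, as in the abstract).
n_partitions = 5
runs = [(config, part) for config in design for part in range(n_partitions)]

print(len(design), len(runs))  # 96 480
```

Crossing all levels, rather than tuning one factor at a time, is what makes it possible to estimate interactions between early and late modeling decisions.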


Subject(s)
Algorithms , Machine Learning , Reproducibility of Results , Databases, Factual
2.
Qual Eng ; 30(4): 546-555, 2018.
Article in English | MEDLINE | ID: mdl-33442200

ABSTRACT

Poisson regression is a commonly used tool for analyzing rate data; however, the assumption that the mean and variance of a process are equal rarely holds true in practice. When this assumption is violated, a quasi-Poisson distribution can be used to account for the existing over- or under-dispersion. This paper presents an analysis of a study conducted by NASA to assess the performance of a new airborne spacing algorithm. A deterministic computer simulation was conducted to examine the algorithm in various conditions designed to simulate real-life scenarios, and two measures of algorithm performance were modeled using both continuous and categorical factors. Due to the presence of under-dispersion, tests for significance of main effects and two-factor interactions required bias adjustment. This paper presents a comparison of tests of effects for the Poisson and quasi-Poisson models, details of fitting these models using common statistical software packages, and calculation of dispersion tests.
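A minimal sketch of the dispersion idea, not the paper's own procedure: for an intercept-only Poisson model, the Pearson statistic divided by the residual degrees of freedom estimates the dispersion, and a quasi-Poisson fit scales the Poisson standard errors by the square root of that estimate. The counts below are made up for illustration, and the function name `pearson_dispersion` is an assumption.

```python
import math


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual degrees of freedom."""
    resid_df = len(y) - n_params
    if resid_df <= 0:
        raise ValueError("need more observations than parameters")
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / resid_df


# Made-up counts that vary less than a Poisson process with mean 5 would.
counts = [4, 5, 5, 6, 5, 4, 6, 5]

# Intercept-only Poisson model: the fitted mean is the sample mean.
mean_count = sum(counts) / len(counts)
mu = [mean_count] * len(counts)

phi = pearson_dispersion(counts, mu, n_params=1)

# Standard error of the log-mean under Poisson, then rescaled for quasi-Poisson.
poisson_se = math.sqrt(1.0 / sum(counts))
quasi_se = poisson_se * math.sqrt(phi)

print(f"dispersion={phi:.3f} poisson_se={poisson_se:.4f} quasi_se={quasi_se:.4f}")
```

A dispersion estimate below 1 indicates under-dispersion, in which case the unadjusted Poisson standard errors are too wide and tests of effects are conservative.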
