Results 1 - 5 of 5
1.
Mach Learn ; 106(2): 277-305, 2017 Feb.
Article in English | MEDLINE | ID: mdl-29249866

ABSTRACT

Machine learning methods provide a powerful approach for analyzing longitudinal data in which repeated measurements are observed for a subject over time. We boost multivariate trees to fit a novel flexible semi-nonparametric marginal model for longitudinal data. In this model, features are assumed to be nonparametric, while feature-time interactions are modeled semi-nonparametrically using P-splines with an estimated smoothing parameter. To avoid overfitting, we describe a relatively simple in-sample cross-validation method that can be used to estimate the optimal boosting iteration and that has the surprising added benefit of stabilizing certain parameter estimates. Our new multivariate tree boosting method is shown to be highly flexible, robust to covariance misspecification and unbalanced designs, and resistant to overfitting in high dimensions. Feature selection can be used to identify important features and feature-time interactions. An application to longitudinal data on forced 1-second lung expiratory volume (FEV1) for lung transplant patients identifies an important feature-time interaction and illustrates the ease with which our method can find complex relationships in longitudinal data.

2.
Biostatistics ; 15(4): 757-73, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24728979

ABSTRACT

We introduce a new approach to competing risks using random forests. Our method is fully non-parametric and can be used for selecting event-specific variables and for estimating the cumulative incidence function. We show that the method is highly effective for both prediction and variable selection in high-dimensional problems and in settings such as HIV/AIDS that involve many competing risks.
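
As a point of reference for the cumulative incidence function mentioned in the abstract above, the following is a minimal sketch of the standard nonparametric (Aalen-Johansen type) estimator on a single sample. It is not the random forest method of the article; the function name `cumulative_incidence` and the coding of `causes` (0 for censored, positive integers for event types) are assumptions made for illustration.

```python
def cumulative_incidence(times, causes, cause):
    """Nonparametric cumulative incidence for one event type.

    times  : observed follow-up time for each subject
    causes : 0 if right-censored, otherwise an integer event type (assumed coding)
    cause  : the event type whose cumulative incidence is wanted
    Returns a list of (t, CIF(t)) at each time where any event occurs.
    """
    n_at_risk = len(times)
    overall_surv = 1.0  # all-cause survival just before the current time
    cif = 0.0
    curve = []
    for t in sorted(set(times)):
        at_t = [c for u, c in zip(times, causes) if u == t]
        d_all = sum(1 for c in at_t if c != 0)        # events of any type at t
        d_cause = sum(1 for c in at_t if c == cause)  # events of the type of interest
        if d_all:
            # Probability of surviving all causes up to t, then failing from `cause` at t
            cif += overall_surv * d_cause / n_at_risk
            overall_surv *= 1 - d_all / n_at_risk
            curve.append((t, cif))
        n_at_risk -= len(at_t)  # events and censored subjects leave the risk set
    return curve
```

With no censoring after the last event, the cumulative incidences over all event types add up to one minus the all-cause survival, which is a quick sanity check on any implementation.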


Subject(s)
Data Interpretation, Statistical; Models, Statistical; Risk; Survival Analysis; HIV Infections/drug therapy; HIV Infections/mortality; Humans
3.
Circ Cardiovasc Qual Outcomes ; 4(5): 521-32, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21862719

ABSTRACT

BACKGROUND: Simultaneous contribution of hundreds of electrocardiographic (ECG) biomarkers to prediction of long-term mortality in postmenopausal women with clinically normal resting ECGs is unknown. METHODS AND RESULTS: We analyzed ECGs and all-cause mortality in 33 144 women enrolled in the Women's Health Initiative trials who were without baseline cardiovascular disease or cancer and had normal ECGs by Minnesota and Novacode criteria. Four hundred and seventy-seven ECG biomarkers, encompassing global and individual ECG findings, were measured with computer algorithms. During a median follow-up of 8.1 years (range for survivors, 0.5 to 11.2 years), 1229 women died. For analyses, the cohort was randomly split into derivation (n=22 096; deaths, 819) and validation (n=11 048; deaths, 410) subsets. ECG biomarkers and demographic and clinical characteristics were simultaneously analyzed using both traditional Cox regression and random survival forest, a novel algorithmic machine-learning approach. Regression modeling failed to converge. Random survival forest variable selection yielded 20 variables that were independently predictive of long-term mortality, 14 of which were ECG biomarkers related to autonomic tone, atrial conduction, and ventricular depolarization and repolarization. CONCLUSIONS: We identified 14 ECG biomarkers from among hundreds that were associated with long-term prognosis using a novel random forest variable selection methodology. These biomarkers were related to autonomic tone, atrial conduction, ventricular depolarization, and ventricular repolarization. Quantitative ECG biomarkers have prognostic importance and may be markers of subclinical disease in apparently healthy postmenopausal women.


Subject(s)
Biomarkers/metabolism; Cardiovascular Diseases/diagnosis; Electrocardiography; Models, Statistical; Postmenopause; Aged; Algorithms; Cardiovascular Diseases/epidemiology; Cardiovascular Diseases/mortality; Cardiovascular Diseases/physiopathology; Female; Follow-Up Studies; Heart Conduction System/physiology; Humans; Middle Aged; Postmenopause/physiology; Prognosis; Survival Analysis; Women's Health/trends
4.
Stat Probab Lett ; 80(13-14): 1056-1064, 2010 Jul 01.
Article in English | MEDLINE | ID: mdl-20582150

ABSTRACT

We prove uniform consistency of Random Survival Forests (RSF), a newly introduced forest ensemble learner for analysis of right-censored survival data. Consistency is proven under general splitting rules, bootstrapping, and random selection of variables, that is, under true implementation of the methodology. Under this setting we show that the forest ensemble survival function converges uniformly to the true population survival function. To prove this result we make one key assumption regarding the feature space: we assume that all variables are factors. Doing so ensures that the feature space has finite cardinality and enables us to exploit counting process theory and the uniform consistency of the Kaplan-Meier survival function.
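
For readers unfamiliar with the Kaplan-Meier survival function that the abstract above relies on, here is a minimal sketch of the product-limit estimate for right-censored data. It illustrates only the classical estimator, not Random Survival Forests or the consistency proof; the function name `kaplan_meier` and the 1/0 coding of `events` are assumptions made for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times  : observed follow-up time for each subject
    events : 1 if the event was observed, 0 if right-censored (assumed coding)
    Returns a list of (t, S(t)) at each distinct observed event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # subjects leaving the risk set at each time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]  # events and censored subjects both leave
    return curve
```

Censored subjects contribute to the risk set up to their censoring time but never trigger a drop in the curve, which is what separates this estimate from a naive empirical survival proportion.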

5.
BMC Bioinformatics ; 7: 59, 2006 Feb 08.
Article in English | MEDLINE | ID: mdl-16466568

ABSTRACT

BACKGROUND: DNA microarrays open up a new horizon for studying the genetic determinants of disease. The high throughput nature of these arrays creates an enormous wealth of information, but also poses a challenge to data analysis. Inferential problems become even more pronounced as experimental designs used to collect data become more complex. An important example is multigroup data collected over different experimental groups, such as data collected from distinct stages of a disease process. We have developed a method specifically addressing these issues termed Bayesian ANOVA for microarrays (BAM). The BAM approach uses a special inferential regularization known as spike-and-slab shrinkage that provides an optimal balance between total false detections and total false non-detections. This translates into more reproducible differential calls. Spike-and-slab shrinkage is a form of regularization achieved by using information across all genes and groups simultaneously. RESULTS: BAMarray is a graphically oriented Java-based software package that implements the BAM method for detecting differentially expressing genes in multigroup microarray experiments (up to 256 experimental groups can be analyzed). Drop-down menus allow the user to easily select between different models and to choose various run options. BAMarray can also be operated in a fully automated mode with preselected run options. Tuning parameters have been preset at theoretically optimal values, freeing the user from such specifications. BAMarray provides estimates for gene differential effects and automatically estimates data-adaptive, optimal cutoff values for classifying genes into biological patterns of differential activity across experimental groups. A graphical suite is a core feature of the product and includes diagnostic plots for assessing model assumptions and interactive plots that enable tracking of prespecified gene lists to study such things as biological pathway perturbations. The user can zoom in and lasso genes of interest that can then be saved for downstream analyses. CONCLUSION: BAMarray is user-friendly, platform-independent software that effectively and efficiently implements the BAM methodology. Classifying patterns of differential activity is greatly facilitated by a data-adaptive cutoff rule and a graphical suite. BAMarray is licensed software freely available to academic institutions. More information can be found at http://www.bamarray.com.


Subject(s)
Algorithms; Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Programming Languages; Software; Analysis of Variance; Bayes Theorem; Data Interpretation, Statistical; Reproducibility of Results; Sensitivity and Specificity