1.
Biostat Bioinforma Biomath ; 2(4): 157-186, 2011 Aug 01.
Article in English | MEDLINE | ID: mdl-25364211

ABSTRACT

BACKGROUND: In the typical setting of gene-selection problems from high-dimensional data, e.g., gene expression data from microarray or next-generation sequencing-based technologies, an enormous volume of high-throughput data is generated. There is often a need for a simple, computationally inexpensive, non-parametric screening procedure that can quickly and accurately find a low-dimensional variable subset preserving the biological information in the original very high-dimensional data (dimension p > 40,000). This is in contrast to very sophisticated variable selection methods that are computationally expensive, need pre-processing routines, and often require calibration of priors.
RESULTS: We present a tree-based sequential CART (S-CART) approach to variable selection in the binary classification setting and compare it against the more sophisticated procedures using simulated and real biological data. In simulated data, we analyze S-CART performance versus (i) a random forest (RF), (ii) a fully parametric Bayesian stochastic search variable selection (SSVS), and (iii) the moderated t-test statistic from the LIMMA package in R. The simulation study is based on a hierarchical Bayesian model in which dataset dimensionality, percentage of significant variables, and substructure via dependency vary. Selection efficacy is measured through false-discovery and missed-discovery rates. In all scenarios, the S-CART method is seen to consistently outperform SSVS and RF in both speed and detection accuracy. We demonstrate the utility of the S-CART technique both on simulated data and in a control-treatment mouse study. We show that the network analysis based on the S-CART-selected gene subset in essence recapitulates the biological findings of the study using only a fraction of the original set of genes considered in the study's analysis.
CONCLUSIONS: Relatively simple gene selection algorithms such as S-CART may often be preferable in practice to much more sophisticated ones. The advantage of the "greedy" selection methods used by S-CART and similar approaches is that they scale well with problem size and require virtually no tuning or training, while remaining efficient at extracting the relevant information from microarray-like datasets containing a large number of redundant or irrelevant variables.
AVAILABILITY: The MATLAB 7.4b code for the S-CART implementation is available for download from https://neyman.mcg.edu/posts/scart.zip.
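
The abstract reports selection efficacy as false-discovery and missed-discovery rates but does not define them. The sketch below is one plausible reading and is not taken from the paper. It assumes the false-discovery rate is the fraction of selected variables that are not truly significant, and the missed-discovery rate is the fraction of truly significant variables that were not selected. The function name `discovery_rates` and the toy index sets are hypothetical.

```python
def discovery_rates(selected, truly_significant):
    """Return (false_discovery_rate, missed_discovery_rate).

    Assumed definitions (not confirmed by the abstract):
      FDR = |selected but not significant| / |selected|
      MDR = |significant but not selected| / |significant|
    """
    selected = set(selected)
    truth = set(truly_significant)

    false_discoveries = selected - truth   # selected in error
    missed_discoveries = truth - selected  # significant but overlooked

    # Guard against empty sets so the ratios are always defined.
    fdr = len(false_discoveries) / len(selected) if selected else 0.0
    mdr = len(missed_discoveries) / len(truth) if truth else 0.0
    return fdr, mdr


if __name__ == "__main__":
    # Toy example: variables are identified by integer index.
    chosen = [1, 2, 3, 4]   # hypothetical output of a screening method
    significant = [1, 2, 5]  # hypothetical ground truth in a simulation
    fdr, mdr = discovery_rates(chosen, significant)
    print(f"FDR = {fdr:.3f}, MDR = {mdr:.3f}")
```

In a simulation study like the one described, the ground-truth set is known by construction, so both rates can be computed for each method and scenario and compared directly.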

2.
Birth Defects Res C Embryo Today ; 81(1): 1-19, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17539026

ABSTRACT

Cells in the developing embryo must integrate complex signals from the genome and environment to make decisions about their behavior or fate. The ability to understand the fundamental biology of the decision-making process, and how these decisions may go awry during abnormal development, requires a systems biology paradigm. Presently, the ability to build models with predictive capability in birth defects research is constrained by an incomplete understanding of the fundamental parameters underlying embryonic susceptibility, sensitivity, and vulnerability. Key developmental milestones must be parameterized in terms of system structure and dynamics, the relevant control methods, and the overall design logic of metabolic and regulatory networks. High-content data from genome-based studies provide some comprehensive coverage of these operational processes but a key research challenge is data integration. Analysis can be facilitated by data management resources and software to reveal the structure and function of bionetwork motifs potentially associated with an altered developmental phenotype. Borrowing from applied mathematics and artificial intelligence, we conceptualize a system that can help address the new challenges posed by the transformation of birth defects research into a data-driven science.


Subject(s)
Database Management Systems , Embryonic Development , Mice/abnormalities , Mice/embryology , Animals , Artificial Intelligence , Computational Biology , Female , Knowledge Bases , Mathematics , Mice/genetics , Models, Biological , Pregnancy , Systems Biology , Toxicogenetics