Results 1 - 6 of 6
1.
Journal of Xi'an Jiaotong University (Medical Sciences) ; (6): 628-632, 2021.
Article in Chinese | WPRIM | ID: wpr-1006702

ABSTRACT

【Objective】 To compare the performance of five commonly used variable selection methods in screening variables from high-dimensional biomedical data, to explore the effects of sample size and of association among candidate variables on screening results, and to provide evidence for developing variable selection strategies in high-dimensional biomedical data analysis. 【Methods】 The variable selection algorithms were implemented in the R programming language. The Monte Carlo method was used to simulate high-dimensional biomedical data under different conditions so as to evaluate and compare the performance of the variable selection methods. Performance was measured by the true positive rate and true negative rate of the screening. 【Results】 For a given high-dimensional dataset, the variable selection performance of all methods improved as the sample size increased, and association among candidate variables did affect the screening results. The simulations indicated that the elastic net algorithm yielded the best screening performance, the LASSO algorithm took second place, and the ridge algorithm did not work at all. 【Conclusion】 The elastic net algorithm is an ideal method for variable screening in high-dimensional data.
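The comparison above can be sketched in Python with scikit-learn (a stand-in for the authors' R implementation; the sample size, penalty strengths, and signal sizes below are illustrative assumptions, not the study's settings):

```python
# Simulate p >> n data with a few true signals, then screen variables with
# ridge, LASSO, and elastic net; performance is judged by the true positive
# and true negative rates, as in the study above.
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)
n, p, k = 100, 500, 10                 # samples, candidate variables, true signals
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 2.0                         # only the first k variables carry signal
y = X @ beta + rng.standard_normal(n)

def screen(model, tol=1e-6):
    """Indices of variables the fitted model keeps (|coefficient| > tol)."""
    model.fit(X, y)
    return set(np.flatnonzero(np.abs(model.coef_) > tol))

sel_lasso = screen(Lasso(alpha=0.1, max_iter=10000))
sel_enet = screen(ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10000))
sel_ridge = screen(Ridge(alpha=1.0))   # ridge shrinks but does not zero out

truth = set(range(k))
tpr = lambda sel: len(sel & truth) / k
tnr = lambda sel: 1 - len(sel - truth) / (p - k)
```

In runs like this one, ridge keeps essentially every variable (so it cannot screen at all), while the two sparse penalties recover the true signals; the elastic net's ridge component additionally stabilizes selection when candidate variables are correlated.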

2.
Journal of Zhejiang University. Science. B ; (12): 935-947, 2018.
Article in English | WPRIM | ID: wpr-1010434

ABSTRACT

OBJECTIVE: As one of the most popular designs in genetic research, the family-based design is well recognized for its advantages, such as robustness against population stratification and admixture. With vast amounts of genetic data collected from family-based studies, there is great interest in studying the role of genetic markers in risk prediction. This study aims to develop a new statistical approach for family-based risk prediction analysis with improved prediction accuracy compared with existing methods based on family history. METHODS: We propose an ensemble-based likelihood ratio (ELR) approach, Fam-ELR, for family-based genomic risk prediction. Fam-ELR incorporates a clustered receiver operating characteristic (ROC) curve method to account for correlations among family samples, and uses a computationally efficient tree-assembling procedure for variable selection and model building. RESULTS: Through simulations, Fam-ELR shows robustness under various underlying disease models and pedigree structures, and attains better performance than two existing family-based risk prediction methods. In a real-data application to a family-based genome-wide dataset on conduct disorder, Fam-ELR demonstrates its ability to integrate potential risk predictors and their interactions into the model for improved accuracy, especially at the genome-wide level. CONCLUSIONS: Compared with existing approaches, such as the genetic risk-score approach, Fam-ELR can incorporate genetic variants with small or moderate marginal effects, and their interactions, into an improved risk prediction model. It is therefore a robust and useful approach for high-dimensional family-based risk prediction, especially for complex diseases with unknown or poorly understood etiology.


Subject(s)
Female , Humans , Male , Area Under Curve , Computer Simulation , Conduct Disorder/physiopathology , Family Health , Genetic Markers , Genetic Predisposition to Disease , Genetic Variation , Genome, Human , Genome-Wide Association Study , Genomics , Likelihood Functions , Models, Genetic , Odds Ratio , Pedigree , ROC Curve , Reproducibility of Results , Risk Factors
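Fam-ELR itself is not specified in the abstract, but the core ingredient it builds on — a likelihood-ratio risk score evaluated via an ROC curve — can be illustrated on simulated case/control genotypes (all frequencies and sample sizes below are hypothetical, and the clustered-ROC correction for family correlation is omitted):

```python
# Toy log-likelihood-ratio risk score: each marker contributes
# log P(genotype | case) / P(genotype | control); discrimination is
# summarized by the AUC (probability that a case outscores a control).
import numpy as np

rng = np.random.default_rng(1)
n_half, m = 200, 5                     # cases/controls per group, markers
cases = rng.binomial(2, 0.6, size=(n_half, m))   # risk-allele enriched
ctrls = rng.binomial(2, 0.4, size=(n_half, m))

def geno_freqs(g):
    """Empirical frequency of genotypes 0/1/2 at each marker (3 x m)."""
    return np.stack([(g == v).mean(axis=0) for v in (0, 1, 2)]) + 1e-3

f_case, f_ctrl = geno_freqs(cases), geno_freqs(ctrls)

def lr_score(g):
    """Sum over markers of the log likelihood ratio of the observed genotype."""
    return sum(np.log(f_case[g[j], j] / f_ctrl[g[j], j]) for j in range(m))

s_case = np.array([lr_score(g) for g in cases])
s_ctrl = np.array([lr_score(g) for g in ctrls])
# Rank-based AUC, counting ties as half wins
auc = ((s_case[:, None] > s_ctrl[None, :])
       + 0.5 * (s_case[:, None] == s_ctrl[None, :])).mean()
```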
3.
Chinese Journal of Epidemiology ; (12): 679-683, 2017.
Article in Chinese | WPRIM | ID: wpr-737706

ABSTRACT

With the rapid development of genome sequencing technology and bioinformatics in recent years, it has become possible to measure thousands of omics variables that might be associated with the progression of disease, i.e. "high-dimensional data". Such omics data share a common feature: the number of variables p is usually greater than the number of observations n, and the independent variables are often highly correlated. It is therefore a great statistical challenge to identify truly meaningful variables in omics data. This paper summarizes methods of Bayesian variable selection in the analysis of high-dimensional data.
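The paper's specific algorithms are not given in the abstract, but one textbook flavor of Bayesian variable selection can be sketched for small p by enumerating all 2^p submodels and weighting each by exp(-BIC/2) as a rough stand-in for the marginal likelihood (all settings below are illustrative):

```python
# Enumerate all subsets of a small linear model, weight each by its
# approximate evidence, and read off posterior inclusion probabilities.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, p = 80, 8
X = rng.standard_normal((n, p))
y = 1.5 * X[:, 0] - 1.0 * X[:, 3] + rng.standard_normal(n)  # vars 0 and 3 matter

def bic(cols):
    """BIC of the least-squares fit using only the columns in `cols`."""
    resid = y
    if cols:
        Z = X[:, cols]
        resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    k = len(cols) + 1                  # coefficients plus the noise variance
    return n * np.log(resid @ resid / n) + k * np.log(n)

models = [list(c) for r in range(p + 1)
          for c in itertools.combinations(range(p), r)]
scores = np.array([bic(m) for m in models])
w = np.exp(-(scores - scores.min()) / 2)   # exp(-BIC/2) ~ model evidence
w /= w.sum()
incl = np.array([sum(w[i] for i, m in enumerate(models) if j in m)
                 for j in range(p)])       # posterior inclusion probabilities
```

For realistic omics-scale p this enumeration is infeasible; the Bayesian methods the paper reviews replace it with priors and samplers (e.g. spike-and-slab) that explore the model space stochastically.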

4.
Chinese Journal of Epidemiology ; (12): 679-683, 2017.
Article in Chinese | WPRIM | ID: wpr-736238

ABSTRACT

With the rapid development of genome sequencing technology and bioinformatics in recent years, it has become possible to measure thousands of omics variables that might be associated with the progression of disease, i.e. "high-dimensional data". Such omics data share a common feature: the number of variables p is usually greater than the number of observations n, and the independent variables are often highly correlated. It is therefore a great statistical challenge to identify truly meaningful variables in omics data. This paper summarizes methods of Bayesian variable selection in the analysis of high-dimensional data.

5.
Genomics & Informatics ; : 129-132, 2006.
Article in English | WPRIM | ID: wpr-61948

ABSTRACT

Toxicogenomics has recently emerged in the field of toxicology, and the DNA microarray technique has become a common strategy in predictive toxicology, which studies the molecular mechanisms triggered by exposure to chemical or environmental stress. Although a microarray experiment offers extensive genomic information to researchers, the high-dimensional nature of the data often makes it hard to extract meaningful results. We therefore developed a toxicant enrichment analysis similar to the common enrichment approach. We also developed a web-based system, graPT, to enable prediction of the toxic endpoints of an experimental chemical.


Subject(s)
Oligonucleotide Array Sequence Analysis , Toxicogenetics , Toxicology
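The "enrichment approach" the abstract refers to is commonly implemented as a hypergeometric over-representation test; a minimal sketch follows (the gene identifiers and toxicant gene sets are made up, and graPT's actual scoring is not described here):

```python
# Score each toxicant's known gene set by how improbably large its overlap
# with the experiment's responsive genes is under random sampling.
from scipy.stats import hypergeom

universe = {f"g{i}" for i in range(1000)}        # all genes on the array
responsive = {f"g{i}" for i in range(40)}        # differentially expressed
toxicant_sets = {
    "toxA": {f"g{i}" for i in range(30)},        # heavy overlap with responsive
    "toxB": {f"g{i}" for i in range(500, 530)},  # no overlap
}

def enrichment_p(gene_set):
    """P(overlap >= observed) for a random draw of len(responsive) genes."""
    M, n, N = len(universe), len(gene_set), len(responsive)
    k = len(gene_set & responsive)
    return hypergeom.sf(k - 1, M, n, N)

p_values = {tox: enrichment_p(genes) for tox, genes in toxicant_sets.items()}
```

Toxicants whose gene sets overlap the responsive genes far more than chance predicts get small p-values, flagging them as candidate causes of the observed expression changes.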
6.
Genomics & Informatics ; : 65-74, 2003.
Article in English | WPRIM | ID: wpr-197484

ABSTRACT

Data mining differs from traditional data analysis primarily along one important dimension, namely the scale of the data. That is why not only statistical but also computer-science principles are needed to extract information from large data sets. In this paper we briefly review data mining, its characteristics, typical data mining algorithms, and potential and ongoing applications of data mining in the biopharmaceutical industry. The distinguishing characteristics of data mining lie in its understandability, scalability, problem-driven nature, and its analysis of retrospective or observational data, in contrast to experimentally designed data. At a high level, one can identify three types of problems for which data mining is useful: description, prediction, and search. Our brief review of data mining algorithms covers decision trees and rules, nonlinear classification methods, memory-based methods, model-based clustering, and graphical dependency models. The application areas covered are discovery compound libraries, clinical trial and disease management data, genomics and proteomics, structural databases for candidate drug compounds, and other applications of pharmaceutical relevance.


Subject(s)
Classification , Data Mining , Dataset , Decision Trees , Disease Management , Drug Discovery , Genomics , Proteomics , Retrospective Studies , Statistics as Topic
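Of the algorithm families the review names, decision trees are the most direct to demonstrate; here is a minimal scikit-learn sketch on a bundled biomedical dataset (purely illustrative, not from the paper):

```python
# Fit a shallow, interpretable decision tree and check held-out accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)  # shallow = readable
tree.fit(X_tr, y_tr)
acc = tree.score(X_te, y_te)
```

The depth cap keeps the fitted rules human-readable, which matches the "understandability" criterion the review emphasizes.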