1.
Bioinformatics ; 23(16): 2080-7, 2007 Aug 15.
Article in English | MEDLINE | ID: mdl-17553857

ABSTRACT

MOTIVATION: Survival prediction from gene expression data and other high-dimensional genomic data has been the subject of much research in recent years. Such data pose the methodological problem of having many more gene expression values than individuals. In addition, the responses are censored survival times. Most of the proposed methods handle this by using Cox's proportional hazards model and obtaining parameter estimates through some dimension-reduction or parameter-shrinkage technique. Using three well-known microarray gene expression data sets, we compare the prediction performance of seven such methods: univariate selection, forward stepwise selection, principal components regression (PCR), supervised principal components regression, partial least squares regression (PLS), ridge regression and the lasso.

RESULTS: Statistical learning from subsets should be repeated several times in order to get a fair comparison between methods. Methods using coefficient shrinkage or linear combinations of the gene expression values perform much better than the simple variable selection methods. For our data sets, ridge regression has the best overall performance.

AVAILABILITY: Matlab and R code for the prediction methods is available at http://www.med.uio.no/imb/stat/bmms/software/microsurv/.
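As an illustration of the shrinkage idea described in the abstract, the following is a minimal sketch of ridge-penalized Cox regression. It is not the authors' Matlab or R code. The function name `ridge_cox_fit`, the plain gradient-ascent optimizer, the step size, the penalty values and the simulated data are all assumptions made for this example, and the sketch assumes there are no tied event times.

```python
import numpy as np

def ridge_cox_fit(X, time, event, lam=1.0, lr=0.1, n_iter=500):
    """Maximize the Cox partial log-likelihood minus (lam / 2) * ||beta||^2
    by plain gradient ascent. Hypothetical helper; assumes no tied times."""
    n, p = X.shape
    order = np.argsort(-time)              # sort subjects by descending time
    X, event = X[order], event[order]
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta
        w = np.exp(eta - eta.max())        # rescaling cancels in the ratio below
        # In descending-time order, the risk set of subject i is rows 0..i,
        # so risk-set sums are cumulative sums.
        cum_w = np.cumsum(w)
        cum_wx = np.cumsum(w[:, None] * X, axis=0)
        risk_mean = cum_wx / cum_w[:, None]
        grad = (event[:, None] * (X - risk_mean)).sum(axis=0) - lam * beta
        beta += lr * grad / n
    return beta

# Simulated data (an assumption, not one of the paper's data sets):
# only the first covariate affects the hazard.
rng = np.random.default_rng(0)
n, p = 80, 20
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[0] = 1.5
T = rng.exponential(1.0 / np.exp(X @ true_beta))   # event times
C = rng.exponential(2.0, size=n)                   # censoring times
time = np.minimum(T, C)
event = (T <= C).astype(float)                     # 1 = observed event, 0 = censored

beta_weak = ridge_cox_fit(X, time, event, lam=1.0)
beta_strong = ridge_cox_fit(X, time, event, lam=50.0)
print(np.linalg.norm(beta_weak), np.linalg.norm(beta_strong))
```

A larger `lam` pulls all coefficients toward zero, which is what keeps the fit stable when covariates are numerous relative to individuals. In practice the penalty would be chosen by cross-validation rather than fixed by hand.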


Subject(s)
Biomarkers, Tumor/analysis; Neoplasm Proteins/analysis; Neoplasms/metabolism; Neoplasms/mortality; Oligonucleotide Array Sequence Analysis/methods; Proportional Hazards Models; Survival Analysis; Algorithms; Diagnosis, Computer-Assisted/methods; Female; Forecasting; Gene Expression Profiling/methods; Humans; Reproducibility of Results; Sensitivity and Specificity; Survival Rate
2.
Ann Hum Genet ; 68(Pt 5): 461-71, 2004 Sep.
Article in English | MEDLINE | ID: mdl-15469423

ABSTRACT

In a number of practical cases it is important to determine the likely geographical origin of an individual or a biological sample. A dead body, old bones or a sample of semen may be available. Information on where the sample might come from can assist investigation or research. The first part of this paper is independent of any specific data structure. We formulate the problem as a classification problem. Bayes' theorem allows different sources of information or data to be reconciled conveniently. The main part of the paper involves high-dimensional data for which simple, standard methods are not likely to work properly. Mitochondrial DNA (mtDNA) data is a typical example of such data. We propose a procedure involving essentially two steps. First, principal component analysis is used to reduce the dimension of the data. Next, quadratic discriminant analysis performs the actual classification. A cross-validation procedure is implemented to select the optimal number of principal components. The importance of using separate data sets for model fitting and testing is emphasized. This method distinguishes well between individuals with a self-reported European (Icelandic or German) origin and SE Africans. In this case the error rate is 2.0%.
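The two-step procedure in the abstract (principal component analysis, then quadratic discriminant analysis, with cross-validation choosing the number of components) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation. The helper names, the small covariance regularizer `reg`, the candidate component counts and the synthetic Gaussian data (standing in for mtDNA data) are all assumptions.

```python
import numpy as np

def pca_fit(X, k):
    """Return the training mean and the first k principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T

def qda_fit(Z, y, reg=1e-3):
    """Per-class Gaussian parameters; reg is an assumed stabilizer."""
    k = Z.shape[1]
    params = {}
    for c in np.unique(y):
        Zc = Z[y == c]
        cov = np.cov(Zc, rowvar=False).reshape(k, k) + reg * np.eye(k)
        params[c] = (Zc.mean(axis=0), np.linalg.inv(cov),
                     np.linalg.slogdet(cov)[1], np.log(len(Zc) / len(Z)))
    return params

def qda_predict(params, Z):
    """Assign each row to the class with the highest log posterior (Bayes' rule)."""
    classes = list(params)
    scores = []
    for c in classes:
        mu, prec, logdet, logprior = params[c]
        d = Z - mu
        maha = np.einsum("ij,jk,ik->i", d, prec, d)
        scores.append(logprior - 0.5 * logdet - 0.5 * maha)
    return np.array(classes)[np.argmax(scores, axis=0)]

def cv_error(X, y, k, n_folds=5, seed=0):
    """Cross-validated error rate for k components; PCA is refitted per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    wrong = 0
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        mean, W = pca_fit(X[train], k)
        params = qda_fit((X[train] - mean) @ W, y[train])
        pred = qda_predict(params, (X[held_out] - mean) @ W)
        wrong += np.sum(pred != y[held_out])
    return wrong / len(y)

# Synthetic stand-in data: 2 classes, 200 features, a mean shift on 20 of them.
rng = np.random.default_rng(1)
n_per, p = 60, 200
X = rng.standard_normal((2 * n_per, p))
y = np.repeat([0, 1], n_per)
X[y == 1, :20] += 1.5

# Keep a separate test set that plays no part in model selection or fitting.
perm = rng.permutation(len(y))
fit_idx, test_idx = perm[:90], perm[90:]

errors = {k: cv_error(X[fit_idx], y[fit_idx], k) for k in (1, 2, 3, 5, 10)}
best_k = min(errors, key=errors.get)

mean, W = pca_fit(X[fit_idx], best_k)
params = qda_fit((X[fit_idx] - mean) @ W, y[fit_idx])
test_error = np.mean(qda_predict(params, (X[test_idx] - mean) @ W) != y[test_idx])
print(best_k, errors[best_k], test_error)
```

Refitting the PCA inside each fold, and scoring the final model on samples never used for selection, reflects the abstract's emphasis on separate data sets for fitting and testing.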


Subject(s)
DNA, Mitochondrial/genetics; Genetics, Population; Geography; Models, Theoretical; Population Dynamics; Africa; Anthropology, Physical; Bayes Theorem; Discriminant Analysis; Europe; Forensic Medicine; Humans