1.
BMC Bioinformatics ; 12: 42, 2011 Feb 01.
Article in English | MEDLINE | ID: mdl-21281522

ABSTRACT

BACKGROUND: Typical analysis of microarray data ignores the correlation between gene expression values. In this paper we present a model for microarray data which specifically allows for correlation between genes. As a result we combine gene network ideas with linear models and differential expression. RESULTS: We use sparse inverse covariance matrices and their associated graphical representation to capture the notion of gene networks. An important issue in using these models is the identification of the pattern of zeroes in the inverse covariance matrix. The limitations of existing methods for doing this are discussed and we provide a workable solution for determining the zero pattern. We then consider a method for estimating the parameters in the inverse covariance matrix which is suitable for very high dimensional matrices. We also show how to construct multivariate tests of hypotheses. These overall multivariate tests can be broken down into two components, the first one being similar to tests for differential expression and the second involving the connections between genes. CONCLUSION: The methods in this paper enable the extraction of a wealth of information concerning the relationships between genes which can be conveniently represented in graphical form. Differentially expressed genes can be placed in the context of the gene network and places in the gene network where unusual or interesting patterns have emerged can be identified, leading to the formulation of hypotheses for future experimentation.
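To make the link between zeroes in the inverse covariance matrix and the gene network concrete, here is a minimal Python sketch. The four-gene chain, the matrix values, and the helper names (invert, partial_correlations, edges_from_precision) are illustrative assumptions and do not come from the paper. The sketch only reads a graph off a known inverse covariance matrix; it does not implement the paper's procedure for determining the zero pattern or for estimating parameters in high dimensions.

```python
import math

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (adequate for small examples)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(precision):
    """Partial correlation of genes i and j given all others: -k_ij / sqrt(k_ii * k_jj)."""
    n = len(precision)
    return [[1.0 if i == j else
             -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
             for j in range(n)] for i in range(n)]

def edges_from_precision(precision, tol=1e-8):
    """An edge (i, j) is present exactly where the off-diagonal entry is non-zero."""
    pcor = partial_correlations(precision)
    n = len(precision)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if abs(pcor[i][j]) > tol]

# Hypothetical inverse covariance matrix for a chain of genes g0 - g1 - g2 - g3.
precision = [
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
]

covariance = invert(precision)                 # every entry is non-zero
chain_edges = edges_from_precision(precision)  # only neighbouring genes are linked
```

In this toy case all four genes are marginally correlated, so the covariance matrix is dense, while the inverse covariance matrix is sparse and its non-zero off-diagonal entries give the edges of the network.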


Subject(s)
Gene Expression Profiling/methods , Linear Models , Multivariate Analysis , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Computer Simulation , Gene Regulatory Networks
2.
BMC Bioinformatics ; 9: 195, 2008 Apr 15.
Article in English | MEDLINE | ID: mdl-18410693

ABSTRACT

BACKGROUND: With the advent of high throughput biotechnology data acquisition platforms such as microarrays, SNP chips and mass spectrometers, data sets with many more variables than observations are now routinely being collected. Finding relationships between response variables of interest and variables in such data sets is an important problem akin to finding needles in a haystack. Whilst methods for a number of response types have been developed, a general approach has been lacking. RESULTS: The major contribution of this paper is to present a unified methodology which allows many common (statistical) response models to be fitted to such data sets. The class of models includes virtually any model with a linear predictor in it, for example (but not limited to), multiclass logistic regression (classification), generalised linear models (regression) and survival models. A fast algorithm for finding sparse, well-fitting models is presented. The ideas are illustrated on real data sets with numbers of variables ranging from thousands to millions. R code implementing the ideas is available for download. CONCLUSION: The method described in this paper enables existing work on response models for the case of fewer variables than observations to be leveraged in the situation where there are many more variables than observations. It is a powerful approach to finding parsimonious models for such datasets. The method is capable of handling problems with millions of variables and a large variety of response types within a single framework. The method compares favourably to existing methods such as support vector machines and random forests, but has the advantage of not requiring separate variable selection steps. It also works for data types which these methods were not designed to handle. The method usually produces very sparse models which make biological interpretation simpler and more focused.
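As a rough illustration of fitting a sparse model with a linear predictor when there are more variables than observations, the sketch below uses L1-penalised logistic regression solved by proximal gradient descent. This is a generic stand-in technique, not the algorithm or the R code from the paper. The toy data, the function names and the step-size rule are all assumptions made for the example, and no intercept is fitted.

```python
import math
import random

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def soft_threshold(value, threshold):
    """Shrink towards zero; values within the threshold become exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def gradient(X, y, beta):
    """Gradient of the average logistic loss for the linear predictor X @ beta."""
    n = len(X)
    residual = [sigmoid(sum(b * x for b, x in zip(beta, row))) - yi
                for row, yi in zip(X, y)]
    return [sum(r * row[j] for r, row in zip(residual, X)) / n
            for j in range(len(beta))]

def fit_sparse_logistic(X, y, penalty, iterations=200):
    """Proximal gradient descent for L1-penalised logistic regression."""
    n, p = len(X), len(X[0])
    # Conservative step size: 4n / (sum of squared entries) never exceeds 1/L.
    step = 4.0 * n / sum(x * x for row in X for x in row)
    beta = [0.0] * p
    for _ in range(iterations):
        grad = gradient(X, y, beta)
        beta = [soft_threshold(b - step * g, step * penalty)
                for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy data: 12 observations, 60 variables, only variable 0 informative.
rng = random.Random(0)
n_samples, n_features = 12, 60
X = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [1 if i % 2 == 0 else 0 for i in range(n_samples)]
for i, row in enumerate(X):
    row[0] = 1.0 if y[i] == 1 else -1.0

# Smallest penalty at which every coefficient is zero.
penalty_max = max(abs(g) for g in gradient(X, y, [0.0] * n_features))
beta_null = fit_sparse_logistic(X, y, penalty_max)
beta_sparse = fit_sparse_logistic(X, y, 0.5 * penalty_max)
selected = [j for j, b in enumerate(beta_sparse) if b != 0.0]
```

At penalty_max every coefficient stays exactly zero; lowering the penalty lets variables enter the model, so variable selection happens inside the fit rather than as a separate step.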


Subject(s)
Information Storage and Retrieval/methods , Models, Statistical , Research Design , Software , Female , Gene Expression Profiling/methods , Humans , Leukemia/classification , Leukemia/genetics , Leukemia/pathology , Lymphoma/genetics , Lymphoma/mortality , Male , Models, Biological , Neoplasm Staging/methods , Oligonucleotide Array Sequence Analysis , Pattern Recognition, Automated/methods , Prostatic Neoplasms/genetics , Prostatic Neoplasms/pathology , Reference Values , Smoking/genetics