Results 1 - 9 of 9

1.
IEEE Trans Image Process ; 28(7): 3274-3285, 2019 Jul.
Article in English | MEDLINE | ID: mdl-30703025

ABSTRACT

Principal component analysis (PCA) is widely used for feature extraction and dimension reduction in pattern recognition and data analysis. Despite its popularity, the reduced dimensions obtained from PCA are difficult to interpret because the principal loading vectors are dense. To address this issue, several methods have been proposed for sparse PCA, all of which estimate loading vectors with few non-zero elements. However, when more than one principal component is estimated, the associated loading vectors do not, in general, share the same sparsity pattern. It therefore becomes difficult to determine a small subset of variables from the original feature space that contribute most to the principal components. To address this issue, an adaptive block sparse PCA method is proposed. The proposed method is guaranteed to obtain the same sparsity pattern across all principal components. Experiments show that applying the proposed sparse PCA method can help improve the performance of feature selection for image processing applications. We further demonstrate that our proposed sparse PCA method can be used to improve the performance of blind source separation for functional magnetic resonance imaging data.
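The record above describes loading vectors that share one sparsity pattern across all components. The following Python sketch (using NumPy) illustrates only that idea. It is not the adaptive block sparse PCA method of the paper, because the abstract does not give that algorithm. Instead it uses a simple subspace iteration. At each step it keeps the `n_keep` variables whose loading rows have the largest norms and zeroes the rest, so every component uses the same variables. The function name `row_sparse_pca` and its parameters are hypothetical.

```python
import numpy as np


def row_sparse_pca(X, n_components, n_keep, n_iter=200, seed=0):
    """Subspace iteration whose loading vectors all share one support.

    Illustrative only, NOT the paper's adaptive block sparse PCA: at every
    step the `n_keep` rows (variables) of the p x k loading matrix with the
    largest norms are kept and all other rows are set to zero, so every
    principal component uses the same variables.
    Requires n_components <= n_keep <= p.
    """
    Xc = X - X.mean(axis=0)                    # centre each variable
    S = Xc.T @ Xc / (X.shape[0] - 1)           # sample covariance, p x p
    p = S.shape[0]
    rng = np.random.default_rng(seed)
    V, _ = np.linalg.qr(rng.standard_normal((p, n_components)))
    support = np.arange(p)
    for _ in range(n_iter):
        V = S @ V                              # power step for all components at once
        row_norms = np.linalg.norm(V, axis=1)  # one score per variable
        support = np.sort(np.argsort(row_norms)[-n_keep:])
        Q, _ = np.linalg.qr(V[support])        # orthonormalise the kept rows only
        V = np.zeros((p, n_components))
        V[support] = Q                         # every other row is exactly zero
    return V, support
```

The support is chosen from the row norms of the whole loading matrix rather than column by column. The selected variables can therefore be read off directly as one feature subset, which is the property the abstract emphasises.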

2.
Aust N Z J Stat ; 58(1): 99-119, 2016 Mar.
Article in English | MEDLINE | ID: mdl-27478405

ABSTRACT

Quadratic forms capture multivariate information in a single number, making them useful, for example, in hypothesis testing. When a quadratic form is large and hence interesting, it might be informative to partition the quadratic form into contributions of individual variables. In this paper it is argued that meaningful partitions can be formed, though the precise partition that is determined will depend on the criterion used to select it. An intuitively reasonable criterion is proposed and the partition to which it leads is determined. The partition is based on a transformation that maximises the sum of the correlations between individual variables and the variables to which they transform under a constraint. Properties of the partition, including optimality properties, are examined. The contributions of individual variables to a quadratic form are less clear-cut when variables are collinear, and forming new variables through rotation can lead to greater transparency. The transformation is adapted so that it has an invariance property under such rotation, whereby the assessed contributions are unchanged for variables that the rotation does not affect directly. Application of the partition to Hotelling's one- and two-sample test statistics, Mahalanobis distance and discriminant analysis is described and illustrated through examples. It is shown that bootstrap confidence intervals for the contributions of individual variables to a partition are readily obtained.
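The abstract does not spell out the constraint on the transformation. Assume the constraint is that the transformed variables are uncorrelated with unit variance, so that their squares sum to the quadratic form. Under that assumption, the transformation that maximises the sum of correlations between each variable and its transformed counterpart is, to our understanding, the correlation-based symmetric whitening often called ZCA-cor whitening. The sketch below uses that assumption to partition the Mahalanobis-type form (x - mean)' cov^{-1} (x - mean). It omits the rotation-invariance adaptation described in the abstract. `zca_cor_contributions` is a hypothetical name.

```python
import numpy as np


def zca_cor_contributions(x, mean, cov):
    """Split (x - mean)' cov^{-1} (x - mean) into one term per variable.

    Assumes the partition comes from ZCA-cor whitening, w = P^{-1/2} V^{-1/2} (x - mean),
    where V holds the variances and P is the correlation matrix (an assumption,
    not necessarily the paper's exact transformation).
    """
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    sd = np.sqrt(np.diag(cov))
    P = cov / np.outer(sd, sd)                              # correlation matrix
    evals, evecs = np.linalg.eigh(P)
    P_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T   # symmetric P^{-1/2}
    w = P_inv_sqrt @ (d / sd)                               # whitened, variable-aligned scores
    return w ** 2                                           # sums to the quadratic form
```

With a diagonal covariance the contributions reduce to the squared standardised deviations. When variables are correlated, they share credit through the matrix square root.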

3.
Proteomics ; 16(11-12): 1731-5, 2016 Jun.
Article in English | MEDLINE | ID: mdl-27028088

ABSTRACT

Applying MALDI-MS imaging to tissue microarrays (TMAs) provides access to proteomics data from large cohorts of patients in a cost- and time-efficient way, and opens the potential for applying this technology in clinical diagnosis. The complexity of these TMA data (high dimension, low sample size) poses challenges for the statistical analysis. Classical methods typically require a nonsingular covariance matrix, and that requirement cannot be met when the dimension is greater than the sample size. We use TMAs to collect data from endometrial primary carcinomas from 43 patients. Each patient has a lymph node metastasis (LNM) status of positive or negative, which we predict on the basis of the MALDI-MS imaging TMA data. We propose a variable selection approach based on canonical correlation analysis that explicitly uses the LNM information. We apply linear discriminant analysis (LDA) to the selected variables only. Our method misclassifies 2.3-20.9% of patients by leave-one-out cross-validation and strongly outperforms LDA after reduction of the original data with principal component analysis.


Subject(s)
Endometrial Neoplasms/diagnostic imaging , Proteomics/methods , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Tissue Array Analysis/methods , Endometrial Neoplasms/diagnosis , Endometrial Neoplasms/pathology , Female , Humans , Lymphatic Metastasis , Neoplasm Staging , Principal Component Analysis
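The abstract of the MALDI-MS imaging study does not give the exact selection rule based on canonical correlation. For a single variable and a single binary response such as LNM status, the canonical correlation reduces to the absolute Pearson correlation between the two. The sketch below therefore uses that per-variable score as a simplified stand-in for the selection step. It then fits two-class LDA on the selected variables and estimates the error by leave-one-out cross-validation. Selection is repeated inside every fold, so the held-out patient never influences which variables are chosen. All function names and the number of kept variables `m` are hypothetical.

```python
import numpy as np


def select_by_label_correlation(X, y, m):
    """Indices of the m variables with the largest |correlation| with y.

    Simplified stand-in for the paper's CCA-based selection: for one variable
    and one binary response, the canonical correlation equals |Pearson r|.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.argsort(-np.abs(r))[:m]


def lda_fit(X, y):
    """Two-class LDA with a pooled covariance matrix; returns weights and bias."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(X) - 2)
    w = np.linalg.solve(pooled, mu1 - mu0)
    prior1 = np.mean(y)
    b = -0.5 * w @ (mu0 + mu1) + np.log(prior1 / (1.0 - prior1))
    return w, b                                # predict class 1 when x @ w + b > 0


def loocv_error(X, y, m):
    """Leave-one-out misclassification rate, redoing selection in every fold."""
    errors = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        keep = select_by_label_correlation(X[train], y[train], m)
        w, b = lda_fit(X[train][:, keep], y[train])
        pred = int(X[i, keep] @ w + b > 0)
        errors += int(pred != y[i])
    return errors / len(y)
```

Keeping `m` well below the training sample size keeps the pooled covariance matrix nonsingular. A singular covariance matrix is the obstacle the abstract identifies for classical methods on the full data.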
4.
Cytometry A ; 89(1): 44-58, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26097104

ABSTRACT

Many methods have been described for automated clustering analysis of complex flow cytometry data, but so far the goal of efficiently estimating multivariate densities and their modes for a moderate number of dimensions and potentially millions of data points has not been attained. We have devised a novel approach to describing modes using second order polynomial histogram estimators (SOPHE). The method divides the data into multivariate bins and determines the shape of the data in each bin based on second order polynomials, which is an efficient computation. These calculations yield local maxima and allow joining of adjacent bins to identify clusters. The use of second order polynomials also makes optimal use of wide bins, such that in most cases each parameter (dimension) need only be divided into 4-8 bins, again reducing computational load. We have validated this method using defined mixtures of up to 17 fluorescent beads in 16 dimensions, correctly identifying all populations in data files of 100,000 beads in <10 s, on a standard laptop. The method also correctly clustered granulocytes, lymphocytes, including standard T, B, and NK cell subsets, and monocytes in 9-color stained peripheral blood, within seconds. SOPHE successfully clustered up to 36 subsets of memory CD4 T cells using differentiation and trafficking markers, in 14-color flow analysis, and up to 65 subpopulations of PBMC in 33-dimensional CyTOF data, showing its usefulness in discovery research. SOPHE has the potential to greatly increase efficiency of analysing complex mixtures of cells in higher dimensions.


Subject(s)
Cluster Analysis , Computational Biology/methods , Flow Cytometry/methods , Adult , Algorithms , B-Lymphocytes/cytology , Biomarkers/analysis , Data Interpretation, Statistical , Electronic Data Processing/methods , Granulocytes/cytology , Humans , Killer Cells, Natural/cytology , T-Lymphocyte Subsets/cytology
5.
BMC Bioinformatics ; 16: 196, 2015 Jun 18.
Article in English | MEDLINE | ID: mdl-26084333

ABSTRACT

BACKGROUND: We consider data from a time course microarray experiment that was conducted on grapevines over the development cycle of the grape berries at two different vineyards in South Australia. Although the underlying biological process of berry development is the same at both vineyards, there are differences in the timing of the development due to local conditions. We aim to align the data from the two vineyards to enable an integrated analysis of the gene expression and use the alignment of the expression profiles to classify likely developmental function. RESULTS: We present a novel alignment method based on hidden Markov models (HMMs) and use the method to align the motivating grapevine data. We show that our alignment method is robust against subsets of profiles that are not suitable for alignment, investigate alignment diagnostics under the model and demonstrate the classification of developmentally driven genes. CONCLUSIONS: The classification of developmentally driven genes both validates that the alignment we obtain is meaningful and also gives new evidence that can be used to identify the role of genes with unknown function. Using our alignment methodology, we find at least 1279 grapevine probe sets with no current annotated function that are likely to be controlled in a developmental manner.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Gene Expression Regulation, Developmental , Genes, Plant/genetics , Vitis/growth & development , Vitis/genetics , Genome, Plant , Humans , Likelihood Functions , Markov Chains , Time Factors , Wine
6.
Biom J ; 51(3): 504-21, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19588456

ABSTRACT

Motivated by the needs of scientists using flow cytometry, we study the problem of estimating the region where two multivariate samples differ in density. We call this problem highest density difference region estimation and recognise it as a two-sample analogue of highest density region or excess set estimation. Flow cytometry samples typically contain between 10,000 and 100,000 observations, with dimension ranging from about 3 to 20. The industry standard for the problem being studied is called Frequency Difference Gating, due to Roederer and Hardy (2001). After couching the problem in a formal statistical framework we devise an alternative estimator that draws upon recent statistical developments such as patient rule induction methods. Improved performance is illustrated in simulations. While motivated by flow cytometry, the methodology is suitable for general multivariate random samples where density difference regions are of interest.


Subject(s)
Cell Count/methods , Cells, Cultured/cytology , Cells, Cultured/physiology , Flow Cytometry/methods , Image Interpretation, Computer-Assisted/methods , Data Interpretation, Statistical , Statistical Distributions
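To make the estimation target concrete, the sketch below computes a binned plug-in estimate of the region where the first sample's density exceeds the second's by more than a fixed threshold. It is neither Frequency Difference Gating nor the estimator of the paper that draws on patient rule induction. Histogram densities on a shared grid stand in for both. The function names, the 20 bins per dimension and the threshold value are illustrative assumptions. By analogy with highest density regions, the threshold would normally be chosen to give the region a target probability content; here it is fixed by hand for simplicity.

```python
import numpy as np


def density_difference_region(X1, X2, bins_per_dim=20, threshold=0.1):
    """Binned plug-in estimate of {x : f1(x) - f2(x) > threshold}.

    Both samples (arrays of shape (n, d)) are histogrammed on the same grid
    and normalised to densities; returns a boolean mask over bins and the edges.
    """
    X1, X2 = np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)
    both = np.vstack([X1, X2])
    lo, hi = both.min(axis=0), both.max(axis=0)
    edges = [np.linspace(a, b, bins_per_dim + 1) for a, b in zip(lo, hi)]
    f1, _ = np.histogramdd(X1, bins=edges, density=True)
    f2, _ = np.histogramdd(X2, bins=edges, density=True)
    return (f1 - f2) > threshold, edges


def in_region(x, mask, edges):
    """True if point x lies in a bin flagged by density_difference_region."""
    idx = []
    for value, e in zip(np.atleast_1d(x), edges):
        if value < e[0] or value > e[-1]:
            return False                       # outside the grid: not in the region
        idx.append(min(int(np.searchsorted(e, value, side="right")) - 1, len(e) - 2))
    return bool(mask[tuple(idx)])
```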
7.
Neural Comput ; 19(2): 513-45, 2007 Feb.
Article in English | MEDLINE | ID: mdl-17206873

ABSTRACT

This letter is concerned with the problem of selecting the best or most informative dimension for dimension reduction and feature extraction in high-dimensional data. The dimension of the data is reduced by principal component analysis; subsequent application of independent component analysis to the principal component scores determines the most nongaussian directions in the lower-dimensional space. A criterion for choosing the optimal dimension based on bias-adjusted skewness and kurtosis is proposed. This new dimension selector is applied to real data sets and compared to existing methods. Simulation studies for a range of densities show that the proposed method performs well and is more appropriate for nongaussian data than existing methods.


Subject(s)
Data Interpretation, Statistical , Models, Statistical , Principal Component Analysis , Algorithms , Numerical Analysis, Computer-Assisted
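The letter's dimension selector combines bias-adjusted skewness and kurtosis, but the abstract does not say how. The sketch below therefore covers only the ingredients. The first is principal component scores computed by SVD. The others are the standard bias-adjusted sample skewness G1 and excess kurtosis G2, the quantities returned by `scipy.stats.skew` and `scipy.stats.kurtosis` with `bias=False`. Whether the letter uses exactly these adjustments is an assumption. The ICA step that would be applied to the scores is omitted. Function names are hypothetical.

```python
import numpy as np


def pc_scores(X, k):
    """Scores of X on its first k principal components, via SVD of the centred data."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]


def adjusted_skewness(z):
    """Bias-adjusted sample skewness G1 = sqrt(n(n-1)) / (n-2) * g1 (assumed adjustment)."""
    z = np.asarray(z, dtype=float)
    n = z.size
    d = z - z.mean()
    m2, m3 = np.mean(d ** 2), np.mean(d ** 3)
    g1 = m3 / m2 ** 1.5
    return np.sqrt(n * (n - 1)) / (n - 2) * g1


def adjusted_kurtosis(z):
    """Bias-adjusted excess kurtosis G2 = (n-1)/((n-2)(n-3)) * ((n+1) g2 + 6)."""
    z = np.asarray(z, dtype=float)
    n = z.size
    d = z - z.mean()
    m2, m4 = np.mean(d ** 2), np.mean(d ** 4)
    g2 = m4 / m2 ** 2 - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
```

For Gaussian data both statistics are close to zero. Large absolute values on a component therefore flag the non-Gaussian directions that the selector is meant to find.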
8.
BMC Public Health ; 6: 200, 2006 Aug 03.
Article in English | MEDLINE | ID: mdl-16884546

ABSTRACT

BACKGROUND: In early 2001 Australia experienced a sudden reduction in the availability of heroin which had widespread effects on illicit drug markets across the country. The consequences of this event, commonly referred to as the Australian 'heroin shortage', have been extensively studied and there has been considerable debate as to the causes of the shortage and its implications for drug policy. This paper aims to investigate the presence of long-term 'epidemic' patterns in these markets, to quantify the scale over which they occur and to estimate the relative importance of the 'heroin shortage' and any epidemic patterns in the drug markets. METHOD: Key indicator data series from the New South Wales illicit drug market were analysed using the statistical methods Principal Component Analysis and SiZer. RESULTS: The 'heroin shortage' represents the single most important source of variation in this illicit drug market. Furthermore the size of the effect of the heroin shortage is more than three times that evidenced by long-term 'epidemic' patterns. CONCLUSION: The 'heroin shortage' was unlikely to have been a simple correction at the end of a long period of reduced heroin availability, and represents a separate non-random shock which strongly affected the markets.


Subject(s)
Amphetamine-Related Disorders/epidemiology , Cocaine-Related Disorders/epidemiology , Drug and Narcotic Control/trends , Heroin Dependence/mortality , Heroin/supply & distribution , Illicit Drugs/supply & distribution , Law Enforcement , Amphetamine/economics , Amphetamine/supply & distribution , Amphetamine-Related Disorders/economics , Cluster Analysis , Cocaine/economics , Cocaine/supply & distribution , Cocaine-Related Disorders/economics , Drug and Narcotic Control/economics , Heroin/economics , Heroin Dependence/economics , Humans , Illicit Drugs/economics , New South Wales/epidemiology , Normal Distribution , Principal Component Analysis , Time Factors