Search | VHL Regional Portal (test)

Sparse Principal Component Analysis With Preserved Sparsity Pattern.

Seghouane, Abd-Krim; Shokouhi, Navid; Koch, Inge.

IEEE Trans Image Process ; 28(7): 3274-3285, 2019 Jul.

Article in English | MEDLINE | ID: mdl-30703025

ABSTRACT

Principal component analysis (PCA) is widely used for feature extraction and dimension reduction in pattern recognition and data analysis. Despite its popularity, the reduced dimension obtained from the PCA is difficult to interpret due to the dense structure of principal loading vectors. To address this issue, several methods have been proposed for sparse PCA, all of which estimate loading vectors with few non-zero elements. However, when more than one principal component is estimated, the associated loading vectors do not possess the same sparsity pattern. Therefore, it becomes difficult to determine a small subset of variables from the original feature space that have the highest contribution in the principal components. To address this issue, an adaptive block sparse PCA method is proposed. The proposed method is guaranteed to obtain the same sparsity pattern across all principal components. Experiments show that applying the proposed sparse PCA method can help improve the performance of feature selection for image processing applications. We further demonstrate that our proposed sparse PCA method can be used to improve the performance of blind source separation for functional magnetic resonance imaging data.
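To make the idea of a shared sparsity pattern concrete, the following is a minimal sketch. It uses plain row-wise hard thresholding of a loading matrix, which is an assumption chosen for illustration and not the adaptive block sparse estimator proposed in the paper. The function name and the toy loadings are hypothetical.

```python
import math


def shared_sparsity_pattern(loadings, keep):
    """Zero every row of `loadings` except the `keep` rows with the largest norm.

    `loadings` is a list of rows, one row per original variable and one
    column per principal component. Zeroing whole rows forces all components
    to use the same subset of variables.
    """
    row_norms = [math.sqrt(sum(v * v for v in row)) for row in loadings]
    ranked = sorted(range(len(loadings)), key=lambda i: row_norms[i], reverse=True)
    selected = set(ranked[:keep])
    sparse = [
        list(row) if i in selected else [0.0] * len(row)
        for i, row in enumerate(loadings)
    ]
    return sparse, sorted(selected)


# Hypothetical dense loadings: 5 variables, 2 components.
dense = [
    [0.70, 0.10],
    [0.05, 0.02],
    [0.60, -0.50],
    [0.01, 0.03],
    [-0.30, 0.80],
]
sparse, selected = shared_sparsity_pattern(dense, keep=3)
print(selected)
```

Because entire rows are removed, the retained variables are identical for every component. The thresholded columns are no longer guaranteed to be orthonormal, which is one reason a dedicated estimator is needed in practice.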

Evaluating the Contributions of Individual Variables to a Quadratic Form.

Garthwaite, Paul H; Koch, Inge.

Aust N Z J Stat ; 58(1): 99-119, 2016 Mar.

Article in English | MEDLINE | ID: mdl-27478405

ABSTRACT

Quadratic forms capture multivariate information in a single number, making them useful, for example, in hypothesis testing. When a quadratic form is large and hence interesting, it might be informative to partition the quadratic form into contributions of individual variables. In this paper it is argued that meaningful partitions can be formed, though the precise partition that is determined will depend on the criterion used to select it. An intuitively reasonable criterion is proposed and the partition to which it leads is determined. The partition is based on a transformation that maximises the sum of the correlations between individual variables and the variables to which they transform under a constraint. Properties of the partition, including optimality properties, are examined. The contributions of individual variables to a quadratic form are less clear-cut when variables are collinear, and forming new variables through rotation can lead to greater transparency. The transformation is adapted so that it has an invariance property under such rotation, whereby the assessed contributions are unchanged for variables that the rotation does not affect directly. Application of the partition to Hotelling's one- and two-sample test statistics, Mahalanobis distance and discriminant analysis is described and illustrated through examples. It is shown that bootstrap confidence intervals for the contributions of individual variables to a partition are readily obtained.
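As a rough illustration of partitioning a quadratic form, the sketch below splits a squared Mahalanobis distance for two variables into one non-negative term per variable. It assumes the symmetric inverse square root of the covariance matrix as the transformation; the paper selects its transformation by a correlation-based criterion, which this sketch does not attempt to reproduce. All names and numbers are hypothetical.

```python
import math


def inv_sqrt_2x2(s):
    """Symmetric inverse square root of a 2x2 symmetric positive-definite matrix."""
    (a, b), (_, d) = s
    root_det = math.sqrt(a * d - b * b)
    t = math.sqrt(a + d + 2.0 * root_det)
    # Closed-form square root of a 2x2 SPD matrix: (S + sqrt(det S) * I) / t
    r = [[(a + root_det) / t, b / t], [b / t, (d + root_det) / t]]
    det_r = r[0][0] * r[1][1] - r[0][1] * r[1][0]
    # Invert the 2x2 square root.
    return [
        [r[1][1] / det_r, -r[0][1] / det_r],
        [-r[1][0] / det_r, r[0][0] / det_r],
    ]


def contributions(x, mean, cov):
    """Split (x - mean)' cov^{-1} (x - mean) into one squared term per variable."""
    a = inv_sqrt_2x2(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    w = [sum(a[i][j] * diff[j] for j in range(2)) for i in range(2)]
    return [wi * wi for wi in w]


cov = [[4.0, 1.2], [1.2, 1.0]]
parts = contributions([3.0, 0.5], [1.0, 1.0], cov)
total = sum(parts)
print(parts, total)
```

The terms sum to the full quadratic form because the transformed vector w satisfies w'w = (x - mean)' cov^{-1} (x - mean). Other transformations with the same property give different partitions, which is the dependence on the selection criterion that the abstract describes.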

Classification of MALDI-MS imaging data of tissue microarrays using canonical correlation analysis-based variable selection.

Winderbaum, Lyron; Koch, Inge; Mittal, Parul; Hoffmann, Peter.

Proteomics ; 16(11-12): 1731-5, 2016 06.

Article in English | MEDLINE | ID: mdl-27028088

ABSTRACT

Applying MALDI-MS imaging to tissue microarrays (TMAs) provides access to proteomics data from large cohorts of patients in a cost- and time-efficient way, and opens the potential for applying this technology in clinical diagnosis. The complexity of these TMA data (high-dimensional, low sample size) poses challenges for the statistical analysis, as classical methods typically require a nonsingular covariance matrix, a requirement that cannot be satisfied if the dimension is greater than the sample size. We use TMAs to collect data from endometrial primary carcinomas from 43 patients. Each patient has a lymph node metastasis (LNM) status of positive or negative, which we predict on the basis of the MALDI-MS imaging TMA data. We propose a variable selection approach based on canonical correlation analysis that explicitly uses the LNM information. We apply LDA to the selected variables only. Our method misclassifies 2.3-20.9% of patients by leave-one-out cross-validation and strongly outperforms LDA after reduction of the original data with principal component analysis.

Subjects

Endometrial Neoplasms/diagnostic imaging; Proteomics/methods; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods; Tissue Array Analysis/methods; Endometrial Neoplasms/diagnosis; Endometrial Neoplasms/pathology; Female; Humans; Lymphatic Metastasis; Neoplasm Staging; Principal Component Analysis

Computationally efficient multidimensional analysis of complex flow cytometry data using second order polynomial histograms.

Zaunders, John; Jing, Junmei; Leipold, Michael; Maecker, Holden; Kelleher, Anthony D; Koch, Inge.

Cytometry A ; 89(1): 44-58, 2016 Jan.

Article in English | MEDLINE | ID: mdl-26097104

ABSTRACT

Many methods have been described for automated clustering analysis of complex flow cytometry data, but so far the goal to efficiently estimate multivariate densities and their modes for a moderate number of dimensions and potentially millions of data points has not been attained. We have devised a novel approach to describing modes using second order polynomial histogram estimators (SOPHE). The method divides the data into multivariate bins and determines the shape of the data in each bin based on second order polynomials, which is an efficient computation. These calculations yield local maxima and allow joining of adjacent bins to identify clusters. The use of second order polynomials also optimally uses wide bins, such that in most cases each parameter (dimension) need only be divided into 4-8 bins, again reducing computational load. We have validated this method using defined mixtures of up to 17 fluorescent beads in 16 dimensions, correctly identifying all populations in data files of 100,000 beads in <10 s, on a standard laptop. The method also correctly clustered granulocytes, lymphocytes, including standard T, B, and NK cell subsets, and monocytes in 9-color stained peripheral blood, within seconds. SOPHE successfully clustered up to 36 subsets of memory CD4 T cells using differentiation and trafficking markers, in 14-color flow analysis, and up to 65 subpopulations of PBMC in 33-dimensional CyTOF data, showing its usefulness in discovery research. SOPHE has the potential to greatly increase efficiency of analysing complex mixtures of cells in higher dimensions.

Subjects

Cluster Analysis; Computational Biology/methods; Flow Cytometry/methods; Adult; Algorithms; B-Lymphocytes/cytology; Biomarkers/analysis; Data Interpretation, Statistical; Electronic Data Processing/methods; Granulocytes/cytology; Humans; Killer Cells, Natural/cytology; T-Lymphocyte Subsets/cytology

Alignment of time course gene expression data and the classification of developmentally driven genes with hidden Markov models.

Robinson, Sean; Glonek, Garique; Koch, Inge; Thomas, Mark; Davies, Christopher.

BMC Bioinformatics ; 16: 196, 2015 Jun 18.

Article in English | MEDLINE | ID: mdl-26084333

ABSTRACT

BACKGROUND: We consider data from a time course microarray experiment that was conducted on grapevines over the development cycle of the grape berries at two different vineyards in South Australia. Although the underlying biological process of berry development is the same at both vineyards, there are differences in the timing of the development due to local conditions. We aim to align the data from the two vineyards to enable an integrated analysis of the gene expression and use the alignment of the expression profiles to classify likely developmental function. RESULTS: We present a novel alignment method based on hidden Markov models (HMMs) and use the method to align the motivating grapevine data. We show that our alignment method is robust against subsets of profiles that are not suitable for alignment, investigate alignment diagnostics under the model and demonstrate the classification of developmentally driven genes. CONCLUSIONS: The classification of developmentally driven genes both validates that the alignment we obtain is meaningful and also gives new evidence that can be used to identify the role of genes with unknown function. Using our alignment methodology, we find at least 1279 grapevine probe sets with no current annotated function that are likely to be controlled in a developmental manner.

Subjects

Algorithms; Gene Expression Profiling/methods; Gene Expression Regulation, Developmental; Genes, Plant/genetics; Vitis/growth & development; Vitis/genetics; Genome, Plant; Humans; Likelihood Functions; Markov Chains; Time Factors; Wine

Highest density difference region estimation with application to flow cytometric data.

Duong, Tarn; Koch, Inge; Wand, M P.

Biom J ; 51(3): 504-21, 2009 Jun.

Article in English | MEDLINE | ID: mdl-19588456

ABSTRACT

Motivated by the needs of scientists using flow cytometry, we study the problem of estimating the region where two multivariate samples differ in density. We call this problem highest density difference region estimation and recognise it as a two-sample analogue of highest density region or excess set estimation. Flow cytometry samples typically contain on the order of 10,000 to 100,000 observations, with dimension ranging from about 3 to 20. The industry standard for the problem being studied is called Frequency Difference Gating, due to Roederer and Hardy (2001). After couching the problem in a formal statistical framework we devise an alternative estimator that draws upon recent statistical developments such as patient rule induction methods. Improved performance is illustrated in simulations. While motivated by flow cytometry, the methodology is suitable for general multivariate random samples where density difference regions are of interest.

Subjects

Cell Count/methods; Cells, Cultured/cytology; Cells, Cultured/physiology; Flow Cytometry/methods; Image Interpretation, Computer-Assisted/methods; Data Interpretation, Statistical; Statistical Distributions
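The two-sample problem in the abstract above can be pictured with a crude one-dimensional stand-in: estimate each density with a histogram on a shared grid and report the bins where the first density exceeds the second by more than a threshold. This is an assumption made for illustration only; it is neither the estimator proposed in the paper nor Frequency Difference Gating. Function names, data and threshold are hypothetical.

```python
def histogram_density(sample, lo, hi, bins):
    """Histogram density estimate on `bins` equal-width bins covering [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in sample:
        if lo <= v < hi:
            counts[int((v - lo) / width)] += 1
    n = len(sample)
    return [c / (n * width) for c in counts]


def density_difference_bins(s1, s2, lo, hi, bins, threshold):
    """Indices of bins where the density of s1 exceeds that of s2 by more than `threshold`."""
    f1 = histogram_density(s1, lo, hi, bins)
    f2 = histogram_density(s2, lo, hi, bins)
    return [i for i in range(bins) if f1[i] - f2[i] > threshold]


# Hypothetical toy samples: sample_a has extra mass between 2 and 3.
sample_a = [0.1, 0.2, 0.3, 2.1, 2.2, 2.3, 2.4, 2.6]
sample_b = [0.1, 0.2, 0.4, 0.6, 0.7, 1.1, 1.5, 2.2]
region = density_difference_bins(
    sample_a, sample_b, lo=0.0, hi=3.0, bins=3, threshold=0.2
)
print(region)
```

A fixed grid of this kind scales poorly beyond a few dimensions, which is part of the motivation for more refined estimators in the flow cytometry setting.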

Dimension selection for feature selection and dimension reduction with principal and independent component analysis.

Koch, Inge; Naito, Kanta.

Neural Comput ; 19(2): 513-45, 2007 Feb.

Article in English | MEDLINE | ID: mdl-17206873

ABSTRACT

This letter is concerned with the problem of selecting the best or most informative dimension for dimension reduction and feature extraction in high-dimensional data. The dimension of the data is reduced by principal component analysis; subsequent application of independent component analysis to the principal component scores determines the most nongaussian directions in the lower-dimensional space. A criterion for choosing the optimal dimension based on bias-adjusted skewness and kurtosis is proposed. This new dimension selector is applied to real data sets and compared to existing methods. Simulation studies for a range of densities show that the proposed method performs well and is more appropriate for nongaussian data than existing methods.

Subjects

Data Interpretation, Statistical; Models, Statistical; Principal Component Analysis; Algorithms; Numerical Analysis, Computer-Assisted
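The dimension selector in the abstract above rests on skewness and kurtosis as measures of nongaussianity. As a minimal sketch, the plain moment estimators below show how the two quantities are computed for one vector of scores. They omit the bias adjustment the abstract mentions, as well as the PCA and ICA steps; the function name and the toy data are hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both are 0 for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical score vectors: one symmetric, one with a long right tail.
symmetric_scores = [-2.0, -1.0, 0.0, 1.0, 2.0]
skewed_scores = [0.0, 0.0, 0.0, 0.0, 10.0]

sym_skew, sym_kurt = skewness_and_excess_kurtosis(symmetric_scores)
skw_skew, skw_kurt = skewness_and_excess_kurtosis(skewed_scores)
print(sym_skew, sym_kurt, skw_skew, skw_kurt)
```

Directions whose scores have skewness or excess kurtosis far from zero are the ones a nongaussianity-based criterion would treat as informative.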

Identification and quantification of change in Australian illicit drug markets.

Gilmour, Stuart; Koch, Inge; Degenhardt, Louisa; Day, Carolyn.

BMC Public Health ; 6: 200, 2006 Aug 03.

Article in English | MEDLINE | ID: mdl-16884546

ABSTRACT

BACKGROUND: In early 2001 Australia experienced a sudden reduction in the availability of heroin which had widespread effects on illicit drug markets across the country. The consequences of this event, commonly referred to as the Australian 'heroin shortage', have been extensively studied and there has been considerable debate as to the causes of the shortage and its implications for drug policy. This paper aims to investigate the presence of these epidemic patterns, to quantify the scale over which they occur and to estimate the relative importance of the 'heroin shortage' and any epidemic patterns in the drug markets. METHOD: Key indicator data series from the New South Wales illicit drug market were analysed using the statistical methods Principal Component Analysis and SiZer. RESULTS: The 'heroin shortage' represents the single most important source of variation in this illicit drug market. Furthermore the size of the effect of the heroin shortage is more than three times that evidenced by long-term 'epidemic' patterns. CONCLUSION: The 'heroin shortage' was unlikely to have been a simple correction at the end of a long period of reduced heroin availability, and represents a separate non-random shock which strongly affected the markets.

Subjects

Amphetamine-Related Disorders/epidemiology; Cocaine-Related Disorders/epidemiology; Drug and Narcotic Control/trends; Heroin Dependence/mortality; Heroin/supply & distribution; Illicit Drugs/supply & distribution; Law Enforcement; Amphetamine/economics; Amphetamine/supply & distribution; Amphetamine-Related Disorders/economics; Cluster Analysis; Cocaine/economics; Cocaine/supply & distribution; Cocaine-Related Disorders/economics; Drug and Narcotic Control/economics; Heroin/economics; Heroin Dependence/economics; Humans; Illicit Drugs/economics; New South Wales/epidemiology; Normal Distribution; Principal Component Analysis; Time Factors

Zur Wirkung von Endoxan auf die Entwicklung des Unterkieferskelets beim Haushuhn. [On the effect of Endoxan on the development of the mandibular skeleton in the domestic chicken.]

Koch, Inge; Heydecke, Rolf.

Wilhelm Roux Arch Entwickl Mech Org ; 158(2): 195-204, 1967 Jun.

Article in German | MEDLINE | ID: mdl-28304644
