1.
Comput Graph Forum ; 41(3): 157-168, 2022 Jun.
Article in English | MEDLINE | ID: mdl-36248193

ABSTRACT

Analysis of spatial multivariate data, i.e., measurements at irregularly-spaced locations, is a challenging topic in visualization and statistics alike. Such data are integral to many domains, e.g., indicators of valuable minerals are measured for mine prospecting. Popular analysis methods, like PCA, often by design do not account for the spatial nature of the data. Thus they, together with their spatial variants, must be employed very carefully. Clearly, it is preferable to use methods that were specifically designed for such data, like spatial blind source separation (SBSS). However, SBSS requires two tuning parameters, which are themselves complex spatial objects. Setting these parameters involves navigating two large and interdependent parameter spaces, while also taking into account prior knowledge of the physical reality represented by the data. To support analysts in this process, we developed a visual analytics prototype. We evaluated it with experts in visualization, SBSS, and geochemistry. Our evaluations show that our interactive prototype allows analysts to define complex and realistic parameter settings efficiently, which was previously impractical. Settings identified by a non-expert led to remarkable and surprising insights for a domain expert. Therefore, this paper presents important first steps to enable the use of a promising analysis method for spatial multivariate data.

2.
J Appl Stat ; 48(2): 214-233, 2021.
Article in English | MEDLINE | ID: mdl-35707689

ABSTRACT

A data table arranged according to two factors can often be considered a compositional table. An example is the number of unemployed people, split according to gender and age classes. Analyzed as compositions, the relevant information consists of ratios between different cells of such a table. This is particularly useful when analyzing several compositional tables jointly, where the absolute numbers are in very different ranges, e.g. if unemployment data are considered from different countries. Within the framework of the logratio methodology, compositional tables can be decomposed into independent and interactive parts, and orthonormal coordinates can be assigned to these parts. However, these coordinates usually require some prior knowledge about the data, and they are not easy to handle for exploring the relationships between the given factors. Here we propose a special choice of coordinates with direct relation to centered logratio (clr) coefficients, which are particularly useful for an interpretation in terms of the original cells of the tables. With these coordinates, robust principal component analysis (rPCA) is performed for dimension reduction, which allows the relationships between the factors to be investigated. The link between orthonormal coordinates and clr coefficients makes it possible to apply rPCA, which would otherwise suffer from the singularity of clr coefficients.
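
As a minimal illustration of the singularity mentioned in this abstract, the sketch below computes clr coefficients for the cells of a small table and checks that they sum to zero, which is why their covariance matrix is singular. The table values and the helper name `clr` are hypothetical and not taken from the article; the article's own orthonormal coordinates are not reproduced here.

```python
import math

def clr(parts):
    """Centered logratio coefficients of strictly positive parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical 2x2 table (gender x age class), flattened row by row.
table = [120.0, 80.0, 150.0, 50.0]
coeffs = clr(table)

# The coefficients always sum to zero, so they lie in a hyperplane
# and their covariance matrix cannot have full rank.
total = sum(coeffs)

# Only ratios between cells matter: rescaling every cell by the same
# factor leaves the clr coefficients unchanged.
rescaled = clr([10.0 * x for x in table])
```

The zero-sum constraint is the reason methods that need a full-rank covariance estimate cannot be applied to clr coefficients directly.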

3.
J Appl Stat ; 47(7): 1144-1167, 2020.
Article in English | MEDLINE | ID: mdl-35707025

ABSTRACT

Outlier detection can be seen as a pre-processing step for locating data points in a data sample, which do not conform to the majority of observations. Various techniques and methods for outlier detection can be found in the literature dealing with different types of data. However, many data sets are inflated by true zeros and, in addition, some components/variables might be of compositional nature. Important examples of such data sets are the Structural Earnings Survey, the Structural Business Statistics, the European Statistics on Income and Living Conditions, tax data or - as in this contribution - household expenditure data which are used, for example, to estimate the Purchase Power Parity of a country. In this work, robust univariate and multivariate outlier detection methods are compared by a complex simulation study that considers various challenges included in data sets, namely structural (true) zeros, missing values, and compositional variables. These circumstances make it difficult or impossible to flag true outliers and influential observations by well-known outlier detection methods. Our aim is to assess the performance of outlier detection methods in terms of their effectiveness in identifying outliers when applied to challenging data sets such as the household expenditures data surveyed all over the world. Moreover, different methods are evaluated through a close-to-reality simulation study. Differences in performance of univariate and multivariate robust techniques for outlier detection and their shortcomings are reported. We found that robust multivariate methods outperform robust univariate methods. The best performing methods in finding the outliers and in providing a low false discovery rate were found to be the generalized S estimators (GSE), the BACON-EEM algorithm and a compositional method (CoDa-Cov). In addition, these methods also performed best when the outliers are imputed based on the corresponding outlier detection method and indicators are estimated from the data sets.

4.
Sci Total Environ ; 607-608: 965-971, 2017 Dec 31.
Article in English | MEDLINE | ID: mdl-28724228

ABSTRACT

Most data in environmental sciences and geochemistry are compositional. The unit used to report the data (e.g., µg/l, mg/kg, wt%) already implies that the analytical results for each element are not free to vary independently of the other measured variables. This is often neglected in statistical analysis, where a simple log-transformation of the single variables is insufficient to put the data into an acceptable geometry. This is also important for bivariate data analysis and for correlation analysis, for which the data need to be appropriately log-ratio transformed. A new approach based on the isometric log-ratio (ilr) transformation, leading to so-called symmetric coordinates, is presented here. Summarizing the correlations in a heat-map gives a powerful tool for bivariate data analysis. Here an application of the new method using a data set from a regional geochemical mapping project based on soil O and C horizon samples is demonstrated. Differences from 'classical' correlation analysis based on log-transformed data are highlighted. The fact that some expected strong positive correlations appear and remain unchanged even following a log-ratio transformation has probably led to the misconception that the special nature of compositional data can be ignored when working with trace elements. The example dataset is employed to demonstrate that using 'classical' correlation analysis and plotting XY diagrams, scatterplots, based on the original or simply log-transformed data can easily lead to severe misinterpretations of the relationships between elements.

5.
J Chromatogr A ; 1362: 194-205, 2014 Oct 03.
Article in English | MEDLINE | ID: mdl-25201255

ABSTRACT

Our study focuses on the removal of the so-called size effect, related to a different sample volume and/or concentration. This effect is associated with many types of instrumental signals, particularly with those originating from HPLC-DAD, LC-MS, and UPLC-MS. These signals do not carry any absolute information about the sample components. If the data comparison has to be performed based on sample fingerprints, then the size effect is undesired, and the shape effect is of main interest. With "shape", we refer to data information which is contained in the ratios between the variables. So far, different normalization methods have been applied to the removal of size effect. In our study, the performance of popular normalization methods is compared with those of the CODA (Compositional Data Analysis) methods, relying on log-ratio transformations, and the performance is evaluated through the prism of proper identification of biomarkers.


Subject(s)
Chromatography, Liquid/methods , Biomarkers/analysis , Chromatography, Liquid/instrumentation , Computer Simulation , Mass Spectrometry/methods
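
The distinction between "size" and "shape" drawn in this abstract can be illustrated with a short sketch. It assumes that the size effect acts as a single multiplicative factor on the whole signal; the fingerprint values, the factor 0.3, and the helper name `pairwise_logratios` are hypothetical and are not taken from the study.

```python
import math
from itertools import combinations

def pairwise_logratios(signal):
    """Log-ratios between all pairs of variables in one sample."""
    return {
        (i, j): math.log(signal[i] / signal[j])
        for i, j in combinations(range(len(signal)), 2)
    }

# Hypothetical fingerprint (three peak intensities) and the same sample
# measured at a lower concentration (every intensity scaled by 0.3).
fingerprint = [4.0, 1.0, 2.5]
diluted = [0.3 * v for v in fingerprint]

original_shape = pairwise_logratios(fingerprint)
diluted_shape = pairwise_logratios(diluted)

# The common factor cancels in every ratio, so the "shape" is unchanged
# although the raw intensities differ.
same_shape = all(
    math.isclose(original_shape[k], diluted_shape[k], abs_tol=1e-12)
    for k in original_shape
)
```

Under this assumption the raw intensities of the two measurements differ, while every log-ratio agrees up to floating-point error.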
6.
Talanta ; 90: 46-50, 2012 Feb 15.
Article in English | MEDLINE | ID: mdl-22340114

ABSTRACT

Eight phenolic acids (vanillic, gentisic, protocatechuic, syringic, gallic, coumaric, ferulic and caffeic) were quantitatively determined in 30 commercially available wines from South Moravia by gas chromatography-mass spectrometry. Raw (untransformed) and centered log-ratio transformed data were evaluated by classical and robust version of principal component analysis (PCA). A robust compositional biplot of the centered log-ratio transformed data gives the best resolution of particular categories of wines. Vanillic, syringic and gallic acids were identified as presumed markers occurring in relatively higher concentrations in red wines. Gentisic and caffeic acid were tentatively suggested as prospective technological markers, reflecting presumably some kinds of technological aspects of wine making.


Subject(s)
Biomarkers/analysis , Gas Chromatography-Mass Spectrometry , Hydroxybenzoates/analysis , Wine/analysis
7.
Environ Pollut ; 113(1): 41-57, 2001.
Article in English | MEDLINE | ID: mdl-11351761

ABSTRACT

Duplicate samples of the two terrestrial moss species Hylocomium splendens and Pleurozium schreberi, which are widely used to monitor airborne heavy metal pollution, have been collected from eight catchments spread over a 1,500,000 km² area in northern Europe. These were analysed for a total of 38 elements by inductively coupled plasma-mass spectrometry, inductively coupled plasma-atomic emission spectrometry and cold vapour-atomic absorption spectrometry techniques. Results show that the moss species can be combined without interspecies calibration for regional mapping purposes. For the majority of elements the observed within-catchment variation is large; big composite samples over a large area should thus be collected when moss is to be used for monitoring purposes. For the majority of elements the input of dust governs moss chemistry. For a reliable 'contamination' signal over a sizeable area a major source is needed. Some elements show a dependence on climate/vegetation zone. In coastal areas the input of marine aerosols will alter the chemical signal obtained from moss samples.


Subject(s)
Air Pollutants/analysis , Bryopsida , Environmental Monitoring , Metals, Heavy/analysis , Environmental Monitoring/methods , Europe , Humans
8.
Magn Reson Imaging ; 17(6): 817-26, 1999 Jul.
Article in English | MEDLINE | ID: mdl-10402588

ABSTRACT

We introduce a novel method for detecting anatomic and functional structures in fMRI. The main idea is to divide the data hierarchically into smaller groups using k-means clustering. The separation is halted if the clusters contain no further structure that is verified by several independent tests. The resulting cluster centers are then used for computing the final results in one step. The procedure is flexible, fast to compute, and the numbers of clusters in the data are obtained in a data-driven manner. Applying the algorithm to synthetic fMRI data yields perfect separation of "anatomic," i.e., time-invariant, and "functional," i.e., time-varying, information for a standard off-on paradigm and a typical functional contrast-to-noise ratio of two and higher. In addition, an EPI-fMRI data set of the human motor cortex was analyzed to demonstrate the performance of this novel approach in vivo.


Subject(s)
Brain Mapping , Magnetic Resonance Imaging , Motor Cortex/physiology , Algorithms , Artifacts , Cerebrovascular Circulation , Cluster Analysis , Electronic Data Processing , Humans , Motor Cortex/blood supply , Signal Processing, Computer-Assisted
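
The hierarchical splitting idea described in this abstract can be sketched roughly as follows. This is not the authors' implementation: the abstract does not specify the "several independent tests", so the function `has_structure` below is a placeholder that uses a simple spread threshold, and the time courses, the threshold `tol`, and all function names are hypothetical. The final one-step computation from the cluster centers is not shown.

```python
def dist2(a, b):
    """Squared Euclidean distance between two time courses."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a list of time courses."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def two_means(points, iters=20):
    """k-means with k=2 and a deterministic start:
    the first point and the point farthest from it."""
    c0 = points[0]
    c1 = max(points, key=lambda p: dist2(p, c0))
    g0, g1 = points, []
    for _ in range(iters):
        g0 = [p for p in points if dist2(p, c0) <= dist2(p, c1)]
        g1 = [p for p in points if dist2(p, c0) > dist2(p, c1)]
        if not g0 or not g1:
            break
        c0, c1 = mean(g0), mean(g1)
    return g0, g1

def has_structure(points, tol=0.5):
    """Placeholder stopping test (an assumption, not from the article):
    split further only if the mean squared distance to the centre exceeds tol."""
    c = mean(points)
    return len(points) > 1 and sum(dist2(p, c) for p in points) / len(points) > tol

def split(points):
    """Recursively divide the data until no group shows further structure."""
    if not has_structure(points):
        return [mean(points)]
    g0, g1 = two_means(points)
    if not g0 or not g1:
        return [mean(points)]
    return split(g0) + split(g1)

# Hypothetical time courses: two time-invariant levels and one off-on pattern.
voxels = [
    [1.0, 1.0, 1.0, 1.0], [1.1, 1.0, 0.9, 1.0],
    [5.0, 5.0, 5.0, 5.0], [5.1, 4.9, 5.0, 5.0],
    [1.0, 3.0, 1.0, 3.0], [1.0, 3.1, 0.9, 3.0],
]
centers = split(voxels)
```

In this toy data the number of clusters is not fixed in advance; it follows from where the placeholder test stops the recursion.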