Results 1 - 20 of 42
1.
Article in English | MEDLINE | ID: mdl-37922175

ABSTRACT

Modern science and industry rely on computational models for simulation, prediction, and data analysis. Spatial blind source separation (SBSS) is a model used to analyze spatial data. Designed explicitly for spatial data analysis, it is superior to popular non-spatial methods, like PCA. However, a challenge to its practical use is setting two complex tuning parameters, which requires parameter space analysis. In this paper, we focus on sensitivity analysis (SA). SBSS parameters and outputs are spatial data, which makes SA difficult as few SA approaches in the literature assume such complex data on both sides of the model. Based on the requirements in our design study with statistics experts, we developed a visual analytics prototype for data type agnostic visual sensitivity analysis that fits SBSS and other contexts. The main advantage of our approach is that it requires only dissimilarity measures for parameter settings and outputs (Fig. 1). We evaluated the prototype heuristically with visualization experts and through interviews with two SBSS experts. In addition, we show the transferability of our approach by applying it to microclimate simulations. Study participants could confirm suspected and known parameter-output relations, find surprising associations, and identify parameter subspaces to examine in the future. During our design study and evaluation, we identified challenging future research opportunities.
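The key claim is that sensitivity can be explored from dissimilarities alone. A minimal, non-visual sketch of that idea is shown below. It assumes a Mantel-style correlation between a parameter-dissimilarity matrix and an output-dissimilarity matrix, which is not the prototype's visual method. The toy runs, the Euclidean placeholder measure and all function names are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Placeholder dissimilarity; any measure suited to the data type could be swapped in."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def upper_triangle(items, dissim):
    """Dissimilarities for all unordered pairs (i < j), in a fixed order."""
    return [dissim(items[i], items[j]) for i, j in combinations(range(len(items)), 2)]

def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel_r(params, outputs, d_param=euclidean, d_out=euclidean):
    """Correlation between parameter-space and output-space dissimilarities."""
    return pearson(upper_triangle(params, d_param), upper_triangle(outputs, d_out))

# Hypothetical runs: two tuning parameters per run and a small output vector per run.
params = [(0.1, 1.0), (0.2, 1.0), (0.9, 2.0), (1.0, 2.5)]
outputs = [(0.0, 0.1), (0.1, 0.1), (0.8, 0.9), (0.9, 1.1)]
print(round(mantel_r(params, outputs), 3))
```

In this sketch only the `d_param` and `d_out` callables touch the raw values, so spatial parameters or outputs could be plugged in through any suitable dissimilarity measure.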

2.
Anal Chim Acta ; 1279: 341762, 2023 Oct 23.
Article in English | MEDLINE | ID: mdl-37827663

ABSTRACT

Data sets derived from practical experiments often pose challenges for (robust) statistical methods. In high-dimensional data sets, more variables than observations are recorded, and often some observations do not follow the structure of the majority of the data. To handle such data with outlying observations, a variety of robust regression and classification methods have been developed for low-dimensional data. The high-dimensional case, however, is more challenging, and the variety of robust methods is much more limited. The choice of method depends on the specific data structure, and numerical problems are more likely to occur. We give an overview of selected robust methods and their implementations and demonstrate their application to two high-dimensional data sets from tribology. We show that robust statistical methods combined with appropriate pre-processing and sampling strategies yield increased prediction performance and insight into data that differ from the majority.

3.
Stat Pap (Berl) ; 64(3): 955-985, 2023.
Article in English | MEDLINE | ID: mdl-35971537

ABSTRACT

Compositional data are commonly known as multivariate observations carrying relative information. Even though the case of vector or even two-factorial compositional data (compositional tables) is already well described in the literature, there is still a need for a comprehensive approach to the analysis of multi-factorial relative-valued data. Therefore, building on the current knowledge about compositional data, this contribution develops a general theoretical framework for k-factorial compositional data. The main finding is that, as in the case of compositional tables, multi-factorial structures can also be orthogonally decomposed into an independent part and several interactive parts; moreover, a coordinate representation can be constructed that allows these parts to be analyzed separately by standard analytical methods. For the sake of simplicity, these features are explained in detail for the case of three-factorial compositions (compositional cubes), followed by an outline covering the general case. The three-dimensional structure is analyzed in depth in two practical examples, dealing with systems of spatial and time-dependent compositional cubes. The methodology is implemented in the R package robCompositions.
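For a strictly positive cube, an orthogonal split of this kind can be sketched in clr space as an ANOVA-type decomposition of the centred log-values. The NumPy sketch below illustrates that assumption. It is not the robCompositions implementation, and the function names are hypothetical. The independent part collects the three main effects, and the remaining parts are the two-factor and three-factor interactions.

```python
import numpy as np

def clr_cube(x):
    """Centred log-ratio coefficients of a strictly positive I x J x K array."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()

def decompose_cube(x):
    """Split the clr cube into mutually orthogonal additive parts (ANOVA-type)."""
    c = clr_cube(x)                                # grand mean is zero by construction
    a = c.mean(axis=(1, 2), keepdims=True)         # factor-1 main effect, shape (I, 1, 1)
    b = c.mean(axis=(0, 2), keepdims=True)         # factor-2 main effect, shape (1, J, 1)
    s = c.mean(axis=(0, 1), keepdims=True)         # factor-3 main effect, shape (1, 1, K)
    ab = c.mean(axis=2, keepdims=True) - a - b     # two-factor interactions
    a_s = c.mean(axis=1, keepdims=True) - a - s
    bs = c.mean(axis=0, keepdims=True) - b - s
    full = np.zeros_like(c)                        # broadcasts every part to I x J x K
    parts = {
        "independent": a + b + s + full,
        "interaction_12": ab + full,
        "interaction_13": a_s + full,
        "interaction_23": bs + full,
    }
    # Whatever the main effects and two-factor terms do not explain is the three-factor part.
    parts["interaction_123"] = c - sum(parts.values())
    return parts

def closure(y):
    """Back-transform a clr-type array to a composition whose cells sum to 1."""
    e = np.exp(y)
    return e / e.sum()
```

For example, `closure(decompose_cube(x)["independent"])` would give the independent compositional cube. For a cube of the product form u_i v_j w_k, all interaction parts vanish.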

4.
JEADV Clin Pract ; 1(2): 122-125, 2022 Jun.
Article in English | MEDLINE | ID: mdl-37829553

ABSTRACT

Introduction: We investigated whether governmental measures and lockdowns during the COVID-19 pandemic had an impact on the number and histopathologic stages of melanoma. Methods: The number and thickness (Breslow) of all melanomas diagnosed per day, month, or period at the 'Institute for Pathology in the Centre' in 2019 and 2020 were compared. For 2020, we defined four time periods: Period 1: 1 January-15 March; Period 2: 16 March-15 May (Lockdown 1); Period 3: 16 May-2 November; Period 4: 3 November-7 December (Lockdown 2). Results: We found similar melanoma numbers in 2019 (577) and 2020 (608). The mean number of diagnoses per day during Lockdown 1 (Period 2) was significantly lower (0.87 melanomas/day; p = 0.005) than in the respective time periods in 2019 and in the other three periods in 2020 (Period 1: 1.65 melanomas/day, Period 3: 1.77 melanomas/day, and Period 4: 2.49 melanomas/day). Tumour thickness in July 2020 (1.9 mm) was significantly higher (p = 0.02) than in July 2019 (1.1 mm). Discussion: The significantly lower number of histopathologic diagnoses of melanoma during 'Lockdown 1' may be explained by postponed or missed patient consultations. This assumption is supported by the higher tumour thickness observed in July and August 2020 compared with 2019.

5.
Math Geosci ; 53(5): 905-924, 2021.
Article in English | MEDLINE | ID: mdl-34721726

ABSTRACT

Many geological phenomena are regularly measured over time to follow developments and changes. For many of these phenomena, the absolute values are not of interest, but rather the relative information, which means that the data are compositional time series. Thus, both the serial nature and the compositional geometry should be considered when analyzing the data. Multivariate time series are already challenging, especially if they are high-dimensional, and latent variable models are a popular way to deal with this kind of data. Blind source separation techniques are well-established latent factor models for time series, with many variants covering quite different time series models. Here, several such methods and their assumptions are reviewed, and it is shown how they can be applied to high-dimensional compositional time series. In addition, a novel blind source separation method is proposed that is quite flexible with regard to the assumptions about the latent time series. The methodology is illustrated using simulations and in an application to light absorbance data from water samples taken from a small stream in Lower Austria.
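One well-established blind source separation method for time series is AMUSE. It whitens the data and then diagonalizes a single symmetrised lagged autocovariance matrix. The NumPy sketch below applies it to compositional time series, assuming the series are first mapped to pivot (ilr) coordinates. It is not the novel method proposed in the paper, and the lag choice and function names are assumptions.

```python
import numpy as np

def ilr_pivot(x):
    """Pivot (isometric log-ratio) coordinates for each row of an n x D positive array."""
    logx = np.log(np.asarray(x, dtype=float))
    n, d = logx.shape
    z = np.empty((n, d - 1))
    for i in range(d - 1):
        k = d - i - 1                              # number of parts after the pivot part
        rest = logx[:, i + 1:].mean(axis=1)        # log of their geometric mean
        z[:, i] = np.sqrt(k / (k + 1)) * (logx[:, i] - rest)
    return z

def amuse(y, lag=1):
    """AMUSE: whiten the series, then diagonalise one symmetrised lagged autocovariance."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean(axis=0)
    n = y.shape[0]
    evals, evecs = np.linalg.eigh(y.T @ y / n)
    whitener = evecs @ np.diag(evals ** -0.5) @ evecs.T   # symmetric inverse square root
    z = y @ whitener                                       # rows are whitened observations
    r = z[:-lag].T @ z[lag:] / (n - lag)
    r = (r + r.T) / 2
    lam, v = np.linalg.eigh(r)
    v = v[:, np.argsort(lam)[::-1]]                # order sources by lagged autocorrelation
    return z @ v
```

For an n x D array `comp` of compositions observed over time, `amuse(ilr_pivot(comp), lag=1)` would return D - 1 estimated latent series. AMUSE can only separate sources whose lag-`lag` autocorrelations differ, which is one of the assumptions the reviewed variants relax in different ways.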

6.
Bioinformatics ; 37(21): 3805-3814, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34358286

ABSTRACT

MOTIVATION: High-throughput sequencing technologies generate a huge amount of data, permitting the quantification of microbiome compositions. The obtained data are essentially sparse compositional data vectors, namely vectors of bacterial gene proportions that make up the microbiome. Consequently, the need for statistical and computational methods that consider the special nature of microbiome data has increased. A critical aspect in microbiome research is to identify microbes associated with a clinical outcome. Another crucial aspect with high-dimensional data is the detection of outlying observations, whose presence seriously affects prediction accuracy. RESULTS: In this article, we connect robustness and sparsity in the context of variable selection in regression with compositional covariates and a continuous response. The compositional character of the covariates is taken into account by a linear log-contrast model, and elastic-net regularization achieves sparsity in the regression coefficient estimates. Robustness is obtained by trimming the objective function of the estimator. A reweighting step increases the efficiency of the estimator and also allows for diagnostics in terms of outlier identification. The numerical performance of the proposed method is evaluated via simulation studies, and its usefulness is illustrated by an application to a microbiome study that aims to predict caffeine intake from the human gut microbiome composition. AVAILABILITY AND IMPLEMENTATION: The R-package 'RobZS' can be downloaded at https://github.com/giannamonti/RobZS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gastrointestinal Microbiome , Microbiota , Humans , Computer Simulation , Linear Models , Microbiota/genetics , Genes, Bacterial
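The abstract combines three ingredients: a log-contrast model, a penalty, and trimming. They can be illustrated with a deliberately simplified NumPy sketch that rests on several labelled assumptions. A ridge penalty stands in for the elastic net, so this sketch does not produce sparsity. Trimming uses least-trimmed-squares style concentration steps from a few random starts. There is no reweighting step. It is not the RobZS implementation, and all names and defaults are hypothetical.

```python
import numpy as np

def clr_rows(x):
    """Row-wise centred log-ratio transform of an n x D positive matrix."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=1, keepdims=True)

def ridge_logcontrast(z, y, lam):
    """Ridge fit on a clr design. Every clr row sums to zero, so the ridge solution is
    orthogonal to the ones vector: the coefficients sum to zero, as a log-contrast needs."""
    zm, ym = z.mean(axis=0), y.mean()
    zc, yc = z - zm, y - ym
    beta = np.linalg.solve(zc.T @ zc + lam * np.eye(z.shape[1]), zc.T @ yc)
    return ym - zm @ beta, beta

def trimmed_logcontrast(x, y, lam=0.01, h_frac=0.75, n_starts=10, n_steps=20, seed=0):
    """Keep the h best-fitting observations (hypothetical defaults for lam and h_frac)."""
    z, y = clr_rows(x), np.asarray(y, dtype=float)
    n = len(y)
    h = int(np.ceil(h_frac * n))
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        subset = rng.choice(n, size=h, replace=False)
        for _ in range(n_steps):                   # concentration (C-) steps
            b0, beta = ridge_logcontrast(z[subset], y[subset], lam)
            new_subset = np.argsort((y - b0 - z @ beta) ** 2)[:h]
            if set(new_subset.tolist()) == set(subset.tolist()):
                break
            subset = new_subset
        b0, beta = ridge_logcontrast(z[subset], y[subset], lam)   # refit on final subset
        obj = np.sum((y[subset] - b0 - z[subset] @ beta) ** 2) + lam * beta @ beta
        if best is None or obj < best[0]:
            best = (obj, b0, beta, np.sort(subset))
    return best[1:]
```

The returned triple is the intercept, the zero-sum coefficient vector, and the indices of the retained observations. Observations outside that set are the candidates an outlier diagnostic would flag.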
7.
J Chemom ; 34(4): e3218, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32355406

ABSTRACT

The instrument COSIMA (COmetary Secondary Ion Mass Analyzer) on board the European Space Agency's Rosetta mission collected and analyzed dust particles in the neighborhood of comet 67P/Churyumov-Gerasimenko. The chemical composition of the particle surfaces was characterized by time-of-flight secondary ion mass spectrometry. A set of 2213 spectra was selected, and relative abundances for CH-containing positive ions as well as positive elemental ions define a set of multivariate data with nine variables. Evaluation by complementary chemometric techniques shows different compositions of sample groups collected during two periods of the mission. The first period was August to November 2014 (far from the Sun); the second period was January 2015 to February 2016 (nearer to the Sun). The applied data evaluation methods consider the compositional nature of the mass spectral data and comprise robust principal component analysis as well as classification with discriminant partial least squares regression, k-nearest neighbor search, and random forest decision trees. The results indicate a high importance of the relative abundances of the secondary ions C+ and Fe+ for the group separation and demonstrate an enhanced content of carbon-containing substances in samples collected in the period closer to the Sun.

8.
J Chemom ; 34(1): e3182, 2020 Jan.
Article in English | MEDLINE | ID: mdl-32189829

ABSTRACT

Data outliers can carry very valuable information and might be the most informative part for interpretation. Nevertheless, they are often neglected. An algorithm called cellwise outlier diagnostics using robust pairwise log ratios (cell-rPLR) is proposed for identifying outliers in single cells of a data matrix. The algorithm is designed for metabolomic data, where, due to the size effect, the measured values are not directly comparable. Pairwise log ratios between the variable values form the elemental information for the algorithm, and aggregating appropriate outlyingness values yields cellwise outlyingness information. A further feature of cell-rPLR is that it is useful for biomarker identification, particularly in the presence of cellwise outliers. Real data examples and simulation studies underline the good performance of this algorithm in comparison with alternative methods.
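A simplified sketch of the pairwise-log-ratio idea follows. It assumes robust z-scores (median and MAD) for each log-ratio across samples, and the median of absolute z-scores over partner variables as the per-cell aggregate. These choices are illustrative and not necessarily those of cell-rPLR. A per-sample size factor cancels in every ratio, which is why the size effect does not distort the scores.

```python
import math
from statistics import median

def robust_z(values):
    """Robust z-scores from the median and the MAD (scaled to a normal sd)."""
    med = median(values)
    mad = 1.4826 * median(abs(v - med) for v in values)
    return [(v - med) / mad if mad > 0 else 0.0 for v in values]

def cell_outlyingness(data):
    """Per-cell outlyingness from robust z-scores of pairwise log-ratios.

    data: list of samples (rows), each a list of positive values (variables).
    Cell (i, j) gets the median |z| of log(x_ij / x_ik) over all partner variables k != j.
    """
    n, p = len(data), len(data[0])
    z = [[None] * p for _ in range(p)]             # z[j][k][i] for sample i
    for j in range(p):
        for k in range(p):
            if j != k:
                z[j][k] = robust_z([math.log(row[j] / row[k]) for row in data])
    return [[median(abs(z[j][k][i]) for k in range(p) if k != j) for j in range(p)]
            for i in range(n)]
```

A single contaminated cell inflates the log-ratios against all partner variables, so its median stays large. A clean cell in the same sample shares only one inflated ratio, which the median ignores.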

9.
Sci Total Environ ; 655: 1457-1467, 2019 Mar 10.
Article in English | MEDLINE | ID: mdl-30577137

ABSTRACT

Sewage sludge (SS) reuse in forest plantations as a soil fertilizer/amendment has increased tremendously in recent years. However, SS may have high concentrations of potentially toxic elements (PTE), representing a potential risk for soil and the whole ecosystem. This paper aimed to assess the toxicity of PTE in unfertile tropical soils amended with SS in a commercial Eucalyptus plantation, with an integrated multiple approach combining: i) the use of a battery of bioassays (Daphnia magna, Pseudokirchneriella subcapitata, Lactuca sativa, and Allium cepa); and ii) the evaluation of some PTE (Cd, Cr, Cu, Fe, Mn, Ni, Pb, and Zn) and their availability in the pedoenvironment. Differences in total and available PTE between SS doses and treatment times were evaluated using ANOVA, correlations between PTE and bioassays by sparse partial robust M-regression (SPRM), and multiple correlations among parameters by principal factor analysis (PFA). Results show that PTE contents in soils tended to increase with SS application doses. However, this cannot be assumed as a general rule, since in all the investigated treatments the PTE concentrations were consistently below both soil natural background concentrations and quality reference values. Bioassays showed a generally low eco- and genotoxicity of SS, with toxicity increasing at higher SS doses but clearly decreasing over time. A. cepa was the most sensitive bioassay, followed by P. subcapitata > D. magna > L. sativa. Overall, the results indicate that under realistic open-field conditions SS risk may be lower than expected, owing to the dynamic decrease in PTE toxicity with time after application. An important implication of this study is that open-field trials should be strongly encouraged for evaluating the environmental risk of SS application in forestry.


Subject(s)
Eucalyptus/drug effects , Fertilizers/analysis , Sewage/adverse effects , Soil Pollutants/toxicity , Eucalyptus/growth & development , Eucalyptus/physiology , Forestry , Soil/chemistry
10.
PLoS One ; 13(8): e0200647, 2018.
Article in English | MEDLINE | ID: mdl-30089119

ABSTRACT

Although Scandinavian flint is one of the most important materials used for prehistoric stone tool production in Northern and Central Europe, a conclusive method for securely differentiating between flint sources, geologically bound to northern European chalk formations, has never been achieved. The main problems with traditional approaches concern the often high similarity of SiO2 raw materials (i.e. chert and flint) on different scales, due to similar genetic conditions, and higher intra- than inter-source variation. Conventional chert and flint provenance studies chiefly concentrate on visual, petrographic or geochemical investigations. Hence, attempts to generate characteristic fingerprints of particular chert raw materials were in most cases unsatisfying. Here we show that the Multi Layered Chert Sourcing Approach (MLA) achieves a clear differentiation between primary sources of Scandinavian flint. The MLA combines visual comparative studies, stereo-microscopic analyses of microfossil inclusions, geochemical trace element analyses applying LA-ICP-MS (Laser Ablation Inductively Coupled Plasma Mass Spectrometry) and statistical analyses through CODA (Compositional Data Analysis). For archaeologists, provenance studies are the gateway to advancing interpretations of economic behavior expressed in resource management strategies entailing the procurement, use and distribution of lithic raw materials. We demonstrate the relevance of our results for archaeological materials in a case study in which we were able to differentiate between Scandinavian flint sources and to trace historic ballast flint from a shipwreck found near Kristiansand, close to the shore of southern Norway, to a beach source in Northern Jutland, the Vigsø Bay.


Subject(s)
Archaeology , Quartz/analysis , Discriminant Analysis , Mass Spectrometry , Metals/chemistry , Microscopy , Norway , Ships
11.
Sci Total Environ ; 624: 1152-1162, 2018 May 15.
Article in English | MEDLINE | ID: mdl-29929227

ABSTRACT

Sardinia (Italy), the second largest island of the Mediterranean Sea, is a fire-prone land. Most Sardinian environments were shaped by fire over time, but some of them are too intrinsically fragile to withstand the currently increasing fire frequency. Calcareous pedoenvironments represent a significant part of Mediterranean areas and require important efforts to prevent long-lasting degradation from fire. The aim of this study was to assess, through an integrated multiple approach, the impact of a single, highly severe wildland fire on limestone-derived soils. For this purpose, we selected two recently burned sites, Sant'Antioco and Laconi. Soil was sampled from 80 points on a 100 × 100 m grid (40 in the burned area and 40 in the unburned area) and analyzed for particle size fractions, pH, electrical conductivity, organic carbon, total N, total P, and water repellency (WR). Fire behavior (surface rate of spread (ROS), fireline intensity (FLI), flame length (FL)) was simulated with BehavePlus 5.0.5 software. Burned and unburned areas were compared through ANOVA as well as deterministic and stochastic interpolation techniques; multiple correlations among parameters were evaluated by principal factor analysis (PFA), and differences/similarities between areas by principal component analysis (PCA). In both sites, the fires were characterized by high severity and caused significant changes in some soil properties. The PFA confirmed the key ecological role played by fire in both sites, with the variability of the four modeled components mainly explained by fire parameters, although the induced changes in soils were mainly site-specific. The PCA revealed the presence of two main "driving factors": slope (in Sant'Antioco), which increased the magnitude of ROS and FLI; and soil properties (in Laconi), which mostly affected FL. In both sites, these factors played a direct role in differentiating fire behavior and sites, while they played an indirect role in determining some effects on soil.

12.
Sci Total Environ ; 639: 129-145, 2018 Oct 15.
Article in English | MEDLINE | ID: mdl-29783114

ABSTRACT

Geochemical element separation is studied in 14 different sample media collected at 41 sites along an approximately 100-km-long transect north of Oslo. At each site, soil C and O horizons and 12 plant materials (birch/spruce/cowberry/blueberry leaves/needles and twigs, horsetail, bracken fern, pine bark and terrestrial moss) were sampled. The observed concentrations of 29 elements (K, Ca, P, Mg, Mn, S, Fe, Zn, Na, B, Cu, Mo, Co, Al, Ba, Rb, Sr, Ti, Ni, Pb, Cs, Cd, Ce, Sn, La, Tl, Y, Hg, Ag) were used to investigate soil-plant relations and to evaluate the element differentiation between different plants, or between foliage and twigs of the same plant. In relation to the soil C horizon, the O horizon is strongly enriched (O/C ratio > 5) in Ag, Hg, Cd, Sn, S and Pb. Other elements (B, K, Ca, P, S, Mn) show higher concentrations in the plants than in the substrate represented by the C horizon, and often even higher concentrations than in the soil O horizon. Elements like B, K, Ca, S, Mg, P, Ba, and Cu are well tuned to certain concentration levels in most of the plants, as demonstrated by their lower interquartile variability in the plants than in the soil. Cross-plots of element concentrations, variances, and ratios, supported by linear discriminant analysis, establish that different plants are marked by their individual element composition, which is separable from, and largely independent of, the natural substrate variability across the Gjøvik transect. Element allocation to foliage or twigs of the same plants can also be separated and thus depends predominantly on metabolism, physiology, and structure linked to biological functions, and only to a lesser degree on the substrate and environmental background. The results underline the importance of understanding the biological mechanisms of plant-soil interaction in order to correctly quantify anthropogenic impact on soil and plant geochemistry.


Subject(s)
Environmental Monitoring , Soil Pollutants/analysis , Norway , Picea , Soil , Trace Elements
13.
Stat Methods Med Res ; 27(6): 1878-1891, 2018 06.
Article in English | MEDLINE | ID: mdl-29767591

ABSTRACT

Compositional data analysis refers to analyzing relative information, based on ratios between the variables in a data set. Data from epidemiology are usually treated as absolute information in an analysis. We outline the differences between both approaches for univariate and multivariate statistical analyses, using illustrative data sets from Austrian districts. Not only can the results of the analyses differ, but in particular the interpretation differs. It is demonstrated that the compositional data analysis approach leads to new and interesting insights.


Subject(s)
Data Analysis , Epidemiologic Studies , Algorithms , Austria , Confounding Factors, Epidemiologic , Data Interpretation, Statistical , Multivariate Analysis
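A common way to express the relative information of one observation is the centred log-ratio (clr) transformation. It takes the log of each part divided by the geometric mean of all parts. The minimal sketch below uses invented counts for a single hypothetical district. It shows that clr coefficients ignore the overall magnitude, which is exactly the information an absolute analysis would use.

```python
import math

def clr(parts):
    """Centred log-ratio coefficients: log of each part over the geometric mean of all parts."""
    logs = [math.log(v) for v in parts]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

# Hypothetical counts for one district in three categories (invented for illustration).
district = [120, 45, 35]
print([round(c, 3) for c in clr(district)])
# Doubling every count (a larger district with the same structure) leaves the clr unchanged.
print([round(c, 3) for c in clr([2 * v for v in district])])
```

The coefficients always sum to zero. A positive value marks a category that is over-represented relative to the district's own geometric mean.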
14.
Risk Anal ; 38(10): 2073-2086, 2018 10.
Article in English | MEDLINE | ID: mdl-29723427

ABSTRACT

The guidelines for setting environmental quality standards are increasingly based on probabilistic risk assessment due to a growing general awareness of the need for probabilistic procedures. One of the commonly used tools in probabilistic risk assessment is the species sensitivity distribution (SSD), which represents the proportion of species in a biological assemblage that are affected, as a function of exposure to a specific toxicant. Our focus is on the inverse use of the SSD curve with the aim of estimating the concentration, HCp, of a toxic compound that is hazardous to p% of the biological community under study. To this end, we propose the use of robust statistical methods to take into account the presence of outliers or apparent skew in the data, which may occur without any ecological basis. A robust approach exploits the full neighborhood of a parametric model, enabling the analyst to account for the typical real-world deviations from ideal models. We examine two classic HCp estimation approaches and consider robust versions of these estimators. In addition, we use data transformations in conjunction with robust estimation methods in the case of heteroscedasticity. Different scenarios using real data sets as well as simulated data are presented to illustrate and compare the proposed approaches. These scenarios illustrate that the use of robust estimation methods enhances HCp estimation.
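Under a log-normal SSD, the inverse use described above reduces to a quantile: HCp = 10^(mu + z_p * sigma) on the log10 scale, where z_p is the standard normal quantile at p/100. The sketch below compares the classical plug-in estimate (mean, standard deviation) with a simple robust one (median, scaled MAD). The log-normal model and the median/MAD choice are illustrative assumptions, not necessarily the estimators examined in the paper, and the toxicity values are invented.

```python
import math
from statistics import NormalDist, mean, median, stdev

def hcp_lognormal(concentrations, p=5.0, robust=True):
    """Concentration hazardous to p% of species under a log-normal SSD."""
    logs = [math.log10(c) for c in concentrations]
    if robust:
        loc = median(logs)
        scale = 1.4826 * median(abs(v - loc) for v in logs)   # MAD scaled to a normal sd
    else:
        loc, scale = mean(logs), stdev(logs)
    return 10 ** (loc + NormalDist().inv_cdf(p / 100.0) * scale)

# Invented toxicity endpoints for eight species, plus one apparently extreme value.
clean = [1.2, 2.5, 3.1, 4.0, 5.5, 7.2, 9.8, 12.0]
dirty = clean + [0.0005]
print(hcp_lognormal(clean), hcp_lognormal(dirty))                            # robust
print(hcp_lognormal(clean, robust=False), hcp_lognormal(dirty, robust=False))  # classical
```

With the single extreme value added, the classical HC5 collapses by roughly two orders of magnitude, while the robust HC5 barely moves.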

15.
J Chromatogr A ; 1525: 109-115, 2017 Nov 24.
Article in English | MEDLINE | ID: mdl-29037593

ABSTRACT

When analyzing chromatographic data, it is necessary to preprocess them properly before exploration and/or supervised modeling. To make chromatographic signals comparable, it is crucial to remove the scaling effect caused by differences in overall sample concentrations. One of the efficient methods of signal scaling is Probabilistic Quotient Normalization (PQN) [1]. However, it can be applied only to data for which the majority of features do not vary systematically among the studied classes of signals. When studying the influence of the traditional "fermentation" (oxidation) process on the concentration of 56 individual peaks detected in rooibos plant material, this assumption is not fulfilled. In this case, the only possible solution is the analysis of pairwise log-ratios, which are not influenced by the scaling constant. To identify significant features, i.e., peaks differentiating the studied classes of samples (green and fermented rooibos plant material), we propose the application of rPLR (robust pairwise log-ratios), as proposed by Walach et al. [2]. It allows fast computation and identification of the significant features in terms of the original variables (peaks), which is problematic when working with the unfolded pairwise log-ratios. As demonstrated, it can be applied to designed data sets, and in the case of contaminated data it allows proper conclusions to be drawn.


Subject(s)
Aspalathus/chemistry , Chromatography , Statistics as Topic/methods , Fermentation , Oxidation-Reduction
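The contrast between the two scaling strategies can be sketched briefly. Below, PQN uses the feature-wise median across samples as the reference profile and divides each sample by the median of its quotients against that reference. The preliminary total-area normalisation often used with PQN is omitted. Pairwise log-ratios need no scaling step at all, because any sample-wise constant cancels. This is a sketch under those assumptions, not the rPLR procedure of Walach et al.

```python
import math
from statistics import median

def pqn(samples, reference=None):
    """Probabilistic Quotient Normalization (sketch, without total-area normalisation)."""
    p = len(samples[0])
    if reference is None:
        reference = [median(s[j] for s in samples) for j in range(p)]
    normalised = []
    for s in samples:
        factor = median(s[j] / reference[j] for j in range(p))   # most probable dilution
        normalised.append([v / factor for v in s])
    return normalised

def pairwise_log_ratios(sample):
    """All log(x_j / x_k) for j < k; unchanged if the sample is multiplied by any constant."""
    p = len(sample)
    return [math.log(sample[j] / sample[k]) for j in range(p) for k in range(j + 1, p)]
```

PQN relies on the median quotient reflecting dilution only, which fails when most peaks change systematically between classes. The log-ratios make no such assumption.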
16.
BMC Bioinformatics ; 18(Suppl 2): 65, 2017 Feb 15.
Article in English | MEDLINE | ID: mdl-28251866

ABSTRACT

BACKGROUND: In the field of root biology, there has been remarkable progress in root phenotyping, which is the efficient acquisition and quantitative description of root morphology. What is currently missing are means to efficiently explore, exchange and present the massive amount of acquired and often time-dependent root phenotypes. RESULTS: In this work, we present visual summaries of root ensembles obtained by aggregating root images with identical genetic characteristics. We use the generalized box plot concept with a new formulation of data depth. In addition to spatial distributions, we created a visual representation to encode the temporal distributions associated with the development of individual roots. CONCLUSIONS: The new formulation of data depth allows a much faster implementation, close to interactive frame rates. This allows us to present bootstrapping statistics that characterize the quality of the root sample set. As a positive side effect of the new data-depth formulation, we are able to define the geometric median for the curve ensemble, which was well received by the domain experts.


Subject(s)
Models, Theoretical , Plant Roots/growth & development , Databases, Factual , Evaluation Studies as Topic , Image Processing, Computer-Assisted/methods , Phenotype , Reproducibility of Results
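The abstract does not spell out the new data-depth formulation. For orientation only, the sketch below uses the standard modified band depth (MBD, with bands formed by pairs of curves) for curves sampled on a common grid. It takes the deepest curve as a median-like representative. Both are stand-ins, not the authors' formulation or their geometric median.

```python
from itertools import combinations

def modified_band_depth(curves):
    """Modified band depth (bands from pairs of curves) for curves on a common grid."""
    n, m = len(curves), len(curves[0])
    depth = [0.0] * n
    pairs = list(combinations(range(n), 2))
    for a, b in pairs:
        lo = [min(x, y) for x, y in zip(curves[a], curves[b])]
        hi = [max(x, y) for x, y in zip(curves[a], curves[b])]
        for i, c in enumerate(curves):
            # Fraction of grid points at which curve i lies inside the band of (a, b).
            depth[i] += sum(1 for t in range(m) if lo[t] <= c[t] <= hi[t]) / m
    return [d / len(pairs) for d in depth]

def median_curve(curves):
    """Index of the deepest curve, used here as a median-like representative."""
    d = modified_band_depth(curves)
    return max(range(len(curves)), key=d.__getitem__)
```

This naive version costs O(n^3 m) for n curves of m samples each. That cost illustrates why a faster depth formulation matters for interactive use and for repeated bootstrapping.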
17.
Int J Biometeorol ; 61(7): 1347-1358, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28220255

ABSTRACT

Long-term changes of plant phenological phases determined by complex interactions of environmental factors are in the focus of recent climate impact research. There is a lack of studies comparing biogeographical regions in Europe in terms of plant responses to climate. We examined the flowering phenology of plant species to identify the spatio-temporal patterns in their responses to environmental variables over the period 1970-2010. Data were collected from 12 countries along a 3000-km-long north-south transect from northern to eastern Central Europe, covering the biogeographical regions of Europe from Finland to Macedonia. Robust statistical methods were used to determine the most influential factors driving the changes in the beginning of flowering dates. Significant species-specific advances in flowering onset were found in the Continental (3 to 8.3 days), Alpine (2 to 3.8 days) and, with the highest magnitude, Boreal biogeographical regions (2.2 to 9.6 days per decade), while less pronounced responses were detected in the Pannonian and Mediterranean regions. While most other studies use only mean temperature in their models, we show that the distribution of minimum and maximum temperatures is also worth considering as an explanatory variable. Not just local (e.g. temperature) but also large-scale (e.g. North Atlantic Oscillation) climate factors, as well as altitude and latitude, play a significant role in the timing of flowering across the biogeographical regions of Europe. Our analysis provides evidence that species show a delay in the timing of flowering with increasing latitude (between latitudes of 40.9 and 67.9 degrees) and an advance with changing climate. The woody species (black locust and small-leaved lime) showed stronger advances in their timing of flowering than the herbaceous species (dandelion, lily of the valley). In later decades (1991-2010), more pronounced phenological change was detected than during the earlier years (1970-1990), which indicates the increased influence of human-induced higher spring temperatures in the late twentieth century.


Subject(s)
Flowers/physiology , Magnoliopsida/physiology , Seasons , Europe , Temperature
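The trends above are reported in days per decade, but the abstract does not name the robust estimator. As an assumption, the Theil-Sen slope is one common robust choice: it is the median of all pairwise slopes. The sketch below converts it to days per decade for hypothetical onset dates.

```python
from statistics import median

def theil_sen_slope(years, onset_days):
    """Median of all pairwise slopes: a trend estimate that tolerates outlying years."""
    slopes = [(onset_days[j] - onset_days[i]) / (years[j] - years[i])
              for i in range(len(years)) for j in range(i + 1, len(years))
              if years[j] != years[i]]
    return median(slopes)

def days_per_decade(years, onset_days):
    """Trend in flowering onset (day of year) expressed per ten years."""
    return 10 * theil_sen_slope(years, onset_days)

# Hypothetical series: onset advances by half a day per year, with one anomalous year.
years = list(range(1970, 2011))
onset = [120 - 0.5 * (y - 1970) for y in years]
onset[10] += 40
print(days_per_decade(years, onset))
```

A negative value means earlier flowering. The single anomalous year only affects the 40 pairwise slopes that involve it, so the median is unchanged.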
20.
Math Geosci ; 48(8): 941-961, 2016.
Article in English | MEDLINE | ID: mdl-28316755

ABSTRACT

Compositional data, as they typically appear in geochemistry in terms of concentrations of chemical elements in soil samples, need to be expressed in log-ratio coordinates before traditional statistical tools are applied if the relative structure of the data is of primary interest. There are different possibilities for this purpose, such as centered log-ratio coefficients or isometric log-ratio coordinates. Both approaches involve geometric means of the compositional parts, and it is unclear how measurement errors or detection limit problems affect their representation in coordinates. This problem is investigated theoretically using the theory of error propagation. Due to certain limitations of this approach, the effect of error propagation is also studied by means of simulations. This allows us to provide recommendations for practitioners on the amount of error and on the expected distortion of the results, depending on the purpose of the analysis.
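To make the error-propagation question concrete, here is a first-order sketch for clr coefficients with a Monte Carlo check. It assumes independent relative errors with standard deviations d_k on the D parts. Since log(x_k(1 + e_k)) is approximately log(x_k) + e_k, the error of clr_j is e_j - (1/D) sum_k e_k, which gives var(clr_j) ≈ (1 - 2/D) d_j^2 + (1/D^2) sum_k d_k^2. The error model and function names are assumptions, and the paper's treatment (including detection-limit effects) is more extensive.

```python
import math
import random

def clr(parts):
    """Centred log-ratio coefficients of one composition."""
    logs = [math.log(v) for v in parts]
    m = sum(logs) / len(logs)
    return [l - m for l in logs]

def clr_error_sd(rel_sd):
    """First-order sd of each clr coefficient for independent relative errors rel_sd."""
    d = len(rel_sd)
    total = sum(s * s for s in rel_sd)
    return [math.sqrt((1 - 2 / d) * s * s + total / d ** 2) for s in rel_sd]

def clr_error_sd_mc(parts, rel_sd, n_sim=20000, seed=1):
    """Monte Carlo counterpart: perturb each part multiplicatively, measure clr deviations."""
    rng = random.Random(seed)
    base = clr(parts)
    sims = [[] for _ in parts]
    for _ in range(n_sim):
        noisy = [v * (1 + rng.gauss(0, s)) for v, s in zip(parts, rel_sd)]
        for j, c in enumerate(clr(noisy)):
            sims[j].append(c - base[j])
    return [math.sqrt(sum(e * e for e in col) / len(col)) for col in sims]

# Hypothetical concentrations and relative measurement errors (2 % to 10 %).
parts = [12.0, 3.5, 0.8, 40.0]
rel = [0.02, 0.05, 0.10, 0.03]
print(clr_error_sd(rel))
print(clr_error_sd_mc(parts, rel))
```

The formula shows that an imprecise part contaminates every clr coefficient through the geometric mean, although its effect on the other coefficients is damped by 1/D^2.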
