1.

Predicting yield of individual field-grown rapeseed plants from rosette-stage leaf gene expression.

De Meyer, Sam; Cruz, Daniel Felipe; De Swaef, Tom; Lootens, Peter; De Block, Jolien; Bird, Kevin; Sprenger, Heike; Van de Voorde, Michael; Hawinkel, Stijn; Van Hautegem, Tom; Inzé, Dirk; Nelissen, Hilde; Roldán-Ruiz, Isabel; Maere, Steven.

PLoS Comput Biol ; 19(5): e1011161, 2023 May.

Article in English | MEDLINE | ID: mdl-37253069

ABSTRACT

In the plant sciences, results of laboratory studies often do not translate well to the field. To help close this lab-field gap, we developed a strategy for studying the wiring of plant traits directly in the field, based on molecular profiling and phenotyping of individual plants. Here, we use this single-plant omics strategy on winter-type Brassica napus (rapeseed). We investigate to what extent early and late phenotypes of field-grown rapeseed plants can be predicted from their autumnal leaf gene expression, and find that autumnal leaf gene expression not only has substantial predictive power for autumnal leaf phenotypes but also for final yield phenotypes in spring. Many of the top predictor genes are linked to developmental processes known to occur in autumn in winter-type B. napus accessions, such as the juvenile-to-adult and vegetative-to-reproductive phase transitions, indicating that the yield potential of winter-type B. napus is influenced by autumnal development. Our results show that single-plant omics can be used to identify genes and processes influencing crop yield in the field.

Subject(s)

Brassica napus , Brassica napus/genetics , Plant Leaves/genetics , Phenotype , Gene Expression

2.

Spatial Regression Models for Field Trials: A Comparative Study and New Ideas.

Hawinkel, Stijn; De Meyer, Sam; Maere, Steven.

Front Plant Sci ; 13: 858711, 2022.

Article in English | MEDLINE | ID: mdl-35432426

ABSTRACT

Naturally occurring variability within a study region harbors valuable information on relationships between biological variables. Yet, spatial patterns within these study areas, e.g., in field trials, violate the assumption of independence of observations, setting particular challenges in terms of hypothesis testing, parameter estimation, feature selection, and model evaluation. We evaluate a number of spatial regression methods in a simulation study, including more realistic spatial effects than employed so far. Based on our results, we recommend generalized least squares (GLS) estimation for experimental as well as for observational setups and demonstrate how it can be incorporated into popular regression models for high-dimensional data such as regularized least squares. This new method is available in the BioConductor R-package pengls. Inclusion of a spatial error structure improves parameter estimation and predictive model performance in low-dimensional settings and also improves feature selection in high-dimensional settings by reducing "red-shift": the preferential selection of features with spatial structure. In addition, we argue that the absence of spatial autocorrelation (SAC) in the model residuals should not be taken as a sign of a good fit, since it may result from overfitting the spatial trend. Finally, we confirm our findings in a case study on the prediction of winter wheat yield based on multispectral measurements.
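As an illustration of the GLS idea recommended in this abstract, the following is a minimal sketch in Python with NumPy. It is not the pengls implementation: the function names, the exponential covariance form, and the `range_param` and `nugget` values are assumptions chosen for illustration, and the covariance is treated as known rather than estimated from the data.

```python
import numpy as np


def exponential_covariance(coords, range_param=1.0, nugget=1e-6):
    """Hypothetical spatial covariance: correlation decays exponentially with distance."""
    diffs = coords[:, None, :] - coords[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # A small nugget on the diagonal keeps the matrix well conditioned.
    return np.exp(-dists / range_param) + nugget * np.eye(len(coords))


def gls_fit(X, y, V):
    """Generalized least squares estimate: (X' V^-1 X)^-1 X' V^-1 y."""
    V_inv_X = np.linalg.solve(V, X)  # V^-1 X without forming the inverse
    V_inv_y = np.linalg.solve(V, y)  # V^-1 y
    return np.linalg.solve(X.T @ V_inv_X, X.T @ V_inv_y)


# Usage on simulated plot coordinates (all values here are made up).
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(30, 2))            # plot positions in the field
X = np.column_stack([np.ones(30), rng.normal(size=30)])
V = exponential_covariance(coords, range_param=2.0)
errors = np.linalg.cholesky(V) @ rng.normal(size=30)  # spatially correlated noise
y = X @ np.array([1.0, 2.0]) + errors
beta_hat = gls_fit(X, y, V)
```

In practice the covariance parameters would have to be estimated alongside the regression coefficients, which this sketch leaves out.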

3.

Statistical detection of synergy: New methods and a comparative study.

Thas, Olivier; Tourny, Annelies; Verbist, Bie; Hawinkel, Stijn; Nazarov, Maxim; Mutambanengwe, Kathy; Bijnens, Luc.

Pharm Stat ; 21(2): 345-360, 2022 Mar.

Article in English | MEDLINE | ID: mdl-34608741

ABSTRACT

Combination therapies are increasingly adopted as the standard of care for various diseases to improve treatment response, minimise the development of resistance and/or minimise adverse events. Therefore, synergistic combinations are screened early in the drug discovery process, in which their potential is evaluated by comparing the observed combination effect to that expected under a null model. Such methodology is implemented in the BIGL R-package which allows for a quick screening of drug combinations. We extend the meanR and maxR tests from this package by allowing non-constant variance of the responses and by extending the list of null models (Loewe, Loewe2, HSA, Bliss). These new tests are evaluated in a comprehensive simulation study under various models for additivity and synergy, various monotherapeutic dose-response models (complete, partial and incomplete responders) and various types of deviation from the constant variance assumption. In addition, the BIGL package is extended with bootstrap confidence intervals for the individual off-axis points and for the overall synergy strength, which were demonstrated to have reliable coverage and can complement the existing tests. We conclude that the differences in performance between the different null models are small and depend on the simulation scenario. As a result, the choice of null model should be driven by expert knowledge on the particular problem. Finally, we demonstrate the new features of the BIGL package and the difference between the synergy models on a real dataset from drug discovery. The BIGL package is available at CRAN (https://CRAN.R-project.org/package=BIGL) and as a Shiny app (https://synergy.openanalytics.eu/app).
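To make the notion of a null model concrete, the sketch below gives the textbook forms of two of the null models named in this abstract, Bliss independence and highest single agent (HSA), for effects expressed as fractions between 0 and 1. This is an assumption-laden illustration in Python, not the BIGL package's code: the function names are hypothetical, and the Loewe models and the meanR and maxR tests are not covered.

```python
def bliss_expected(effect_a, effect_b):
    """Bliss independence: expected combined fractional effect of two independent agents."""
    return effect_a + effect_b - effect_a * effect_b


def hsa_expected(effect_a, effect_b):
    """Highest single agent: the combination is expected to match the better monotherapy."""
    return max(effect_a, effect_b)


def excess_over_null(observed, expected):
    """Positive values point towards synergy, negative values towards antagonism."""
    return observed - expected


# Usage with made-up fractional effects at one dose combination.
effect_a, effect_b, observed = 0.5, 0.5, 0.9
bliss_excess = excess_over_null(observed, bliss_expected(effect_a, effect_b))
hsa_excess = excess_over_null(observed, hsa_expected(effect_a, effect_b))
```

The same observed effect yields a different excess under each null model, which is why the choice of null model matters when screening combinations.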

Subject(s)

Drug Discovery , Computer Simulation , Drug Combinations , Drug Discovery/methods , Drug Synergism , Humans

4.

Sequence count data are poorly fit by the negative binomial distribution.

Hawinkel, Stijn; Rayner, J C W; Bijnens, Luc; Thas, Olivier.

PLoS One ; 15(4): e0224909, 2020.

Article in English | MEDLINE | ID: mdl-32352970

ABSTRACT

Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness of fit test for the NB distribution in regression models and demonstrate that the NB-assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that the NB-based tests perform worse on the features for which the NB-assumption was violated than on the features for which no significant deviation was detected. This gives an explanation for the poor behaviour of NB-based tests in many published evaluation studies. We conclude that nonparametric tests should be preferred over parametric methods.

Subject(s)

Binomial Distribution , RNA-Seq/methods , Microbiota , Poisson Distribution , RNA, Ribosomal, 16S/genetics , Regression Analysis

5.

Model-based joint visualization of multiple compositional omics datasets.

Hawinkel, Stijn; Bijnens, Luc; Lê Cao, Kim-Anh; Thas, Olivier.

NAR Genom Bioinform ; 2(3): lqaa050, 2020 Sep.

Article in English | MEDLINE | ID: mdl-33575602

ABSTRACT

The integration of multiple omics datasets measured on the same samples is a challenging task: data come from heterogeneous sources and vary in signal quality. In addition, some omics data are inherently compositional, e.g. sequence count data. Most integrative methods are limited in their ability to handle covariates, missing values, compositional structure and heteroscedasticity. In this article we introduce a flexible model-based approach to data integration to address these current limitations: COMBI. We combine concepts, such as compositional biplots and log-ratio link functions with latent variable models, and propose an attractive visualization through multiplots to improve interpretation. Using real data examples and simulations, we illustrate and compare our method with other data integration techniques. Our algorithm is available in the R-package combi.

6.

A unified framework for unconstrained and constrained ordination of microbiome read count data.

Hawinkel, Stijn; Kerckhof, Frederiek-Maarten; Bijnens, Luc; Thas, Olivier.

PLoS One ; 14(2): e0205474, 2019.

Article in English | MEDLINE | ID: mdl-30759084

ABSTRACT

Explorative visualization techniques provide a first summary of microbiome read count datasets through dimension reduction. A plethora of dimension reduction methods exists, but many of them focus primarily on sample ordination, failing to elucidate the role of the bacterial species. Moreover, implicit but often unrealistic assumptions underlying these methods fail to account for overdispersion and differences in sequencing depth, which are two typical characteristics of sequencing data. We combine log-linear models with a dispersion estimation algorithm and flexible response function modelling into a framework for unconstrained and constrained ordination. The method is able to cope with differences in dispersion between taxa and varying sequencing depths, to yield meaningful biological patterns. Moreover, it can correct for observed technical confounders, whereas other methods are adversely affected by these artefacts. Unlike distance-based ordination methods, the assumptions underlying our method are stated explicitly and can be verified using simple diagnostics. The combination of unconstrained and constrained ordination in the same framework is unique in the field and facilitates microbiome data exploration. We illustrate the advantages of our method on simulated and real datasets, while pointing out flaws in existing methods. The algorithms for fitting and plotting are available in the R-package RCM.

Subject(s)

Data Visualization , Microbiota/genetics , Algorithms , Bacteria/genetics , Computer Simulation , Humans , Monte Carlo Method , Neoplasms/microbiology , RNA, Ribosomal, 16S/genetics

7.

A broken promise: microbiome differential abundance methods do not control the false discovery rate.

Hawinkel, Stijn; Mattiello, Federico; Bijnens, Luc; Thas, Olivier.

Brief Bioinform ; 20(1): 210-221, 2019 Jan 18.

Article in English | MEDLINE | ID: mdl-28968702

ABSTRACT

High-throughput sequencing technologies allow easy characterization of the human microbiome, but the statistical methods to analyze microbiome data are still in their infancy. Differential abundance methods aim at detecting associations between the abundances of bacterial species and subject grouping factors. The results of such methods are important to identify the microbiome as a prognostic or diagnostic biomarker or to demonstrate efficacy of prodrug or antibiotic drugs. Because of a lack of benchmarking studies in the microbiome field, no consensus exists on the performance of the statistical methods. We have compared a large number of popular methods through extensive parametric and nonparametric simulation as well as real data shuffling algorithms. The results are consistent over the different approaches and all point to an alarming excess of false discoveries. This raises great doubts about the reliability of discoveries in past studies and imperils reproducibility of microbiome experiments. To further improve method benchmarking, we introduce a new simulation tool that can generate correlated count data following any univariate count distribution; the correlation structure may be inferred from real data. Most simulation studies discard the correlation between species, but our results indicate that this correlation can negatively affect the performance of statistical methods.

Subject(s)

Microbiota , Algorithms , Biodiversity , Computational Biology/methods , Computer Simulation , Databases, Genetic/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Microbiota/genetics , Statistics, Nonparametric
