ABSTRACT
Background: Advances in RNA-Seq and bioinformatics make it possible to quantify gene transcription levels in cells, tissues, and cell lines, and thereby to identify differentially expressed genes (DEGs). DESeq2 and edgeR are well-established computational tools for this purpose; both are based on generalized linear models (GLMs), which model fixed effects only. Including random effects, however, reduces the risk of missing DEGs that may be essential to the biological phenomenon under investigation. Generalized linear mixed models (GLMMs) can accommodate both kinds of effect. Methods: We present DEGRE (Differentially Expressed Genes with Random Effects), a user-friendly tool that infers DEGs when the experimental design of an RNA-Seq study involves both fixed effects and random effects on individuals. DEGRE preprocesses the raw count matrices, fits a GLMM to each gene, and tests the resulting regression coefficients with the Wald test. It offers the Benjamini-Hochberg and Bonferroni methods for P-value adjustment. Results: DEGRE was assessed on simulated datasets in which the DEGs are known. These datasets contain fixed effects, and random effects were estimated and inserted to measure the impact of experimental designs with high biological variability. The preprocessing effectively prepares the data for DEG inference and retains overdispersed genes. The biological coefficient of variation is inferred from the count matrices to assess variability before and after preprocessing. DEGRE is validated computationally on simulated count matrices whose biological variability is related to fixed and random effects, and it yields improved assessment measures for detecting DEGs in cases with higher biological variability. We show that the preprocessing established here effectively removes technical variation from those matrices. The tool also detects new candidate DEGs in transcriptome data from patients with bipolar disorder, which makes it promising for detecting more relevant genes. Conclusions: DEGRE provides data preprocessing and applies GLMMs for DEG inference. The preprocessing efficiently removes genes that could distort the inference. The computational and biological validation of DEGRE is promising for identifying possible DEGs in experiments with complex designs. The tool may help handle random effects on individuals when inferring DEGs and has potential for discovering new DEGs of interest for further biological investigation.
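The testing and adjustment steps named in the Methods can be sketched as follows. This is a minimal illustration in Python, not DEGRE's own code: the function names are hypothetical, and the Wald statistic is assumed to be the coefficient divided by its standard error, compared against a standard normal distribution.

```python
import math

def wald_p_value(beta, se):
    """Two-sided Wald test p-value, assuming z = beta / se is standard normal."""
    z = abs(beta / se)
    # P(|Z| > z) for a standard normal equals erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2))

def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-gene p-values, for illustration only
example_p = [0.01, 0.04, 0.03, 0.005]
bh_adjusted = benjamini_hochberg(example_p)
bonf_adjusted = bonferroni(example_p)
```

Bonferroni controls the family-wise error rate and is the more conservative choice; Benjamini-Hochberg controls the false discovery rate and usually retains more genes.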
Subjects
Gene Expression Profiling, Transcriptome, Humans, Linear Models, Gene Expression Profiling/methods, Transcriptome/genetics, Computational Biology/methods
ABSTRACT
Although wild birds are considered the main reservoir of the influenza A virus (IAV) in nature, empirical investigations of the interaction between IAV prevalence in these populations and environmental drivers remain scarce. Chile has a coastline of more than 4000 kilometres with hundreds of wetlands, which are important habitats for both resident and inter-hemispheric migratory species. The aim of this study was to characterize the temporal dynamics of IAV in the main wetlands of central Chile and to assess the influence of environmental variables on IAV prevalence. For that purpose, four wetlands were studied from September 2015 to June 2018. Fresh faecal samples of wild birds were collected for IAV detection by real-time RT-PCR. In addition, the wild birds present at each site were counted, and environmental variables such as temperature, rainfall, vegetation coverage (Normalized Difference Vegetation Index, NDVI) and water body size were determined. A generalized linear mixed model was built to assess the association between IAV prevalence and the explanatory variables. An overall prevalence of 4.28% ± 0.28% was detected, with important fluctuations among seasons: prevalence was greater during summer (OR = 4.87, 95% CI 2.11 to 11.21) and fall (OR = 2.59, 95% CI 1.12 to 5.97). Prevalence was positively associated with minimum temperature for the month of sampling and negatively associated with water body size measured two months before sampling and with NDVI measured three months before sampling. These results contribute to the understanding of IAV ecological drivers in Chilean wetlands and provide important considerations for the global surveillance of IAV.
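The reported odds ratios can be related back to the log-odds scale on which a binomial GLMM coefficient is estimated. The sketch below assumes a Wald-type 95% interval, symmetric on the log scale with z = 1.96; the abstract does not state how the intervals were computed, so this is an assumption, and the function names are hypothetical.

```python
import math

def log_odds_from_or(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover the coefficient and its standard error from an OR and its CI.

    Assumes the CI is exp(beta +/- z * se), i.e. a Wald interval.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se

def or_with_ci(beta, se, z=1.96):
    """Map a log-odds coefficient and standard error to an OR and its CI."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

# Summer estimate as reported in the abstract: OR = 4.87, 95% CI 2.11 to 11.21
beta_summer, se_summer = log_odds_from_or(4.87, 2.11, 11.21)
# Round trip back to the odds-ratio scale; matches the report up to rounding
or_summer, low_summer, high_summer = or_with_ci(beta_summer, se_summer)
```

The round trip reproduces the published interval to within rounding error, which is consistent with, but does not prove, a Wald-type interval.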
Subjects
Influenza A Virus/physiology, Influenza in Birds/epidemiology, Animals, Birds, Chile/epidemiology, Environment, Influenza in Birds/virology, Prevalence, Time Factors, Wetlands
ABSTRACT
Gene-environment (GE) interaction has important implications for the etiology of complex diseases, which are caused by a combination of genetic factors and environmental variables. Several authors have developed gene-set-based GE analyses for independent subjects or longitudinal data. In this paper, we propose to analyze GE interaction for discrete and continuous phenotypes in family studies by incorporating the relatedness among relatives within each family into a generalized linear mixed model (GLMM) and by using a gene-based variance component test. In addition, we deal with collinearity problems arising from linkage disequilibrium among single nucleotide polymorphisms (SNPs) by treating their coefficients as random effects when estimating the null model. We show that the best linear unbiased predictor (BLUP) of such random effects in the GLMM is equivalent to the ridge regression estimator. This equivalence provides a simple way to estimate the ridge penalty parameter, compared with computationally demanding approaches based on cross-validation. We evaluated the proposed test in simulation studies and applied it to real data from the Baependi Heart Study, which comprises 76 families. Using our approach, we identified an interaction between BMI and the Peroxisome Proliferator Activated Receptor Gamma (PPARG) gene associated with diabetes.
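The BLUP and ridge equivalence is easiest to see in the Gaussian linear case, given here as an illustration only; the paper itself works in the GLMM setting, and the notation below is an assumption rather than the authors'. The fixed effects are treated as already removed, so that r is the residual phenotype, Z holds the SNP genotypes, and u holds the SNP coefficients modeled as random effects.

```latex
% Illustrative notation (not taken from the paper):
% u ~ N(0, \sigma_u^2 I), e ~ N(0, \sigma_e^2 I), r = y - X\hat{\beta}
\begin{align*}
r &= Z u + e, \\
\hat{u}_{\mathrm{BLUP}}
  &= \sigma_u^2 Z^\top \left( \sigma_u^2 Z Z^\top + \sigma_e^2 I \right)^{-1} r \\
  &= \left( Z^\top Z + \lambda I \right)^{-1} Z^\top r,
  \qquad \lambda = \sigma_e^2 / \sigma_u^2, \\
\hat{u}_{\mathrm{ridge}}
  &= \arg\min_{u} \; \lVert r - Z u \rVert^2 + \lambda \lVert u \rVert^2
   = \left( Z^\top Z + \lambda I \right)^{-1} Z^\top r .
\end{align*}
```

Under these assumptions the ridge penalty is the ratio of the residual variance to the random-effect variance, so estimating the variance components yields the penalty without a cross-validation search.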
Subjects
Family, Gene-Environment Interaction, Linkage Disequilibrium, Genetic Models, Humans, Linear Models, Phenotype, Single Nucleotide Polymorphism
ABSTRACT
The intraclass correlation is commonly used with clustered data. It is often estimated based on fitting a model to hierarchical data and it leads, in turn, to several concepts such as reliability, heritability, inter-rater agreement, etc. For data where linear models can be used, such measures can be defined as ratios of variance components. Matters are more difficult for non-Gaussian outcomes. The focus here is on count and time-to-event outcomes where so-called combined models are used, extending generalized linear mixed models, to describe the data. These models combine normal and gamma random effects to allow for both correlation due to data hierarchies as well as for overdispersion. Furthermore, because the models admit closed-form expressions for the means, variances, higher moments, and even the joint marginal distribution, it is demonstrated that closed forms of intraclass correlations exist. The proposed methodology is illustrated using data from agricultural and livestock studies.
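For the Gaussian case mentioned above, where the intraclass correlation is a ratio of variance components, a minimal sketch is the one-way ANOVA estimator for balanced clusters. This is a textbook estimator under the assumption of equal cluster sizes, not the combined-model closed form derived in the paper; the function name is hypothetical.

```python
def icc_oneway(groups):
    """One-way random-effects ICC for balanced clusters (Gaussian outcomes).

    ICC = var_between / (var_between + var_within), with the variance
    components estimated from the ANOVA mean squares.
    """
    k = len(groups)
    n = len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 clusters of equal size >= 2")
    grand_mean = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]
    # Between-cluster and within-cluster mean squares
    msb = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (k * (n - 1))
    # Truncate the between-cluster variance estimate at zero
    var_between = max(0.0, (msb - msw) / n)
    return var_between / (var_between + msw)

# Hypothetical measurements: three clusters with three observations each
example_icc = icc_oneway([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

For count and time-to-event outcomes no such simple ratio applies directly, which is the gap the combined models are meant to address.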
Subjects
Biometry/methods, Linear Models, Agriculture/statistics & numerical data, Animals, Livestock, Reproducibility of Results, Statistics as Topic
ABSTRACT
The objective of this work was to compare models for predicting the survival of Eucalyptus grandis plants. The following models were used: a linear mixed model fitted to transformed data, using the angular and Box-Cox transformations; a generalized linear mixed model with binomial distribution and logit, probit, and complementary log-log link functions; and a generalized linear mixed model with Poisson distribution and logarithmic link function. The data came from a randomized block experiment evaluating Eucalyptus grandis maternal progenies at five years of age, in which the response variable is the number of surviving plants. To compare the effects among the models, Spearman correlations were estimated and Fisher's permutation test was applied. It was concluded that the generalized linear mixed model with Poisson distribution and logarithmic link function fitted the data poorly, and that the estimates of the fixed effects and the predictions of the random effects did not differ among the other models studied.
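The link functions and the angular transformation compared in this study can be written out as below. This is a sketch of the standard definitions only, not of the authors' analysis; the function names are hypothetical, and the angular transformation is taken to be the arcsine of the square root of a proportion.

```python
import math

def inv_logit(eta):
    """Inverse logit link: linear predictor -> survival probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Inverse probit link: standard normal CDF evaluated at eta."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2)))

def inv_cloglog(eta):
    """Inverse complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))

def inv_log(eta):
    """Inverse log link (Poisson model): linear predictor -> expected count."""
    return math.exp(eta)

def angular(p):
    """Angular transformation of a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))

# The same linear predictor maps to different probabilities under each link
eta = 0.0
probabilities = {
    "logit": inv_logit(eta),
    "probit": inv_probit(eta),
    "cloglog": inv_cloglog(eta),
}
```

The logit and probit links are symmetric around a probability of 0.5, whereas the complementary log-log link is not, which is one reason the three binomial models can yield different fitted values.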