Search | VHL Regional Portal

1.

Beyond ANOVA and MANOVA for repeated measures: Advantages of generalized estimated equations and generalized linear mixed models and its use in neuroscience research.

de Melo, Márcio Braga; Daldegan-Bueno, Dimitri; Menezes Oliveira, Maria Gabriela; de Souza, Altay Lino.

Eur J Neurosci ; 56(12): 6089-6098, 2022 12.

Article in English | MEDLINE | ID: mdl-36342498

ABSTRACT

In neuroscience research, longitudinal data are often analysed using analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA) for repeated measures (rmANOVA/rmMANOVA). However, these analyses have special requirements: The variances of the differences between all possible pairs of within-subject conditions (i.e., levels of the independent variable) must be equal. They are also limited to fixed repeated time intervals and are sensitive to missing data. In contrast, other models, such as the generalized estimating equations (GEE) and the generalized linear mixed models (GLMM), suggest another way to think about the data and the studied phenomenon. Instead of forcing the data into the ANOVAs assumptions, it is possible to design a flexible/personalized model according to the nature of the dependent variable. We discuss some advantages of GEE and GLMM as alternatives to rmANOVA and rmMANOVA in neuroscience research, including the possibility of using different distributions for the parameters of the dependent variable, a better approach for different time length points, and better adjustment to missing data. We illustrate these advantages by showing a comparison between rmANOVA and GEE in a real example and providing the data and a tutorial code to reproduce these analyses in R. We conclude that GEE and GLMM may provide more reliable results when compared to rmANOVA and rmMANOVA in neuroscience research, especially in small sample sizes with unbalanced longitudinal designs with or without missing data.

Subject(s)

Models, Statistical , Neurosciences , Analysis of Variance , Research Design , Linear Models , Longitudinal Studies

2.

A heteroskedastic model of Park Grass spring hay yields in response to weather suggests continuing yield decline with climate change in future decades.

Addy, John W G; Ellis, Richard H; MacLaren, Chloe; Macdonald, Andy J; Semenov, Mikhail A; Mead, Andrew.

J R Soc Interface ; 19(193): 20220361, 2022 08.

Article in English | MEDLINE | ID: mdl-36000226

ABSTRACT

UK grasslands perform important environmental and economic functions, but their future productivity under climate change is uncertain. Spring hay yields from 1902 to 2016 at one site (the Park Grass Long Term Experiment) in southern England under four different fertilizer regimes were modelled in response to weather (seasonal temperature and rainfall). The modelling approach applied comprised: (1) a Bayesian model comparison to model parametrically the heteroskedasticity in a gamma likelihood function; (2) a Bayesian varying intercept multiple regression model with an autoregressive lag one process (to incorporate the effect of productivity in the previous year) of the response of hay yield to weather from 1902 to 2016. The model confirmed that warmer and drier years, specifically, autumn, winter and spring, in the twentieth and twenty-first centuries reduced yield. The model was applied to forecast future spring hay yields at Park Grass under different climate change scenarios (HadGEM2 and GISS RCP 4.5 and 8.5). This application indicated that yields are forecast to decline further between 2020 and 2080, by as much as 48-50%. These projections are specific to Park Grass, but implied a severe reduction in grassland productivity in southern England with climate change during the twenty-first century.

Subject(s)

Climate Change , Poaceae , Bayes Theorem , Poaceae/physiology , Seasons , Weather

3.

A model for an undergraduate research experience program in quantitative sciences.

Tan, Kay See; Elkin, Elena B; Satagopan, Jaya M.

J Stat Data Sci Educ ; 30(1): 65-74, 2022.

Article in English | MEDLINE | ID: mdl-35722171

ABSTRACT

We developed a summer research experience program within a freestanding comprehensive cancer center to cultivate undergraduate students with an interest in and an aptitude for quantitative sciences focused on oncology. This unconventional location for an undergraduate program is an ideal setting for interdisciplinary training in the intersection of oncology, statistics, and epidemiology. This paper describes the development and implementation of a hands-on research experience program in this unique environment. Core components of the program include faculty-mentored projects, instructional programs to improve research skills and domain knowledge, and professional development activities. We discuss key considerations such as effective partnership between research and administrative units, recruiting students, and identifying faculty mentors with quantitative projects. We describe evaluation approaches and discuss post-program outcomes and lessons learned. In its initial two years, the program successfully improved students' perception of competence gained in research skills and statistical knowledge across several knowledge domains. The majority of students also went on to pursue graduate degrees in a quantitative field or work in oncology-centric academic research roles. Our research-based training model can be adapted by a variety of organizations motivated to develop a summer research experience program in quantitative sciences for undergraduate students.

4.

Optimization of Cutting Parameters in Turning of Titanium Alloy (Grade 5) by Analysing Surface Roughness, Tool Wear and Energy Consumption.

Akkus, H; Yaka, H.

Exp Tech ; 46(6): 945-956, 2022.

Article in English | MEDLINE | ID: mdl-34848920

ABSTRACT

In this study, Ti 6Al-4 V (grade 5) ELI alloy was machined with minimum energy and optimum surface quality and minimum tool wear. The appropriate cutting tool and suitable cutting parameters have been selected. As a result of the turning process, average surface roughness (Ra), tool wear and energy consumption were measured. The results have been analyzed by normality test, linear regression model, Taguchi analysis, ANOVA, Pareto graphics and multiple optimization method. It has been observed that high tool wear value increases Ra and energy consumption. In multiple optimization, it was concluded that it made predictions with 89,1% accuracy for Ra, 58,33% for tool wear, 96,75% for energy consumption. While the feed rate was the effective parameter for Ra and energy consumption, the effective parameter in tool wear was the cutting speed. Our study has revealed that by controlling energy consumption, surface quality can be maintained and tool wear can be controlled.

5.

Calling small variants using universality with Bayes-factor-adjusted odds ratios.

Zhao, Xiaofei; Hu, Allison C; Wang, Sizhen; Wang, Xiaoyue.

Brief Bioinform ; 23(1)2022 01 17.

Article in English | MEDLINE | ID: mdl-34791010

ABSTRACT

The application of next-generation sequencing in research and particularly in clinical routine requires highly accurate variant calling. Here we describe UVC, a method for calling small variants of germline or somatic origin. By unifying opposite assumptions with sublation, we discovered the following two empirical laws to improve variant calling: allele fraction at high sequencing depth is inversely proportional to the cubic root of variant-calling error rate, and odds ratios adjusted with Bayes factors can model various sequencing biases. UVC outperformed other variant callers on the GIAB germline truth sets, 192 scenarios of in silico mixtures simulating 192 combinations of tumor/normal sequencing depths and tumor/normal purities, the GIAB somatic truth sets derived from physical mixture, and the SEQC2 somatic reference sets derived from the breast-cancer cell-line HCC1395. UVC achieved 100% concordance with the manual review conducted by multiple independent researchers on a Qiagen 71-gene-panel dataset derived from 16 patients with colon adenoma. UVC outperformed other unique molecular identifier (UMI)-aware variant callers on the datasets used for publishing these variant callers. Performance was measured with sensitivity-specificity trade off for called variants. The improved variant calls generated by UVC from previously published UMI-based sequencing data provided additional insight about DNA damage repair. UVC is open-sourced under the BSD 3-Clause license at https://github.com/genetronhealth/uvc and quay.io/genetronhealth/gcc-6-3-0-uvc-0-6-0-441a694.

Subject(s)

High-Throughput Nucleotide Sequencing , Software , Alleles , Bayes Theorem , High-Throughput Nucleotide Sequencing/methods , Humans , Odds Ratio , Polymorphism, Single Nucleotide

6.

Confounding, Mediation, or Independent Effect? Childhood Psychological Abuse, Mental Health, Mood/Psychological State, COPD, and Migraine.

Sheikh, Mashhood Ahmed.

J Interpers Violence ; 36(15-16): NP8706-NP8723, 2021 08.

Article in English | MEDLINE | ID: mdl-31046532

ABSTRACT

In some settings, it may be difficult to differentiate between a confounder and a mediator. For instance, the observed association of self-reported childhood psychological abuse (CPA) with onset of chronic obstructive pulmonary disease (COPD) and migraine may be confounded by current mood/psychological state (e.g., the subjective evaluation of one's own affective state), as well as mediated by an individual's psychopathological symptoms. In this study, we propose the "independence hypothesis," which could prove meaningful to explore in data that lack prospective or objective indices of CPA. We used cross-sectional data from wave VI (2007-2008) of the Tromsø Study, Norway (N = 12,981). The associations between CPA and COPD and migraine were assessed with Poisson regression models. CPA was associated with a 46% increased risk of COPD (relative risk [RR] = 1.46, 95% confidence interval [CI]: [1.02, 1.90]) and a 28% increased risk of migraine in adulthood (RR = 1.28, 95% CI: [1.04, 1.53]), independent of age, sex, parental history of psychiatric problems/asthma/dementia, smoking, respondent's mood/psychological state, and mental health. These findings suggest that the association between retrospectively reported CPA and COPD and migraine is not driven entirely by respondent's mood/psychological state and mental health. Assessing the independent effect of self-reported CPA on COPD and migraine in retrospective studies may prove more meaningful than exploring the mediating role of mental health. Here, we provide the analytical rationale for assessing the independent effect in settings where it is difficult to differentiate between a confounder and a mediator. Moreover, we provide a theoretical rationale for assessing the independent effect of retrospectively reported childhood adversity on health and well-being.

Subject(s)

Migraine Disorders , Pulmonary Disease, Chronic Obstructive , Adult , Cross-Sectional Studies , Emotional Abuse , Humans , Mental Health , Migraine Disorders/epidemiology , Norway/epidemiology , Prospective Studies , Pulmonary Disease, Chronic Obstructive/epidemiology , Retrospective Studies

7.

The Role of Academia in Data Science Education.

Irizarry, Rafael A.

Harv Data Sci Rev ; 2(1)2020.

Article in English | MEDLINE | ID: mdl-38807746

ABSTRACT

As the demand for data scientists continues to grow, universities are trying to figure out how to best contribute to the training of a workforce. However, there does not appear to be a consensus on the fundamental principles, expertise, skills, or knowledge-base needed to define an academic discipline. We argue that data science is not a discipline but rather an umbrella term used to describe a complex process involving not one data scientist possessing all the necessary expertise, but a team of data scientists with nonoverlapping complementary skills. We provide some recommendations for how to take this into account when designing data science academic programs.

8.

Applied Statistics for Human Genetics Using R.

Chan, Bertram K C.

Adv Exp Med Biol ; 1082: 123-144, 2018.

Article in English | MEDLINE | ID: mdl-30357718

ABSTRACT

This chapter considers the fundamental concepts in the theory of probability and applied statistics in epidemiology, including the biostatistical concepts and measures in genetic association and familial aggregation studies, including: Additional Approaches in Familial Aggregation Studies Twin Studies Adoption Studies Inbreeding Studies Randomization Test Segregation studies, Linkage studies, Association studies Genome-wide Association Studies (GWAS) Big Data and Human Genomics.

Subject(s)

Human Genetics , Statistics as Topic , Adoption , Biometry , Genetic Linkage , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Inbreeding , Twin Studies as Topic

9.

A Guide to Teaching Data Science.

Hicks, Stephanie C; Irizarry, Rafael A.

Am Stat ; 72(4): 382-391, 2018.

Article in English | MEDLINE | ID: mdl-31105314

ABSTRACT

Demand for data science education is surging and traditional courses offered by statistics departments are not meeting the needs of those seeking training. This has led to a number of opinion pieces advocating for an update to the Statistics curriculum. The unifying recommendation is that computing should play a more prominent role. We strongly agree with this recommendation, but advocate the main priority is to bring applications to the forefront as proposed by Nolan and Speed (1999). We also argue that the individuals tasked with developing data science courses should not only have statistical training, but also have experience analyzing data with the main objective of solving real-world problems. Here, we share a set of general principles and offer a detailed guide derived from our successful experience developing and teaching a graduate-level, introductory data science course centered entirely on case studies. We argue for the importance of statistical thinking, as defined by Wild and Pfannkuch (1999) and describe how our approach teaches students three key skills needed to succeed in data science, which we refer to as creating, connecting, and computing. This guide can also be used for statisticians wanting to gain more practical knowledge about data science before embarking on teaching an introductory course.

10.

Inequalities involving hypergeometric and related functions.

Lehnigk, Siegfried H.

J Inequal Appl ; 2018(1): 253, 2018.

Article in English | MEDLINE | ID: mdl-30839642

ABSTRACT

An inequality is being proved which is connected to cost-effective numerical density estimation of the hyper-gamma probability distribution. The left-hand side of the inequality is a combination of two in the third parameter distinct versions of the hypergeometric function at the point one. All three parameters are functions of the distribution's terminal shape. The first and second are equal. The distinct third parameters of the two hypergeometric functions depend on terminal and initial shape. The other side of the inequality is determined by the quotient of two infinite series, which are related to the first derivatives with respect to terminal shape of the hypergeometric functions which appear in its left-hand side.

11.

Methodological challenges to multivariate syndromic surveillance: a case study using Swiss animal health data.

Vial, Flavie; Wei, Wei; Held, Leonhard.

BMC Vet Res ; 12(1): 288, 2016 Dec 20.

Article in English | MEDLINE | ID: mdl-27998276

ABSTRACT

BACKGROUND: In an era of ubiquitous electronic collection of animal health data, multivariate surveillance systems (which concurrently monitor several data streams) should have a greater probability of detecting disease events than univariate systems. However, despite their limitations, univariate aberration detection algorithms are used in most active syndromic surveillance (SyS) systems because of their ease of application and interpretation. On the other hand, a stochastic modelling-based approach to multivariate surveillance offers more flexibility, allowing for the retention of historical outbreaks, for overdispersion and for non-stationarity. While such methods are not new, they are yet to be applied to animal health surveillance data. We applied an example of such stochastic model, Held and colleagues' two-component model, to two multivariate animal health datasets from Switzerland. RESULTS: In our first application, multivariate time series of the number of laboratories test requests were derived from Swiss animal diagnostic laboratories. We compare the performance of the two-component model to parallel monitoring using an improved Farrington algorithm and found both methods yield a satisfactorily low false alarm rate. However, the calibration test of the two-component model on the one-step ahead predictions proved satisfactory, making such an approach suitable for outbreak prediction. In our second application, the two-component model was applied to the multivariate time series of the number of cattle abortions and the number of test requests for bovine viral diarrhea (a disease that often results in abortions). We found that there is a two days lagged effect from the number of abortions to the number of test requests. We further compared the joint modelling and univariate modelling of the number of laboratory test requests time series. The joint modelling approach showed evidence of superiority in terms of forecasting abilities. CONCLUSIONS: Stochastic modelling approaches offer the potential to address more realistic surveillance scenarios through, for example, the inclusion of times series specific parameters, or of covariates known to have an impact on syndrome counts. Nevertheless, many methodological challenges to multivariate surveillance of animal SyS data still remain. Deciding on the amount of corroboration among data streams that is required to escalate into an alert is not a trivial task given the sparse data on the events under consideration (e.g. disease outbreaks).

Subject(s)

Abortion, Veterinary/epidemiology , Bovine Virus Diarrhea-Mucosal Disease/epidemiology , Cattle Diseases/epidemiology , Disease Outbreaks/veterinary , Epidemiological Monitoring/veterinary , Models, Theoretical , Abortion, Veterinary/etiology , Algorithms , Animals , Bovine Virus Diarrhea-Mucosal Disease/complications , Bovine Virus Diarrhea-Mucosal Disease/diagnosis , Cattle , Cattle Diseases/diagnosis , Population Surveillance , Switzerland/epidemiology , Syndrome

12.

Using a Data Quality Framework to Clean Data Extracted from the Electronic Health Record: A Case Study.

Dziadkowiec, Oliwier; Callahan, Tiffany; Ozkaynak, Mustafa; Reeder, Blaine; Welton, John.

EGEMS (Wash DC) ; 4(1): 1201, 2016.

Article in English | MEDLINE | ID: mdl-27429992

ABSTRACT

OBJECTIVES: We examine the following: (1) the appropriateness of using a data quality (DQ) framework developed for relational databases as a data-cleaning tool for a data set extracted from two EPIC databases, and (2) the differences in statistical parameter estimates on a data set cleaned with the DQ framework and data set not cleaned with the DQ framework. BACKGROUND: The use of data contained within electronic health records (EHRs) has the potential to open doors for a new wave of innovative research. Without adequate preparation of such large data sets for analysis, the results might be erroneous, which might affect clinical decision-making or the results of Comparative Effectives Research studies. METHODS: Two emergency department (ED) data sets extracted from EPIC databases (adult ED and children ED) were used as examples for examining the five concepts of DQ based on a DQ assessment framework designed for EHR databases. The first data set contained 70,061 visits; and the second data set contained 2,815,550 visits. SPSS Syntax examples as well as step-by-step instructions of how to apply the five key DQ concepts these EHR database extracts are provided. CONCLUSIONS: SPSS Syntax to address each of the DQ concepts proposed by Kahn et al. (2012)1 was developed. The data set cleaned using Kahn's framework yielded more accurate results than the data set cleaned without this framework. Future plans involve creating functions in R language for cleaning data extracted from the EHR as well as an R package that combines DQ checks with missing data analysis functions.

13.

The use of statistical software in food science and technology: Advantages, limitations and misuses.

Nunes, Cleiton Antônio; Alvarenga, Verônica Ortiz; de Souza Sant'Ana, Anderson; Santos, Jânio Sousa; Granato, Daniel.

Food Res Int ; 75: 270-280, 2015 Sep.

Article in English | MEDLINE | ID: mdl-28454957

ABSTRACT

Strict requirements of scientific journals allied to the need to prove the experimental data are (in)significant from the statistical standpoint have led to a steep increase in the use and development of statistical software. In this aspect, it is observed that the increasing number of software tools and packages and their wide usage has created a generation of 'click and go' users, who are eagerly destined to obtain the p-values and multivariate graphs (projection of samples and variables on the factor plane), but have no idea on how the statistical parameters are calculated and the theoretical and practical reasons he/she performed such tests. However, in this paper, some published examples are listed and discussed in detail to provide a holistic insight (positive points and limitations) about the uses and misuses of some statistical methods using different available statistical software. Additionally, a brief description of several commercial and free statistical software is made highlighting their advantages and limitations.

14.

Data Acquisition and Preprocessing in Studies on Humans: What Is Not Taught in Statistics Classes?

Zhu, Yeyi; Hernandez, Ladia M; Mueller, Peter; Dong, Yongquan; Forman, Michele R.

Am Stat ; 67(4): 235-241, 2013.

Article in English | MEDLINE | ID: mdl-24511148

ABSTRACT

The aim of this paper is to address issues in research that may be missing from statistics classes and important for (bio-)statistics students. In the context of a case study, we discuss data acquisition and preprocessing steps that fill the gap between research questions posed by subject matter scientists and statistical methodology for formal inference. Issues include participant recruitment, data collection training and standardization, variable coding, data review and verification, data cleaning and editing, and documentation. Despite the critical importance of these details in research, most of these issues are rarely discussed in an applied statistics program. One reason for the lack of more formal training is the difficulty in addressing the many challenges that can possibly arise in the course of a study in a systematic way. This article can help to bridge this gap between research questions and formal statistical inference by using an illustrative case study for a discussion. We hope that reading and discussing this paper and practicing data preprocessing exercises will sensitize statistics students to these important issues and achieve optimal conduct, quality control, analysis, and interpretation of a study.

15.

Modelos de predição para sobrevivência de plantas de Eucalyptus grandis / Prediction models of Eucalyptus grandis plant survival

Custódio, Telde Natel; Barbin, Décio.

Ciênc. agrotec., (Impr.) ; 33(spe): 1948-1952, 2009. tab, ilus

Article in Portuguese | LILACS | ID: lil-542350

ABSTRACT

Objetivou-se com este trabalho comparar modelos de predição de plantas sobreviventes de Eucalyptus grandis. Utilizaram-se os seguintes modelos: modelo linear misto com os dados transformados, utilizando-se as transformações angular e BOX-COX; modelo linear generalizado misto com distribuição binomial e funções de ligação logística, probit e complemento log-log; modelo linear generalizado misto com distribuição Poisson e função de ligação logarítmica. Os dados são provenientes de um experimento em blocos ao acaso, para avaliação de progênies maternas de Eucalyptus grandis, aos 5 anos de idade, em que a variável resposta são plantas sobreviventes. Para comparação dos efeitos entre os modelos foram estimadas as correlações de Spearman e aplicado o teste de permutação de Fisher. Foi possível concluir que, o modelo linear generalizado misto com distribuição Poisson e função de ligação logarítmica se ajustou mal aos dados e que as estimativas para os efeitos fixos e predição para os efeitos aleatórios, não se diferenciaram entre os demais modelos estudados.

The objective of this work was to compare models for prediction of the survival of plants of Eucalyptus grandis. The following models were used: linear mixed model with the transformed data, by utilizing the angular transformations and BOX-COX; generalized linear mixed model with binomial distribution and logistic functions, probit and complement log-log links; generalized linear mixed model with Poisson distribution and logarithmic link function. The data came from a randomized block experiment for evaluation of Eucalyptus grandis maternal progenies at five years old, in which the variable response are surviving plants. For comparison of the effects among the models the correlations of Spearman were estimated and the test of permutation of Fisher was applied. It was possible to conclude that: the generalized linear mixed model with Poisson distribution and logarithmic link function misadjusted to the data; the estimates for the fixed effects and prediction for the random effects did not differ among the to other studied models.

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL