Search | VHL Regional Portal

Results 1 - 10 of 10

clusterBMA: Bayesian model averaging for clustering.

Forbes, Owen; Santos-Fernandez, Edgar; Wu, Paul Pao-Yen; Xie, Hong-Bo; Schwenn, Paul E; Lagopoulos, Jim; Mills, Lia; Sacks, Dashiell D; Hermens, Daniel F; Mengersen, Kerrie.

PLoS One ; 18(8): e0288000, 2023.

Article in English | MEDLINE | ID: mdl-37603575

ABSTRACT

Various methods have been developed to combine inference across multiple sets of results for unsupervised clustering, within the ensemble clustering literature. The approach of reporting results from one 'best' model out of several candidate clustering models generally ignores the uncertainty that arises from model selection, and results in inferences that are sensitive to the particular model and parameters chosen. Bayesian model averaging (BMA) is a popular approach for combining results across multiple models that offers some attractive benefits in this setting, including probabilistic interpretation of the combined cluster structure and quantification of model-based uncertainty. In this work we introduce clusterBMA, a method that enables weighted model averaging across results from multiple unsupervised clustering algorithms. We use clustering internal validation criteria to develop an approximation of the posterior model probability, used for weighting the results from each model. From a combined posterior similarity matrix representing a weighted average of the clustering solutions across models, we apply symmetric simplex matrix factorisation to calculate final probabilistic cluster allocations. In addition to outperforming other ensemble clustering methods on simulated data, clusterBMA offers unique features including probabilistic allocation to averaged clusters, combining allocation probabilities from 'hard' and 'soft' clustering algorithms, and measuring model-based uncertainty in averaged cluster allocation. This method is implemented in an accompanying R package of the same name. We use simulated datasets to explore the ability of the proposed technique to identify robust integrated clusters with varying levels of separation between subgroups, and with varying numbers of clusters between models. 
Benchmarking accuracy against four other ensemble methods previously demonstrated to be highly effective in the literature, clusterBMA matches or exceeds the performance of competing approaches under various conditions of dimensionality and cluster separation. clusterBMA substantially outperformed other ensemble methods for high dimensional simulated data with low cluster separation, with 1.16 to 7.12 times better performance as measured by the Adjusted Rand Index. We also explore the performance of this approach through a case study that aims to identify probabilistic clusters of individuals based on electroencephalography (EEG) data. In applied settings for clustering individuals based on health data, the features of probabilistic allocation and measurement of model-based uncertainty in averaged clusters are useful for clinical relevance and statistical communication.

Subject(s)

Algorithms , Benchmarking , Humans , Bayes Theorem , Clinical Relevance , Cluster Analysis
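
The weighted-averaging step described in the abstract above can be illustrated with a minimal sketch. It assumes hard cluster labels and hand-picked weights; the abstract does not give the formula that approximates posterior model probabilities from internal validation criteria, and the final symmetric simplex matrix factorisation step is not shown. The function names are hypothetical and are not the API of the clusterBMA R package.

```python
def co_clustering_matrix(labels):
    """n x n matrix with 1.0 where points i and j share a cluster label, else 0.0."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def weighted_consensus(label_sets, weights):
    """Weighted average of the co-clustering matrices from several models.

    Weights are normalised to sum to 1, so each entry can be read as a
    weighted share of models that place the two points in the same cluster.
    """
    total = sum(weights)
    normalised = [w / total for w in weights]
    n = len(label_sets[0])
    consensus = [[0.0] * n for _ in range(n)]
    for labels, w in zip(label_sets, normalised):
        sim = co_clustering_matrix(labels)
        for i in range(n):
            for j in range(n):
                consensus[i][j] += w * sim[i][j]
    return consensus


# Two hypothetical hard clusterings of four points; the weights are made up.
model_a = [0, 0, 1, 1]
model_b = [0, 0, 0, 1]
consensus = weighted_consensus([model_a, model_b], weights=[3.0, 1.0])
```

Points 0 and 1 are grouped together by both models, so their entry equals 1; points 0 and 2 are grouped only by the lower-weighted model, so their entry equals that model's normalised weight.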

Next-Generation Models for Predicting Winning Times in Elite Swimming Events: Updated Predictions for the Paris 2024 Olympic Games.

Mujika, Iñigo; Pyne, David B; Wu, Paul Pao-Yen; Ng, Kwok; Crowley, Emmet; Powell, Cormac.

Int J Sports Physiol Perform ; 18(11): 1269-1274, 2023 Nov 01.

Article in English | MEDLINE | ID: mdl-37487585

ABSTRACT

PURPOSE: To evaluate statistical models developed for predicting medal-winning performances for international swimming events and generate updated performance predictions for the Paris 2024 Olympic Games. METHODS: The performance of 2 statistical models developed for predicting international swimming performances was evaluated. The first model employed linear regression and forecasting to examine performance trends among medal winners, finalists, and semifinalists over an 8-year period. A machine-learning algorithm was used to generate time predictions for each individual event for the Paris 2024 Olympic Games. The second model was a Bayesian framework and comprised an autoregressive term (the previous winning time), moving average (past 3 events), and covariates for stroke, gender, distance, and type of event (World Championships vs Olympic Games). To examine the accuracy of the predictions from both models, the mean absolute error was determined between the predicted times for the Budapest 2022 World Championships and the actual results from said championships. RESULTS: The mean absolute error for prediction of swimming performances was 0.80% for the linear-regression machine-learning model and 0.85% for the Bayesian model. The predicted times and actual times from the Budapest 2022 World Championships were highly correlated (r = .99 for both approaches). CONCLUSIONS: These models, and associated predictions for swimming events at the Paris 2024 Olympic Games, provide an evidence-based performance framework for coaches, sport-science support staff, and athletes to develop and evaluate training plans, strategies, and tactics, as well as informing resource allocation to athletes, based on their potential for the Paris 2024 Olympic Games.

Subject(s)

Athletes , Swimming , Humans , Bayes Theorem , Paris , Linear Models
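
The accuracy measure reported in the abstract above is a mean absolute error expressed as a percentage. The sketch below shows one common way to compute such a figure, assuming each error is taken relative to the actual time; the abstract does not state the exact formula, and the times used here are invented for illustration.

```python
def mean_absolute_percentage_error(predicted, actual):
    """Mean of |predicted - actual| / actual, expressed as a percentage."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    relative_errors = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical predicted and actual winning times in seconds (not real results).
predicted_times = [50.0, 101.0]
actual_times = [50.0, 100.0]
error_pct = mean_absolute_percentage_error(predicted_times, actual_times)
```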

Classifying ball trajectories in invasion sports using dynamic time warping: A basketball case study.

Yu, Yu Yi; Wu, Paul Pao-Yen; Mengersen, Kerrie; Hobbs, Wade.

PLoS One ; 17(10): e0272848, 2022.

Article in English | MEDLINE | ID: mdl-36264879

ABSTRACT

Comparison and classification of ball trajectories can provide insight to support coaches and players in analysing their plays or opposition plays. This is challenging due to the innate variability and uncertainty of ball trajectories in space and time. We propose a framework based on Dynamic Time Warping (DTW) to cluster, compare and characterise trajectories in relation to play outcomes. Seventy-two international women's basketball games were analysed, where features such as ball trajectory, possession time and possession outcome were recorded. DTW was used to quantify the alignment-adjusted distance between three-dimensional (two spatial, one temporal) trajectories. This distance, along with final location for the play (usually the shot), was then used to cluster trajectories. These clusters supported the conventional wisdom of higher scoring rates for fast breaks, but also identified other contextual factors affecting scoring rate, including bias towards one side of the court. In addition, some high scoring rate clusters were associated with greater mean change in the direction of ball movement, supporting the notion of entropy affecting effectiveness. Coaches and other end users could use such a framework to make better use of their time by homing in on groups of effective or problematic plays for manual video analysis, both for their own team and when scouting opponents. The framework also suggests new predictors for machine learning to analyse and predict trajectory-based sports.

Subject(s)

Athletic Performance , Basketball , Humans , Female , Movement , Entropy , Machine Learning
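
For readers unfamiliar with DTW, the sketch below shows the classic dynamic-programming form of the distance between two trajectories of (x, y, t) points. It assumes a Euclidean point-to-point cost and no warping window; the abstract does not specify the authors' cost function or constraints, so this is an illustration of the general technique rather than their implementation.

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of points.

    cost[i][j] holds the cheapest cumulative cost of aligning the first i
    points of `a` with the first j points of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean cost (an assumption)
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] matched again
                                 cost[i][j - 1],      # b[j-1] matched again
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


# Hypothetical trajectories: the second repeats one sample of the first.
path_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (2.0, 0.0, 2.0)]
path_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 2.0)]
same_shape = dtw_distance(path_a, path_b)
```

Because the alignment may match one point to several, the repeated sample in `path_b` adds no cost, which is the sense in which the distance is alignment-adjusted.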

Guidelines for model adaptation: A study of the transferability of a general seagrass ecosystem Dynamic Bayesian Networks model.

Hatum, Paula Sobenko; McMahon, Kathryn; Mengersen, Kerrie; Wu, Paul Pao-Yen.

Ecol Evol ; 12(8): e9172, 2022 Aug.

Article in English | MEDLINE | ID: mdl-35949537

ABSTRACT

In general, it is not feasible to collect enough empirical data to capture the entire range of processes that define a complex system, either intrinsically or when viewing the system from a different geographical or temporal perspective. In this context, an alternative approach is to consider model transferability, which is the act of translating a model built for one environment to another less well-known situation. Model transferability and adaptability may be extremely beneficial: approaches that aid in the reuse and adaptation of models, particularly for sites with limited data, would benefit from widespread model uptake. Besides the reduced effort required to develop a model, data collection can be simplified when transferring a model to a different application context. The research presented in this paper focused on a case study to identify and implement guidelines for model adaptation. Our study adapted a general Dynamic Bayesian Network (DBN) of a seagrass ecosystem to a new location where nodes were similar, but the conditional probability tables varied. We focused on two species of seagrass (Zostera noltei and Zostera marina) located in Arcachon Bay, France. Expert knowledge was used to complement peer-reviewed literature to identify which components needed adjustment including parameterization and quantification of the model and desired outcomes. We adopted both linguistic labels and scenario-based elicitation to elicit from experts the conditional probabilities used to quantify the DBN. Following the proposed guidelines, the model structure of the general DBN was retained, but the conditional probability tables were adapted for nodes that characterized the growth dynamics in Zostera spp. populations located in Arcachon Bay, as well as the seasonal variation in their reproduction. Particular attention was paid to the light variable as it is a crucial driver of growth and physiology for seagrasses.
Our guidelines provide a way to adapt a general DBN to specific ecosystems to maximize model reuse and minimize re-development effort. Especially important from a transferability perspective are guidelines for ecosystems with limited data, and how simulation and prior predictive approaches can be used in these contexts.

EEG-based clusters differentiate psychological distress, sleep quality and cognitive function in adolescents.

Forbes, Owen; Schwenn, Paul E; Wu, Paul Pao-Yen; Santos-Fernandez, Edgar; Xie, Hong-Bo; Lagopoulos, Jim; McLoughlin, Larisa T; Sacks, Dashiell D; Mengersen, Kerrie; Hermens, Daniel F.

Biol Psychol ; 173: 108403, 2022 09.

Article in English | MEDLINE | ID: mdl-35908602

ABSTRACT

INTRODUCTION: To better understand the relationships between neurophysiology, cognitive function and psychopathology risk in adolescence, there is value in identifying data-driven subgroups based on measurements of brain activity and function, and then comparing cognition and mental health between such subgroups. METHODS: We developed a flexible and scalable multi-stage analysis pipeline to identify data-driven clusters of 12-year-olds (M = 12.64, SD = 0.32) based on frequency characteristics calculated from resting state, eyes-closed electroencephalography (EEG) recordings. For this preliminary cross-sectional study, EEG data were collected from 59 individuals in the Longitudinal Adolescent Brain Study (LABS) being undertaken in Queensland, Australia. Applying multiple unsupervised clustering algorithms to these EEG features, we identified well-separated subgroups of individuals. To study patterns of difference in cognitive function and mental health symptoms between clusters, we applied Bayesian regression models to probabilistically identify differences in these measures between clusters. RESULTS: We identified 5 core clusters associated with distinct subtypes of resting state EEG frequency content. Bayesian models demonstrated substantial differences in psychological distress, sleep quality and cognitive function between clusters. By examining associations between neurophysiology and health measures across clusters, we have identified preliminary risk and protective profiles linked to EEG characteristics. CONCLUSION: This method provides the potential to identify neurophysiological subgroups of adolescents in the general population based on resting state EEG, and associated patterns of health and cognition that are not observed at the whole group level. This approach offers potential utility in clinical risk prediction for mental and cognitive health outcomes throughout adolescent development.

Subject(s)

Psychological Distress , Sleep Quality , Adolescent , Bayes Theorem , Brain/physiology , Cognition , Cross-Sectional Studies , Electroencephalography/methods , Humans

Bayesian prediction of winning times for elite swimming events.

Wu, Paul Pao-Yen; Garufi, Lawrence; Drovandi, Christopher; Mengersen, Kerrie; Mitchell, Lachlan J G; Osborne, Mark A; Pyne, David B.

J Sports Sci ; 40(1): 24-31, 2022 Jan.

Article in English | MEDLINE | ID: mdl-34544331

ABSTRACT

The aim was to develop a statistical model of winning times for international swimming events in order to predict winning time distributions and the probability of winning for the 2020 and 2024 Olympic Games. The data set included first and third place times from all individual swimming events from the Olympics and World Championships from 1990 to 2019. We compared different model formulations fitted with Bayesian inference to obtain predictive distributions; comparisons were based on mean percentage error in out-of-sample predictions of Olympics and World Championships winning swim times from 2011 to 2019. The Bayesian time series regression model, comprising auto-regressive and moving average terms and other predictors, had the smallest mean prediction error of 0.57% (CI 0.46-0.74%). For context, using the respective previous Olympics or World Championships winning time resulted in a mean prediction error of 0.70% (CI 0.59-0.82%). The Olympics were on average 0.5% (CI 0.3-0.7%) faster than World Championships over the study period. The model computes the posterior predictive distribution, which allows coaches and athletes to evaluate the probability of winning given an individual's swim time, and the probability of being faster or slower than the previous winning time or even the world record.

Subject(s)

Competitive Behavior , Swimming , Athletes , Bayes Theorem , Humans , Time Factors
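
The abstract above notes that a posterior predictive distribution lets a coach read off the probability of winning for a given swim time. A minimal sketch of that calculation follows, assuming the distribution is available as a list of sampled winning times; the samples and function name are hypothetical, and producing real samples would require fitting the authors' model, which is not shown.

```python
def probability_of_winning(swim_time, predictive_samples):
    """Share of sampled winning times that are slower than `swim_time`.

    A swimmer wins in a sampled scenario when their time is below the
    sampled winning time for the event.
    """
    if not predictive_samples:
        raise ValueError("at least one predictive sample is required")
    wins = sum(1 for sample in predictive_samples if sample > swim_time)
    return wins / len(predictive_samples)


# Hypothetical posterior predictive samples of a winning time, in seconds.
samples = [47.0, 47.2, 47.4, 47.6]
p_win = probability_of_winning(47.3, samples)
```

The same counting approach gives the probability that the next winning time beats a previous winning time or a record: replace `swim_time` with that reference time and count the samples below it.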

Predicting performance in 4 x 200-m freestyle swimming relay events.

Wu, Paul Pao-Yen; Babaei, Toktam; O'Shea, Michael; Mengersen, Kerrie; Drovandi, Christopher; McGibbon, Katie E; Pyne, David B; Mitchell, Lachlan J G; Osborne, Mark A.

PLoS One ; 16(7): e0254538, 2021.

Article in English | MEDLINE | ID: mdl-34265006

ABSTRACT

AIM: The aim was to predict and understand variations in swimmer performance between individual and relay events, and develop a predictive model for the 4 x 200-m swimming freestyle relay event to help inform team selection and strategy. DATA AND METHODS: Race data for 716 relay finals (4 x 200-m freestyle) from 14 international competitions between 2010 and 2018 were analysed. Individual 200-m freestyle season best time for the same year was located for each swimmer. Linear regression and machine learning were applied to 4 x 200-m swimming freestyle relay events. RESULTS: Compared to the individual event, the lowest ranked swimmer in the team (-0.62 s, CI = [-0.94, -0.30]) and American swimmers (-0.48 s [-0.89, -0.08]) typically swam faster 200-m times in relay events. Random forest models predicted gold, silver, bronze and non-medal with 100%, up to 41%, up to 63%, and 93% sensitivity, respectively. DISCUSSION: Team finishing position was strongly associated with the differential time to the fastest team (mean decrease in Gini (MDG) when this variable was omitted = 31.3), world rankings of team members (average ranking MDG of 18.9), and the order of swimmers (MDG = 6.9). Differential times are based on the sum of individual swimmers' season best times, and along with world rankings, reflect team strength. In contrast, the order of swimmers reflects strategy. This type of analysis could assist coaches and support staff in selecting swimmers and team orders for relay events to enhance the likelihood of success.

Subject(s)

Competitive Behavior , Swimming , Athletic Performance
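
The per-class sensitivities quoted in the abstract above (gold, silver, bronze, non-medal) follow the usual definition: of the cases that truly belong to a class, the share predicted as that class. The sketch below computes this from paired label lists; the labels are invented for illustration and do not reproduce the study's data or its random forest models.

```python
def per_class_sensitivity(true_labels, predicted_labels):
    """Map each true class to the fraction of its cases predicted correctly."""
    totals = {}
    hits = {}
    for truth, guess in zip(true_labels, predicted_labels):
        totals[truth] = totals.get(truth, 0) + 1
        if truth == guess:
            hits[truth] = hits.get(truth, 0) + 1
    return {label: hits.get(label, 0) / totals[label] for label in totals}


# Hypothetical outcomes for seven relay teams.
truth = ["gold", "silver", "silver", "none", "none", "none", "none"]
guess = ["gold", "silver", "bronze", "none", "none", "none", "silver"]
sensitivity = per_class_sensitivity(truth, guess)
```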

Bayesian hierarchical modelling of basketball tracking data - a case study of spatial entropy and spatial effectiveness.

Hobbs, Wade; Wu, Paul Pao-Yen; Gorman, Adam D; Mooney, Mitchell; Freeston, Jonathan.

J Sports Sci ; 38(8): 886-896, 2020 Apr.

Article in English | MEDLINE | ID: mdl-32122274

ABSTRACT

Spatio-temporal data in sport is increasing rapidly; however, suitable statistical methods for analysing this data are underdeveloped. The current study establishes the need for spatial statistical methods, proposes a Bayesian hierarchical model as an appropriate method for comparing spatial variables, and tests this model across three spatial scales. The need for spatial statistical methods was established through the identification of spatial autocorrelation. This necessitated the use of a Bayesian hierarchical model to test for an association between spatial ball movement entropy and spatial effectiveness. Posterior distribution results showed a generally positive association such that increases in entropy were associated with increases in effectiveness. The strength and confidence of the associations were impacted by the spatial scale, with the 6 × 6 grid showing the most conclusive evidence of a positive relationship; the 4 × 4 grid was mostly positive, although with large variation; and finally, the basket-centric scale results were less conclusive. The results of the current study demonstrate the suitability of a Bayesian hierarchical model for testing for associations or differences between spatial variables. With the increase in spatial analyses in sport, this study presents an appropriate statistical method for dealing with complex problems associated with spatial analyses.

Subject(s)

Basketball/physiology , Basketball/statistics & numerical data , Bayes Theorem , Entropy , Female , Humans , Movement , Spatial Analysis , Sports Equipment

Predicting fatigue using countermovement jump force-time signatures: PCA can distinguish neuromuscular versus metabolic fatigue.

Wu, Paul Pao-Yen; Sterkenburg, Nicholas; Everett, Kirsten; Chapman, Dale W; White, Nicole; Mengersen, Kerrie.

PLoS One ; 14(7): e0219295, 2019.

Article in English | MEDLINE | ID: mdl-31291303

ABSTRACT

PURPOSE: This study investigated the relationship between the ground reaction force-time profile of a countermovement jump (CMJ) and fatigue, specifically focusing on predicting the onset of neuromuscular versus metabolic fatigue using the CMJ. METHOD: Ten recreational athletes performed 5 CMJs at time points prior to, immediately following, and at 0.5, 1, 3, 6, 24 and 48 h after training, which comprised repeated sprint sessions of low, moderate, or high workloads. Features of the concentric portion of the CMJ force-time signature at the measurement time points were analysed using Principal Components Analysis (PCA) and functional PCA (fPCA) to better understand fatigue onset given training workload. In addition, Linear Mixed Effects (LME) models were developed to predict the onset of fatigue. RESULTS: The first two Principal Components (PCs) using PCA explained 68% of the variation in CMJ features, capturing variation between athletes through weighted combinations of force, concentric time and power. The next two PCs explained 9.9% of the variation and revealed fatigue effects between 6 and 48 h after training for PC3, and contrasting neuromuscular and metabolic fatigue effects in PC4. fPCA supported these findings and further revealed contrasts between metabolic and neuromuscular fatigue effects in the first and second half of the force-time curve in PC3, and a double peak effect in PC4. Subsequently, CMJ measurements up to 0.5 h after training were used to predict relative peak CMJ force, with mean squared errors of 0.013 and 0.015 at 6 and 48 h corresponding to metabolic and neuromuscular fatigue. CONCLUSION: The CMJ was found to provide a strong predictor of neuromuscular and metabolic fatigue, after accounting for force, concentric time and power. This method can be used to assist coaches to individualise future training based on CMJ response to the immediate session.

Subject(s)

Athletic Performance/physiology , Football/physiology , Muscle Fatigue/physiology , Muscle Strength/physiology , Adult , Athletes , Exercise Test , Humans , Male , Neuromuscular Monitoring/methods , Plyometric Exercise , Principal Component Analysis , Young Adult

Timing anthropogenic stressors to mitigate their impact on marine ecosystem resilience.

Wu, Paul Pao-Yen; Mengersen, Kerrie; McMahon, Kathryn; Kendrick, Gary A; Chartrand, Kathryn; York, Paul H; Rasheed, Michael A; Caley, M Julian.

Nat Commun ; 8(1): 1263, 2017 11 02.

Article in English | MEDLINE | ID: mdl-29093493

ABSTRACT

Better mitigation of anthropogenic stressors on marine ecosystems is urgently needed to address increasing biodiversity losses worldwide. We explore opportunities for stressor mitigation using whole-of-systems modelling of ecological resilience, accounting for complex interactions between stressors, their timing and duration, background environmental conditions and biological processes. We then search for ecological windows, times when stressors minimally impact ecological resilience, defined here as risk, recovery and resistance. We show for 28 globally distributed seagrass meadows that stressor scheduling that exploits ecological windows for dredging campaigns can achieve up to a fourfold reduction in recovery time and 35% reduction in extinction risk. Although the timing and length of windows vary among sites to some degree, global trends indicate favourable windows in autumn and winter. Our results demonstrate that resilience is dynamic with respect to space, time and stressors, varying most strongly with: (i) the life history of the seagrass genus and (ii) the duration and timing of the impacting stress.

Subject(s)

Alismatales/physiology , Ecosystem , Oceans and Seas , Stress, Physiological/physiology , Bayes Theorem , Biodiversity , Ecology , Hydrocharitaceae/physiology , Time Factors , Zosteraceae/physiology
