1.
Biom J ; 66(1): e2200238, 2024 Jan.
Article in English | MEDLINE | ID: mdl-36999395

ABSTRACT

The constant development of new data analysis methods in many fields of research is accompanied by an increasing awareness that these new methods often perform better in their introductory paper than in subsequent comparison studies conducted by other researchers. We attempt to explain this discrepancy by conducting a systematic experiment that we call "cross-design validation of methods". In the experiment, we select two methods designed for the same data analysis task, reproduce the results shown in each paper, and then reevaluate each method based on the study design (i.e., datasets, competing methods, and evaluation criteria) that was used to show the abilities of the other method. We conduct the experiment for two data analysis tasks, namely cancer subtyping using multiomic data and differential gene expression analysis. Three of the four methods included in the experiment indeed perform worse when they are evaluated on the new study design, which is mainly caused by the different datasets. Apart from illustrating the many degrees of freedom existing in the assessment of a method and their effect on its performance, our experiment suggests that the performance discrepancies between original and subsequent papers may not only be caused by the nonneutrality of the authors proposing the new method but also by differences regarding the level of expertise and field of application. Authors of new methods should thus focus not only on a transparent and extensive evaluation but also on comprehensive method documentation that enables the correct use of their methods in subsequent studies.
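To make the experimental design concrete, the sketch below loops every method over every study design (a dataset plus an evaluation criterion) rather than only over the design it was introduced with. This is a minimal illustration under stated assumptions: the two regressors, the synthetic datasets, and the MSE/MAE criteria are generic placeholders, not the cancer-subtyping or differential-expression methods actually compared in the paper.

```python
# Minimal sketch of "cross-design validation": score each method under its own
# study design AND under the design used to showcase the other method.
# Methods, data, and criteria here are generic stand-ins for illustration.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split

def make_design(seed, criterion):
    """A 'study design' = a dataset plus an evaluation criterion."""
    r = np.random.default_rng(seed)
    X = r.normal(size=(300, 10))
    y = X @ r.normal(size=10) + r.normal(scale=0.5, size=300)
    return train_test_split(X, y, test_size=0.3, random_state=seed), criterion

methods = {"method_A": Ridge(alpha=1.0),
           "method_B": GradientBoostingRegressor(random_state=0)}
designs = {"design_A": make_design(1, mean_squared_error),
           "design_B": make_design(2, mean_absolute_error)}

# Evaluate every method under every design, not just its "home" design.
for m_name, model in methods.items():
    for d_name, ((X_tr, X_te, y_tr, y_te), criterion) in designs.items():
        model.fit(X_tr, y_tr)
        score = criterion(y_te, model.predict(X_te))
        print(f"{m_name} under {d_name}: {criterion.__name__} = {score:.3f}")
```

The point of the loop is simply that the off-diagonal cells (method A under design B, and vice versa) are the ones that reveal how much a method's reported performance depends on the design chosen to present it.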


Subject(s)
Research Design
2.
PLoS Comput Biol ; 19(1): e1010820, 2023 01.
Article in English | MEDLINE | ID: mdl-36608142

ABSTRACT

In recent years, unsupervised analysis of microbiome data, such as microbial network analysis and clustering, has increased in popularity. Many new statistical and computational methods have been proposed for these tasks. This multiplicity of analysis strategies poses a challenge for researchers, who are often unsure which method(s) to use and might be tempted to try different methods on their dataset to look for the "best" ones. However, if only the best results are selectively reported, this may cause over-optimism: the "best" method is overly fitted to the specific dataset, and the results might be non-replicable on validation data. Such effects will ultimately hinder research progress. Yet so far, these topics have been given little attention in the context of unsupervised microbiome analysis. In our illustrative study, we aim to quantify over-optimism effects in this context. We model the approach of a hypothetical microbiome researcher who undertakes four unsupervised research tasks: clustering of bacterial genera, hub detection in microbial networks, differential microbial network analysis, and clustering of samples. While these tasks are unsupervised, the researcher might still have certain expectations as to what constitutes interesting results. We translate these expectations into concrete evaluation criteria that the hypothetical researcher might want to optimize. We then randomly split an exemplary dataset from the American Gut Project into discovery and validation sets multiple times. For each research task, multiple method combinations (e.g., methods for data normalization, network generation, and/or clustering) are tried on the discovery data, and the combination that yields the best result according to the evaluation criterion is chosen. While the hypothetical researcher might only report this result, we also apply the "best" method combination to the validation dataset. The results are then compared between discovery and validation data. In all four research tasks, there are notable over-optimism effects: averaged over multiple random splits into discovery/validation data, the results on the validation data are worse than on the discovery data. Our study thus highlights the importance of validation and replication in microbiome analysis to obtain reliable results and demonstrates that the issue of over-optimism goes beyond the context of statistical testing and fishing for significance.
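The discovery/validation procedure described above condenses into a short loop. The sketch below is an illustration under stated assumptions: a synthetic data matrix stands in for a normalized abundance table, generic scikit-learn clustering options stand in for the microbiome-specific method combinations, and the silhouette width plays the role of the evaluation criterion the hypothetical researcher optimizes.

```python
# Minimal sketch of the over-optimism experiment: pick the "best" method
# combination on discovery data, then re-score the same frozen choice on
# held-out validation data and look at the gap. Data and methods are stand-ins.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))  # placeholder for a normalized abundance matrix

# Candidate "method combinations" the hypothetical researcher might try.
candidates = [
    ("kmeans_k2", KMeans(n_clusters=2, n_init=10, random_state=0)),
    ("kmeans_k3", KMeans(n_clusters=3, n_init=10, random_state=0)),
    ("ward_k2", AgglomerativeClustering(n_clusters=2, linkage="ward")),
    ("ward_k3", AgglomerativeClustering(n_clusters=3, linkage="ward")),
]

def evaluate(model, data):
    """Evaluation criterion being optimized (here: silhouette width)."""
    labels = model.fit_predict(data)
    return silhouette_score(data, labels)

gaps = []
for split in range(10):                          # multiple random splits
    idx = rng.permutation(len(X))
    disc, val = X[idx[:100]], X[idx[100:]]

    # Choose the combination that looks best on the discovery data ...
    name, best = max(candidates, key=lambda c: evaluate(c[1], disc))
    disc_score = evaluate(best, disc)

    # ... then apply the same (frozen) choice to the validation data.
    val_score = evaluate(best, val)
    gaps.append(disc_score - val_score)
    print(f"split {split}: best={name}  discovery={disc_score:.3f}  validation={val_score:.3f}")

print(f"mean over-optimism gap: {np.mean(gaps):.3f}")
```

A consistently positive gap between discovery and validation scores is the over-optimism effect the study quantifies; reporting only the discovery-side number would overstate what the chosen pipeline actually delivers.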


Subject(s)
Microbiota , Machine Learning , Microbial Consortia , Bacteria , Cluster Analysis
3.
J Clin Med ; 11(5)2022 Feb 22.
Article in English | MEDLINE | ID: mdl-35268268

ABSTRACT

BACKGROUND: Kawasaki Disease (KD) is a generalized vasculitis in childhood with a possible long-term impact on cardiovascular health beyond the presence of coronary artery lesions. Standard vascular parameters such as carotid intima-media thickness (cIMT) have not been established as reliable markers of vascular anomalies after KD. Carotid intima-media roughness (cIMR), which represents the carotid intimal surface structure, is considered a promising surrogate marker for predicting cardiovascular risk even beyond cIMT. We therefore measured cIMR in patients with a history of KD in comparison to healthy controls to investigate whether KD itself and/or key clinical aspects of KD are associated with long-term cIMR alterations. METHODS: We assessed cIMR in this case-control study (44 KD patients, mean age 13.4 years (SD 7.5); 36 controls, mean age 12.1 years (SD 5.3)), approximately matched by sex and age. Clinical outcomes such as coronary artery status and acute-phase inflammation data were analyzed in association with cIMR values. RESULTS: When comparing all patients with KD to healthy controls, we detected no significant difference in cIMR. None of the clinical parameters indicating disease severity, such as the persistence of coronary artery aneurysms, were significantly associated with cIMR values. However, based on our marginally significant findings (p = 0.044), we postulate that the end-diastolic cIMR may be rougher than the end-systolic values in KD patients. CONCLUSIONS: We detected no significant differences in cIMR between KD patients and controls, and thus no evidence that KD predisposes patients to a subsequent generalized arteriopathy. Our results, however, need to be interpreted in light of the small number of study participants.
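For illustration only, a generic two-group comparison of the kind reported above might look like the sketch below. The cIMR values are simulated and Welch's t-test is a stand-in; the abstract does not state which statistical model the authors actually used.

```python
# Generic sketch of a case-control comparison (cIMR in KD patients vs. controls).
# Values are simulated; Welch's t-test is an assumed, illustrative choice.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
cimr_kd = rng.normal(loc=0.050, scale=0.010, size=44)    # 44 KD patients (simulated)
cimr_ctrl = rng.normal(loc=0.050, scale=0.010, size=36)  # 36 controls (simulated)

stat, p = ttest_ind(cimr_kd, cimr_ctrl, equal_var=False)  # Welch's t-test
print(f"t = {stat:.2f}, p = {p:.3f}")  # no group difference expected with these simulated data
```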

4.
Proc Natl Acad Sci U S A ; 117(30): 17680-17687, 2020 07 28.
Article in English | MEDLINE | ID: mdl-32665436

ABSTRACT

Smartphones enjoy high adoption rates around the globe. Rarely more than an arm's length away, these sensor-rich devices can easily be repurposed to collect rich and extensive records of their users' behaviors (e.g., location, communication, media consumption), posing serious threats to individual privacy. Here we examine the extent to which individuals' Big Five personality dimensions can be predicted on the basis of six different classes of behavioral information collected via sensor and log data harvested from smartphones. Taking a machine-learning approach, we predict personality at broad domain (median r = 0.37) and narrow facet levels (median r = 0.40) based on behavioral data collected from 624 volunteers over 30 consecutive days (25,347,089 logging events). Our cross-validated results reveal that specific patterns in behaviors in the domains of 1) communication and social behavior, 2) music consumption, 3) app usage, 4) mobility, 5) overall phone activity, and 6) day- and night-time activity are distinctively predictive of the Big Five personality traits. The accuracy of these predictions is similar to that found for predictions based on digital footprints from social media platforms and demonstrates the possibility of obtaining information about individuals' private traits from behavioral patterns passively collected from their smartphones. Overall, our results point to both the benefits (e.g., in research settings) and dangers (e.g., privacy implications, psychological targeting) presented by the widespread collection and modeling of behavioral data obtained from smartphones.
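As a rough illustration of the cross-validated prediction setup described above, the sketch below fits a model to behavioral-style features and reports accuracy as the Pearson correlation between predicted and observed trait scores. The feature matrix, target, and model (a random forest on synthetic data) are assumptions made for illustration only, not the authors' sensing features or modeling pipeline.

```python
# Hedged sketch of cross-validated trait prediction from behavioral features.
# Synthetic data and a random forest stand in for the study's actual pipeline;
# accuracy is reported as the correlation between predicted and observed scores.
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_people, n_features = 624, 50                       # 624 volunteers, as in the abstract
X = rng.normal(size=(n_people, n_features))          # e.g., app-usage / mobility summaries
y = X[:, :5].sum(axis=1) + rng.normal(scale=2.0, size=n_people)  # one simulated trait score

model = RandomForestRegressor(n_estimators=200, random_state=0)
y_pred = cross_val_predict(model, X, y, cv=10)       # 10-fold cross-validation

r, _ = pearsonr(y, y_pred)
print(f"cross-validated prediction accuracy: r = {r:.2f}")
```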


Subject(s)
Machine Learning , Personality , Smartphone , Social Behavior , Humans , Models, Theoretical , Privacy , Quantitative Trait, Heritable , Reproducibility of Results