Search | VHL Regional Portal

1.

How predictable are mass extinction events?

Foster, William J; Allen, Bethany J; Kitzmann, Niklas H; Münchmeyer, Jannes; Rettelbach, Tabea; Witts, James D; Whittle, Rowan J; Larina, Ekaterina; Clapham, Matthew E; Dunhill, Alexander M.

R Soc Open Sci ; 10(3): 221507, 2023 Mar.

Article in English | MEDLINE | ID: mdl-36938535

ABSTRACT

Many modern extinction drivers are shared with past mass extinction events, such as rapid climate warming, habitat loss, pollution and invasive species. This commonality presents a key question: can the extinction risk of species during past mass extinction events inform our predictions for a modern biodiversity crisis? To investigate if it is possible to establish which species were more likely to go extinct during mass extinctions, we applied a functional trait-based model of extinction risk using a machine learning algorithm to datasets of marine fossils for the end-Permian, end-Triassic and end-Cretaceous mass extinctions. Extinction selectivity was inferred across each individual mass extinction event, before testing whether the selectivity patterns obtained could be used to 'predict' the extinction selectivity exhibited during the other mass extinctions. Our analyses show that, despite some similarities in extinction selectivity patterns between ancient crises, the selectivity of mass extinction events is inconsistent, which leads to a poor predictive performance. This lack of predictability is attributed to evolution in marine ecosystems, particularly during the Mesozoic Marine Revolution, associated with shifts in community structure alongside coincident Earth system changes. Our results suggest that past extinctions are unlikely to be informative for predicting extinction risk during a projected mass extinction.

2.

Graph Neural Networks for Learning Molecular Excitation Spectra.

Singh, Kanishka; Münchmeyer, Jannes; Weber, Leon; Leser, Ulf; Bande, Annika.

J Chem Theory Comput ; 18(7): 4408-4417, 2022 Jul 12.

Article in English | MEDLINE | ID: mdl-35671364

ABSTRACT

Machine learning (ML) approaches have demonstrated the ability to predict molecular spectra at a fraction of the computational cost of traditional theoretical chemistry methods while maintaining high accuracy. Graph neural networks (GNNs) are particularly promising in this regard, but different types of GNNs have not yet been systematically compared. In this work, we benchmark and analyze five different GNNs for the prediction of excitation spectra from the QM9 dataset of organic molecules. We compare the GNN performance in the obvious runtime measurements, prediction accuracy, and analysis of outliers in the test set. Moreover, through TMAP clustering and statistical analysis, we are able to highlight clear hotspots of high prediction errors as well as optimal spectra prediction for molecules with certain functional groups. This in-depth benchmarking and subsequent analysis protocol lays down a recipe for comparing different ML methods and evaluating dataset quality.

Subject(s)

Machine Learning , Neural Networks, Computer

3.

HunFlair: an easy-to-use tool for state-of-the-art biomedical named entity recognition.

Weber, Leon; Sänger, Mario; Münchmeyer, Jannes; Habibi, Maryam; Leser, Ulf; Akbik, Alan.

Bioinformatics ; 37(17): 2792-2794, 2021 Sep 09.

Article in English | MEDLINE | ID: mdl-33508086

ABSTRACT

SUMMARY: Named entity recognition (NER) is an important step in biomedical information extraction pipelines. Tools for NER should be easy to use, cover multiple entity types, be highly accurate and be robust toward variations in text genre and style. We present HunFlair, a NER tagger fulfilling these requirements. HunFlair is integrated into the widely used NLP framework Flair, recognizes five biomedical entity types, reaches or overcomes state-of-the-art performance on a wide set of evaluation corpora, and is trained in a cross-corpus setting to avoid corpus-specific bias. Technically, it uses a character-level language model pretrained on roughly 24 million biomedical abstracts and three million full texts. It outperforms other off-the-shelf biomedical NER tools with an average gain of 7.26 pp over the next best tool in a cross-corpus setting and achieves on-par results with state-of-the-art research prototypes in in-corpus experiments. HunFlair can be installed with a single command and is applied with only four lines of code. Furthermore, it is accompanied by harmonized versions of 23 biomedical NER corpora. AVAILABILITY AND IMPLEMENTATION: HunFlair is freely available through the Flair NLP framework (https://github.com/flairNLP/flair) under an MIT license and is compatible with all major operating systems. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.

HUNER: improving biomedical NER with pretraining.

Weber, Leon; Münchmeyer, Jannes; Rocktäschel, Tim; Habibi, Maryam; Leser, Ulf.

Bioinformatics ; 36(1): 295-302, 2020 Jan 01.

Article in English | MEDLINE | ID: mdl-31243432

ABSTRACT

MOTIVATION: Several recent studies showed that the application of deep neural networks advanced the state-of-the-art in named entity recognition (NER), including biomedical NER. However, the impact on performance and the robustness of improvements crucially depends on the availability of sufficiently large training corpora, which is a problem in the biomedical domain with its often rather small gold standard corpora. RESULTS: We evaluate different methods for alleviating the data sparsity problem by pretraining a deep neural network (LSTM-CRF), followed by a rather short fine-tuning phase focusing on a particular corpus. Experiments were performed using 34 different corpora covering five different biomedical entity types, yielding an average increase in F1-score of ∼2 pp compared to learning without pretraining. We experimented both with supervised and semi-supervised pretraining, leading to interesting insights into the precision/recall trade-off. Based on our results, we created the stand-alone NER tool HUNER incorporating fully trained models for five entity types. On the independent CRAFT corpus, which was not used for creating HUNER, it outperforms the state-of-the-art tools GNormPlus and tmChem by 5-13 pp on the entity types chemicals, species and genes. AVAILABILITY AND IMPLEMENTATION: HUNER is freely available at https://hu-ner.github.io. HUNER comes in containers, making it easy to install and use, and it can be applied off-the-shelf to arbitrary texts. We also provide an integrated tool for obtaining and converting all 34 corpora used in our evaluation, including fixed training, development and test splits to enable fair comparisons in the future. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Computational Biology , Neural Networks, Computer , Computational Biology/methods , Data Analysis , Software

5.

Estimating genome-wide regulatory activity from multi-omics data sets using mathematical optimization.

Trescher, Saskia; Münchmeyer, Jannes; Leser, Ulf.

BMC Syst Biol ; 11(1): 41, 2017 Mar 27.

Article in English | MEDLINE | ID: mdl-28347313

ABSTRACT

BACKGROUND: Gene regulation is one of the most important cellular processes, indispensable for the adaptability of organisms and closely interlinked with several classes of pathogenesis and their progression. Elucidation of regulatory mechanisms can be approached by a multitude of experimental methods, yet integration of the resulting heterogeneous, large, and noisy data sets into comprehensive and tissue or disease-specific cellular models requires rigorous computational methods. Recently, several algorithms have been proposed which model genome-wide gene regulation as sets of (linear) equations over the activity and relationships of transcription factors, genes and other factors. Subsequent optimization finds those parameters that minimize the divergence of predicted and measured expression intensities. In various settings, these methods produced promising results in terms of estimating transcription factor activity and identifying key biomarkers for specific phenotypes. However, despite their common root in mathematical optimization, they vastly differ in the types of experimental data being integrated, the background knowledge necessary for their application, the granularity of their regulatory model, the concrete paradigm used for solving the optimization problem and the data sets used for evaluation. RESULTS: Here, we review five recent methods of this class in detail and compare them with respect to several key properties. Furthermore, we quantitatively compare the results of four of the presented methods based on publicly available data sets. CONCLUSIONS: The results show that all methods seem to find biologically relevant information. However, we also observe that the mutual result overlaps are very low, which contradicts biological intuition. Our aim is to raise further awareness of the power of these methods, yet also to identify common shortcomings and necessary extensions enabling focused research on the critical points.

Subject(s)

Genomics/methods , Algorithms , Databases, Genetic , RNA, Messenger/genetics , RNA, Messenger/metabolism
