Search | VHL Regional Portal

Assessing the effects of hyperparameters on knowledge graph embedding quality.

Lloyd, Oliver; Liu, Yi; Gaunt, Tom R.

J Big Data ; 10(1): 59, 2023.

Article in English | MEDLINE | ID: mdl-37168524

ABSTRACT

Embedding knowledge graphs into low-dimensional spaces is a popular method for applying approaches, such as link prediction or node classification, to these databases. This embedding process is very costly in terms of both computational time and space. Part of the reason for this is the optimisation of hyperparameters, which involves repeatedly sampling, by random, guided, or brute-force selection, from a large hyperparameter space and testing the resulting embeddings for their quality. However, not all hyperparameters in this search space will be equally important. In fact, with prior knowledge of the relative importance of the hyperparameters, some could be eliminated from the search altogether without significantly impacting the overall quality of the outputted embeddings. To this end, we ran a Sobol sensitivity analysis to evaluate the effects of tuning different hyperparameters on the variance of embedding quality. This was achieved by performing thousands of embedding trials, each time measuring the quality of embeddings produced by different hyperparameter configurations. We regressed the embedding quality on those hyperparameter configurations, using this model to generate Sobol sensitivity indices for each of the hyperparameters. By evaluating the correlation between Sobol indices, we find substantial variability in the hyperparameter sensitivities between knowledge graphs, with differing dataset characteristics as the probable cause of these inconsistencies. As an additional contribution of this work we identify several relations in the UMLS knowledge graph that may cause data leakage via inverse relations, and derive and present UMLS-43, a leakage-robust variant of that graph. Supplementary Information: The online version contains supplementary material available at 10.1186/s40537-023-00732-5.
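The Sobol step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' code: the surrogate function, its three inputs, and its coefficients are hypothetical stand-ins for a regression model fitted to embedding trials, and the estimator shown is the commonly used Saltelli-style Monte Carlo estimator for first-order indices, which may differ from what the paper used. All inputs are assumed to be rescaled to [0, 1].

```python
import numpy as np


def surrogate_quality(x):
    """Hypothetical surrogate: predicted embedding quality from 3 hyperparameters.

    Columns of x are assumed to be hyperparameters rescaled to [0, 1].
    The coefficients are invented for illustration only.
    """
    return 4.0 * x[:, 0] + 1.0 * x[:, 1] + 0.0 * x[:, 2]


def first_order_sobol(model, n_params, n_samples=20000, seed=0):
    """Estimate first-order Sobol indices of `model` over the unit hypercube."""
    rng = np.random.default_rng(seed)
    a = rng.random((n_samples, n_params))  # first independent sample matrix
    b = rng.random((n_samples, n_params))  # second independent sample matrix
    f_a = model(a)
    f_b = model(b)
    total_var = np.var(np.concatenate([f_a, f_b]))

    indices = np.empty(n_params)
    for i in range(n_params):
        # Matrix A with only column i taken from B.
        ab_i = a.copy()
        ab_i[:, i] = b[:, i]
        # Saltelli-style estimator: V_i ~ mean(f(B) * (f(A_B^i) - f(A))).
        indices[i] = np.mean(f_b * (model(ab_i) - f_a)) / total_var
    return indices


if __name__ == "__main__":
    for name, s in zip(["param_0", "param_1", "param_2"],
                       first_order_sobol(surrogate_quality, 3)):
        print(f"{name}: first-order index = {s:.3f}")
```

For this toy linear surrogate the analytic first-order indices are 16/17, 1/17 and 0, so the third input contributes nothing to the variance of the output and would be a candidate for removal from a search. With a real fitted model the same function could be called on its prediction method.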

Erratum to: EpiGraphDB: a database and data mining platform for health data science.

Liu, Yi; Elsworth, Benjamin; Erola, Pau; Haberland, Valeriia; Hemani, Gibran; Lyon, Matt; Zheng, Jie; Lloyd, Oliver; Vabistsevits, Marina; Gaunt, Tom R.

Bioinformatics ; 37(2): 288, 2021 Apr 19.

Article in English | MEDLINE | ID: mdl-33693535

EpiGraphDB: a database and data mining platform for health data science.

Liu, Yi; Elsworth, Benjamin; Erola, Pau; Haberland, Valeriia; Hemani, Gibran; Lyon, Matt; Zheng, Jie; Lloyd, Oliver; Vabistsevits, Marina; Gaunt, Tom R.

Bioinformatics ; 37(9): 1304-1311, 2021 Jun 09.

Article in English | MEDLINE | ID: mdl-33165574

ABSTRACT

MOTIVATION: The wealth of data resources on human phenotypes, risk factors, molecular traits and therapeutic interventions presents new opportunities for population health sciences. These opportunities are paralleled by a growing need for data integration, curation and mining to increase research efficiency, reduce mis-inference and ensure reproducible research. RESULTS: We developed EpiGraphDB (https://epigraphdb.org/), a graph database containing an array of different biomedical and epidemiological relationships and an analytical platform to support their use in human population health data science. In addition, we present three case studies that illustrate the value of this platform. The first uses EpiGraphDB to evaluate potential pleiotropic relationships, addressing mis-inference in systematic causal analysis. In the second case study, we illustrate how protein-protein interaction data offer opportunities to identify new drug targets. The final case study integrates causal inference using Mendelian randomization with relationships mined from the biomedical literature to 'triangulate' evidence from different sources. AVAILABILITY AND IMPLEMENTATION: The EpiGraphDB platform is openly available at https://epigraphdb.org. Code for replicating case study results is available at https://github.com/MRCIEU/epigraphdb as Jupyter notebooks using the API, and https://mrcieu.github.io/epigraphdb-r using the R package. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Data Science; Software; Data Mining; Databases, Factual; Humans; Phenotype

Takayasu's arteritis and an elevated antistreptolysin O titre - a potentially expensive diagnostic conundrum.

Lloyd, Oliver; Lammy, Simon; Edwards, Rebecca; Laing, Robert.

JRSM Open ; 5(6): 2054270414531125, 2014 Jun.

Article in English | MEDLINE | ID: mdl-25057401

ABSTRACT

Takayasu's arteritis is a chronic large vessel vasculitis which may be associated with a false positive antistreptolysin O titre.
