Search | VHL Regional Portal

The importance of graph databases and graph learning for clinical applications.

Walke, Daniel; Micheel, Daniel; Schallert, Kay; Muth, Thilo; Broneske, David; Saake, Gunter; Heyer, Robert.

Database (Oxford) ; 2023, 2023 07 10.

Article in English | MEDLINE | ID: mdl-37428679

ABSTRACT

The increasing amount and complexity of clinical data require an appropriate way of storing and analyzing those data. Traditional approaches use a tabular structure (relational databases) for storing data, which complicates storing and retrieving interlinked data from the clinical domain. Graph databases address this by storing data in a graph as nodes (vertices) that are connected by edges (links). The underlying graph structure can be used for the subsequent data analysis (graph learning). Graph learning consists of two parts: graph representation learning and graph analytics. Graph representation learning aims to reduce high-dimensional input graphs to low-dimensional representations. Graph analytics then uses the obtained representations for analytical tasks such as visualization, classification, link prediction and clustering, which can be used to solve domain-specific problems. In this survey, we review current state-of-the-art graph database management systems, graph learning algorithms and a variety of graph applications in the clinical domain. Furthermore, we provide a comprehensive use case for a clearer understanding of complex graph learning algorithms.

Subject(s)

Algorithms , Database Management Systems , Databases, Factual , Cluster Analysis
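The node-and-edge storage model and the link prediction task named in the abstract above can be illustrated with a minimal sketch. It does not reproduce any algorithm from the survey: it skips representation learning entirely and scores candidate links with a plain Jaccard neighborhood heuristic. The patient and condition names, the edge list, and the helper functions are all hypothetical.

```python
# Hypothetical toy data: (patient, condition) edges of a small clinical graph.
EDGES = [
    ("p1", "diabetes"), ("p1", "hypertension"),
    ("p2", "diabetes"), ("p2", "hypertension"), ("p2", "obesity"),
    ("p3", "asthma"),
]


def build_adjacency(edges):
    """Store an undirected graph as a dict mapping each node to its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, a, b):
    """Jaccard similarity of the neighbor sets of nodes a and b."""
    na, nb = adj.get(a, set()), adj.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def link_score(adj, patient, condition):
    """Heuristic score for a missing patient-condition edge.

    Sums the similarity between `patient` and every other patient
    that is already linked to `condition`.
    """
    return sum(
        jaccard(adj, patient, other)
        for other in adj.get(condition, set())
        if other != patient
    )


adj = build_adjacency(EDGES)
obesity_score = link_score(adj, "p1", "obesity")
asthma_score = link_score(adj, "p1", "asthma")
```

In this toy graph, p1 shares two of three conditions with p2, so the candidate link from p1 to obesity scores higher than the link from p1 to asthma. A learned low-dimensional representation would replace the hand-written similarity function.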

Decision tree learning in Neo4j on homogeneous and unconnected graph nodes from biological and clinical datasets.

Mondal, Rahul; Do, Minh Dung; Ahmed, Nasim Uddin; Walke, Daniel; Micheel, Daniel; Broneske, David; Saake, Gunter; Heyer, Robert.

BMC Med Inform Decis Mak ; 22(Suppl 6): 347, 2023 03 06.

Article in English | MEDLINE | ID: mdl-36879243

ABSTRACT

BACKGROUND: Graph databases enable efficient storage of heterogeneous, highly interlinked data, such as clinical data. Subsequently, researchers can extract relevant features from these datasets and apply machine learning for diagnosis, biomarker discovery, or understanding pathogenesis. METHODS: To facilitate machine learning and save the time needed to extract data from the graph database, we developed and optimized the Decision Tree Plug-in (DTP), which contains 24 procedures to generate and evaluate decision trees directly in the graph database Neo4j on homogeneous and unconnected nodes. RESULTS: Creating the decision tree for three clinical datasets directly in the graph database from the nodes required between 0.059 and 0.099 s, while calculating the decision tree with the same algorithm in Java from CSV files took 0.085-0.112 s. Furthermore, for small datasets our approach was faster than the standard decision tree implementation in R (0.62 s) and on par with that in Python (0.08 s), both of which also used CSV files as input. In addition, we explored the strengths of DTP by evaluating a large dataset (approx. 250,000 instances) to predict patients with diabetes and compared the performance against algorithms generated by state-of-the-art packages in R and Python. In doing so, we showed that Neo4j achieves competitive results in terms of both prediction quality and time efficiency. Furthermore, we could show that high body-mass index and high blood pressure are the main risk factors for diabetes. CONCLUSION: Overall, our work shows that integrating machine learning into graph databases saves the time and external memory otherwise required for additional processes, and could be applied to a variety of use cases, including clinical applications. This provides users with the advantages of high scalability, visualization and complex querying.

Subject(s)

Algorithms , Biomedical Research , Humans , Body Mass Index , Databases, Factual , Decision Trees
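The core step of building a decision tree on unconnected nodes, as described in the abstract above, is choosing the property threshold that best separates the classes. The sketch below shows that step on node-like property dictionaries. It is not the DTP implementation and does not use Neo4j: the Gini criterion is an assumption (the abstract does not name the split criterion), and the node properties `bmi` and `diabetes` and their values are made up for illustration.

```python
# Hypothetical nodes, each a flat property map as one might read it from a graph database.
NODES = [
    {"bmi": 22, "diabetes": 0},
    {"bmi": 24, "diabetes": 0},
    {"bmi": 31, "diabetes": 1},
    {"bmi": 35, "diabetes": 1},
]


def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)  # equals 1 - p^2 - (1 - p)^2 for two classes


def best_split(nodes, feature, target):
    """Return (threshold, weighted_impurity) of the best binary split on `feature`.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns None if the feature has fewer than two distinct values.
    """
    values = sorted({node[feature] for node in nodes})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [n[target] for n in nodes if n[feature] <= threshold]
        right = [n[target] for n in nodes if n[feature] > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(nodes)
        if best is None or weighted < best[1]:
            best = (threshold, weighted)
    return best


threshold, impurity = best_split(NODES, "bmi", "diabetes")
```

A full tree would apply `best_split` recursively to the left and right subsets and across all features. Running the split inside the database, as the abstract describes, avoids exporting the nodes to CSV first.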
