Large language model based framework for automated extraction of genetic interactions from unstructured data.

Gill, Jaskaran Kaur; Chetty, Madhu; Lim, Suryani; Hallinan, Jennifer.

PLoS One ; 19(5): e0303231, 2024.

Article in English | MEDLINE | ID: mdl-38771886

ABSTRACT

Extracting biological interactions from published literature helps us understand complex biological systems, accelerate research, and support decision-making in drug or treatment development. Despite efforts to automate the extraction of biological relations using text mining tools and machine learning pipelines, manual curation continues to serve as the gold standard. However, the rapidly increasing volume of literature pertaining to biological relations poses challenges in its manual curation and refinement. These challenges are further compounded because only a small fraction of the published literature is relevant to biological relation extraction, and the embedded sentences of relevant sections have complex structures, which can lead to incorrect inference of relationships. To overcome these challenges, we propose GIX, an automated and robust Gene Interaction Extraction framework, based on pre-trained large language models fine-tuned through extensive evaluations on various gene/protein interaction corpora including LLL and RegulonDB. GIX identifies relevant publications with minimal keywords, optimises sentence selection to reduce computational overhead, simplifies sentence structure while preserving meaning, and provides a confidence factor indicating the reliability of extracted relations. GIX's Stage-2 relation extraction method performed well on benchmark protein/gene interaction datasets, assessed using 10-fold cross-validation, surpassing state-of-the-art approaches. We demonstrated that the proposed method, although fully automated, performs as well as manual relation extraction, with enhanced robustness. We also observed GIX's capability to augment existing datasets with new sentences, incorporating newly discovered biological terms and processes. Further, we demonstrated GIX's real-world applicability in inferring E. coli gene circuits.

Subject(s)

Data Mining , Data Mining/methods , Natural Language Processing , Machine Learning , Computational Biology/methods , Humans , Algorithms
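
The GIX abstract above reports that relation extraction was assessed with 10-fold cross-validation. As a rough illustration of that evaluation protocol only (not of GIX itself), the following minimal sketch splits sample indices into k folds and yields train/test index lists. The function name `k_fold_indices`, the `seed` parameter, and the shuffling strategy are assumptions made for this example; the abstract does not describe how the folds were built.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the GIX paper.
    """
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


if __name__ == "__main__":
    for fold_no, (train, test) in enumerate(k_fold_indices(25, k=10), start=1):
        print(f"fold {fold_no}: {len(train)} train, {len(test)} test")
```

Each sample appears in exactly one test fold, so a score averaged over the ten folds uses every labelled sentence for testing once.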

Physics-Informed Explainable Continual Learning on Graphs.

Peng, Ciyuan; Tang, Tao; Yin, Qiuyang; Bai, Xiaomei; Lim, Suryani; Aggarwal, Charu C.

IEEE Trans Neural Netw Learn Syst ; PP: 2024 Jan 10.

Article in English | MEDLINE | ID: mdl-38198265

ABSTRACT

Temporal graph learning has attracted great attention with its ability to deal with dynamic graphs. Although current methods are reasonably accurate, most of them are unexplainable due to their black-box nature. It remains a challenge to explain how temporal graph learning models adapt to information evolution. Furthermore, with the increasing application of artificial intelligence in various scientific domains, such as chemistry and biomedicine, the importance of delivering not only precise outcomes but also offering explanations regarding the learning models becomes paramount. This transparency aids users in comprehending the decision-making procedures and instills greater confidence in the generated models. To address this issue, this article proposes a novel physics-informed explainable continual learning (PiECL) method, focusing on temporal graphs. Our proposed method utilizes physical and mathematical algorithms to quantify the disturbance of new data to previous knowledge for obtaining changed information over time. As the proposed model is based on theories in physics, it can provide a transparent underlying mechanism for information evolution detection, thus enhancing explainability. The experimental results on three real-world datasets demonstrate that PiECL can explain the learning process, and the generated model outperforms other state-of-the-art methods. PiECL shows tremendous potential for explaining temporal graph learning in various scientific contexts.

MICFuzzy: A maximal information content based fuzzy approach for reconstructing genetic networks.

Nakulugamuwa Gamage, Hasini; Chetty, Madhu; Lim, Suryani; Hallinan, Jennifer.

PLoS One ; 18(7): e0288174, 2023.

Article in English | MEDLINE | ID: mdl-37418430

ABSTRACT

In systems biology, the accurate reconstruction of Gene Regulatory Networks (GRNs) is crucial since these networks can facilitate the solving of complex biological problems. Amongst the plethora of methods available for GRN reconstruction, information theory and fuzzy concepts-based methods have abiding popularity. However, most of these methods are not only complex, incurring a high computational burden, but they may also produce a high number of false positives, leading to inaccurate inferred networks. In this paper, we propose a novel hybrid fuzzy GRN inference model called MICFuzzy which involves the aggregation of the effects of Maximal Information Coefficient (MIC). This model has an information theory-based pre-processing stage, the output of which is applied as an input to the novel fuzzy model. In this pre-processing stage, the MIC component filters relevant genes for each target gene to significantly reduce the computational burden of the fuzzy model when selecting the regulatory genes from these filtered gene lists. The novel fuzzy model uses the regulatory effect of the identified activator-repressor gene pairs to determine target gene expression levels. This approach facilitates accurate network inference by generating a high number of true regulatory interactions while significantly reducing false regulatory predictions. The performance of MICFuzzy was evaluated using DREAM3 and DREAM4 challenge data, and the SOS real gene expression dataset. MICFuzzy outperformed the other state-of-the-art methods in terms of F-score, Matthews Correlation Coefficient, Structural Accuracy, and SS_mean, and outperformed most of them in terms of efficiency. MICFuzzy also had improved efficiency compared with the classical fuzzy model since the design of MICFuzzy leads to a reduction in combinatorial computation.

Subject(s)

Algorithms , Computational Biology , Computational Biology/methods , Gene Regulatory Networks , Systems Biology , Genes, Regulator , Models, Genetic
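
The MICFuzzy abstract above names F-score and Matthews Correlation Coefficient among its evaluation measures. The following minimal sketch shows how these two standard metrics could be computed for an inferred network against a reference network. It assumes that edges are directed (regulator, target) pairs, that self-loops are excluded, and that every edge uses genes from the supplied list; the function name `edge_metrics` and this edge representation are illustrative choices, not details from the paper.

```python
import math


def edge_metrics(true_edges, predicted_edges, genes):
    """Compute F-score and MCC for predicted directed edges.

    Assumption for this sketch: candidate edges are all ordered gene
    pairs without self-loops, and all given edges are among them.
    """
    true_set = set(true_edges)
    pred_set = set(predicted_edges)
    n_candidates = len(genes) * (len(genes) - 1)

    tp = len(true_set & pred_set)   # edges correctly predicted
    fp = len(pred_set - true_set)   # predicted edges absent from the reference
    fn = len(true_set - pred_set)   # reference edges that were missed
    tn = n_candidates - tp - fp - fn

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"f_score": f_score, "mcc": mcc}


if __name__ == "__main__":
    genes = ["a", "b", "c"]
    reference = [("a", "b"), ("b", "c")]
    inferred = [("a", "b"), ("c", "a")]
    print(edge_metrics(reference, inferred, genes))
```

MCC also counts true negatives, which matters for gene networks because most candidate edges are absent from the reference network.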
