Search | VHL Regional Portal

Building a search tool for compositely annotated entities using Transformer-based approach: Case study in Biosimulation Model Search Engine (BMSE).

Munarko, Yuda; Rampadarath, Anand; Nickerson, David.

F1000Res ; 12: 162, 2023.

Article in English | MEDLINE | ID: mdl-37842339

ABSTRACT

The Transformer-based approaches to solving natural language processing (NLP) tasks such as BERT and GPT are gaining popularity due to their ability to achieve high performance. These approaches benefit from using enormous data sizes to create pre-trained models and the ability to understand the context of words in a sentence. Their use in the information retrieval domain is thought to increase effectiveness and efficiency. This paper demonstrates a BERT-based method (CASBERT) implementation to build a search tool over data annotated compositely using ontologies. The data was a collection of biosimulation models written using the CellML standard in the Physiome Model Repository (PMR). A biosimulation model structurally consists of basic entities of constants and variables that construct higher-level entities such as components, reactions, and the model. Finding these entities specific to their level is beneficial for various purposes regarding variable reuse, experiment setup, and model audit. Initially, we created embeddings representing compositely-annotated entities for constant and variable search (lowest level entity). Then, these low-level entity embeddings were vertically and efficiently combined to create higher-level entity embeddings to search components, models, images, and simulation setups. Our approach was general, so it can be used to create search tools with other data semantically annotated with ontologies - biosimulation models encoded in the SBML format, for example. Our tool is named Biosimulation Model Search Engine (BMSE).
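As a rough illustration of how low-level entity embeddings might be combined into a higher-level one, the sketch below averages variable embeddings element-wise to form a component embedding. The abstract does not state the combination rule, so the averaging, the function name `mean_embedding`, and the three-dimensional toy vectors are all assumptions made for illustration, not the authors' method.

```python
def mean_embedding(vectors):
    """Average equal-length vectors element-wise (assumed combination rule)."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical 3-dimensional embeddings for two variables of one component.
# Real embeddings would come from a BERT-style encoder and be much larger.
variable_embeddings = {
    "membrane_voltage": [0.25, 0.5, 0.75],
    "sodium_current": [0.75, 0.5, 0.25],
}

# The component embedding is built from the embeddings of its variables.
component_embedding = mean_embedding(list(variable_embeddings.values()))
print(component_embedding)
```

The same helper could be applied again to component embeddings to obtain a model-level embedding, which is one way to read "vertically combined" in the abstract.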

Subject(s)

Information Storage and Retrieval , Search Engine , Computer Simulation , Natural Language Processing

CASBERT: BERT-based retrieval for compositely annotated biosimulation model entities.

Munarko, Yuda; Rampadarath, Anand; Nickerson, David P.

Front Bioinform ; 3: 1107467, 2023.

Article in English | MEDLINE | ID: mdl-36865672

ABSTRACT

Maximising FAIRness of biosimulation models requires a comprehensive description of model entities such as reactions, variables, and components. The COmputational Modeling in BIology NEtwork (COMBINE) community encourages the use of Resource Description Framework with composite annotations that semantically involve ontologies to ensure completeness and accuracy. These annotations help scientists find models or detailed information to inform further reuse, such as model composition, reproduction, and curation. SPARQL has been recommended as a key standard to access semantic annotation with RDF, which helps retrieve entities precisely. However, SPARQL is unsuitable for most repository users who explore biosimulation models freely without adequate knowledge of ontologies, RDF structure, and SPARQL syntax. We propose here a text-based information retrieval approach, CASBERT, that is easy to use and can present candidates of relevant entities from models across a repository's contents. CASBERT adapts Bidirectional Encoder Representations from Transformers (BERT), where each composite annotation about an entity is converted into an entity embedding for subsequent storage in a list of entity embeddings. For entity lookup, a query is transformed to a query embedding and compared to the entity embeddings, and then the entities are displayed in order based on their similarity. The list structure makes it possible to implement CASBERT as an efficient search engine product, with inexpensive addition, modification, and insertion of entity embeddings. To demonstrate and test CASBERT, we created a test dataset of query-entity pairs from the Physiome Model Repository and a static export of the BioModels database. Measured using Mean Average Precision and Mean Reciprocal Rank, we found that our approach can perform better than the traditional bag-of-words method.
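The lookup described in this abstract (embed the query, compare it with a list of entity embeddings, show entities in order of similarity) can be sketched as follows. This is a minimal sketch under stated assumptions: cosine similarity is assumed as the comparison measure, the entity identifiers and two-dimensional vectors are hand-written toy values rather than BERT output, and the function names are hypothetical. The reciprocal rank helper shows the per-query quantity that Mean Reciprocal Rank averages.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_entities(query_embedding, entity_embeddings):
    """Return entity ids ordered from most to least similar to the query."""
    scored = [(cosine(query_embedding, emb), entity_id)
              for entity_id, emb in entity_embeddings]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entity_id for _, entity_id in scored]


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / position of the first relevant entity, or 0.0 if none is found."""
    for position, entity_id in enumerate(ranked_ids, start=1):
        if entity_id in relevant_ids:
            return 1.0 / position
    return 0.0


# Hypothetical list of (entity id, embedding) pairs; appending to the list
# is all that is needed to index a new entity.
entity_embeddings = [
    ("var_a", [1.0, 0.0]),
    ("var_b", [0.0, 1.0]),
    ("var_c", [1.0, 1.0]),
]
query_embedding = [1.0, 0.2]  # toy stand-in for an encoded text query

ranking = rank_entities(query_embedding, entity_embeddings)
print(ranking)
print(reciprocal_rank(ranking, {"var_c"}))
```

Averaging `reciprocal_rank` over all test queries gives the Mean Reciprocal Rank; a linear scan over a flat list is the simplest structure that matches the abstract's description, though a production system might use an approximate nearest-neighbour index instead.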

NLIMED: Natural Language Interface for Model Entity Discovery in Biosimulation Model Repositories.

Munarko, Yuda; Sarwar, Dewan M; Rampadarath, Anand; Atalag, Koray; Gennari, John H; Neal, Maxwell L; Nickerson, David P.

Front Physiol ; 13: 820683, 2022.

Article in English | MEDLINE | ID: mdl-35283794

ABSTRACT

Semantic annotation is a crucial step to assure reusability and reproducibility of biosimulation models in biology and physiology. For this purpose, the COmputational Modeling in BIology NEtwork (COMBINE) community recommends the use of the Resource Description Framework (RDF). This grounding in RDF provides the flexibility to enable searching for entities within models (e.g., variables, equations, or entire models) by utilizing the RDF query language SPARQL. However, the rigidity and complexity of the SPARQL syntax and the tree-like structure of semantic annotations are challenging for users. Therefore, we propose NLIMED, an interface that converts natural language queries into SPARQL. We use this interface to query and discover model entities from repositories of biosimulation models. NLIMED works with the Physiome Model Repository (PMR) and the BioModels database and potentially other repositories annotated using RDF. Natural language queries are first "chunked" into phrases and annotated against ontology classes and predicates utilizing different natural language processing tools. Then, the ontology classes and predicates are composed as SPARQL and finally ranked using our SPARQL Composer and our indexing system. We demonstrate that NLIMED's approach for chunking and annotating queries is more effective than the NCBO Annotator for identifying relevant ontology classes in natural language queries. Comparison of NLIMED's behavior against historical query records in the PMR shows that it can adapt appropriately to queries associated with well-annotated models.

libOmexMeta: enabling semantic annotation of models to support FAIR principles.

Welsh, Ciaran; Nickerson, David P; Rampadarath, Anand; Neal, Maxwell L; Sauro, Herbert M; Gennari, John H.

Bioinformatics ; 37(24): 4898-4900, 2021 12 11.

Article in English | MEDLINE | ID: mdl-34132740

ABSTRACT

SUMMARY: As the number and complexity of biosimulation models grow, so do demands for tools that can help users better understand models and make those models more findable, shareable and reproducible. Consistent model annotation is a step toward these goals. Both models and tools are written in a variety of different languages; thus, the community has recognized the need for standard, language-independent methods for annotation. Based on the Computational Modeling in Biology Network community consensus, we introduce an open-source, cross-platform software library for semantic annotation of models. AVAILABILITY AND IMPLEMENTATION: libOmexMeta is freely available at https://github.com/sys-bio/libOmexMeta under the Apache License 2.0. A live demonstration is at github.com/sys-bio/pyomexmeta-binder-notebook. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Semantics , Software , Computer Simulation , Language , Consensus

Best Practices for Making Reproducible Biochemical Models.

Porubsky, Veronica L; Goldberg, Arthur P; Rampadarath, Anand K; Nickerson, David P; Karr, Jonathan R; Sauro, Herbert M.

Cell Syst ; 11(2): 109-120, 2020 08 26.

Article in English | MEDLINE | ID: mdl-32853539

ABSTRACT

Like many scientific disciplines, dynamical biochemical modeling is hindered by irreproducible results. This limits the utility of biochemical models by making them difficult to understand, trust, or reuse. We comprehensively list the best practices that biochemical modelers should follow to build reproducible biochemical model artifacts (all data, model descriptions, and custom software used by the model) that can be understood and reused. The best practices provide advice for all steps of a typical biochemical modeling workflow in which a modeler collects data; constructs, trains, simulates, and validates the model; uses the predictions of a model to advance knowledge; and publicly shares the model artifacts. The best practices emphasize the benefits obtained by using standard tools and formats and provide guidance to modelers who do not or cannot use standards in some stages of their modeling workflow. Adoption of these best practices will enhance the ability of researchers to reproduce, understand, and reuse biochemical models.

Subject(s)

Computer Simulation/standards , Systems Biology/methods , Humans

Improving reproducibility in computational biology research.

Papin, Jason A; Mac Gabhann, Feilim; Sauro, Herbert M; Nickerson, David; Rampadarath, Anand.

PLoS Comput Biol ; 16(5): e1007881, 2020 05.

Article in English | MEDLINE | ID: mdl-32427998

Subject(s)

Computational Biology/standards , Reproducibility of Results , Research

A Distribution-Moment Approximation for Coupled Dynamics of the Airway Wall and Airway Smooth Muscle.

Rampadarath, Anand K; Donovan, Graham M.

Biophys J ; 114(2): 493-501, 2018 01 23.

Article in English | MEDLINE | ID: mdl-29401446

ABSTRACT

Asthma is fundamentally a disease of airway constriction. Due to a variety of experimental challenges, the dynamics of airways are poorly understood. Of specific interest is the narrowing of the airway due to forces produced by the airway smooth muscle wrapped around each airway. The interaction between the muscle and the airway wall is crucial for the airway constriction that occurs during an asthma attack. Although cross-bridge theory is a well-studied representation of complex smooth muscle dynamics, and these dynamics can be coupled to the airway wall, this comes at significant computational cost, even for isolated airways. Because many phenomena of interest in pulmonary physiology cannot be adequately understood by studying isolated airways, this presents a significant limitation. We present a distribution-moment approximation of this coupled system and study the validity of the approximation throughout the physiological range. We show that the distribution-moment approximation is valid in most conditions, and we explore the region of breakdown. These results show that in many situations, the distribution-moment approximation is a viable option that provides an orders-of-magnitude reduction in computational complexity; not only is this valuable for isolated airway studies, but it moreover offers the prospect that rich ASM dynamics might be incorporated into interacting airway models where previously this was precluded by computational cost.

Subject(s)

Mechanical Phenomena , Models, Biological , Muscle, Smooth/physiology , Respiratory Physiological Phenomena , Biomechanical Phenomena , Pressure
