Results 1 - 13 of 13
1.
Article in English | MEDLINE | ID: mdl-38577265

ABSTRACT

The cellular immune response comprises several processes, the most notable being the binding of the peptide to the Major Histocompatibility Complex (MHC), the presentation of the peptide-MHC (pMHC) complex on the surface of the cell, and the recognition of the pMHC by the T-Cell Receptor. Identifying the most potent peptide targets for MHC binding, presentation and T-cell recognition is vital for developing peptide-based vaccines and T-cell-based immunotherapies. Data-driven tools that predict each of these steps have been developed, and the availability of mass spectrometry (MS) datasets has facilitated the development of accurate Machine Learning (ML) methods for class-I pMHC binding prediction. However, the accuracy of ML-based tools for pMHC kinetic stability prediction and peptide immunogenicity prediction is uncertain, as stability and immunogenicity datasets are not abundant. Here, we use transfer learning techniques to improve stability and immunogenicity predictions, by taking advantage of a large number of binding affinity and MS datasets. The resulting models, TLStab and TLImm, exhibit comparable or better performance than state-of-the-art approaches on different stability and immunogenicity test sets, respectively. Our approach demonstrates the promise of learning from the task of peptide binding to improve predictions on downstream tasks. The source code of TLStab and TLImm is publicly available at https://github.com/KavrakiLab/TL-MHC.

2.
Orphanet J Rare Dis ; 19(1): 66, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38355534

ABSTRACT

BACKGROUND: The EURO-NMD Registry collects data from all neuromuscular patients seen at EURO-NMD's expert centres. In-kind contributions from three patient organisations have ensured that the registry is patient-centred, meaningful, and impactful. The consenting process covers other uses, such as research, cohort finding and trial readiness. RESULTS: The registry has three-layered datasets, with European Commission-mandated data elements (EU-CDEs), a set of cross-neuromuscular data elements (NMD-CDEs) and a dataset of disease-specific data elements that function modularly (DS-DEs). The registry captures clinical, neuromuscular imaging, neuromuscular histopathology, biological and genetic data and patient-reported outcomes in a computer-interpretable format using selected ontologies and classifications. The EURO-NMD registry is connected to the EURO-NMD Registry Hub through an interoperability layer. The Hub provides an entry point to other neuromuscular registries that follow the FAIR data stewardship principles and enable GDPR-compliant information exchange. Four national or disease-specific patient registries are interoperable with the EURO-NMD Registry, allowing for federated analysis across these different resources. CONCLUSIONS: Collectively, the Registry Hub brings together data that are currently siloed and fragmented to improve healthcare and advance research for neuromuscular diseases.


Subject(s)
Neuromuscular Diseases , Humans , Registries , Neuromuscular Diseases/genetics , Rare Diseases
3.
Front Res Metr Anal ; 8: 1250930, 2023.
Article in English | MEDLINE | ID: mdl-37841902

ABSTRACT

Biomedical experts face challenges in keeping up with the vast amount of biomedical knowledge published daily. With millions of citations added to databases like MEDLINE/PubMed each year, efficiently accessing relevant information becomes crucial. Traditional term-based searches may lead to irrelevant or missed documents due to homonyms, synonyms, abbreviations, or term mismatch. To address this, semantic search approaches employing predefined concepts with associated synonyms and relations have been used to expand query terms and improve information retrieval. The National Library of Medicine (NLM) plays a significant role in this area, indexing citations in the MEDLINE database with topic descriptors from the Medical Subject Headings (MeSH) thesaurus, enabling advanced semantic search strategies to retrieve relevant citations despite the synonymy and polysemy of biomedical terms. Over time, advancements in semantic indexing have been made, with Machine Learning facilitating the transition from manual to automatic semantic indexing in the biomedical literature. The paper highlights the journey of this transition, starting with manual semantic indexing and the initial efforts toward automatic indexing. The BioASQ challenge has served as a catalyst in revolutionizing the domain of semantic indexing, further pushing the boundaries of efficient knowledge retrieval in the biomedical field.

4.
J Biomed Inform ; 146: 104499, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37714418

ABSTRACT

OBJECTIVE: Semantic indexing of biomedical literature is usually done at the level of MeSH descriptors, with several related but distinct biomedical concepts often grouped together and treated as a single topic. This study proposes a new method for the automated refinement of subject annotations at the level of MeSH concepts. METHODS: Lacking labelled data, we rely on weak supervision based on concept occurrence in the abstract of an article, which is also enhanced by dictionary-based heuristics. In addition, we investigate deep learning approaches, making design choices to tackle the particular challenges of this task. The new method is evaluated on a large-scale retrospective scenario, based on concepts that have been promoted to descriptors. RESULTS: In our experiments, concept occurrence was the strongest heuristic, achieving a macro-F1 score of about 0.63 across several labels. The proposed method improved it further by more than 4 percentage points (pp). CONCLUSION: The results suggest that concept occurrence is a strong heuristic for refining the coarse-grained labels at the level of MeSH concepts and the proposed method improves it further.
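
The concept-occurrence heuristic and the macro-F1 measure mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and plain case-insensitive substring matching is an assumption standing in for whatever matching the study actually used.

```python
from typing import Dict, List


def concept_occurs(abstract: str, terms: List[str]) -> bool:
    """Weak label: True if any term of the concept appears in the abstract.

    Assumption: naive case-insensitive substring matching.
    """
    text = abstract.lower()
    return any(term.lower() in text for term in terms)


def f1(gold: List[bool], pred: List[bool]) -> float:
    """F1 for one binary label; returns 0.0 when there are no true positives."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Dict[str, List[bool]], pred: Dict[str, List[bool]]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    return sum(f1(gold[label], pred[label]) for label in gold) / len(gold)
```

Because macro-F1 averages per-label scores without weighting, rare concepts count as much as frequent ones, which matters when many fine-grained labels have few positive articles.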

5.
BMC Bioinformatics ; 24(1): 272, 2023 Jun 30.
Article in English | MEDLINE | ID: mdl-37391722

ABSTRACT

This paper applies different link prediction methods to a knowledge graph generated from biomedical literature, with the aim of comparing their ability to identify unknown drug-gene interactions and explain their predictions. Identifying novel drug-target interactions is a crucial step in drug discovery and repurposing. One approach to this problem is to predict missing links between drug and gene nodes in a graph that contains relevant biomedical knowledge. Such a knowledge graph can be extracted from biomedical literature using text mining tools. In this work, we compare state-of-the-art graph embedding approaches and contextual path analysis on the interaction prediction task. The comparison reveals a trade-off between predictive accuracy and explainability of predictions. Focusing on explainability, we train a decision tree on model predictions and show how it can aid the understanding of the prediction process. We further test the methods on a drug repurposing task and validate the predicted interactions against external databases, with very encouraging results.


Subject(s)
Data Mining , Pattern Recognition, Automated , Drug Interactions , Databases, Factual , Drug Discovery
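
The idea of predicting missing drug-gene links from the paths that connect two nodes can be sketched with a very simple baseline. This is an illustration only, assuming an undirected graph and scoring by shared neighbours (length-2 paths); it is neither the graph embedding methods nor the contextual path analysis compared in the paper, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build undirected adjacency sets from (node, node) pairs."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def shared_neighbours(graph: Dict[str, Set[str]], drug: str, gene: str) -> List[str]:
    """Intermediate nodes on length-2 paths drug - x - gene.

    The returned nodes double as a human-readable explanation of the score.
    """
    return sorted(graph.get(drug, set()) & graph.get(gene, set()))


def rank_genes(
    graph: Dict[str, Set[str]], drug: str, genes: Iterable[str]
) -> List[Tuple[str, int]]:
    """Rank genes not yet linked to the drug by their number of shared neighbours."""
    linked = graph.get(drug, set())
    scored = [
        (gene, len(shared_neighbours(graph, drug, gene)))
        for gene in genes
        if gene not in linked
    ]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

A path-based score like this is easy to explain, since the connecting nodes can be shown directly, but it is typically less accurate than learned embeddings, which mirrors the accuracy versus explainability trade-off the abstract reports.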
6.
Sci Data ; 10(1): 170, 2023 03 27.
Article in English | MEDLINE | ID: mdl-36973320

ABSTRACT

The BioASQ question answering (QA) benchmark dataset contains questions in English, along with gold standard (reference) answers and related material. The dataset has been designed to reflect real information needs of biomedical experts and is therefore more realistic and challenging than most existing datasets. Furthermore, unlike most previous QA benchmarks that contain only exact answers, the BioASQ-QA dataset also includes ideal answers (in effect summaries), which are particularly useful for research on multi-document summarization. The dataset combines structured and unstructured data. The materials linked with each question comprise documents and snippets, which are useful for Information Retrieval and Passage Retrieval experiments, as well as concepts that are useful in concept-to-text Natural Language Generation. Researchers working on paraphrasing and textual entailment can also measure the degree to which their methods improve the performance of biomedical QA systems. Last but not least, the dataset is continuously extended, as the BioASQ challenge is running and new data are generated.

7.
Web Semant ; 75: 100760, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36268112

ABSTRACT

In this paper, we present Knowledge4COVID-19, a framework that aims to showcase the power of integrating disparate sources of knowledge to discover adverse drug effects caused by drug-drug interactions among COVID-19 treatments and pre-existing condition drugs. Initially, we focus on constructing the Knowledge4COVID-19 knowledge graph (KG) from the declarative definition of mapping rules using the RDF Mapping Language. Since valuable information about drug treatments, drug-drug interactions, and side effects is present in textual descriptions in scientific databases (e.g., DrugBank) or in scientific literature (e.g., CORD-19, the COVID-19 Open Research Dataset), the Knowledge4COVID-19 framework implements Natural Language Processing. The Knowledge4COVID-19 framework extracts relevant entities and predicates that enable the fine-grained description of COVID-19 treatments and the potential adverse events that may occur when these treatments are combined with treatments of common comorbidities, e.g., hypertension, diabetes, or asthma. Moreover, on top of the KG, several techniques for the discovery and prediction of interactions and potential adverse effects of drugs have been developed with the aim of suggesting more accurate treatments for the virus. We provide services to traverse the KG and visualize the effects that a group of drugs may have on a treatment outcome. Knowledge4COVID-19 was part of the Pan-European hackathon #EUvsVirus in April 2020 and is publicly available as a resource through a GitHub repository and a DOI.

8.
BMC Med Inform Decis Mak ; 22(1): 271, 2022 10 17.
Article in English | MEDLINE | ID: mdl-36253849

ABSTRACT

BACKGROUND: Dementia develops as cognitive abilities deteriorate, and early detection is critical for effective preventive interventions. However, mainstream diagnostic tests and screening tools, such as CAMCOG and MMSE, often fail to detect dementia accurately. Various graph-based or feature-dependent prediction and progression models have been proposed. Whenever these models exploit information in the patients' Electronic Medical Records, they represent promising options to identify the presence and severity of dementia more precisely. METHODS: The methods presented in this paper aim to address two problems related to dementia: (a) Basic diagnosis: identifying the presence of dementia in individuals, and (b) Severity diagnosis: predicting the presence of dementia, as well as the severity of the disease. We formulate these two tasks as classification problems and address them using machine learning models based on random forests and decision trees, analysing structured clinical data from an elderly population cohort. We perform a hybrid data curation strategy in which a dementia expert is involved to verify that curation decisions are meaningful. We then employ the machine learning algorithms that classify individual episodes into a specific dementia class. Decision trees are also used for enhancing the explainability of decisions made by prediction models, allowing medical experts to identify the most crucial patient features and their threshold values for the classification of dementia. RESULTS: Our experimental results show that baseline arithmetic or cognitive tests, along with demographic features, can predict dementia and its severity with high accuracy. Specifically, our prediction models have reached an average F1-score of 0.93 and 0.81 for problems (a) and (b), respectively. Moreover, the decision trees produced for the two problems enhance the interpretability of the prediction models. CONCLUSIONS: This study shows that the existence and severity of dementia can be estimated accurately by analysing various electronic medical record features and cognitive tests from the episodes of the elderly population. Moreover, a set of decision rules may comprise the building blocks for an efficient patient classification. Relevant clinical and screening test features (e.g. simple arithmetic or animal fluency tasks) represent precise predictors without calculating the scores of mainstream cognitive tests such as MMSE and CAMCOG. Such a predictive model can identify not only meaningful features, but also justifications of classification. As a result, the predictive power of machine learning models over curated clinical data is demonstrated, paving the path for a more accurate diagnosis of dementia.


Subject(s)
Dementia , Machine Learning , Aged , Algorithms , Dementia/diagnosis , Dementia/psychology , Electronic Health Records , Humans , Neuropsychological Tests
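
The abstract notes that decision trees expose the features and threshold values that drive a classification. The smallest possible instance of that idea is a single split on one feature (a decision stump), sketched below. This is an illustration, not the study's pipeline: the function name is hypothetical, and the rule direction (a value at or below the threshold predicts class 1) is an assumption chosen for the example.

```python
from typing import List, Tuple


def best_threshold(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, accuracy) for the rule `predict 1 if value <= threshold`.

    Candidate thresholds are the midpoints between consecutive distinct values,
    which is how tree learners usually enumerate splits on a numeric feature.
    """
    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct feature values to split")
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        correct = sum((v <= t) == bool(y) for v, y in zip(values, labels))
        accuracy = correct / len(values)
        if accuracy > best_acc:  # keep the first threshold reaching the best accuracy
            best_t, best_acc = t, accuracy
    return best_t, best_acc
```

A full decision tree repeats this search over all features and recurses on each side of the chosen split, so every root-to-leaf path reads as a rule of feature thresholds that a clinician can inspect.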
9.
Nucleic Acids Res ; 50(W1): W191-W198, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35670672

ABSTRACT

The development of the CRISPR-Cas9 technology has provided a simple yet powerful system for genome editing. Current gRNA design tools serve as an important platform for the efficient application of the CRISPR systems. However, most of the existing tools are black-box models that suffer from limitations, such as variable performance and unclear mechanism of decision making. Here, we introduce CRISPRedict, an interpretable gRNA efficiency prediction model for CRISPR-Cas9 gene editing. Its strength lies in the fact that it can accurately predict efficient guide RNAs, with performance equivalent to state-of-the-art tools, while being a simple linear model. Implemented as a user-friendly web server, CRISPRedict offers (i) quick and accurate predictions across various experimental conditions (e.g. U6/T7 transcription); (ii) regression and classification models for scoring gRNAs and (iii) multiple visualizations to explain the obtained results. Given its performance, interpretability, and versatility, we expect that it will assist researchers in the gRNA design process and facilitate genome editing research. CRISPRedict is available for use at http://www.crispredict.org/.


Subject(s)
CRISPR-Cas Systems , Computers , Gene Editing , Internet , Linear Models , RNA, Guide, CRISPR-Cas Systems , Software , CRISPR-Cas Systems/genetics , Gene Editing/methods , RNA, Guide, CRISPR-Cas Systems/chemistry , RNA, Guide, CRISPR-Cas Systems/genetics , RNA, Guide, CRISPR-Cas Systems/metabolism , Data Visualization
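
To make concrete why a linear model over a guide sequence is interpretable, the following sketch scores a sequence as a weighted sum of position-specific one-hot features. It is a generic illustration under stated assumptions: the feature set, the function names and any weights passed in are hypothetical and are not CRISPRedict's actual features or coefficients.

```python
import math
from typing import List, Tuple

NUCLEOTIDES = "ACGT"


def one_hot(sequence: str) -> List[float]:
    """Position-specific one-hot encoding: four features per nucleotide."""
    features: List[float] = []
    for base in sequence.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected nucleotide: {base!r}")
        features.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return features


def linear_score(features: List[float], weights: List[float], bias: float = 0.0) -> float:
    """Regression-style score: bias plus the dot product of weights and features."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return bias + sum(w * x for w, x in zip(weights, features))


def sigmoid(score: float) -> float:
    """Map a linear score to (0, 1) for a classification-style output."""
    return 1.0 / (1.0 + math.exp(-score))


def contributions(sequence: str, weights: List[float]) -> List[Tuple[int, str, float]]:
    """Per-position contribution (position, base, weight) of the observed bases."""
    result = []
    for i, base in enumerate(sequence.upper()):
        idx = 4 * i + NUCLEOTIDES.index(base)
        result.append((i, base, weights[idx]))
    return result
```

Since the score is a plain sum, each observed nucleotide's weight is its exact contribution to the prediction, which is the kind of explanation a black-box model cannot provide directly.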
10.
Nucleic Acids Res ; 50(7): 3616-3637, 2022 04 22.
Article in English | MEDLINE | ID: mdl-35349718

ABSTRACT

The clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) system has become a successful and promising technology for gene editing. To facilitate its effective application, various computational tools have been developed. These tools can assist researchers in the guide RNA (gRNA) design process by predicting cleavage efficiency and specificity and excluding undesirable targets. However, while many tools are available, assessments of their application scenarios and performance benchmarks are limited. Moreover, new deep learning tools have been explored lately for gRNA efficiency prediction, but have not been systematically evaluated. Here, we discuss the approaches that pertain to the on-target activity problem, focusing mainly on the features and computational methods they utilize. Furthermore, we evaluate these tools on independent datasets and give some suggestions for their usage. We conclude with some challenges and perspectives about future directions for CRISPR-Cas9 guide design.


Subject(s)
Deep Learning , Gene Editing , RNA, Guide, Kinetoplastida , CRISPR-Cas Systems , Gene Editing/methods , RNA, Guide, Kinetoplastida/genetics
11.
Emerg Top Life Sci ; 5(6): 789-802, 2021 12 21.
Article in English | MEDLINE | ID: mdl-34665257

ABSTRACT

The field of structural proteomics, which is focused on studying the structure-function relationship of proteins and protein complexes, is experiencing rapid growth. Since the early 2000s, structural databases such as the Protein Data Bank have been storing increasing amounts of protein structural data, and modeled structures have become increasingly available. This, combined with the recent advances in graph-based machine-learning models, enables the use of protein structural data in predictive models, with the goal of creating tools that will advance our understanding of protein function. Similar to the application of graph learning tools to molecular graphs, which is currently undergoing rapid development, there is also an increasing trend in using graph learning approaches on protein structures. In this short review paper, we survey studies that use graph learning techniques on proteins, and examine their successes and shortcomings, while also discussing future directions.


Subject(s)
Machine Learning , Proteomics , Databases, Protein , Learning , Proteins
12.
J Neuromuscul Dis ; 8(6): 1097-1108, 2021.
Article in English | MEDLINE | ID: mdl-34334415

ABSTRACT

BACKGROUND: For patients with rare diseases such as Duchenne and Becker muscular dystrophy (DMD/BMD), access to their health data is key to being able to advocate for themselves and be in control of their care. Since 2018, the DMD/BMD patient community has been committed to making DMD/BMD-related data FAIR, i.e., Findable, Accessible, Interoperable, and Reusable. On March 3, 2021, the second international meeting on FAIR data sharing for DMD/BMD was held virtually. OBJECTIVE: The aim of this meeting report is to summarize the presentations and discussions of the meeting. METHODS: During this meeting, the progress of FAIRification efforts since the first international meeting in 2019, new developments, stakeholder perspectives, and experiences from implementing FAIR data principles in practice were presented and discussed. RESULTS: Over 120 attendees representing various stakeholder groups (i.e., patient organizations, clinicians, clinical and academic researchers, pharmaceutical companies, regulators, and EU organizations) from 22 countries participated in the meeting. This meeting report summarizes the presentations and discussions from the meeting, provides an overview of the key lessons learned since the first meeting, and outlines the next steps. CONCLUSIONS: Patient organizations are key drivers of the FAIRification process in practice, and dialogue with stakeholders is critical to success.


Subject(s)
Delivery of Health Care , Information Dissemination , Muscular Dystrophy, Duchenne , Congresses as Topic , Humans , Patient Advocacy
13.
BMC Bioinformatics ; 16: 138, 2015 Apr 30.
Article in English | MEDLINE | ID: mdl-25925131

ABSTRACT

BACKGROUND: This article provides an overview of the first BIOASQ challenge, a competition on large-scale biomedical semantic indexing and question answering (QA), which took place between March and September 2013. BIOASQ assesses the ability of systems to semantically index very large numbers of biomedical scientific articles, and to return concise and user-understandable answers to given natural language questions by combining information from biomedical articles and ontologies. RESULTS: The 2013 BIOASQ competition comprised two tasks, Task 1a and Task 1b. In Task 1a participants were asked to automatically annotate new PUBMED documents with MESH headings. Twelve teams participated in Task 1a, with a total of 46 system runs submitted, and one of the teams performed consistently better than the MTI indexer used by NLM to suggest MESH headings to curators. Task 1b used benchmark datasets containing 29 development and 282 test English questions, along with gold standard (reference) answers, prepared by a team of biomedical experts from around Europe, and participants had to automatically produce answers. Three teams participated in Task 1b, with 11 system runs. The BIOASQ infrastructure, including benchmark datasets, evaluation mechanisms, and the results of the participants and baseline methods, is publicly available. CONCLUSIONS: A publicly available evaluation infrastructure for biomedical semantic indexing and QA has been developed, which includes benchmark datasets, and can be used to evaluate systems that: assign MESH headings to published articles or to English questions; retrieve relevant RDF triples from ontologies, relevant articles and snippets from PUBMED Central; produce "exact" and paragraph-sized "ideal" answers (summaries). The results of the systems that participated in the 2013 BIOASQ competition are promising. In Task 1a one of the systems performed consistently better than the NLM's MTI indexer. In Task 1b the systems received high scores in the manual evaluation of the "ideal" answers; hence, they produced high quality summaries as answers. Overall, BIOASQ helped obtain a unified view of how techniques from text classification, semantic indexing, document and passage retrieval, question answering, and text summarization can be combined to allow biomedical experts to obtain concise, user-understandable answers to questions reflecting their real information needs.


Subject(s)
Abstracting and Indexing/methods , Algorithms , Medical Subject Headings , Natural Language Processing , PubMed , Semantics , Software , Humans , National Library of Medicine (U.S.) , United States