Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 7 de 7
Filter
Add more filters










Database
Language
Publication year range
1.
Front Vet Sci ; 11: 1352239, 2024.
Article in English | MEDLINE | ID: mdl-38322169

ABSTRACT

The development of natural language processing techniques for deriving useful information from unstructured clinical narratives is a fast-paced and rapidly evolving area of machine learning research. Large volumes of veterinary clinical narratives now exist curated by projects such as the Small Animal Veterinary Surveillance Network (SAVSNET) and VetCompass, and the application of such techniques to these datasets is already (and will continue to) improve our understanding of disease and disease patterns within veterinary medicine. In part one of this two part article series, we discuss the importance of understanding the lexical structure of clinical records and discuss the use of basic tools for filtering records based on key words and more complex rule based pattern matching approaches. We discuss the strengths and weaknesses of these approaches highlighting the on-going potential value in using these "traditional" approaches but ultimately recognizing that these approaches constrain how effectively information retrieval can be automated. This sets the scene for the introduction of machine-learning methodologies and the plethora of opportunities for automation of information extraction these present which is discussed in part two of the series.

2.
JMIR Med Inform ; 8(8): e16948, 2020 Aug 06.
Article in English | MEDLINE | ID: mdl-32759099

ABSTRACT

BACKGROUND: How to treat a disease remains to be the most common type of clinical question. Obtaining evidence-based answers from biomedical literature is difficult. Analogical reasoning with embeddings from deep learning (embedding analogies) may extract such biomedical facts, although the state-of-the-art focuses on pair-based proportional (pairwise) analogies such as man:woman::king:queen ("queen = -man +king +woman"). OBJECTIVE: This study aimed to systematically extract disease treatment statements with a Semantic Deep Learning (SemDeep) approach underpinned by prior knowledge and another type of 4-term analogy (other than pairwise). METHODS: As preliminaries, we investigated Continuous Bag-of-Words (CBOW) embedding analogies in a common-English corpus with five lines of text and observed a type of 4-term analogy (not pairwise) applying the 3CosAdd formula and relating the semantic fields person and death: "dagger = -Romeo +die +died" (search query: -Romeo +die +died). Our SemDeep approach worked with pre-existing items of knowledge (what is known) to make inferences sanctioned by a 4-term analogy (search query -x +z1 +z2) from CBOW and Skip-gram embeddings created with a PubMed systematic reviews subset (PMSB dataset). Stage1: Knowledge acquisition. Obtaining a set of terms, candidate y, from embeddings using vector arithmetic. Some n-gram pairs from the cosine and validated with evidence (prior knowledge) are the input for the 3cosAdd, seeking a type of 4-term analogy relating the semantic fields disease and treatment. Stage 2: Knowledge organization. Identification of candidates sanctioned by the analogy belonging to the semantic field treatment and mapping these candidates to unified medical language system Metathesaurus concepts with MetaMap. A concept pair is a brief disease treatment statement (biomedical fact). Stage 3: Knowledge validation. An evidence-based evaluation followed by human validation of biomedical facts potentially useful for clinicians. RESULTS: We obtained 5352 n-gram pairs from 446 search queries by applying the 3CosAdd. The microaveraging performance of MetaMap for candidate y belonging to the semantic field treatment was F-measure=80.00% (precision=77.00%, recall=83.25%). We developed an empirical heuristic with some predictive power for clinical winners, that is, search queries bringing candidate y with evidence of a therapeutic intent for target disease x. The search queries -asthma +inhaled_corticosteroids +inhaled_corticosteroid and -epilepsy +valproate +antiepileptic_drug were clinical winners, finding eight evidence-based beneficial treatments. CONCLUSIONS: Extracting treatments with therapeutic intent by analogical reasoning from embeddings (423K n-grams from the PMSB dataset) is an ambitious goal. Our SemDeep approach is knowledge-based, underpinned by embedding analogies that exploit prior knowledge. Biomedical facts from embedding analogies (4-term type, not pairwise) are potentially useful for clinicians. The heuristic offers a practical way to discover beneficial treatments for well-known diseases. Learning from deep learning models does not require a massive amount of data. Embedding analogies are not limited to pairwise analogies; hence, analogical reasoning with embeddings is underexploited.

3.
J Biomed Semantics ; 10(Suppl 1): 22, 2019 11 12.
Article in English | MEDLINE | ID: mdl-31711540

ABSTRACT

BACKGROUND: Deep Learning opens up opportunities for routinely scanning large bodies of biomedical literature and clinical narratives to represent the meaning of biomedical and clinical terms. However, the validation and integration of this knowledge on a scale requires cross checking with ground truths (i.e. evidence-based resources) that are unavailable in an actionable or computable form. In this paper we explore how to turn information about diagnoses, prognoses, therapies and other clinical concepts into computable knowledge using free-text data about human and animal health. We used a Semantic Deep Learning approach that combines the Semantic Web technologies and Deep Learning to acquire and validate knowledge about 11 well-known medical conditions mined from two sets of unstructured free-text data: 300 K PubMed Systematic Review articles (the PMSB dataset) and 2.5 M veterinary clinical notes (the VetCN dataset). For each target condition we obtained 20 related clinical concepts using two deep learning methods applied separately on the two datasets, resulting in 880 term pairs (target term, candidate term). Each concept, represented by an n-gram, is mapped to UMLS using MetaMap; we also developed a bespoke method for mapping short forms (e.g. abbreviations and acronyms). Existing ontologies were used to formally represent associations. We also create ontological modules and illustrate how the extracted knowledge can be queried. The evaluation was performed using the content within BMJ Best Practice. RESULTS: MetaMap achieves an F measure of 88% (precision 85%, recall 91%) when applied directly to the total of 613 unique candidate terms for the 880 term pairs. When the processing of short forms is included, MetaMap achieves an F measure of 94% (precision 92%, recall 96%). Validation of the term pairs with BMJ Best Practice yields precision between 98 and 99%. CONCLUSIONS: The Semantic Deep Learning approach can transform neural embeddings built from unstructured free-text data into reliable and reusable One Health knowledge using ontologies and content from BMJ Best Practice.


Subject(s)
Deep Learning , Knowledge Bases , One Health , PubMed , Semantics , Systematic Reviews as Topic , Veterinarians , Biological Ontologies
4.
J Biomed Semantics ; 9(1): 13, 2018 04 12.
Article in English | MEDLINE | ID: mdl-29650041

ABSTRACT

BACKGROUND: Automatic identification of term variants or acceptable alternative free-text terms for gene and protein names from the millions of biomedical publications is a challenging task. Ontologies, such as the Cardiovascular Disease Ontology (CVDO), capture domain knowledge in a computational form and can provide context for gene/protein names as written in the literature. This study investigates: 1) if word embeddings from Deep Learning algorithms can provide a list of term variants for a given gene/protein of interest; and 2) if biological knowledge from the CVDO can improve such a list without modifying the word embeddings created. METHODS: We have manually annotated 105 gene/protein names from 25 PubMed titles/abstracts and mapped them to 79 unique UniProtKB entries corresponding to gene and protein classes from the CVDO. Using more than 14 M PubMed articles (titles and available abstracts), word embeddings were generated with CBOW and Skip-gram. We setup two experiments for a synonym detection task, each with four raters, and 3672 pairs of terms (target term and candidate term) from the word embeddings created. For Experiment I, the target terms for 64 UniProtKB entries were those that appear in the titles/abstracts; Experiment II involves 63 UniProtKB entries and the target terms are a combination of terms from PubMed titles/abstracts with terms (i.e. increased context) from the CVDO protein class expressions and labels. RESULTS: In Experiment I, Skip-gram finds term variants (full and/or partial) for 89% of the 64 UniProtKB entries, while CBOW finds term variants for 67%. In Experiment II (with the aid of the CVDO), Skip-gram finds term variants for 95% of the 63 UniProtKB entries, while CBOW finds term variants for 78%. Combining the results of both experiments, Skip-gram finds term variants for 97% of the 79 UniProtKB entries, while CBOW finds term variants for 81%. CONCLUSIONS: This study shows performance improvements for both CBOW and Skip-gram on a gene/protein synonym detection task by adding knowledge formalised in the CVDO and without modifying the word embeddings created. Hence, the CVDO supplies context that is effective in inducing term variability for both CBOW and Skip-gram while reducing ambiguity. Skip-gram outperforms CBOW and finds more pertinent term variants for gene/protein names annotated from the scientific literature.


Subject(s)
Biological Ontologies , Cardiovascular Diseases , Deep Learning , Cardiovascular Diseases/genetics , Cardiovascular Diseases/metabolism , Humans , Molecular Sequence Annotation , PubMed , ROC Curve
5.
Stud Health Technol Inform ; 235: 516-520, 2017.
Article in English | MEDLINE | ID: mdl-28423846

ABSTRACT

We investigate the application of distributional semantics models for facilitating unsupervised extraction of biomedical terms from unannotated corpora. Term extraction is used as the first step of an ontology learning process that aims to (semi-)automatic annotation of biomedical concepts and relations from more than 300K PubMed titles and abstracts. We experimented with both traditional distributional semantics methods such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) as well as the neural language models CBOW and Skip-gram from Deep Learning. The evaluation conducted concentrates on sepsis, a major life-threatening condition, and shows that Deep Learning models outperform LSA and LDA with much higher precision.


Subject(s)
Machine Learning , PubMed , Semantics , Sepsis , Humans , Information Storage and Retrieval , Natural Language Processing
6.
J Biomed Semantics ; 7(1): 33, 2016 06 04.
Article in English | MEDLINE | ID: mdl-27259807

ABSTRACT

BACKGROUND: The Proteasix Ontology (PxO) is an ontology that supports the Proteasix tool; an open-source peptide-centric tool that can be used to predict automatically and in a large-scale fashion in silico the proteases involved in the generation of proteolytic cleavage fragments (peptides) METHODS: The PxO re-uses parts of the Protein Ontology, the three Gene Ontology sub-ontologies, the Chemical Entities of Biological Interest Ontology, the Sequence Ontology and bespoke extensions to the PxO in support of a series of roles: 1. To describe the known proteases and their target cleaveage sites. 2. To enable the description of proteolytic cleaveage fragments as the outputs of observed and predicted proteolysis. 3. To use knowledge about the function, species and cellular location of a protease and protein substrate to support the prioritisation of proteases in observed and predicted proteolysis. RESULTS: The PxO is designed to describe the biological underpinnings of the generation of peptides. The peptide-centric PxO seeks to support the Proteasix tool by separating domain knowledge from the operational knowledge used in protease prediction by Proteasix and to support the confirmation of its analyses and results. AVAILABILITY: The Proteasix Ontology may be found at: http://bioportal.bioontology.org/ontologies/PXO . This ontology is free and open for use by everyone.


Subject(s)
Computational Biology/methods , Gene Ontology , Proteolysis , Peptide Hydrolases/metabolism , Software
7.
JMIR Med Inform ; 3(1): e4, 2015 Jan 07.
Article in English | MEDLINE | ID: mdl-25785897

ABSTRACT

BACKGROUND: Patients with multiple conditions have complex needs and are increasing in number as populations age. This multimorbidity is one of the greatest challenges facing health care. Having more than 1 condition generates (1) interactions between pathologies, (2) duplication of tests, (3) difficulties in adhering to often conflicting clinical practice guidelines, (4) obstacles in the continuity of care, (5) confusing self-management information, and (6) medication errors. In this context, clinical decision support (CDS) systems need to be able to handle realistic complexity and minimize iatrogenic risks. OBJECTIVE: The aim of this review was to identify to what extent CDS is adopted in multimorbidity. METHODS: This review followed PRISMA guidance and adopted a multidisciplinary approach. Scopus and PubMed searches were performed by combining terms from 3 different thesauri containing synonyms for (1) multimorbidity and comorbidity, (2) polypharmacy, and (3) CDS. The relevant articles were identified by examining the titles and abstracts. The full text of selected/relevant articles was analyzed in-depth. For articles appropriate for this review, data were collected on clinical tasks, diseases, decision maker, methods, data input context, user interface considerations, and evaluation of effectiveness. RESULTS: A total of 50 articles were selected for the full in-depth analysis and 20 studies were included in the final review. Medication (n=10) and clinical guidance (n=8) were the predominant clinical tasks. Four studies focused on merging concurrent clinical practice guidelines. A total of 17 articles reported their CDS systems were knowledge-based. Most articles reviewed considered patients' clinical records (n=19), clinical practice guidelines (n=12), and clinicians' knowledge (n=10) as contextual input data. The most frequent diseases mentioned were cardiovascular (n=9) and diabetes mellitus (n=5). In all, 12 articles mentioned generalist doctor(s) as the decision maker(s). For articles reviewed, there were no studies referring to the active involvement of the patient in the decision-making process or to patient self-management. None of the articles reviewed adopted mobile technologies. There were no rigorous evaluations of usability or effectiveness of the CDS systems reported. CONCLUSIONS: This review shows that multimorbidity is underinvestigated in the informatics of supporting clinical decisions. CDS interventions that systematize clinical practice guidelines without considering the interactions of different conditions and care processes may lead to unhelpful or harmful clinical actions. To improve patient safety in multimorbidity, there is a need for more evidence about how both conditions and care processes interact. The data needed to build this evidence base exist in many electronic health record systems and are underused.

SELECTION OF CITATIONS
SEARCH DETAIL
...