Results 1 - 20 of 110
1.
J Biomed Inform ; 154: 104650, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38701887

ABSTRACT

BACKGROUND: Distinguishing diseases into distinct subtypes is crucial for study and effective treatment strategies. The Open Targets Platform (OT) integrates biomedical, genetic, and biochemical datasets to empower disease ontologies, classifications, and potential gene targets. Nevertheless, many disease annotations are incomplete, requiring laborious expert medical input. This challenge is especially pronounced for rare and orphan diseases, where resources are scarce. METHODS: We present a machine learning approach to identifying diseases with potential subtypes, using the approximately 23,000 diseases documented in OT. We derive novel features for predicting diseases with subtypes using direct evidence. Machine learning models were applied to analyze feature importance and evaluate predictive performance for discovering both known and novel disease subtypes. RESULTS: Our model achieves a high (89.4%) ROC AUC (Area Under the Receiver Operating Characteristic Curve) in identifying known disease subtypes. We integrated pre-trained deep-learning language models and showed their benefits. Moreover, we identify 515 disease candidates predicted to possess previously unannotated subtypes. CONCLUSIONS: Our models can partition diseases into distinct subtypes. This methodology enables a robust, scalable approach for improving knowledge-based annotations and a comprehensive assessment of disease ontology tiers. Our candidates are attractive targets for further study and personalized medicine, potentially aiding in the unveiling of new therapeutic indications for sought-after targets.


Subjects
Machine Learning , Humans , Disease/classification , ROC Curve , Computational Biology/methods , Algorithms , Deep Learning
2.
BMC Med Inform Decis Mak ; 23(Suppl 4): 299, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38326827

ABSTRACT

BACKGROUND: In this era of big data, data harmonization is an important step to ensure reproducible, scalable, and collaborative research. Thus, terminology mapping is a necessary step to harmonize heterogeneous data. For example, the mapping between the Medical Dictionary for Regulatory Activities (MedDRA) and the International Classification of Diseases (ICD) is essential for drug safety and pharmacovigilance research. Our main objective is to provide a quantitative and qualitative analysis of the mapping status between MedDRA and ICD. We focus on evaluating the current mapping status between MedDRA and ICD through the Unified Medical Language System (UMLS) and the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM). We summarized the current mapping statistics and evaluated the quality of the current MedDRA-ICD mapping; for unmapped terms, we used our self-developed algorithm to rank the best possible mapping candidates for additional mapping coverage. RESULTS: The identified MedDRA-ICD mapped pairs cover 27.23% of the overall MedDRA preferred terms (PT). The systematic quality analysis demonstrated that, among the mapped pairs provided by UMLS, only 51.44% are considered an exact match. For the 2400 sampled unmapped terms, 56 of the 2400 MedDRA Preferred Terms (PT) could have exact match terms from ICD. CONCLUSION: Some of the mapped pairs between MedDRA and ICD are not exact matches due to differences in granularity and focus. For 72% of the unmapped PT terms, the identified exact match pairs illustrate the possibility of identifying additional mapped pairs. Referring to its own mapping standard, some of the unmapped terms should qualify for the expansion of MedDRA to ICD mapping in UMLS.


Subjects
Adverse Drug Reaction Reporting Systems , International Classification of Diseases , Humans , Unified Medical Language System , Pharmacovigilance , Algorithms
3.
J Am Med Inform Assoc ; 31(2): 426-434, 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-37952122

ABSTRACT

OBJECTIVE: To construct an exhaustive Complementary and Integrative Health (CIH) Lexicon (CIHLex) to help better represent the often underrepresented physical and psychological CIH approaches in standard terminologies, and to also apply state-of-the-art natural language processing (NLP) techniques to help recognize them in the biomedical literature. MATERIALS AND METHODS: We constructed the CIHLex by integrating various resources, compiling and integrating data from biomedical literature and relevant sources of knowledge. The Lexicon encompasses 724 unique concepts with 885 corresponding unique terms. We matched these concepts to the Unified Medical Language System (UMLS), and we developed and utilized BERT models comparing their efficiency in CIH named entity recognition to well-established models including MetaMap and CLAMP, as well as the large language model GPT3.5-turbo. RESULTS: Of the 724 unique concepts in CIHLex, 27.2% could be matched to at least one term in the UMLS. About 74.9% of the mapped UMLS Concept Unique Identifiers were categorized as "Therapeutic or Preventive Procedure." Among the models applied to CIH named entity recognition, BLUEBERT delivered the highest macro-average F1-score of 0.91, surpassing other models. CONCLUSION: Our CIHLex significantly augments representation of CIH approaches in biomedical literature. Demonstrating the utility of advanced NLP models, BERT notably excelled in CIH entity recognition. These results highlight promising strategies for enhancing standardization and recognition of CIH terminology in biomedical contexts.


Subjects
Algorithms , Unified Medical Language System , Natural Language Processing , Language
4.
J Am Med Inform Assoc ; 30(12): 1895-1903, 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-37615994

ABSTRACT

OBJECTIVE: Outcomes are important clinical study information. Despite progress in automated extraction of PICO (Population, Intervention, Comparison, and Outcome) entities from PubMed, rarely are these entities encoded by standard terminology to achieve semantic interoperability. This study aims to evaluate the suitability of the Unified Medical Language System (UMLS) and SNOMED-CT in encoding outcome concepts in randomized controlled trial (RCT) abstracts. MATERIALS AND METHODS: We iteratively developed and validated an outcome annotation guideline and manually annotated clinically significant outcome entities in the Results and Conclusions sections of 500 randomly selected RCT abstracts on PubMed. The extracted outcomes were fully, partially, or not mapped to the UMLS via MetaMap based on established heuristics. Manual UMLS browser search was performed for select unmapped outcome entities to further differentiate between UMLS and MetaMap errors. RESULTS: Only 44% of 2617 outcome concepts were fully covered in the UMLS, among which 67% were complex concepts that required the combination of 2 or more UMLS concepts to represent them. SNOMED-CT was present as a source in 61% of the fully mapped outcomes. DISCUSSION: Domains such as Metabolism and Nutrition, and Infections and Infectious Diseases need expanded outcome concept coverage in the UMLS and MetaMap. Future work is warranted to similarly assess the terminology coverage for P, I, C entities. CONCLUSION: Computational representation of clinical outcomes is important for clinical evidence extraction and appraisal and yet faces challenges from the inherent complexity and lack of coverage of these concepts in UMLS and SNOMED-CT, as demonstrated in this study.


Subjects
Systematized Nomenclature of Medicine , Unified Medical Language System , PubMed , Randomized Controlled Trials as Topic
5.
J Am Med Inform Assoc ; 30(12): 1887-1894, 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-37528056

ABSTRACT

OBJECTIVE: Use heuristic, deep learning (DL), and hybrid AI methods to predict semantic group (SG) assignments for new UMLS Metathesaurus atoms, with target accuracy ≥95%. MATERIALS AND METHODS: We used train-test datasets from successive 2020AA-2022AB UMLS Metathesaurus releases. Our heuristic "waterfall" approach employed a sequence of 7 different SG prediction methods. Atoms not qualifying for a method were passed on to the next method. The DL approach generated BioWordVec and SapBERT embeddings for atom names, BioWordVec embeddings for source vocabulary names, and BioWordVec embeddings for atom names of the second-to-top nodes of an atom's source hierarchy. We fed a concatenation of the 4 embeddings into a fully connected multilayer neural network with an output layer of 15 nodes (one for each SG). For both approaches, we developed methods to estimate the probability that their predicted SG for an atom would be correct. Based on these estimations, we developed 2 hybrid SG prediction methods combining the strengths of heuristic and DL methods. RESULTS: The heuristic waterfall approach accurately predicted 94.3% of SGs for 1 563 692 new unseen atoms. The DL accuracy on the same dataset was also 94.3%. The hybrid approaches achieved an average accuracy of 96.5%. CONCLUSION: Our study demonstrated that AI methods can predict SG assignments for new UMLS atoms with sufficient accuracy to be potentially useful as an intermediate step in the time-consuming task of assigning new atoms to UMLS concepts. We showed that for SG prediction, combining heuristic methods and DL methods can produce better results than either alone.


Subjects
Deep Learning , Heuristics , Semantics , Unified Medical Language System , Neural Networks, Computer , Unified Medical Language System
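The heuristic "waterfall" described in this abstract (a sequence of prediction methods, each passing unqualified atoms on to the next) can be illustrated with a minimal sketch. The two toy rules, the function names, and the `UNKNOWN` fallback are hypothetical and are not the authors' seven methods; `DISO` and `CHEM` are used only as example semantic group labels.

```python
from typing import Callable, Optional, Sequence

# A predictor returns a semantic group label, or None when the atom
# does not qualify for that method.
Predictor = Callable[[str], Optional[str]]


def waterfall_predict(atom_name: str,
                      methods: Sequence[Predictor],
                      fallback: str = "UNKNOWN") -> str:
    """Try each method in order; the first one that qualifies decides."""
    for method in methods:
        label = method(atom_name)
        if label is not None:
            return label
    # No method qualified, so return the (hypothetical) fallback label.
    return fallback


# Hypothetical toy rules, for illustration only.
def by_suffix(name: str) -> Optional[str]:
    return "DISO" if name.lower().endswith("itis") else None


def by_keyword(name: str) -> Optional[str]:
    return "CHEM" if "acid" in name.lower() else None


METHODS = [by_suffix, by_keyword]

for atom in ["Hepatitis", "Folic acid", "Femur"]:
    print(atom, "->", waterfall_predict(atom, METHODS))
```

Ordering the methods from most to least reliable is one plausible design choice for such a cascade; the abstract does not state how the authors ordered theirs.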
6.
Front Public Health ; 11: 1169222, 2023.
Article in English | MEDLINE | ID: mdl-37377542

ABSTRACT

Introduction: Emergency Medical Language Services (EMLS) have played a crucial role in the COVID-19 pandemic. Research on the quality of EMLS and its influencing factors is necessary. Methods: This study used the SERvice QUALity (SERVQUAL) model to determine factors affecting the quality of EMLS during the pandemic. An online questionnaire was completed by 206 participants who received the service in 2021-2022. Structural Equation Modeling (SEM) indicated that the service provider and service process significantly influenced the service results. Results: In the service process, the evaluation of service content and responsiveness were highly correlated, and both factors significantly affected user satisfaction. In the service provider, tangibility and reliability were highly correlated. The key factors for user willingness to recommend the service were service content and tangibility. Discussion: Based on the results of the data analysis, it can be inferred that EMLS should be improved and upgraded in terms of service organization, talent cultivation, and service channel expansion. To enhance service organization, an emergency medical language team should establish a close collaboration with local medical institutions and government departments, and an EMLS center should be established with the support of hospitals, government, or civil organizations.


Subjects
COVID-19 , Public Health , Humans , Reproducibility of Results , Pandemics , COVID-19/epidemiology , Language
7.
Int J Med Inform ; 170: 104928, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36442443

ABSTRACT

OBJECTIVE: Study identification refers to formalizing an effective search over biomedical databases for retrieving all eligible evidence for a systematic review. Manual construction of queries, where a user submits a search query for which a biomedical search system such as PubMed would identify the most relevant documents, has been recognized as a very costly step in conducting systematic reviews. The objective of this paper is to present an automatic query generation approach to reduce the time and labor cost of manual biomedical study identification. MATERIALS AND METHODS: The evaluation benchmark is the widely adopted CLEF 2018 Technology Assisted Reviews (TAR) collection, with 72 systematic reviews on Diagnosis Test Accuracy. We use and fine-tune pre-trained language models for generating high-level key-phrases and their dense embeddings. We constructed and published a dataset consisting of almost one million PubMed articles' abstracts and their keywords for fine-tuning pre-trained language models. We also use concepts that are represented in the Unified Medical Language System (UMLS) for query expansion and embedding generation. We exploit and test different clustering methods, namely Agglomerative clustering, Affinity Propagation, and K-Means, over the generated embeddings to form query clauses. RESULTS: Our proposed methods outperform existing state-of-the-art automatic query generation models across Precision (0.0821 compared with 0.005), Recall (0.9676 compared with 0.878), and F-measures (0.2898 compared with 0.0356 in F3 measure). In addition, some of the proposed methods can even outperform manually crafted queries in some specific measures. CONCLUSION: The proposed model in this paper can be utilized to form an effective initial search query that can be further refined and updated by human reviewers for achieving the desired performance. For future work, we would like to explore the application of the presented query formalization methods in existing study identification methodologies and techniques, especially those that iteratively train machine learning models based on the domain experts' feedback on the relevancy of the retrieved studies.


Subjects
Semantics , Unified Medical Language System , Humans , PubMed , Machine Learning , Feedback
8.
Sichuan Da Xue Xue Bao Yi Xue Ban ; 54(6): 1263-1268, 2023 Nov 20.
Article in Chinese | MEDLINE | ID: mdl-38162053

ABSTRACT

Objective: In this study, we used artificial intelligence (AI) technology to explore automated medical record quality control methods, standardize the process for medical record documentation, and deal with the drawbacks of manually implemented quality control. Methods: In this study, we constructed a medical record quality control system based on AI. We first designed and built, for the system, a quality control rule base based on authoritative standards and expert opinions. Then, medical records data were automatically collected through a data acquisition engine and were converted into structured data through a post-structured engine. Finally, the medical record quality control engine was combined with the rule base to analyze the data, identify quality problems, and realize automated intelligent quality control. This system was applied to the quality control of medical records and five quality control points were selected, including similarities in the history of the present illness, defects in the description of chief complaints, incomplete initial diagnosis, missing information in the history of menstruation, marriage, and childbirth, and mismatch between the chief complaints and the history of the present illness. We randomly selected 2 918 medical records of patients discharged in January 2022 to conduct AI quality control. Then we organized medical record quality control experts to conduct an accuracy review, made a comparison with previous manual quality control records, and analyzed the results. The number of quality problems that were verified in the accuracy review was taken as the gold standard and receiver operating characteristic (ROC) curves were drawn for the 5 quality control points. Results: According to the accuracy review performed by medical record quality control experts, the accuracy of AI quality control reached 89.57%. For the sampled medical records, the results of AI quality control were compared with those of previous manually performed quality control and only one problem detected by manual quality control of the sampled medical records was not detected by the AI quality control system. The number of medical record quality problems correctly detected by AI quality control was about 2.97 times that of manual quality control. Analysis of the ROC curves showed that the AUC values of the five quality control points of the AI quality control system were statistically significant (P<0.05) and all the AUC values approximated or exceeded 0.9. In contrast, results obtained through manually performed quality control found a significant AUC (0.797) for only one quality control point, similarities in the history of present illness (P<0.05). Comparison of the AUC values of the two quality control methods showed that the AI quality control system had an advantage over manually performed quality control for the five quality control points. Conclusion: Through the application of a medical record quality control system based on AI, efficient full quality control of medical record documentation can be achieved and the detection rate of quality problems can be effectively improved. In addition, the system helps save manpower and improve the quality of medical record documentation.


Subjects
Artificial Intelligence , Medical Records , Female , Humans , ROC Curve , Quality Control
9.
Stud Health Technol Inform ; 299: 217-222, 2022 Nov 03.
Article in English | MEDLINE | ID: mdl-36325866

ABSTRACT

Mapping clinical attributes from hospital information systems to standardized terminologies may allow their scientific reuse for multicenter studies. The Unified Medical Language System (UMLS) defines synonyms in different terminologies, which could be valuable for achieving semantic interoperability between different sites. Here we aim to explore the potential relevance of UMLS concepts and associated semantic relations for widely used clinical terminologies in a German university hospital. To semi-automatically examine a sample of the 200 most frequent codes from Erlangen University Hospital for three relevant terminologies, we implemented a script that queries their UMLS representation and associated mappings via a programming interface. We found that 94% of frequent diagnostic codes were available in UMLS, and that most of these codes could be mapped to other terminologies such as SNOMED CT. We observed that all examined laboratory codes were represented in UMLS, and that various translations to other languages were available for these concepts. The classification that is most widely used in German hospitals for documenting clinical procedures was not originally represented in UMLS, but external mappings to SNOMED CT allowed identifying UMLS entries for 90.5% of frequent codes. Future research could extend this investigation to other code sets and terminologies, or study the potential utility of available mappings for specific applications.


Subjects
Systematized Nomenclature of Medicine , Unified Medical Language System , Humans , Semantics , Language , Translations
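The coverage figures in this abstract (for example, 94% of frequent diagnostic codes found in UMLS) amount to counting how many codes in a sample resolve to a concept. A minimal sketch of that calculation follows, assuming the lookup is injected as a function; the in-memory `toy_index`, its placeholder identifiers, and the sample codes are made up and stand in for whatever programming interface the authors actually queried.

```python
from typing import Callable, Iterable, Optional


def coverage(codes: Iterable[str],
             lookup: Callable[[str], Optional[str]]) -> float:
    """Percentage of codes for which the lookup returns a concept."""
    codes = list(codes)
    if not codes:
        return 0.0
    found = sum(1 for code in codes if lookup(code) is not None)
    return 100.0 * found / len(codes)


# Hypothetical stand-in for a remote terminology lookup: maps a local
# code to a placeholder concept identifier, or is missing the code.
toy_index = {"I10": "CUI-A", "E11.9": "CUI-B", "J18.9": "CUI-C"}
frequent_codes = ["I10", "E11.9", "J18.9", "Z99.X"]

# dict.get returns None for unknown codes, matching the lookup contract.
print(f"{coverage(frequent_codes, toy_index.get):.1f}% of sampled codes found")
```

Passing the lookup as a parameter keeps the counting logic testable without network access; a real script would substitute a function that calls the terminology service.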
10.
J Med Internet Res ; 24(11): e40361, 2022 Nov 25.
Article in English | MEDLINE | ID: mdl-36427233

ABSTRACT

BACKGROUND: Electronic medical records (EMRs) of patients with lung cancer (LC) capture a variety of health factors. Understanding the distribution of these factors will help identify key factors for risk prediction in preventive screening for LC. OBJECTIVE: We aimed to generate an integrated biomedical graph from EMR data and Unified Medical Language System (UMLS) ontology for LC, and to generate an LC health factor distribution from a hospital EMR of approximately 1 million patients. METHODS: The data were collected from 2 sets of 1397 patients with and those without LC. A patient-centered health factor graph was plotted with 108,000 standardized data, and a graph database was generated to integrate the graphs of patient health factors and the UMLS ontology. With the patient graph, we calculated the connection delta ratio (CDR) for each of the health factors to measure the relative strength of the factor's relationship to LC. RESULTS: The patient graph had 93,000 relations between the 2794 patient nodes and 650 factor nodes. An LC graph with 187 related biomedical concepts and 188 horizontal biomedical relations was plotted and linked to the patient graph. Searching the integrated biomedical graph with any number or category of health factors resulted in graphical representations of relationships between patients and factors, while searches using any patient presented the patient's health factors from the EMR and the LC knowledge graph (KG) from the UMLS in the same graph. Sorting the health factors by CDR in descending order generated a distribution of health factors for LC. The top 70 CDR-ranked factors of disease, symptom, medical history, observation, and laboratory test categories were verified to be concordant with those found in the literature. 
CONCLUSIONS: By collecting standardized data of thousands of patients with and those without LC from the EMR, it was possible to generate a hospital-wide patient-centered health factor graph for graph search and presentation. The patient graph could be integrated with the UMLS KG for LC and thus enable hospitals to bring continuously updated international standard biomedical KGs from the UMLS for clinical use in hospitals. CDR analysis of the graph of patients with LC generated a CDR-sorted distribution of health factors, in which the top CDR-ranked health factors were concordant with the literature. The resulting distribution of LC health factors can be used to help personalize risk evaluation and preventive screening recommendations.


Subjects
Electronic Health Records , Lung Neoplasms , Humans , Retrospective Studies , Unified Medical Language System , Lung Neoplasms/epidemiology , Hospitals
11.
JMIR Med Inform ; 10(9): e37812, 2022 Sep 13.
Article in English | MEDLINE | ID: mdl-36099001

ABSTRACT

BACKGROUND: Severe drug hypersensitivity reactions (DHRs) refer to allergic reactions caused by drugs and usually present with severe skin rashes and internal damage as the main symptoms. Reporting of severe DHRs in hospitals now solely occurs through spontaneous reporting systems (SRSs), which clinicians in charge operate. An automatic identification system scrutinizes clinical notes and reports potential severe DHR cases. OBJECTIVE: The goal of the research was to develop an automatic identification system for mining severe DHR cases and discover more DHR cases for further study. The proposed method was applied to 9 years of data in pediatrics electronic health records (EHRs) of Beijing Children's Hospital. METHODS: The phenotyping task was approached as a document classification problem. A DHR dataset containing tagged documents for training was prepared. Each document contains all the clinical notes generated during 1 inpatient visit in this data set. Document-level tags correspond to DHR types and a negative category. Strategies were evaluated for long document classification on the openly available National NLP Clinical Challenges 2016 smoking task. Four strategies were evaluated in this work: document truncation, hierarchy representation, efficient self-attention, and key sentence selection. In-domain and open-domain pretrained embeddings were evaluated on the DHR dataset. An automatic grid search was performed to tune statistical classifiers for the best performance over the transformed data. Inference efficiency and memory requirements of the best performing models were analyzed. The most efficient model for mining DHR cases from millions of documents in the EHR system was run. RESULTS: For long document classification, key sentence selection with guideline keywords achieved the best performance and was 9 times faster than hierarchy representation models for inference. The best model discovered 1155 DHR cases in Beijing Children's Hospital EHR system. 
After double-checking by clinician experts, 357 cases of severe DHRs were finally identified. For the smoking challenge, our model reached the record of state-of-the-art performance (94.1% vs 94.2%). CONCLUSIONS: The proposed method discovered 357 positive DHR cases from a large archive of EHR records, about 90% of which were missed by SRSs. SRSs reported only 36 cases during the same period. The case analysis also found more suspected drugs associated with severe DHRs in pediatrics.

12.
Stud Health Technol Inform ; 290: 116-119, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35672982

ABSTRACT

BACKGROUND: Terminology integration at the scale of the UMLS Metathesaurus (i.e., over 200 source vocabularies) remains challenging despite recent advances in ontology alignment techniques based on neural networks. OBJECTIVES: To improve the performance of the neural network architecture we developed for predicting synonymy between terms in the UMLS Metathesaurus, specifically through the addition of an attention layer. METHODS: We modify our original Siamese neural network architecture with Long Short-Term Memory (LSTM) and create two variants by (1) adding an attention layer on top of the existing LSTM, and (2) replacing the existing LSTM layer by an attention layer. RESULTS: Adding an attention layer to the LSTM layer resulted in increasing precision to 92.38% (+3.63%) and F1 score to 91.74% (+1.13%), with limited impact on recall at 91.12% (-1.42%). CONCLUSIONS: Although limited, this increase in precision substantially reduces the false positive rate and minimizes the need for manual curation.


Subjects
Neural Networks, Computer , Unified Medical Language System , Attention
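As a quick consistency check on the figures in this abstract, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the reported precision (92.38%) and recall (91.12%); the function name is illustrative, and a small difference from the reported 91.74% is expected because the published inputs are already rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
f1 = f1_score(92.38, 91.12)
print(f"Recomputed F1: {f1:.2f}%")
```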
13.
Stud Health Technol Inform ; 294: 357-361, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612096

ABSTRACT

The distributed nature of our digital healthcare and the rapid emergence of new data sources prevent a compelling overview and the joint use of new data. Data integration, e.g., with metadata and semantic annotations, is expected to overcome this challenge. In this paper, we present an approach to predict UMLS codes for given German metadata using recurrent neural networks. The augmentation of the training dataset using the Medical Subject Headings (MeSH), particularly the German translations, also improved the model accuracy. The model demonstrates robust performance with 75% accuracy and aims to show that increasingly sophisticated machine learning tools can already play a significant role in data integration.


Subjects
Metadata , Semantics , Information Storage and Retrieval , Medical Subject Headings , Neural Networks, Computer , Unified Medical Language System
14.
Inf Serv Use ; 42(1): 95-106, 2022.
Article in English | MEDLINE | ID: mdl-35600122

ABSTRACT

Donald A.B. Lindberg M.D. arrived at the U.S. National Library of Medicine in 1984 and quickly launched the Unified Medical Language System (UMLS) research and development project to help computers understand biomedical meaning and to enable retrieval and integration of information from disparate electronic sources, e.g., patient records, biomedical literature, knowledge bases. This chapter focuses on how Lindberg's thinking, preferred ways of working, and decision-making guided UMLS goals and development and on what made the UMLS markedly "new and different" and ahead of its time.

15.
Stud Health Technol Inform ; 288: 100-112, 2022 Feb 01.
Article in English | MEDLINE | ID: mdl-35102832

ABSTRACT

Donald A.B. Lindberg M.D. arrived at the U.S. National Library of Medicine in 1984 and quickly launched the Unified Medical Language System (UMLS) research and development project to help computers understand biomedical meaning and to enable retrieval and integration of information from disparate electronic sources, e.g., patient records, biomedical literature, knowledge bases. This chapter focuses on how Lindberg's thinking, preferred ways of working, and decision-making guided UMLS goals and development and on what made the UMLS markedly "new and different" and ahead of its time.


Subjects
Knowledge Bases , Unified Medical Language System , Humans , National Library of Medicine (U.S.) , United States
16.
Artif Intell Med ; 120: 102167, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34629150

ABSTRACT

Biomedical natural language processing (NLP) has an important role in extracting consequential information in medical discharge notes. Detecting meaningful features from unstructured notes is a challenging task in medical document classification. The domain specific phrases and different synonyms within the medical documents make it hard to analyze them. Analyzing clinical notes becomes more challenging for short documents like abstract texts. All of these can result in poor classification performance, especially when there is a shortage of the clinical data in real life. Two new approaches (an ontology-guided approach and a combined ontology-based with dictionary-based approach) are suggested for augmenting medical data to enrich training data. Three different deep learning approaches are used to evaluate the classification performance of the proposed methods. The obtained results show that the proposed methods improved the classification accuracy in clinical notes classification.


Subjects
Machine Learning , Natural Language Processing
17.
Comput Struct Biotechnol J ; 19: 4559-4573, 2021.
Article in English | MEDLINE | ID: mdl-34471499

ABSTRACT

Drug repurposing has become a widely used strategy to accelerate the process of finding treatments. While classical de novo drug development involves high costs, risks, and time-consuming paths, drug repurposing allows the reuse of existing, approved drugs for new indications. Numerous studies have been carried out in this field, both in vitro and in silico. Computational drug repurposing methods make use of modern heterogeneous biomedical data to identify and prioritize new indications for old drugs. In the current paper, we present a new complete methodology to evaluate new potentially repurposable drugs based on disease-gene and disease-phenotype associations, identifying significant differences between repurposing and non-repurposing data. We have collected a set of known successful drug repurposing case studies from the literature and we have analysed their dissimilarities with other biomedical data not necessarily participating in repurposing processes. The information used has been obtained from the DISNET platform. We have performed three analyses (at the genetic, phenotypic, and categorization levels), to conclude that there is a statistically significant difference between actual repurposing-related information and non-repurposing data. The insights obtained could be relevant when suggesting new potential drug repurposing hypotheses.

18.
J Am Med Inform Assoc ; 28(10): 2093-2100, 2021 Sep 18.
Article in English | MEDLINE | ID: mdl-34363664

ABSTRACT

OBJECTIVE: De-identification is a fundamental task in electronic health records to remove protected health information entities. Deep learning models have proven to be promising tools to automate de-identification processes. However, when the target domain (where the model is applied) is different from the source domain (where the model is trained), the model often suffers a significant performance drop, commonly referred to as domain adaptation issue. In de-identification, domain adaptation issues can make the model vulnerable for deployment. In this work, we aim to close the domain gap by leveraging unlabeled data from the target domain. MATERIALS AND METHODS: We introduce a self-training framework to address the domain adaptation issue by leveraging unlabeled data from the target domain. We validate the effectiveness on 4 standard de-identification datasets. In each experiment, we use a pair of datasets: labeled data from the source domain and unlabeled data from the target domain. We compare the proposed self-training framework with supervised learning that directly deploys the model trained on the source domain. RESULTS: In summary, our proposed framework improves the F1-score by 5.38 (on average) when compared with direct deployment. For example, using i2b2-2014 as the training dataset and i2b2-2006 as the test, the proposed framework increases the F1-score from 76.61 to 85.41 (+8.8). The method also increases the F1-score by 10.86 for mimic-radiology and mimic-discharge. CONCLUSION: Our work demonstrates an effective self-training framework to boost the domain adaptation performance for the de-identification task for electronic health records.


Subjects
Data Anonymization , Electronic Health Records , Humans , Patient Discharge
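
The self-training idea described in the abstract above (train on labeled source data, pseudo-label confident target examples, retrain) can be illustrated with a minimal sketch. This is not the authors' model: a toy one-dimensional nearest-centroid classifier stands in for the deep learning tagger, and the function names, the `threshold` margin, and the `rounds` limit are hypothetical choices made only for illustration.

```python
from statistics import mean


def fit_centroids(xs, ys):
    """Return {label: mean feature value} for a 1-D nearest-centroid classifier."""
    return {
        label: mean([x for x, y in zip(xs, ys) if y == label])
        for label in set(ys)
    }


def predict_with_margin(centroids, x):
    """Return (label, margin); margin is the distance gap between the two
    nearest centroids and serves as a crude confidence score.
    Assumes at least two labels are present."""
    ranked = sorted(centroids, key=lambda label: abs(x - centroids[label]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - centroids[runner_up]) - abs(x - centroids[best])
    return best, margin


def self_train(src_x, src_y, tgt_x, threshold=1.0, rounds=5):
    """Iteratively add confidently pseudo-labeled target examples to the
    training set and refit. All hyperparameters here are assumptions."""
    xs, ys = list(src_x), list(src_y)
    pool = list(tgt_x)  # unlabeled target-domain examples
    centroids = fit_centroids(xs, ys)
    for _ in range(rounds):
        confident, remaining = [], []
        for x in pool:
            label, margin = predict_with_margin(centroids, x)
            (confident if margin >= threshold else remaining).append((x, label))
        if not confident:
            break  # nothing new to learn from
        xs += [x for x, _ in confident]
        ys += [label for _, label in confident]
        pool = [x for x, _ in remaining]
        centroids = fit_centroids(xs, ys)
    return centroids


# Labeled source data and unlabeled, shifted target data (made-up values).
centroids = self_train(
    src_x=[0.0, 1.0, 4.0, 5.0],
    src_y=["O", "O", "PHI", "PHI"],
    tgt_x=[1.5, 6.0, 6.5],
)
```

In this toy run the "PHI" centroid moves from the source-only value toward the target data once the pseudo-labeled examples are absorbed, which is the effect self-training is meant to achieve.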
19.
J Surg Res ; 268: 552-561, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34464893

ABSTRACT

BACKGROUND: The Unified Medical Language System (UMLS) maps relationships between and within >100 biomedical vocabularies, including Current Procedural Terminology (CPT) codes, creating a powerful knowledge resource which can accelerate clinical research. METHODS: We used synonymy and the concepts relating the hierarchical structure of CPT codes within the UMLS to (1) guide surgical experts in expanding the Operative Stress Score (OSS) from 565 originally rated CPT codes to an additional 1,853 related procedures and (2) establish the validity of the association between the added OSS ratings and 30-day outcomes in VASQIP (2015-2018). RESULTS: The UMLS Metathesaurus and Semantic Network were converted into an interactive graph database (https://github.com/dbmi-pitt/UMLS-Graph) delineating ontology relatedness. From this UMLS-graph, the CPT hierarchy was queried to obtain all paths from each code to the hierarchical apex. Of 1,853 added ratings, 43% and 76% were siblings and cousins of original OSS CPT codes. Of 857,577 VASQIP cases (mean age, 64 ± 11 years; 91% male; 75% white), 786,122 (92%) and 71,455 (8%) were rated in the original and added OSS. Compared to original, added OSS cases included more females (14% versus 9%) and frail patients (25% versus 19%) undergoing high stress procedures (11% versus 8%; all P < .001). Postoperative mortality consistently increased with OSS. Very low stress procedures had <0.5% (original, 0.4% [95% CI, 0.4%-0.5%] versus added, 0.9% [95% CI, 0.6%-1.2%]) and very high 3.8% (original, 3.5% [95% CI, 3.0%-4.0%] versus added, 5.8% [95% CI, 4.6%-7.3%]) mortality rates. CONCLUSIONS: The synonymy and concepts relating biomedical data within the UMLS can be abstracted and efficiently used to expand the utility of existing clinical research tools.


Subjects
Abstracting and Indexing , Unified Medical Language System , Aged , Databases, Factual , Female , Humans , Male , Middle Aged
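
The hierarchy queries mentioned in the abstract above (paths from a code to the hierarchical apex, and sibling or cousin relationships between codes) can be sketched on a toy tree. This is only an illustration under stated assumptions: the node names are invented placeholders rather than real CPT codes, each node is assumed to have a single parent (the actual UMLS graph may yield several paths per code), and "sibling" and "cousin" are assumed to mean a shared parent and a shared grandparent respectively, which the abstract does not define.

```python
# Hypothetical child -> parent map; names are placeholders, not CPT codes.
PARENT = {
    "proc_a": "group_1",
    "proc_b": "group_1",
    "proc_c": "group_2",
    "group_1": "section_x",
    "group_2": "section_x",
    "section_x": "apex",
}


def path_to_apex(code, parent=PARENT):
    """Walk parent links from a code up to the node that has no parent."""
    path = [code]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def are_siblings(a, b, parent=PARENT):
    """Assumed definition: two distinct codes that share the same parent."""
    pa = parent.get(a)
    return a != b and pa is not None and pa == parent.get(b)


def are_cousins(a, b, parent=PARENT):
    """Assumed definition: different parents, but the same grandparent."""
    pa, pb = parent.get(a), parent.get(b)
    if pa is None or pb is None or pa == pb:
        return False
    ga, gb = parent.get(pa), parent.get(pb)
    return ga is not None and ga == gb


print(path_to_apex("proc_a"))
print(are_siblings("proc_a", "proc_b"), are_cousins("proc_a", "proc_c"))
```

Under these definitions, a newly rated procedure could be matched to an originally rated one by checking sibling status first and cousin status second.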
20.
JMIR Med Inform ; 9(8): e20675, 2021 Aug 27.
Article in English | MEDLINE | ID: mdl-34236337

ABSTRACT

BACKGROUND: The Unified Medical Language System (UMLS) has been a critical tool in biomedical and health informatics, and the year 2021 marks its 30th anniversary. The UMLS brings together many broadly used vocabularies and standards in the biomedical field to facilitate interoperability among different computer systems and applications. OBJECTIVE: Despite its longevity, there is no comprehensive publication analysis of the use of the UMLS. Thus, this review and analysis provides an overview of the UMLS and a comprehensive understanding of how it has been used in English-language peer-reviewed publications over the last 30 years. METHODS: PubMed, ACM Digital Library, and the Nursing & Allied Health Database were used to search for studies. The primary search strategy was as follows: UMLS was used as a Medical Subject Headings term or a keyword or appeared in the title or abstract. Only English-language publications were considered. The publications were screened first, then coded and categorized iteratively, following grounded theory. The review process followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. RESULTS: A total of 943 publications were included in the final analysis. Moreover, 32 publications were categorized into 2 categories; hence, the total number of publications before duplicates are removed is 975. After analysis and categorization of the publications, UMLS was found to be used in the following emerging themes or areas (the number of publications and their respective percentages are given in parentheses): natural language processing (230/975, 23.6%), information retrieval (125/975, 12.8%), terminology study (90/975, 9.2%), ontology and modeling (80/975, 8.2%), medical subdomains (76/975, 7.8%), other language studies (53/975, 5.4%), artificial intelligence tools and applications (46/975, 4.7%), patient care (35/975, 3.6%), data mining and knowledge discovery (25/975, 2.6%), medical education (20/975, 2.1%), degree-related theses (13/975, 1.3%), digital library (5/975, 0.5%), and the UMLS itself (150/975, 15.4%), as well as the UMLS for other purposes (27/975, 2.8%). CONCLUSIONS: The UMLS has been used successfully in patient care, medical education, digital libraries, and software development, as originally planned, as well as in degree-related theses, the building of artificial intelligence tools, data mining and knowledge discovery, foundational work in methodology, and middle layers that may lead to advanced products. Natural language processing, the UMLS itself, and information retrieval are the 3 most common themes that emerged among the included publications. The results, although largely related to academia, demonstrate that the UMLS achieves its intended uses successfully, in addition to achieving uses broadly beyond its original intentions.
