1.
Stud Health Technol Inform ; 294: 868-869, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612229

ABSTRACT

We address the problem of semantic labeling of terms in two French medical corpora with the subset of the UMLS. We perform two experiments relying on the structure of words and terms, and on their context: 1) the semantic label of already identified terms is predicted; 2) the terms are detected in raw texts and their semantic label is predicted. Our results show over 0.90 F-measure.


Subject(s)
Semantics , Unified Medical Language System , Natural Language Processing
2.
Stud Health Technol Inform ; 281: 253-257, 2021 May 27.
Article in English | MEDLINE | ID: mdl-34042744

ABSTRACT

This paper presents a prototype for the visualization of food-drug interactions implemented in the MIAM project, whose objective is to develop methods for the extraction and representation of these interactions and to make them available in the Thériaque database. The prototype provides users with a graphical visualization showing the hierarchies of drugs and foods side by side and the links between them representing the existing interactions as well as additional details about them, including the number of articles reporting the interaction. The prototype is interactive in the following ways: hierarchies can be easily folded and unfolded, a filter can be applied to view only certain types of interactions, and details about a given interaction are displayed when the mouse is moved over the corresponding link. Future work includes proposing a version more suitable for users who are not health professionals and representing the food hierarchy based on a reference classification.


Subject(s)
Food-Drug Interactions , Animals , Databases, Factual , Mice
3.
Stud Health Technol Inform ; 264: 1327-1331, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31438141

ABSTRACT

Detecting words that are difficult to understand is a crucial task for ensuring the proper understanding of medical texts such as diagnoses and drug instructions. We propose to combine supervised machine learning algorithms using various features with word embeddings, which contain context information about words. Data in French are manually cross-annotated by seven annotators. On the basis of these data, we propose cross-validation scenarios in order to test the generalization ability of models to detect the difficulty of medical words. On data provided by seven annotators, we show that the models are generalizable from one annotator to another.


Subject(s)
Algorithms , Comprehension , Language , Natural Language Processing , Supervised Machine Learning
4.
Stud Health Technol Inform ; 247: 730-734, 2018.
Article in English | MEDLINE | ID: mdl-29678057

ABSTRACT

Exchanges between diabetic patients on discussion fora make it possible to study their understanding of their disorder, and their behavior and needs when facing health problems. When analyzing these exchanges and behavior, it is necessary to collect information on user profiles. We present an approach combining a lexicon and supervised classifiers for the identification of the age and gender of contributors, their disorders, and the relation between contributor and patient. Depending on the parameters of the method, precision ranges from 53.48% for disorders to 100% for gender.


Subject(s)
Data Mining , Diabetes Mellitus , Patients , Humans , Internet , Social Media
5.
CEUR Workshop Proc ; 1609: 28-42, 2016 Sep.
Article in English | MEDLINE | ID: mdl-29308065

ABSTRACT

This paper reports on Task 2 of the 2016 CLEF eHealth evaluation lab which extended the previous information extraction tasks of ShARe/CLEF eHealth evaluation labs. The task continued with named entity recognition and normalization in French narratives, as offered in CLEF eHealth 2015. Named entity recognition involved ten types of entities including disorders that were defined according to Semantic Groups in the Unified Medical Language System® (UMLS®), which was also used for normalizing the entities. In addition, we introduced a large-scale classification task in French death certificates, which consisted of extracting causes of death as coded in the International Classification of Diseases, tenth revision (ICD10). Participant systems were evaluated against a blind reference standard of 832 titles of scientific articles indexed in MEDLINE, 4 drug monographs published by the European Medicines Agency (EMEA) and 27,850 death certificates using Precision, Recall and F-measure. In total, seven teams participated, including five in the entity recognition and normalization task, and five in the death certificate coding task. Three teams submitted their systems to our newly offered reproducibility track. For entity recognition, the highest performance was achieved on the EMEA corpus, with an overall F-measure of 0.702 for plain entity recognition and 0.529 for normalized entity recognition. For entity normalization, the highest performance was achieved on the MEDLINE corpus, with an overall F-measure of 0.552. For death certificate coding, the highest performance was 0.848 F-measure.
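
The Precision, Recall and F-measure used in the abstract above can be illustrated with a minimal sketch. It assumes exact matching between predicted and reference annotations represented as sets; the abstract does not state the matching or averaging scheme actually used in the task, and the function name and example data below are hypothetical.

```python
def precision_recall_f1(predicted, reference):
    """Return (precision, recall, balanced F-measure) for two annotation sets.

    Assumption: an annotation counts as correct only on exact match.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (certificate id, ICD-10 code) pairs, for illustration only.
reference = {("cert1", "I21"), ("cert1", "E11"), ("cert2", "J18")}
predicted = {("cert1", "I21"), ("cert2", "J18"), ("cert2", "C34")}

p, r, f = precision_recall_f1(predicted, reference)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

Two of the three predictions are correct and two of the three reference pairs are found, so all three measures equal 2/3 in this toy case.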

6.
Stud Health Technol Inform ; 216: 815-20, 2015.
Article in English | MEDLINE | ID: mdl-26262165

ABSTRACT

With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, the SPARQL language, and natural language question-answering interfaces provide interesting solutions for querying such knowledge bases. We propose a method for translating natural language questions into SPARQL queries. We use Natural Language Processing tools, semantic resources, and the RDF triples description. The method is designed on 50 questions over 3 biomedical knowledge bases, and evaluated on 27 questions. It achieves 0.78 F-measure on the test set. The method for translating natural language questions into SPARQL queries is implemented as a Perl module available at http://search.cpan.org/~thhamon/RDF-NLP-SPARQLQuery.


Subject(s)
Information Storage and Retrieval , Natural Language Processing , Databases, Factual , Humans , Information Storage and Retrieval/methods , Semantics
7.
Stud Health Technol Inform ; 210: 80-4, 2015.
Article in English | MEDLINE | ID: mdl-25991106

ABSTRACT

While patients can freely access their Electronic Health Records or online health information, they may not be able to correctly understand the content of these documents. One of the challenges is related to the difference between expert and non-expert languages. We propose to investigate this issue within the Information Retrieval field. The patient queries have to be associated with the corresponding expert documents, which provide trustworthy information. Our approach relies on a state-of-the-art IR system called Indri and on semantic resources. Different query expansion strategies are explored. Our system shows up to 0.6740 P@10, up to 0.7610 R@10, and up to 0.6793 NDCG@10.


Subject(s)
Consumer Health Information/organization & administration , Data Mining/methods , Electronic Health Records/organization & administration , Health Information Systems/organization & administration , Natural Language Processing , User-Computer Interface , Machine Learning , Patient Access to Records
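
The ranking measures quoted in the abstract above (P@10, R@10, NDCG@10) can be sketched as follows. This is a minimal illustration under stated assumptions: graded relevance is used directly as gain with a log2 discount, whereas evaluation tools differ in their exact DCG variant; the function names, document ids and judgments are hypothetical and are not taken from the paper.

```python
import math


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked, relevant, k=10):
    """Fraction of all relevant documents found in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def ndcg_at_k(ranked, gains, k=10):
    """Normalized discounted cumulative gain at rank k.

    Assumption: linear gain and log2(rank + 1) discount.
    """
    dcg = sum(gains.get(doc, 0) / math.log2(i + 2)
              for i, doc in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical graded judgments and system ranking.
gains = {"d1": 2, "d2": 1, "d5": 1}
relevant = set(gains)
ranked = ["d3", "d1", "d7", "d2"]

print(precision_at_k(ranked, relevant, k=3))
print(recall_at_k(ranked, relevant, k=3))
print(round(ndcg_at_k(ranked, gains, k=3), 4))
```

The default `k=10` corresponds to the cutoff reported in the abstract; the toy call uses `k=3` only to keep the example small.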
8.
J Biomed Semantics ; 5: 18, 2014.
Article in English | MEDLINE | ID: mdl-24739596

ABSTRACT

Pharmacovigilance is the activity related to the collection, analysis and prevention of adverse drug reactions (ADRs) induced by drugs. This activity is usually performed within dedicated databases (national, European, international...), in which the ADRs declared for patients are usually coded with a specific controlled terminology, MedDRA (Medical Dictionary for Drug Regulatory Activities). Traditionally, the detection of adverse drug reactions is performed with data mining algorithms, while more recently the groupings of close ADR terms are also being exploited. The Standardized MedDRA Queries (SMQs) have become a standard in pharmacovigilance. They are created manually by international boards of experts with the objective of grouping together the MedDRA terms related to a given safety topic. Within MedDRA version 13, 84 SMQs exist, although several important safety topics are not yet covered. The objective of our work is to propose an automatic method for assisting the creation of SMQs using the clustering of semantically close MedDRA terms. The method tested relies on semantic approaches: semantic distance and similarity algorithms, terminology structuring methods and term clustering. The obtained results indicate that the proposed unsupervised methods appear to be complementary for this task: they can generate subsets of the existing SMQs and make this process systematic and less time consuming.

9.
Biomed Inform Insights ; 6(Suppl 1): 51-62, 2013.
Article in English | MEDLINE | ID: mdl-24052691

ABSTRACT

Medical entity recognition is currently generally performed by data-driven methods based on supervised machine learning. Expert-based systems, where linguistic and domain expertise are directly provided to the system, are often combined with data-driven systems. We present here a case study where an existing expert-based medical entity recognition system, Ogmios, is combined with a data-driven system, Caramba, based on a linear-chain Conditional Random Field (CRF) classifier. Our case study specifically highlights the risk of overfitting incurred by an expert-based system. We observe that it prevents the combination of the 2 systems from obtaining improvements in precision, recall, or F-measure, and analyze the underlying mechanisms through a post-hoc feature-level analysis. Wrapping the expert-based system alone as attributes input to a CRF classifier does boost its F-measure from 0.603 to 0.710, bringing it on par with the data-driven system. The generalization of this method remains to be further investigated.

10.
Stud Health Technol Inform ; 192: 1189, 2013.
Article in English | MEDLINE | ID: mdl-23920963

ABSTRACT

Extraction of information related to the medication is an important task within the biomedical area. Our method is applied to different types of documents in three languages. The results indicate that our approach can efficiently update and enrich the existing drug vocabularies.


Subject(s)
Artificial Intelligence , Databases, Pharmaceutical/classification , Drug Labeling/classification , Natural Language Processing , Pharmaceutical Preparations/classification , Terminology as Topic , Vocabulary, Controlled , Algorithms , Data Mining/methods , England , France , Pattern Recognition, Automated/methods , Semantics , Sweden , Translating
11.
Patient Educ Couns ; 92(2): 197-204, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23769423

ABSTRACT

OBJECTIVE: Automatically analyze the online discussions related to diabetes and extract information on patient skills for managing this disease. METHODS: Two collections of about 7000 and 23,000 messages from online discussion fora and 174 skills from an available taxonomy are processed with Natural Language Processing methods and semantically enriched. Skills are projected on the messages to detect those skills which are mentioned by patients. Quantitative and qualitative evaluation is performed. RESULTS: The method recognizes almost all the aimed skills in fora. The quality of the skills' recognition varies with the method's parameters. Most of the selected messages are relevant to at least one of the associated skills. Manual analysis shows a substantial number of messages is dedicated to daily self-care and psychosocial skills. CONCLUSION: Study of real exchanges between patients leads to a better understanding of their skills in daily self-management of diabetes. PRACTICE IMPLICATIONS: Our experiments can be useful for a better understanding and better knowledge of self-management of diseases by patients. They can also refine existing patient education programs.


Subject(s)
Diabetes Mellitus/therapy , Electronic Mail , Internet , Natural Language Processing , Disease Management , Female , Health Knowledge, Attitudes, Practice , Humans , Male , Self Care
12.
J Am Med Inform Assoc ; 20(5): 820-7, 2013.
Article in English | MEDLINE | ID: mdl-23571851

ABSTRACT

OBJECTIVE: To identify the temporal relations between clinical events and temporal expressions in clinical reports, as defined in the i2b2/VA 2012 challenge. DESIGN: To detect clinical events, we used rules and Conditional Random Fields. We built Random Forest models to identify event modality and polarity. To identify temporal expressions we built on the HeidelTime system. To detect temporal relations, we systematically studied their breakdown into distinct situations; we designed an oracle method to determine the most prominent situations and the most suitable associated classifiers, and combined their results. RESULTS: We achieved F-measures of 0.8307 for event identification, based on rules, and 0.8385 for temporal expression identification. In the temporal relation task, we identified nine main situations in three groups, experimentally confirming shared intuitions: within-sentence relations, section-related time, and across-sentence relations. Logistic regression and Naïve Bayes performed best on the first and third groups, and decision trees on the second. We reached a 0.6231 global F-measure, improving by 7.5 points our official submission. CONCLUSIONS: Carefully hand-crafted rules obtained good results for the detection of events and temporal expressions, while a combination of classifiers improved temporal link prediction. The characterization of the oracle recall of situations allowed us to point at directions where further work would be most useful for temporal relation detection: within-sentence relations and linking History of Present Illness events to the admission date. We suggest that the systematic situation breakdown proposed in this paper could also help improve other systems addressing this task.


Subject(s)
Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Artificial Intelligence , Humans , Time
13.
Stud Health Technol Inform ; 180: 235-9, 2012.
Article in English | MEDLINE | ID: mdl-22874187

ABSTRACT

Pharmacovigilance is the activity related to the collection, analysis and prevention of adverse drug reactions (ADRs) induced by drugs. It leads to the safety survey of pharmaceutical products. The pharmacovigilance process benefits from the traditional statistical approaches and also from the qualitative information on semantic relations between close ADR terms, such as SMQs or hierarchical levels of MedDRA. In this work, our objective is to detect the semantic relatedness between the ADR MedDRA terms. To achieve this, we combine two approaches: semantic similarity algorithms computed within structured resources and terminology structuring methods applied to a raw list of the MedDRA terms. We compare these methods with one another and study their differences and complementarity. The results are evaluated against the gold standard manually compiled within the pharmacovigilance area and also with an expert. The combination of the methods leads to an improved recall.


Subject(s)
Adverse Drug Reaction Reporting Systems , Database Management Systems , Databases, Factual , Drug-Related Side Effects and Adverse Reactions/epidemiology , Natural Language Processing , Pharmacovigilance , Vocabulary, Controlled , Artificial Intelligence , France/epidemiology , Humans
14.
Stud Health Technol Inform ; 160(Pt 2): 964-8, 2010.
Article in English | MEDLINE | ID: mdl-20841827

ABSTRACT

Risk factor discovery and prevention is an active research field within the biomedical domain. Despite abundant existing information on risk factors, as found in bibliographical databases or on several websites, accessing this information may be difficult. Methods from Natural Language Processing and Information Extraction can be helpful to access it more easily. Specifically, we show a procedure for analyzing massive amounts of scientific literature and for detecting linguistically marked associations between pathologies and risk factors. This approach allowed us to extract over 22,000 risk factors and associated pathologies. The performed evaluations showed that (1) over 88% of risk factors for coronary heart disease are correct, (2) associated pathologies, when they could be compared to MeSH indexing, are correct in about 70% of cases, and (3) in existing terminologies links between risk factors and their pathologies are seldom recorded.


Subject(s)
Data Mining/standards , Abstracting and Indexing/methods , Databases, Bibliographic , Disease , Medical Subject Headings , Natural Language Processing , Risk Factors , Semantics , United States
15.
Stud Health Technol Inform ; 160(Pt 2): 1015-9, 2010.
Article in English | MEDLINE | ID: mdl-20841837

ABSTRACT

Acquisition and enrichment of lexical resources is an important research area in computational linguistics. We propose a method for inducing a lexicon of synonyms and for weighting it in order to establish its reliability. The method is based on the analysis of the syntactic structure of complex terms. We apply and evaluate the approach on three biomedical terminologies (MeSH, Snomed Int, Snomed CT). Between 7.7 and 33.6% of the induced synonyms are ambiguous and cooccur with other semantic relations. A virtual reference makes it possible to validate 9 to 14% of the induced synonyms.


Subject(s)
Semantics , Linguistics , Medical Subject Headings , Natural Language Processing , Systematized Nomenclature of Medicine
16.
J Am Med Inform Assoc ; 17(5): 549-54, 2010.
Article in English | MEDLINE | ID: mdl-20819862

ABSTRACT

BACKGROUND: Pharmacotherapy is an integral part of any medical care process and plays an important role in the medical history of most patients. Information on medication is crucial for several tasks such as pharmacovigilance, medical decision or biomedical research. OBJECTIVES: Within a narrative text, medication-related information can be buried within other non-relevant data. Specific methods, such as those provided by text mining, must be designed for accessing them, and this is the objective of this study. METHODS: The authors designed a system for analyzing narrative clinical documents to extract from them medication occurrences and medication-related information. The system also attempts to deduce medications not covered by the dictionaries used. RESULTS: Results provided by the system were evaluated within the framework of the I2B2 NLP challenge held in 2009. The system achieved an F-measure of 0.78 and ranked 7th out of 20 participating teams (the highest F-measure was 0.86). The system provided good results for the annotation and extraction of medication names, their frequency, dosage and mode of administration (F-measure over 0.81), while information on duration and reasons is poorly annotated and extracted (F-measure 0.36 and 0.29, respectively). The performance of the system was stable between the training and test sets.


Subject(s)
Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Pharmaceutical Preparations , Drug Therapy , Humans , Linguistics , Software Design
17.
Methods Inf Med ; 48(2): 149-54, 2009.
Article in English | MEDLINE | ID: mdl-19283312

ABSTRACT

OBJECTIVE: Currently, the use of natural language processing (NLP) approaches in order to improve search and exploration of electronic health records (EHRs) within healthcare information systems is not a common practice. One reason for this is the lack of suitable lexical resources. Indeed, in order to support such tasks, various types of such resources need to be collected or acquired (i.e., morphological, orthographic, synonymous). METHODS: We propose a novel method for the acquisition of synonymy resources. This method is language-independent and relies on the existence of structured terminologies. It makes it possible to decipher hidden synonymy relations between simple words and terms on the basis of their syntactic analysis and exploitation of their compositionality. RESULTS: Applied to series of synonym terms from the French subset of the UMLS, the method shows 99% precision. The overlap between thus inferred terms and the existing sparse resources of synonyms is very low. In order to better integrate these resources in an EHR search system, we analyzed a sample of clinical queries submitted by healthcare professionals. CONCLUSIONS: Observation of clinical queries shows that they make very little use of the query expansion function, and, whenever they do, synonymy relations are rarely involved.


Subject(s)
Hospital Information Systems/organization & administration , Medical Records Systems, Computerized , Natural Language Processing , Terminology as Topic , France , Humans
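
The compositionality idea described in the abstract above can be approximated with a short sketch. It is a deliberate simplification and not the authors' method: instead of a syntactic analysis of terms, it assumes whitespace tokenization and infers a word-level synonym pair whenever two synonymous terms of equal length differ in exactly one position. The function name and the English term pairs are hypothetical illustrations; the paper works on the French subset of the UMLS.

```python
def infer_word_synonyms(term_pairs):
    """Infer word-level synonym pairs from pairs of synonymous complex terms.

    Simplifying assumption: terms are compared token by token, and a word
    pair is inferred only when exactly one aligned position differs.
    """
    inferred = set()
    for term_a, term_b in term_pairs:
        tokens_a = term_a.lower().split()
        tokens_b = term_b.lower().split()
        if len(tokens_a) != len(tokens_b):
            continue  # no position-wise alignment without syntactic analysis
        differences = [(x, y) for x, y in zip(tokens_a, tokens_b) if x != y]
        if len(differences) == 1:
            # Store each pair in sorted order so (a, b) and (b, a) coincide.
            inferred.add(tuple(sorted(differences[0])))
    return inferred


# Hypothetical synonymous terms, as they might be grouped in a terminology.
pairs = [
    ("cardiac arrest", "heart arrest"),
    ("renal failure", "kidney failure"),
    ("myocardial infarction", "heart attack"),  # two differences: skipped
]

print(sorted(infer_word_synonyms(pairs)))
```

A real implementation would align syntactic heads and modifiers rather than token positions, which is what lets the approach handle terms of different lengths and word orders.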
18.
AMIA Annu Symp Proc ; 2009: 203-7, 2009 Nov 14.
Article in English | MEDLINE | ID: mdl-20351850

ABSTRACT

The motivation of this work is to study the use of speculation markers within scientific writing: this may be useful for discovering whether these markers are regularly spread across biomedical articles and then for establishing the logical structure of articles. To achieve these objectives, we compute associations between article sections and speculation markers. We use machine learning algorithms to show that there are strong and interesting associations between speculation markers and article structure. For instance, strong markers, which strongly influence the presentation of knowledge, are specific to Results, Discussion and Abstract, while non-strong markers appear with higher regularity within Material and Methods. Our results indicate that speculation is governed by observable usage rules within scientific articles and can help their structuring.


Subject(s)
Biomedical Research , Writing , Algorithms , Science
19.
AMIA Annu Symp Proc ; : 252-6, 2008 Nov 06.
Article in English | MEDLINE | ID: mdl-18999042

ABSTRACT

Acquisition and enrichment of lexical resources is acknowledged as an important research topic in the area of computational linguistics. While such resources are often missing, specialized domains such as biomedicine offer several structured terminologies. In this paper, we propose a high-quality method for exploiting a structured terminology and inferring an elementary synonym lexicon. The method is based on the analysis of the syntactic structure of complex terms. The inferred synonym pairs are then profiled according to different clues endogenously computed within the same terminology. We apply and evaluate the approach on the Gene Ontology biomedical terminology.


Subject(s)
Information Storage and Retrieval/methods , Semantics , Vocabulary, Controlled , Computational Biology/methods , Databases as Topic , Gene Expression Profiling/methods , Genomics , Information Management , Microarray Analysis/methods , Molecular Biology/methods
20.
Stud Health Technol Inform ; 136: 809-14, 2008.
Article in English | MEDLINE | ID: mdl-18487831

ABSTRACT

Currently, the use of Natural Language Processing (NLP) approaches in order to improve search and exploration of electronic health records (EHRs) within healthcare information systems is not a common practice. One reason for this is the lack of suitable lexical resources: various types of such resources need to be collected or acquired. In this work, we propose a novel method for the acquisition of synonymous resources. This method is language-independent and relies on the existence of structured terminologies. It makes it possible to decipher hidden synonymous relations between simple words and terms on the basis of their syntactic analysis and exploitation of their compositionality. Applied to series of synonym terms from the French subset of the UMLS, the method shows 99% precision. The overlap between thus inferred terms and the existing sparse resources of synonyms is very low.


Subject(s)
Information Storage and Retrieval , Medical Records Systems, Computerized , Multilingualism , Natural Language Processing , Vocabulary, Controlled , Algorithms , Data Collection , Dictionaries as Topic , France , Knowledge Bases , Unified Medical Language System