1.
Front Digit Health ; 5: 1195017, 2023.
Article in English | MEDLINE | ID: mdl-37388252

ABSTRACT

Objectives: The objective of this study is the exploration of Artificial Intelligence and Natural Language Processing techniques to support the automatic assignment of the four Response Evaluation Criteria in Solid Tumors (RECIST) scales based on radiology reports. We also aim to evaluate how the languages and institutional specificities of Swiss teaching hospitals are likely to affect classification quality in French and German. Methods: In our approach, 7 machine learning methods were evaluated to establish a strong baseline. Then, robust models were built, fine-tuned according to the language (French and German), and compared with the expert annotation. Results: The best strategies yield average F1-scores of 90% and 86% respectively for the 2-class (Progressive/Non-progressive) and the 4-class (Progressive Disease, Stable Disease, Partial Response, Complete Response) RECIST classification tasks. Conclusions: These results are competitive with the manual labeling as measured by the Matthews correlation coefficient and Cohen's Kappa (79% and 76%). On this basis, we confirm the capacity of specific models to generalize on new unseen data and we assess the impact of using Pre-trained Language Models (PLMs) on the accuracy of the classifiers.
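The three agreement measures named in this abstract (F1-score, Matthews correlation coefficient, Cohen's Kappa) can be illustrated with a minimal pure-Python sketch. The label lists below are invented for illustration and are not data from the study. Macro averaging of the F1-score is an assumption, since the abstract only says "average", and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def f1_for_label(gold, pred, label):
    """F1-score of one class, computed from true/false positives and negatives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(gold) | set(pred))
    return sum(f1_for_label(gold, pred, lab) for lab in labels) / len(labels)


def matthews_binary(gold, pred, positive):
    """Matthews correlation coefficient for a two-class task."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    tn = sum(1 for g, p in zip(gold, pred) if g != positive and p != positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cohen_kappa(a, b):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[lab] * count_b[lab] for lab in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 1.0


# Invented 2-class example: P = Progressive, N = Non-progressive.
gold = ["P", "P", "P", "P", "N", "N", "N", "N"]
pred = ["P", "P", "P", "N", "N", "N", "N", "P"]
print(macro_f1(gold, pred), matthews_binary(gold, pred, "P"), cohen_kappa(gold, pred))
```

The same `macro_f1` and `cohen_kappa` functions accept the four RECIST labels unchanged; only `matthews_binary` is restricted to the two-class case in this sketch.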

2.
J Med Internet Res ; 23(1): e24594, 2021 01 26.
Article in English | MEDLINE | ID: mdl-33496673

ABSTRACT

BACKGROUND: Interoperability and secondary use of data are a challenge in health care. Specifically, the reuse of clinical free text remains an unresolved problem. The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) has become the universal language of health care and presents characteristics of a natural language. Its use to represent clinical free text could constitute a solution to improve interoperability. OBJECTIVE: Although the use of SNOMED and SNOMED CT has already been reviewed, its specific use in processing and representing unstructured data such as clinical free text has not. This review aims to better understand SNOMED CT's use for representing free text in medicine. METHODS: A scoping review was performed on the topic by searching MEDLINE, Embase, and Web of Science for publications featuring free-text processing and SNOMED CT. A recursive reference review was conducted to broaden the scope of research. The review covered the type of processed data, the targeted language, the goal of the terminology binding, the method used and, when appropriate, the specific software used. RESULTS: In total, 76 publications were selected for an extensive study. The language targeted by publications was 91% (n=69) English. The most frequent types of documents for which the terminology was used are complementary exam reports (n=18, 24%) and narrative notes (n=16, 21%). Mapping to SNOMED CT was the final goal of the research in 21% (n=16) of publications and a part of the final goal in 33% (n=25). The main objectives of mapping are information extraction (n=44, 39%), feature in a classification task (n=26, 23%), and data normalization (n=23, 20%). The method used was rule-based in 70% (n=53) of publications, hybrid in 11% (n=8), and machine learning in 5% (n=4).
In total, 12 different software packages were used to map text to SNOMED CT concepts, the most frequent being Medtex, Mayo Clinic Vocabulary Server, and Medical Text Extraction Reasoning and Mapping System. Full terminology was used in 64% (n=49) of publications, whereas only a subset was used in 30% (n=23) of publications. Postcoordination was proposed in 17% (n=13) of publications, and only 5% (n=4) of publications specifically mentioned the use of the compositional grammar. CONCLUSIONS: SNOMED CT has been largely used to represent free-text data, most frequently with rule-based approaches, in English. However, currently, there is no easy solution for mapping free text to this terminology or for performing automatic postcoordination. Most solutions conceive SNOMED CT as a simple terminology rather than as a compositional bag of ontologies. Since 2012, the number of publications on this subject per year has decreased. However, the need for formal semantic representation of free text in health care is high, and automatic encoding into a compositional ontology could be a solution.


Subject(s)
Natural Language Processing , Systematized Nomenclature of Medicine , Humans
3.
Stud Health Technol Inform ; 270: 48-52, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570344

ABSTRACT

Adverse drug reactions (ADRs) are frequent and associated with significant morbidity, mortality and costs. Therefore, their early detection in the hospital context is vital. Automatic tools could be developed taking into account structured and textual data. In this paper, we present the methodology followed for the manual annotation and automatic classification of discharge letters from a tertiary hospital. The results show that ADRs and causal drugs are explicitly mentioned in the discharge letters and that machine learning algorithms are efficient for the automatic detection of documents containing mentions of ADRs.


Subject(s)
Adverse Drug Reaction Reporting Systems , Algorithms , Drug-Related Side Effects and Adverse Reactions , Pharmacovigilance , Humans , Patient Discharge
4.
Stud Health Technol Inform ; 270: 1098-1102, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570551

ABSTRACT

Understanding the motivation and resistance factors affecting citizen participation in health and scientific research makes it possible to find solutions to improve citizen engagement and interest in science. Through a survey, we identified the main factors influencing citizens' participation in scientific research, and their wishes to be more informed. Results show that the respondents' reasons to participate in research were altruistic motivations, in line with other studies carried out in developed countries. The main factor influencing non-participation is the lack of opportunity, highlighting the importance of better informing citizens about ongoing studies.


Subject(s)
Biomedical Research , Community Participation , Comprehension , Motivation , Surveys and Questionnaires , Switzerland
5.
J Med Internet Res ; 21(6): e12876, 2019 06 13.
Article in English | MEDLINE | ID: mdl-31199327

ABSTRACT

BACKGROUND: Social media platforms constitute a rich data source for natural language processing tasks such as named entity recognition, relation extraction, and sentiment analysis. In particular, social media platforms about health provide a different insight into patients' experiences with diseases and treatment than those found in the scientific literature. OBJECTIVE: This paper aimed to report a study of entities related to chronic diseases and their relations in user-generated text posts. The major focus of our research is the study of biomedical entities found in health social media platforms and their relations and the way people suffering from chronic diseases express themselves. METHODS: We collected a corpus of 17,624 text posts from disease-specific subreddits of the social news and discussion website Reddit. For entity and relation extraction from this corpus, we employed the PKDE4J tool developed by Song et al (2015). PKDE4J is a text mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework. RESULTS: Using PKDE4J, we extracted 2 types of entities and relations: biomedical entities and relations and subject-predicate-object entity relations. In total, 82,138 entities and 30,341 relation pairs were extracted from the Reddit dataset. The most frequently mentioned entities were those related to oncological disease (2884 occurrences of cancer) and asthma (2180 occurrences). The relation pair anatomy-disease was the most frequent (5550 occurrences), the most frequent entities in this pair being cancer and lymph. The manual validation of the extracted entities showed a very good performance of the system at the entity extraction task (3682/5151, 71.48% of extracted entities were correctly labeled). CONCLUSIONS: This study showed that people are eager to share their personal experience with chronic diseases on social media platforms despite possible privacy and security issues.
The results reported in this paper are promising and demonstrate the need for more in-depth studies on the way patients with chronic diseases express themselves on social media platforms.


Subject(s)
Data Mining/methods , Health Information Exchange/standards , Social Media/standards , Chronic Disease , Female , Humans , Male
6.
J Med Internet Res ; 21(5): e13484, 2019 05 31.
Article in English | MEDLINE | ID: mdl-31152528

ABSTRACT

BACKGROUND: The secondary use of health data is central to biomedical research in the era of data science and precision medicine. National and international initiatives, such as the Global Open Findable, Accessible, Interoperable, and Reusable (GO FAIR) initiative, are supporting this approach in different ways (eg, making the sharing of research data mandatory or improving the legal and ethical frameworks). Preserving patients' privacy is crucial in this context. De-identification and anonymization are the two most common terms used to refer to the technical approaches that protect privacy and facilitate the secondary use of health data. However, it is difficult to find a consensus on the definitions of the concepts or on the reliability of the techniques used to apply them. A comprehensive review is needed to better understand the domain, its capabilities, its challenges, and the ratio of risk between the data subjects' privacy on one side, and the benefit of scientific advances on the other. OBJECTIVE: This work aims at better understanding how the research community comprehends and defines the concepts of de-identification and anonymization. A rich overview should also provide insights into the use and reliability of the methods. Six aspects will be studied: (1) terminology and definitions, (2) backgrounds and places of work of the researchers, (3) reasons for anonymizing or de-identifying health data, (4) limitations of the techniques, (5) legal and ethical aspects, and (6) recommendations of the researchers. METHODS: Based on a scoping review protocol designed a priori, MEDLINE was searched for publications discussing de-identification or anonymization and published between 2007 and 2017. The search was restricted to MEDLINE to focus on the life sciences community. The screening process was performed by two reviewers independently. 
RESULTS: After searching 7972 records that matched at least one search term, 135 publications were screened and 60 full-text articles were included. (1) Terminology: Definitions of the terms de-identification and anonymization were provided in less than half of the articles (29/60, 48%). When both terms were used (41/60, 68%), their meanings divided the authors into two equal groups (19/60, 32%, each) with opposed views. The remaining articles (3/60, 5%) were equivocal. (2) Backgrounds and locations: Research groups were based predominantly in North America (31/60, 52%) and in the European Union (22/60, 37%). The authors came from 19 different domains; computer science (91/248, 36.7%), biomedical informatics (47/248, 19.0%), and medicine (38/248, 15.3%) were the most prevalent ones. (3) Purpose: The main reason declared for applying these techniques is to facilitate biomedical research. (4) Limitations: Progress is made on specific techniques but, overall, limitations remain numerous. (5) Legal and ethical aspects: Differences exist between nations in the definitions, approaches, and legal practices. (6) Recommendations: The combination of organizational, legal, ethical, and technical approaches is necessary to protect health data. CONCLUSIONS: Interest in privacy-enhancing techniques is growing in the life sciences community. This interest crosses scientific boundaries, involving primarily computer science, biomedical informatics, and medicine. The variability observed in the use of the terms de-identification and anonymization emphasizes the need for clearer definitions as well as for better education and dissemination of information on the subject. The same observation applies to the methods. Several laws, such as the American Health Insurance Portability and Accountability Act (HIPAA) and the European General Data Protection Regulation (GDPR), regulate the domain.
Using the definitions they provide could help address the variable use of these two concepts in the research community.


Subject(s)
Biomedical Research/methods , Data Anonymization/standards , Humans , Reproducibility of Results
7.
Stud Health Technol Inform ; 255: 210-214, 2018.
Article in English | MEDLINE | ID: mdl-30306938

ABSTRACT

The aim of this work is to develop and validate an automatic annotation tool for the detection and bone localization of scaphoid fractures in radiology reports. To achieve this goal, a rule-based method using a Natural Language Processing (NLP) tool was applied. Finite state automata were constructed to detect, classify and annotate reports. An evaluation of the method on a manually annotated dataset showed a 96.8% total match.


Subject(s)
Fractures, Bone , Natural Language Processing , Scaphoid Bone , Supervised Machine Learning , Fractures, Bone/diagnosis , Humans , Research Report , Scaphoid Bone/injuries
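As a rough illustration of the rule-based detection and localization described in this abstract, the sketch below uses Python regular expressions, which stand in for finite state automata here. The patterns, the English example sentences, the location vocabulary, and the `annotate` function are all hypothetical; the abstract does not specify the authors' tool or rules.

```python
import re

# Hypothetical rules: a fracture mention co-occurring with "scaphoid" in one sentence.
FRACTURE = re.compile(
    r"\bfracture\b[^.]*\bscaphoid\b|\bscaphoid\b[^.]*\bfracture\b", re.IGNORECASE
)
# Hypothetical negation cue that precedes the word "fracture" in the same sentence.
NEGATION = re.compile(r"\b(?:no|without)\b[^.]*\bfracture", re.IGNORECASE)
# Hypothetical list of bone locations to annotate.
LOCATION = re.compile(r"\b(proximal pole|waist|distal pole|tubercle)\b", re.IGNORECASE)


def annotate(report):
    """Return one annotation per sentence that mentions a scaphoid fracture."""
    annotations = []
    for sentence in re.split(r"(?<=[.!?])\s+", report.strip()):
        if not FRACTURE.search(sentence):
            continue  # sentence is not about a scaphoid fracture
        location = LOCATION.search(sentence)
        annotations.append(
            {
                "sentence": sentence,
                "negated": bool(NEGATION.search(sentence)),
                "location": location.group(1).lower() if location else None,
            }
        )
    return annotations


report = "Fracture of the scaphoid waist without displacement. No fracture of the radius."
for item in annotate(report):
    print(item)
```

Keeping detection, negation, and localization as separate patterns makes each rule easy to inspect and extend, which is one common motivation for rule-based approaches on narrow report types.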
8.
Rev Med Suisse ; 14(617): 1559-1563, 2018 Sep 05.
Article in French | MEDLINE | ID: mdl-30226672

ABSTRACT

Digitalization is transforming every aspect of life, and it is deeply transforming medicine. The digitalization era is characterized by a large production of new data streams while existing processes, such as writing or imaging, are progressively migrated. The very large and fast-growing amount of data available requires new storage, transport and analytical tools. This paper presents some of them, such as natural language processing, artificial intelligence, and graph databases. A short introduction to blockchain technology is also provided, as it is increasingly used in some non-monetary transactions in medicine, such as data exchanges and consent management.


Society in general, and medicine in particular, are being swept along by the wave of digitalization. This phenomenon rests on the production of immense quantities of data, sometimes through the dematerialization of processes such as writing or photography, and sometimes through the acquisition of new data such as geolocation. This calls for new instruments for transporting, storing and processing information. This article presents some of the issues and instruments involved, such as natural language processing techniques, artificial intelligence and graph databases. Finally, we briefly describe blockchain technology, which is increasingly proposed in medicine for non-monetary processes such as data exchange or consent management.


Subject(s)
Artificial Intelligence , Big Data
9.
Stud Health Technol Inform ; 247: 710-714, 2018.
Article in English | MEDLINE | ID: mdl-29678053

ABSTRACT

Medical data is multimodal. In particular, it is composed of both structured data and narrative data (free text). Narrative data is a type of unstructured data that, although it contains valuable semantic and conceptual information, is rarely reused. To ensure interoperability of medical data, automatic annotation of free text with SNOMED CT concepts via Natural Language Processing (NLP) tools is proposed. This task is performed using a hybrid multilingual syntactic parser. A preliminary evaluation of the annotation shows encouraging results and confirms that semantic enrichment of patient-related narratives can be accomplished by hybrid NLP systems, heavily based on syntax and lexicosemantic resources.


Subject(s)
Data Curation , Natural Language Processing , Systematized Nomenclature of Medicine , Automation , Humans , Language , Narration , Semantics
10.
Stud Health Technol Inform ; 244: 23-27, 2017.
Article in English | MEDLINE | ID: mdl-29039370

ABSTRACT

Maintaining data security and privacy in an era of cybersecurity is a challenge. The enormous and rapidly growing amount of health-related data available today raises numerous questions about data collection, storage, analysis, comparability and interoperability but also about data protection. The US Health Insurance Portability and Accountability Act (HIPAA) of 1996 provides a legal framework and guidance for using and disclosing health data. Practically, the approach proposed by HIPAA is the de-identification of medical documents by removing certain Protected Health Information (PHI). In this work, a rule-based method for the de-identification of French free-text medical data using Natural Language Processing (NLP) tools will be presented.


Subject(s)
Computer Security , Data Anonymization , Health Insurance Portability and Accountability Act , Confidentiality , Natural Language Processing , United States
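A minimal sketch of what rule-based de-identification of French free text can look like is given below. It is not the method of the paper: the three rules (dates, phone numbers, names introduced by a title), the placeholder tags, the `deidentify` function, and the sample sentence are all assumptions made for illustration, and they cover only a small part of the identifiers a real system would need to handle.

```python
import re

# Hypothetical rules, applied in order; each match is replaced by a placeholder tag.
RULES = [
    # Numeric dates such as 03.04.1950 or 12/01/2017.
    ("DATE", re.compile(r"\b\d{1,2}[./]\d{1,2}[./]\d{2,4}\b")),
    # Ten-digit phone numbers written as 0xx xxx xx xx (spaces or dots optional).
    ("PHONE", re.compile(r"\b0\d{2}[ .]?\d{3}[ .]?\d{2}[ .]?\d{2}\b")),
    # A capitalized word following a French title (M., Mme, Dr, Pr).
    ("NAME", re.compile(r"\b(?:M\.|Mme|Dr|Pr)\s+[A-ZÀ-Ý][\w-]+")),
]


def deidentify(text):
    """Replace every span matched by a rule with its bracketed tag."""
    for tag, pattern in RULES:
        text = pattern.sub(f"[{tag}]", text)
    return text


# Invented sample sentence, not taken from any real record.
sample = (
    "Mme Dupont, née le 03.04.1950, a été vue le 12/01/2017 "
    "par le Dr Martin. Tél: 022 372 33 11."
)
print(deidentify(sample))
# [NAME], née le [DATE], a été vue le [DATE] par le [NAME]. Tél: [PHONE].
```

Running the date rule before the phone rule avoids digit sequences inside dates being considered by later patterns; rule order is a design choice that matters in this kind of cascade.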