Search | VHL Regional Portal

Automatic text classification of prostate cancer malignancy scores in radiology reports using NLP models.

Collado-Montañez, Jaime; López-Úbeda, Pilar; Chizhikova, Mariia; Díaz-Galiano, M Carlos; Ureña-López, L Alfonso; Martín-Noguerol, Teodoro; Luna, Antonio; Martín-Valdivia, M Teresa.

Med Biol Eng Comput ; 2024 Jun 07.

Article in English | MEDLINE | ID: mdl-38844661

ABSTRACT

This paper presents the implementation of two automated text classification systems for prostate cancer findings based on the PI-RADS criteria. Specifically, a traditional machine learning model using XGBoost and a language model-based approach using RoBERTa were employed. The study focused on Spanish-language radiological MRI prostate reports, a setting that has not been explored before. The results demonstrate that the RoBERTa model outperforms the XGBoost model, although both achieve promising results. Furthermore, the best-performing system was integrated into the radiological company's information systems as an API, operating in a real-world environment.

CARES: A Corpus for classification of Spanish Radiological reports.

Chizhikova, Mariia; López-Úbeda, Pilar; Collado-Montañez, Jaime; Martín-Noguerol, Teodoro; Díaz-Galiano, Manuel C; Luna, Antonio; Ureña-López, L Alfonso; Martín-Valdivia, M Teresa.

Comput Biol Med ; 154: 106581, 2023 03.

Article in English | MEDLINE | ID: mdl-36701968

ABSTRACT

This paper presents a new corpus of radiology medical reports written in Spanish and labeled with ICD-10. CARES (Corpus of Anonymised Radiological Evidences in Spanish) is a high-quality corpus manually labeled and reviewed by radiologists that is freely available to the research community on HuggingFace. Resources of this kind are essential for developing automatic text classification tools, as they are necessary for training and tuning computational systems. However, in the medical domain they are very difficult to obtain for several reasons, including privacy and data protection issues and the need to involve medical specialists in their creation. We present a corpus labeled and reviewed by radiologists in their daily practice that is available for research purposes. In addition, after describing the corpus and explaining how it was generated, we carry out a first experimental approach using several machine learning algorithms based on transformer language models such as BioBERT and RoBERTa to test the validity of this linguistic resource. The best-performing classifier achieved a micro F1-score of 0.8676 and a macro F1-score of 0.8328, and these results encourage us to continue working in this research line.
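The abstract reports both micro- and macro-averaged F1. As a rough illustration of how the two averages differ, here is a minimal sketch for multi-label predictions. It is not the authors' evaluation code: the function names and the toy labels "A" and "B" are assumptions made for this example.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """gold, pred: lists of label sets, one set per report (multi-label)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for label in g & p:
            tp[label] += 1  # predicted and correct
        for label in p - g:
            fp[label] += 1  # predicted but wrong
        for label in g - p:
            fn[label] += 1  # missed
    labels = set(tp) | set(fp) | set(fn)
    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels) if labels else 0.0
    return micro, macro


# Hypothetical toy data: the rare label "B" is always missed.
gold = [{"A"}, {"A"}, {"A"}, {"B"}]
pred = [{"A"}, {"A"}, {"A"}, {"A"}]
micro, macro = micro_macro_f1(gold, pred)
```

In this toy case the micro average stays high because the frequent label dominates the pooled counts, while the macro average drops because the missed rare label counts as much as the frequent one. A gap between the two scores usually points to weaker performance on infrequent labels.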

Subject(s)

Natural Language Processing , Radiology , Language , Machine Learning , Algorithms

Combining word embeddings to extract chemical and drug entities in biomedical literature.

López-Úbeda, Pilar; Díaz-Galiano, Manuel Carlos; Ureña-López, L Alfonso; Martín-Valdivia, M Teresa.

BMC Bioinformatics ; 22(Suppl 1): 599, 2021 Dec 17.

Article in English | MEDLINE | ID: mdl-34920708

ABSTRACT

BACKGROUND: Natural language processing (NLP) and text mining technologies for the extraction and indexing of chemical and drug entities are key to improving the access and integration of information from unstructured data such as biomedical literature. METHODS: In this paper we evaluate two important tasks in NLP: named entity recognition (NER) and entity indexing using the SNOMED-CT terminology. For this purpose, we propose a combination of word embeddings in order to improve the results obtained in the PharmaCoNER challenge. RESULTS: For the NER task we present a neural network composed of a BiLSTM with a CRF sequential layer, where different word embeddings are combined as input to the architecture. A hybrid method combining supervised and unsupervised models is used for the concept indexing task. In the supervised model, we use the training set to find previously trained concepts, and the unsupervised model is based on a 6-step architecture. This architecture uses a dictionary of synonyms and the Levenshtein distance to assign the correct SNOMED-CT code. CONCLUSION: On the one hand, the combination of word embeddings helps to improve the recognition of chemicals and drugs in the biomedical literature. We achieved 91.41% precision, 90.14% recall, and a 90.77% F1-score using micro-averaging. On the other hand, our indexing system achieves 92.91% precision, 92.44% recall, and a 92.67% F1-score. These results would place us first in the final ranking.
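The abstract mentions a dictionary of synonyms combined with the Levenshtein distance for assigning codes. The sketch below shows that general idea only and does not reproduce the paper's 6-step architecture. The helper closest_code, the max_distance threshold, and the placeholder codes "CODE_A" and "CODE_B" are assumptions for illustration, not real SNOMED-CT identifiers.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_code(mention, term_to_code, max_distance=2):
    """Return the code of the nearest dictionary term, or None if too far.

    term_to_code is a hypothetical synonym dictionary mapping lowercase
    terms to codes; max_distance is an assumed cut-off, not a value
    taken from the paper.
    """
    if not term_to_code:
        return None
    mention = mention.lower()
    best_term = min(term_to_code, key=lambda term: levenshtein(mention, term))
    if levenshtein(mention, best_term) <= max_distance:
        return term_to_code[best_term]
    return None


# Placeholder dictionary with made-up codes.
synonyms = {"paracetamol": "CODE_A", "ibuprofeno": "CODE_B"}
exact = closest_code("Paracetamol", synonyms)
fuzzy = closest_code("ibuprofen", synonyms)
unknown = closest_code("xyz", synonyms)
```

The distance threshold trades recall for precision: a larger value tolerates more spelling variation but increases the risk of linking a mention to the wrong concept.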

Subject(s)

Information Storage and Retrieval , Medical Informatics , Pharmaceutical Preparations , Medical Informatics/methods , Semantics , Unified Medical Language System

Automatic medical protocol classification using machine learning approaches.

López-Úbeda, Pilar; Díaz-Galiano, Manuel Carlos; Martín-Noguerol, Teodoro; Luna, Antonio; Ureña-López, L Alfonso; Martín-Valdivia, M Teresa.

Comput Methods Programs Biomed ; 200: 105939, 2021 Mar.

Article in English | MEDLINE | ID: mdl-33486337

ABSTRACT

BACKGROUND AND OBJECTIVE: Assignment of medical imaging procedure protocols requires extensive knowledge of patient data, usually included in radiological request forms and radiological reports. The protocol must be assigned before the radiological study is acquired, as it determines the procedure for each patient. Automating this protocol assignment process could improve the efficiency of patient diagnosis. Artificial intelligence has proven to be of great help in these healthcare-related problems, and specifically the application of Natural Language Processing (NLP) techniques for extracting information from text reports has been successfully used in automatic text classification tasks. METHODS: In this paper, machine learning classification models based on NLP have been developed using patient data present in radiological reports and radiological imaging protocols. We have used a real corpus provided by the private medical center "HT medica" composed of almost 700,000 Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) examinations obtained during routine clinical use. We have compared several models including traditional machine learning methods such as support vector machine and random forest, neural networks and transfer language techniques. RESULTS: The results obtained are encouraging, taking into account that the system performs a complex multiclass text classification task. Specifically, for the best proposed system we obtain 92.2% accuracy in the CT dataset and 86.9% in the MRI dataset. CONCLUSIONS: The best machine learning system is potentially efficient, high-quality and cost-effective. For this reason it is currently used in real scenarios by radiologists as a decision support tool for assigning protocols of CT and MRI studies.

Subject(s)

Artificial Intelligence , Machine Learning , Humans , Magnetic Resonance Imaging , Natural Language Processing , Support Vector Machine

COVID-19 detection in radiological text reports integrating entity recognition.

López-Úbeda, Pilar; Díaz-Galiano, Manuel Carlos; Martín-Noguerol, Teodoro; Luna, Antonio; Ureña-López, L Alfonso; Martín-Valdivia, M Teresa.

Comput Biol Med ; 127: 104066, 2020 12.

Article in English | MEDLINE | ID: mdl-33130435

ABSTRACT

COVID-19 diagnosis is usually based on PCR testing, with radiological images, mainly chest Computed Tomography (CT), used to assess lung involvement by COVID-19. However, textual radiological reports also contain relevant information for determining the likelihood of presenting radiological signs of COVID-19 involving the lungs. The development of COVID-19 automatic detection systems based on Natural Language Processing (NLP) techniques could be of great help in supporting clinicians and detecting COVID-19 related disorders within radiological reports. In this paper we propose a text classification system based on the integration of different information sources. The system can be used to automatically predict whether or not a patient has radiological findings consistent with COVID-19 on the basis of radiological reports of chest CT. To carry out our experiments we use 295 radiological reports from chest CT studies provided by the "HT médica" clinic. All of them are radiological requests with suspected chest involvement by COVID-19. In order to train our text classification system we apply Machine Learning approaches and Named Entity Recognition. The system takes two sources of information as input: the text of the radiological report and COVID-19 related disorders extracted from SNOMED-CT. The best system is trained using SVM, and the baseline results achieve 85% accuracy in predicting lung involvement by COVID-19, which already offers competitive values that are difficult to improve on. Moreover, we apply mutual information in order to integrate the best quality information extracted from SNOMED-CT. In this way, we achieve around 90% accuracy, improving the baseline results by 5 points.
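The abstract does not say exactly how mutual information is applied. One common reading, assumed here, is to score each extracted entity by the mutual information between its presence in a report and the class label, then keep the highest-scoring entities. The sketch below computes that score for one binary feature. The function name and the toy vectors are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature, label):
    """Mutual information (in nats) between two discrete sequences.

    feature[i] says whether an entity occurs in report i and label[i]
    is that report's class; both are assumed to have the same length.
    """
    n = len(feature)
    joint = Counter(zip(feature, label))
    px = Counter(feature)
    py = Counter(label)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        # p(x, y) * log(p(x, y) / (p(x) * p(y)))
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


# Toy vectors: one entity that tracks the label exactly, one unrelated to it.
labels = [1, 1, 0, 0]
informative = mutual_information([1, 1, 0, 0], labels)
uninformative = mutual_information([1, 0, 1, 0], labels)
```

An entity whose presence perfectly tracks a balanced binary label scores log 2 nats, and an entity that is statistically independent of the label scores 0, so ranking by this value favours entities that carry information about the class.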

Subject(s)

COVID-19/diagnosis , SARS-CoV-2/isolation & purification , Algorithms , Automation , COVID-19/virology , Humans , Language , Spain , Systematized Nomenclature of Medicine

How do we talk about doctors and drugs? Sentiment analysis in forums expressing opinions for medical domain.

Jiménez-Zafra, Salud María; Martín-Valdivia, M Teresa; Molina-González, M Dolores; Ureña-López, L Alfonso.

Artif Intell Med ; 93: 50-57, 2019 01.

Article in English | MEDLINE | ID: mdl-29685725

ABSTRACT

OBJECTIVE: The main goal of this study is to examine how people express their opinions in medical forums. We analyze the language used in order to determine the best way to tackle sentiment analysis in this domain. METHODS: We have applied supervised learning and lexicon-based sentiment analysis approaches to two different corpora extracted from the social web. Specifically, we have focused on two aspects: drugs and doctors. We have selected two forums and collected a corpus from each one: (i) DOS, a Spanish corpus of drug reviews, and (ii) COPOS, a Spanish corpus of patients' opinions about physicians. RESULTS: The classification results show that drug reviews are more difficult to classify than those about physicians. In order to understand the difference in the results, we have studied the linguistic features of both corpora. CONCLUSIONS: Although opinions about physicians and drugs are written in most cases by non-professional users, reviews about physicians are characterized by the use of informal language, while reviews about drugs combine informal language with specific terminology (e.g. adverse effects, drug names) and show greater lexical diversity, which makes sentiment analysis more difficult.

Subject(s)

Drug Therapy , Physician-Patient Relations , Algorithms , Humans , Learning
