Results 1 - 20 of 138
1.
Med Biol Eng Comput ; 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39358488

ABSTRACT

Heart failure represents the ultimate stage in the progression of diverse cardiac ailments. Throughout the management of heart failure, physicians must review medical images to formulate therapeutic regimens for patients. Automated report generation technology serves as a tool aiding physicians in patient management. However, previous studies failed to generate targeted reports for specific diseases. To produce high-quality medical reports with greater relevance across diverse conditions, we introduce HF-CMN, an automatic report generation model tailored to heart failure. First, the generated report includes comprehensive information pertaining to heart failure gleaned from chest radiographs. Additionally, we construct a storage query matrix grouped by multi-label type, enhancing the accuracy with which our model aligns images with text. Experimental results demonstrate that our method generates reports strongly correlated with heart failure and outperforms most other advanced methods on the benchmark MIMIC-CXR and IU X-Ray datasets. Further analysis confirms that our method achieves superior alignment between images and text, resulting in higher-quality reports.

2.
Acad Radiol ; 2024 Sep 18.
Article in English | MEDLINE | ID: mdl-39299861

ABSTRACT

RATIONALE AND OBJECTIVES: To investigate whether preferences and expectations regarding the style of the radiology report vary across roles, specialties, and practice locations among referring providers. MATERIALS AND METHODS: A total of 579 referring clinicians were invited to complete our survey electronically and were asked to identify themselves as either physicians or advanced practice providers (APPs) and to specify their specialty and primary practice environment. They were asked to rank three sample reports on appearance, formatting, level of detail, and overall preference, with additional queries about their preferences regarding the inclusion of literature citations and the placement of dose reduction statements. RESULTS: 477 surveys were completed and returned for analysis, an 82.2% response rate. The most preferred reporting style was the blended report (62.5%), followed by the narrative report (18.9%) and the highly templated report (18.7%). There were no statistically significant differences in preferred reporting style between provider types (F(1, 475) = 0.69, p = 0.4067), between practice settings (F(2, 474) = 2.32, p = 0.0995), or between medical specialties (F(5, 471) = 2.23, p = 0.051). Among the three report styles, the blended report received the highest satisfaction scores overall. The highly templated report was rated lowest for appearance and detail, while narrative reports received moderate satisfaction scores for both. A majority favored the inclusion of literature citations and, similarly, the placement of dose-optimization statements at the end of the report. Preferences were consistent across specialties and practice settings. CONCLUSION: This survey highlights that a majority of clinicians across a variety of specialties prefer a mix of structured reporting with narrative elements.
The standardization of required metrics included in the radiology report may have far-reaching consequences for future reimbursement.
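The F statistics quoted in this abstract come from one-way ANOVA. A minimal stdlib sketch of how such an F value and its degrees of freedom are derived (the preference scores below are invented for illustration, not the study's data):

```python
def one_way_anova_f(groups):
    """One-way ANOVA: F = between-group mean square / within-group mean square.

    Returns (F, df_between, df_within), matching the F(df_b, df_w) notation
    used when reporting survey comparisons.
    """
    k = len(groups)                               # number of groups
    n = sum(len(g) for g in groups)               # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

# Illustrative 5-point satisfaction scores for two provider types.
physicians = [3, 4, 4, 5, 3]
apps = [4, 4, 3, 5, 4]
f_stat, df_b, df_w = one_way_anova_f([physicians, apps])
print(df_b, df_w)  # 1 8
```

With two groups of five, the result would be reported as F(1, 8); a small F, as here, indicates the between-group variation is small relative to within-group noise.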

3.
Diagn Interv Radiol ; 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39221690

ABSTRACT

PURPOSE: Unstructured, free-text dictation (FT), the current standard in breast magnetic resonance imaging (MRI) reporting, is considered time-consuming and prone to error. The purpose of this study is to assess the usability and performance of a novel, software-based guided reporting (GR) strategy in breast MRI. METHODS: Eighty examinations previously evaluated for a clinical indication (e.g., mass and focus/non-mass enhancement) with FT were reevaluated by three specialized radiologists using GR. Each radiologist had a different number of cases (R1, n = 24; R2, n = 20; R3, n = 36). Usability was assessed by subjective feedback, and quality was assessed by comparing the completeness of automatically generated GR reports with that of their FT counterparts. Errors in GR were categorized and analyzed for debugging with a final software version. Combined reading and reporting times and learning curves were analyzed. RESULTS: Usability was rated high by all readers. No non-sense, omission/commission, or translational errors were detected with the GR method. Spelling and grammar errors were observed in 3/80 patient reports (3.8%) with GR (exclusively in the discussion section) and in 36/80 patient reports (45%) with FT. Between FT and GR, 41 patient reports revealed no content differences, 33 revealed minor differences, and 6 revealed major differences that resulted in changes in treatment. The errors in all patient reports with major content differences were categorized as content omission errors caused by improper software operation (n = 2) or by missing content in software v. 0.8 displayable with v. 1.7 (n = 4). The mean combined reading and reporting time was 576 s (standard deviation: 327 s; min: 155 s; max: 1,517 s). The mean times for each reader were 485, 557, and 754 s, and the respective learning curves evaluated by regression models revealed statistically significant slopes (P = 0.002; P = 0.0002; P < 0.0001). 
Overall times were shorter compared with external references that used FT. The mean combined reading and reporting time of MRI examinations using FT was 1,043 s and decreased by 44.8% with GR. CONCLUSION: GR allows for complete reporting with minimized error rates and reduced combined reading and reporting times. The streamlining of the process (evidenced by lower reading times) for the readers in this study proves that GR can be learned quickly. Reducing reporting errors leads to fewer therapeutic faults and lawsuits against radiologists. It is known that delays in radiology reporting hinder early treatment and lead to poorer patient outcomes. CLINICAL SIGNIFICANCE: While the number of scans and images per examination is continuously rising, staff shortages create a bottleneck in radiology departments. The IT-based GR method can be a major boon, improving radiologist efficiency, report quality, and the quality of simultaneously generated data.

4.
Res Sq ; 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39257991

ABSTRACT

Purpose: Radiology report generation, translating radiological images into precise and clinically relevant descriptions, faces a data imbalance challenge: medical tokens appear less frequently than regular tokens, and normal entries significantly outnumber abnormal ones. However, very few studies consider these imbalance issues, and fewer still consider the two imbalance factors jointly. Methods: In this study, we propose a Joint Imbalance Adaptation (JIMA) model that promotes task robustness by leveraging token and label imbalance. JIMA predicts entity distributions from images and generates reports based on these distributions and image features. We employ a hard-to-easy learning strategy that mitigates overfitting to frequent labels and tokens, thereby encouraging the model to focus more on rare labels and clinical tokens. Results: JIMA shows notable improvements (16.75%-50.50% on average) across evaluation metrics on the IU X-ray and MIMIC-CXR datasets. Our ablation analysis shows that JIMA's enhanced handling of infrequent tokens and abnormal labels accounts for the major contribution. Human evaluation and case study experiments further validate that JIMA can generate more clinically accurate reports. Conclusion: Data imbalance (e.g., infrequent tokens and abnormal labels) leads to the underperformance of radiology report generation. Our curriculum learning strategy successfully reduces the impact of data imbalance by reducing overfitting on frequent patterns and underfitting on infrequent patterns. While data imbalance remains challenging, our approach opens new directions for the generation task.

5.
J Am Coll Radiol ; 2024 Aug 21.
Article in English | MEDLINE | ID: mdl-39155027

ABSTRACT

OBJECTIVE: Patients increasingly have access to their radiology reports. This systematic review examined the opinions of patients, referring physicians, and radiologists over time on providing patients full access to their radiology reports. METHODS: We conducted a systematic review of quantitative, qualitative, and mixed methods research following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (PROSPERO CRD42023466502). Our search was conducted through September 30, 2023, and spanned five databases (CINAHL Plus, Web of Science, ProQuest, PubMed, and Scopus). The studies included were peer-reviewed journal articles about the opinions of patients, referring physicians, or radiologists regarding giving patients unrestricted access to their radiology reports. RESULTS: After screening 4,520 articles, the full texts of 439 studies were assessed for eligibility. Thirty-three studies met the inclusion criteria. The studies showed that, over time, patients have consistently expressed a strong desire to access their radiology reports, while referring physicians and radiologists have held varied opinions about patient access. The main advantages of patient access found in the studies were enhanced understanding and empowerment and increased patient-physician engagement and communication. The main disadvantages were patients' difficulty understanding reports and patient anxiety from accessing them. Referring physicians' opinions and radiologists' opinions were reported in fewer than 20% (six studies) and 10% (three studies) of the included studies, respectively. DISCUSSION: The studies show patients have desired access to radiology reports over time. Future research should elicit the opinions of referring physicians and radiologists to enable a more informed design of patient access to radiology reports.

6.
Stud Health Technol Inform ; 316: 1780-1784, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176562

ABSTRACT

Radiology reports contain crucial patient information, in addition to images, that can be automatically extracted for secondary uses such as clinical decision support and diagnostic research. We tested several classifiers on 1,218 breast MRI reports in French from two Swiss clinical centers. Logistic regression performed best on both internal (accuracy > 0.95 and macro-F1 > 0.86) and external data (accuracy > 0.81 and macro-F1 > 0.41). Automating this task will facilitate efficient extraction of targeted clinical parameters and provide a good basis for future annotation processes through automatic pre-annotation.
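The pairing of accuracy with macro-F1 in this abstract matters because class imbalance can inflate accuracy while macro-F1 stays low (as in the external results). A minimal stdlib sketch of the two metrics; the labels and predictions are illustrative, not from the study:

```python
def accuracy(y_true, y_pred):
    """Fraction of exact matches."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Unlike accuracy, each class counts equally, so a model that
    ignores a minority class is penalized heavily.
    """
    f1s = []
    for c in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Imbalanced toy set: every majority case is right, the one minority case is missed.
true = ["benign"] * 9 + ["malignant"]
pred = ["benign"] * 10
print(accuracy(true, pred))           # 0.9
print(round(macro_f1(true, pred), 3))  # 0.474
```

High accuracy with roughly half the macro-F1 is exactly the signature of a classifier struggling on rare classes, which is why reporting both is informative for external validation.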


Subject(s)
Breast Neoplasms; Magnetic Resonance Imaging; Humans; Female; Breast Neoplasms/diagnostic imaging; France; Radiology Information Systems; Electronic Health Records; Natural Language Processing; Switzerland; Data Mining
7.
Acad Radiol ; 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39142976

ABSTRACT

RATIONALE AND OBJECTIVES: The process of generating radiology reports is often time-consuming and labor-intensive, prone to incompleteness, heterogeneity, and errors. By employing natural language processing (NLP)-based techniques, this study explores the potential for enhancing the efficiency of radiology report generation through the remarkable capabilities of ChatGPT (Generative Pre-trained Transformer), a prominent large language model (LLM). MATERIALS AND METHODS: Using a sample of 1000 records from the Medical Information Mart for Intensive Care (MIMIC) Chest X-ray Database, this investigation employed Claude.ai to extract initial radiological report keywords. ChatGPT then generated radiology reports using a consistent 3-step prompt template outline. Various lexical and sentence similarity techniques were employed to evaluate the correspondence between the AI assistant-generated reports and reference reports authored by medical professionals. RESULTS: Results showed varying performance among NLP models, with BART (Bidirectional and Auto-Regressive Transformers) and XLM (Cross-lingual Language Model) displaying high proficiency (mean similarity scores up to 99.3%), closely mirroring physician reports. Conversely, DeBERTa (Decoding-enhanced BERT with disentangled attention) and sequence-matching models scored lower, indicating less alignment with medical language. In the Impression section, the word-embedding model excelled with a mean similarity of 84.4%, while others like the Jaccard index showed lower performance. CONCLUSION: Overall, the study highlights significant variations across NLP models in their ability to generate radiology reports consistent with medical professionals' language. Pairwise comparisons and Kruskal-Wallis tests confirmed these differences, emphasizing the need for careful selection and evaluation of NLP models in radiology report generation.
This research underscores the potential of ChatGPT to streamline and improve the radiology reporting process, with implications for enhancing efficiency and accuracy in clinical practice.
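The Jaccard index mentioned in this abstract is a purely lexical similarity measure, which helps explain why it trails embedding-based methods on paraphrased clinical text. A minimal sketch; the example sentences are invented:

```python
def jaccard_similarity(text_a, text_b):
    """Token-set Jaccard index: |A ∩ B| / |A ∪ B|.

    Purely lexical: synonyms and paraphrases contribute nothing,
    so clinically equivalent wordings can score low.
    """
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

ref = "no acute cardiopulmonary abnormality"
gen = "no acute cardiopulmonary process"
print(round(jaccard_similarity(ref, gen), 2))  # 0.6
```

Here one substituted word drops the score to 0.6 even though the two impressions are clinically interchangeable, illustrating the gap the study observed between surface metrics and word-embedding similarity.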

8.
J Biomed Inform ; 157: 104718, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39209086

ABSTRACT

Radiology report generation automates diagnostic narrative synthesis from medical imaging data. Current report generation methods primarily employ knowledge graphs for image enhancement, neglecting the interpretability and guiding function of the knowledge graphs themselves. Additionally, few approaches leverage the stable modal alignment information from multimodal pre-trained models to facilitate the generation of radiology reports. We propose Terms-Guided Radiology Report Generation (TGR), a simple and practical model for generating reports guided primarily by anatomical terms. Specifically, we utilize a dual-stream visual feature extraction module, comprising a detail extraction module and a frozen multimodal pre-trained model, to separately extract visual detail features and semantic features. Furthermore, a Visual Enhancement Module (VEM) is proposed to further enrich the visual features, thereby facilitating the generation of a list of anatomical terms. We integrate anatomical terms with image features and engage contrastive learning with frozen text embeddings, utilizing the stable feature space of these embeddings to further boost modal alignment. Our model supports manual input, enabling it to generate a list of organs for specifically targeted abnormal areas or to produce more accurate single-sentence descriptions based on selected anatomical terms. Comprehensive experiments demonstrate the effectiveness of our method in report generation tasks: our TGR-S model reduces training parameters by 38.9% while performing comparably to current state-of-the-art models, and our TGR-B model exceeds the best baseline models across multiple metrics.


Subject(s)
Natural Language Processing; Humans; Radiology/education; Radiology/methods; Algorithms; Machine Learning; Semantics; Radiology Information Systems; Diagnostic Imaging/methods
9.
Int Dent J ; 2024 Jul 26.
Article in English | MEDLINE | ID: mdl-39068121

ABSTRACT

OBJECTIVES: Several factors, such as the unavailability of specialists, dental phobia, and financial difficulties, may lead to a delay between receiving an oral radiology report and consulting a dentist. The primary aim of this study was to distinguish between high-risk and low-risk oral lesions according to radiologists' reports of cone beam computed tomography (CBCT) images. Such a facility may be employed by a dentist or their assistant to make the patient aware of the severity and grade of the oral lesion and the need for referral for immediate treatment or other follow-up care. METHODS: A total of 1134 CBCT radiography reports owned by Shiraz University of Medical Sciences were collected. The severity level of each sample was specified by three experts, and annotation was carried out accordingly. After preprocessing the data, a deep learning model, referred to as CNN-LSTM, was developed to detect the severity of the problem based on analysis of the radiologist's report. Unlike traditional models, which usually use a simple collection of words, the proposed deep model uses words embedded in dense vector representations, which empowers it to effectively capture semantic similarities. RESULTS: The results indicated that the proposed model outperformed its counterparts in terms of precision, recall, and F1 criteria. This suggests its potential as a reliable tool for early estimation of the severity of oral lesions. CONCLUSIONS: This study shows the effectiveness of deep learning in analyzing textual reports and accurately distinguishing between high-risk and low-risk lesions. Employing the proposed model, which can provide timely warnings about the need for follow-up and prompt treatment, can shield patients from the risks associated with delays. CLINICAL SIGNIFICANCE: Our collaboratively collected and expert-annotated dataset serves as a valuable resource for exploratory research. The results demonstrate the pivotal role our deep learning model could play in assessing the severity of oral lesions in dental reports.

10.
Med Image Anal ; 97: 103264, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39013207

ABSTRACT

Natural Image Captioning (NIC) is an interdisciplinary research area that lies at the intersection of Computer Vision (CV) and Natural Language Processing (NLP). Several works have been presented on the subject, ranging from early template-based approaches to more recent deep learning-based methods. This paper conducts a survey of NIC, focusing especially on its applications to Medical Image Captioning (MIC) and Diagnostic Captioning (DC) in the field of radiology. A review of the state of the art is conducted, summarizing key research works in NIC and DC to provide a wide overview of the subject. These works include existing NIC and MIC models, datasets, evaluation metrics, and previous reviews in the specialized literature. The reviewed work is thoroughly analyzed and discussed, highlighting the limitations of existing approaches and their potential implications for real clinical practice. Similarly, potential future research lines are outlined on the basis of the detected limitations.


Subject(s)
Natural Language Processing; Humans; Radiology Information Systems; Deep Learning; Diagnostic Imaging/methods; Image Processing, Computer-Assisted/methods; Image Interpretation, Computer-Assisted/methods
11.
Med Biol Eng Comput ; 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38844661

ABSTRACT

This paper presents the implementation of two automated text classification systems for prostate cancer findings based on the PI-RADS criteria. Specifically, a traditional machine learning model using XGBoost and a language model-based approach using RoBERTa were employed. The study focused on Spanish-language radiological MRI prostate reports, which has not been explored before. The results demonstrate that the RoBERTa model outperforms the XGBoost model, although both achieve promising results. Furthermore, the best-performing system was integrated into the radiological company's information systems as an API, operating in a real-world environment.

12.
J Comput Biol ; 31(6): 486-497, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38837136

ABSTRACT

Automatic radiology report generation is a necessary development of artificial intelligence technology in health care. This technology serves to aid doctors in producing comprehensive diagnostic reports, alleviating the burdensome workloads of medical professionals. However, there are some challenges in generating radiological reports: (1) visual and textual data biases and (2) the long-distance dependency problem. To tackle these issues, we design a visual recalibration and gating enhancement network (VRGE), which is composed of a visual recalibration module and a gating enhancement module (GEM). Specifically, the visual recalibration module enhances the recognition of abnormal features in lesion areas of medical images. The GEM dynamically adjusts the contextual information in the report by introducing gating mechanisms, focusing on capturing professional medical terminology in medical text reports. We have conducted extensive experiments on the public IU X-Ray dataset to illustrate that VRGE outperforms existing models.
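The abstract does not spell out its gating mechanism, but the general idea of a learned gate blending contextual and new information can be shown with a scalar toy version. The weights and inputs below are illustrative only, not the VRGE architecture:

```python
import math

def gate(context, new_info, weights, bias):
    """Generic scalar gate: g = sigmoid(w1*context + w2*new_info + b),
    output = g * context + (1 - g) * new_info.

    The real module operates on learned vectors; this toy version only
    illustrates the mechanism of dynamically blending two signals.
    """
    z = weights[0] * context + weights[1] * new_info + bias
    g = 1.0 / (1.0 + math.exp(-z))          # gate value in (0, 1)
    return g * context + (1.0 - g) * new_info

out = gate(context=0.2, new_info=0.9, weights=[1.0, -1.0], bias=0.0)
print(0.2 <= out <= 0.9)  # True: a convex blend of the two inputs
```

Because the output is a convex combination, the gate can smoothly shift emphasis between generic context and, for example, a salient medical term, depending on the learned weights.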


Subject(s)
Artificial Intelligence; Humans; Radiology/methods; Algorithms
13.
Int J Med Inform ; 187: 105443, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38615509

ABSTRACT

OBJECTIVES: This study addresses the critical need for accurate summarization in radiology by comparing various Large Language Model (LLM)-based approaches for automatic summary generation. With the increasing volume of patient information, accurately and concisely conveying radiological findings becomes crucial for effective clinical decision-making. Minor inaccuracies in summaries can lead to significant consequences, highlighting the need for reliable automated summarization tools. METHODS: We employed two language models - Text-to-Text Transfer Transformer (T5) and Bidirectional and Auto-Regressive Transformers (BART) - in both fine-tuned and zero-shot learning scenarios and compared them with a Recurrent Neural Network (RNN). Additionally, we conducted a comparative analysis of 100 MRI report summaries, using expert human judgment and criteria such as coherence, relevance, fluency, and consistency, to evaluate the models against the original radiologist summaries. To facilitate this, we compiled a dataset of 15,508 retrospective knee Magnetic Resonance Imaging (MRI) reports from our Radiology Information System (RIS), focusing on the findings section to predict the radiologist's summary. RESULTS: The fine-tuned models outperform the neural network and show superior performance in the zero-shot variant. Specifically, the T5 model achieved a Rouge-L score of 0.638. Based on the radiologist readers' study, the summaries produced by this model were found to be very similar to those produced by a radiologist, with about 70% similarity in fluency and consistency between the T5-generated summaries and the original ones. CONCLUSIONS: Technological advances, especially in NLP and LLM, hold great promise for improving and streamlining the summarization of radiological findings, thus providing valuable assistance to radiologists in their work.
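The Rouge-L score reported for the T5 model is based on the longest common subsequence (LCS) between candidate and reference summaries. A minimal stdlib sketch using a balanced F-measure (published ROUGE implementations often weight recall more heavily; the example sentences are invented):

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(reference, candidate):
    """ROUGE-L F-measure over whitespace tokens (beta = 1 for simplicity)."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(cand), lcs / len(ref)
    return 2 * p * r / (p + r)

ref = "small joint effusion without acute fracture"
cand = "joint effusion without fracture"
print(round(rouge_l_f1(ref, cand), 3))  # 0.8
```

Because the LCS preserves word order without requiring contiguity, ROUGE-L rewards summaries that keep the reference's key terms in sequence even when wording is compressed, which suits the findings-to-impression task described above.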


Subject(s)
Feasibility Studies; Magnetic Resonance Imaging; Natural Language Processing; Neural Networks, Computer; Humans; Radiology Information Systems; Knee/diagnostic imaging; Retrospective Studies
14.
Bioengineering (Basel) ; 11(4)2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38671773

ABSTRACT

Deep learning is revolutionizing radiology report generation (RRG) with the adoption of vision encoder-decoder (VED) frameworks, which transform radiographs into detailed medical reports. Traditional methods, however, often generate reports of limited diversity and struggle with generalization. Our research introduces reinforcement learning and text augmentation to tackle these issues, significantly improving report quality and variability. By employing RadGraph as a reward metric and innovating in text augmentation, we surpass existing benchmarks like BLEU4, ROUGE-L, F1CheXbert, and RadGraph, setting new standards for report accuracy and diversity on MIMIC-CXR and Open-i datasets. Our VED model achieves F1-scores of 66.2 for CheXbert and 37.8 for RadGraph on the MIMIC-CXR dataset, and 54.7 and 45.6, respectively, on Open-i. These outcomes represent a significant breakthrough in the RRG field. The findings and implementation of the proposed approach, aimed at enhancing diagnostic precision and radiological interpretations in clinical settings, are publicly available on GitHub to encourage further advancements in the field.

15.
Jpn J Radiol ; 42(7): 697-708, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38551771

ABSTRACT

PURPOSE: To propose a five-point scale for radiology report importance called the Report Importance Category (RIC) and to compare the performance of natural language processing (NLP) algorithms in assessing RIC using head computed tomography (CT) reports written in Japanese. MATERIALS AND METHODS: 3728 Japanese head CT reports performed at Osaka University Hospital in 2020 were included. RIC (category 0: no findings; category 1: minor findings; category 2: routine follow-up; category 3: careful follow-up; and category 4: examination or therapy) was established based not only on patient severity but also on the novelty of the information. Manual assessment of RIC for the reports was performed under the consensus of two out of four neuroradiologists. The performance of four NLP models for classifying RIC was compared using fivefold cross-validation: logistic regression, bidirectional long short-term memory (BiLSTM), general bidirectional encoder representations from transformers (general BERT), and domain-specific BERT (BERT for the medical domain). RESULTS: The proportions of the RICs in the whole dataset were 15.0%, 26.7%, 44.2%, 7.7%, and 6.4%, respectively. Domain-specific BERT showed the highest accuracy (0.8434 ± 0.0063) in assessing RIC and significantly higher AUC in categories 1 (0.9813 ± 0.0011), 2 (0.9492 ± 0.0045), 3 (0.9637 ± 0.0050), and 4 (0.9548 ± 0.0074) than the other models (p < .05). Analysis using layer-integrated gradients showed that the domain-specific BERT model could detect important words, such as disease names, in reports. CONCLUSIONS: Domain-specific BERT is superior to the other models in assessing our newly proposed criterion, the RIC of head CT radiology reports. The accumulation of similar and further studies has the potential to contribute to medical safety by preventing clinicians from missing important findings.


Subject(s)
Natural Language Processing; Tomography, X-Ray Computed; Humans; Tomography, X-Ray Computed/methods; Japan; Algorithms; Head/diagnostic imaging; Radiology Information Systems; Female; Male; East Asian People
16.
Healthcare (Basel) ; 12(5)2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38470621

ABSTRACT

Diagnosis of necrotizing enterocolitis (NEC) relies heavily on imaging, but uncertainty in the language used in imaging reports can result in ambiguity, miscommunication, and potential diagnostic errors. To determine the degree of uncertainty in reporting imaging findings for NEC, we conducted a secondary analysis of the data from a previously completed pilot diagnostic randomized controlled trial (2019-2020). The study population comprised sixteen preterm infants with suspected NEC randomized to abdominal radiographs (AXRs) or AXR + bowel ultrasound (BUS). The level of uncertainty was determined using a four-point Likert scale. Overall, we reviewed radiology reports of 113 AXR and 24 BUS from sixteen preterm infants with NEC concern. The BUS reports showed less uncertainty for reporting pneumatosis, portal venous gas, and free air compared to AXR reports (pneumatosis: 1 [1-1.75) vs. 3 [2-3], p < 0.0001; portal venous gas: 1 [1-1] vs. 1 [1-1], p = 0.02; free air: 1 [1-1] vs. 2 [1-3], p < 0.0001). In conclusion, we found that BUS reports have a lower degree of uncertainty in reporting imaging findings of NEC compared to AXR reports. Whether the lower degree of uncertainty of BUS reports positively impacts clinical decision making in infants with possible NEC remains unknown.

17.
Clin Imaging ; 109: 110113, 2024 May.
Article in English | MEDLINE | ID: mdl-38552383

ABSTRACT

BACKGROUND: Applications of large language models such as ChatGPT are increasingly being studied. Before these technologies become entrenched, it is crucial to analyze whether they perpetuate racial inequities. METHODS: We asked Open AI's ChatGPT-3.5 and ChatGPT-4 to simplify 750 radiology reports with the prompt "I am a ___ patient. Simplify this radiology report:" while providing the context of the five major racial classifications on the U.S. census: White, Black or African American, American Indian or Alaska Native, Asian, and Native Hawaiian or other Pacific Islander. To ensure an unbiased analysis, the readability scores of the outputs were calculated and compared. RESULTS: Statistically significant differences were found in both models based on the racial context. For ChatGPT-3.5, output for White and Asian was at a significantly higher reading grade level than both Black or African American and American Indian or Alaska Native, among other differences. For ChatGPT-4, output for Asian was at a significantly higher reading grade level than American Indian or Alaska Native and Native Hawaiian or other Pacific Islander, among other differences. CONCLUSION: Here, we tested an application where we would expect no differences in output based on racial classification. Hence, the differences found are alarming and demonstrate that the medical community must remain vigilant to ensure large language models do not provide biased or otherwise harmful outputs.
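The comparison above hinges on readability scores of the model outputs. The abstract does not state which formula was used; a common choice is the Flesch-Kincaid grade level, sketched here with a crude stdlib syllable heuristic (real readability tools use pronunciation dictionaries):

```python
import re

def count_syllables(word):
    """Vowel-group heuristic with a silent-e adjustment; approximate only."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text):
    """FK grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)

simple = "Your lungs look clear. Your heart is a normal size."
technical = "Cardiomediastinal silhouette demonstrates no radiographic abnormality."
print(flesch_kincaid_grade(simple) < flesch_kincaid_grade(technical))  # True
```

Scoring each simplified report this way and comparing distributions across prompt conditions is the kind of unbiased, automatic comparison the study describes.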


Subject(s)
Language; Radiology; Humans; United States
18.
J Imaging Inform Med ; 37(2): 471-488, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38308070

ABSTRACT

Large language models (LLMs) have shown promise in accelerating radiology reporting by summarizing clinical findings into impressions. However, automatic impression generation for whole-body PET reports presents unique challenges and has received little attention. Our study aimed to evaluate whether LLMs can create clinically useful impressions for PET reporting. To this end, we fine-tuned twelve open-source language models on a corpus of 37,370 retrospective PET reports collected from our institution. All models were trained using the teacher-forcing algorithm, with the report findings and patient information as input and the original clinical impressions as reference. An extra input token encoded the reading physician's identity, allowing models to learn physician-specific reporting styles. To compare the performances of different models, we computed various automatic evaluation metrics and benchmarked them against physician preferences, ultimately selecting PEGASUS as the top LLM. To evaluate its clinical utility, three nuclear medicine physicians assessed the PEGASUS-generated impressions and original clinical impressions across 6 quality dimensions (3-point scales) and an overall utility score (5-point scale). Each physician reviewed 12 of their own reports and 12 reports from other physicians. When physicians assessed LLM impressions generated in their own style, 89% were considered clinically acceptable, with a mean utility score of 4.08/5. On average, physicians rated these personalized impressions as comparable in overall utility to the impressions dictated by other physicians (4.03, P = 0.41). In summary, our study demonstrated that personalized impressions generated by PEGASUS were clinically useful in most cases, highlighting its potential to expedite PET reporting by automatically drafting impressions.
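The physician-identity token described above amounts to prefixing a special token to the model input before tokenization, so the decoder can condition on the dictating physician's style. A hedged sketch; the "<phys_N>" format and the field layout are invented for illustration, not the study's actual vocabulary:

```python
def build_model_input(findings, patient_info, physician_id):
    """Assemble a sequence-to-sequence input with a reader-identity prefix.

    The style token lets a single fine-tuned model learn per-physician
    impression styles from the same findings text.
    """
    style_token = f"<phys_{physician_id}>"          # hypothetical token format
    return f"{style_token} {patient_info} [FINDINGS] {findings}"

inp = build_model_input(
    findings="Focal FDG uptake in the right hilum.",
    patient_info="67-year-old male, staging study.",
    physician_id=3,
)
print(inp.startswith("<phys_3>"))  # True
```

At inference time, swapping the prefix token is what lets the same model draft an impression "in the style of" a chosen reader, which is how the personalized-versus-other-physician comparison in the evaluation becomes possible.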

19.
JMIR Form Res ; 8: e32690, 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38329788

ABSTRACT

BACKGROUND: The automatic generation of radiology reports, which seeks to create a free-text description from a clinical radiograph, is emerging as a pivotal intersection between clinical medicine and artificial intelligence. Leveraging natural language processing technologies can accelerate report creation, enhancing health care quality and standardization. However, most existing studies have not yet fully tapped into the combined potential of advanced language and vision models. OBJECTIVE: The purpose of this study was to explore the integration of pretrained vision-language models into radiology report generation. This would enable the vision-language model to automatically convert clinical images into high-quality textual reports. METHODS: In our research, we introduced a radiology report generation model named ClinicalBLIP, building upon the foundational InstructBLIP model and refining it using clinical image-to-text data sets. A multistage fine-tuning approach via low-rank adaptation was proposed to deepen the semantic comprehension of the visual encoder and the large language model for clinical imagery. Furthermore, prior knowledge was integrated through prompt learning to enhance the precision of the reports generated. Experiments were conducted on both the IU X-RAY and MIMIC-CXR data sets, with ClinicalBLIP compared to several leading methods. RESULTS: Experimental results revealed that ClinicalBLIP obtained superior scores of 0.570/0.365 and 0.534/0.313 on the IU X-RAY/MIMIC-CXR test sets for the Metric for Evaluation of Translation with Explicit Ordering (METEOR) and the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) evaluations, respectively. This performance notably surpasses that of existing state-of-the-art methods. Further evaluations confirmed the effectiveness of the multistage fine-tuning and the integration of prior information, leading to substantial improvements. 
CONCLUSIONS: The proposed ClinicalBLIP model demonstrated robustness and effectiveness in enhancing clinical radiology report generation, suggesting significant promise for real-world clinical applications.

20.
Diagnostics (Basel) ; 14(2)2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38248014

ABSTRACT

This study aims to establish advanced sampling methods in free-text data for efficiently building semantic text mining models using deep learning, such as identifying vertebral compression fracture (VCF) in radiology reports. We enrolled a total of 27,401 radiology free-text reports of X-ray examinations of the spine. The predictive effects were compared between text mining models built using supervised long short-term memory networks, independently derived by four sampling methods: vector sum minimization, vector sum maximization, stratified, and simple random sampling, using four fixed percentages. The drawn samples were used as the training set, and the remaining samples were used to validate each combination of sampling method and ratio. Predictive accuracy was measured as the area under the receiver operating characteristic curve (AUROC) for identifying VCF. At sampling ratios of 1/10, 1/20, 1/30, and 1/40, the highest AUROC was achieved by vector sum minimization: 0.981 (95% CI: 0.980-0.983), 0.963 (95% CI: 0.961-0.965), 0.907 (95% CI: 0.904-0.911), and 0.895 (95% CI: 0.891-0.899), respectively. The lowest AUROC was obtained with vector sum maximization. This study proposes an advanced sampling method, vector sum minimization, for free-text data that can be efficiently applied to build text mining models by drawing a small number of critical, representative samples.
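The AUROC used to compare the sampling methods above can be computed directly from classifier scores via the rank-sum (Mann-Whitney U) formulation: the probability that a randomly chosen positive outranks a randomly chosen negative. A minimal stdlib sketch; the labels and scores are illustrative:

```python
def auroc(labels, scores):
    """AUROC as P(score_positive > score_negative), ties counted as half.

    Equivalent to the area under the ROC curve, without needing to
    trace the curve explicitly.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy scores for reports with (1) and without (0) a VCF mention.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1]
print(auroc(labels, scores))
```

This O(n²) pairwise form is fine for small validation sets; large ones would use the rank-based O(n log n) equivalent, but the value is identical.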
