Results 1 - 20 of 24
1.
J Ultrasound Med ; 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38980145

ABSTRACT

OBJECTIVE: To describe the morphologic sonographic appearances and frequency of the "halo sign" in the setting of fat necrosis on shear wave elastography (SWE). METHODS: Patients with clinically suspected fat necrosis were prospectively scanned using SWE in addition to standard gray-scale and Doppler images. Cases were qualitatively grouped into one of three sonographic appearances: focal hypoechoic lesion with increased internal tissue stiffness ("focal stiffness"), focal hypoechoic lesion with isoechoic or hyperechoic periphery demonstrating increased tissue stiffness relative to the central hypoechoic lesion ("halo stiffness"), and heterogeneously echogenic lesion with diffusely increased stiffness ("heterogeneous stiffness"). RESULTS: Nineteen patients met inclusion criteria (female n = 14; male n = 5). Shear wave velocities were recorded and retrospectively evaluated. The mean clinical follow-up was 11.4 months (range 3.0-25.5). Lesions demonstrated higher average tissue stiffness than background tissue (overall mass shear wave velocity 3.26 m/s, background 1.42 m/s, P < .001; lesion Young's modulus 40.85 kPa vs background 7.22 kPa, P < .001). The halo sign was identified in 10/19 (53%) patients. CONCLUSION: The halo sign is a potentially useful sign in the setting of fat necrosis seen in the majority of clinically suspected cases.
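The paired velocity and modulus figures in this abstract are linked by a standard elastography conversion. The following is a minimal sketch, assuming soft tissue is incompressible, linearly elastic, and has a density of about 1000 kg/m³. None of these assumptions is stated in the abstract, and the function name is illustrative.

```python
def youngs_modulus_kpa(shear_wave_velocity_m_s, density_kg_m3=1000.0):
    """Convert shear wave velocity (m/s) to Young's modulus (kPa) via E = 3 * rho * c^2.

    Assumes an incompressible, linearly elastic medium; the default density is an
    assumed soft-tissue value, not a figure taken from the abstract.
    """
    modulus_pa = 3.0 * density_kg_m3 * shear_wave_velocity_m_s ** 2
    return modulus_pa / 1000.0  # Pa -> kPa


# Mean velocities reported in the abstract.
lesion_kpa = youngs_modulus_kpa(3.26)
background_kpa = youngs_modulus_kpa(1.42)
print(f"lesion: {lesion_kpa:.1f} kPa, background: {background_kpa:.1f} kPa")
```

Converting the mean velocities gives roughly 31.9 kPa and 6.0 kPa, below the reported mean moduli of 40.85 kPa and 7.22 kPa. One plausible explanation is that moduli were averaged per lesion: because the modulus grows with the square of velocity, the mean of the moduli exceeds the modulus of the mean velocity.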

2.
Skeletal Radiol ; 2024 May 02.
Article in English | MEDLINE | ID: mdl-38695875

ABSTRACT

PURPOSE: We evaluated whether an open-source artificial intelligence (AI) algorithm ( https://www.childfx.com ) could improve performance of (1) subspecialized musculoskeletal radiologists, (2) radiology residents, and (3) pediatric residents in detecting pediatric and young adult upper extremity fractures. MATERIALS AND METHODS: A set of evaluation radiographs drawn from throughout the upper extremity (elbow, hand/finger, humerus/shoulder/clavicle, and wrist/forearm) from 240 unique patients at a single hospital was constructed (mean age 11.3 years, range 0-22 years, 37.9% female). Two fellowship-trained musculoskeletal radiologists, three radiology residents, and two pediatric residents were recruited as readers. Each reader interpreted each case first without AI assistance and then, 3-4 weeks later, with it, recording whether and where a fracture was present. RESULTS: Access to AI significantly improved the area under the receiver operating characteristic curve (AUC) for identifying fracture among radiology residents (0.768 [0.730-0.806] without AI to 0.876 [0.845-0.908] with AI, P < 0.001) and pediatric residents (0.706 [0.659-0.753] without AI to 0.844 [0.805-0.883] with AI, P < 0.001). There was no evidence of improvement for subspecialized musculoskeletal radiology attendings in identifying fracture (AUC 0.867 [0.832-0.902] to 0.890 [0.856-0.924], P = 0.093). There was no evidence of difference between overall resident AUC with AI and subspecialist AUC without AI (resident AUC with AI 0.863, attending AUC without AI 0.867, P = 0.856). Overall physician radiograph interpretation time was significantly lower with AI (38.9 s with AI vs. 52.1 s without AI, P = 0.030). CONCLUSION: An openly accessible AI model significantly improved radiology and pediatric resident accuracy in detecting pediatric upper extremity fractures.

3.
AJR Am J Roentgenol ; 222(6): e2430958, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38568033

ABSTRACT

BACKGROUND. MRI utility for patients 45 years old and older with hip or knee pain is not well established. OBJECTIVE. We performed this systematic review to assess whether MRI-diagnosed hip or knee pathology in patients 45 years old and older correlates with symptoms or benefits from arthroscopic surgery. EVIDENCE ACQUISITION. A literature search (PubMed, Web of Science, Embase) of articles published before October 3, 2022, was performed to identify original research pertaining to the study question. Publication information, study design, cohort size, osteoarthritis severity, age (range, mean), measured outcomes, minimum follow-up length, and MRI field strength were extracted. Study methods were appraised with NIH's study quality assessment tools. EVIDENCE SYNTHESIS. The search yielded 1125 potential studies, of which 31 met the inclusion criteria (18 knee, 13 hip). Knee studies (10 prospective, eight retrospective) included 5907 patients (age range, 45-90 years). Bone marrow edema-like lesions, joint effusions, and synovitis on MRI were associated with symptoms. In patients with osteoarthritis, meniscal tears were less likely to be symptom generators and were less likely to respond to arthroscopic surgery with osteoarthritis progression. Hip studies (11 retrospective, two prospective) included 6385 patients (age range, 50 to ≥ 85 years). Patients with Tönnis grade 2 osteoarthritis and lower with and without femoroacetabular impingement (FAI) showed improved outcomes after arthroscopy, suggesting a role for MRI in the diagnosis of labral tears, chondral lesions, and FAI. Although this group benefited from arthroscopic surgery, outcomes were inferior to those in younger patients. Variability in study characteristics, follow-up, and outcome measures precluded a meta-analysis. CONCLUSION. In patients 45 years old and older, several knee structural lesions on MRI correlated with symptoms, representing potential imaging biomarkers. 
Meniscal tear identification on MRI likely has diminished clinical value as osteoarthritis progresses. For the hip, MRI can play a role in the diagnosis of labral tears, chondral lesions, and FAI in patients without advanced osteoarthritis. CLINICAL IMPACT. Several structural lesions on knee MRI correlating with symptoms may represent imaging biomarkers used as treatment targets. Osteoarthritis, not age, may play the greatest role in determining the utility of MRI for patients 45 years old and older with hip or knee pain.


Subject(s)
Arthralgia , Magnetic Resonance Imaging , Aged , Humans , Middle Aged , Arthralgia/diagnostic imaging , Arthralgia/etiology , Hip Joint/diagnostic imaging , Hip Joint/pathology , Knee Joint/diagnostic imaging , Knee Joint/pathology , Magnetic Resonance Imaging/methods , Aged, 80 and over
4.
Radiology ; 311(1): e231055, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38687217

ABSTRACT

Background Commonly used pediatric lower extremity growth standards are based on small, dated data sets. Artificial intelligence (AI) enables creation of updated growth standards. Purpose To train an AI model using standing slot-scanning radiographs in a racially diverse data set of pediatric patients to measure lower extremity length and to compare expected growth curves derived using AI measurements to those of the conventional Anderson-Green method. Materials and Methods This retrospective study included pediatric patients aged 0-21 years who underwent at least two slot-scanning radiographs in routine clinical care between August 2015 and February 2022. A Mask Region-based Convolutional Neural Network was trained to segment the femur and tibia on radiographs and measure total leg, femoral, and tibial length; accuracy was assessed with mean absolute error. AI measurements were used to create quantile polynomial regression femoral and tibial growth curves, which were compared with the growth curves of the Anderson-Green method for coverage based on the central 90% of the estimated growth distribution. Results In total, 1874 examinations in 523 patients (mean age, 12.7 years ± 2.8 [SD]; 349 female patients) were included; 40% of patients self-identified as White and not Hispanic or Latino, and the remaining 60% self-identified as belonging to a different racial or ethnic group. The AI measurement training, validation, and internal test sets included 114, 25, and 64 examinations, respectively. The mean absolute errors of AI measurements of the femur, tibia, and lower extremity in the test data set were 0.25, 0.27, and 0.33 cm, respectively. All 1874 examinations were used to generate growth curves. 
AI growth curves more accurately represented lower extremity growth in an external test set (n = 154 examinations) than the Anderson-Green method (90% coverage probability: 86.7% [95% CI: 82.9, 90.5] for AI model vs 73.4% [95% CI: 68.4, 78.3] for Anderson-Green method; χ2 test, P < .001). Conclusion Lower extremity growth curves derived from AI measurements on standing slot-scanning radiographs from a diverse pediatric data set enabled more accurate prediction of pediatric growth. © RSNA, 2024 Supplemental material is available for this article.
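The coverage comparison reported here asks how often an observed bone length falls inside the central 90% band of a growth curve. The following is a minimal sketch of that calculation; the function name and all measurements are hypothetical and are not drawn from the study data.

```python
def coverage_probability(observed, lower_bounds, upper_bounds):
    """Fraction of observations lying inside their predicted [lower, upper] interval."""
    if not (len(observed) == len(lower_bounds) == len(upper_bounds)):
        raise ValueError("inputs must have equal length")
    inside = sum(
        lo <= value <= hi
        for value, lo, hi in zip(observed, lower_bounds, upper_bounds)
    )
    return inside / len(observed)


# Hypothetical femoral lengths (cm) with predicted 5th and 95th percentile bounds.
observed = [30.1, 35.4, 41.0, 44.2, 47.9]
p05 = [28.0, 33.0, 39.5, 44.5, 45.0]
p95 = [32.0, 37.0, 43.0, 48.0, 49.0]
print(coverage_probability(observed, p05, p95))
```

For a well-calibrated central 90% band, the coverage should sit near 0.90; values well below that indicate the curve's band is too narrow or shifted for the population being measured.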


Subject(s)
Artificial Intelligence , Femur , Tibia , Humans , Child , Female , Adolescent , Retrospective Studies , Tibia/diagnostic imaging , Male , Child, Preschool , Femur/diagnostic imaging , Infant , Young Adult , Infant, Newborn , Radiography/methods , Lower Extremity/diagnostic imaging
5.
AJR Am J Roentgenol ; 222(3): e2329530, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37436032

ABSTRACT

Artificial intelligence (AI) is increasingly used in clinical practice for musculoskeletal imaging tasks, such as disease diagnosis and image reconstruction. AI applications in musculoskeletal imaging have focused primarily on radiography, CT, and MRI. Although musculoskeletal ultrasound stands to benefit from AI in similar ways, such applications have been relatively underdeveloped. In comparison with other modalities, ultrasound has unique advantages and disadvantages that must be considered in AI algorithm development and clinical translation. Challenges in developing AI for musculoskeletal ultrasound involve both clinical aspects of image acquisition and practical limitations in image processing and annotation. Solutions from other radiology subspecialties (e.g., crowdsourced annotations coordinated by professional societies), along with use cases (most commonly rotator cuff tendon tears and palpable soft-tissue masses), can be applied to musculoskeletal ultrasound to help develop AI. To facilitate creation of high-quality imaging datasets for AI model development, technologists and radiologists should focus on increasing uniformity in musculoskeletal ultrasound performance and increasing annotations of images for specific anatomic regions. This Expert Panel Narrative Review summarizes available evidence regarding AI's potential utility in musculoskeletal ultrasound and challenges facing its development. Recommendations for future AI advancement and clinical translation in musculoskeletal ultrasound are discussed.


Subject(s)
Artificial Intelligence , Tendons , Humans , Ultrasonography , Algorithms , Head
6.
Inflamm Bowel Dis ; 30(4): 594-601, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-37307420

ABSTRACT

BACKGROUND: Obesity is associated with progression of inflammatory bowel disease (IBD). Visceral adiposity may be a more meaningful measure of obesity compared with traditional measures such as body mass index (BMI). This study compared visceral adiposity vs BMI as predictors of time to IBD flare among patients with Crohn's disease and ulcerative colitis. METHODS: This was a retrospective cohort study. IBD patients were included if they had a colonoscopy and computed tomography (CT) scan within a 30-day window of an IBD flare. They were followed for 6 months or until their next flare. The primary exposure was the ratio of visceral adipose tissue to subcutaneous adipose tissue (VAT:SAT) obtained from CT imaging. BMI was calculated at the time of index CT scan. RESULTS: A total of 100 Crohn's disease and 100 ulcerative colitis patients were included. The median age was 43 (interquartile range, 31-58) years, 39% had disease duration of 10 years or more, and 14% had severe disease activity on endoscopic examination. Overall, 23% of the cohort flared with median time to flare 90 (interquartile range, 67-117) days. Higher VAT:SAT was associated with shorter time to IBD flare (hazard ratio of 4.8 for VAT:SAT ≥1.0 vs VAT:SAT ratio <1.0), whereas higher BMI was not associated with shorter time to flare (hazard ratio of 0.73 for BMI ≥25 kg/m2 vs BMI <25 kg/m2). The relationship between increased VAT:SAT and shorter time to flare appeared stronger for Crohn's than for ulcerative colitis. CONCLUSIONS: Visceral adiposity was associated with decreased time to IBD flare, but BMI was not. Future studies could test whether interventions that decrease visceral adiposity will improve IBD disease activity.


An increased ratio of visceral to subcutaneous adipose tissue was associated with a shorter time to flare in patients with both Crohn's and ulcerative colitis. Conversely, increased body mass index was not associated with a shorter time to flare in inflammatory bowel disease patients.


Subject(s)
Colitis, Ulcerative , Crohn Disease , Humans , Adult , Crohn Disease/complications , Body Mass Index , Colitis, Ulcerative/complications , Adiposity , Retrospective Studies , Obesity , Intra-Abdominal Fat/diagnostic imaging
7.
Pediatr Radiol ; 53(12): 2386-2397, 2023 11.
Article in English | MEDLINE | ID: mdl-37740031

ABSTRACT

BACKGROUND: Pediatric fractures are challenging to identify given the different response of the pediatric skeleton to injury compared to adults, and most artificial intelligence (AI) fracture detection work has focused on adults. OBJECTIVE: Develop and transparently share an AI model capable of detecting a range of pediatric upper extremity fractures. MATERIALS AND METHODS: In total, 58,846 upper extremity radiographs (finger/hand, wrist/forearm, elbow, humerus, shoulder/clavicle) from 14,873 pediatric and young adult patients were divided into train (n = 12,232 patients), tune (n = 1,307), internal test (n = 819), and external test (n = 515) splits. Fracture was determined by manual inspection of all test radiographs and the subset of train/tune radiographs whose reports were classified fracture-positive by a rule-based natural language processing (NLP) algorithm. We trained an object detection model (Faster Region-based Convolutional Neural Network [R-CNN]; "strongly-supervised") and an image classification model (EfficientNetV2-Small; "weakly-supervised") to detect fractures using train/tune data and evaluate on test data. AI fracture detection accuracy was compared with accuracy of on-call residents on cases they preliminarily interpreted overnight. RESULTS: A strongly-supervised fracture detection AI model achieved overall test area under the receiver operating characteristic curve (AUC) of 0.96 (95% CI 0.95-0.97), accuracy 89.7% (95% CI 88.0-91.3%), sensitivity 90.8% (95% CI 88.5-93.1%), and specificity 88.7% (95% CI 86.4-91.0%), and outperformed a weakly-supervised model (AUC 0.93, 95% CI 0.92-0.94, P < 0.0001). AI accuracy on cases preliminarily interpreted overnight was higher than resident accuracy (AI 89.4% vs. 85.1%, 95% CI 87.3-91.5% vs. 82.7-87.5%, P = 0.01). CONCLUSION: An object detection AI model identified pediatric upper extremity fractures with high accuracy.
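The AUC values quoted in this abstract can be read as the probability that a randomly chosen fracture-positive radiograph receives a higher model score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition, using hypothetical labels and scores; it is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Tied scores count as half a win, which matches the Mann-Whitney U formulation.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical fracture labels (1 = fracture) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(round(roc_auc(labels, scores), 3))
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; rank-based implementations give the same value more efficiently on large test sets.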


Subject(s)
Artificial Intelligence , Fractures, Bone , Humans , Child , Young Adult , Fractures, Bone/diagnostic imaging , Neural Networks, Computer , Radiography , Elbow , Retrospective Studies
8.
Pediatr Radiol ; 53(6): 1125-1134, 2023 05.
Article in English | MEDLINE | ID: mdl-36650360

ABSTRACT

BACKGROUND: Missed fractures are the leading cause of diagnostic error in the emergency department, and fractures of pediatric bones, particularly subtle wrist fractures, can be misidentified because of their varying characteristics and responses to injury. OBJECTIVE: This study evaluated the utility of an object detection deep learning framework for classifying pediatric wrist fractures as positive or negative for fracture, including subtle buckle fractures of the distal radius, and evaluated the performance of this algorithm as augmentation to trainee radiograph interpretation. MATERIALS AND METHODS: We obtained 395 posteroanterior wrist radiographs from unique pediatric patients (65% positive for fracture, 30% positive for distal radial buckle fracture) and divided them into train (n = 229), tune (n = 41) and test (n = 125) sets. We trained a Faster R-CNN (region-based convolutional neural network) deep learning object-detection model. Two pediatric and two radiology residents evaluated radiographs initially without the artificial intelligence (AI) assistance, and then subsequently with access to the bounding box generated by the Faster R-CNN model. RESULTS: The Faster R-CNN model demonstrated an area under the curve (AUC) of 0.92 (95% confidence interval [CI] 0.87-0.97), accuracy of 88% (n = 110/125; 95% CI 81-93%), sensitivity of 88% (n = 70/80; 95% CI 78-94%) and specificity of 89% (n = 40/45, 95% CI 76-96%) in identifying any fracture and identified 90% of buckle fractures (n = 35/39, 95% CI 76-97%). Access to Faster R-CNN model predictions significantly improved average resident accuracy from 80 to 93% in detecting any fracture (P < 0.001) and from 69 to 92% in detecting buckle fracture (P < 0.001). After accessing AI predictions, residents significantly outperformed AI in cases of disagreement (73% resident correct vs. 27% AI, P = 0.002). 
CONCLUSION: An object-detection-based deep learning approach trained with only a few hundred examples identified radiographs containing pediatric wrist fractures with high accuracy. Access to model predictions significantly improved resident accuracy in diagnosing these fractures.
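Because this abstract reports raw counts, its accuracy, sensitivity, and specificity can be recomputed directly. The following sketch pairs each proportion with a Wilson score interval. The abstract does not say which interval method was used, so the choice of Wilson here is an assumption and the bounds may differ from the published ones by about a percentage point.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score confidence interval for a binomial proportion (z=1.96 for 95%)."""
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Counts as reported in the abstract (correct / total).
counts = {
    "accuracy": (110, 125),
    "sensitivity": (70, 80),
    "specificity": (40, 45),
}
for name, (correct, total) in counts.items():
    low, high = wilson_interval(correct, total)
    print(f"{name}: {correct / total:.1%} (95% CI {low:.1%} to {high:.1%})")
```

With these counts the Wilson interval for accuracy is roughly 81% to 93%, in line with the reported range.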


Subject(s)
Deep Learning , Fractures, Bone , Wrist Fractures , Wrist Injuries , Humans , Child , Artificial Intelligence , Fractures, Bone/diagnostic imaging , Neural Networks, Computer , Wrist Injuries/diagnostic imaging
9.
Radiol Artif Intell ; 4(4): e220124, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35923380
10.
AJR Am J Roentgenol ; 219(6): 869-878, 2022 12.
Article in English | MEDLINE | ID: mdl-35731103

ABSTRACT

Fractures are common injuries that can be difficult to diagnose, with missed fractures accounting for most misdiagnoses in the emergency department. Artificial intelligence (AI) and, specifically, deep learning have shown a strong ability to accurately detect fractures and augment the performance of radiologists in proof-of-concept research settings. Although the number of real-world AI products available for clinical use continues to increase, guidance for practicing radiologists in the adoption of this new technology is limited. This review describes how AI and deep learning algorithms can help radiologists to better diagnose fractures. The article also provides an overview of commercially available U.S. FDA-cleared AI tools for fracture detection as well as considerations for the clinical adoption of these tools by radiology practices.


Subject(s)
Fractures, Bone , Radiology , Humans , Artificial Intelligence , Radiologists , Algorithms , Radiography , Fractures, Bone/diagnostic imaging
11.
Skeletal Radiol ; 51(8): 1671-1677, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35184211

ABSTRACT

PURPOSE: Many children who undergo MR of the knee to evaluate traumatic injury may not undergo a separate dedicated evaluation of their skeletal maturity, and we wished to investigate how accurately skeletal maturity could be automatically inferred from knee MRI using deep learning to offer this additional information to clinicians. MATERIALS AND METHODS: Retrospective data from 894 studies from 783 patients were obtained (mean age 13.1 years, 47% female). Coronal and sagittal sequences that were T1/PD-weighted were included and resized to 224 × 224 pixels. Data were divided into train (n = 673), tune (n = 48), and test (n = 173) sets, and children were separated across sets. The chronologic age was predicted using deep learning approaches based on a long short-term memory (LSTM) model, which took as input DenseNet-121-extracted features from all T1/PD coronal and sagittal slices. Each test case was manually assigned a bone age by two radiology residents using a reference atlas provided by Pennock and Bomar. The patient's age served as ground truth. RESULTS: The error of the model's predictions for chronologic age was not significantly different from that of radiology residents (mean squared error [MSE]: model 1.30 vs. resident 0.99, paired t-test = 1.47, p = 0.14). Pearson correlation between model and resident prediction of chronologic age was 0.96 (p < 0.001). CONCLUSION: A deep learning-based approach demonstrated ability to infer skeletal maturity from knee MR sequences that was not significantly different from resident performance and did so in less than 2% of the time required by a human expert. This may offer a method for automatically evaluating lower extremity skeletal maturity as part of every MR examination.
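The two summary statistics in this abstract, mean squared error against chronologic age and the Pearson correlation between model and resident estimates, can be sketched in a few lines. All ages below are hypothetical and serve only to show the calculation.

```python
import math


def mean_squared_error(predicted, actual):
    """Average squared difference between predictions and ground truth."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical ages in years: ground truth, model estimates, resident estimates.
chronologic = [10.0, 12.0, 14.0, 16.0]
model_pred = [11.0, 12.0, 13.0, 17.0]
resident_pred = [10.5, 12.5, 13.5, 16.5]

print(mean_squared_error(model_pred, chronologic))     # model error vs. ground truth
print(mean_squared_error(resident_pred, chronologic))  # resident error vs. ground truth
print(round(pearson_r(model_pred, resident_pred), 3))  # agreement between the two
```

A high correlation between model and resident estimates shows that the two track each other, while the MSE values show how far each is from the true age. The two measures answer different questions, so both are worth reporting.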


Subject(s)
Deep Learning , Adolescent , Child , Female , Humans , Knee , Lower Extremity , Magnetic Resonance Imaging/methods , Male , Retrospective Studies
12.
BMJ Open ; 11(8): e046761, 2021 08 13.
Article in English | MEDLINE | ID: mdl-34389565

ABSTRACT

OBJECTIVE: To validate an existing clinical decision support tool to risk-stratify patients with acute kidney injury (AKI) for hydronephrosis and compare the risk stratification framework with nephrology consultant recommendations. SETTING: Cross-sectional study of hospitalised adults with AKI who had a renal ultrasound (RUS) ordered at a large, tertiary, academic medical centre. PARTICIPANTS: Two hundred and eighty-one patients were included in the study cohort. Based on the risk stratification framework, 111 (40%), 76 (27%) and 94 (33%) patients were in the high-risk, medium-risk and low-risk groups for hydronephrosis, respectively. OUTCOMES: Outcomes were the presence of unilateral or bilateral hydronephrosis on RUS. RESULTS: Thirty-five patients (12%) were found to have hydronephrosis. The high-risk group had 86% sensitivity and 67% specificity for identifying hydronephrosis. A nephrology consult was involved in 168 (60%) patients and RUS was recommended by the nephrology service in 95 (57%) cases. Among patients with a nephrology consultation, 9 (56%) of the 16 total patients with hydronephrosis were recommended to obtain an RUS. CONCLUSIONS: We further externally validated a risk stratification framework for hydronephrosis. Clinical decision support systems may be useful to supplement clinical judgement in the evaluation of AKI.
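The reported sensitivity and specificity of the high-risk group can be checked against the cohort sizes by rebuilding the 2×2 table. The abstract does not give the number of high-risk patients who had hydronephrosis. The value of 30 used below is an assumption, chosen because it is the count that reproduces the reported 86% sensitivity.

```python
total_patients = 281
with_hydronephrosis = 35
high_risk = 111

# Assumption: 30 of the 35 patients with hydronephrosis were in the high-risk group.
true_positives = 30

false_negatives = with_hydronephrosis - true_positives
false_positives = high_risk - true_positives
true_negatives = total_patients - with_hydronephrosis - false_positives

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)
positive_predictive_value = true_positives / high_risk

print(f"sensitivity {sensitivity:.0%}, specificity {specificity:.0%}, "
      f"PPV {positive_predictive_value:.0%}")
```

Under that assumption the table yields 86% sensitivity and 67% specificity, matching the abstract. The positive predictive value is not reported in the abstract and follows only from the assumed count.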


Subject(s)
Acute Kidney Injury , Hydronephrosis , Acute Kidney Injury/diagnosis , Acute Kidney Injury/etiology , Cross-Sectional Studies , Humans , Hydronephrosis/diagnostic imaging , Risk Assessment , Ultrasonography
13.
J Am Coll Radiol ; 18(4): 590-600, 2021 04.
Article in English | MEDLINE | ID: mdl-33197410

ABSTRACT

PURPOSE: To identify factors important to patients for their return to elective imaging during the coronavirus disease 2019 (COVID-19) pandemic. METHODS: In all, 249 patients had elective MRIs postponed from March 23, 2020, to April 24, 2020, because of the COVID-19 pandemic. Of these patients, 99 completed a 22-question survey about living arrangement and health care follow-up, effect of imaging postponement, safety of imaging, and factors important for elective imaging. Mann-Whitney U, Fisher's exact, χ2 tests, and logistic regression analyses were performed. Statistical significance was set to P ≤ .05 with Bonferroni correction applied. RESULTS: Overall, 68% of patients felt imaging postponement had no impact or a small impact on health, 68% felt it was fairly or extremely safe to obtain imaging, and 53% thought there was no difference in safety between hospital-based and outpatient locations. Patients who already had imaging performed or rescheduled were more likely to feel it was safe to get an MRI (odds ratio [OR] 3.267, P = .028) and that the hospital setting was safe (OR 3.976, P = .004). Staff friendliness was the most important factor related to an imaging center visit (95% fairly or extremely important). Use of masks by staff was the top infection prevention measure (94% fairly or extremely important). Likelihood of rescheduling imaging decreased if a short waiting time was important (OR = 0.107, P = .030). CONCLUSION: As patients begin to feel that it is safe to obtain imaging examinations during the COVID-19 pandemic, many factors important to their imaging experience can be considered by radiology practices when developing new strategies to conduct elective imaging.


Subject(s)
COVID-19 , Diagnostic Imaging/trends , Pandemics , Adult , Aged , Aged, 80 and over , Female , Humans , Male , Middle Aged , United States , Young Adult
14.
NPJ Digit Med ; 2: 31, 2019.
Article in English | MEDLINE | ID: mdl-31304378

ABSTRACT

Hip fractures are a leading cause of death and disability among older adults. Hip fractures are also the most commonly missed diagnosis on pelvic radiographs, and delayed diagnosis leads to higher cost and worse outcomes. Computer-aided diagnosis (CAD) algorithms have shown promise for helping radiologists detect fractures, but the image features underpinning their predictions are notoriously difficult to understand. In this study, we trained deep-learning models on 17,587 radiographs to classify fracture, 5 patient traits, and 14 hospital process variables. All 20 variables could be individually predicted from a radiograph, with the best performances on scanner model (AUC = 1.00), scanner brand (AUC = 0.98), and whether the order was marked "priority" (AUC = 0.79). Fracture was predicted moderately well from the image (AUC = 0.78) and better when combining image features with patient data (AUC = 0.86, DeLong paired AUC comparison, p = 2e-9) or patient data plus hospital process features (AUC = 0.91, p = 1e-21). Fracture prediction on a test set that balanced fracture risk across patient variables was significantly lower than a random test set (AUC = 0.67, DeLong unpaired AUC comparison, p = 0.003); and on a test set with fracture risk balanced across patient and hospital process variables, the model performed randomly (AUC = 0.52, 95% CI 0.46-0.58), indicating that these variables were the main source of the model's fracture predictions. A single model that directly combines image features, patient, and hospital process data outperforms a Naive Bayes ensemble of an image-only model prediction, patient, and hospital process data. If CAD algorithms are inexplicably leveraging patient and process variables in their predictions, it is unclear how radiologists should interpret their predictions in the context of other known patient data. Further research is needed to illuminate deep-learning decision processes so that computers and clinicians can effectively cooperate.

15.
Ann Transl Med ; 7(11): 233, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31317003

ABSTRACT

BACKGROUND: Errors in grammar, spelling, and usage in radiology reports are common. To automatically detect inappropriate insertions, deletions, and substitutions of words in radiology reports, we proposed using a neural sequence-to-sequence (seq2seq) model. METHODS: Head CT and chest radiograph reports from Mount Sinai Hospital (MSH) (n=61,722 and 818,978, respectively), Mount Sinai Queens (MSQ) (n=30,145 and 194,309, respectively) and MIMIC-III (n=32,259 and 54,685) were converted into sentences. Insertions, substitutions, and deletions of words were randomly introduced. Seq2seq models were trained using corrupted sentences as input to predict original uncorrupted sentences. Three models were trained using head CTs from MSH, chest radiographs from MSH, and head CTs from all three collections. Model performance was assessed across different sites and modalities. A sample of original, uncorrupted sentences were manually reviewed for any error in syntax, usage, or spelling to estimate real-world proofreading performance of the algorithm. RESULTS: Seq2seq detected 90.3% and 88.2% of corrupted sentences with 97.7% and 98.8% specificity in same-site, same-modality test sets for head CTs and chest radiographs, respectively. Manual review of original, uncorrupted same-site same-modality head CT sentences demonstrated seq2seq positive predictive value (PPV) 0.393 (157/400; 95% CI, 0.346-0.441) and negative predictive value (NPV) 0.986 (789/800; 95% CI, 0.976-0.992) for detecting sentences containing real-world errors, with estimated sensitivity of 0.389 (95% CI, 0.267-0.542) and specificity 0.986 (95% CI, 0.985-0.987) over n=86,211 uncorrupted training examples. CONCLUSIONS: Seq2seq models can be highly effective at detecting erroneous insertions, deletions, and substitutions of words in radiology reports. To achieve high performance, these models require site- and modality-specific training examples. 
Incorporating additional targeted training data could further improve performance in detecting real-world errors in reports.
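The training data described in this abstract were made by randomly inserting, deleting, or substituting words in report sentences. The following is a minimal sketch of one such corruption step. The function name, the vocabulary, and the choice of a single edit per sentence are assumptions for illustration, not the authors' procedure.

```python
import random


def corrupt_sentence(words, vocabulary, rng):
    """Apply one random word-level edit and return (operation, corrupted_words)."""
    words = list(words)
    operation = rng.choice(["insert", "delete", "substitute"])
    if not words:
        operation = "insert"  # nothing to delete or substitute
    elif operation == "delete" and len(words) == 1:
        operation = "substitute"  # avoid producing an empty sentence

    if operation == "insert":
        words.insert(rng.randrange(len(words) + 1), rng.choice(vocabulary))
    elif operation == "delete":
        del words[rng.randrange(len(words))]
    else:
        position = rng.randrange(len(words))
        # Prefer a replacement that differs from the original word.
        candidates = [w for w in vocabulary if w != words[position]] or vocabulary
        words[position] = rng.choice(candidates)
    return operation, words


# Hypothetical sentence and vocabulary.
original = "no acute intracranial hemorrhage".split()
vocabulary = ["left", "right", "no", "mild", "acute", "chronic"]
operation, corrupted = corrupt_sentence(original, vocabulary, random.Random(0))
print(operation, " ".join(corrupted))
```

Pairs of (corrupted, original) sentences built this way can serve as input and target for a sequence-to-sequence model. The seeded `random.Random` instance keeps the corruption reproducible.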

16.
PLoS One ; 14(2): e0211057, 2019.
Article in English | MEDLINE | ID: mdl-30759094

ABSTRACT

This study trained long short-term memory (LSTM) recurrent neural networks (RNNs) incorporating an attention mechanism to predict daily sepsis, myocardial infarction (MI), and vancomycin antibiotic administration over two-week patient ICU courses in the MIMIC-III dataset. These models achieved next-day predictive AUC of 0.876 for sepsis, 0.823 for MI, and 0.833 for vancomycin administration. Attention maps built from these models highlighted those times when input variables most influenced predictions and could provide a degree of interpretability to clinicians. These models appeared to attend to variables that were proxies for clinician decision-making, demonstrating a challenge of using flexible deep learning approaches trained with EHR data to build clinical decision support. While continued development and refinement are needed, we believe that such models could one day prove useful in reducing information overload for ICU physicians by providing needed clinical decision support for a variety of clinically important tasks.


Subject(s)
Clinical Decision-Making , Deep Learning , Diagnosis, Computer-Assisted , Intensive Care Units , Models, Biological , Myocardial Infarction/diagnosis , Sepsis/diagnosis , Anti-Bacterial Agents/administration & dosage , Clinical Decision-Making/methods , Humans , Myocardial Infarction/pathology , Retrospective Studies , Sepsis/drug therapy , Sepsis/pathology , Vancomycin/administration & dosage
17.
Radiol Artif Intell ; 1(1): e180019, 2019 Jan.
Article in English | MEDLINE | ID: mdl-33937782

ABSTRACT

PURPOSE: To determine if weakly supervised learning with surrogate metrics and active transfer learning can hasten clinical deployment of deep learning models. MATERIALS AND METHODS: By leveraging Liver Tumor Segmentation (LiTS) challenge 2017 public data (n = 131 studies), natural language processing of reports, and an active learning method, a model was trained to segment livers on 239 retrospectively collected portal venous phase abdominal CT studies obtained between January 1, 2014, and December 31, 2016. Absolute volume differences between predicted and originally reported liver volumes were used to guide active learning and assess accuracy. Overall survival based on liver volumes predicted by this model (n = 34 patients) versus radiology reports and Model for End-Stage Liver Disease with sodium (MELD-Na) scores was assessed. Differences in absolute liver volume were compared by using the paired Student t test, Bland-Altman analysis, and intraclass correlation; survival analysis was performed with the Kaplan-Meier method and a Mantel-Cox test. RESULTS: Data from patients with poor liver volume prediction (n = 10) with a model trained only with publicly available data were incorporated into an active learning method that trained a new model (LiTS data plus over- and underestimated active learning cases [LiTS-OU]) that performed significantly better on a held-out institutional test set (absolute volume difference of 231 vs 176 mL, P = .0005). In overall survival analysis, predicted liver volumes using the best active learning-trained model (LiTS-OU) were at least comparable with liver volumes extracted from radiology reports and MELD-Na scores in predicting survival. CONCLUSION: Active transfer learning using surrogate metrics facilitated deployment of deep learning models for clinically meaningful liver segmentation at a major liver transplant center. © RSNA, 2019 Supplemental material is available for this article.
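This abstract uses the absolute difference between predicted and report-derived liver volume as a surrogate metric to pick cases for further annotation. The following is a minimal sketch of that selection step, assuming a simple "largest disagreement first" rule. The function name, study identifiers, and volumes are hypothetical, and the authors' actual selection criteria may differ.

```python
def select_for_annotation(predicted_ml, reported_ml, n_cases):
    """Return the study IDs with the largest absolute volume differences, worst first."""
    differences = {
        study_id: abs(predicted_ml[study_id] - reported_ml[study_id])
        for study_id in predicted_ml
        if study_id in reported_ml  # skip studies with no reported volume
    }
    ranked = sorted(differences, key=differences.get, reverse=True)
    return ranked[:n_cases]


# Hypothetical liver volumes in mL: model output vs. values parsed from reports.
predicted = {"ct01": 1450.0, "ct02": 2100.0, "ct03": 1320.0, "ct04": 950.0}
reported = {"ct01": 1500.0, "ct02": 1600.0, "ct03": 1300.0, "ct04": 1400.0}

selected = select_for_annotation(predicted, reported, n_cases=2)
print(selected)
```

The selected studies would then be manually segmented and added to the training set. The appeal of a surrogate metric is that it flags likely failures without requiring a ground-truth segmentation for every study.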

18.
Bioinformatics ; 35(9): 1610-1612, 2019 05 01.
Article in English | MEDLINE | ID: mdl-30304439

ABSTRACT

MOTIVATION: Radiologists have used algorithms for Computer-Aided Diagnosis (CAD) for decades. These algorithms use machine learning with engineered features, and there have been mixed findings on whether they improve radiologists' interpretations. Deep learning offers superior performance but requires more training data and has not been evaluated in joint algorithm-radiologist decision systems. RESULTS: We developed the Computer-Aided Note and Diagnosis Interface (CANDI) for collaboratively annotating radiographs and evaluating how algorithms alter human interpretation. The annotation app collects classification, segmentation, and image captioning training data, and the evaluation app randomizes the availability of CAD tools to facilitate clinical trials on radiologist enhancement. AVAILABILITY AND IMPLEMENTATION: Demonstrations and source code are hosted at https://candi.nextgenhealthcare.org and https://github.com/mbadge/candi, respectively, under the GPL-3 license. SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.


Subject(s)
Algorithms; Software; Deep Learning; Humans; Machine Learning; Neural Networks, Computer
19.
PLoS Med ; 15(11): e1002683, 2018 11.
Article in English | MEDLINE | ID: mdl-30399157

ABSTRACT

BACKGROUND: There is interest in using convolutional neural networks (CNNs) to analyze medical imaging to provide computer-aided diagnosis (CAD). Recent work has suggested that image classification CNNs may not generalize to new data as well as previously believed. We assessed how well CNNs generalized across three hospital systems for a simulated pneumonia screening task. METHODS AND FINDINGS: A cross-sectional design with multiple model training cohorts was used to evaluate model generalizability to external sites using split-sample validation. A total of 158,323 chest radiographs were drawn from three institutions: National Institutes of Health Clinical Center (NIH; 112,120 from 30,805 patients), Mount Sinai Hospital (MSH; 42,396 from 12,904 patients), and Indiana University Network for Patient Care (IU; 3,807 from 3,683 patients). These patient populations had an age mean (SD) of 46.9 years (16.6), 63.2 years (16.5), and 49.6 years (17) with a female percentage of 43.5%, 44.8%, and 57.3%, respectively. We assessed individual models using the area under the receiver operating characteristic curve (AUC) for radiographic findings consistent with pneumonia and compared performance on different test sets with DeLong's test. The prevalence of pneumonia was high enough at MSH (34.2%) relative to NIH and IU (1.2% and 1.0%) that merely sorting by hospital system achieved an AUC of 0.861 (95% CI 0.855-0.866) on the joint MSH-NIH dataset. Models trained on data from either NIH or MSH had equivalent performance on IU (P values 0.580 and 0.273, respectively) and inferior performance on data from each other relative to an internal test set (i.e., new data from within the hospital system used for training data; P values both <0.001). 
The highest internal performance was achieved by combining training and test data from MSH and NIH (AUC 0.931, 95% CI 0.927-0.936), but this model demonstrated significantly lower external performance at IU (AUC 0.815, 95% CI 0.745-0.885, P = 0.001). To test the effect of pooling data from sites with disparate pneumonia prevalence, we used stratified subsampling to generate MSH-NIH cohorts that only differed in disease prevalence between training data sites. When both training data sites had the same pneumonia prevalence, the model performed consistently on external IU data (P = 0.88). When a 10-fold difference in pneumonia rate was introduced between sites, internal test performance improved compared to the balanced model (10× MSH risk P < 0.001; 10× NIH P = 0.002), but this outperformance failed to generalize to IU (MSH 10× P < 0.001; NIH 10× P = 0.027). CNNs were able to directly detect hospital system of a radiograph for 99.95% NIH (22,050/22,062) and 99.98% MSH (8,386/8,388) radiographs. The primary limitation of our approach and the available public data is that we cannot fully assess what other factors might be contributing to hospital system-specific biases. CONCLUSION: Pneumonia-screening CNNs achieved better internal than external performance in 3 out of 5 natural comparisons. When models were trained on pooled data from sites with different pneumonia prevalence, they performed better on new pooled data from these sites but not on external data. CNNs robustly identified hospital system and department within a hospital, which can have large differences in disease burden and may confound predictions.
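The abstract above reports that merely sorting radiographs by hospital system reached an AUC of 0.861 on the joint MSH-NIH dataset. The arithmetic behind that kind of result can be sketched as follows; this is not the authors' analysis. The function name is hypothetical, and the inputs are the whole-cohort counts and rounded prevalences quoted in the abstract rather than the actual test split, so the result only approximates the published figure. For a score that takes just two values (site A or site B), the AUC equals the probability that a positive outranks a negative plus half the probability of a tie, which simplifies to the average of sensitivity and specificity.

```python
def site_only_auc(n_a, prevalence_a, n_b, prevalence_b):
    """AUC of a 'classifier' that scores every site-A image above every site-B image.

    Sensitivity is the share of all positives that come from site A;
    specificity is the share of all negatives that come from site B.
    For a two-valued score, AUC = (sensitivity + specificity) / 2.
    """
    pos_a = n_a * prevalence_a
    pos_b = n_b * prevalence_b
    neg_a = n_a - pos_a
    neg_b = n_b - pos_b
    sensitivity = pos_a / (pos_a + pos_b)
    specificity = neg_b / (neg_a + neg_b)
    return 0.5 * (sensitivity + specificity)


if __name__ == "__main__":
    # Counts and prevalences as quoted in the abstract (MSH vs NIH).
    auc = site_only_auc(42_396, 0.342, 112_120, 0.012)
    print(f"site-only AUC estimate: {auc:.3f}")
```

With equal prevalence at both sites the same function returns 0.5, which is why the prevalence gap, and not any image content, drives the apparent performance.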


Subject(s)
Deep Learning; Diagnosis, Computer-Assisted/methods; Pneumonia/diagnostic imaging; Radiographic Image Interpretation, Computer-Assisted/methods; Radiography, Thoracic/methods; Adult; Aged; Cross-Sectional Studies; Female; Humans; Male; Middle Aged; Predictive Value of Tests; Radiology Information Systems; Reproducibility of Results; Retrospective Studies; United States
20.
Nat Med ; 24(9): 1337-1341, 2018 09.
Article in English | MEDLINE | ID: mdl-30104767

ABSTRACT

Rapid diagnosis and treatment of acute neurological illnesses such as stroke, hemorrhage, and hydrocephalus are critical to achieving positive outcomes and preserving neurologic function: 'time is brain' [1-5]. Although these disorders are often recognizable by their symptoms, the critical means of their diagnosis is rapid imaging [6-10]. Computer-aided surveillance of acute neurologic events in cranial imaging has the potential to triage radiology workflow, thus decreasing time to treatment and improving outcomes. Substantial clinical work has focused on computer-assisted diagnosis (CAD), whereas technical work in volumetric image analysis has focused primarily on segmentation. 3D convolutional neural networks (3D-CNNs) have primarily been used for supervised classification on 3D modeling and light detection and ranging (LiDAR) data [11-15]. Here, we demonstrate a 3D-CNN architecture that performs weakly supervised classification to screen head CT images for acute neurologic events. Features were automatically learned from a clinical radiology dataset comprising 37,236 head CTs and were annotated with a semisupervised natural-language processing (NLP) framework [16]. We demonstrate the effectiveness of our approach to triage radiology workflow and accelerate the time to diagnosis from minutes to seconds through a randomized, double-blinded, prospective trial in a simulated clinical environment.


Subject(s)
Imaging, Three-Dimensional; Neural Networks, Computer; Skull/diagnostic imaging; Algorithms; Automation; Humans; ROC Curve; Randomized Controlled Trials as Topic; Tomography, X-Ray Computed