Results 1 - 9 of 9
1.
Cogn Res Princ Implic ; 9(1): 31, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38763994

ABSTRACT

A crucial bottleneck in medical artificial intelligence (AI) is the scarcity of high-quality labeled medical datasets. In this paper, we test a large variety of wisdom-of-the-crowd algorithms to label medical images that were initially classified by individuals recruited through an app-based platform. Individuals classified skin lesions from the International Skin Lesion Challenge 2018 into 7 different categories. There was large dispersion in the geographical location, experience, training, and performance of the recruited individuals. We tested several wisdom-of-the-crowd algorithms of varying complexity, from a simple unweighted average to more complex Bayesian models that account for individual patterns of errors. Using a switchboard analysis, we observe that the best-performing algorithms rely on selecting top performers, weighting decisions by training accuracy, and taking the task environment into account. These algorithms far exceed expert performance. We conclude by discussing the implications of these approaches for the development of medical AI.
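
Below is a minimal, illustrative sketch (not the authors' implementation) of the kinds of aggregation rules described above: an unweighted vote, a vote weighted by each rater's accuracy on training items, and a vote restricted to the top performers. All function and variable names are assumptions.

import numpy as np

def unweighted_vote(votes, n_classes):
    # votes: array of class labels (one per rater) for a single image
    return np.bincount(votes, minlength=n_classes).argmax()

def accuracy_weighted_vote(votes, rater_accuracy, n_classes):
    # rater_accuracy: each rater's accuracy on held-out training items
    scores = np.zeros(n_classes)
    for label, acc in zip(votes, rater_accuracy):
        scores[label] += acc
    return scores.argmax()

def top_k_vote(votes, rater_accuracy, n_classes, k=5):
    # keep only the k most accurate raters, then take an unweighted vote
    top = np.argsort(rater_accuracy)[-k:]
    return unweighted_vote(votes[top], n_classes)

# Example: 7-class skin lesion labels from 6 hypothetical raters
votes = np.array([2, 2, 5, 2, 1, 2])
acc = np.array([0.9, 0.8, 0.4, 0.85, 0.5, 0.7])
print(unweighted_vote(votes, 7), accuracy_weighted_vote(votes, acc, 7), top_k_vote(votes, acc, 7, k=3))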


Subjects
Artificial Intelligence, Humans, Adult, Crowdsourcing, Algorithms, Bayes Theorem
2.
Epileptic Disord ; 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38669007

ABSTRACT

OBJECTIVE: To assess the effectiveness of an educational program leveraging technology-enhanced learning and retrieval practice to teach trainees how to correctly identify interictal epileptiform discharges (IEDs). METHODS: This was a bi-institutional prospective randomized controlled educational trial involving junior neurology residents. The intervention consisted of three video tutorials focused on the six IFCN criteria for IED identification, plus rating 500 candidate IEDs with instant feedback either on a web browser (intervention 1) or an iOS app (intervention 2). The control group underwent no educational intervention ("inactive control"). All residents completed a survey and a test at the start and end of the study. Performance metrics were calculated for each participant. RESULTS: Twenty-one residents completed the study: control (n = 8); intervention 1 (n = 6); intervention 2 (n = 7). All but two had no prior EEG experience. Intervention 1 residents improved from baseline (mean values) in multiple metrics, including AUC (.74 to .85; p < .05), sensitivity (.53 to .75; p < .05), and level of confidence (LOC) in identifying IEDs/committing patients to therapy (1.33 to 2.33; p < .05). Intervention 2 residents improved in multiple metrics, including AUC (.81 to .86; p < .05) and LOC in identifying IEDs (2.00 to 3.14; p < .05) and spike-wave discharges (2.00 to 3.14; p < .05). Controls showed no significant improvement in any measure. SIGNIFICANCE: This program led to significant subjective and objective improvements in IED identification. Rating candidate IEDs with instant feedback on a web browser (intervention 1) generated greater objective improvement than rating candidate IEDs on an iOS app (intervention 2). This program can complement trainee education in IED identification.
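
As an aside, a minimal sketch of how the per-participant metrics named above (ROC AUC and sensitivity against a gold standard) can be computed is shown below; the data and the 0.5 threshold are illustrative assumptions, not values from the study.

import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

truth = np.array([1, 0, 1, 1, 0, 0, 1, 0])                    # gold standard: 1 = IED, 0 = non-IED
scores = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.3, 0.8, 0.6])   # one resident's confidence ratings

auc = roc_auc_score(truth, scores)                              # area under the ROC curve
tn, fp, fn, tp = confusion_matrix(truth, scores >= 0.5).ravel() # binarize at an assumed threshold
sensitivity = tp / (tp + fn)
print(f"AUC = {auc:.2f}, sensitivity = {sensitivity:.2f}")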

3.
Am J Perinatol ; 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38336117

ABSTRACT

OBJECTIVE: This proof-of-concept study assessed how confidently an artificial intelligence (AI) model can determine the sex of a fetus from an ultrasound image. STUDY DESIGN: Analysis was performed using 19,212 ultrasound image slices from a high-volume fetal sex determination practice. This dataset was split into a training set (11,769 images) and a test set (7,443 images). A computer vision model was trained using a transfer learning approach with the EfficientNetB4 architecture as the base. The performance of the computer vision model was evaluated on the holdout test set. Accuracy, Cohen's kappa, and multiclass receiver operating characteristic area under the curve (AUC) were used to evaluate the performance of the model. RESULTS: The AI model achieved an accuracy of 88.27% on the holdout test set and a Cohen's kappa score of 0.843. The AUC was 0.896 for Male, 0.897 for Female, 0.916 for Unable to Assess, and 0.981 for Text Added. CONCLUSION: This novel AI model achieved a high rate of correct fetal sex determination and could be of significant use in areas where ultrasound expertise is not readily available. KEY POINTS: · This is the first proof-of-concept AI model to determine fetal sex. · This study adds to the growing research in ultrasound AI. · Our findings demonstrate the potential for AI integration into obstetric care.
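
A minimal sketch of the transfer-learning setup described above: EfficientNetB4 as a frozen, ImageNet-pretrained base with a new 4-class head (Male, Female, Unable to Assess, Text Added). The input size, optimizer, and the decision to freeze the base are assumptions for illustration, not details reported in the paper.

from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB4

base = EfficientNetB4(include_top=False, weights="imagenet", input_shape=(380, 380, 3))
base.trainable = False  # freeze pretrained weights; only the new head is trained initially

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(4, activation="softmax"),  # Male, Female, Unable to Assess, Text Added
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # train_ds/val_ds are hypothetical tf.data pipelines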

4.
JMIR Dermatol ; 6: e48589, 2023 Dec 26.
Article in English | MEDLINE | ID: mdl-38147369

ABSTRACT

BACKGROUND: Chronic graft-versus-host disease (cGVHD) is a significant cause of long-term morbidity and mortality in patients after allogeneic hematopoietic cell transplantation. Skin is the most commonly affected organ, and visual assessment of cGVHD can have low reliability. Crowdsourcing data from nonexpert participants has been used for numerous medical applications, including image labeling and segmentation tasks. OBJECTIVE: This study aimed to assess the ability of crowds of nonexpert raters (individuals without any prior training in identifying or marking cGVHD) to demarcate photos of cGVHD-affected skin. We also studied the effect of training and feedback on crowd performance. METHODS: Using a Canfield Vectra H1 3D camera, 360 photographs of the skin of 36 patients with cGVHD were taken. Ground truth demarcations were provided in 3D by a trained expert and reviewed by a board-certified dermatologist. In total, 3000 2D images (projections from various angles) were created for crowd demarcation through the DiagnosUs mobile app. Raters were split into high and low feedback groups. The performances of 4 different crowds of nonexperts were analyzed: 17 raters per image for each of the low and high feedback groups, 32-35 raters per image for the low feedback group, and the top 5 performers for each image from the low feedback group. RESULTS: Across 8 demarcation competitions, 130 raters were recruited to the high feedback group and 161 to the low feedback group. This resulted in a total of 54,887 individual demarcations from the high feedback group and 78,967 from the low feedback group. The nonexpert crowds achieved good overall performance for segmenting cGVHD-affected skin with minimal training, achieving a median surface area error of less than 12% of skin pixels for all crowds in both the high and low feedback groups. The low feedback crowds performed slightly worse than the high feedback crowd, even when a larger crowd was used. Tracking the 5 most reliable raters from the low feedback group for each image recovered performance similar to that of the high feedback crowd. Higher variability between raters for a given image did not correlate with lower performance of the crowd consensus demarcation and therefore cannot be used as a measure of reliability. No significant learning was observed during the task as more photos and feedback were seen. CONCLUSIONS: Crowds of nonexpert raters can demarcate cGVHD images with good overall performance. Tracking the top 5 most reliable raters provided optimal results, obtaining the best performance with the lowest number of expert demarcations required for adequate training. However, agreement among individual nonexperts does not predict whether the crowd has provided an accurate result. Future work should explore the performance of crowdsourcing on standard clinical photos and further methods to estimate the reliability of consensus demarcations.
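
A minimal sketch, under assumed data structures, of two ideas from the study above: a consensus demarcation formed by majority vote across raters, and a surface-area error computed as the fraction of skin pixels mislabeled relative to ground truth. Restricting rater_masks to a crowd's 5 most reliable raters would give the "top 5" consensus. All names are illustrative.

import numpy as np

def consensus_mask(rater_masks):
    # rater_masks: (n_raters, H, W) boolean arrays, one demarcation per rater
    return rater_masks.mean(axis=0) >= 0.5   # a pixel is "affected" if most raters marked it

def surface_area_error(pred_mask, truth_mask, skin_mask):
    # fraction of skin pixels where the consensus disagrees with the ground truth
    skin = skin_mask.astype(bool)
    return np.logical_xor(pred_mask, truth_mask)[skin].mean()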

5.
Clin Neurophysiol Pract ; 8: 177-186, 2023.
Article in English | MEDLINE | ID: mdl-37681118

ABSTRACT

Objective: Misinterpretation of EEGs harms patients, yet few resources exist to help trainees practice interpreting EEGs. We therefore sought to evaluate a novel educational tool to teach trainees how to identify interictal epileptiform discharges (IEDs) on EEG. Methods: We created a public EEG test within the iOS app DiagnosUs using a pool of 13,262 candidate IEDs. Users were shown a candidate IED on EEG and asked to rate it as epileptiform (IED) or not (non-IED). They were given immediate feedback based on a gold standard. Learning was analyzed using a parametric model. We additionally analyzed IED features that best correlated with expert ratings. Results: Our analysis included 901 participants. Users achieved a mean improvement of 13% over 1,000 questions and an ending accuracy of 81%. Users and experts appeared to rely on a similar set of IED morphologic features when analyzing candidate IEDs. We additionally identified particular types of candidate EEGs that remained challenging for most users even after substantial practice. Conclusions: Users improved in their ability to properly classify candidate IEDs through repeated exposure and immediate feedback. Significance: This app-based learning activity has great potential to be an effective supplemental tool to teach neurology trainees how to accurately identify IEDs on EEG.
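
The abstract does not spell out the parametric model, so the sketch below is only an assumption used to illustrate the general idea: fitting accuracy as an exponential approach to an asymptote over the number of questions answered.

import numpy as np
from scipy.optimize import curve_fit

def learning_curve(n, final_acc, gain, rate):
    # accuracy after n questions: asymptote minus an exponentially decaying deficit
    return final_acc - gain * np.exp(-rate * n)

n = np.arange(1, 1001)                                                     # question index
acc = 0.81 - 0.13 * np.exp(-n / 300) + np.random.normal(0, 0.02, n.size)   # toy data, not study data
params, _ = curve_fit(learning_curve, n, acc, p0=[0.8, 0.1, 0.005])
print(dict(zip(["final_acc", "gain", "rate"], params.round(3))))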

6.
IEEE J Biomed Health Inform ; 27(9): 4352-4361, 2023 09.
Article in English | MEDLINE | ID: mdl-37276107

ABSTRACT

Lung ultrasound (LUS) is an important imaging modality used by emergency physicians to assess pulmonary congestion at the patient bedside. B-line artifacts in LUS videos are key findings associated with pulmonary congestion. Not only can the interpretation of LUS be challenging for novice operators, but visual quantification of B-lines also remains subject to observer variability. In this work, we investigate the strengths and weaknesses of multiple deep learning approaches for automated B-line detection and localization in LUS videos. We curate and publish BEDLUS, a new ultrasound dataset comprising 1,419 videos from 113 patients with a total of 15,755 expert-annotated B-lines. Based on this dataset, we present a benchmark of established deep learning methods applied to the task of B-line detection. To pave the way for interpretable quantification of B-lines, we propose a novel "single-point" approach to B-line localization using only the point of origin. Our results show that (a) the area under the receiver operating characteristic curve ranges from 0.864 to 0.955 for the benchmarked detection methods, (b) within this range, the best performance is achieved by models that leverage multiple successive frames as input, and (c) the proposed single-point approach for B-line localization reaches an F1-score of 0.65, performing on par with the inter-observer agreement. The dataset and developed methods can facilitate further biomedical research on automated interpretation of lung ultrasound, with the potential to expand its clinical utility.
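
A minimal sketch, under assumed conventions, of how the single-point localization above might be scored: a predicted B-line origin counts as a true positive if it lies within a distance tolerance of an unmatched annotated origin, and F1 follows from the resulting counts. The tolerance and matching rule are assumptions, not the paper's exact protocol.

import numpy as np

def point_f1(pred_pts, true_pts, tol=10.0):
    # pred_pts, true_pts: (N, 2) arrays of (x, y) B-line origin points for one frame
    matched = np.zeros(len(true_pts), dtype=bool)
    tp = 0
    for p in pred_pts:
        if len(true_pts) == 0:
            break
        d = np.linalg.norm(true_pts - p, axis=1)
        d[matched] = np.inf                     # each annotated origin can be matched only once
        j = d.argmin()
        if d[j] <= tol:
            matched[j] = True
            tp += 1
    precision = tp / max(len(pred_pts), 1)
    recall = tp / max(len(true_pts), 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)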


Subjects
Deep Learning, Pulmonary Edema, Humans, Lung/diagnostic imaging, Ultrasonography/methods, Pulmonary Edema/diagnosis, Thorax
7.
JMIR Med Inform ; 11: e38412, 2023 Jan 18.
Article in English | MEDLINE | ID: mdl-36652282

ABSTRACT

BACKGROUND: Dermoscopy is commonly used for the evaluation of pigmented lesions, but agreement between experts for identification of dermoscopic structures is known to be relatively poor. Expert labeling of medical data is a bottleneck in the development of machine learning (ML) tools, and crowdsourcing has been demonstrated as a cost- and time-efficient method for the annotation of medical images. OBJECTIVE: The aim of this study is to demonstrate that crowdsourcing can be used to label basic dermoscopic structures from images of pigmented lesions with similar reliability to a group of experts. METHODS: First, we obtained labels of 248 images of melanocytic lesions with 31 dermoscopic "subfeatures" labeled by 20 dermoscopy experts. These were then collapsed into 6 dermoscopic "superfeatures" based on structural similarity, due to low interrater reliability (IRR): dots, globules, lines, network structures, regression structures, and vessels. These images were then used as the gold standard for the crowd study. The commercial platform DiagnosUs was used to obtain annotations from a nonexpert crowd for the presence or absence of the 6 superfeatures in each of the 248 images. We replicated this methodology with a group of 7 dermatologists to allow direct comparison with the nonexpert crowd. The Cohen κ value was used to measure agreement across raters. RESULTS: In total, we obtained 139,731 ratings of the 6 dermoscopic superfeatures from the crowd. There was relatively lower agreement for the identification of dots and globules (the median κ values were 0.526 and 0.395, respectively), whereas network structures and vessels showed the highest agreement (the median κ values were 0.581 and 0.798, respectively). This pattern was also seen among the expert raters, who had median κ values of 0.483 and 0.517 for dots and globules, respectively, and 0.758 and 0.790 for network structures and vessels. The median κ values between nonexperts and thresholded average-expert readers were 0.709 for dots, 0.719 for globules, 0.714 for lines, 0.838 for network structures, 0.818 for regression structures, and 0.728 for vessels. CONCLUSIONS: This study confirmed that IRR for different dermoscopic features varied among a group of experts; a similar pattern was observed in a nonexpert crowd. There was good or excellent agreement for each of the 6 superfeatures between the crowd and the experts, highlighting the similar reliability of the crowd for labeling dermoscopic images. This confirms the feasibility and dependability of using crowdsourcing as a scalable solution to annotate large sets of dermoscopic images, with several potential clinical and educational applications, including the development of novel, explainable ML tools.
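
A minimal sketch of the agreement statistic used above: Cohen's κ between two raters' binary presence/absence labels for one dermoscopic superfeature. The labels here are illustrative, not data from the study.

from sklearn.metrics import cohen_kappa_score

rater_a = [1, 0, 1, 1, 0, 0, 1, 0]   # e.g., "vessels present" calls from one rater
rater_b = [1, 0, 1, 0, 0, 0, 1, 1]   # the same images labeled by a second rater
print(f"Cohen's kappa = {cohen_kappa_score(rater_a, rater_b):.3f}")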

8.
J Arthroplasty ; 38(10): 2075-2080, 2023 10.
Article in English | MEDLINE | ID: mdl-35398523

ABSTRACT

BACKGROUND: The purpose of this study was to assess the viability of a knee arthroplasty prediction model using 3-view X-rays that helps determine whether patients with knee pain are candidates for total knee arthroplasty (TKA), unicompartmental knee arthroplasty (UKA), or no arthroplasty. METHODS: Analysis was performed using radiographic and surgical data from a high-volume joint replacement practice. The dataset included 3 different X-ray views (anterior-posterior, lateral, and sunrise) for 2,767 patients, along with information on whether each patient underwent arthroplasty surgery (UKA or TKA) or not. This resulted in a dataset of 8,301 images from 2,707 patients. This dataset was then split into a training set (70%) and a holdout test set (30%). A computer vision model was trained using a transfer learning approach. The performance of the computer vision model was evaluated on the holdout test set. Accuracy and multiclass receiver operating characteristic area under the curve (AUC) were used to evaluate the performance of the model. RESULTS: The artificial intelligence model achieved an accuracy of 87.8% on the holdout test set and a quadratic Cohen's kappa score of 0.811. The multiclass AUC was 0.97 for TKA, 0.96 for UKA, and 0.98 for No Surgery. Accuracy was 93.8% for predicting Surgery versus No Surgery and 88% for TKA versus not TKA. CONCLUSION: The artificial intelligence/machine learning model demonstrated viability for predicting which patients are candidates for a UKA, TKA, or no surgical intervention.
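
A minimal sketch of the two evaluation metrics reported above, computed with scikit-learn on illustrative labels (No Surgery, UKA, TKA); the class encoding and probabilities are assumptions for demonstration only.

import numpy as np
from sklearn.metrics import cohen_kappa_score, roc_auc_score

y_true = np.array([0, 2, 1, 2, 0, 1, 2, 0])   # 0 = No Surgery, 1 = UKA, 2 = TKA
y_pred = np.array([0, 2, 1, 1, 0, 1, 2, 0])   # predicted class labels
y_prob = np.array([[0.8, 0.1, 0.1], [0.1, 0.2, 0.7], [0.2, 0.6, 0.2], [0.1, 0.5, 0.4],
                   [0.7, 0.2, 0.1], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8], [0.9, 0.05, 0.05]])

qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")  # quadratic-weighted Cohen's kappa
auc = roc_auc_score(y_true, y_prob, multi_class="ovr")        # one-vs-rest multiclass ROC AUC
print(f"quadratic kappa = {qwk:.3f}, multiclass AUC = {auc:.3f}")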


Subjects
Knee Arthroplasty, Knee Osteoarthritis, Humans, Knee Arthroplasty/methods, Knee Osteoarthritis/surgery, Artificial Intelligence, Treatment Outcome, Knee Joint/diagnostic imaging, Knee Joint/surgery, Machine Learning
9.
Acad Radiol ; 29 Suppl 5: S70-S75, 2022 05.
Article in English | MEDLINE | ID: mdl-34020872

ABSTRACT

Radiology education is an important component of medical school and residency training, yet it lacks standardized instruction. This lack of uniformity in how radiology is taught and learned has created opportunities for new technologies to intervene. With the integration of artificial intelligence into medicine, current medical trainee curricula are likely to feel its impact on both education and clinical practice. In this paper, we investigate the landscape of radiologic education within current medical trainee curricula and consider how artificial intelligence may shape the current and future radiologic education model.


Subjects
Internship and Residency, Radiology, Artificial Intelligence, Curriculum, Humans, Radiography, Radiology/education