Results 1 - 10 of 10
1.
Transl Vis Sci Technol ; 10(2): 13, 2021 02 05.
Article in English | MEDLINE | ID: mdl-34003898

ABSTRACT

Purpose: This study evaluated generative methods to potentially mitigate artificial intelligence (AI) bias when diagnosing diabetic retinopathy (DR) resulting from training data imbalance or domain generalization, which occurs when deep learning systems (DLSs) face concepts at test/inference time they were not initially trained on. Methods: The public domain Kaggle EyePACS dataset (88,692 fundi and 44,346 individuals, originally diverse for ethnicity) was modified by adding clinician-annotated labels and constructing an artificial scenario of data imbalance and domain generalization by disallowing training (but not testing) exemplars for images of retinas with DR warranting referral (DR-referable) from darker-skin individuals, who presumably have a greater concentration of melanin within uveal melanocytes, on average, contributing to retinal image pigmentation. A traditional/baseline diagnostic DLS was compared against new DLSs that would use training data augmented via generative models for debiasing. Results: Accuracy (95% confidence intervals [CIs]) of the baseline diagnostic DLS for fundus images of lighter-skin individuals was 73.0% (66.9% to 79.2%) versus 60.5% (53.5% to 67.3%) for darker-skin individuals, demonstrating bias/disparity (delta = 12.5%; Welch t-test t = 2.670, P = 0.008) in AI performance across protected subpopulations. Using novel generative methods to address missing subpopulation training data (DR-referable darker-skin) instead achieved an accuracy of 72.0% (65.8% to 78.2%) for lighter-skin and 71.5% (65.2% to 77.8%) for darker-skin individuals, demonstrating closer parity (delta = 0.5%) in accuracy across subpopulations (Welch t-test t = 0.111, P = 0.912). Conclusions: Findings illustrate how data imbalance and domain generalization can lead to disparity of accuracy across subpopulations, and show that novel generative methods of synthetic fundus images may play a role in debiasing AI.
Translational Relevance: New AI methods have possible applications to address potential AI bias in DR diagnostics from fundus pigmentation, and potentially other ophthalmic DLSs too.
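The disparity test reported above (Welch's t-test on accuracy across subpopulations) can be reproduced in miniature. A hedged sketch in Python: the sample sizes and spread are assumptions for illustration, and the means only loosely echo the abstract's 73.0% vs 60.5%; these are not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical bootstrap accuracy samples for two subpopulations
# (illustrative stand-ins, not the study's measurements)
acc_lighter = rng.normal(loc=0.730, scale=0.03, size=50)
acc_darker = rng.normal(loc=0.605, scale=0.03, size=50)

# Welch's t-test (unequal variances), matching the abstract's test choice
t_stat, p_value = stats.ttest_ind(acc_lighter, acc_darker, equal_var=False)
disparity = acc_lighter.mean() - acc_darker.mean()
print(f"delta = {disparity:.3f}, t = {t_stat:.2f}, p = {p_value:.3g}")
```

A gap of this size with low variance yields a tiny p-value; the study's smaller t and larger p reflect much wider per-subpopulation confidence intervals.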


Subject(s)
Artificial Intelligence , Diabetic Retinopathy , Diabetic Retinopathy/diagnosis , Fundus Oculi , Humans , Mass Screening , Retina
2.
JAMA Ophthalmol ; 138(10): 1070-1077, 2020 10 01.
Article in English | MEDLINE | ID: mdl-32880609

ABSTRACT

Importance: Recent studies have demonstrated the successful application of artificial intelligence (AI) for automated retinal disease diagnostics but have not addressed a fundamental challenge for deep learning systems: the current need for large, criterion standard-annotated retinal data sets for training. Low-shot learning algorithms, aiming to learn from a relatively small amount of training data, may be beneficial for clinical situations involving rare retinal diseases or when addressing potential bias resulting from data that may not adequately represent certain groups for training, such as individuals older than 85 years. Objective: To evaluate whether low-shot deep learning methods are beneficial when using small training data sets for automated retinal diagnostics. Design, Setting, and Participants: This cross-sectional study, conducted from July 1, 2019, to June 21, 2020, compared different diabetic retinopathy classification algorithms, traditional and low-shot, for 2-class designations (diabetic retinopathy warranting referral vs not warranting referral). The public domain EyePACS data set was used, which originally included 88 692 fundi from 44 346 individuals. Statistical analysis was performed from February 1 to June 21, 2020. Main Outcomes and Measures: The performance (95% CIs) of the various AI algorithms was measured via receiver operating characteristic curves and their area under the curve (AUC), precision-recall curves, accuracy, and F1 score, evaluated for different training data sizes, ranging from 5120 to 10 samples per class. Results: Deep learning algorithms, when trained with sufficiently large data sets (5120 samples per class), yielded comparable performance, with an AUC of 0.8330 (95% CI, 0.8140-0.8520) for a traditional approach (eg, fine-tuned ResNet), compared with low-shot methods (AUC, 0.8348 [95% CI, 0.8159-0.8537]) (using self-supervised Deep InfoMax [our method denoted as DIM]).
However, when far fewer training images were available (n = 160), the traditional deep learning approach had an AUC decreasing to 0.6585 (95% CI, 0.6332-0.6838) and was outperformed by a low-shot method using self-supervision with an AUC of 0.7467 (95% CI, 0.7239-0.7695). At very low shots (n = 10), the traditional approach had performance close to chance, with an AUC of 0.5178 (95% CI, 0.4909-0.5447) compared with the best low-shot method (AUC, 0.5778 [95% CI, 0.5512-0.6044]). Conclusions and Relevance: These findings suggest the potential benefits of using low-shot methods for AI retinal diagnostics when a limited number of annotated training retinal images are available (eg, with rare ophthalmic diseases or when addressing potential AI bias).
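The AUC metric used throughout measures how well model scores rank referable above non-referable cases. A minimal scikit-learn sketch with made-up labels and scores (not the study's predictions) shows why an AUC near 0.5 means the scores have stopped separating the classes:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical ground truth (1 = DR warranting referral) and model scores
y_true = np.array([0, 0, 0, 1, 1, 1])
y_good = np.array([0.1, 0.2, 0.4, 0.35, 0.8, 0.9])  # mostly ranks positives higher
y_flat = np.array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5])   # uninformative scores

auc_good = roc_auc_score(y_true, y_good)  # 8 of 9 positive/negative pairs ordered correctly
auc_flat = roc_auc_score(y_true, y_flat)  # all ties -> chance level, 0.5
```

The traditional model's drift toward AUC 0.52 at n = 10 behaves like `y_flat` here: near-chance ranking of referable over non-referable fundi.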


Subject(s)
Algorithms , Artificial Intelligence , Deep Learning , Diabetic Retinopathy/diagnosis , Neural Networks, Computer , Rare Diseases/diagnosis , Cross-Sectional Studies , Female , Humans , Male , ROC Curve , Retrospective Studies
3.
JAMA Ophthalmol ; 137(3): 258-264, 2019 03 01.
Article in English | MEDLINE | ID: mdl-30629091

ABSTRACT

Importance: Deep learning (DL) used for discriminative tasks in ophthalmology, such as diagnosing diabetic retinopathy or age-related macular degeneration (AMD), requires large image data sets graded by human experts to train deep convolutional neural networks (DCNNs). In contrast, generative DL techniques could synthesize large new data sets of artificial retina images with different stages of AMD. Such images could enhance existing data sets of common and rare ophthalmic diseases without concern for personally identifying information to assist medical education of students, residents, and retinal specialists, as well as for training new DL diagnostic models for which extensive data sets from large clinical trials of expertly graded images may not exist. Objective: To develop DL techniques for synthesizing high-resolution realistic fundus images serving as proxy data sets for use by retinal specialists and DL machines. Design, Setting, and Participants: Generative adversarial networks were trained on 133 821 color fundus images from 4613 study participants from the Age-Related Eye Disease Study (AREDS), generating synthetic fundus images with and without AMD. We compared retinal specialists' ability to diagnose AMD on both real and synthetic images, asking them to assess image gradability and testing their ability to discern real from synthetic images. The performance of AMD diagnostic DCNNs (referable vs not referable AMD) trained on either all-real vs all-synthetic data sets was compared. Main Outcomes and Measures: Accuracy of 2 retinal specialists (T.Y.A.L. and K.D.P.) for diagnosing and distinguishing AMD on real vs synthetic images and diagnostic performance (area under the curve) of DL algorithms trained on synthetic vs real images. Results: The diagnostic accuracy of 2 retinal specialists on real vs synthetic images was similar. 
The accuracy of diagnosis as referable vs nonreferable AMD compared with certified human graders for retinal specialist 1 was 84.54% (error margin, 4.06%) on real images vs 84.12% (error margin, 4.16%) on synthetic images and for retinal specialist 2 was 89.47% (error margin, 3.45%) on real images vs 89.19% (error margin, 3.54%) on synthetic images. Retinal specialists could not distinguish real from synthetic images, with an accuracy of 59.50% (error margin, 3.93%) for retinal specialist 1 and 53.67% (error margin, 3.99%) for retinal specialist 2. The DCNNs trained on real data showed an area under the curve of 0.9706 (error margin, 0.0029), and those trained on synthetic data showed an area under the curve of 0.9235 (error margin, 0.0045). Conclusions and Relevance: Deep learning-synthesized images appeared to be realistic to retinal specialists, and DCNNs achieved diagnostic performance on synthetic data close to that for real images, suggesting that DL generative techniques hold promise for training humans and machines.
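The "error margin" figures above are consistent with 95% CI half-widths for a proportion. A sketch under that assumption; the sample size n = 300 is hypothetical, chosen only to show that margins land near the reported ~4 percentage points:

```python
import math

def ci_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation 95% CI half-width for a proportion p measured on n images."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical: a specialist at 84.5% accuracy graded on 300 images
margin = ci_half_width(0.845, 300)  # roughly 0.041, i.e. ~4 percentage points
```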


Subject(s)
Deep Learning , Diagnostic Techniques, Ophthalmological , Macular Degeneration/diagnosis , Fundus Oculi , Humans , Reproducibility of Results
4.
JAMA Ophthalmol ; 136(12): 1359-1366, 2018 12 01.
Article in English | MEDLINE | ID: mdl-30242349

ABSTRACT

Importance: Although deep learning (DL) can identify the intermediate or advanced stages of age-related macular degeneration (AMD) as a binary yes or no, stratified gradings using the more granular Age-Related Eye Disease Study (AREDS) 9-step detailed severity scale for AMD provide more precise estimation of 5-year progression to advanced stages. The AREDS 9-step detailed scale's complexity and implementation solely with highly trained fundus photograph graders potentially hampered its clinical use, warranting development and use of an alternate AREDS simple scale, which although valuable, has less predictive ability. Objective: To describe DL techniques for the AREDS 9-step detailed severity scale for AMD to estimate 5-year risk probability with reasonable accuracy. Design, Setting, and Participants: This study used data collected from November 13, 1992, to November 30, 2005, from 4613 study participants of the AREDS data set to develop deep convolutional neural networks that were trained to provide detailed automated AMD grading on several AMD severity classification scales, using a multiclass classification setting. Two AMD severity classification problems using criteria based on 4-step (AMD-1, AMD-2, AMD-3, and AMD-4 from classifications developed for AREDS eligibility criteria) and 9-step (from AREDS detailed severity scale) AMD severity scales were investigated. The performance of these algorithms was compared with a contemporary human grader and against a criterion standard (fundus photograph reading center graders) used at the time of AREDS enrollment and follow-up. Three methods for estimating 5-year risk were developed, including one based on DL regression. Data were analyzed from December 1, 2017, through April 15, 2018. Main Outcomes and Measures: Weighted κ scores and mean unsigned errors for estimating 5-year risk probability of progression to advanced AMD. Results: This study used 67 401 color fundus images from the 4613 study participants. 
The weighted κ scores were 0.77 for the 4-step and 0.74 for the 9-step AMD severity scales. The overall mean estimation error for the 5-year risk ranged from 3.5% to 5.3%. Conclusions and Relevance: These findings suggest that DL AMD grading has, for the 4-step classification evaluation, performance comparable with that of humans and achieves promising results for providing AMD detailed severity grading (9-step classification), which normally requires highly trained graders, and for estimating 5-year risk of progression to advanced AMD. Use of DL has the potential to assist physicians in longitudinal care for individualized, detailed risk assessment as well as clinical studies of disease progression during treatment or as public screening or monitoring worldwide.
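Weighted κ on an ordinal severity scale penalizes large disagreements more than adjacent-step ones. A scikit-learn sketch; quadratic weighting is an assumption here (the abstract says only "weighted κ"), and the grades are invented:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Invented 9-step severity grades from a reference grader and two models
reference = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 5])
model_exact = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 5])  # perfect agreement
model_near = np.array([1, 2, 4, 4, 5, 7, 7, 8, 9, 6])   # three off-by-one grades

kappa_exact = cohen_kappa_score(reference, model_exact, weights="quadratic")  # 1.0
kappa_near = cohen_kappa_score(reference, model_near, weights="quadratic")    # high, but < 1.0
```

Because the weights grow quadratically with grade distance, off-by-one errors on a 9-step scale cost little, which is why a model can reach κ ≈ 0.74 while rarely matching the reading center exactly.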


Subject(s)
Algorithms , Deep Learning , Diagnostic Techniques, Ophthalmological , Macula Lutea/diagnostic imaging , Macular Degeneration/diagnosis , Risk Assessment/methods , Aged , Disease Progression , Female , Follow-Up Studies , Humans , Incidence , Male , Middle Aged , Reproducibility of Results , Retrospective Studies , Risk Factors , Severity of Illness Index , Time Factors , United States/epidemiology
6.
JAMA Ophthalmol ; 135(11): 1170-1176, 2017 11 01.
Article in English | MEDLINE | ID: mdl-28973096

ABSTRACT

Importance: Age-related macular degeneration (AMD) affects millions of people throughout the world. The intermediate stage may go undetected, as it typically is asymptomatic. However, the preferred practice patterns for AMD recommend identifying individuals with this stage of the disease to educate them on how to monitor for the early detection of the choroidal neovascular stage before substantial vision loss has occurred and to consider dietary supplements that might reduce the risk of the disease progressing from the intermediate to the advanced stage. Identification, though, can be time-intensive and requires expertly trained individuals. Objective: To develop methods for automatically detecting AMD from fundus images using a novel application of deep learning methods to the automated assessment of these images and to leverage artificial intelligence advances. Design, Setting, and Participants: Deep convolutional neural networks that are explicitly trained for performing automated AMD grading were compared with an alternate deep learning method that used transfer learning and universal features and with a trained clinical grader. Automated AMD detection was applied to a 2-class classification problem in which the task was to distinguish the disease-free/early stages from the referable intermediate/advanced stages. In several experiments that entailed different data partitioning, the machine algorithms and human graders were evaluated on over 130 000 images, deidentified with respect to age, sex, and race/ethnicity, from 4613 patients against a gold standard included in the National Institutes of Health Age-Related Eye Disease Study data set. Main Outcomes and Measures: Accuracy, receiver operating characteristics and area under the curve, and kappa score.
Results: The deep convolutional neural network method yielded an accuracy (SD) that ranged between 88.4% (0.5%) and 91.6% (0.1%); the area under the receiver operating characteristic curve was between 0.94 and 0.96; and the kappa coefficient (SD) was between 0.764 (0.010) and 0.829 (0.003), indicating substantial agreement with the gold standard Age-related Eye Disease Study data set. Conclusions and Relevance: Applying a deep learning-based automated assessment of AMD from fundus images can produce results that are similar to human performance levels. This study demonstrates that automated algorithms could play a role that is independent of expert human graders in the current management of AMD and could address the costs of screening or monitoring, access to health care, and the assessment of novel treatments that address the development or progression of AMD.


Subject(s)
Algorithms , Machine Learning , Neural Networks, Computer , Wet Macular Degeneration/diagnosis , Fundus Oculi , Humans , ROC Curve , Reproducibility of Results
7.
Comput Biol Med ; 82: 80-86, 2017 03 01.
Article in English | MEDLINE | ID: mdl-28167406

ABSTRACT

BACKGROUND: When left untreated, age-related macular degeneration (AMD) is the leading cause of vision loss in people over fifty in the US. Currently it is estimated that about eight million US individuals have the intermediate stage of AMD that is often asymptomatic with regard to visual deficit. These individuals are at high risk for progressing to the advanced stage where the often treatable choroidal neovascular form of AMD can occur. Careful monitoring to detect the onset and prompt treatment of the neovascular form as well as dietary supplementation can reduce the risk of vision loss from AMD; therefore, preferred practice patterns recommend identifying individuals with the intermediate stage in a timely manner. METHODS: Past automated retinal image analysis (ARIA) methods applied on fundus imagery have relied on engineered and hand-designed visual features. We instead detail the novel application of a machine learning approach using deep learning for the problem of ARIA and AMD analysis. We use transfer learning and universal features derived from deep convolutional neural networks (DCNN). We address clinically relevant 4-class, 3-class, and 2-class AMD severity classification problems. RESULTS: Using 5664 color fundus images from the NIH AREDS dataset and DCNN universal features, we obtain values for accuracy for the (4-, 3-, 2-) class classification problem of (79.4%, 81.5%, 93.4%) for machine vs. (75.8%, 85.0%, 95.2%) for physician grading. DISCUSSION: This study demonstrates the efficacy of machine grading based on deep universal features/transfer learning when applied to ARIA and is a promising step in providing a pre-screener to identify individuals with intermediate AMD and also as a tool that can facilitate identifying such individuals for clinical studies aimed at developing improved therapies. It also demonstrates comparable performance between computer and physician grading.
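The transfer-learning recipe described here (frozen DCNN "universal" features feeding a simple classifier) can be sketched as below. Random vectors stand in for the pretrained network's penultimate-layer activations and the labels are synthetic, so only the pipeline shape mirrors the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Stand-in for DCNN universal features: in the paper's setting these would
# be pretrained-network activations, one row per fundus image
n_images, feat_dim = 400, 64
X = rng.normal(size=(n_images, feat_dim))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic, linearly separable labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # lightweight head on frozen features
acc = clf.score(X_te, y_te)
```

The design point is that only the small head is trained, which is why this approach needs far fewer labeled images than training a DCNN end to end.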


Subject(s)
Algorithms , Fluorescein Angiography/methods , Machine Learning , Macular Degeneration/diagnostic imaging , Macular Degeneration/pathology , Pattern Recognition, Automated/methods , Early Diagnosis , Humans , Image Interpretation, Computer-Assisted , Macular Degeneration/classification , Observer Variation , Reproducibility of Results , Sensitivity and Specificity , Severity of Illness Index
8.
Am J Ophthalmol ; 177: 90-99, 2017 May.
Article in English | MEDLINE | ID: mdl-28212878

ABSTRACT

PURPOSE: To evaluate macular vascular flow abnormalities identified by optical coherence tomography angiography (OCT-A) in patients with various sickle cell genotypes. DESIGN: Prospective, observational case series. METHODS: This is a single-institution case series of adult patients with various sickle cell genotypes. All patients underwent macular OCT-A (Avanti RTVue XR). Images were analyzed qualitatively for areas of flow loss and quantitatively for measures of foveal avascular area, parafoveal flow, and vascular density. The findings were compared by sickle cell genotype and retinopathy stage and correlated to retinal thickness and visual acuity. RESULTS: OCT-A scans of 82 eyes from 46 patients (60.9% female, mean age 33.5 years) were included. Sickle cell genotypes included 27 patients with hemoglobin SS (58.7%), 14 SC (30.4%), 4 beta-thalassemia (8.7%), and 1 sickle trait (2.2%). Discrete areas of flow loss were noted in 37.8% (31/82) of eyes overall and were common in both SS (40.0%, 20/50 eyes) and SC (41.7%, 10/24 eyes). Flow loss was more extensive in the temporal and nasal parafoveal subfields of the deep plexus with sickle SC or proliferative retinopathy. Retinal thickness measurements correlated with vascular density of the fovea, parafovea, and temporal and superior subfields. Visual acuity correlated with foveal avascular zone area and parafoveal vascular density in the superficial and deep plexi. CONCLUSIONS: Areas of abnormal macular vascular flow are common in patients with various sickle cell genotypes. These areas may be seen at any retinopathy stage but may be more extensive with sickle SC or proliferative retinopathy.
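The quantitative OCT-A measures mentioned (foveal avascular zone area, vascular density) reduce to simple operations on a binary perfusion mask. A toy sketch; the 3 × 3 mm scan size and 100 × 100 px grid are assumptions, not the Avanti RTVue XR's actual output format:

```python
import numpy as np

# Toy binary perfusion mask: 1 = flow signal, 0 = no flow (illustrative only)
mask = np.ones((100, 100), dtype=np.uint8)
mask[40:60, 40:60] = 0  # a 20 x 20 px central avascular region

pixel_area_mm2 = (3.0 / 100) ** 2                   # assumed 3 mm scan over 100 px
faz_area_mm2 = np.sum(mask == 0) * pixel_area_mm2   # 400 px -> 0.36 mm^2
vessel_density = mask.mean()                        # fraction of pixels with flow, 0.96
```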


Subject(s)
Anemia, Sickle Cell/complications , Macula Lutea/blood supply , Retinal Diseases/diagnosis , Retinal Vessels/abnormalities , Tomography, Optical Coherence/methods , Adult , Anemia, Sickle Cell/diagnosis , Female , Fluorescein Angiography , Follow-Up Studies , Fundus Oculi , Humans , Male , Prospective Studies , Retinal Diseases/etiology , Retinal Diseases/physiopathology , Retinal Vessels/physiopathology , Visual Acuity
9.
Arq Bras Endocrinol Metabol ; 52(1): 93-100, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18345401

ABSTRACT

In order to establish cut-off limits and to distinguish isolated premature thelarche (IPT) from precocious puberty (PP), we evaluated data from 79 girls with premature thelarche, comparing basal and stimulated LH and FSH serum concentrations with those from 91 healthy girls. A GnRH stimulation test was performed in 10 normal girls and in 42 with premature thelarche. Comparison among groups was performed by Kruskal-Wallis and Dunn's tests. LH values were significantly greater in girls with IPT than in control groups. Basal gonadotropin concentrations were higher in patients with PP than in controls, but not different from patients with IPT. Peak LH levels after GnRH stimulation distinguished those two groups, with a cut-off value of 4.0 IU/L, but still with minimal overlap. In conclusion, a girl with premature thelarche and LH peak value above 4.5 IU/L has, indeed, PP, but values between 3.5 and 4.5 IU/L warrant careful follow-up.
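The decision rule in the conclusion (peak LH after GnRH stimulation, with a gray zone between 3.5 and 4.5 IU/L) can be written out directly; the function name and return strings are illustrative:

```python
def classify_peak_lh(peak_lh_iu_per_l: float) -> str:
    """Triage a girl with premature thelarche by post-GnRH peak LH (IU/L)."""
    if peak_lh_iu_per_l > 4.5:
        return "precocious puberty"
    if peak_lh_iu_per_l >= 3.5:
        return "gray zone: careful follow-up"
    return "consistent with isolated premature thelarche"
```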


Subject(s)
Breast/growth & development , Follicle Stimulating Hormone/blood , Gonadotropin-Releasing Hormone/blood , Immunoassay/methods , Luteinizing Hormone/blood , Puberty, Precocious/blood , Biomarkers/blood , Case-Control Studies , Child , Child, Preschool , Female , Humans , Infant , Luminescent Measurements , Puberty, Precocious/diagnosis , Sensitivity and Specificity , Statistics, Nonparametric