Results 1 - 3 of 3
1.
J Med Screen. 2023 Sep;30(3):97-112.
Article in English | MEDLINE | ID: mdl-36617971

ABSTRACT

OBJECTIVES: To systematically review the accuracy of artificial intelligence (AI)-based systems for grading of fundus images in diabetic retinopathy (DR) screening.

METHODS: We searched MEDLINE, EMBASE, the Cochrane Library and ClinicalTrials.gov from 1st January 2000 to 27th August 2021. Accuracy studies published in English were included if they met the pre-specified inclusion criteria. Selection of studies for inclusion, data extraction and quality assessment were conducted by one author, with a second reviewer independently screening and checking 20% of titles. Results were analysed narratively.

RESULTS: Forty-three studies evaluating 15 deep learning (DL) and 4 machine learning (ML) systems were included. Nine systems were evaluated in a single study each. Most studies were judged to be at high or unclear risk of bias in at least one QUADAS-2 domain. Sensitivity for referable DR and higher grades was ≥85%, while specificity varied and was <80% for all ML systems and in 6/31 studies evaluating DL systems. Studies reported high accuracy for detection of ungradable images, but the latter were analysed and reported inconsistently. Seven studies reported that AI was more sensitive but less specific than human graders.

CONCLUSIONS: AI-based systems are more sensitive than human graders and could be safe to use in clinical practice, but have variable specificity. However, for many systems the evidence is limited, at high risk of bias, and may not generalise across settings. Therefore, pre-implementation assessment in the target clinical pathway is essential to obtain reliable and applicable accuracy estimates.
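As a minimal illustration of the two metrics the abstract compares (sensitivity ≥85%, specificity <80%), the sketch below computes both from a 2x2 confusion matrix. All counts here are hypothetical, not data from the review:

```python
# Illustrative only: sensitivity and specificity as used in DR screening
# accuracy studies, computed from a hypothetical 2x2 confusion matrix.

def sensitivity(tp: int, fn: int) -> float:
    """Proportion of referable-DR images the grader correctly flags: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Proportion of non-referable images correctly passed: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical counts for an AI grader on 1000 screened images:
# 100 truly referable, 900 truly non-referable.
tp, fn = 90, 10
tn, fp = 720, 180

print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 0.90
print(f"specificity = {specificity(tn, fp):.2f}")  # 0.80
```

These hypothetical counts reproduce the pattern the review describes: a system can clear the ≥85% sensitivity threshold while its specificity sits at or below 80%, meaning roughly one in five healthy-eye images would be referred unnecessarily.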


Subject(s)
Diabetes Mellitus , Diabetic Retinopathy , Humans , Artificial Intelligence , Diabetic Retinopathy/diagnostic imaging , Early Detection of Cancer , Mass Screening/methods
2.
Lancet Digit Health. 2022 Dec;4(12):e899-e905.
Article in English | MEDLINE | ID: mdl-36427951

ABSTRACT

Rigorous evaluation of artificial intelligence (AI) systems for image classification is essential before deployment into health-care settings, such as screening programmes, so that adoption is effective and safe. A key step in the evaluation process is the external validation of diagnostic performance using a test set of images. We conducted a rapid literature review on methods to develop test sets, published from 2012 to 2020, in English. Using thematic analysis, we mapped themes and coded the principles using the Population, Intervention, and Comparator or Reference standard, Outcome, and Study design framework. A group of screening and AI experts assessed the evidence-based principles for completeness and provided further considerations. Of the final 15 principles recommended here, five affect population, one intervention, two comparator, one reference standard, and one both reference standard and comparator. Finally, four are applicable to outcome and one to study design. Principles from the literature were useful for addressing biases from AI; however, they did not account for screening-specific biases, which we now incorporate. The principles set out here should be used to support the development and use of test sets for studies that assess the accuracy of AI within screening programmes, to ensure they are fit for purpose and minimise bias.


Subject(s)
Artificial Intelligence , Diagnostic Imaging , Mass Screening
3.
Lancet Digit Health. 2022 Jul;4(7):e558-e565.
Article in English | MEDLINE | ID: mdl-35750402

ABSTRACT

Artificial intelligence (AI) has the potential to accurately classify mammograms according to the presence or absence of radiological signs of breast cancer, replacing or supplementing human readers (radiologists). The UK National Screening Committee's assessments of the use of AI systems to examine screening mammograms continue to focus on maximising benefits and minimising harms to women screened, when deciding whether to recommend the implementation of AI into the Breast Screening Programme in the UK. Maintaining or improving programme specificity is important to minimise anxiety from false-positive results. When considering cancer detection, AI test sensitivity alone is not sufficiently informative, and additional information on the spectrum of disease detected and on interval cancers is crucial to better understand the benefits and harms of screening. Although large retrospective studies might provide useful evidence by directly comparing test accuracy and spectrum of disease detected between different AI systems and by population subgroup, most retrospective studies are biased due to differential verification (ie, the use of different reference standards to verify the target condition among study participants). Enriched, multiple-reader, multiple-case, test set laboratory studies are also biased due to the laboratory effect (ie, radiologists' performance in retrospective, laboratory, observer studies is substantially different from their performance in a clinical environment). Therefore, assessment of the effect of incorporating any AI system into the breast screening pathway in prospective studies is required, as it will provide key evidence on the interaction of medical staff with AI and on the impact on women's outcomes.


Subject(s)
Breast Neoplasms , Early Detection of Cancer , Artificial Intelligence , Breast Neoplasms/diagnosis , Early Detection of Cancer/methods , Female , Humans , Retrospective Studies , United Kingdom