Pesquisa | Portal Regional da BVS

1.

Understanding metric-related pitfalls in image analysis validation.

Reinke, Annika; Tizabi, Minu D; Baumgartner, Michael; Eisenmann, Matthias; Heckmann-Nötzel, Doreen; Kavur, A Emre; Rädsch, Tim; Sudre, Carole H; Acion, Laura; Antonelli, Michela; Arbel, Tal; Bakas, Spyridon; Benis, Arriel; Buettner, Florian; Cardoso, M Jorge; Cheplygina, Veronika; Chen, Jianxu; Christodoulou, Evangelia; Cimini, Beth A; Farahani, Keyvan; Ferrer, Luciana; Galdran, Adrian; van Ginneken, Bram; Glocker, Ben; Godau, Patrick; Hashimoto, Daniel A; Hoffman, Michael M; Huisman, Merel; Isensee, Fabian; Jannin, Pierre; Kahn, Charles E; Kainmueller, Dagmar; Kainz, Bernhard; Karargyris, Alexandros; Kleesiek, Jens; Kofler, Florian; Kooi, Thijs; Kopp-Schneider, Annette; Kozubek, Michal; Kreshuk, Anna; Kurc, Tahsin; Landman, Bennett A; Litjens, Geert; Madani, Amin; Maier-Hein, Klaus; Martel, Anne L; Meijering, Erik; Menze, Bjoern; Moons, Karel G M; Müller, Henning.

Nat Methods ; 21(2): 182-194, 2024 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-38347140

RESUMO

Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.

Assuntos

Inteligência Artificial

2.

Metrics reloaded: recommendations for image analysis validation.

Maier-Hein, Lena; Reinke, Annika; Godau, Patrick; Tizabi, Minu D; Buettner, Florian; Christodoulou, Evangelia; Glocker, Ben; Isensee, Fabian; Kleesiek, Jens; Kozubek, Michal; Reyes, Mauricio; Riegler, Michael A; Wiesenfarth, Manuel; Kavur, A Emre; Sudre, Carole H; Baumgartner, Michael; Eisenmann, Matthias; Heckmann-Nötzel, Doreen; Rädsch, Tim; Acion, Laura; Antonelli, Michela; Arbel, Tal; Bakas, Spyridon; Benis, Arriel; Blaschko, Matthew B; Cardoso, M Jorge; Cheplygina, Veronika; Cimini, Beth A; Collins, Gary S; Farahani, Keyvan; Ferrer, Luciana; Galdran, Adrian; van Ginneken, Bram; Haase, Robert; Hashimoto, Daniel A; Hoffman, Michael M; Huisman, Merel; Jannin, Pierre; Kahn, Charles E; Kainmueller, Dagmar; Kainz, Bernhard; Karargyris, Alexandros; Karthikesalingam, Alan; Kofler, Florian; Kopp-Schneider, Annette; Kreshuk, Anna; Kurc, Tahsin; Landman, Bennett A; Litjens, Geert; Madani, Amin.

Nat Methods ; 21(2): 195-212, 2024 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-38347141

RESUMO

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint-a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.

Assuntos

Algoritmos , Processamento de Imagem Assistida por Computador , Aprendizado de Máquina , Semântica

3.

Fully Automated Versions of Clinically Validated Nephrometry Scores Demonstrate Superior Predictive Utility versus Human Scores.

Wood, Andrew M; Abdallah, Nour; Heller, Nicholas; Benidir, Tarik; Isensee, Fabian; Tejpaul, Resha; Suk-Ouichai, Chalairat; Curry, Caleb; You, Alex; Remer, Erick; Haywood, Samuel; Campbell, Steven; Papanikolopoulos, Nikolaos; Weight, Christopher.

BJU Int ; 133(6): 690-698, 2024 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-38343198

RESUMO

OBJECTIVE: To automate the generation of three validated nephrometry scoring systems on preoperative computerised tomography (CT) scans by developing artificial intelligence (AI)-based image processing methods. Subsequently, we aimed to evaluate the ability of these scores to predict meaningful pathological and perioperative outcomes. PATIENTS AND METHODS: A total of 300 patients with preoperative CT with early arterial contrast phase were identified from a cohort of 544 consecutive patients undergoing surgical extirpation for suspected renal cancer. A deep neural network approach was used to automatically segment kidneys and tumours, and then geometric algorithms were used to measure the components of the concordance index (C-Index), Preoperative Aspects and Dimensions Used for an Anatomical classification of renal tumours (PADUA), and tumour contact surface area (CSA) nephrometry scores. Human scores were independently calculated by medical personnel blinded to the AI scores. AI and human score agreement was assessed using linear regression and predictive abilities for meaningful outcomes were assessed using logistic regression and receiver operating characteristic curve analyses. RESULTS: The median (interquartile range) age was 60 (51-68) years, and 40% were female. The median tumour size was 4.2 cm and 91.3% had malignant tumours. In all, 27% of the tumours were high stage, 37% high grade, and 63% of the patients underwent partial nephrectomy. There was significant agreement between human and AI scores on linear regression analyses (R ranged from 0.574 to 0.828, all P < 0.001). The AI-generated scores were equivalent or superior to human-generated scores for all examined outcomes including high-grade histology, high-stage tumour, indolent tumour, pathological tumour necrosis, and radical nephrectomy (vs partial nephrectomy) surgical approach. CONCLUSIONS: Fully automated AI-generated C-Index, PADUA, and tumour CSA nephrometry scores are similar to human-generated scores and predict a wide variety of meaningful outcomes. Once validated, our results suggest that AI-generated nephrometry scores could be delivered automatically from a preoperative CT scan to a clinician and patient at the point of care to aid in decision making.

Assuntos

Neoplasias Renais , Tomografia Computadorizada por Raios X , Humanos , Feminino , Neoplasias Renais/patologia , Neoplasias Renais/cirurgia , Neoplasias Renais/diagnóstico por imagem , Masculino , Pessoa de Meia-Idade , Idoso , Nefrectomia/métodos , Valor Preditivo dos Testes , Inteligência Artificial , Estudos Retrospectivos

4.

Understanding metric-related pitfalls in image analysis validation.

Reinke, Annika; Tizabi, Minu D; Baumgartner, Michael; Eisenmann, Matthias; Heckmann-Nötzel, Doreen; Kavur, A Emre; Rädsch, Tim; Sudre, Carole H; Acion, Laura; Antonelli, Michela; Arbel, Tal; Bakas, Spyridon; Benis, Arriel; Blaschko, Matthew; Buettner, Florian; Cardoso, M Jorge; Cheplygina, Veronika; Chen, Jianxu; Christodoulou, Evangelia; Cimini, Beth A; Collins, Gary S; Farahani, Keyvan; Ferrer, Luciana; Galdran, Adrian; van Ginneken, Bram; Glocker, Ben; Godau, Patrick; Haase, Robert; Hashimoto, Daniel A; Hoffman, Michael M; Huisman, Merel; Isensee, Fabian; Jannin, Pierre; Kahn, Charles E; Kainmueller, Dagmar; Kainz, Bernhard; Karargyris, Alexandros; Karthikesalingam, Alan; Kenngott, Hannes; Kleesiek, Jens; Kofler, Florian; Kooi, Thijs; Kopp-Schneider, Annette; Kozubek, Michal; Kreshuk, Anna; Kurc, Tahsin; Landman, Bennett A; Litjens, Geert; Madani, Amin; Maier-Hein, Klaus.

ArXiv ; 2024 Feb 23.

Artigo em Inglês | MEDLINE | ID: mdl-36945687

RESUMO

Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.

5.

Discovering Process Dynamics for Scalable Perovskite Solar Cell Manufacturing with Explainable AI.

Klein, Lukas; Ziegler, Sebastian; Laufer, Felix; Debus, Charlotte; Götz, Markus; Maier-Hein, Klaus; Paetzold, Ulrich W; Isensee, Fabian; Jäger, Paul F.

Adv Mater ; 36(7): e2307160, 2024 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-37904613

RESUMO

Large-area processing of perovskite semiconductor thin-films is complex and evokes unexplained variance in quality, posing a major hurdle for the commercialization of perovskite photovoltaics. Advances in scalable fabrication processes are currently limited to gradual and arbitrary trial-and-error procedures. While the in situ acquisition of photoluminescence (PL) videos has the potential to reveal important variations in the thin-film formation process, the high dimensionality of the data quickly surpasses the limits of human analysis. In response, this study leverages deep learning (DL) and explainable artificial intelligence (XAI) to discover relationships between sensor information acquired during the perovskite thin-film formation process and the resulting solar cell performance indicators, while rendering these relationships humanly understandable. The study further shows how gained insights can be distilled into actionable recommendations for perovskite thin-film processing, advancing toward industrial-scale solar cell manufacturing. This study demonstrates that XAI methods will play a critical role in accelerating energy materials science.

6.

Addressing image misalignments in multi-parametric prostate MRI for enhanced computer-aided diagnosis of prostate cancer.

Kovacs, Balint; Netzer, Nils; Baumgartner, Michael; Schrader, Adrian; Isensee, Fabian; Weißer, Cedric; Wolf, Ivo; Görtz, Magdalena; Jaeger, Paul F; Schütz, Victoria; Floca, Ralf; Gnirs, Regula; Stenzinger, Albrecht; Hohenfellner, Markus; Schlemmer, Heinz-Peter; Bonekamp, David; Maier-Hein, Klaus H.

Sci Rep ; 13(1): 19805, 2023 11 13.

Artigo em Inglês | MEDLINE | ID: mdl-37957250

RESUMO

Prostate cancer (PCa) diagnosis on multi-parametric magnetic resonance images (MRI) requires radiologists with a high level of expertise. Misalignments between the MRI sequences can be caused by patient movement, elastic soft-tissue deformations, and imaging artifacts. They further increase the complexity of the task prompting radiologists to interpret the images. Recently, computer-aided diagnosis (CAD) tools have demonstrated potential for PCa diagnosis typically relying on complex co-registration of the input modalities. However, there is no consensus among research groups on whether CAD systems profit from using registration. Furthermore, alternative strategies to handle multi-modal misalignments have not been explored so far. Our study introduces and compares different strategies to cope with image misalignments and evaluates them regarding to their direct effect on diagnostic accuracy of PCa. In addition to established registration algorithms, we propose 'misalignment augmentation' as a concept to increase CAD robustness. As the results demonstrate, misalignment augmentations can not only compensate for a complete lack of registration, but if used in conjunction with registration, also improve the overall performance on an independent test set.

Assuntos

Próstata , Neoplasias da Próstata , Masculino , Humanos , Próstata/diagnóstico por imagem , Próstata/patologia , Imageamento por Ressonância Magnética/métodos , Diagnóstico por Computador/métodos , Neoplasias da Próstata/diagnóstico por imagem , Neoplasias da Próstata/patologia , Computadores

7.

USE-Evaluator: Performance metrics for medical image segmentation models supervised by uncertain, small or empty reference annotations in neuroimaging.

Ostmeier, Sophie; Axelrod, Brian; Isensee, Fabian; Bertels, Jeroen; Mlynash, Michael; Christensen, Soren; Lansberg, Maarten G; Albers, Gregory W; Sheth, Rajen; Verhaaren, Benjamin F J; Mahammedi, Abdelkader; Li, Li-Jia; Zaharchuk, Greg; Heit, Jeremy J.

Med Image Anal ; 90: 102927, 2023 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-37672900

RESUMO

Performance metrics for medical image segmentation models are used to measure the agreement between the reference annotation and the predicted segmentation. Usually, overlap metrics, such as the Dice, are used as a metric to evaluate the performance of these models in order for results to be comparable. However, there is a mismatch between the distributions of cases and the difficulty level of segmentation tasks in public data sets compared to clinical practice. Common metrics used to assess performance fail to capture the impact of this mismatch, particularly when dealing with datasets in clinical settings that involve challenging segmentation tasks, pathologies with low signal, and reference annotations that are uncertain, small, or empty. Limitations of common metrics may result in ineffective machine learning research in designing and optimizing models. To effectively evaluate the clinical value of such models, it is essential to consider factors such as the uncertainty associated with reference annotations, the ability to accurately measure performance regardless of the size of the reference annotation volume, and the classification of cases where reference annotations are empty. We study how uncertain, small, and empty reference annotations influence the value of metrics on a stroke in-house data set regardless of the model. We examine metrics behavior on the predictions of a standard deep learning framework in order to identify suitable metrics in such a setting. We compare our results to the BRATS 2019 and Spinal Cord public data sets. We show how uncertain, small, or empty reference annotations require a rethinking of the evaluation. The evaluation code was released to encourage further analysis of this topic https://github.com/SophieOstmeier/UncertainSmallEmpty.git.

8.

Automated Brain Tumor Detection and Segmentation for Treatment Response Assessment Using Amino Acid PET.

Gutsche, Robin; Lowis, Carsten; Ziemons, Karl; Kocher, Martin; Ceccon, Garry; Régio Brambilla, Cláudia; Shah, Nadim J; Langen, Karl-Josef; Galldiks, Norbert; Isensee, Fabian; Lohmann, Philipp.

J Nucl Med ; 64(10): 1594-1602, 2023 10.

Artigo em Inglês | MEDLINE | ID: mdl-37562802

RESUMO

Evaluation of metabolic tumor volume (MTV) changes using amino acid PET has become an important tool for response assessment in brain tumor patients. MTV is usually determined by manual or semiautomatic delineation, which is laborious and may be prone to intra- and interobserver variability. The goal of our study was to develop a method for automated MTV segmentation and to evaluate its performance for response assessment in patients with gliomas. Methods: In total, 699 amino acid PET scans using the tracer O-(2-[18F]fluoroethyl)-l-tyrosine (18F-FET) from 555 brain tumor patients at initial diagnosis or during follow-up were retrospectively evaluated (mainly glioma patients, 76%). 18F-FET PET MTVs were segmented semiautomatically by experienced readers. An artificial neural network (no new U-Net) was configured on 476 scans from 399 patients, and the network performance was evaluated on a test dataset including 223 scans from 156 patients. Surface and volumetric Dice similarity coefficients (DSCs) were used to evaluate segmentation quality. Finally, the network was applied to a recently published 18F-FET PET study on response assessment in glioblastoma patients treated with adjuvant temozolomide chemotherapy for a fully automated response assessment in comparison to an experienced physician. Results: In the test dataset, 92% of lesions with increased uptake (n = 189) and 85% of lesions with iso- or hypometabolic uptake (n = 33) were correctly identified (F1 score, 92%). Single lesions with a contiguous uptake had the highest DSC, followed by lesions with heterogeneous, noncontiguous uptake and multifocal lesions (surface DSC: 0.96, 0.93, and 0.81 respectively; volume DSC: 0.83, 0.77, and 0.67, respectively). Change in MTV, as detected by the automated segmentation, was a significant determinant of disease-free and overall survival, in agreement with the physician's assessment. Conclusion: Our deep learning-based 18F-FET PET segmentation allows reliable, robust, and fully automated evaluation of MTV in brain tumor patients and demonstrates clinical value for automated response assessment.

Assuntos

Neoplasias Encefálicas , Glioma , Humanos , Aminoácidos , Estudos Retrospectivos , Neoplasias Encefálicas/diagnóstico por imagem , Neoplasias Encefálicas/terapia , Glioma/patologia , Compostos Radiofarmacêuticos/uso terapêutico , Tirosina , Tomografia por Emissão de Pósitrons/métodos

9.

AI-generated R.E.N.A.L.+ Score Surpasses Human-generated Score in Predicting Renal Oncologic Outcomes.

Abdallah, Nour; Wood, Andrew; Benidir, Tarik; Heller, Nicholas; Isensee, Fabian; Tejpaul, Resha; Corrigan, Dillon; Suk-Ouichai, Chalairat; Struyk, Griffin; Moore, Keenan; Venkatesh, Nitin; Ergun, Onuralp; You, Alex; Campbell, Rebecca; Remer, Erick M; Haywood, Samuel; Krishnamurthi, Venkatesh; Abouassaly, Robert; Campbell, Steven; Papanikolopoulos, Nikolaos; Weight, Christopher J.

Urology ; 180: 160-167, 2023 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-37517681

RESUMO

OBJECTIVE: To determine whether we can surpass the traditional R.E.N.A.L. nephrometry score (H-score) prediction ability of pathologic outcomes by creating artificial intelligence (AI)-generated R.E.N.A.L.+ score (AI+ score) with continuous rather than ordinal components. We also assessed the AI+ score components' relative importance with respect to outcome odds. METHODS: This is a retrospective study of 300 consecutive patients with preoperative computed tomography scans showing suspected renal cancer at a single institution from 2010 to 2018. H-score was tabulated by three trained medical personnel. Deep neural network approach automatically generated kidney segmentation masks of parenchyma and tumor. Geometric algorithms were used to automatically estimate score components as ordinal and continuous variables. Multivariate logistic regression of continuous R.E.N.A.L. components was used to generate AI+ score. Predictive utility was compared between AI+, AI, and H-scores for variables of interest, and AI+ score components' relative importance was assessed. RESULTS: Median age was 60years (interquartile range 51-68), and 40% were female. Median tumor size was 4.2 cm (2.6-6.12), and 92% were malignant, including 27%, 37%, and 23% with high-stage, high-grade, and necrosis, respectively. AI+ score demonstrated superior predictive ability over AI and H-scores for predicting malignancy (area under the curve [AUC] 0.69 vs 0.67 vs 0.64, respectively), high stage (AUC 0.82 vs 0.65 vs 0.71, respectively), high grade (AUC 0.78 vs 0.65 vs 0.65, respectively), pathologic tumor necrosis (AUC 0.81 vs 0.72 vs 0.74, respectively), and partial nephrectomy approach (AUC 0.88 vs 0.74 vs 0.79, respectively). Of AI+ score components, the maximal tumor diameter ("R") was the most important outcomes predictor. CONCLUSION: AI+ score was superior to AI-score and H-score in predicting oncologic outcomes. Time-efficient AI+ score can be used at the point of care, surpassing validated clinical scoring systems.

10.

The Cell Tracking Challenge: 10 years of objective benchmarking.

Maska, Martin; Ulman, Vladimír; Delgado-Rodriguez, Pablo; Gómez-de-Mariscal, Estibaliz; Necasová, Tereza; Guerrero Peña, Fidel A; Ren, Tsang Ing; Meyerowitz, Elliot M; Scherr, Tim; Löffler, Katharina; Mikut, Ralf; Guo, Tianqi; Wang, Yin; Allebach, Jan P; Bao, Rina; Al-Shakarji, Noor M; Rahmon, Gani; Toubal, Imad Eddine; Palaniappan, Kannappan; Lux, Filip; Matula, Petr; Sugawara, Ko; Magnusson, Klas E G; Aho, Layton; Cohen, Andrew R; Arbelle, Assaf; Ben-Haim, Tal; Raviv, Tammy Riklin; Isensee, Fabian; Jäger, Paul F; Maier-Hein, Klaus H; Zhu, Yanming; Ederra, Cristina; Urbiola, Ainhoa; Meijering, Erik; Cunha, Alexandre; Muñoz-Barrutia, Arrate; Kozubek, Michal; Ortiz-de-Solórzano, Carlos.

Nat Methods ; 20(7): 1010-1020, 2023 07.

Artigo em Inglês | MEDLINE | ID: mdl-37202537

RESUMO

The Cell Tracking Challenge is an ongoing benchmarking initiative that has become a reference in cell segmentation and tracking algorithm development. Here, we present a significant number of improvements introduced in the challenge since our 2017 report. These include the creation of a new segmentation-only benchmark, the enrichment of the dataset repository with new datasets that increase its diversity and complexity, and the creation of a silver standard reference corpus based on the most competitive results, which will be of particular interest for data-hungry deep learning-based strategies. Furthermore, we present the up-to-date cell segmentation and tracking leaderboards, an in-depth analysis of the relationship between the performance of the state-of-the-art methods and the properties of the datasets and annotations, and two novel, insightful studies about the generalizability and the reusability of top-performing methods. These studies provide critical practical conclusions for both developers and users of traditional and machine learning-based cell segmentation and tracking algorithms.

Assuntos

Benchmarking , Rastreamento de Células , Rastreamento de Células/métodos , Aprendizado de Máquina , Algoritmos

11.

Beyond rankings: Learning (more) from algorithm validation.

Roß, Tobias; Bruno, Pierangela; Reinke, Annika; Wiesenfarth, Manuel; Koeppel, Lisa; Full, Peter M; Pekdemir, Bünyamin; Godau, Patrick; Trofimova, Darya; Isensee, Fabian; Adler, Tim J; Tran, Thuy N; Moccia, Sara; Calimeri, Francesco; Müller-Stich, Beat P; Kopp-Schneider, Annette; Maier-Hein, Lena.

Med Image Anal ; 86: 102765, 2023 05.

Artigo em Inglês | MEDLINE | ID: mdl-36965252

RESUMO

Challenges have become the state-of-the-art approach to benchmark image analysis algorithms in a comparative manner. While the validation on identical data sets was a great step forward, results analysis is often restricted to pure ranking tables, leaving relevant questions unanswered. Specifically, little effort has been put into the systematic investigation on what characterizes images in which state-of-the-art algorithms fail. To address this gap in the literature, we (1) present a statistical framework for learning from challenges and (2) instantiate it for the specific task of instrument instance segmentation in laparoscopic videos. Our framework relies on the semantic meta data annotation of images, which serves as foundation for a General Linear Mixed Models (GLMM) analysis. Based on 51,542 meta data annotations performed on 2,728 images, we applied our approach to the results of the Robust Medical Instrument Segmentation Challenge (ROBUST-MIS) challenge 2019 and revealed underexposure, motion and occlusion of instruments as well as the presence of smoke or other objects in the background as major sources of algorithm failure. Our subsequent method development, tailored to the specific remaining issues, yielded a deep learning model with state-of-the-art overall performance and specific strengths in the processing of images in which previous methods tended to fail. Due to the objectivity and generic applicability of our approach, it could become a valuable tool for validation in the field of medical image analysis and beyond.

Assuntos

Algoritmos , Laparoscopia , Humanos , Processamento de Imagem Assistida por Computador/métodos

12.

Artificial intelligence (AI)-based decision support improves reproducibility of tumor response assessment in neuro-oncology: An international multi-reader study.

Vollmuth, Philipp; Foltyn, Martha; Huang, Raymond Y; Galldiks, Norbert; Petersen, Jens; Isensee, Fabian; van den Bent, Martin J; Barkhof, Frederik; Park, Ji Eun; Park, Yae Won; Ahn, Sung Soo; Brugnara, Gianluca; Meredig, Hagen; Jain, Rajan; Smits, Marion; Pope, Whitney B; Maier-Hein, Klaus; Weller, Michael; Wen, Patrick Y; Wick, Wolfgang; Bendszus, Martin.

Neuro Oncol ; 25(3): 533-543, 2023 03 14.

Artigo em Inglês | MEDLINE | ID: mdl-35917833

RESUMO

BACKGROUND: To assess whether artificial intelligence (AI)-based decision support allows more reproducible and standardized assessment of treatment response on MRI in neuro-oncology as compared to manual 2-dimensional measurements of tumor burden using the Response Assessment in Neuro-Oncology (RANO) criteria. METHODS: A series of 30 patients (15 lower-grade gliomas, 15 glioblastoma) with availability of consecutive MRI scans was selected. The time to progression (TTP) on MRI was separately evaluated for each patient by 15 investigators over two rounds. In the first round the TTP was evaluated based on the RANO criteria, whereas in the second round the TTP was evaluated by incorporating additional information from AI-enhanced MRI sequences depicting the longitudinal changes in tumor volumes. The agreement of the TTP measurements between investigators was evaluated using concordance correlation coefficients (CCC) with confidence intervals (CI) and P-values obtained using bootstrap resampling. RESULTS: The CCC of TTP-measurements between investigators was 0.77 (95% CI = 0.69,0.88) with RANO alone and increased to 0.91 (95% CI = 0.82,0.95) with AI-based decision support (P = .005). This effect was significantly greater (P = .008) for patients with lower-grade gliomas (CCC = 0.70 [95% CI = 0.56,0.85] without vs. 0.90 [95% CI = 0.76,0.95] with AI-based decision support) as compared to glioblastoma (CCC = 0.83 [95% CI = 0.75,0.92] without vs. 0.86 [95% CI = 0.78,0.93] with AI-based decision support). Investigators with less years of experience judged the AI-based decision as more helpful (P = .02). CONCLUSIONS: AI-based decision support has the potential to yield more reproducible and standardized assessment of treatment response in neuro-oncology as compared to manual 2-dimensional measurements of tumor burden, particularly in patients with lower-grade gliomas. A fully-functional version of this AI-based processing pipeline is provided as open-source (https://github.com/NeuroAI-HD/HD-GLIO-XNAT).

Assuntos

Neoplasias Encefálicas , Glioblastoma , Glioma , Humanos , Glioblastoma/patologia , Neoplasias Encefálicas/diagnóstico por imagem , Neoplasias Encefálicas/terapia , Neoplasias Encefálicas/patologia , Inteligência Artificial , Reprodutibilidade dos Testes , Glioma/diagnóstico por imagem , Glioma/terapia , Glioma/patologia

13.

Automated detection and quantification of brain metastases on clinical MRI data using artificial neural networks.

Pflüger, Irada; Wald, Tassilo; Isensee, Fabian; Schell, Marianne; Meredig, Hagen; Schlamp, Kai; Bernhardt, Denise; Brugnara, Gianluca; Heußel, Claus Peter; Debus, Juergen; Wick, Wolfgang; Bendszus, Martin; Maier-Hein, Klaus H; Vollmuth, Philipp.

Neurooncol Adv ; 4(1): vdac138, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-36105388

RESUMO

Background: Reliable detection and precise volumetric quantification of brain metastases (BM) on MRI are essential for guiding treatment decisions. Here we evaluate the potential of artificial neural networks (ANN) for automated detection and quantification of BM. Methods: A consecutive series of 308 patients with BM was used for developing an ANN (with a 4:1 split for training/testing) for automated volumetric assessment of contrast-enhancing tumors (CE) and non-enhancing FLAIR signal abnormality including edema (NEE). An independent consecutive series of 30 patients was used for external testing. Performance was assessed case-wise for CE and NEE and lesion-wise for CE using the case-wise/lesion-wise DICE-coefficient (C/L-DICE), positive predictive value (L-PPV) and sensitivity (C/L-Sensitivity). Results: The performance of detecting CE lesions on the validation dataset was not significantly affected when evaluating different volumetric thresholds (0.001-0.2 cm3; P = .2028). The median L-DICE and median C-DICE for CE lesions were 0.78 (IQR = 0.6-0.91) and 0.90 (IQR = 0.85-0.94) in the institutional as well as 0.79 (IQR = 0.67-0.82) and 0.84 (IQR = 0.76-0.89) in the external test dataset. The corresponding median L-Sensitivity and median L-PPV were 0.81 (IQR = 0.63-0.92) and 0.79 (IQR = 0.63-0.93) in the institutional test dataset, as compared to 0.85 (IQR = 0.76-0.94) and 0.76 (IQR = 0.68-0.88) in the external test dataset. The median C-DICE for NEE was 0.96 (IQR = 0.92-0.97) in the institutional test dataset as compared to 0.85 (IQR = 0.72-0.91) in the external test dataset. Conclusion: The developed ANN-based algorithm (publicly available at www.github.com/NeuroAI-HD/HD-BM) allows reliable detection and precise volumetric quantification of CE and NEE compartments in patients with BM.

14.

Rapid artificial intelligence solutions in a pandemic-The COVID-19-20 Lung CT Lesion Segmentation Challenge.

Roth, Holger R; Xu, Ziyue; Tor-Díez, Carlos; Sanchez Jacob, Ramon; Zember, Jonathan; Molto, Jose; Li, Wenqi; Xu, Sheng; Turkbey, Baris; Turkbey, Evrim; Yang, Dong; Harouni, Ahmed; Rieke, Nicola; Hu, Shishuai; Isensee, Fabian; Tang, Claire; Yu, Qinji; Sölter, Jan; Zheng, Tong; Liauchuk, Vitali; Zhou, Ziqi; Moltz, Jan Hendrik; Oliveira, Bruno; Xia, Yong; Maier-Hein, Klaus H; Li, Qikai; Husch, Andreas; Zhang, Luyang; Kovalev, Vassili; Kang, Li; Hering, Alessa; Vilaça, João L; Flores, Mona; Xu, Daguang; Wood, Bradford; Linguraru, Marius George.

Med Image Anal ; 82: 102605, 2022 11.

Artigo em Inglês | MEDLINE | ID: mdl-36156419

RESUMO

Artificial intelligence (AI) methods for the automatic detection and quantification of COVID-19 lesions in chest computed tomography (CT) might play an important role in the monitoring and management of the disease. We organized an international challenge and competition for the development and comparison of AI algorithms for this task, which we supported with public data and state-of-the-art benchmark methods. Board Certified Radiologists annotated 295 public images from two sources (A and B) for algorithms training (n=199, source A), validation (n=50, source A) and testing (n=23, source A; n=23, source B). There were 1,096 registered teams of which 225 and 98 completed the validation and testing phases, respectively. The challenge showed that AI models could be rapidly designed by diverse teams with the potential to measure disease or facilitate timely and patient-specific interventions. This paper provides an overview and the major outcomes of the COVID-19 Lung CT Lesion Segmentation Challenge - 2020.

Assuntos

COVID-19 , Pandemias , Humanos , COVID-19/diagnóstico por imagem , Inteligência Artificial , Tomografia Computadorizada por Raios X/métodos , Pulmão/diagnóstico por imagem

15.

The Medical Segmentation Decathlon.

Antonelli, Michela; Reinke, Annika; Bakas, Spyridon; Farahani, Keyvan; Kopp-Schneider, Annette; Landman, Bennett A; Litjens, Geert; Menze, Bjoern; Ronneberger, Olaf; Summers, Ronald M; van Ginneken, Bram; Bilello, Michel; Bilic, Patrick; Christ, Patrick F; Do, Richard K G; Gollub, Marc J; Heckers, Stephan H; Huisman, Henkjan; Jarnagin, William R; McHugo, Maureen K; Napel, Sandy; Pernicka, Jennifer S Golia; Rhode, Kawal; Tobon-Gomez, Catalina; Vorontsov, Eugene; Meakin, James A; Ourselin, Sebastien; Wiesenfarth, Manuel; Arbeláez, Pablo; Bae, Byeonguk; Chen, Sihong; Daza, Laura; Feng, Jianjiang; He, Baochun; Isensee, Fabian; Ji, Yuanfeng; Jia, Fucang; Kim, Ildoo; Maier-Hein, Klaus; Merhof, Dorit; Pai, Akshay; Park, Beomhee; Perslev, Mathias; Rezaiifar, Ramin; Rippel, Oliver; Sarasua, Ignacio; Shen, Wei; Son, Jaemin; Wachinger, Christian; Wang, Liansheng.

Nat Commun ; 13(1): 4128, 2022 07 15.

Artigo em Inglês | MEDLINE | ID: mdl-35840566

RESUMO

International challenges have become the de facto standard for comparative assessment of image analysis algorithms. Although segmentation is the most widely investigated medical image processing task, the various challenges have been organized to focus only on specific clinical tasks. We organized the Medical Segmentation Decathlon (MSD)-a biomedical image analysis challenge, in which algorithms compete in a multitude of both tasks and modalities to investigate the hypothesis that a method capable of performing well on multiple tasks will generalize well to a previously unseen task and potentially outperform a custom-designed solution. MSD results confirmed this hypothesis, moreover, MSD winner continued generalizing well to a wide range of other clinical problems for the next two years. Three main conclusions can be drawn from this study: (1) state-of-the-art image segmentation algorithms generalize well when retrained on unseen tasks; (2) consistent algorithmic performance across multiple tasks is a strong surrogate of algorithmic generalizability; (3) the training of accurate AI segmentation models is now commoditized to scientists that are not versed in AI model training.

Assuntos

Algoritmos , Processamento de Imagem Assistida por Computador , Processamento de Imagem Assistida por Computador/métodos

16.

Classification of diffraction patterns using a convolutional neural network in single-particle-imaging experiments performed at X-ray free-electron lasers.

Assalauova, Dameli; Ignatenko, Alexandr; Isensee, Fabian; Trofimova, Darya; Vartanyants, Ivan A.

J Appl Crystallogr ; 55(Pt 3): 444-454, 2022 Jun 01.

Artigo em Inglês | MEDLINE | ID: mdl-35719305

RESUMO

Single particle imaging (SPI) at X-ray free-electron lasers is particularly well suited to determining the 3D structure of particles at room temperature. For a successful reconstruction, diffraction patterns originating from a single hit must be isolated from a large number of acquired patterns. It is proposed that this task could be formulated as an image-classification problem and solved using convolutional neural network (CNN) architectures. Two CNN configurations are developed: one that maximizes the F1 score and one that emphasizes high recall. The CNNs are also combined with expectation-maximization (EM) selection as well as size filtering. It is observed that the CNN selections have lower contrast in power spectral density functions relative to the EM selection used in previous work. However, the reconstruction of the CNN-based selections gives similar results. Introducing CNNs into SPI experiments allows the reconstruction pipeline to be streamlined, enables researchers to classify patterns on the fly, and, as a consequence, enables them to tightly control the duration of their experiments. Incorporating non-standard artificial-intelligence-based solutions into an existing SPI analysis workflow may be beneficial for the future development of SPI experiments.

17.

MOOD 2020: A Public Benchmark for Out-of-Distribution Detection and Localization on Medical Images.

Zimmerer, David; Full, Peter M; Isensee, Fabian; Jager, Paul; Adler, Tim; Petersen, Jens; Kohler, Gregor; Ross, Tobias; Reinke, Annika; Kascenas, Antanas; Jensen, Bjorn Sand; O'Neil, Alison Q; Tan, Jeremy; Hou, Benjamin; Batten, James; Qiu, Huaqi; Kainz, Bernhard; Shvetsova, Nina; Fedulova, Irina; Dylov, Dmitry V; Yu, Baolun; Zhai, Jianyang; Hu, Jingtao; Si, Runxuan; Zhou, Sihang; Wang, Siqi; Li, Xinyang; Chen, Xuerun; Zhao, Yang; Marimont, Sergio Naval; Tarroni, Giacomo; Saase, Victor; Maier-Hein, Lena; Maier-Hein, Klaus.

IEEE Trans Med Imaging ; 41(10): 2728-2738, 2022 10.

Artigo em Inglês | MEDLINE | ID: mdl-35468060

RESUMO

Detecting Out-of-Distribution (OoD) data is one of the greatest challenges in safe and robust deployment of machine learning algorithms in medicine. When the algorithms encounter cases that deviate from the distribution of the training data, they often produce incorrect and over-confident predictions. OoD detection algorithms aim to catch erroneous predictions in advance by analysing the data distribution and detecting potential instances of failure. Moreover, flagging OoD cases may support human readers in identifying incidental findings. Due to the increased interest in OoD algorithms, benchmarks for different domains have recently been established. In the medical imaging domain, for which reliable predictions are often essential, an open benchmark has been missing. We introduce the Medical-Out-Of-Distribution-Analysis-Challenge (MOOD) as an open, fair, and unbiased benchmark for OoD methods in the medical imaging domain. The analysis of the submitted algorithms shows that performance has a strong positive correlation with the perceived difficulty, and that all algorithms show a high variance for different anomalies, making it yet hard to recommend them for clinical practice. We also see a strong correlation between challenge ranking and performance on a simple toy test set, indicating that this might be a valuable addition as a proxy dataset during anomaly detection algorithm development.

Assuntos

Benchmarking , Aprendizado de Máquina , Algoritmos , Humanos

18.

Semantic segmentation of multispectral photoacoustic images using deep learning.

Schellenberg, Melanie; Dreher, Kris K; Holzwarth, Niklas; Isensee, Fabian; Reinke, Annika; Schreck, Nicholas; Seitel, Alexander; Tizabi, Minu D; Maier-Hein, Lena; Gröhl, Janek.

Photoacoustics ; 26: 100341, 2022 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-35371919

RESUMO

Photoacoustic (PA) imaging has the potential to revolutionize functional medical imaging in healthcare due to the valuable information on tissue physiology contained in multispectral photoacoustic measurements. Clinical translation of the technology requires conversion of the high-dimensional acquired data into clinically relevant and interpretable information. In this work, we present a deep learning-based approach to semantic segmentation of multispectral photoacoustic images to facilitate image interpretability. Manually annotated photoacoustic and ultrasound imaging data are used as reference and enable the training of a deep learning-based segmentation algorithm in a supervised manner. Based on a validation study with experimentally acquired data from 16 healthy human volunteers, we show that automatic tissue segmentation can be used to create powerful analyses and visualizations of multispectral photoacoustic images. Due to the intuitive representation of high-dimensional information, such a preprocessing algorithm could be a valuable means to facilitate the clinical translation of photoacoustic imaging.

19.

Band selection for oxygenation estimation with multispectral/hyperspectral imaging.

Ayala, Leonardo; Isensee, Fabian; Wirkert, Sebastian J; Vemuri, Anant S; Maier-Hein, Klaus H; Fei, Baowei; Maier-Hein, Lena.

Biomed Opt Express ; 13(3): 1224-1242, 2022 Mar 01.

Artigo em Inglês | MEDLINE | ID: mdl-35414995

RESUMO

Multispectral imaging provides valuable information on tissue composition such as hemoglobin oxygen saturation. However, the real-time application of this technique in interventional medicine can be challenging due to the long acquisition times needed for large amounts of hyperspectral data with hundreds of bands. While this challenge can partially be addressed by choosing a discriminative subset of bands, the band selection methods proposed to date are mainly restricted by the availability of often hard to obtain reference measurements. We address this bottleneck with a new approach to band selection that leverages highly accurate Monte Carlo (MC) simulations. We hypothesize that a so chosen small subset of bands can reproduce or even improve upon the results of a quasi continuous spectral measurement. We further investigate whether novel domain adaptation techniques can address the inevitable domain shift stemming from the use of simulations. Initial results based on in silico and in vivo experiments suggest that 10-20 bands are sufficient to closely reproduce results from spectral measurements with 101 bands in the 500-700 nm range. The investigated domain adaptation technique, which only requires unlabeled in vivo measurements, yielded better results than the pure in silico band selection method. Overall, our method could guide development of fast multispectral imaging systems suited for interventional use without relying on complex hardware setups or manually labeled data.

20.

Deep-learning-based synthesis of post-contrast T1-weighted MRI for tumour response assessment in neuro-oncology: a multicentre, retrospective cohort study.

Jayachandran Preetha, Chandrakanth; Meredig, Hagen; Brugnara, Gianluca; Mahmutoglu, Mustafa A; Foltyn, Martha; Isensee, Fabian; Kessler, Tobias; Pflüger, Irada; Schell, Marianne; Neuberger, Ulf; Petersen, Jens; Wick, Antje; Heiland, Sabine; Debus, Jürgen; Platten, Michael; Idbaih, Ahmed; Brandes, Alba A; Winkler, Frank; van den Bent, Martin J; Nabors, Burt; Stupp, Roger; Maier-Hein, Klaus H; Gorlia, Thierry; Tonn, Jörg-Christian; Weller, Michael; Wick, Wolfgang; Bendszus, Martin; Vollmuth, Philipp.

Lancet Digit Health ; 3(12): e784-e794, 2021 12.

Artigo em Inglês | MEDLINE | ID: mdl-34688602

RESUMO

BACKGROUND: Gadolinium-based contrast agents (GBCAs) are widely used to enhance tissue contrast during MRI scans and play a crucial role in the management of patients with cancer. However, studies have shown gadolinium deposition in the brain after repeated GBCA administration with yet unknown clinical significance. We aimed to assess the feasibility and diagnostic value of synthetic post-contrast T1-weighted MRI generated from pre-contrast MRI sequences through deep convolutional neural networks (dCNN) for tumour response assessment in neuro-oncology. METHODS: In this multicentre, retrospective cohort study, we used MRI examinations to train and validate a dCNN for synthesising post-contrast T1-weighted sequences from pre-contrast T1-weighted, T2-weighted, and fluid-attenuated inversion recovery sequences. We used MRI scans with availability of these sequences from 775 patients with glioblastoma treated at Heidelberg University Hospital, Heidelberg, Germany (775 MRI examinations); 260 patients who participated in the phase 2 CORE trial (1083 MRI examinations, 59 institutions); and 505 patients who participated in the phase 3 CENTRIC trial (3147 MRI examinations, 149 institutions). Separate training runs to rank the importance of individual sequences and (for a subset) diffusion-weighted imaging were conducted. Independent testing was performed on MRI data from the phase 2 and phase 3 EORTC-26101 trial (521 patients, 1924 MRI examinations, 32 institutions). The similarity between synthetic and true contrast enhancement on post-contrast T1-weighted MRI was quantified using the structural similarity index measure (SSIM). Automated tumour segmentation and volumetric tumour response assessment based on synthetic versus true post-contrast T1-weighted sequences was performed in the EORTC-26101 trial and agreement was assessed with Kaplan-Meier plots. FINDINGS: The median SSIM score for predicting contrast enhancement on synthetic post-contrast T1-weighted sequences in the EORTC-26101 test set was 0·818 (95% CI 0·817-0·820). Segmentation of the contrast-enhancing tumour from synthetic post-contrast T1-weighted sequences yielded a median tumour volume of 6·31 cm3 (5·60 to 7·14), thereby underestimating the true tumour volume by a median of -0·48 cm3 (-0·37 to -0·76) with the concordance correlation coefficient suggesting a strong linear association between tumour volumes derived from synthetic versus true post-contrast T1-weighted sequences (0·782, 0·751-0·807, p<0·0001). Volumetric tumour response assessment in the EORTC-26101 trial showed a median time to progression of 4·2 months (95% CI 4·1-5·2) with synthetic post-contrast T1-weighted and 4·3 months (4·1-5·5) with true post-contrast T1-weighted sequences (p=0·33). The strength of the association between the time to progression as a surrogate endpoint for predicting the patients' overall survival in the EORTC-26101 cohort was similar when derived from synthetic post-contrast T1-weighted sequences (hazard ratio of 1·749, 95% CI 1·282-2·387, p=0·0004) and model C-index (0·667, 0·622-0·708) versus true post-contrast T1-weighted MRI (1·799, 95% CI 1·314-2·464, p=0·0003) and model C-index (0·673, 95% CI 0·626-0·711). INTERPRETATION: Generating synthetic post-contrast T1-weighted MRI from pre-contrast MRI using dCNN is feasible and quantification of the contrast-enhancing tumour burden from synthetic post-contrast T1-weighted MRI allows assessment of the patient's response to treatment with no significant difference by comparison with true post-contrast T1-weighted sequences with administration of GBCAs. This finding could guide the application of dCNN in radiology to potentially reduce the necessity of GBCA administration. FUNDING: Deutsche Forschungsgemeinschaft.

Assuntos

Neoplasias Encefálicas/diagnóstico , Encéfalo/patologia , Meios de Contraste/administração & dosagem , Aprendizado Profundo , Gadolínio/administração & dosagem , Imageamento por Ressonância Magnética/métodos , Redes Neurais de Computação , Algoritmos , Neoplasias Encefálicas/diagnóstico por imagem , Neoplasias Encefálicas/patologia , Imagem de Difusão por Ressonância Magnética , Progressão da Doença , Estudos de Viabilidade , Alemanha , Glioblastoma/diagnóstico , Glioblastoma/diagnóstico por imagem , Humanos , Pessoa de Meia-Idade , Neoplasias , Prognóstico , Radiologia/métodos , Estudos Retrospectivos , Carga Tumoral

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA