1.
Nat Methods ; 21(2): 182-194, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347140

ABSTRACT

Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.


Subject(s)
Artificial Intelligence
2.
Nat Methods ; 21(2): 195-212, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347141

ABSTRACT

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint, a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.


Subject(s)
Algorithms , Image Processing, Computer-Assisted , Machine Learning , Semantics
3.
ArXiv ; 2024 Feb 23.
Article in English | MEDLINE | ID: mdl-36945687

ABSTRACT

Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.

4.
Transl Lung Cancer Res ; 11(9): 1896-1911, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36248328

ABSTRACT

Background: Lung cancer screening may provide a favorable opportunity for a spirometry examination, to diagnose participants with undiagnosed lung function impairments, or to improve targeting of computed tomography (CT) screening intensity in view of expected net benefit. Methods: Spirometry was performed in the CT screening arm (n=2,029) of the German Lung Cancer Screening Intervention Study (LUSI), a trial examining the effects of annual CT screening on lung cancer mortality in 50-69-year-old long-term smokers. Participants were classified as having chronic obstructive pulmonary disease (COPD) [forced expiratory volume in one second (FEV1)/forced vital capacity (FVC) <0.7], preserved ratio impaired spirometry (PRISm; FEV1/FVC ≥0.7 and FEV1% predicted <80%), or normal spirometry. Descriptive statistics were used to examine associations of COPD or PRISm with respiratory symptoms, and self-reported medical diagnoses of respiratory and other morbidities. Logistic regression and proportional hazards regression were used to examine associations of COPD and PRISm, as well as their self-reported medical diagnoses, with risks of lung cancer and all-cause mortality. Results: A total of 1,987 screening arm participants (98%) provided interpretable spirometry measurements; of these, 34.3% had spirometric patterns consistent with either COPD (18.6%) or PRISm (15.7%). Two thirds of participants with COPD or PRISm were asymptomatic, and only 23% reported a previous medical diagnosis concordant with COPD. Participants reporting a diagnosis tended to be more often current and heavier smokers, and more often had respiratory symptoms, cardiovascular comorbidities, or more severe lung function impairments. Independently of smoking history, moderate-to-severe (GOLD 2-4) COPD (OR =2.14; 95% CI: 1.54-2.98), and PRISm (OR =2.68; 95% CI: 1.61-4.40), were associated with increased lung cancer risk. Lung cancer patients with PRISm less frequently had adenocarcinomas, and more often squamous cell or small cell tumors, compared to those with normal spirometry (n=45), and both PRISm and COPD were associated with more advanced lung cancer tumor stage for screen-detected cancers. PRISm and COPD, depending on GOLD stage, were also associated with about 2- to 4-fold increases in risk of overall mortality; 87 percent of deaths had causes other than lung cancer. Conclusions: About one third of smokers eligible for lung cancer screening in Germany have COPD or PRISm. As these conditions were associated with detection of lung cancer, spirometry may help identify populations at high risk of death from lung cancer or other causes, and who might particularly benefit from CT screening.
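
The spirometric classification described in the Methods can be sketched as a small function. This is only an illustration of the thresholds quoted in the abstract (FEV1/FVC <0.7 for COPD; FEV1/FVC ≥0.7 with FEV1% predicted <80% for PRISm); the function and parameter names are hypothetical and are not taken from the study.

```python
def classify_spirometry(fev1_fvc_ratio: float, fev1_percent_predicted: float) -> str:
    """Return 'COPD', 'PRISm', or 'normal' using the thresholds quoted in the abstract.

    fev1_fvc_ratio: FEV1 divided by FVC (e.g. 0.68).
    fev1_percent_predicted: FEV1 as a percentage of the predicted value (e.g. 75.0).
    """
    # Airflow obstruction: ratio below the fixed 0.7 cut-off.
    if fev1_fvc_ratio < 0.7:
        return "COPD"
    # Preserved ratio (>= 0.7) but reduced FEV1: PRISm.
    if fev1_percent_predicted < 80.0:
        return "PRISm"
    return "normal"


if __name__ == "__main__":
    # Hypothetical participants, for illustration only.
    for ratio, pct in [(0.65, 90.0), (0.75, 70.0), (0.75, 85.0)]:
        print(ratio, pct, classify_spirometry(ratio, pct))
```

GOLD staging within the COPD group is not reproduced here because the abstract does not state the stage boundaries it used.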

5.
Int J Cancer ; 151(9): 1491-1501, 2022 11 01.
Article in English | MEDLINE | ID: mdl-35809038

ABSTRACT

We aimed to explore the underlying reasons that estimates of overdiagnosis vary across and within low-dose computed tomography (LDCT) lung cancer screening trials. We conducted a systematic review to identify estimates of overdiagnosis from randomised controlled trials of LDCT screening. We then analysed the association of Ps (the excess incidence of lung cancer as a proportion of screen-detected cases) with postscreening follow-up time using a linear random effects meta-regression model. Separately, we analysed annual Ps estimates from the US National Lung Screening Trial (NLST) and German Lung Cancer Screening Intervention Trial (LUSI) using exponential decay models with asymptotes. We conducted stratified analyses to investigate participant characteristics associated with Ps using the extended follow-up data from NLST. Among 12 overdiagnosis estimates from 8 trials, the postscreening follow-up ranged from 3.8 to 9.3 years, and Ps ranged from -27.0% (ITALUNG, 8.3 years follow-up) to 67.2% (DLCST, 5.0 years follow-up). Across trials, 39.1% of the variation in Ps was explained by postscreening follow-up time. The annual changes in Ps were -3.5% and -3.9% in the NLST and LUSI trials, respectively. Ps was predicted to plateau at 2.2% for NLST and 9.2% for LUSI with hypothetical infinite follow-up. In NLST, Ps increased with age from -14.9% (55-59 years) to 21.7% (70-74 years), and time trends in Ps varied by histological type. The findings suggest that differences in postscreening follow-up time partially explain variation in overdiagnosis estimates across lung cancer screening trials. Estimates of overdiagnosis should be interpreted in the context of postscreening follow-up and population characteristics.
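
As a rough sketch of the two quantities discussed above: Ps is the excess lung cancer incidence in the screening arm expressed as a proportion of screen-detected cases, and an exponential decay model with an asymptote describes how Ps falls toward a plateau as follow-up lengthens. The exact parameterization the authors fitted is not given in the abstract; the form below (plateau plus a decaying term), the assumption of equally sized trial arms, and all names are illustrative assumptions.

```python
import math


def excess_incidence_proportion(cases_screening_arm: int,
                                cases_control_arm: int,
                                screen_detected_cases: int) -> float:
    """Ps: excess cases in the screening arm per screen-detected case.

    Assumes equally sized arms; otherwise the counts would need rescaling first.
    """
    return (cases_screening_arm - cases_control_arm) / screen_detected_cases


def ps_decay(t_years: float, plateau: float, initial_excess: float, rate: float) -> float:
    """Assumed decay form: Ps(t) = plateau + initial_excess * exp(-rate * t)."""
    return plateau + initial_excess * math.exp(-rate * t_years)


if __name__ == "__main__":
    # Hypothetical counts, not trial data.
    print(excess_incidence_proportion(120, 100, 80))
    # With long follow-up the assumed curve approaches its plateau.
    print(ps_decay(50.0, plateau=2.2, initial_excess=10.0, rate=0.5))
```

Under this form the `plateau` parameter plays the role of the predicted Ps at hypothetical infinite follow-up.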


Subject(s)
Early Detection of Cancer , Lung Neoplasms , Early Detection of Cancer/methods , Follow-Up Studies , Humans , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/epidemiology , Mass Screening/methods , Middle Aged , Overdiagnosis
6.
Acta Obstet Gynecol Scand ; 101(1): 46-55, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34817062

ABSTRACT

INTRODUCTION: There is no global agreement on how best to determine the viability and location of a pregnancy of unknown location using biomarkers. Measurements of progesterone and β human chorionic gonadotropin (βhCG) are still used in clinical practice to exclude the possibility of a viable intrauterine pregnancy (VIUP). We evaluate the predictive value of progesterone, βhCG, and βhCG ratio cut-off levels to exclude a VIUP in women with a pregnancy of unknown location. MATERIAL AND METHODS: This was a secondary analysis of prospective multicenter study data of consecutive women with a pregnancy of unknown location between January 2015 and 2017 collected from dedicated early pregnancy assessment units of eight hospitals. Single progesterone and serial βhCG measurements were taken. Women were followed up until final pregnancy outcome between 11 and 14 weeks of gestation was confirmed using transvaginal ultrasonography: (1) VIUP, (2) non-viable intrauterine pregnancy or failed pregnancy of unknown location, and (3) ectopic pregnancy or persisting pregnancy of unknown location. The predictive value of cut-off levels for ruling out VIUP was evaluated across a range of values likely to be encountered clinically for progesterone, βhCG, and βhCG ratio. RESULTS: Data from 2507 of 3272 (76.6%) women were suitable for analysis. All had data for βhCG levels, 2248 (89.7%) had progesterone levels, and 1809 (72.2%) had βhCG ratio. The likelihood of viability falls with the progesterone level. Although the median progesterone level associated with viability was 59 nmol/L, VIUP were identified with levels as low as 5 nmol/L. No single βhCG cut-off reliably ruled out the presence of viability with certainty: even when the level was more than 3000 IU/L, 39/358 (11%) women had a VIUP. The probability of viability decreases with the βhCG ratio. Although the median βhCG ratio associated with viability was 2.26, VIUP were identified with ratios as low as 1.02. A progesterone level below 2 nmol/L and βhCG ratio below 0.87 were unlikely to be associated with viability but were not definitive when considering multiple imputation. CONCLUSIONS: Cut-off levels for βhCG, βhCG ratio, and progesterone are not safe to be used clinically to exclude viability in early pregnancy. Although βhCG ratio and progesterone have slightly better performance in comparison, single βhCG used in this manner is highly unreliable.


Subject(s)
Pregnancy, Ectopic/diagnosis , Prenatal Diagnosis , Adult , Chorionic Gonadotropin/metabolism , Chorionic Gonadotropin, beta Subunit, Human/metabolism , Cohort Studies , Female , Humans , London , Predictive Value of Tests , Pregnancy , Pregnancy, Ectopic/blood , Progesterone/metabolism , Prospective Studies , State Medicine
8.
Maedica (Bucur) ; 16(3): 531-533, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34925614

ABSTRACT

The button sequestrum sign is seen in a number of medical conditions and refers to a focus of devascularised bone surrounded by lucency. Although it may be difficult to arrive at a single diagnosis based on this sign alone, the combination of clinical and paraclinical findings, the patient's medical history and the imaging presentation of this sign can provide high specificity for chronic osteomyelitis, even when osteomyelitis is accompanied by osteopetrosis, as in the present case.

9.
Maedica (Bucur) ; 16(2): 318-319, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34621359

ABSTRACT

Enostoses, also known as bone islands, are common benign sclerotic bone lesions that usually represent incidental findings. They constitute a small focus of compact bone within cancellous bone. Enostoses can be seen on radiographs, CT, and MRI, and are considered one of the skeletal "do not touch" lesions.

10.
Maedica (Bucur) ; 16(2): 325-327, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34621361

ABSTRACT

The current paper attempts to characterize the imaging manifestations, in combination with the clinical presentation, of sacrococcygeal chordoma in a patient with referred back pain. It also describes the steps leading to the final diagnosis and, in doing so, demonstrates the crucial role of magnetic resonance imaging, computed tomography-guided biopsy and histopathological examination in narrowing the differential diagnosis and reaching the correct diagnosis.

11.
Diagn Progn Res ; 5(1): 6, 2021 Mar 22.
Article in English | MEDLINE | ID: mdl-33745449

ABSTRACT

BACKGROUND: We suggest an adaptive sample size calculation method for developing clinical prediction models, in which model performance is monitored sequentially as new data come in. METHODS: We illustrate the approach using data for the diagnosis of ovarian cancer (n = 5914, 33% event fraction) and obstructive coronary artery disease (CAD; n = 4888, 44% event fraction). We used logistic regression to develop a prediction model consisting only of a priori selected predictors and assumed linear relations for continuous predictors. We mimicked prospective patient recruitment by developing the model on 100 randomly selected patients, and we used bootstrapping to internally validate the model. We sequentially added 50 random new patients until we reached a sample size of 3000 and re-estimated model performance at each step. We examined the required sample size for satisfying the following stopping rule: obtaining a calibration slope ≥ 0.9 and optimism in the c-statistic (or AUC) ≤ 0.02 at two consecutive sample sizes. This procedure was repeated 500 times. We also investigated the impact of alternative modeling strategies: modeling nonlinear relations for continuous predictors and correcting for bias on the model estimates (Firth's correction). RESULTS: Better discrimination was achieved in the ovarian cancer data (c-statistic 0.9 with 7 predictors) than in the CAD data (c-statistic 0.7 with 11 predictors). Adequate calibration and limited optimism in discrimination were achieved after a median of 450 patients (interquartile range 450-500) for the ovarian cancer data (22 events per parameter (EPP), 20-24) and 850 patients (750-900) for the CAD data (33 EPP, 30-35). A stricter criterion, requiring AUC optimism ≤ 0.01, was met with a median of 500 (23 EPP) and 1500 (59 EPP) patients, respectively. These sample sizes were much higher than the well-known 10 EPP rule of thumb and slightly higher than a recently published fixed sample size calculation method by Riley et al. Higher sample sizes were required when nonlinear relationships were modeled, and lower sample sizes when Firth's correction was used. CONCLUSIONS: Adaptive sample size determination can be a useful supplement to fixed a priori sample size calculations, because it allows the sample size to be tailored to the specific prediction modeling context in a dynamic fashion.
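
The stopping rule described in the Methods (calibration slope ≥ 0.9 and c-statistic optimism ≤ 0.02 at two consecutive sample sizes) can be sketched as follows. This is a minimal illustration of the rule only: it assumes the slope and optimism at each step have already been estimated (for example by bootstrapping), and the function name, argument layout, and the choice to report the second of the two passing sample sizes are assumptions, not the authors' implementation.

```python
from typing import Iterable, Optional, Tuple


def first_stopping_sample_size(
    steps: Iterable[Tuple[int, float, float]],
    min_slope: float = 0.9,
    max_optimism: float = 0.02,
) -> Optional[int]:
    """Return the sample size at which the rule is first met twice in a row.

    steps: (sample_size, calibration_slope, auc_optimism) in increasing
    sample-size order. Returns None if the rule is never satisfied.
    """
    previous_ok = False
    for n, slope, optimism in steps:
        ok = slope >= min_slope and optimism <= max_optimism
        if ok and previous_ok:
            return n  # second consecutive passing sample size
        previous_ok = ok
    return None


if __name__ == "__main__":
    # Hypothetical monitoring results, for illustration only.
    monitoring = [
        (100, 0.70, 0.050),
        (150, 0.91, 0.020),
        (200, 0.88, 0.015),  # slope dips below 0.9, so the streak resets
        (250, 0.92, 0.018),
        (300, 0.93, 0.017),
    ]
    print(first_stopping_sample_size(monitoring))
```

Tightening `max_optimism` to 0.01 corresponds to the stricter criterion mentioned in the Results.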

13.
J Clin Epidemiol ; 110: 12-22, 2019 06.
Article in English | MEDLINE | ID: mdl-30763612

ABSTRACT

OBJECTIVES: The objective of this study was to compare performance of logistic regression (LR) with machine learning (ML) for clinical prediction modeling in the literature. STUDY DESIGN AND SETTING: We conducted a Medline literature search (1/2016 to 8/2017) and extracted comparisons between LR and ML models for binary outcomes. RESULTS: We included 71 of 927 studies. The median sample size was 1,250 (range 72-3,994,872), with 19 predictors considered (range 5-563) and eight events per predictor (range 0.3-6,697). The most common ML methods were classification trees, random forests, artificial neural networks, and support vector machines. In 48 (68%) studies, we observed potential bias in the validation procedures. Sixty-four (90%) studies used the area under the receiver operating characteristic curve (AUC) to assess discrimination. Calibration was not addressed in 56 (79%) studies. We identified 282 comparisons between an LR and ML model (AUC range, 0.52-0.99). For 145 comparisons at low risk of bias, the difference in logit(AUC) between LR and ML was 0.00 (95% confidence interval, -0.18 to 0.18). For 137 comparisons at high risk of bias, logit(AUC) was 0.34 (0.20-0.47) higher for ML. CONCLUSION: We found no evidence of superior performance of ML over LR. Improvements in methodology and reporting are needed for studies that compare modeling algorithms.
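
The comparisons above are reported as differences in logit(AUC). As a small sketch of that transformation, assuming the standard logit, log(p / (1 - p)), applied to each model's AUC before taking the difference (function names are illustrative, and the pooling across comparisons performed in the study is not reproduced):

```python
import math


def logit(p: float) -> float:
    """Standard logit transform; defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def logit_auc_difference(auc_ml: float, auc_lr: float) -> float:
    """Difference in logit(AUC), ML minus LR; positive values favour ML."""
    return logit(auc_ml) - logit(auc_lr)


if __name__ == "__main__":
    # Hypothetical AUCs for one comparison, not taken from the review.
    print(logit_auc_difference(0.80, 0.75))
```

Working on the logit scale stretches differences near the upper bound of the AUC, so a gain from 0.95 to 0.97 counts for more than a gain from 0.70 to 0.72.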


Subject(s)
Logistic Models , Models, Theoretical , Supervised Machine Learning , Algorithms , Area Under Curve , Humans , Outcome Assessment, Health Care , Predictive Value of Tests , Sensitivity and Specificity
14.
Oncologist ; 24(2): 165-171, 2019 02.
Article in English | MEDLINE | ID: mdl-30171067

ABSTRACT

BACKGROUND: In estrogen receptor-positive (ER+), human epidermal growth factor receptor 2 (HER-2) negative breast cancers, the progesterone receptor (PR) is an independent prognostic marker. Little is known about the prognostic value of PR by tumor grade. We assessed this in two independent datasets. PATIENTS AND METHODS: Women with primary operable, invasive ER+ HER-2 negative breast cancer diagnosed between 2000 and 2012, treated at University Hospitals Leuven, were included. We assessed the association of PR status and subtype (grade 1-2 vs. grade 3) with distant recurrence-free interval (DRFI) and breast cancer-specific survival. The interaction between PR status and subtype was investigated, and associations of PR status by subtype were calculated. The BIG 1-98 data set was used for validation. RESULTS: In total, 4,228 patients from Leuven and 5,419 from BIG 1-98 were analyzed. In the Leuven cohort, the adjusted hazard ratio (HR) of PR-positive versus PR-negative tumors for DRFI was 0.66 (95% confidence interval [CI], 0.50-0.89). For the interaction with subtype (p = .34), the HR of PR status was 0.79 (95% CI, 0.61-1.01) in luminal A-like and 0.59 (95% CI, 0.46-0.76) in luminal B-like tumors. In luminal A-like tumors, observed 5-year cumulative incidences of distant recurrence were 4.1% for PR-negative and 2.8% for PR-positive tumors, and in luminal B-like 18.7% and 9.2%, respectively. In the BIG 1-98 cohort, similar results were observed; for the interaction with subtype (p = .12), the adjusted HR of PR status for DRFI was 0.88 (95% CI, 0.57-1.35) in luminal A-like and 0.58 (95% CI, 0.43-0.77) in luminal B-like tumors. Observed 5-year cumulative incidences were similar. CONCLUSION: PR positivity may be more protective against metastatic relapse in luminal B-like versus luminal A-like breast cancer, but no strong conclusions can be made. 
In absolute risk, results suggest an absent PR is clinically more important in high compared with low proliferative ER+ HER-2 negative tumors. IMPLICATIONS FOR PRACTICE: An absent progesterone receptor (PR) predicts a worse outcome in women treated for an estrogen receptor-positive, human epidermal growth factor receptor 2 negative breast cancer. As low proliferative tumors lacking PR are now also classified high risk, the prognostic value of PR across risk groups was studied. Despite a negative test for interaction of the prognostic value of PR by tumor grade, the magnitude of an absent PR on breast cancer relapse is much larger in high than in low proliferative breast cancers.


Subject(s)
Breast Neoplasms/genetics , Receptors, Progesterone/metabolism , Breast Neoplasms/mortality , Breast Neoplasms/pathology , Female , Humans , Prognosis , Survival Analysis
15.
Eur Urol ; 74(6): 796-804, 2018 12.
Article in English | MEDLINE | ID: mdl-30241973

ABSTRACT

CONTEXT: Urologists regularly develop clinical risk prediction models to support clinical decisions. In contrast to traditional performance measures, decision curve analysis (DCA) can assess the utility of models for decision making. DCA plots net benefit (NB) at a range of clinically reasonable risk thresholds. OBJECTIVE: To provide recommendations on interpreting and reporting DCA when evaluating prediction models. EVIDENCE ACQUISITION: We informally reviewed the urological literature to determine investigators' understanding of DCA. To illustrate, we use data from 3616 patients to develop risk models for high-grade prostate cancer (n=313, 9%) to decide who should undergo a biopsy. The baseline model includes prostate-specific antigen and digital rectal examination; the extended model adds two predictors based on transrectal ultrasound (TRUS). EVIDENCE SYNTHESIS: We explain risk thresholds, NB, default strategies (treat all, treat no one), and test tradeoff. To use DCA, first determine whether a model is superior to all other strategies across the range of reasonable risk thresholds. If so, that model appears to improve decisions irrespective of threshold. Second, consider if there are important extra costs to using the model. If so, obtain the test tradeoff to check whether the increase in NB versus the best other strategy is worth the additional cost. In our case study, addition of TRUS improved NB by 0.0114, equivalent to 1.1 more detected high-grade prostate cancers per 100 patients. Hence, adding TRUS would be worthwhile if we accept subjecting 88 patients to TRUS to find one additional high-grade prostate cancer or, alternatively, subjecting 10 patients to TRUS to avoid one unnecessary biopsy. CONCLUSIONS: The proposed guidelines can help researchers understand DCA and improve application and reporting. PATIENT SUMMARY: Decision curve analysis can identify risk models that can help us make better clinical decisions. 
We illustrate appropriate reporting and interpretation of decision curve analysis.
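
The net benefit and test tradeoff mentioned in the abstract can be illustrated with a short sketch. It uses the standard net benefit formula, NB = TP/n - (FP/n) × pt/(1 - pt), where pt is the risk threshold, and takes the test tradeoff as the reciprocal of the difference in NB. The function names and example counts are hypothetical; only the 0.0114 difference comes from the abstract.

```python
import math


def net_benefit(true_positives: int, false_positives: int, n: int, threshold: float) -> float:
    """Net benefit of a strategy at risk threshold `threshold` (0 < threshold < 1)."""
    weight = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return true_positives / n - weight * false_positives / n


def tradeoff_per_true_positive(delta_net_benefit: float) -> float:
    """Patients who must undergo the extra test per additional net true positive."""
    return 1.0 / delta_net_benefit


if __name__ == "__main__":
    # Hypothetical counts at a 10% risk threshold.
    print(net_benefit(true_positives=30, false_positives=40, n=1000, threshold=0.10))
    # NB difference reported in the abstract for adding TRUS.
    print(math.ceil(tradeoff_per_true_positive(0.0114)))
```

The reciprocal of 0.0114 is about 87.7, which matches the abstract's figure of 88 patients subjected to TRUS per additional high-grade cancer found.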


Subject(s)
Decision Support Techniques , Prostatic Neoplasms/pathology , Urologists , Urology/methods , Attitude of Health Personnel , Biopsy , Clinical Decision-Making , Comprehension , Digital Rectal Examination , Health Knowledge, Attitudes, Practice , Humans , Kallikreins/blood , Male , Neoplasm Grading , Patient Selection , Predictive Value of Tests , Prostate-Specific Antigen/blood , Prostatic Neoplasms/blood , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/therapy , Risk Assessment , Risk Factors , Ultrasonography , Urologists/psychology