
Embryo ranking agreement between embryologists and artificial intelligence algorithms.

Zaninovic, Nikica; Sierra, Jose T; Malmsten, Jonas E; Rosenwaks, Zev.

F S Sci ; 5(1): 50-57, 2024 Feb.

Article in English | MEDLINE | ID: mdl-37820865

ABSTRACT

OBJECTIVE: To evaluate the degree of agreement of embryo ranking between embryologists and eight artificial intelligence (AI) algorithms. DESIGN: Retrospective study. PATIENT(S): A total of 100 cycles with at least eight embryos were selected from the Weill Cornell Medicine database. For each embryo, the full-length time-lapse (TL) videos, as well as a single embryo image at 120 hours, were given to five embryologists and eight AI algorithms for ranking. INTERVENTION(S): None. MAIN OUTCOME MEASURE(S): Kendall rank correlation coefficient (Kendall's τ). RESULT(S): Embryologists had a high degree of agreement in the overall ranking of 100 cycles with an average Kendall's tau (K-τ) of 0.70, slightly lower than the interembryologist agreement when using a single image or video (average K-τ = 0.78). Overall agreement between embryologists and the AI algorithms was significantly lower (average K-τ = 0.53) and similar to the observed low inter-AI algorithm agreement (average K-τ = 0.47). Notably, two of the eight algorithms had a very low agreement with other ranking methodologies (average K-τ = 0.05) and between each other (K-τ = 0.01). The average agreement in selecting the best-quality embryo (1/8 in 100 cycles with an expected agreement by random chance of 12.5%; confidence interval [CI]95: 6%-19%) was 59.5% among embryologists and 40.3% for six AI algorithms. The incidence of the agreement for the two algorithms with the low overall agreement was 11.7%. Agreement on selecting the same top two embryos/cycle (expected agreement by random chance corresponds to 25.0%; CI95: 17%-32%) was 73.5% among embryologists and 56.0% among AI methods excluding two discordant algorithms, which had an average agreement of 24.4%, the expected range of agreement by random chance. Intraembryologist ranking agreement (single image vs. video) was 71.7% and 77.8% for single and top two embryos, respectively. 
Analysis of average raw scores indicated that cycles with low diversity of embryo quality generally resulted in a lower overall agreement between the methods (embryologists and AI models). CONCLUSION(S): To our knowledge, this is the first study that evaluates the level of agreement in ranking embryo quality between different AI algorithms and embryologists. The different concordance methods were consistent and indicated that the highest agreement was intraembryologist agreement, followed by interembryologist agreement. In contrast, the agreement between some of the AI algorithms and embryologists was similar to the inter-AI algorithm agreement, which also showed a wide range of pairwise concordance. Specifically, two AI models showed intra- and interagreement at the level expected from random selection.
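The main outcome measure in this abstract, Kendall's τ, can be illustrated with a minimal sketch. The ranks below are hypothetical and are not taken from the study; the function computes the tie-free form (tau-a), which is an assumption, since the abstract does not say which variant the authors used. The last line reproduces the 12.5% chance level quoted for picking the same best embryo out of eight.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a for two tie-free rankings of the same items."""
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must cover the same items")
    concordant = discordant = 0
    # Compare every pair of items: a pair is concordant when both raters
    # order it the same way, discordant when they order it oppositely.
    for i, j in combinations(range(len(rank_a)), 2):
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ranks (1 = best) given by two raters to the same eight embryos.
rater_1 = [1, 2, 3, 4, 5, 6, 7, 8]
rater_2 = [2, 1, 3, 4, 6, 5, 7, 8]

tau = kendall_tau(rater_1, rater_2)

# Do both raters pick the same embryo as best?
same_top = rater_1.index(1) == rater_2.index(1)

# If one rater picked the best embryo uniformly at random, the chance of
# matching the other rater's pick among eight embryos is 1/8.
chance_top1 = 1 / len(rater_1)
```

A value of 1 means identical rankings and -1 means fully reversed rankings, which gives a scale for reading the averages reported above.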

Subject(s)

Artificial Intelligence , Embryo, Mammalian , Retrospective Studies , Time-Lapse Imaging/methods , Algorithms

Automatic Ploidy Prediction and Quality Assessment of Human Blastocyst Using Time-Lapse Imaging.

Rajendran, Suraj; Brendel, Matthew; Barnes, Josue; Zhan, Qiansheng; Malmsten, Jonas E; Zisimopoulos, Pantelis; Sigaras, Alexandros; Ofori-Atta, Kwabena; Meseguer, Marcos; Miller, Kathleen A; Hoffman, David; Rosenwaks, Zev; Elemento, Olivier; Zaninovic, Nikica; Hajirasouliha, Iman.

bioRxiv ; 2023 Sep 02.

Article in English | MEDLINE | ID: mdl-37693566

ABSTRACT

Assessing fertilized human embryos is crucial for in vitro fertilization (IVF), a task being revolutionized by artificial intelligence and deep learning. Existing models used for embryo quality assessment and chromosomal abnormality (ploidy) detection could be significantly improved by effectively utilizing time-lapse imaging to identify critical developmental time points for maximizing prediction accuracy. Addressing this, we developed and compared various embryo ploidy status prediction models across distinct embryo development stages. We present BELA (Blastocyst Evaluation Learning Algorithm), a state-of-the-art ploidy prediction model surpassing previous image- and video-based models, without necessitating subjective input from embryologists. BELA uses multitask learning to predict quality scores that are used downstream to predict ploidy status. By achieving an AUC of 0.76 for discriminating between euploid and aneuploid embryos on the Weill Cornell dataset, BELA matches the performance of models trained on embryologists' manual scores. While not a replacement for preimplantation genetic testing for aneuploidy (PGT-A), BELA exemplifies how such models can streamline the embryo evaluation process, reducing the time and effort required by embryologists.

A non-invasive artificial intelligence approach for the prediction of human blastocyst ploidy: a retrospective model development and validation study.

Barnes, Josue; Brendel, Matthew; Gao, Vianne R; Rajendran, Suraj; Kim, Junbum; Li, Qianzi; Malmsten, Jonas E; Sierra, Jose T; Zisimopoulos, Pantelis; Sigaras, Alexandros; Khosravi, Pegah; Meseguer, Marcos; Zhan, Qiansheng; Rosenwaks, Zev; Elemento, Olivier; Zaninovic, Nikica; Hajirasouliha, Iman.

Lancet Digit Health ; 5(1): e28-e40, 2023 01.

Article in English | MEDLINE | ID: mdl-36543475

ABSTRACT

BACKGROUND: One challenge in the field of in-vitro fertilisation is the selection of the most viable embryos for transfer. Morphological quality assessment and morphokinetic analysis both have the disadvantage of intra-observer and inter-observer variability. A third method, preimplantation genetic testing for aneuploidy (PGT-A), has limitations too, including its invasiveness and cost. We hypothesised that differences in aneuploid and euploid embryos that allow for model-based classification are reflected in morphology, morphokinetics, and associated clinical information. METHODS: In this retrospective study, we used machine-learning and deep-learning approaches to develop STORK-A, a non-invasive and automated method of embryo evaluation that uses artificial intelligence to predict embryo ploidy status. Our method used a dataset of 10 378 embryos that consisted of static images captured at 110 h after intracytoplasmic sperm injection, morphokinetic parameters, blastocyst morphological assessments, maternal age, and ploidy status. Independent and external datasets, Weill Cornell Medicine EmbryoScope+ (WCM-ES+; Weill Cornell Medicine Center of Reproductive Medicine, NY, USA) and IVI Valencia (IVI Valencia, Health Research Institute la Fe, Valencia, Spain) were used to test the generalisability of STORK-A and were compared measuring accuracy and area under the receiver operating characteristic curve (AUC). FINDINGS: Analysis and model development included the use of 10 378 embryos, all with PGT-A results, from 1385 patients (maternal age range 21-48 years; mean age 36·98 years [SD 4·62]). STORK-A predicted aneuploid versus euploid embryos with an accuracy of 69·3% (95% CI 66·9-71·5; AUC 0·761; positive predictive value [PPV] 76·1%; negative predictive value [NPV] 62·1%) when using images, maternal age, morphokinetics, and blastocyst score. 
A second classification task trained to predict complex aneuploidy versus euploidy and single aneuploidy produced an accuracy of 74·0% (95% CI 71·7-76·1; AUC 0·760; PPV 54·9%; NPV 87·6%) using an image, maternal age, morphokinetic parameters, and blastocyst grade. A third classification task trained to predict complex aneuploidy versus euploidy had an accuracy of 77·6% (95% CI 75·0-80·0; AUC 0·847; PPV 76·7%; NPV 78·0%). STORK-A reported accuracies of 63·4% (AUC 0·702) on the WCM-ES+ dataset and 65·7% (AUC 0·715) on the IVI Valencia dataset, when using an image, maternal age, and morphokinetic parameters, similar to the STORK-A test dataset accuracy of 67·8% (AUC 0·737), showing generalisability. INTERPRETATION: As a proof of concept, STORK-A shows an ability to predict embryo ploidy in a non-invasive manner and shows future potential as a standardised supplementation to traditional methods of embryo selection and prioritisation for implantation or recommendation for PGT-A. FUNDING: US National Institutes of Health.
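The accuracy, PPV, and NPV figures reported in this abstract all derive from a binary confusion matrix. The sketch below shows that arithmetic on hypothetical counts, not the study's data. Treating aneuploid as the positive class is an assumption, since the abstract does not state which class is positive.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, PPV and NPV from the four cells of a confusion matrix."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,  # share of all calls that are correct
        "ppv": tp / (tp + fp),          # share of positive calls that are correct
        "npv": tn / (tn + fn),          # share of negative calls that are correct
    }


# Hypothetical counts, with "positive" assumed to mean aneuploid.
metrics = classification_metrics(tp=60, fp=20, tn=30, fn=15)
```

PPV and NPV depend on the proportion of each class in the test set, so they can differ considerably between classification tasks even when accuracy is similar.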

Subject(s)

Artificial Intelligence , Preimplantation Diagnosis , United States , Pregnancy , Female , Humans , Male , Young Adult , Adult , Middle Aged , Retrospective Studies , Preimplantation Diagnosis/methods , Semen , Ploidies , Blastocyst , Aneuploidy

Pregnancy outcomes after oral and injectable ovulation induction in women with infertility with a low antimüllerian hormone level compared with those with a normal antimüllerian hormone level.

Romanski, Phillip A; Bortoletto, Pietro; Malmsten, Jonas E; Tan, Kay See; Spandorfer, Steven D.

Fertil Steril ; 118(6): 1048-1056, 2022 12.

Article in English | MEDLINE | ID: mdl-36379757

ABSTRACT

OBJECTIVE: To determine the ongoing pregnancy rate among patients with infertility with a low antimüllerian hormone (AMH) level compared with those with a normal AMH level after oral and injectable ovulation induction (OI)/intrauterine insemination (IUI). DESIGN: Retrospective cohort. SETTING: Academic center. PATIENT(S): Patients completing ≥1 medicated OI/IUI cycle at our center between 2015 and 2019 were included. The AMH levels were measured within 12 months of treatment initiation. The cohort was stratified into low AMH (AMH level, <1.0 ng/mL) and normal AMH (AMH level, ≥1.0 ng/mL) groups. All subsequent medicated OI/IUI cycles occurring within 1 year of initial cycle start date were included up to the third completed cycle or until an ongoing pregnancy was recorded. Patients were stratified by age (<35, 35-40, and >40 years), and the relationship between the low and normal AMH groups and each binary endpoint was quantified as risk ratios using age-adjusted Poisson models. INTERVENTION(S): None. MAIN OUTCOME MEASURE(S): Ongoing pregnancy. RESULT(S): A total of 3,122 patients completed 5,539 oral antiestrogen cycles, and 1,060 completed 1,630 injectable gonadotropin cycles. For oral antiestrogen treatment, pregnancy outcomes, including ongoing pregnancy rate per cycle, for patients with a low AMH level were comparable with those for patients with a normal AMH level (<35 years, 15.4% vs. 14.9%; 35-40 years, 10.0% vs. 11.0%; and >40 years, 2.8% vs. 3.3%). For injectable gonadotropin treatment, the ongoing pregnancy rate was lower in the low AMH group than in the normal AMH group for the ages of <35 (12.1% vs. 23.5%; relative risk [RR], 0.52 [95% confidence interval {CI}, 0.28-0.97]) and 35-40 (12.5% vs. 18.5%; RR, 0.70 [95% CI, 0.49-0.99]) years but comparable with that for patients aged >40 years (3.0% vs. 4.0%; RR, 0.86 [95% CI, 0.31-2.35]).
The proportion of multifetal gestations was similar between the low and normal AMH groups treated with oral antiestrogens (13.1% vs. 10.8%); however, for injectable gonadotropin treatment, patients with a normal AMH level had a higher proportion of multifetal gestations (18.6% vs. 31.1%). CONCLUSION(S): Compared with normal ovarian reserve, treatment with oral antiestrogens for OI/IUI for patients with low ovarian reserve results in comparable follicular development and ongoing pregnancy rates for all age groups. When patients with low ovarian reserve are treated with gonadotropins for OI/IUI, multifollicular recruitment is less likely, resulting in a significantly decreased ongoing pregnancy rate for patients aged <35 and 35-40 years but also a decrease in multifetal gestations. Overall, the ongoing pregnancy rates of 8.7% per oral antiestrogen cycle and 8.1% per injectable gonadotropin cycle in patients with low ovarian reserve are comparable with the expected rates in the general infertility population.
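The relative risks in this abstract come from age-adjusted Poisson models. As a simpler illustration of what a risk ratio and its 95% confidence interval express, the sketch below computes a crude, unadjusted risk ratio with the standard log-scale interval. This is a swapped-in technique rather than the authors' model, and the counts are hypothetical.

```python
import math


def risk_ratio(events_exposed, n_exposed, events_ref, n_ref, z=1.96):
    """Crude risk ratio with a log-scale (Wald-type) confidence interval."""
    rr = (events_exposed / n_exposed) / (events_ref / n_ref)
    # Standard error of log(RR) for two independent binomial proportions.
    se = math.sqrt(
        1 / events_exposed - 1 / n_exposed + 1 / events_ref - 1 / n_ref
    )
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts: 12 ongoing pregnancies in 100 cycles (low AMH)
# versus 24 in 100 cycles (normal AMH).
rr, lower, upper = risk_ratio(12, 100, 24, 100)
```

An interval that lies entirely below 1, as for the two younger age groups in the abstract, indicates a lower rate in the low AMH group at the 5% significance level.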

Subject(s)

Anti-Mullerian Hormone , Gonadotropins , Infertility, Female , Ovulation Induction , Female , Humans , Pregnancy , Anti-Mullerian Hormone/blood , Gonadotropins/administration & dosage , Infertility, Female/diagnosis , Infertility, Female/drug therapy , Ovulation Induction/methods , Pregnancy Outcome , Pregnancy Rate , Retrospective Studies , Adult , Injections

Characterization of an artificial intelligence model for ranking static images of blastocyst stage embryos.

Loewke, Kevin; Cho, Justina Hyunjii; Brumar, Camelia D; Maeder-York, Paxton; Barash, Oleksii; Malmsten, Jonas E; Zaninovic, Nikica; Sakkas, Denny; Miller, Kathleen A; Levy, Michael; VerMilyea, Matthew David.

Fertil Steril ; 117(3): 528-535, 2022 03.

Article in English | MEDLINE | ID: mdl-34998577

ABSTRACT

OBJECTIVE: To perform a series of analyses characterizing an artificial intelligence (AI) model for ranking blastocyst-stage embryos. The primary objective was to evaluate the benefit of the model for predicting clinical pregnancy, whereas the secondary objective was to identify limitations that may impact clinical use. DESIGN: Retrospective study. SETTING: Consortium of 11 assisted reproductive technology centers in the United States. PATIENT(S): Static images of 5,923 transferred blastocysts and 2,614 nontransferred aneuploid blastocysts. INTERVENTION(S): None. MAIN OUTCOME MEASURE(S): Prediction of clinical pregnancy (fetal heartbeat). RESULT(S): The area under the curve of the AI model ranged from 0.6 to 0.7 and outperformed manual morphology grading overall and on a per-site basis. A bootstrapped study predicted improved pregnancy rates between +5% and +12% per site using AI compared with manual grading using an inverted microscope. One site that used a low-magnification stereo zoom microscope did not show predicted improvement with the AI. Visualization techniques and attribution algorithms revealed that the features learned by the AI model largely overlap with the features of manual grading systems. Two sources of bias relating to the type of microscope and presence of embryo holding micropipettes were identified and mitigated. The analysis of AI scores in relation to pregnancy rates showed that score differences of ≥0.1 (10%) correspond with improved pregnancy rates, whereas score differences of <0.1 may not be clinically meaningful. CONCLUSION(S): This study demonstrates the potential of AI for ranking blastocyst stage embryos and highlights potential limitations related to image quality, bias, and granularity of scores.

Subject(s)

Artificial Intelligence/standards , Blastocyst/cytology , Embryo Transfer/standards , Image Processing, Computer-Assisted/standards , Blastocyst/physiology , Cohort Studies , Databases, Factual/standards , Embryo Transfer/methods , Female , Humans , Image Processing, Computer-Assisted/methods , Microscopy/methods , Microscopy/standards , Pregnancy , Pregnancy Rate/trends , Retrospective Studies

Deep learning enables robust assessment and selection of human blastocysts after in vitro fertilization.

Khosravi, Pegah; Kazemi, Ehsan; Zhan, Qiansheng; Malmsten, Jonas E; Toschi, Marco; Zisimopoulos, Pantelis; Sigaras, Alexandros; Lavery, Stuart; Cooper, Lee A D; Hickman, Cristina; Meseguer, Marcos; Rosenwaks, Zev; Elemento, Olivier; Zaninovic, Nikica; Hajirasouliha, Iman.

NPJ Digit Med ; 2: 21, 2019.

Article in English | MEDLINE | ID: mdl-31304368

ABSTRACT

Visual morphology assessment is routinely used for evaluating embryo quality and selecting human blastocysts for transfer after in vitro fertilization (IVF). However, the assessment produces different results between embryologists, and as a result, the success rate of IVF remains low. To overcome uncertainties in embryo quality, multiple embryos are often implanted, resulting in undesired multiple pregnancies and complications. Unlike in other imaging fields, human embryology and IVF have not yet leveraged artificial intelligence (AI) for unbiased, automated embryo assessment. We postulated that an AI approach trained on thousands of embryos can reliably predict embryo quality without human intervention. We implemented an AI approach based on deep neural networks (DNNs) to select the highest-quality embryos using a large collection of human embryo time-lapse images (about 50,000 images) from a high-volume fertility center in the United States. We developed a framework (STORK) based on Google's Inception model. STORK predicts blastocyst quality with an AUC of >0.98, generalizes well to images from other clinics outside the US, and outperforms individual embryologists. Using clinical data for 2182 embryos, we created a decision tree to integrate embryo quality and patient age to identify scenarios associated with pregnancy likelihood. Our analysis shows that the chance of pregnancy based on individual embryos varies from 13.8% (age ≥41 and poor-quality) to 66.3% (age <37 and good-quality) depending on automated blastocyst quality assessment and patient age. In conclusion, our AI-driven approach provides a reproducible way to assess embryo quality and uncovers new, potentially personalized strategies to select embryos.
