Search | BVS Regional Portal (test)

Seeking arrangements: cell contact as a cleavage-stage biomarker.

He, Chloe; Karpaviciute, Neringa; Hariharan, Rishabh; Lees, Lilly; Jacques, Céline; Ferrand, Timothy; Chambost, Jérôme; Wouters, Koen; Malmsten, Jonas; Miller, Ryan; Zaninovic, Nikica; Vasconcelos, Francisco; Hickman, Cristina.

Reprod Biomed Online ; 48(3): 103654, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38246064

ABSTRACT

RESEARCH QUESTION: What can three-dimensional cell contact networks tell us about the developmental potential of cleavage-stage human embryos? DESIGN: This pilot study was a retrospective analysis of two Embryoscope imaging datasets from two clinics. An artificial intelligence system was used to reconstruct the three-dimensional structure of embryos from 11-plane focal stacks. Networks of cell contacts were extracted from the resulting embryo three-dimensional models and each embryo's mean contacts per cell was computed. Unpaired t-tests and receiver operating characteristic curve analysis were used to statistically analyse mean cell contact outcomes. Cell contact networks from different embryos were compared to identify embryos with similar cell arrangements. RESULTS: At t4, a higher mean number of contacts per cell was associated with greater rates of blastulation and blastocyst quality. No associations were found with biochemical pregnancy, live birth, miscarriage or ploidy. At t8, a higher mean number of contacts was associated with increased blastocyst quality, biochemical pregnancy and live birth. No associations were found with miscarriage or aneuploidy. Mean contacts at t4 weakly correlated with those at t8. Four-cell embryos fell into nine distinct cell arrangements; the five most common accounted for 97% of embryos. Eight-cell embryos, however, displayed a greater degree of variation with 59 distinct cell arrangements. CONCLUSIONS: Evidence is provided for the clinical relevance of cleavage-stage cell arrangement in the human preimplantation embryo beyond the four-cell stage, which may improve selection techniques for day-3 transfers. This pilot study provides a strong case for further investigation into spatial biomarkers and three-dimensional morphokinetics.

Subjects

Spontaneous Abortion , Pregnancy , Female , Humans , Retrospective Studies , Embryo Transfer/methods , Artificial Intelligence , Pilot Projects , Zygote Cleavage Stage , Blastocyst , Aneuploidy , Biomarkers , Pregnancy Rate
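
The "mean contacts per cell" statistic described in the abstract above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a contact network is given as a list of undirected cell-index pairs; the function name and the two example arrangements are hypothetical and are not taken from the study's software.

```python
from collections import defaultdict


def mean_contacts_per_cell(num_cells, contacts):
    """Average number of distinct neighbours per cell.

    `contacts` is an iterable of (i, j) cell-index pairs, treated as
    undirected edges. Self-contacts and duplicate pairs are ignored.
    """
    if num_cells <= 0:
        raise ValueError("num_cells must be positive")
    degree = defaultdict(int)
    seen = set()
    for a, b in contacts:
        if a == b:
            continue  # a cell cannot be in contact with itself
        edge = frozenset((a, b))
        if edge in seen:
            continue  # count each contact only once
        seen.add(edge)
        degree[a] += 1
        degree[b] += 1
    return sum(degree.values()) / num_cells


# Two hypothetical four-cell arrangements:
# every cell touches every other cell (tetrahedral-like packing)
tetrahedral = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
# each cell touches only two neighbours (ring-like, planar packing)
planar = [(0, 1), (1, 2), (2, 3), (3, 0)]

print(mean_contacts_per_cell(4, tetrahedral))
print(mean_contacts_per_cell(4, planar))
```

Under this definition the statistic equals twice the number of contacts divided by the number of cells, so more compact arrangements score higher.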

Localizing unmapped sequences with families to validate the Telomere-to-Telomere assembly and identify new hotspots for genetic diversity.

Chrisman, Brianna; He, Chloe; Jung, Jae-Yoon; Stockham, Nate; Paskov, Kelley; Washington, Peter; Petereit, Juli; Wall, Dennis P.

Genome Res ; 33(10): 1734-1746, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37879860

ABSTRACT

Although it is ubiquitous in genomics, the current human reference genome (GRCh38) is incomplete: It is missing large sections of heterochromatic sequence, and as a singular, linear reference genome, it does not represent the full spectrum of human genetic diversity. To characterize gaps in GRCh38 and human genetic diversity, we developed an algorithm for sequence location approximation using nuclear families (ASLAN) to identify the region of origin of reads that do not align to GRCh38. Using unmapped reads and variant calls from whole-genome sequences (WGSs), ASLAN uses a maximum likelihood model to identify the most likely region of the genome that a subsequence belongs to given the distribution of the subsequence in the unmapped reads and phasings of families. Validating ASLAN on synthetic data and on reads from the alternative haplotypes in the decoy genome, ASLAN localizes >90% of 100-bp sequences with >92% accuracy and ~1 Mb of resolution. We then ran ASLAN on 100-mers from unmapped reads from WGS from more than 700 families, and compared ASLAN localizations to alignment of the 100-mers to the recently released T2T-CHM13 assembly. We found that many unmapped reads in GRCh38 originate from telomeres and centromeres that are gaps in GRCh38. ASLAN localizations are in high concordance with T2T-CHM13 alignments, except in the centromeres of the acrocentric chromosomes. Comparing ASLAN localizations and T2T-CHM13 alignments, we identified sequences missing from T2T-CHM13 or sequences with high divergence from their aligned region in T2T-CHM13, highlighting new hotspots for genetic diversity.

Subjects

Human Genome , Genomics , Humans , Algorithms , Telomere/genetics , Genetic Variation , DNA Sequence Analysis

Predicting the number of oocytes retrieved from controlled ovarian hyperstimulation with machine learning.

Ferrand, Timothy; Boulant, Justine; He, Chloe; Chambost, Jérôme; Jacques, Céline; Pena, Chris-Alexandre; Hickman, Cristina; Reignier, Arnaud; Fréour, Thomas.

Hum Reprod ; 38(10): 1918-1926, 2023 Oct 3.

Article in English | MEDLINE | ID: mdl-37581894

ABSTRACT

STUDY QUESTION: Can machine learning predict the number of oocytes retrieved from controlled ovarian hyperstimulation (COH)? SUMMARY ANSWER: Three machine-learning models were successfully trained to predict the number of oocytes retrieved from COH. WHAT IS KNOWN ALREADY: A number of previous studies have identified and built predictive models on factors that influence the number of oocytes retrieved during COH. Many of these studies are, however, limited in the fact that they only consider a small number of variables in isolation. STUDY DESIGN, SIZE, DURATION: This study was a retrospective analysis of a dataset of 11,286 cycles performed at a single centre in France between 2009 and 2020 with the aim of building a predictive model for the number of oocytes retrieved from ovarian stimulation. The analysis was carried out by a data analysis team external to the centre using the Substra framework. The Substra framework enabled the data analysis team to send computer code to run securely on the centre's on-premises server. In this way, a high level of data security was achieved as the data analysis team did not have direct access to the data, nor did the data leave the centre at any point during the study. PARTICIPANTS/MATERIALS, SETTING, METHODS: The Light Gradient Boosting Machine algorithm was used to produce three predictive models: one that directly predicted the number of oocytes retrieved and two that predicted which of a set of bins provided by two clinicians the number of oocytes retrieved fell into. The resulting models were evaluated on a held-out test set and compared to linear and logistic regression baselines. In addition, the models themselves were analysed to identify the parameters that had the biggest impact on their predictions. MAIN RESULTS AND THE ROLE OF CHANCE: On average, the model that directly predicted the number of oocytes retrieved deviated from the ground truth by 4.21 oocytes. The model that predicted the first clinician's bins deviated by 0.73 bins whereas the model for the second clinician deviated by 0.62 bins. For all models, performance was best within the first and third quartiles of the target variable, with the model underpredicting extreme values of the target variable (no oocytes and large numbers of oocytes retrieved). Nevertheless, the erroneous predictions made for these extreme cases were still within the vicinity of the true value. Overall, all three models agreed on the importance of each feature which was estimated using Shapley Additive Explanation (SHAP) values. The feature with the highest mean absolute SHAP value (and thus the highest importance) was the antral follicle count, followed by basal AMH and FSH. Of the other hormonal features, basal TSH, LH, and testosterone levels were similarly important and baseline LH was the least important. The treatment characteristic with the highest SHAP value was the initial dose of gonadotropins. LIMITATIONS, REASONS FOR CAUTION: The models produced in this study were trained on a cohort from a single centre. They should thus not be used in clinical practice until trained and evaluated on a larger cohort more representative of the general population. WIDER IMPLICATIONS OF FINDINGS: These predictive models for the number of oocytes retrieved from COH may be useful in clinical practice, assisting clinicians in optimizing COH protocols for individual patients. Our work also demonstrates the promise of using the Substra framework for allowing external researchers to provide clinically relevant insights on sensitive fertility data in a fully secure, trustworthy manner and opens a number of exciting avenues for accelerating future research. STUDY FUNDING/COMPETING INTEREST(S): This study was funded by the French Public Bank of Investment as part of the Healthchain Consortium. T.Fe., C.He., J.C., C.J., C.-A.P., and C.Hi. are employed by Apricity. C.Hi. has received consulting fees and honoraria from Vitrolife, Merck Serono, Ferring, Cooper Surgical, Dibimed, Apricity, and Fairtility and travel support from Fairtility and Vitrolife, participates on an advisory board for Merck Serono, was the founder and organizer of the AI Fertility conference, has stock in Aria Fertility, TMRW, Fairtility, Apricity, and IVF Professionals, and received free equipment from Planar in exchange for first user feedback. C.J. has received a grant from BPI. J.C. has also received a grant from BPI, is a member of the Merck AI advisory board, and is a board member of Labelia Labs. C.He has a contract for medical writing of this manuscript by CHU Nantes and has received travel support from Apricity. A.R. has received honoraria from Ferring and Organon. T.Fe. has received a grant from BPI. TRIAL REGISTRATION NUMBER: N/A.

Subjects

Birth Rate , Ovarian Hyperstimulation Syndrome , Male , Female , Humans , Retrospective Studies , Treatment Outcome , Ovulation Induction/methods , Oocytes , Fertilization in Vitro/methods
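
The abstract above reports how far predictions "deviated" on average, in oocytes and in bins. The Python sketch below shows one way such figures could be computed, assuming (this is not stated in the abstract) that the deviation is a mean absolute error. The bin edges and the example counts are hypothetical and are not the clinicians' bins or the study's data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def to_bin(count, upper_edges):
    """Index of the first bin whose inclusive upper edge is >= count.

    Counts above the last edge fall into one extra, open-ended bin.
    """
    for index, edge in enumerate(upper_edges):
        if count <= edge:
            return index
    return len(upper_edges)


# Hypothetical bins: 0-3, 4-9, 10-15, and 16 or more oocytes.
HYPOTHETICAL_EDGES = [3, 9, 15]

true_counts = [10, 4, 20]       # made-up retrieved oocyte counts
predicted_counts = [8, 5, 14]   # made-up model predictions

# Error measured in oocytes
oocyte_error = mean_absolute_error(true_counts, predicted_counts)

# Error measured in bins, after mapping both series onto the same bins
true_bins = [to_bin(c, HYPOTHETICAL_EDGES) for c in true_counts]
predicted_bins = [to_bin(c, HYPOTHETICAL_EDGES) for c in predicted_counts]
bin_error = mean_absolute_error(true_bins, predicted_bins)

print(oocyte_error, bin_error)
```

Measuring the error in bins rather than raw counts makes a model's output comparable with the coarser categories a clinician might use when planning stimulation.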

Transmission dynamics of human herpesvirus 6A, 6B and 7 from whole genome sequences of families.

Chrisman, Brianna S; He, Chloe; Jung, Jae-Yoon; Stockham, Nate; Paskov, Kelley; Wall, Dennis P.

Virol J ; 19(1): 225, 2022 Dec 24.

Article in English | MEDLINE | ID: mdl-36566197

ABSTRACT

While hundreds of thousands of human whole genome sequences (WGS) have been collected in the effort to better understand genetic determinants of disease, these whole genome sequences have less frequently been used to study another major determinant of human health: the human virome. Using the unmapped reads from WGS of over 1000 families, we present insights into the human blood DNA virome, focusing particularly on human herpesvirus (HHV) 6A, 6B, and 7. In addition to extensively cataloguing the viruses detected in WGS of human whole blood and lymphoblastoid cell lines, we use the family structure of our dataset to show that household drives transmission of several viruses, and identify the Mendelian inheritance patterns characteristic of inherited chromosomally integrated human herpesvirus 6 (iciHHV-6). Consistent with prior studies, we find that 0.6% of our dataset's population has iciHHV, and we locate candidate integration sequences for these cases. We document genetic diversity within exogenous and integrated HHV species and within integration sites of HHV-6. Finally, in the first observation of its kind, we present evidence that suggests widespread de novo HHV-6B integration and HHV-7 integration and reactivation in lymphoblastoid cell lines. These findings show that the unmapped read space of WGS is a promising source of data for virology research.

Subjects

Human Herpesvirus 6 , Roseolovirus Infections , Humans , Human Herpesvirus 6/genetics , Virus Integration , Sequence Analysis , Cell Line

The human "contaminome": bacterial, viral, and computational contamination in whole genome sequences from 1000 families.

Chrisman, Brianna; He, Chloe; Jung, Jae-Yoon; Stockham, Nate; Paskov, Kelley; Washington, Peter; Wall, Dennis P.

Sci Rep ; 12(1): 9863, 2022 Jun 14.

Article in English | MEDLINE | ID: mdl-35701436

ABSTRACT

The unmapped readspace of whole genome sequencing data tends to be large but is often ignored. We posit that it contains valuable signals of both human infection and contamination. Using unmapped and poorly aligned reads from whole genome sequences (WGS) of over 1000 families and nearly 5000 individuals, we present insights into the common viral, bacterial, and computational contamination that plagues whole genome sequencing studies. We present several notable results: (1) In addition to known contaminants such as Epstein-Barr virus and phiX, sequences from whole blood and lymphocyte cell lines contain many other contaminants, likely originating from storage, prep, and sequencing pipelines. (2) The sequencing plate and biological source of a sample strongly influence its contamination profile. (3) Y-chromosome fragments not on the human reference genome commonly mismap to bacterial reference genomes. Both experiment-derived and computational contamination are prominent in next-generation sequencing data. Such contamination can compromise results from WGS as well as metagenomics studies, and standard protocols for identifying and removing contamination should be developed to ensure the fidelity of sequencing-based studies.

Subjects

Bacteriophages , Epstein-Barr Virus Infections , Computational Biology , Bacterial Genome , Human Genome , Viral Genome , Human Herpesvirus 4/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Whole Genome Sequencing

Classifying Autism From Crowdsourced Semistructured Speech Recordings: Machine Learning Model Comparison Study.

Chi, Nathan A; Washington, Peter; Kline, Aaron; Husic, Arman; Hou, Cathy; He, Chloe; Dunlap, Kaitlyn; Wall, Dennis P.

JMIR Pediatr Parent ; 5(2): e35406, 2022 Apr 14.

Article in English | MEDLINE | ID: mdl-35436234

ABSTRACT

BACKGROUND: Autism spectrum disorder (ASD) is a neurodevelopmental disorder that results in altered behavior, social development, and communication patterns. In recent years, autism prevalence has tripled, with 1 in 44 children now affected. Given that traditional diagnosis is a lengthy, labor-intensive process that requires the work of trained physicians, significant attention has been given to developing systems that automatically detect autism. We work toward this goal by analyzing audio data, as prosody abnormalities are a signal of autism, with affected children displaying speech idiosyncrasies such as echolalia, monotonous intonation, atypical pitch, and irregular linguistic stress patterns. OBJECTIVE: We aimed to test the ability of machine learning approaches to aid in detection of autism in self-recorded speech audio captured from children with ASD and neurotypical (NT) children in their home environments. METHODS: We considered three methods to detect autism in child speech: (1) random forests trained on extracted audio features (including Mel-frequency cepstral coefficients); (2) convolutional neural networks trained on spectrograms; and (3) fine-tuned wav2vec 2.0, a state-of-the-art transformer-based speech recognition model. We trained our classifiers on our novel data set of cellphone-recorded child speech audio curated from the Guess What? mobile game, an app designed to crowdsource videos of children with ASD and NT children in a natural home environment. RESULTS: The random forest classifier achieved 70% accuracy, the fine-tuned wav2vec 2.0 model achieved 77% accuracy, and the convolutional neural network achieved 79% accuracy when classifying children's audio as either ASD or NT. We used 5-fold cross-validation to evaluate model performance. CONCLUSIONS: Our models were able to predict autism status when trained on a varied selection of home audio clips with inconsistent recording qualities, which may be more representative of real-world conditions. The results demonstrate that machine learning methods offer promise in detecting autism automatically from speech without specialized equipment.

A Method for Localizing Non-Reference Sequences to the Human Genome.

Chrisman, Brianna Sierra; Paskov, Kelley M; He, Chloe; Jung, Jae-Yoon; Stockham, Nate; Washington, Peter Yigitcan; Wall, Dennis Paul.

Pac Symp Biocomput ; 27: 313-324, 2022.

Article in English | MEDLINE | ID: mdl-34890159

ABSTRACT

As the last decade of human genomics research begins to bear the fruit of advancements in precision medicine, it is important to ensure that genomics' improvements in human health are distributed globally and equitably. An important step to ensuring health equity is to improve the human reference genome to capture global diversity by including a wide variety of alternative haplotypes, sequences that are not currently captured on the reference genome. We present a method that localizes 100 basepair (bp) long sequences extracted from short-read sequencing that can ultimately be used to identify what regions of the human genome non-reference sequences belong to. We extract reads that don't align to the reference genome, and compute the population's distribution of 100-mers found within the unmapped reads. We use genetic data from families to identify shared genetic material between siblings and match the distribution of unmapped k-mers to these inheritance patterns to determine the most likely genomic region of a k-mer. We perform this localization with two highly interpretable methods of artificial intelligence: a computationally tractable Hidden Markov Model coupled to a Maximum Likelihood Estimator. Using a set of alternative haplotypes with known locations on the genome, we show that our algorithm is able to localize 96% of k-mers with over 90% accuracy and less than 1 Mb median resolution. As the collection of sequenced human genomes grows larger and more diverse, we hope that this method can be used to improve the human reference genome, a critical step in addressing precision medicine's diversity crisis.

Subjects

Artificial Intelligence , Human Genome , Computational Biology , Genomics , High-Throughput Nucleotide Sequencing , Humans , DNA Sequence Analysis
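
The first step described in the abstract above, counting the 100-mers found in unmapped reads, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the handling of ambiguous bases, and the toy reads are assumptions, and it does not reproduce the authors' pipeline or the later Hidden Markov Model and maximum-likelihood steps.

```python
from collections import Counter


def count_kmers(reads, k=100):
    """Count every length-k substring (k-mer) across a collection of reads.

    Reads shorter than k contribute nothing. K-mers containing the
    ambiguous base 'N' are skipped (an assumption made for this sketch).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    for read in reads:
        sequence = read.upper()
        for start in range(len(sequence) - k + 1):
            kmer = sequence[start:start + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


# Toy reads with k=3 so the result is easy to inspect by hand;
# the abstract uses k=100 on real unmapped reads.
toy_reads = ["ACGTA", "CGTAC", "ACNGT"]
toy_counts = count_kmers(toy_reads, k=3)
print(toy_counts.most_common())
```

Running the same count per individual would give a per-person presence or abundance profile for each k-mer, which is the kind of distribution the abstract describes matching against family inheritance patterns.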
