1.
JMIR Res Protoc ; 13: e52205, 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38329783

ABSTRACT

BACKGROUND: A considerable number of minors in the United States are diagnosed with developmental or psychiatric conditions, potentially influenced by underdiagnosis factors such as cost, distance, and clinician availability. Despite the potential of digital phenotyping tools with machine learning (ML) approaches to expedite diagnoses and enhance diagnostic services for pediatric psychiatric conditions, existing methods face limitations because they use a limited set of social features for prediction tasks and focus on a single binary prediction, resulting in uncertain accuracies. OBJECTIVE: This study aims to propose the development of a gamified web system for data collection, followed by a fusion of novel crowdsourcing algorithms with ML behavioral feature extraction approaches to simultaneously predict diagnoses of autism spectrum disorder and attention-deficit/hyperactivity disorder in a precise and specific manner. METHODS: The proposed pipeline will consist of (1) gamified web applications to curate videos of social interactions adaptively based on the needs of the diagnostic system, (2) behavioral feature extraction techniques consisting of automated ML methods and novel crowdsourcing algorithms, and (3) the development of ML models that classify several conditions simultaneously and that adaptively request additional information based on uncertainties about the data. RESULTS: A preliminary version of the web interface has been implemented, and a prior feature selection method has highlighted a core set of behavioral features that can be targeted through the proposed gamified approach. CONCLUSIONS: The prospect for high reward stems from the possibility of creating the first artificial intelligence-powered tool that can identify complex social behaviors well enough to distinguish conditions with nuanced differentiators such as autism spectrum disorder and attention-deficit/hyperactivity disorder. 
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): PRR1-10.2196/52205.

2.
Genome Res ; 33(10): 1734-1746, 2023 10.
Article in English | MEDLINE | ID: mdl-37879860

ABSTRACT

Although it is ubiquitous in genomics, the current human reference genome (GRCh38) is incomplete: It is missing large sections of heterochromatic sequence, and as a singular, linear reference genome, it does not represent the full spectrum of human genetic diversity. To characterize gaps in GRCh38 and human genetic diversity, we developed an algorithm for sequence location approximation using nuclear families (ASLAN) to identify the region of origin of reads that do not align to GRCh38. Using unmapped reads and variant calls from whole-genome sequences (WGSs), ASLAN uses a maximum likelihood model to identify the most likely region of the genome that a subsequence belongs to given the distribution of the subsequence in the unmapped reads and phasings of families. Validating ASLAN on synthetic data and on reads from the alternative haplotypes in the decoy genome, ASLAN localizes >90% of 100-bp sequences with >92% accuracy and ∼1 Mb of resolution. We then ran ASLAN on 100-mers from unmapped reads from WGS from more than 700 families, and compared ASLAN localizations to alignment of the 100-mers to the recently released T2T-CHM13 assembly. We found that many unmapped reads in GRCh38 originate from telomeres and centromeres that are gaps in GRCh38. ASLAN localizations are in high concordance with T2T-CHM13 alignments, except in the centromeres of the acrocentric chromosomes. Comparing ASLAN localizations and T2T-CHM13 alignments, we identified sequences missing from T2T-CHM13 or sequences with high divergence from their aligned region in T2T-CHM13, highlighting new hotspots for genetic diversity.


Subject(s)
Genome, Human , Genomics , Humans , Algorithms , Telomere/genetics , Genetic Variation , Sequence Analysis, DNA
3.
Genome Res ; 33(10): 1747-1756, 2023 10.
Article in English | MEDLINE | ID: mdl-37879861

ABSTRACT

Large, whole-genome sequencing (WGS) data sets containing families provide an important opportunity to identify crossovers and shared genetic material in siblings. However, the high variant calling error rates of WGS in some areas of the genome can result in spurious crossover calls, and the special inheritance status of the X Chromosome presents challenges. We have developed a hidden Markov model that addresses these issues by modeling the inheritance of variants in families in the presence of error-prone regions and inherited deletions. We call our method PhasingFamilies. We validate PhasingFamilies using the platinum genome family NA1281 (precision: 0.81; recall: 0.97), as well as simulated genomes with known crossover positions (precision: 0.93; recall: 0.92). Using 1925 quads from the Simons Simplex Collection, we found that PhasingFamilies resolves crossovers to a median resolution of 3527.5 bp. These crossovers recapitulate existing recombination rate maps, including for the X Chromosome; produce sibling pair IBD that matches expected distributions; and are validated by the haplotype estimation tool SHAPEIT. We provide an efficient, open-source implementation of PhasingFamilies that can be used to identify crossovers from family sequencing data.


Subject(s)
Genome , Inheritance Patterns , Humans , Whole Genome Sequencing , Haplotypes
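The hidden Markov model idea in the abstract above can be illustrated with a toy example. The sketch below is not the PhasingFamilies implementation: it assumes a hypothetical two-state model in which each site carries a noisy observation of which parental haplotype (0 or 1) a child inherited, and the `error_rate` and `switch_prob` values are arbitrary illustrative parameters. Viterbi decoding smooths isolated errors, and a crossover is read off as a change of state in the decoded path.

```python
import math


def viterbi_two_state(observations, error_rate=0.1, switch_prob=0.01):
    """Return the most likely sequence of inherited haplotypes (0 or 1).

    Toy model (assumed for illustration): each observation equals the true
    state with probability 1 - error_rate, and the state switches between
    adjacent sites with probability switch_prob.
    """
    states = (0, 1)

    def log_emit(state, obs):
        return math.log(1 - error_rate) if obs == state else math.log(error_rate)

    def log_trans(prev, cur):
        return math.log(1 - switch_prob) if prev == cur else math.log(switch_prob)

    # Initialise with a uniform prior over the two haplotypes.
    scores = [math.log(0.5) + log_emit(s, observations[0]) for s in states]
    backpointers = []

    for obs in observations[1:]:
        new_scores, pointers = [], []
        for cur in states:
            best_prev = max(states, key=lambda prev: scores[prev] + log_trans(prev, cur))
            pointers.append(best_prev)
            new_scores.append(scores[best_prev] + log_trans(best_prev, cur) + log_emit(cur, obs))
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def crossover_positions(path):
    """Indices at which the decoded haplotype differs from the previous site."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


if __name__ == "__main__":
    noisy = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
    decoded = viterbi_two_state(noisy)
    print(decoded, crossover_positions(decoded))
```

In this toy setting the low switch probability is what keeps a single discordant site from being called as a pair of spurious crossovers, which is the failure mode the abstract attributes to high variant calling error rates.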
4.
Sci Rep ; 13(1): 11353, 2023 07 13.
Article in English | MEDLINE | ID: mdl-37443184

ABSTRACT

While healthy gut microbiomes are critical to human health, pertinent microbial processes remain largely undefined, partially due to differential bias among profiling techniques. By simultaneously integrating multiple profiling methods, multi-omic analysis can define generalizable microbial processes, and is especially useful in understanding complex conditions such as Autism. Challenges with integrating heterogeneous data produced by multiple profiling methods can be overcome using Latent Dirichlet Allocation (LDA), a promising natural language processing technique that identifies topics in heterogeneous documents. In this study, we apply LDA to multi-omic microbial data (16S rRNA amplicon, shotgun metagenomic, shotgun metatranscriptomic, and untargeted metabolomic profiling) from the stool of 81 children with and without Autism. We identify topics, or microbial processes, that summarize complex phenomena occurring within gut microbial communities. We then subset stool samples by topic distribution, and identify metabolites, specifically neurotransmitter precursors and fatty acid derivatives, that differ significantly between children with and without Autism. We identify clusters of topics, deemed "cross-omic topics", which we hypothesize are representative of generalizable microbial processes observable regardless of profiling method. Interpreting topics, we find each represents a particular diet, and we heuristically label each cross-omic topic as: healthy/general function, age-associated function, transcriptional regulation, and opportunistic pathogenesis.


Subject(s)
Autistic Disorder , Gastrointestinal Microbiome , Microbiota , Child , Humans , Gastrointestinal Microbiome/genetics , Multiomics , RNA, Ribosomal, 16S/genetics , Microbiota/genetics
5.
Proc Natl Acad Sci U S A ; 120(31): e2215632120, 2023 08.
Article in English | MEDLINE | ID: mdl-37506195

ABSTRACT

Autism spectrum disorder (ASD) has a complex genetic architecture involving contributions from both de novo and inherited variation. Few studies have been designed to address the role of rare inherited variation or its interaction with common polygenic risk in ASD. Here, we performed whole-genome sequencing of the largest cohort of multiplex families to date, consisting of 4,551 individuals in 1,004 families having two or more autistic children. Using this study design, we identify seven previously unrecognized ASD risk genes supported by a majority of rare inherited variants, finding support for a total of 74 genes in our cohort and a total of 152 genes after combined analysis with other studies. Autistic children from multiplex families demonstrate an increased burden of rare inherited protein-truncating variants in known ASD risk genes. We also find that ASD polygenic score (PGS) is overtransmitted from nonautistic parents to autistic children who also harbor rare inherited variants, consistent with combinatorial effects in the offspring, which may explain the reduced penetrance of these rare variants in parents. We also observe that in addition to social dysfunction, language delay is associated with ASD PGS overtransmission. These results are consistent with an additive complex genetic risk architecture of ASD involving rare and common variation and further suggest that language delay is a core biological feature of ASD.


Subject(s)
Autism Spectrum Disorder , Language Development Disorders , Child , Humans , Autism Spectrum Disorder/genetics , Multifactorial Inheritance/genetics , Parents , Whole Genome Sequencing , Genetic Predisposition to Disease
6.
Nat Neurosci ; 26(7): 1208-1217, 2023 07.
Article in English | MEDLINE | ID: mdl-37365313

ABSTRACT

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by heterogeneous cognitive, behavioral and communication impairments. Disruption of the gut-brain axis (GBA) has been implicated in ASD although with limited reproducibility across studies. In this study, we developed a Bayesian differential ranking algorithm to identify ASD-associated molecular and taxa profiles across 10 cross-sectional microbiome datasets and 15 other datasets, including dietary patterns, metabolomics, cytokine profiles and human brain gene expression profiles. We found a functional architecture along the GBA that correlates with heterogeneity of ASD phenotypes, and it is characterized by ASD-associated amino acid, carbohydrate and lipid profiles predominantly encoded by microbial species in the genera Prevotella, Bifidobacterium, Desulfovibrio and Bacteroides and correlates with brain gene expression changes, restrictive dietary patterns and pro-inflammatory cytokine profiles. The functional architecture revealed in age-matched and sex-matched cohorts is not present in sibling-matched cohorts. We also show a strong association between temporal changes in microbiome composition and ASD phenotypes. In summary, we propose a framework to leverage multi-omic datasets from well-defined cohorts and investigate how the GBA influences ASD.


Subject(s)
Autism Spectrum Disorder , Gastrointestinal Microbiome , Humans , Gastrointestinal Microbiome/genetics , Brain-Gut Axis , Autism Spectrum Disorder/genetics , Autism Spectrum Disorder/metabolism , Cross-Sectional Studies , Bayes Theorem , Reproducibility of Results , Cytokines
7.
Annu Rev Biomed Data Sci ; 6: 211-228, 2023 08 10.
Article in English | MEDLINE | ID: mdl-37137169

ABSTRACT

Autism spectrum disorder (autism) is a neurodevelopmental delay that affects at least 1 in 44 children. Like many neurological disorder phenotypes, the diagnostic features are observable, can be tracked over time, and can be managed or even eliminated through proper therapy and treatments. However, there are major bottlenecks in the diagnostic, therapeutic, and longitudinal tracking pipelines for autism and related neurodevelopmental delays, creating an opportunity for novel data science solutions to augment and transform existing workflows and provide increased access to services for affected families. Several efforts previously conducted by a multitude of research labs have spawned great progress toward improved digital diagnostics and digital therapies for children with autism. We review the literature on digital health methods for autism behavior quantification and beneficial therapies using data science. We describe both case-control studies and classification systems for digital phenotyping. We then discuss digital diagnostics and therapeutics that integrate machine learning models of autism-related behaviors, including the factors that must be addressed for translational use. Finally, we describe ongoing challenges and potential opportunities for the field of autism data science. Given the heterogeneous nature of autism and the complexities of the relevant behaviors, this review contains insights that are relevant to neurological behavior analysis and digital psychiatry more broadly.


Subject(s)
Autism Spectrum Disorder , Autistic Disorder , Humans , Autistic Disorder/diagnosis , Autism Spectrum Disorder/diagnosis , Data Science , Machine Learning , Phenotype
8.
J Dev Behav Pediatr ; 44(2): e126-e134, 2023.
Article in English | MEDLINE | ID: mdl-36730317

ABSTRACT

Technological breakthroughs, together with the rapid growth of medical information and improved data connectivity, are creating dramatic shifts in the health care landscape, including the field of developmental and behavioral pediatrics. While medical information took an estimated 50 years to double in 1950, by 2020, it was projected to double every 73 days. Artificial intelligence (AI)-powered health technologies, once considered theoretical or research-exclusive concepts, are increasingly being granted regulatory approval and integrated into clinical care. In the United States, the Food and Drug Administration has cleared or approved over 160 health-related AI-based devices to date. These trends are only likely to accelerate as economic investment in AI health care outstrips investment in other sectors. The exponential increase in peer-reviewed AI-focused health care publications year over year highlights the speed of growth in this sector. As health care moves toward an era of intelligent technology powered by rich medical information, pediatricians will increasingly be asked to engage with tools and systems underpinned by AI. However, medical students and practicing clinicians receive insufficient training and lack preparedness for transitioning into a more AI-informed future. This article provides a brief primer on AI in health care. Underlying AI principles and key performance metrics are described, and the clinical potential of AI-driven technology together with potential pitfalls is explored within the developmental and behavioral pediatric health context.


Subject(s)
Artificial Intelligence , Pediatrics , Humans , Child , Delivery of Health Care , Pediatricians
9.
JAMA Netw Open ; 6(1): e2251182, 2023 01 03.
Article in English | MEDLINE | ID: mdl-36689227

ABSTRACT

Importance: While research has identified racial and ethnic disparities in access to autism services, the size, extent, and specific locations of these access gaps have not yet been characterized on a national scale. Mapping comprehensive national listings of autism health care services together with the prevalence of autistic children of various races and ethnicities and evaluating geographic regions defined by localized commuting patterns may help to identify areas within the US where families who belong to minoritized racial and ethnic groups have disproportionally lower access to services. Objective: To evaluate differences in access to autism health care services among autistic children of various races and ethnicities within precisely defined geographic regions encompassing all serviceable areas within the US. Design, Setting, and Participants: This population-based cross-sectional study was conducted from October 5, 2021, to June 3, 2022, and involved 530 965 autistic children in kindergarten through grade 12. Core-based statistical areas (CBSAs; defined as areas containing a city and its surrounding commuter region), the Civil Rights Data Collection (CRDC) data set, and 51 071 autism resources (collected from October 1, 2015, to December 18, 2022) geographically distributed into 912 CBSAs were combined and analyzed to understand variation in access to autism health care services among autistic children of different races and ethnicities. Six racial and ethnic categories (American Indian or Alaska Native, Asian, Black or African American, Hispanic or Latino, Native Hawaiian or other Pacific Islander, and White) assigned by the US Department of Education were included in the analysis. Main Outcomes and Measures: A regularized least-squares regression analysis was used to measure differences in nationwide resource allocation between racial and ethnic groups. 
The number of autism resources allocated per autistic child was estimated based on the child's racial and ethnic group. To evaluate how the CBSA population size may have altered the results, the least-squares regression analysis was run on CBSAs divided into metropolitan (>50 000 inhabitants) and micropolitan (10 000-50 000 inhabitants) groups. A Mann-Whitney U test was used to compare the model estimated ratio of autism resources to autistic children among specific racial and ethnic groups comprising the proportions of autistic children in each CBSA. Results: Among 530 965 autistic children aged 5 to 18 years, 83.9% were male and 16.1% were female; 0.7% of children were American Indian or Alaska Native, 5.9% were Asian, 14.3% were Black or African American, 22.9% were Hispanic or Latino, 0.2% were Native Hawaiian or other Pacific Islander, 51.7% were White, and 4.2% were of 2 or more races and/or ethnicities. At a national scale, American Indian or Alaska Native autistic children (β = 0; 95% CI, 0-0; P = .01) and Hispanic autistic children (β = 0.02; 95% CI, 0-0.06; P = .02) had significant disparities in access to autism resources in comparison with White autistic children. When evaluating the proportion of autistic children in each racial and ethnic group, areas in which Black autistic children (>50% of the population: β = 0.05; <50% of the population: β = 0.07; P = .002) or Hispanic autistic children (>50% of the population: β = 0.04; <50% of the population: β = 0.07; P < .001) comprised greater than 50% of the total population of autistic children had significantly fewer resources than areas in which Black or Hispanic autistic children comprised less than 50% of the total population. 
Comparing metropolitan vs micropolitan CBSAs revealed that in micropolitan CBSAs, Black autistic children (β = 0; 95% CI, 0-0; P < .001) and Hispanic autistic children (β = 0; 95% CI, 0-0.02; P < .001) had the greatest disparities in access to autism resources compared with White autistic children. In metropolitan CBSAs, American Indian or Alaska Native autistic children (β = 0; 95% CI, 0-0; P = .005) and Hispanic autistic children (β = 0.01; 95% CI, 0-0.06; P = .02) had the greatest disparities compared with White autistic children. Conclusions and Relevance: In this study, autistic children from several minoritized racial and ethnic groups, including Black and Hispanic autistic children, had access to significantly fewer autism resources than White autistic children in the US. This study pinpointed the specific geographic regions with the greatest disparities, where increases in the number and types of treatment options are warranted. These findings suggest that a prioritized response strategy to address these racial and ethnic disparities is needed.


Subject(s)
Autistic Disorder , Child , Humans , Male , Female , Cross-Sectional Studies , Health Services Accessibility , Healthcare Disparities , Racial Groups
10.
Pac Symp Biocomput ; 28: 461-471, 2023.
Article in English | MEDLINE | ID: mdl-36541000

ABSTRACT

Innovations in human-centered biomedical informatics are often developed with the eventual goal of real-world translation. While biomedical research questions are usually answered in terms of how a method performs in a particular context, we argue that it is equally important to consider and formally evaluate the ethical implications of informatics solutions. Several new research paradigms have arisen as a result of the consideration of ethical issues, including but not limited to privacy-preserving computation and fair machine learning. In the spirit of the Pacific Symposium on Biocomputing, we discuss broad and fundamental principles of ethical biomedical informatics in terms of Olelo Noeau, or Hawaiian proverbs and poetical sayings that capture Hawaiian values. While we emphasize issues related to privacy and fairness in particular, there are a multitude of facets to ethical biomedical informatics that can benefit from a critical analysis grounded in ethics.


Subject(s)
Computational Biology , Informatics , Humans , Hawaii , Privacy
11.
JMIR Form Res ; 7: e39917, 2023 Mar 21.
Article in English | MEDLINE | ID: mdl-35962462

ABSTRACT

BACKGROUND: Implementing automated facial expression recognition on mobile devices could provide an accessible diagnostic and therapeutic tool for those who struggle to recognize facial expressions, including children with developmental behavioral conditions such as autism. Despite recent advances in facial expression classifiers for children, existing models are too computationally expensive for smartphone use. OBJECTIVE: We explored several state-of-the-art facial expression classifiers designed for mobile devices, used posttraining optimization techniques for both classification performance and efficiency on a Motorola Moto G6 phone, evaluated the importance of training our classifiers on children versus adults, and evaluated the models' performance against different ethnic groups. METHODS: We collected images from 12 public data sets and used video frames crowdsourced from the GuessWhat app to train our classifiers. All images were annotated for 7 expressions: neutral, fear, happiness, sadness, surprise, anger, and disgust. We tested 3 copies for each of 5 different convolutional neural network architectures: MobileNetV3-Small 1.0x, MobileNetV2 1.0x, EfficientNetB0, MobileNetV3-Large 1.0x, and NASNetMobile. We trained the first copy on images of children, second copy on images of adults, and third copy on all data sets. We evaluated each model against the entire Child Affective Facial Expression (CAFE) set and by ethnicity. We performed weight pruning, weight clustering, and quantize-aware training when possible and profiled each model's performance on the Moto G6. RESULTS: Our best model, a MobileNetV3-Large network pretrained on ImageNet, achieved 65.78% accuracy and 65.31% F1-score on the CAFE and a 90-millisecond inference latency on a Moto G6 phone when trained on all data. 
This accuracy is only 1.12% lower than the current state of the art for CAFE, a model with 13.91x more parameters that was unable to run on the Moto G6 due to its size, even when fully optimized. When trained solely on children, this model achieved 60.57% accuracy and 60.29% F1-score. When trained only on adults, the model received 53.36% accuracy and 53.10% F1-score. Although the MobileNetV3-Large trained on all data sets achieved nearly a 60% F1-score across all ethnicities, the data sets for South Asian and African American children achieved lower accuracy (as much as 11.56%) and F1-score (as much as 11.25%) than other groups. CONCLUSIONS: With specialized design and optimization techniques, facial expression classifiers can become lightweight enough to run on mobile devices and achieve state-of-the-art performance. There is potentially a "data shift" phenomenon between facial expressions of children compared with adults; our classifiers performed much better when trained on children. Certain underrepresented ethnic groups (e.g., South Asian and African American) also perform significantly worse than groups such as European Caucasian despite similar data quality. Our models can be integrated into mobile health therapies to help diagnose autism spectrum disorder and provide targeted therapeutic treatment to children.
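Of the posttraining optimizations named in the abstract above, weight pruning is the simplest to illustrate. The sketch below is a generic magnitude-based pruning rule on a flat list of weights, written for illustration only; the function name `prune_by_magnitude` and the `sparsity` parameter are hypothetical, and the abstract does not state which pruning schedule or tooling the authors used.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    weights:  flat list of floats (a real network would prune per layer).
    sparsity: fraction of weights to set to zero, between 0.0 and 1.0.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")

    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))

    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


if __name__ == "__main__":
    print(prune_by_magnitude([0.5, -0.1, 0.05, -2.0], sparsity=0.5))
```

Zeroed weights only reduce model size or latency when the storage format or runtime exploits sparsity, which is one reason pruning is usually combined with other techniques such as clustering and quantization.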

12.
Virol J ; 19(1): 225, 2022 12 24.
Article in English | MEDLINE | ID: mdl-36566197

ABSTRACT

While hundreds of thousands of human whole genome sequences (WGS) have been collected in the effort to better understand genetic determinants of disease, these whole genome sequences have less frequently been used to study another major determinant of human health: the human virome. Using the unmapped reads from WGS of over 1000 families, we present insights into the human blood DNA virome, focusing particularly on human herpesvirus (HHV) 6A, 6B, and 7. In addition to extensively cataloguing the viruses detected in WGS of human whole blood and lymphoblastoid cell lines, we use the family structure of our dataset to show that household drives transmission of several viruses, and identify the Mendelian inheritance patterns characteristic of inherited chromosomally integrated human herpesvirus 6 (iciHHV-6). Consistent with prior studies, we find that 0.6% of our dataset's population has iciHHV, and we locate candidate integration sequences for these cases. We document genetic diversity within exogenous and integrated HHV species and within integration sites of HHV-6. Finally, in the first observation of its kind, we present evidence that suggests widespread de novo HHV-6B integration and HHV-7 integration and reactivation in lymphoblastoid cell lines. These findings show that the unmapped read space of WGS is a promising source of data for virology research.


Subject(s)
Herpesvirus 6, Human , Roseolovirus Infections , Humans , Herpesvirus 6, Human/genetics , Virus Integration , Sequence Analysis , Cell Line
13.
Sci Rep ; 12(1): 17034, 2022 10 11.
Article in English | MEDLINE | ID: mdl-36220843

ABSTRACT

Observational studies have shown that the composition of the human gut microbiome in children diagnosed with Autism Spectrum Disorder (ASD) differs significantly from that of their neurotypical (NT) counterparts. Thus far, reported ASD-specific microbiome signatures have been inconsistent. To uncover reproducible signatures, we compiled 10 publicly available raw amplicon and metagenomic sequencing datasets alongside new data generated from an internal cohort (the largest ASD cohort to date), unified them with standardized pre-processing methods, and conducted a comprehensive meta-analysis of all taxa and variables detected across multiple studies. By screening metadata to test associations between the microbiome and 52 variables in multiple patient subsets and across multiple datasets, we determined that differentially abundant taxa in ASD versus NT children were dependent upon age, sex, and bowel function, thus marking these variables as potential confounders in case-control ASD studies. Several taxa, including the strains Bacteroides stercoris t__190463 and Clostridium M bolteae t__180407, and the species Granulicatella elegans and Massilioclostridium coli, exhibited differential abundance in ASD compared to NT children only after subjects with bowel dysfunction were removed. Adjusting for age, sex and bowel function resulted in adding or removing significantly differentially abundant taxa in ASD-diagnosed individuals, emphasizing the importance of collecting and controlling for these metadata. We have performed the largest (n = 690) and most comprehensive systematic analysis of ASD gut microbiome data to date. Our study demonstrated the importance of accounting for confounding variables when designing statistical comparative analyses of ASD- and NT-associated gut bacterial profiles. 
Mitigating these confounders identified robust microbial signatures across cohorts, signifying the importance of accounting for these factors in comparative analyses of ASD and NT-associated gut profiles. Such studies will advance the understanding of different patient groups to deliver appropriate therapeutics by identifying microbiome traits germane to the specific ASD phenotype.


Subject(s)
Autism Spectrum Disorder , Gastrointestinal Microbiome , Microbiota , Autism Spectrum Disorder/genetics , Bacteria/genetics , Child , Gastrointestinal Microbiome/genetics , Humans , Metagenome
14.
Intell Based Med ; 6: 100057, 2022.
Article in English | MEDLINE | ID: mdl-36035501

ABSTRACT

Digitally-delivered healthcare is well suited to address current inequities in the delivery of care due to barriers of access to healthcare facilities. As the COVID-19 pandemic phases out, we have a unique opportunity to capitalize on the current familiarity with telemedicine approaches and continue to advocate for mainstream adoption of remote care delivery. In this paper, we specifically focus on the ability of GuessWhat?, a smartphone-based charades-style gamified therapeutic intervention for autism spectrum disorder (ASD), to generate a signal that distinguishes children with ASD from neurotypical (NT) children. We demonstrate the feasibility of using "in-the-wild", naturalistic gameplay data to distinguish between ASD and NT children by training a random forest classifier to discern the two classes (AU-ROC = 0.745, recall = 0.769). This performance demonstrates the potential for GuessWhat? to facilitate screening for ASD in historically difficult-to-reach communities. To further examine this potential, future work should expand the size of the training sample and interrogate differences in predictive ability by demographic.
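For readers less familiar with the two metrics reported in the abstract above, the following sketch shows how AU-ROC and recall are defined for a binary classifier. It uses made-up labels and scores, not data from the study, and the helper names `auroc` and `recall` are illustrative. AU-ROC is computed here from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: list of 0/1 ground-truth classes (1 = positive).
    scores: list of classifier scores, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def recall(labels, predictions):
    """Fraction of true positives that the classifier labels as positive."""
    true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    false_neg = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    return true_pos / (true_pos + false_neg)


if __name__ == "__main__":
    labels = [1, 1, 0, 0]             # illustrative data only
    scores = [0.9, 0.4, 0.5, 0.1]
    predictions = [1 if s >= 0.5 else 0 for s in scores]
    print(auroc(labels, scores), recall(labels, predictions))
```

AU-ROC is independent of any decision threshold, whereas recall depends on the threshold chosen, so the two numbers answer different questions about a screening tool.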

15.
AMIA Jt Summits Transl Sci Proc ; 2022: 456-465, 2022.
Article in English | MEDLINE | ID: mdl-35854759

ABSTRACT

Autism is among the most common neurodevelopmental conditions. Timely diagnosis and access to therapeutic resources are essential for positive prognoses, yet long queues and unevenly dispersed resources leave many untreated. Without granular estimates of autism prevalence by geographic area, it is difficult to identify unmet needs and mechanisms to address them. Mining a dataset of 53M children using meaningful geographic regions, we computed autism prevalence across the country. We then performed comparative analysis against 50,000 resources to identify the type and extent of gaps in access to autism services. We find a steady increase in autism diagnoses from K-5, supporting delayed diagnosis of autism, and consistent under-diagnosis of females. We find a significant inverse relationship between prevalence and availability of resources (p < 0.001). While more work is needed to characterize additional trends including racial and ethnicity-based disparities, the identification of resource gaps can direct and prioritize new innovations.

16.
Sci Rep ; 12(1): 9863, 2022 06 14.
Article in English | MEDLINE | ID: mdl-35701436

ABSTRACT

The unmapped readspace of whole genome sequencing data tends to be large but is often ignored. We posit that it contains valuable signals of both human infection and contamination. Using unmapped and poorly aligned reads from whole genome sequences (WGS) of over 1000 families and nearly 5000 individuals, we present insights into common viral, bacterial, and computational contamination that plague whole genome sequencing studies. We present several notable results: (1) In addition to known contaminants such as Epstein-Barr virus and phiX, sequences from whole blood and lymphocyte cell lines contain many other contaminants, likely originating from storage, prep, and sequencing pipelines. (2) Sequencing plate and biological sample source of a sample strongly influence contamination profile. And, (3) Y-chromosome fragments not on the human reference genome commonly mismap to bacterial reference genomes. Both experiment-derived and computational contamination is prominent in next-generation sequencing data. Such contamination can compromise results from WGS as well as metagenomics studies, and standard protocols for identifying and removing contamination should be developed to ensure the fidelity of sequencing-based studies.


Subject(s)
Bacteriophages , Epstein-Barr Virus Infections , Computational Biology , Genome, Bacterial , Genome, Human , Genome, Viral , Herpesvirus 4, Human/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Whole Genome Sequencing
17.
Article in English | MEDLINE | ID: mdl-35634270

ABSTRACT

Artificial Intelligence (A.I.) solutions are increasingly considered for telemedicine. For these methods to serve children and their families in home settings, it is crucial to ensure the privacy of the child and parent or caregiver. To address this challenge, we explore the potential for global image transformations to provide privacy while preserving the quality of behavioral annotations. Crowd workers have previously been shown to reliably annotate behavioral features in unstructured home videos, allowing machine learning classifiers to detect autism using the annotations as input. We evaluate this method with videos altered via pixelation, dense optical flow, and Gaussian blurring. On a balanced test set of 30 videos of children with autism and 30 neurotypical controls, we find that the visual privacy alterations do not drastically alter any individual behavioral annotation at the item level. The AUROC on the evaluation set was 90.0% ±7.5% for unaltered videos, 85.0% ±9.0% for pixelation, 85.0% ±9.0% for optical flow, and 83.3% ±9.3% for blurring, demonstrating that an aggregation of small changes across behavioral questions can collectively result in increased misdiagnosis rates. We also compare crowd answers against clinicians who provided the same annotations for the same videos as crowd workers, and we find that clinicians have higher sensitivity in their recognition of autism-related symptoms. We also find that there is a linear correlation (r = 0.75, p < 0.0001) between the mean Clinical Global Impression (CGI) score provided by professional clinicians and the corresponding score emitted by a previously validated autism classifier with crowd inputs, indicating that the classifier's output probability is a reliable estimate of the clinical impression of autism. 
A significant correlation is maintained with privacy alterations, indicating that crowd annotations can approximate clinician-provided autism impression from home videos in a privacy-preserved manner.
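For readers unfamiliar with the privacy alterations named above, the following is a minimal sketch of pixelation on a single grayscale frame, written in plain Python. The function name, the block size, and the toy frame are assumptions for illustration; the abstract does not describe how the authors implemented their transformations.

```python
def pixelate(image, block=2):
    """Replace each block x block tile of a grayscale image with its mean.

    `image` is a list of equal-length rows of numbers; `block` is an
    illustrative tile size, not a parameter reported in the abstract.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            tile = [image[r][c] for r in rows for c in cols]
            mean = sum(tile) / len(tile)
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


# Toy 4x4 frame invented for the example.
frame = [
    [0, 2, 10, 10],
    [2, 4, 10, 10],
    [8, 8, 1, 3],
    [8, 8, 5, 7],
]
pixelated = pixelate(frame, block=2)
for row in pixelated:
    print(row)
```

Larger blocks discard more fine detail, which is the trade-off between privacy and annotation quality that the abstract evaluates.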

18.
JMIR Public Health Surveill ; 8(7): e31306, 2022 07 21.
Article in English | MEDLINE | ID: mdl-35605128

ABSTRACT

BACKGROUND: Selection bias and unmeasured confounding are fundamental problems in epidemiology that threaten study internal and external validity. These phenomena are particularly dangerous in internet-based public health surveillance, where traditional mitigation and adjustment methods are inapplicable, unavailable, or out of date. Recent theoretical advances in causal modeling can mitigate these threats, but these innovations have not been widely deployed in the epidemiological community. OBJECTIVE: The purpose of our paper is to demonstrate the practical utility of causal modeling to both detect unmeasured confounding and selection bias and guide model selection to minimize bias. We implemented this approach in an applied epidemiological study of the COVID-19 cumulative infection rate in the New York City (NYC) spring 2020 epidemic. METHODS: We collected primary data from Qualtrics surveys of Amazon Mechanical Turk (MTurk) crowd workers residing in New Jersey and New York State across 2 sampling periods: April 11-14 and May 8-11, 2020. The surveys queried the subjects on household health status and demographic characteristics. We constructed a set of possible causal models of household infection and survey selection mechanisms and ranked them by compatibility with the collected survey data. The most compatible causal model was then used to estimate the cumulative infection rate in each survey period. RESULTS: There were 527 and 513 responses collected for the 2 periods, respectively. Response demographics were highly skewed toward a younger age in both survey periods. Despite the extremely strong relationship between age and COVID-19 symptoms, we recovered minimally biased estimates of the cumulative infection rate using only primary data and the most compatible causal model, with a relative bias of +3.8% and -1.9% from the reported cumulative infection rate for the first and second survey periods, respectively. 
CONCLUSIONS: We successfully recovered accurate estimates of the cumulative infection rate from an internet-based crowdsourced sample despite considerable selection bias and unmeasured confounding in the primary data. This implementation demonstrates how simple applications of structural causal modeling can be effectively used to determine falsifiable model conditions, detect selection bias and confounding factors, and minimize estimate bias through model selection in a novel epidemiological context. As the disease and social dynamics of COVID-19 continue to evolve, public health surveillance protocols must continue to adapt, with the emergence of Omicron variants and the shift to at-home testing as recent challenges. Rigorous and transparent methods to develop, deploy, and diagnose adapted surveillance protocols will be critical to their success.


Subject(s)
COVID-19 , COVID-19/epidemiology , Confounding Factors, Epidemiologic , Humans , Internet , New York City/epidemiology , SARS-CoV-2 , Selection Bias
19.
NPJ Digit Med ; 5(1): 57, 2022 May 05.
Article in English | MEDLINE | ID: mdl-35513550

ABSTRACT

Autism spectrum disorder (ASD) can be reliably diagnosed at 18 months, yet significant diagnostic delays persist in the United States. This double-blinded, multi-site, prospective, active comparator cohort study tested the accuracy of an artificial intelligence-based Software as a Medical Device designed to aid primary care healthcare providers (HCPs) in diagnosing ASD. The Device combines behavioral features from three distinct inputs (a caregiver questionnaire, analysis of two short home videos, and an HCP questionnaire) in a gradient boosted decision tree machine learning algorithm to produce either an ASD positive, ASD negative, or indeterminate output. This study compared Device outputs to diagnostic agreement by two or more independent specialists in a cohort of 18-72-month-olds with developmental delay concerns (425 study completers, 36% female, 29% ASD prevalence). Device output PPV for all study completers was 80.8% (95% confidence intervals (CI), 70.3%-88.8%) and NPV was 98.3% (90.6%-100%). For the 31.8% of participants who received a determinate output (ASD positive or negative) Device sensitivity was 98.4% (91.6%-100%) and specificity was 78.9% (67.6%-87.7%). The Device's indeterminate output acts as a risk control measure when inputs are insufficiently granular to make a determinate recommendation with confidence. If this risk control measure were removed, the sensitivity for all study completers would fall to 51.6% (63/122) (95% CI 42.4%, 60.8%), and specificity would fall to 18.5% (56/303) (95% CI 14.3%, 23.3%). Among participants for whom the Device abstained from providing a result, specialists identified that 91% had one or more complex neurodevelopmental disorders. No significant differences in Device performance were found across participants' sex, race/ethnicity, income, or education level. For nearly a third of this primary care sample, the Device enabled timely diagnostic evaluation with a high degree of accuracy. 
The Device shows promise to significantly increase the number of children able to be diagnosed with ASD in a primary care setting, potentially facilitating earlier intervention and more efficient use of specialist resources.
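The all-completer figures quoted above can be checked with simple arithmetic. The sketch below uses only the counts given in the abstract (63/122 and 56/303); the variable and function names are assumptions for illustration, and no confidence intervals are computed because the abstract does not state which interval method was used.

```python
def rate(hits, total):
    """Plain proportion, returned as a float in [0, 1]."""
    return hits / total


# Counts quoted in the abstract for all study completers, with
# indeterminate outputs counted as misses.
asd_correct, asd_total = 63, 122          # sensitivity numerator / denominator
non_asd_correct, non_asd_total = 56, 303  # specificity numerator / denominator

sensitivity_all = rate(asd_correct, asd_total)
specificity_all = rate(non_asd_correct, non_asd_total)

# Consistency checks against other figures in the abstract.
completers = asd_total + non_asd_total   # reported: 425 study completers
prevalence = rate(asd_total, completers)  # reported: 29% ASD prevalence

print(f"sensitivity (all completers): {sensitivity_all:.1%}")
print(f"specificity (all completers): {specificity_all:.1%}")
print(f"completers: {completers}, ASD prevalence: {prevalence:.0%}")
```

The two denominators sum to the 425 completers and give a prevalence that rounds to the reported 29%, so the quoted counts are internally consistent.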

20.
JMIR Pediatr Parent ; 5(2): e35406, 2022 Apr 14.
Article in English | MEDLINE | ID: mdl-35436234

ABSTRACT

BACKGROUND: Autism spectrum disorder (ASD) is a neurodevelopmental disorder that results in altered behavior, social development, and communication patterns. In recent years, autism prevalence has tripled, with 1 in 44 children now affected. Given that traditional diagnosis is a lengthy, labor-intensive process that requires the work of trained physicians, significant attention has been given to developing systems that automatically detect autism. We work toward this goal by analyzing audio data, as prosody abnormalities are a signal of autism, with affected children displaying speech idiosyncrasies such as echolalia, monotonous intonation, atypical pitch, and irregular linguistic stress patterns. OBJECTIVE: We aimed to test the ability of machine learning approaches to aid in detection of autism in self-recorded speech audio captured from children with ASD and neurotypical (NT) children in their home environments. METHODS: We considered three methods to detect autism in child speech: (1) random forests trained on extracted audio features (including Mel-frequency cepstral coefficients); (2) convolutional neural networks trained on spectrograms; and (3) fine-tuned wav2vec 2.0, a state-of-the-art transformer-based speech recognition model. We trained our classifiers on our novel data set of cellphone-recorded child speech audio curated from the Guess What? mobile game, an app designed to crowdsource videos of children with ASD and NT children in a natural home environment. RESULTS: The random forest classifier achieved 70% accuracy, the fine-tuned wav2vec 2.0 model achieved 77% accuracy, and the convolutional neural network achieved 79% accuracy when classifying children's audio as either ASD or NT. We used 5-fold cross-validation to evaluate model performance. 
CONCLUSIONS: Our models were able to predict autism status when trained on a varied selection of home audio clips with inconsistent recording qualities, which may be more representative of real-world conditions. The results demonstrate that machine learning methods offer promise in detecting autism automatically from speech without specialized equipment.
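The 5-fold cross-validation mentioned in the results can be sketched in a few lines of plain Python. Everything here is illustrative: the helper names, the toy labels, and the majority-class stand-in for a real classifier are assumptions, and the abstract does not say how the authors assigned clips to folds.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists so each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def majority_baseline_accuracy(labels, k=5, seed=0):
    """Mean test accuracy of a majority-class guesser across k folds.

    The guesser is a placeholder for a real model such as a random forest.
    """
    fold_scores = []
    for train, test in k_fold_indices(len(labels), k, seed):
        guess = Counter(labels[i] for i in train).most_common(1)[0][0]
        correct = sum(labels[i] == guess for i in test)
        fold_scores.append(correct / len(test))
    return sum(fold_scores) / len(fold_scores)


# Toy labels invented for the example: 10 ASD clips and 10 NT clips.
labels = ["ASD"] * 10 + ["NT"] * 10
splits = list(k_fold_indices(len(labels), k=5))
baseline = majority_baseline_accuracy(labels, k=5)
print(len(splits), "folds; baseline accuracy:", round(baseline, 2))
```

In practice, clips from the same child would usually be kept within a single fold to avoid leakage between training and test data; whether the study did so is not stated in the abstract.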
