Search | VHL Regional Portal

The Stanford Medicine data science ecosystem for clinical and translational research.

Callahan, Alison; Ashley, Euan; Datta, Somalee; Desai, Priyamvada; Ferris, Todd A; Fries, Jason A; Halaas, Michael; Langlotz, Curtis P; Mackey, Sean; Posada, José D; Pfeffer, Michael A; Shah, Nigam H.

JAMIA Open ; 6(3): ooad054, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37545984

ABSTRACT

Objective: To describe the infrastructure, tools, and services developed at Stanford Medicine to maintain its data science ecosystem and research patient data repository for clinical and translational research. Materials and Methods: The data science ecosystem, dubbed the Stanford Data Science Resources (SDSR), includes infrastructure and tools to create, search, retrieve, and analyze patient data, as well as services for data deidentification, linkage, and processing to extract high-value information from healthcare IT systems. Data are made available via self-service and concierge access, on HIPAA compliant secure computing infrastructure supported by in-depth user training. Results: The Stanford Medicine Research Data Repository (STARR) functions as the SDSR data integration point, and includes electronic medical records, clinical images, text, bedside monitoring data and HL7 messages. SDSR tools include tools for electronic phenotyping, cohort building, and a search engine for patient timelines. The SDSR supports patient data collection, reproducible research, and teaching using healthcare data, and facilitates industry collaborations and large-scale observational studies. Discussion: Research patient data repositories and their underlying data science infrastructure are essential to realizing a learning health system and advancing the mission of academic medical centers. Challenges to maintaining the SDSR include ensuring sufficient financial support while providing researchers and clinicians with maximal access to data and digital infrastructure, balancing tool development with user training, and supporting the diverse needs of users. Conclusion: Our experience maintaining the SDSR offers a case study for academic medical centers developing data science and research informatics infrastructure.

A scalable, secure, and interoperable platform for deep data-driven health management.

Bahmani, Amir; Alavi, Arash; Buergel, Thore; Upadhyayula, Sushil; Wang, Qiwen; Ananthakrishnan, Srinath Krishna; Alavi, Amir; Celis, Diego; Gillespie, Dan; Young, Gregory; Xing, Ziye; Nguyen, Minh Hoang Huynh; Haque, Audrey; Mathur, Ankit; Payne, Josh; Mazaheri, Ghazal; Li, Jason Kenichi; Kotipalli, Pramod; Liao, Lisa; Bhasin, Rajat; Cha, Kexin; Rolnik, Benjamin; Celli, Alessandra; Dagan-Rosenfeld, Orit; Higgs, Emily; Zhou, Wenyu; Berry, Camille Lauren; Van Winkle, Katherine Grace; Contrepois, Kévin; Ray, Utsab; Bettinger, Keith; Datta, Somalee; Li, Xiao; Snyder, Michael P.

Nat Commun ; 12(1): 5757, 2021 Oct 01.

Article in English | MEDLINE | ID: mdl-34599181

ABSTRACT

The large amount of biomedical data derived from wearable sensors, electronic health records, and molecular profiling (e.g., genomics data) is rapidly transforming our healthcare systems. The increasing scale and scope of biomedical data not only is generating enormous opportunities for improving health outcomes but also raises new challenges ranging from data acquisition and storage to data analysis and utilization. To meet these challenges, we developed the Personal Health Dashboard (PHD), which utilizes state-of-the-art security and scalability technologies to provide an end-to-end solution for big biomedical data analytics. The PHD platform is an open-source software framework that can be easily configured and deployed to any big data health project to store, organize, and process complex biomedical data sets, support real-time data analysis at both the individual level and the cohort level, and ensure participant privacy at every step. In addition to presenting the system, we illustrate the use of the PHD framework for large-scale applications in emerging multi-omics disease studies, such as collecting and visualization of diverse data types (wearable, clinical, omics) at a personal level, investigation of insulin resistance, and an infrastructure for the detection of presymptomatic COVID-19.

Subject(s)

Data Science/methods , Medical Records Systems, Computerized , Big Data , Computer Security , Data Analysis , Health Information Interoperability , Humans , Information Storage and Retrieval , Software

Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays.

Krishnan, Vandhana; Utiramerur, Sowmithri; Ng, Zena; Datta, Somalee; Snyder, Michael P; Ashley, Euan A.

BMC Bioinformatics ; 22(1): 85, 2021 Feb 24.

Article in English | MEDLINE | ID: mdl-33627090

ABSTRACT

BACKGROUND: Benchmarking the performance of complex analytical pipelines is an essential part of developing Lab Developed Tests (LDT). Reference samples and benchmark calls published by Genome in a Bottle (GIAB) consortium have enabled the evaluation of analytical methods. The performance of such methods is not uniform across the different genomic regions of interest and variant types. Several benchmarking methods such as hap.py, vcfeval, and vcflib are available to assess the analytical performance characteristics of variant calling algorithms. However, assessing the performance characteristics of an overall LDT assay still requires stringing together several such methods and experienced bioinformaticians to interpret the results. In addition, these methods are dependent on the hardware, operating system and other software libraries, making it impossible to reliably repeat the analytical assessment, when any of the underlying dependencies change in the assay. Here we present a scalable and reproducible, cloud-based benchmarking workflow that is independent of the laboratory and the technician executing the workflow, or the underlying compute hardware used to rapidly and continually assess the performance of LDT assays, across their regions of interest and reportable range, using a broad set of benchmarking samples. RESULTS: The benchmarking workflow was used to evaluate the performance characteristics for secondary analysis pipelines commonly used by Clinical Genomics laboratories in their LDT assays such as the GATK HaplotypeCaller v3.7 and the SpeedSeq workflow based on FreeBayes v0.9.10. Five reference sample truth sets generated by Genome in a Bottle (GIAB) consortium, six samples from the Personal Genome Project (PGP) and several samples with validated clinically relevant variants from the Centers for Disease Control were used in this work. The performance characteristics were evaluated and compared for multiple reportable ranges, such as whole exome and the clinical exome. CONCLUSIONS: We have implemented a benchmarking workflow for clinical diagnostic laboratories that generates metrics such as specificity, precision and sensitivity for germline SNPs and InDels within a reportable range using whole exome or genome sequencing data. Combining these benchmarking results with validation using known variants of clinical significance in publicly available cell lines, we were able to establish the performance of variant calling pipelines in a clinical setting.

Subject(s)

Benchmarking , High-Throughput Nucleotide Sequencing , Exome , Germ Cells , Polymorphism, Single Nucleotide , Software , Workflow

Cloud-based interactive analytics for terabytes of genomic variants data.

Pan, Cuiping; McInnes, Gregory; Deflaux, Nicole; Snyder, Michael; Bingham, Jonathan; Datta, Somalee; Tsao, Philip S.

Bioinformatics ; 33(23): 3709-3715, 2017 Dec 01.

Article in English | MEDLINE | ID: mdl-28961771

ABSTRACT

MOTIVATION: Large scale genomic sequencing is now widely used to decipher questions in diverse realms such as biological function, human diseases, evolution, ecosystems, and agriculture. With the quantity and diversity these data harbor, a robust and scalable data handling and analysis solution is desired. RESULTS: We present interactive analytics using a cloud-based columnar database built on Dremel to perform information compression, comprehensive quality controls, and biological information retrieval in large volumes of genomic data. We demonstrate such Big Data computing paradigms can provide orders of magnitude faster turnaround for common genomic analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds. Using this method, we assessed a study population of 475 deeply sequenced human genomes for genomic call rate, genotype and allele frequency distribution, variant density across the genome, and pharmacogenomic information. AVAILABILITY AND IMPLEMENTATION: Our analysis framework is implemented in Google Cloud Platform and BigQuery. Codes are available at https://github.com/StanfordBioinformatics/mvp_aaa_codelabs. CONTACT: cuiping@stanford.edu or ptsao@stanford.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genetic Variation , Genomics/methods , Data Compression , Databases, Nucleic Acid , Gene Frequency , Genome, Human , Genotype , Humans , Software , Web Browser

Digital Health: Tracking Physiomes and Activity Using Wearable Biosensors Reveals Useful Health-Related Information.

Li, Xiao; Dunn, Jessilyn; Salins, Denis; Zhou, Gao; Zhou, Wenyu; Schüssler-Fiorenza Rose, Sophia Miryam; Perelman, Dalia; Colbert, Elizabeth; Runge, Ryan; Rego, Shannon; Sonecha, Ria; Datta, Somalee; McLaughlin, Tracey; Snyder, Michael P.

PLoS Biol ; 15(1): e2001402, 2017 Jan.

Article in English | MEDLINE | ID: mdl-28081144

ABSTRACT

A new wave of portable biosensors allows frequent measurement of health-related physiology. We investigated the use of these devices to monitor human physiological changes during various activities and their role in managing health and diagnosing and analyzing disease. By recording over 250,000 daily measurements for up to 43 individuals, we found personalized circadian differences in physiological parameters, replicating previous physiological findings. Interestingly, we found striking changes in particular environments, such as airline flights (decreased peripheral capillary oxygen saturation [SpO2] and increased radiation exposure). These events are associated with physiological macro-phenotypes such as fatigue, providing a strong association between reduced pressure/oxygen and fatigue on high-altitude flights. Importantly, we combined biosensor information with frequent medical measurements and made two important observations: First, wearable devices were useful in identification of early signs of Lyme disease and inflammatory responses; we used this information to develop a personalized, activity-based normalization framework to identify abnormal physiological signals from longitudinal data for facile disease detection. Second, wearables distinguish physiological differences between insulin-sensitive and -resistant individuals. Overall, these results indicate that portable biosensors provide useful information for monitoring personal activities and physiology and are likely to play an important role in managing health and enabling affordable health care access to groups traditionally limited by socioeconomic class or remote geography.

Subject(s)

Biosensing Techniques , Electronics, Medical , Health , Patient-Specific Modeling , Circadian Rhythm/physiology , Electronics, Medical/instrumentation , Humans , Inflammation/diagnosis , Insulin/metabolism , Insulin Resistance , Oxygen/metabolism , Partial Pressure , Precision Medicine , Radiation , Reproducibility of Results

Corrigendum: Secure cloud computing for genomic data.

Datta, Somalee; Bettinger, Keith; Snyder, Michael.

Nat Biotechnol ; 34(10): 1072, 2016 Oct 11.

Article in English | MEDLINE | ID: mdl-27727225

Secure cloud computing for genomic data.

Datta, Somalee; Bettinger, Keith; Snyder, Michael.

Nat Biotechnol ; 34(6): 588-91, 2016 Jun 09.

Article in English | MEDLINE | ID: mdl-27281411

Subject(s)

Cloud Computing , Computer Security , Confidentiality , Databases, Genetic , Information Dissemination/methods , Information Storage and Retrieval/methods

Sequence to Medical Phenotypes: A Framework for Interpretation of Human Whole Genome DNA Sequence Data.

Dewey, Frederick E; Grove, Megan E; Priest, James R; Waggott, Daryl; Batra, Prag; Miller, Clint L; Wheeler, Matthew; Zia, Amin; Pan, Cuiping; Karzcewski, Konrad J; Miyake, Christina; Whirl-Carrillo, Michelle; Klein, Teri E; Datta, Somalee; Altman, Russ B; Snyder, Michael; Quertermous, Thomas; Ashley, Euan A.

PLoS Genet ; 11(10): e1005496, 2015 Oct.

Article in English | MEDLINE | ID: mdl-26448358

ABSTRACT

High throughput sequencing has facilitated a precipitous drop in the cost of genomic sequencing, prompting predictions of a revolution in medicine via genetic personalization of diagnostic and therapeutic strategies. There are significant barriers to realizing this goal that are related to the difficult task of interpreting personal genetic variation. A comprehensive, widely accessible application for interpretation of whole genome sequence data is needed. Here, we present a series of methods for identification of genetic variants and genotypes with clinical associations, phasing genetic data and using Mendelian inheritance for quality control, and providing predictive genetic information about risk for rare disease phenotypes and response to pharmacological therapy in single individuals and father-mother-child trios. We demonstrate application of these methods for disease and drug response prognostication in whole genome sequence data from twelve unrelated adults, and for disease gene discovery in one father-mother-child trio with apparently simplex congenital ventricular arrhythmia. In doing so we identify clinically actionable inherited disease risk and drug response genotypes in pre-symptomatic individuals. We also nominate a new candidate gene in congenital arrhythmia, ATP2B4, and provide experimental evidence of a regulatory role for variants discovered using this framework.

Subject(s)

Arrhythmias, Cardiac/genetics , Genetic Predisposition to Disease , Plasma Membrane Calcium-Transporting ATPases/genetics , Sequence Analysis, DNA , Arrhythmias, Cardiac/pathology , Base Sequence , Chromosome Mapping , Genetic Variation , Genome, Human , Genotype , Humans , Phenotype
