Results 1 - 20 of 37
1.
bioRxiv ; 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38559260

ABSTRACT

Accurate identification of germline de novo variants (DNVs) remains a challenging problem despite rapid advances in sequencing technologies as well as methods for the analysis of the data they generate, with putative solutions often involving ad hoc filters and visual inspection of identified variants. Here, we present a purely informatic method for the identification of DNVs by analyzing short-read genome sequencing data from proband-parent trios. Our method evaluates variant calls generated by three genome sequence analysis pipelines utilizing different algorithms (GATK HaplotypeCaller, DeepTrio and Velsera GRAF), exploring the assumption that a requirement of consensus can serve as an effective filter for high-quality DNVs. We assessed the efficacy of our method by testing DNVs identified using a previously established, highly accurate classification procedure that partially relied on manual inspection and used Sanger sequencing to validate a DNV subset comprising less confident calls. The results show that our method is highly precise and that applying a force-calling procedure to putative variants further removes false-positive calls, increasing the precision of the workflow to 99.6%. Our method also identified novel DNVs, 87% of which were validated, indicating it offers a higher recall rate without compromising accuracy. We have implemented this method as an automated bioinformatics workflow suitable for large-scale analyses without the need for manual intervention.
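The consensus idea in this abstract can be illustrated with a minimal Python sketch, assuming each pipeline's candidate DNVs have already been normalized to (chromosome, position, reference allele, alternate allele) keys. The `Variant`, `consensus_dnvs` and `precision` names are hypothetical and are not part of the authors' workflow; the sketch omits the force-calling step and any trio-specific genotype checks.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal key for a normalized small variant."""
    chrom: str
    pos: int
    ref: str
    alt: str


def consensus_dnvs(*callsets: Iterable[Variant]) -> set[Variant]:
    """Return the candidate DNVs reported by every pipeline (set intersection)."""
    if not callsets:
        return set()
    return set.intersection(*(set(calls) for calls in callsets))


def precision(called: set[Variant], truth: set[Variant]) -> float:
    """Precision = TP / (TP + FP); returns 0.0 for an empty call set."""
    if not called:
        return 0.0
    return len(called & truth) / len(called)


if __name__ == "__main__":
    # Toy call sets standing in for three pipelines' candidate DNVs.
    pipeline_a = {Variant("chr1", 100, "A", "G"), Variant("chr2", 200, "C", "T")}
    pipeline_b = {Variant("chr1", 100, "A", "G"), Variant("chr3", 300, "G", "A")}
    pipeline_c = {Variant("chr1", 100, "A", "G"), Variant("chr2", 200, "C", "T")}
    consensus = consensus_dnvs(pipeline_a, pipeline_b, pipeline_c)
    truth = {Variant("chr1", 100, "A", "G")}
    print(consensus, precision(consensus, truth))
```

A strict intersection trades recall for precision, since a true variant missed by any single pipeline is discarded.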

2.
Cancer Res ; 84(9): 1396-1403, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38488504

ABSTRACT

The NCI's Cloud Resources (CR) are the analytical components of the Cancer Research Data Commons (CRDC) ecosystem. This review describes how the three CRs (Broad Institute FireCloud, Institute for Systems Biology Cancer Gateway in the Cloud, and Seven Bridges Cancer Genomics Cloud) provide access and availability to large, cloud-hosted, multimodal cancer datasets, as well as offer tools and workspaces for performing data analysis where the data resides, without download or storage. In addition, users can upload their own data and tools into their workspaces, allowing researchers to create custom analysis workflows and integrate CRDC-hosted data with their own. See related articles by Brady et al., p. 1384, Wang et al., p. 1388, and Kim et al., p. 1404.


Subject(s)
Cloud Computing , National Cancer Institute (U.S.) , Neoplasms , Humans , Neoplasms/genetics , United States , Biomedical Research , Genomics/methods , Computational Biology/methods
3.
Cancer Res ; 84(9): 1404-1409, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38488510

ABSTRACT

More than ever, scientific progress in cancer research hinges on our ability to combine datasets and extract meaningful interpretations to better understand diseases and ultimately inform the development of better treatments and diagnostic tools. To enable the successful sharing and use of big data, the NCI developed the Cancer Research Data Commons (CRDC), providing access to a large, comprehensive, and expanding collection of cancer data. The CRDC is a cloud-based data science infrastructure that eliminates the need for researchers to download and store large-scale datasets by allowing them to perform analysis where data reside. Over the past 10 years, the CRDC has made significant progress in providing access to data and tools along with training and outreach to support the cancer research community. In this review, we provide an overview of the history and the impact of the CRDC to date, lessons learned, and future plans to further promote data sharing, accessibility, interoperability, and reuse. See related articles by Brady et al., p. 1384, Wang et al., p. 1388, and Pot et al., p. 1396.


Subject(s)
Information Dissemination , National Cancer Institute (U.S.) , Neoplasms , Humans , United States , Neoplasms/therapy , Information Dissemination/methods , Biomedical Research/trends , Databases, Factual , Big Data
5.
J Am Med Inform Assoc ; 30(7): 1293-1300, 2023 06 20.
Article in English | MEDLINE | ID: mdl-37192819

ABSTRACT

Research increasingly relies on interrogating large-scale data resources. The NIH National Heart, Lung, and Blood Institute developed the NHLBI BioData CatalystⓇ (BDC), a community-driven ecosystem where researchers, including bench and clinical scientists, statisticians, and algorithm developers, find, access, share, store, and compute on large-scale datasets. This ecosystem provides secure, cloud-based workspaces, user authentication and authorization, search, tools and workflows, applications, and new innovative features to address community needs, including exploratory data analysis, genomic and imaging tools, tools for reproducibility, and improved interoperability with other NIH data science platforms. BDC offers straightforward access to large-scale datasets and computational resources that support precision medicine for heart, lung, blood, and sleep conditions, leveraging separately developed and managed platforms to maximize flexibility based on researcher needs, expertise, and backgrounds. Through the NHLBI BioData Catalyst Fellows Program, BDC facilitates scientific discoveries and technological advances. BDC also facilitated accelerated research on the coronavirus disease-2019 (COVID-19) pandemic.


Subject(s)
COVID-19 , Cloud Computing , Humans , Ecosystem , Reproducibility of Results , Lung , Software
6.
Nat Commun ; 13(1): 4384, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35927245

ABSTRACT

Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference to represent the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European ancestries. While there have been many efforts to develop computationally efficient graph-based toolkits for NGS read alignment and variant calling, methods to curate genomic variants and subsequently construct genome graphs remain an understudied problem that inevitably determines the effectiveness of the overall bioinformatics pipeline. In this study, we discuss obstacles encountered during graph construction and propose methods for sample selection based on population diversity, graph augmentation with structural variants and resolution of graph reference ambiguity caused by information overload. Moreover, we present the case for iteratively augmenting tailored genome graphs for targeted populations and demonstrate this approach on whole-genome samples of African ancestry. Our results show that population-specific graphs, as more representative alternatives to linear or generic graph references, can achieve significantly lower read mapping errors and enhanced variant calling sensitivity, in addition to providing the improvements of joint variant calling without the need for computationally intensive post-processing steps.


Subject(s)
Data Analysis , High-Throughput Nucleotide Sequencing , Genome, Human/genetics , Genomics/methods , Humans , Sequence Analysis, DNA/methods , Software
7.
NAR Cancer ; 4(2): zcac014, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35475145

ABSTRACT

We created the PDX Network (PDXNet) portal (https://portal.pdxnetwork.org/) to centralize access to the National Cancer Institute-funded PDXNet consortium resources, to facilitate collaboration among researchers and to make these data easily available for research. The portal includes sections for resources, analysis results, metrics for PDXNet activities, data processing protocols and training materials for processing PDX data. Currently, the portal contains PDXNet model information and data resources from 334 new models across 33 cancer types. Tissue samples of these models were deposited in the NCI's Patient-Derived Model Repository (PDMR) for public access. These models have 2134 associated sequencing files from 873 samples across 308 patients, which are hosted on the Cancer Genomics Cloud powered by Seven Bridges and the NCI Cancer Data Service for long-term storage and access with dbGaP permissions. The portal includes results from freely available, robust, validated and standardized analysis workflows on PDXNet sequencing files and PDMR data (3857 samples from 629 patients across 85 disease types). The PDXNet portal is continuously updated with new data and is of significant utility to the cancer research community as it provides a centralized location for PDXNet resources, which support multi-agent treatment studies, determination of sensitivity and resistance mechanisms, and preclinical trials.

9.
Nat Commun ; 12(1): 5086, 2021 08 24.
Article in English | MEDLINE | ID: mdl-34429404

ABSTRACT

Development of candidate cancer treatments is a resource-intensive process, with the research community continuing to investigate options beyond static genomic characterization. Toward this goal, we have established the genomic landscapes of 536 patient-derived xenograft (PDX) models across 25 cancer types, together with mutation, copy number, fusion, transcriptomic profiles, and NCI-MATCH arms. Compared with human tumors, PDXs typically have higher purity and are well suited to investigating dynamic driver events and molecular properties across multiple time points from PDXs of the same case. Here, we report on dynamic genomic landscapes and pharmacogenomic associations, including associations between activating oncogenic events and drugs, correlations between whole-genome duplications and subclone events, and the potential PDX models for NCI-MATCH trials. Lastly, we provide a web portal with comprehensive pan-cancer PDX genomic profiles and source code to facilitate identification of more druggable events and further insights into PDXs' recapitulation of human tumors.


Subject(s)
Heterografts , Neoplasms/genetics , Neoplasms/metabolism , Xenograft Model Antitumor Assays , Animals , Disease Models, Animal , Female , Gene Expression Regulation, Neoplastic , Genome , Genomics , Humans , Male , Mice , Models, Biological , Mutation , Transcriptome
11.
Nat Genet ; 53(1): 86-99, 2021 01.
Article in English | MEDLINE | ID: mdl-33414553

ABSTRACT

Patient-derived xenografts (PDXs) are resected human tumors engrafted into mice for preclinical studies and therapeutic testing. It has been proposed that the mouse host affects tumor evolution during PDX engraftment and propagation, affecting the accuracy of PDX modeling of human cancer. Here, we exhaustively analyze copy number alterations (CNAs) in 1,451 PDX and matched patient tumor (PT) samples from 509 PDX models. CNA inferences based on DNA sequencing and microarray data displayed substantially higher resolution and dynamic range than gene expression-based inferences, and they also showed strong CNA conservation from PTs through late-passage PDXs. CNA recurrence analysis of 130 colorectal and breast PT/PDX-early/PDX-late trios confirmed high-resolution CNA retention. We observed no significant enrichment of cancer-related genes in PDX-specific CNAs across models. Moreover, CNA differences between patient and PDX tumors were comparable to variations in multiregion samples within patients. Our study demonstrates the lack of systematic copy number evolution driven by the PDX mouse host.


Subject(s)
DNA Copy Number Variations/genetics , Xenograft Model Antitumor Assays , Animals , Databases, Genetic , Gene Expression Regulation, Neoplastic , Humans , Mice , Neoplasm Metastasis , Polymorphism, Single Nucleotide/genetics , Exome Sequencing
12.
Nat Neurosci ; 22(2): 167-179, 2019 02.
Article in English | MEDLINE | ID: mdl-30643292

ABSTRACT

The findings that amyotrophic lateral sclerosis (ALS) patients almost universally display pathological mislocalization of the RNA-binding protein TDP-43 and that mutations in its gene cause familial ALS have nominated altered RNA metabolism as a disease mechanism. However, the RNAs regulated by TDP-43 in motor neurons and their connection to neuropathy remain to be identified. Here we report transcripts whose abundances in human motor neurons are sensitive to TDP-43 depletion. Notably, expression of STMN2, which encodes a microtubule regulator, declined after TDP-43 knockdown and TDP-43 mislocalization as well as in patient-specific motor neurons and postmortem patient spinal cord. STMN2 loss upon reduced TDP-43 function was due to altered splicing, which is functionally important, as we show STMN2 is necessary for normal axonal outgrowth and regeneration. Notably, post-translational stabilization of STMN2 rescued neurite outgrowth and axon regeneration deficits induced by TDP-43 depletion. We propose that restoring STMN2 expression warrants examination as a therapeutic strategy for ALS.


Subject(s)
Amyotrophic Lateral Sclerosis/metabolism , DNA-Binding Proteins/metabolism , Membrane Proteins/metabolism , Motor Neurons/metabolism , Axons/metabolism , Cell Line , Down-Regulation , Female , Humans , Induced Pluripotent Stem Cells , Male , Spinal Cord/metabolism , Stathmin
13.
Methods Mol Biol ; 1878: 39-64, 2019.
Article in English | MEDLINE | ID: mdl-30378068

ABSTRACT

The Seven Bridges Cancer Genomics Cloud (CGC) is part of the National Cancer Institute Cloud Resource project, which was created to explore the paradigm of co-locating massive datasets with the computational resources to analyze them. The CGC was designed to allow researchers to easily find the data they need and analyze it with robust applications in a scalable and reproducible fashion. To enable this, individual tools are packaged within Docker containers and described by the Common Workflow Language (CWL), an emerging standard for enabling reproducible data analysis. On the CGC, researchers can deploy individual tools and customize massive workflows by chaining together tools. Here, we discuss a case study in which RNA sequencing data is analyzed with different methods and compared on the Seven Bridges CGC. We highlight best practices for designing command line tools, Docker containers, and CWL descriptions to enable massively parallelized and reproducible biomedical computation with cloud resources.


Subject(s)
Neoplasms/genetics , RNA/genetics , Cell Line, Tumor , Computational Biology/methods , Genomics/methods , Humans , Sequence Analysis, RNA/methods , Software , Workflow
14.
Cancer Inform ; 17: 1176935118774787, 2018.
Article in English | MEDLINE | ID: mdl-30283230

ABSTRACT

Increased efforts in cancer genomics research and bioinformatics are producing tremendous amounts of data. These data are diverse in origin, format, and content. As the amount of available sequencing data increases, technologies that make them discoverable and usable are critically needed. In response, we have developed a Semantic Web-based Data Browser, a tool allowing users to visually build and execute ontology-driven queries. This approach simplifies access to available data and improves the process of using them in analyses on the Seven Bridges Cancer Genomics Cloud (CGC; www.cancergenomicscloud.org). The Data Browser makes large data sets easily explorable and simplifies the retrieval of specific data of interest. Although initially implemented on top of The Cancer Genome Atlas (TCGA) data set, the Data Browser's architecture allows for seamless integration of other data sets. By deploying it on the CGC, we have enabled remote researchers to access data and perform collaborative investigations.

15.
Development ; 145(22)2018 11 21.
Article in English | MEDLINE | ID: mdl-30337375

ABSTRACT

Advances in stem cell science allow the production of different cell types in vitro either through the recapitulation of developmental processes, often termed 'directed differentiation', or the forced expression of lineage-specific transcription factors. Although cells produced by both approaches are increasingly used in translational applications, their quantitative similarity to their primary counterparts remains largely unresolved. To investigate the similarity between in vitro-derived and primary cell types, we harvested and purified mouse spinal motor neurons and compared them with motor neurons produced by transcription factor-mediated lineage conversion of fibroblasts or directed differentiation of pluripotent stem cells. To enable unbiased analysis of these motor neuron types and their cells of origin, we then subjected them to whole transcriptome and DNA methylome analysis by RNA sequencing (RNA-seq) and reduced representation bisulfite sequencing (RRBS). Despite major differences in methodology, lineage conversion and directed differentiation both produce cells that closely approximate the primary motor neuron state. However, we identify differences in Fas signaling, the Hox code and synaptic gene expression between lineage-converted and directed differentiation motor neurons that affect their utility in translational studies.


Subject(s)
Cell Lineage/genetics , Embryo, Mammalian/cytology , Genomics , Motor Neurons/cytology , Pluripotent Stem Cells/cytology , Animals , Epigenesis, Genetic , Mice, Inbred C57BL , Motor Neurons/metabolism , Pluripotent Stem Cells/metabolism , Transcription, Genetic
16.
Curr Protoc Bioinformatics ; 60: 11.16.1-11.16.32, 2017 12 08.
Article in English | MEDLINE | ID: mdl-29220078

ABSTRACT

Next-generation sequencing has produced petabytes of data, but accessing and analyzing these data remain challenging. Traditionally, researchers investigating public datasets like The Cancer Genome Atlas (TCGA) would download the data to a high-performance cluster, which could take several weeks even with a highly optimized network connection. The National Cancer Institute (NCI) initiated the Cancer Genomics Cloud Pilots program to provide researchers with the resources to process data with cloud computational resources. We present protocols using one of these Cloud Pilots, the Seven Bridges Cancer Genomics Cloud (CGC), to find and query public datasets, bring your own data to the CGC, analyze data using standard or custom workflows, and benchmark tools for accuracy with interactive analysis features. These protocols demonstrate that the CGC is a data-analysis ecosystem that fully empowers researchers with a variety of areas of expertise and interests to collaborate in the analysis of petabytes of data. © 2017 by John Wiley & Sons, Inc.


Subject(s)
Databases, Genetic/statistics & numerical data , Neoplasms/genetics , Cloud Computing , Computational Biology , Data Interpretation, Statistical , Genomics , High-Throughput Nucleotide Sequencing , Humans , Metadata , Pilot Projects
17.
Cancer Res ; 77(21): e3-e6, 2017 11 01.
Article in English | MEDLINE | ID: mdl-29092927

ABSTRACT

The Seven Bridges Cancer Genomics Cloud (CGC; www.cancergenomicscloud.org) enables researchers to rapidly access and collaborate on massive public cancer genomic datasets, including The Cancer Genome Atlas. It provides secure on-demand access to data, analysis tools, and computing resources. Researchers from diverse backgrounds can easily visualize, query, and explore cancer genomic datasets visually or programmatically. Data of interest can be immediately analyzed in the cloud using more than 200 preinstalled, curated bioinformatics tools and workflows. Researchers can also extend the functionality of the platform by adding their own data and tools via an intuitive software development kit. By colocalizing these resources in the cloud, the CGC enables scalable, reproducible analyses. Researchers worldwide can use the CGC to investigate key questions in cancer genomics. Cancer Res; 77(21); e3-6. ©2017 AACR.


Subject(s)
Computational Biology , Genomics , Neoplasms/genetics , Genome, Human , Humans , Internet , Research , Software
18.
Pac Symp Biocomput ; 22: 154-165, 2017.
Article in English | MEDLINE | ID: mdl-27896971

ABSTRACT

As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions.
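To make the scheduling role of a workflow engine concrete, here is a minimal Python sketch of its core idea: running each step only after the steps it depends on have finished. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+); the `run_workflow` function and the step names are hypothetical, and the sketch neither parses CWL nor reflects how the Rabix Executor is implemented.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable

# A step receives the outputs of its upstream steps, keyed by step name.
Step = Callable[[dict[str, object]], object]


def run_workflow(
    steps: dict[str, Step], depends_on: dict[str, set[str]]
) -> dict[str, object]:
    """Execute steps in dependency order and return every step's output."""
    unknown = {dep for deps in depends_on.values() for dep in deps} - set(steps)
    if unknown:
        raise ValueError(f"unknown upstream steps: {sorted(unknown)}")

    # graphlib expects a mapping of node -> predecessors.
    sorter = TopologicalSorter({name: depends_on.get(name, set()) for name in steps})
    try:
        order = list(sorter.static_order())
    except CycleError as exc:
        # args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"workflow has a dependency cycle: {exc.args[1]}") from exc

    outputs: dict[str, object] = {}
    for name in order:
        upstream = {dep: outputs[dep] for dep in depends_on.get(name, set())}
        outputs[name] = steps[name](upstream)
    return outputs


if __name__ == "__main__":
    steps: dict[str, Step] = {
        "align": lambda up: "reads.bam",
        "call": lambda up: f"{up['align']} -> variants.vcf",
        "qc": lambda up: f"{up['align']} -> qc.txt",
        "report": lambda up: sorted(up),
    }
    depends_on = {"call": {"align"}, "qc": {"align"}, "report": {"call", "qc"}}
    print(run_workflow(steps, depends_on))
```

Steps with no ordering constraint between them (here `call` and `qc`) could run in parallel; `static_order()` serializes them, whereas a concurrent engine could dispatch them as they become available using `TopologicalSorter.get_ready()` and `done()`.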


Subject(s)
Software , Workflow , Computational Biology , Humans , Models, Statistical , Reproducibility of Results
19.
Methods Mol Biol ; 1381: 223-37, 2016.
Article in English | MEDLINE | ID: mdl-26667464

ABSTRACT

Chromosomal rearrangements resulting in the creation of novel gene products, termed fusion genes, have been identified as driving events in the development of multiple types of cancer. As these gene products typically do not exist in normal cells, they represent valuable prognostic and therapeutic targets. Advances in next-generation sequencing and computational approaches have greatly improved our ability to detect and identify fusion genes. Nevertheless, these approaches require significant computational resources. Here we describe an approach which leverages cloud computing technologies to perform fusion gene detection from RNA sequencing data at any scale. We additionally highlight methods to enhance reproducibility of bioinformatics analyses which may be applied to any next-generation sequencing experiment.


Subject(s)
Cloud Computing , Gene Fusion , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Humans , Neoplasms/genetics , RNA/genetics , Reproducibility of Results
20.
Sci Transl Med ; 6(248): 248ra104, 2014 Aug 06.
Article in English | MEDLINE | ID: mdl-25100738

ABSTRACT

Neurons produced from stem cells have emerged as a tool to identify new therapeutic targets for neurological diseases such as amyotrophic lateral sclerosis (ALS). However, it remains unclear to what extent these new mechanistic insights will translate to animal models, an important step in the validation of new targets. Previously, we found that glia from mice carrying the SOD1G93A mutation, a model of ALS, were toxic to stem cell-derived human motor neurons. We use pharmacological and genetic approaches to demonstrate that the prostanoid receptor DP1 mediates this glial toxicity. Furthermore, we validate the importance of this mechanism for neural degeneration in vivo. Genetic ablation of DP1 in SOD1G93A mice extended life span, decreased microglial activation, and reduced motor neuron loss. Our findings suggest that blocking DP1 may be a therapeutic strategy in ALS and demonstrate that discoveries from stem cell models of disease can be corroborated in vivo.


Subject(s)
Amyotrophic Lateral Sclerosis/genetics , Amyotrophic Lateral Sclerosis/therapy , Disease Models, Animal , Molecular Targeted Therapy , Animals , Coculture Techniques , Cytoprotection , Disease Progression , Humans , Longevity , Mice , Mice, Transgenic , Motor Neurons/metabolism , Motor Neurons/pathology , Mutation/genetics , Neuroglia/pathology , Receptors, Prostaglandin/metabolism , Reproducibility of Results