Results 1 - 20 of 48
1.
BMC Genomics ; 24(1): 349, 2023 Jun 26.
Article in English | MEDLINE | ID: mdl-37365517

ABSTRACT

T cell receptor repertoires can be profiled using next generation sequencing (NGS) to measure and monitor adaptive dynamical changes in response to disease and other perturbations. Genomic DNA-based bulk sequencing is cost-effective but necessitates multiplex target amplification using multiple primer pairs with highly variable amplification efficiencies. Here, we utilize an equimolar primer mixture and propose a single statistical normalization step that efficiently corrects for amplification bias post sequencing. Using samples analyzed by both our open protocol and a commercial solution, we show high concordance between bulk clonality metrics. This approach is an inexpensive and open-source alternative to commercial solutions.


Subject(s)
High-Throughput Nucleotide Sequencing , T-Lymphocytes , Base Sequence , Chromosome Mapping , High-Throughput Nucleotide Sequencing/methods , Receptors, Antigen, T-Cell, alpha-beta/genetics
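
A minimal sketch of one bulk clonality metric of the kind the abstract above compares. It assumes the common definition of clonality as one minus the normalized Shannon entropy of clonotype frequencies; the abstract does not say which metrics were used, and the function name and input format here are illustrative only.

```python
import math


def clonality(clone_counts):
    """Return 1 - normalized Shannon entropy of clonotype frequencies.

    clone_counts: iterable of read (or template) counts, one per clonotype.
    0.0 means a perfectly even repertoire; 1.0 means a single clone.
    This definition is an assumption, not taken from the cited paper.
    """
    counts = [c for c in clone_counts if c > 0]
    if not counts:
        raise ValueError("at least one clonotype with a positive count is required")
    if len(counts) == 1:
        return 1.0  # a single clonotype is maximally clonal
    total = sum(counts)
    freqs = [c / total for c in counts]
    entropy = -sum(p * math.log(p) for p in freqs)
    # Dividing by log(number of clonotypes) scales entropy to [0, 1].
    return 1.0 - entropy / math.log(len(counts))
```

Computing this on counts from two protocols for the same sample gives one number per protocol, which is the kind of quantity a concordance comparison would plot against each other.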
2.
Res Sq ; 2023 Feb 16.
Article in English | MEDLINE | ID: mdl-36824803

ABSTRACT

T cell receptor repertoires can be profiled using next generation sequencing (NGS) to measure and monitor adaptive dynamical changes in response to disease and other perturbations. Genomic DNA-based bulk sequencing is cost-effective but necessitates multiplex target amplification using multiple primer pairs with highly variable amplification efficiencies. Here, we utilize an equimolar primer mixture and propose a single statistical normalization step that efficiently corrects for amplification bias post sequencing. Using samples analyzed by both our open protocol and a commercial solution, we show high concordance between bulk clonality metrics. This approach is an inexpensive and open-source alternative to commercial solutions.

3.
Bioinformatics ; 37(13): 1912-1914, 2021 07 27.
Article in English | MEDLINE | ID: mdl-33051644

ABSTRACT

MOTIVATION: Despite widespread prevalence of somatic structural variations (SVs) across most tumor types, understanding of their molecular implications often remains poor. SVs are extremely heterogeneous in size and complexity, hindering the interpretation of their pathogenic role. Tools integrating large SV datasets across platforms are required to fully characterize the cancer's somatic landscape. RESULTS: The svpluscnv R package is a Swiss army knife for the integration and interpretation of orthogonal datasets including copy number variant segmentation profiles and sequencing-based structural variant calls. The package implements analysis and visualization tools to evaluate chromosomal instability and ploidy, identify genes harboring recurrent SVs and detect complex rearrangements such as chromothripsis and chromoplexia. Further, it allows systematic identification of hot-spot shattered genomic regions, showing reproducibility across alternative detection methods and datasets. AVAILABILITY AND IMPLEMENTATION: https://github.com/ccbiolab/svpluscnv. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome , Genomics , DNA Copy Number Variations , Genomic Structural Variation , Humans , Reproducibility of Results , Sequence Analysis , Software
4.
Nat Genet ; 52(4): 448-457, 2020 04.
Article in English | MEDLINE | ID: mdl-32246132

ABSTRACT

Precision oncology relies on accurate discovery and interpretation of genomic variants, enabling individualized diagnosis, prognosis and therapy selection. We found that six prominent somatic cancer variant knowledgebases were highly disparate in content, structure and supporting primary literature, impeding consensus when evaluating variants and their relevance in a clinical setting. We developed a framework for harmonizing variant interpretations to produce a meta-knowledgebase of 12,856 aggregate interpretations. We demonstrated large gains in overlap between resources across variants, diseases and drugs as a result of this harmonization. We subsequently demonstrated improved matching between a patient cohort and harmonized interpretations of potential clinical significance, observing an increase from an average of 33% per individual knowledgebase to 57% in aggregate. Our analyses illuminate the need for open, interoperable sharing of variant interpretation data. We also provide a freely available web interface (search.cancervariants.org) for exploring the harmonized interpretations from these six knowledgebases.


Subject(s)
Genetic Variation/genetics , Neoplasms/genetics , Databases, Genetic , Diploidy , Genomics/methods , Humans , Knowledge Bases , Precision Medicine/methods
6.
Genome Biol ; 19(1): 188, 2018 11 06.
Article in English | MEDLINE | ID: mdl-30400818

ABSTRACT

BACKGROUND: The phenotypes of cancer cells are driven in part by somatic structural variants. Structural variants can initiate tumors, enhance their aggressiveness, and provide unique therapeutic opportunities. Whole-genome sequencing of tumors can allow exhaustive identification of the specific structural variants present in an individual cancer, facilitating both clinical diagnostics and the discovery of novel mutagenic mechanisms. A plethora of somatic structural variant detection algorithms have been created to enable these discoveries; however, there are no systematic benchmarks of them. Rigorous performance evaluation of somatic structural variant detection methods has been challenged by the lack of gold standards, extensive resource requirements, and difficulties arising from the need to share personal genomic information. RESULTS: To facilitate structural variant detection algorithm evaluations, we create a robust simulation framework for somatic structural variants by extending the BAMSurgeon algorithm. We then organize and enable a crowdsourced benchmarking within the ICGC-TCGA DREAM Somatic Mutation Calling Challenge (SMC-DNA). We report here the results of structural variant benchmarking on three different tumors, comprising 204 submissions from 15 teams. In addition to ranking methods, we identify characteristic error profiles of individual algorithms and general trends across them. Surprisingly, we find that ensembles of analysis pipelines do not always outperform the best individual method, indicating a need for new ways to aggregate somatic structural variant detection approaches. CONCLUSIONS: The synthetic tumors and somatic structural variant detection leaderboards remain available as a community benchmarking resource, and BAMSurgeon is available at https://github.com/adamewing/bamsurgeon .


Subject(s)
Benchmarking , Computer Simulation , Crowdsourcing , Genetic Variation , Genome, Human , Genomics/methods , Neoplasms/genetics , Algorithms , Databases, Genetic , High-Throughput Nucleotide Sequencing , Humans , Software
7.
Article in English | MEDLINE | ID: mdl-30283195

ABSTRACT

Multiplexed imaging such as multicolor immunofluorescence staining, multiplexed immunohistochemistry (mIHC) or cyclic immunofluorescence (cycIF) enables deep assessment of cellular complexity in situ and, in conjunction with standard histology stains like hematoxylin and eosin (H&E), can help to unravel the complex molecular relationships and spatial interdependencies that undergird disease states. However, these multiplexed imaging methods are costly and can degrade both tissue quality and antigenicity with each successive cycle of staining. In addition, computationally intensive image processing such as image registration across multiple channels is required. We have developed a novel method, speedy histopathological-to-immunofluorescent translation (SHIFT) of whole slide images (WSIs) using conditional generative adversarial networks (cGANs). This approach is rooted in the assumption that specific patterns captured in IF images by stains like DAPI, pan-cytokeratin (panCK), or α-smooth muscle actin (α-SMA) are encoded in H&E images, such that a SHIFT model can learn useful feature representations or architectural patterns in the H&E stain that help generate relevant IF stain patterns. We demonstrate that the proposed method is capable of generating realistic tumor marker IF WSIs conditioned on corresponding H&E-stained WSIs with up to 94.5% accuracy in a matter of seconds. Thus, this method has the potential to not only improve our understanding of the mapping of histological and morphological profiles into protein expression profiles, but also greatly increase the efficiency of diagnostic and prognostic decision-making.

8.
Cancer Cell ; 34(4): 561-578.e6, 2018 10 08.
Article in English | MEDLINE | ID: mdl-30300579

ABSTRACT

Complement is a critical component of humoral immunity implicated in cancer development; however, its biological contributions to tumorigenesis remain poorly understood. Using the K14-HPV16 transgenic mouse model of squamous carcinogenesis, we report that urokinase (uPA)+ macrophages regulate C3-independent release of C5a during premalignant progression, which in turn regulates protumorigenic properties of C5aR1+ mast cells and macrophages, including suppression of CD8+ T cell cytotoxicity. Therapeutic inhibition of C5aR1 via the peptide antagonist PMX-53 improved efficacy of paclitaxel chemotherapy associated with increased presence and cytotoxic properties of CXCR3+ effector memory CD8+ T cells in carcinomas, dependent on both macrophage transcriptional programming and IFNγ. Together, these data identify C5aR1-dependent signaling as an important immunomodulatory program in neoplastic tissue tractable for combinatorial cancer immunotherapy.


Subject(s)
Carcinogenesis/drug effects , Complement C5a/drug effects , Drug Therapy , Receptor, Anaphylatoxin C5a/drug effects , Animals , CD8-Positive T-Lymphocytes/drug effects , Carcinoma, Squamous Cell/drug therapy , Disease Models, Animal , Drug Therapy/methods , Humans , Macrophages/drug effects , Macrophages/physiology , Mice , Signal Transduction/drug effects
9.
BMC Bioinformatics ; 19(1): 339, 2018 Sep 25.
Article in English | MEDLINE | ID: mdl-30253747

ABSTRACT

BACKGROUND: Platform-specific error profiles necessitate confirmatory studies where predictions made on data generated using one technology are additionally verified by processing the same samples on an orthogonal technology. However, verifying all predictions can be costly and redundant, and testing a subset of findings is often used to estimate the true error profile. RESULTS: To determine how to create subsets of predictions for validation that maximize accuracy of global error profile inference, we developed Valection, a software program that implements multiple strategies for the selection of verification candidates. We evaluated these selection strategies on one simulated and two experimental datasets. CONCLUSIONS: Valection is implemented in multiple programming languages, available at: http://labs.oicr.on.ca/boutros-lab/software/valection.


Subject(s)
Sequence Analysis, DNA/methods , Software Validation
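
A minimal sketch of one way to pick verification candidates under a fixed budget, in the spirit of the selection strategies the abstract above mentions. It is not Valection's code or API: the function name, the "equal share per caller" rule, and the input format are assumptions made for illustration.

```python
import random


def select_equal_per_caller(calls_by_caller, budget, seed=0):
    """Pick up to `budget` candidates, giving each caller an equal share.

    calls_by_caller: dict mapping a caller name to a set of predicted calls.
    Hypothetical strategy for illustration; overlap between callers can make
    the result smaller than the budget.
    """
    if not calls_by_caller:
        raise ValueError("at least one caller is required")
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    quota = budget // len(calls_by_caller)
    selected = set()
    for caller in sorted(calls_by_caller):
        candidates = sorted(calls_by_caller[caller])
        k = min(quota, len(candidates))
        selected.update(rng.sample(candidates, k))
    return selected
```

Verifying only the returned subset on an orthogonal technology then gives a per-caller error estimate without testing every prediction.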
10.
Proc Natl Acad Sci U S A ; 115(21): 5462-5467, 2018 05 22.
Article in English | MEDLINE | ID: mdl-29735700

ABSTRACT

The Fbw7 (F-box/WD repeat-containing protein 7) ubiquitin ligase targets multiple oncoproteins for degradation and is commonly mutated in cancers. Like other pleiotropic tumor suppressors, Fbw7's complex biology has impeded our understanding of how Fbw7 mutations promote tumorigenesis and hindered the development of targeted therapies. To address these needs, we employed a transfer learning approach to derive gene-expression signatures from The Cancer Genome Atlas datasets that predict Fbw7 mutational status across tumor types and identified the pathways enriched within these signatures. Genes involved in mitochondrial function were highly enriched in pan-cancer signatures that predict Fbw7 mutations. Studies in isogenic colorectal cancer cell lines that differed in Fbw7 mutational status confirmed that Fbw7 mutations increase mitochondrial gene expression. Surprisingly, Fbw7 mutations shifted cellular metabolism toward oxidative phosphorylation and caused context-specific metabolic vulnerabilities. Our approach revealed unexpected metabolic reprogramming and possible therapeutic targets in Fbw7-mutant cancers and provides a framework to study other complex, oncogenic mutations.


Subject(s)
Colorectal Neoplasms/metabolism , Colorectal Neoplasms/pathology , F-Box-WD Repeat-Containing Protein 7/genetics , F-Box-WD Repeat-Containing Protein 7/metabolism , Metabolome , Mitochondria/metabolism , Mutation , Cell Respiration , Colorectal Neoplasms/genetics , Gene Expression Profiling , Humans , Mitochondria/pathology , Oxidative Phosphorylation , Oxidative Stress , Phosphorylation , Ubiquitin , Ubiquitination
11.
Clin Cancer Res ; 24(12): 2828-2843, 2018 06 15.
Article in English | MEDLINE | ID: mdl-29599409

ABSTRACT

Purpose: Head and neck squamous cell carcinoma (HNSCC) is the sixth most common cancer worldwide, with high mortality and a lack of targeted therapies. To identify and prioritize druggable targets, we performed genome analysis together with genome-scale siRNA and oncology drug profiling using low-passage tumor cells derived from a patient with treatment-resistant HPV-negative HNSCC. Experimental Design: A tumor cell culture was established and subjected to whole-exome sequencing, RNA sequencing, comparative genome hybridization, and high-throughput phenotyping with a siRNA library covering the druggable genome and an oncology drug library. Secondary screens of candidate target genes were performed on the primary tumor cells and two nontumorigenic keratinocyte cell cultures for validation and to assess cancer specificity. siRNA screens of the kinome on two isogenic pairs of p53-mutated HNSCC cell lines were used to determine generalizability. Clinical utility was addressed by performing drug screens on two additional HNSCC cell cultures derived from patients enrolled in a clinical trial. Results: Many of the identified copy number aberrations and somatic mutations in the primary tumor were typical of HPV(-) HNSCC, but none pointed to obvious therapeutic choices. In contrast, siRNA profiling identified 391 candidate target genes, 35 of which were preferentially lethal to cancer cells, most of which were not genomically altered. Chemotherapies and targeted agents with strong tumor-specific activities corroborated the siRNA profiling results and included drugs that targeted the mitotic spindle, the proteasome, and G2-M kinases WEE1 and CHK1. We also show the feasibility of ex vivo drug profiling for patients enrolled in a clinical trial. Conclusions: High-throughput phenotyping with siRNA and drug libraries using patient-derived tumor cells prioritizes mutated driver genes and identifies novel drug targets not revealed by genomic profiling. Functional profiling is a promising adjunct to DNA sequencing for precision oncology. Clin Cancer Res; 24(12); 2828-43. ©2018 AACR.


Subject(s)
Biomarkers, Tumor , Head and Neck Neoplasms/drug therapy , Molecular Targeted Therapy , Precision Medicine , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Biomarkers, Tumor/antagonists & inhibitors , Biomarkers, Tumor/genetics , Comparative Genomic Hybridization , Computational Biology/methods , Gene Expression Profiling , Genomics/methods , Head and Neck Neoplasms/diagnosis , Head and Neck Neoplasms/genetics , Humans , Male , Middle Aged , Molecular Targeted Therapy/methods , Mutation , Positron-Emission Tomography , Precision Medicine/methods , RNA, Small Interfering/genetics , Tomography, X-Ray Computed , Transcriptome , Exome Sequencing
12.
BMC Bioinformatics ; 19(1): 28, 2018 01 31.
Article in English | MEDLINE | ID: mdl-29385983

ABSTRACT

BACKGROUND: The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called "germline leakage". The rate of germline leakage across different somatic variant detection pipelines is not well-understood, and it is uncertain whether or not somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single nucleotide variant (SNV) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. RESULTS: The median somatic SNV prediction set contained 4325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases. CONCLUSIONS: The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the values of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software.


Subject(s)
Genome, Human , Germ Cells/metabolism , Polymorphism, Single Nucleotide , Algorithms , Humans , Internet , Neoplasms/genetics , Neoplasms/pathology , User-Computer Interface , Whole Genome Sequencing
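
A minimal sketch of how germline leakage, as defined in the abstract above, could be counted for one prediction set. It assumes variants are represented as (chromosome, position, ref, alt) tuples and that a trusted germline call set for the same patient is available; it is not the GermlineFilter tool, and the function name is hypothetical.

```python
def find_germline_leaks(somatic_calls, germline_variants):
    """Return the somatic predictions that are actually known germline variants.

    Both arguments are iterables of (chrom, pos, ref, alt) tuples.
    The tuple representation is an assumption made for this sketch.
    """
    # A leaked variant is any predicted somatic SNV that exactly matches
    # a variant in the patient's germline call set.
    return set(somatic_calls) & set(germline_variants)


somatic = [("chr1", 1000, "A", "T"), ("chr2", 500, "G", "C"), ("chr3", 42, "C", "T")]
germline = [("chr2", 500, "G", "C"), ("chr9", 7, "T", "G")]
leaks = find_germline_leaks(somatic, germline)
leak_rate = len(leaks) / len(somatic)  # fraction of somatic calls that leaked
```

Exact tuple matching is the simplest criterion; a real comparison would also need consistent reference builds and variant normalization, which this sketch leaves out.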
13.
Cell Syst ; 5(5): 485-497.e3, 2017 11 22.
Article in English | MEDLINE | ID: mdl-28988802

ABSTRACT

We report the results of a DREAM challenge designed to predict relative genetic essentialities based on a novel dataset testing 98,000 shRNAs against 149 molecularly characterized cancer cell lines. We analyzed the results of over 3,000 submissions over a period of 4 months. We found that algorithms combining essentiality data across multiple genes demonstrated increased accuracy; gene expression was the most informative molecular data type; the identity of the gene being predicted was far more important than the modeling strategy; well-predicted genes and selected molecular features showed enrichment in functional categories; and frequently selected expression features correlated with survival in primary tumors. This study establishes benchmarks for gene essentiality prediction, presents a community resource for future comparison with this benchmark, and provides insights into factors influencing the ability to predict gene essentiality from functional genetic screens. This study also demonstrates the value of releasing pre-publication data publicly to engage the community in an open research collaboration.


Subject(s)
Gene Expression/genetics , Genes, Essential/genetics , Algorithms , Cell Line, Tumor , Genomics/methods , Humans , RNA, Small Interfering/genetics
14.
Cell Rep ; 19(1): 203-217, 2017 04 04.
Article in English | MEDLINE | ID: mdl-28380359

ABSTRACT

Here, we describe a multiplexed immunohistochemical platform with computational image processing workflows, including image cytometry, enabling simultaneous evaluation of 12 biomarkers in one formalin-fixed paraffin-embedded tissue section. To validate this platform, we used tissue microarrays containing 38 archival head and neck squamous cell carcinomas and revealed differential immune profiles based on lymphoid and myeloid cell densities, correlating with human papilloma virus status and prognosis. Based on these results, we investigated 24 pancreatic ductal adenocarcinomas from patients who received neoadjuvant GVAX vaccination and revealed that response to therapy correlated with degree of mono-myelocytic cell density and percentages of CD8+ T cells expressing T cell exhaustion markers. These data highlight the utility of in situ immune monitoring for patient stratification and provide digital image processing pipelines to the community for examining immune complexity in precious tissue sections, where phenotype and tissue architecture are preserved to improve biomarker discovery and assessment.


Subject(s)
Biomarkers, Tumor/analysis , Carcinoma, Squamous Cell/immunology , Head and Neck Neoplasms/immunology , Image Cytometry/methods , Image Processing, Computer-Assisted , Monitoring, Immunologic/methods , Aged , Aged, 80 and over , Biomarkers, Tumor/metabolism , Cohort Studies , Female , Humans , Immunohistochemistry , Male , Middle Aged , Prognosis , Statistics, Nonparametric , Tissue Array Analysis
16.
Bioinformatics ; 33(9): 1362-1369, 2017 05 01.
Article in English | MEDLINE | ID: mdl-28082455

ABSTRACT

Motivation: In recent years, vast advances in biomedical technologies and comprehensive sequencing have revealed the genomic landscape of common forms of human cancer in unprecedented detail. The broad heterogeneity of the disease calls for rapid development of personalized therapies. Translating the readily available genomic data into useful knowledge that can be applied in the clinic remains a challenge. Computational methods are needed to aid these efforts by robustly analyzing genome-scale data from distinct experimental platforms for prioritization of targets and treatments. Results: We propose a novel, biologically motivated, Bayesian multitask approach, which explicitly models gene-centric dependencies across multiple and distinct genomic platforms. We introduce a gene-wise prior and present a fully Bayesian formulation of a group factor analysis model. In supervised prediction applications, our multitask approach leverages similarities in response profiles of groups of drugs that are more likely to be related to true biological signal, which leads to more robust performance and improved generalization ability. We evaluate the performance of our method on molecularly characterized collections of cell lines profiled against two compound panels, namely the Cancer Cell Line Encyclopedia and the Cancer Therapeutics Response Portal. We demonstrate that accounting for the gene-centric dependencies enables leveraging information from multi-omic input data and improves prediction and feature selection performance. We further demonstrate the applicability of our method in an unsupervised dimensionality reduction application by inferring genes essential to tumorigenesis in the pancreatic ductal adenocarcinoma and lung adenocarcinoma patient cohorts from The Cancer Genome Atlas. Availability and Implementation: The code for this work is available at https://github.com/olganikolova/gbgfa. Contact: nikolova@ohsu.edu or margolin@ohsu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Biomarkers, Pharmacological , Genes, Neoplasm , Genomics/methods , Models, Genetic , Neoplasms/metabolism , Precision Medicine/methods , Adenocarcinoma/drug therapy , Adenocarcinoma/genetics , Adenocarcinoma/metabolism , Antineoplastic Agents/therapeutic use , Bayes Theorem , Cell Line , Cell Transformation, Neoplastic , Humans , Lung Neoplasms/drug therapy , Lung Neoplasms/genetics , Lung Neoplasms/metabolism , Neoplasms/drug therapy , Neoplasms/genetics , Pancreatic Neoplasms/drug therapy , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/metabolism , Unsupervised Machine Learning
17.
Nat Cell Biol ; 17(12): 1523-35, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26571212

ABSTRACT

For nearly a century developmental biologists have recognized that cells from embryos can differ in their potential to differentiate into distinct cell types. Recently, it has been recognized that embryonic stem cells derived from both mice and humans exhibit two stable yet epigenetically distinct states of pluripotency: naive and primed. We now show that nicotinamide N-methyltransferase (NNMT) and the metabolic state regulate pluripotency in human embryonic stem cells (hESCs). Specifically, in naive hESCs, NNMT and its enzymatic product 1-methylnicotinamide are highly upregulated, and NNMT is required for low S-adenosyl methionine (SAM) levels and the H3K27me3 repressive state. NNMT consumes SAM in naive cells, making it unavailable for histone methylation that represses Wnt and activates the HIF pathway in primed hESCs. These data support the hypothesis that the metabolome regulates the epigenetic landscape of the earliest steps in human development.


Subject(s)
Cell Differentiation , Epigenesis, Genetic/genetics , Human Embryonic Stem Cells/metabolism , Metabolome , Animals , Blotting, Western , Cells, Cultured , Embryonic Stem Cells/metabolism , Gas Chromatography-Mass Spectrometry , Gene Expression Profiling/methods , Gene Knockdown Techniques , Histones/metabolism , Humans , Lysine/metabolism , Mass Spectrometry , Metabolomics/methods , Methylation , Mice , Niacinamide/analogs & derivatives , Niacinamide/metabolism , Nicotinamide N-Methyltransferase/genetics , Nicotinamide N-Methyltransferase/metabolism , Proteomics/methods , Reverse Transcriptase Polymerase Chain Reaction , S-Adenosylmethionine/metabolism , Signal Transduction
18.
J Am Med Inform Assoc ; 22(6): 1143-7, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26174866

ABSTRACT

The world's genomics data will never be stored in a single repository - rather, it will be distributed among many sites in many countries. No one site will have enough data to explain genotype to phenotype relationships in rare diseases; therefore, sites must share data. To accomplish this, the genetics community must forge common standards and protocols to make sharing and computing data among many sites a seamless activity. Through the Global Alliance for Genomics and Health, we are pioneering the development of shared application programming interfaces (APIs) to connect the world's genome repositories. In parallel, we are developing an open source software stack (ADAM) that uses these APIs. This combination will create a cohesive genome informatics ecosystem. Using containers, we are facilitating the deployment of this software in a diverse array of environments. Through benchmarking efforts and big data driver projects, we are ensuring ADAM's performance and utility.


Subject(s)
Datasets as Topic , Genomics , Translational Research, Biomedical , Computational Biology , Humans , Knowledge Bases , National Institutes of Health (U.S.) , United States
19.
Nat Methods ; 12(7): 623-30, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25984700

ABSTRACT

The detection of somatic mutations from cancer genome sequences is key to understanding the genetic basis of disease progression, patient survival and response to therapy. Benchmarking is needed for tool assessment and improvement but is complicated by a lack of gold standards, by extensive resource requirements and by difficulties in sharing personal genomic information. To resolve these issues, we launched the ICGC-TCGA DREAM Somatic Mutation Calling Challenge, a crowdsourced benchmark of somatic mutation detection algorithms. Here we report the BAMSurgeon tool for simulating cancer genomes and the results of 248 analyses of three in silico tumors created with it. Different algorithms exhibit characteristic error profiles, and, intriguingly, false positives show a trinucleotide profile very similar to one found in human tumors. Although the three simulated tumors differ in sequence contamination (deviation from normal cell sequence) and in subclonality, an ensemble of pipelines outperforms the best individual pipeline in all cases. BAMSurgeon is available at https://github.com/adamewing/bamsurgeon/.


Subject(s)
Benchmarking , Crowdsourcing , Genome , Neoplasms/genetics , Polymorphism, Single Nucleotide , Algorithms , Humans
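
A minimal sketch of one way to build an ensemble of mutation-calling pipelines like the one the abstract above reports on. The abstract does not say how the ensemble was constructed; simple majority voting is assumed here purely for illustration, and the function name and variant representation are hypothetical.

```python
from collections import Counter


def majority_vote(call_sets, min_fraction=0.5):
    """Keep variants reported by more than `min_fraction` of the pipelines.

    call_sets: list of iterables, one per pipeline, each holding hashable
    variant identifiers such as (chrom, pos, ref, alt) tuples.
    Majority voting is an assumed ensemble rule, not the challenge's method.
    """
    votes = Counter()
    for calls in call_sets:
        votes.update(set(calls))  # each pipeline votes at most once per variant
    threshold = min_fraction * len(call_sets)
    return {variant for variant, n in votes.items() if n > threshold}


pipelines = [
    {("chr1", 10, "A", "T"), ("chr1", 20, "G", "C")},
    {("chr1", 10, "A", "T"), ("chr2", 30, "C", "A")},
    {("chr1", 10, "A", "T"), ("chr1", 20, "G", "C")},
]
consensus = majority_vote(pipelines)
```

Voting trades sensitivity for precision: a true variant found by only one pipeline is dropped, which is one reason an ensemble need not beat the best single method on every tumor.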
20.
Pac Symp Biocomput ; : 32-43, 2015.
Article in English | MEDLINE | ID: mdl-25592566

ABSTRACT

Complex mechanisms involving genomic aberrations in numerous proteins and pathways are believed to be a key cause of many diseases such as cancer. With recent advances in genomics, elucidating the molecular basis of cancer at a patient level is now feasible, and has led to personalized treatment strategies whereby a patient is treated according to his or her genomic profile. However, there is growing recognition that existing treatment modalities are overly simplistic, and do not fully account for the deep genomic complexity associated with sensitivity or resistance to cancer therapies. To overcome these limitations, large-scale pharmacogenomic screens of cancer cell lines--in conjunction with modern statistical learning approaches--have been used to explore the genetic underpinnings of drug response. While these analyses have demonstrated the ability to infer genetic predictors of compound sensitivity, to date most modeling approaches have been data-driven, i.e. they do not explicitly incorporate domain-specific knowledge (priors) in the process of learning a model. While a purely data-driven approach offers an unbiased perspective of the data--and may yield unexpected or novel insights--this strategy introduces challenges for both model interpretability and accuracy. In this study, we propose a novel prior-incorporated sparse regression model in which the choice of informative predictor sets is carried out by knowledge-driven priors (gene sets) in a stepwise fashion. Under regularization in a linear regression model, our algorithm is able to incorporate prior biological knowledge across the predictive variables thereby improving the interpretability of the final model with no loss--and often an improvement--in predictive performance. We evaluate the performance of our algorithm compared to well-known regularization methods such as LASSO, Ridge and Elastic net regression in the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (Sanger) pharmacogenomics datasets, demonstrating that incorporation of the biological priors selected by our model confers improved predictability and interpretability, despite far fewer predictors, over existing state-of-the-art methods.


Subject(s)
Linear Models , Pharmacogenetics/statistics & numerical data , Algorithms , Cell Line, Tumor , Computational Biology , Databases, Genetic , Drug Screening Assays, Antitumor/statistics & numerical data , Humans , Models, Genetic , Neoplasms/drug therapy , Neoplasms/genetics
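
For reference, the three baseline regularization methods named in the abstract above can be written as one penalized least-squares objective. The form below uses one common parametrization (a mixing weight α between the two penalties), which is an assumption about notation rather than something stated in the abstract; the paper's own prior-incorporated model is not reproduced here.

```latex
% Elastic net objective for n samples, response y, design matrix X, coefficients \beta.
% \lambda \ge 0 sets the overall penalty strength; \alpha \in [0, 1] mixes the penalties.
\hat{\beta} = \arg\min_{\beta}\;
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  + \lambda \left( \alpha \lVert \beta \rVert_1
  + \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 \right)
% \alpha = 1 gives the LASSO, \alpha = 0 gives Ridge regression,
% and 0 < \alpha < 1 gives the Elastic net.
```

The ℓ1 term drives some coefficients exactly to zero, which is why LASSO and the Elastic net select predictors while Ridge only shrinks them.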