Search | VHL Regional Portal

Improved detection suggests all Merkel cell carcinomas harbor Merkel polyomavirus.

Rodig, Scott J; Cheng, Jingwei; Wardzala, Jacek; DoRosario, Andrew; Scanlon, Jessica J; Laga, Alvaro C; Martinez-Fernandez, Alejandro; Barletta, Justine A; Bellizzi, Andrew M; Sadasivam, Subhashini; Holloway, Dustin T; Cooper, Dylan J; Kupper, Thomas S; Wang, Linda C; DeCaprio, James A.

J Clin Invest ; 122(12): 4645-53, 2012 Dec.

Article in English | MEDLINE | ID: mdl-23114601

ABSTRACT

A human polyomavirus was recently discovered in Merkel cell carcinoma (MCC) specimens. The Merkel cell polyomavirus (MCPyV) genome undergoes clonal integration into the host cell chromosomes of MCC tumors and expresses small T antigen and truncated large T antigen. Previous studies have consistently reported that MCPyV can be detected in approximately 80% of all MCC tumors. We sought to increase the sensitivity of detection of MCPyV in MCC by developing antibodies capable of detecting large T antigen by immunohistochemistry. In addition, we expanded the repertoire of quantitative PCR primers specific for MCPyV to improve the detection of viral DNA in MCC. Here we report that a novel monoclonal antibody detected MCPyV large T antigen expression in 56 of 58 (97%) unique MCC tumors. PCR analysis specifically detected viral DNA in all 60 unique MCC tumors tested. We also detected inactivating point substitution mutations of TP53 in the two MCC specimens that lacked large T antigen expression and in only 1 of 56 tumors positive for large T antigen. These results indicate that MCPyV is present in MCC tumors more frequently than previously reported and that mutations in TP53 tend to occur in MCC tumors that fail to express MCPyV large T antigen.

Subject(s)

Carcinoma, Merkel Cell/virology , Merkel cell polyomavirus/metabolism , Polyomavirus Infections/virology , Tumor Virus Infections/virology , Aged , Aged, 80 and over , Antibodies, Viral/immunology , Antigens, Polyomavirus Transforming/immunology , Antigens, Polyomavirus Transforming/metabolism , Carcinoma, Merkel Cell/diagnosis , Carcinoma, Merkel Cell/metabolism , Cell Line, Tumor , DNA Mutational Analysis , Female , Gene Dosage , Genes, Tumor Suppressor , Genotype , Humans , Immunohistochemistry , Male , Merkel cell polyomavirus/genetics , Merkel cell polyomavirus/immunology , Middle Aged , Molecular Diagnostic Techniques , Oncogenes , Polyomavirus Infections/diagnosis , Polyomavirus Infections/metabolism , Real-Time Polymerase Chain Reaction , Tumor Virus Infections/diagnosis , Tumor Virus Infections/metabolism

A comprehensively molecular haplotype-resolved genome of a European individual.

Suk, Eun-Kyung; McEwen, Gayle K; Duitama, Jorge; Nowick, Katja; Schulz, Sabrina; Palczewski, Stefanie; Schreiber, Stefan; Holloway, Dustin T; McLaughlin, Stephen; Peckham, Heather; Lee, Clarence; Huebsch, Thomas; Hoehe, Margret R.

Genome Res ; 21(10): 1672-85, 2011 Oct.

Article in English | MEDLINE | ID: mdl-21813624

ABSTRACT

Independent determination of both haplotype sequences of an individual genome is essential to relate genetic variation to genome function, phenotype, and disease. To address the importance of phase, we have generated the most complete haplotype-resolved genome to date, "Max Planck One" (MP1), by fosmid pool-based next generation sequencing. Virtually all SNPs (>99%) and 80,000 indels were phased into haploid sequences of up to 6.3 Mb (N50 ~1 Mb). The completeness of phasing allowed determination of the concrete molecular haplotype pairs for the vast majority of genes (81%) including potential regulatory sequences, of which >90% were found to be constituted by two different molecular forms. A subset of 159 genes with potentially severe mutations in either cis or trans configurations exemplified in particular the role of phase for gene function, disease, and clinical interpretation of personal genomes (e.g., BRCA1). Extended genomic regions harboring manifold combinations of physically and/or functionally related genes and regulatory elements were resolved into their underlying "haploid landscapes," which may define the functional genome. Moreover, the majority of genes and functional sequences were found to contain individual or rare SNPs, which cannot be phased from population data alone, emphasizing the importance of molecular phasing for characterizing a genome in its molecular individuality. Our work provides the foundation to understand that the distinction of molecular haplotypes is essential to resolve the (inherently individual) biology of genes, genomes, and disease, establishing a reference point for "phase-sensitive" personal genomics. MP1's annotated haploid genomes are available as a public resource.
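
The abstract above reports phased haploid sequences with an N50 of about 1 Mb. As a rough illustration of that statistic only, here is a minimal sketch, assuming the standard definition of N50 (the length L such that blocks of length at least L cover half of the total length). The function name and the sample lengths are hypothetical and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths.

    Standard definition (an assumption here, the abstract does not spell it out):
    sort blocks from longest to shortest and return the length of the block
    at which the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one block length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical block lengths (arbitrary units), for illustration only.
example_blocks = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_blocks)
```

Because N50 is weighted by length, a few long phased blocks can dominate it even when most blocks are short.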

Subject(s)

Genome, Human , Haplotypes , Female , Genomics , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Male , Middle Aged , Polymorphism, Single Nucleotide , Sequence Analysis, DNA

In silico regulatory analysis for exploring human disease progression.

Holloway, Dustin T; Kon, Mark; DeLisi, Charles.

Biol Direct ; 3: 24, 2008 Jun 18.

Article in English | MEDLINE | ID: mdl-18564415

ABSTRACT

BACKGROUND: An important goal in bioinformatics is to unravel the network of transcription factors (TFs) and their targets. This is important in the human genome, where many TFs are involved in disease progression. Here, classification methods are applied to identify new targets for 152 transcriptional regulators using publicly-available targets as training examples. Three types of sequence information are used: composition, conservation, and overrepresentation. RESULTS: Starting with 8817 TF-target interactions we predict an additional 9333 targets for 152 TFs. Randomized classifiers make few predictions (approximately 2/18660) indicating that our predictions for many TFs are significantly enriched for true targets. An enrichment score is calculated and used to filter new predictions. Two case-studies for the TFs OCT4 and WT1 illustrate the usefulness of our predictions: Many predicted OCT4 targets fall into the Wnt-pathway. This is consistent with known biology as OCT4 is developmentally related and the Wnt pathway plays a role in early development. Beginning with 15 known targets, 354 predictions are made for WT1. WT1 has a role in formation of Wilms' tumor. Chromosomal regions previously implicated in Wilms' tumor by cytological evidence are statistically enriched in predicted WT1 targets. These findings may shed light on Wilms' tumor progression, suggesting that the tumor progresses either by loss of WT1 or by loss of regions harbouring its targets. Targets of WT1 are statistically enriched for cancer related functions including metastasis and apoptosis. Among new targets are BAX and PDE4B, which may help mediate the established anti-apoptotic effects of WT1. Of the thirteen TFs found which co-regulate genes with WT1 (p ≤ 0.02), 8 have been previously implicated in cancer. The regulatory-network for WT1 targets in genomic regions relevant to Wilms' tumor is provided.
CONCLUSION: We have assembled a set of features for the targets of human TFs and used them to develop classifiers for the determination of new regulatory targets. Many predicted targets are consistent with the known biology of their regulators, and new targets for the Wilms' tumor regulator, WT1, are proposed. We speculate that Wilms' tumor development is mediated by chromosomal rearrangements in the location of WT1 targets.

Subject(s)

Computer Simulation , Gene Expression Regulation, Neoplastic/physiology , Models, Genetic , Octamer Transcription Factor-3/physiology , Transcription Factors/physiology , WT1 Proteins/physiology , Wilms Tumor/genetics , Wilms Tumor/metabolism , Child , Child, Preschool , Disease Progression , Gene Targeting , Genes, Wilms Tumor/physiology , Humans , Octamer Transcription Factor-3/biosynthesis , Octamer Transcription Factor-3/genetics , Predictive Value of Tests , Protein Binding/genetics , Signal Transduction/genetics , Transcription Factors/classification , Transcription Factors/metabolism , WT1 Proteins/genetics , WT1 Proteins/metabolism , Wilms Tumor/pathology

Classifying transcription factor targets and discovering relevant biological features.

Holloway, Dustin T; Kon, Mark; DeLisi, Charles.

Biol Direct ; 3: 22, 2008 May 30.

Article in English | MEDLINE | ID: mdl-18513408

ABSTRACT

BACKGROUND: An important goal in post-genomic research is discovering the network of interactions between transcription factors (TFs) and the genes they regulate. We have previously reported the development of a supervised-learning approach to TF target identification, and used it to predict targets of 104 transcription factors in yeast. We now include a new sequence conservation measure, expand our predictions to include 59 new TFs, introduce a web-server, and implement an improved ranking method to reveal the biological features contributing to regulation. The classifiers combine 8 genomic datasets covering a broad range of measurements including sequence conservation, sequence overrepresentation, gene expression, and DNA structural properties. PRINCIPAL FINDINGS: (1) Application of the method yields an amplification of information about yeast regulators. The ratio of total targets to previously known targets is greater than 2 for 11 TFs, with several having larger gains: Ash1(4), Ino2(2.6), Yaf1(2.4), and Yap6(2.4). (2) Many predicted targets for TFs match well with the known biology of their regulators. As a case study we discuss the regulator Swi6, presenting evidence that it may be important in the DNA damage response, and that the previously uncharacterized gene YMR279C plays a role in DNA damage response and perhaps in cell-cycle progression. (3) A procedure based on recursive-feature-elimination is able to uncover from the large initial data sets those features that best distinguish targets for any TF, providing clues relevant to its biology. An analysis of Swi6 suggests a possible role in lipid metabolism, and more specifically in metabolism of ceramide, a bioactive lipid currently being investigated for anti-cancer properties. (4) An analysis of global network properties highlights the transcriptional network hubs; the factors which control the most genes and the genes which are bound by the largest set of regulators. 
Cell-cycle and growth related regulators dominate the former; genes involved in carbon metabolism and energy generation dominate the latter. CONCLUSION: Postprocessing of regulatory-classifier results can provide high quality predictions, and feature ranking strategies can deliver insight into the regulatory functions of TFs. Predictions are available at an online web-server, including the full transcriptional network, which can be analyzed using VisAnt network analysis suite.

Subject(s)

Transcription Factors/classification , Transcription Factors/metabolism , Transcription, Genetic/physiology , Base Sequence , Bayes Theorem , Conserved Sequence , Gene Expression Profiling , Gene Regulatory Networks/physiology , Oligonucleotide Array Sequence Analysis , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae Proteins/physiology , Sequence Deletion , Transcription Factors/genetics , Transcription Factors/physiology

High-precision high-coverage functional inference from integrated data sources.

Linghu, Bolan; Snitkin, Evan S; Holloway, Dustin T; Gustafson, Adam M; Xia, Yu; DeLisi, Charles.

BMC Bioinformatics ; 9: 119, 2008 Feb 25.

Article in English | MEDLINE | ID: mdl-18298847

ABSTRACT

BACKGROUND: Information obtained from diverse data sources can be combined in a principled manner using various machine learning methods to increase the reliability and range of knowledge about protein function. The result is a weighted functional linkage network (FLN) in which linked neighbors share at least one function with high probability. Precision is, however, low. Aiming to provide precise functional annotation for as many proteins as possible, we explore and propose a two-step framework for functional annotation: (1) construction of a high-coverage and reliable FLN via machine learning techniques; (2) development of a decision rule for the constructed FLN to optimize functional annotation. RESULTS: We first apply this framework to Saccharomyces cerevisiae. In the first step, we demonstrate that four commonly used machine learning methods, Linear SVM, Linear Discriminant Analysis, Naïve Bayes, and Neural Network, all combine heterogeneous data to produce reliable and high-coverage FLNs, in which the linkage weight more accurately estimates functional coupling of linked proteins than the use of individual data sources alone. In the second step, empirical tuning of an adjustable decision rule on the constructed FLN reveals that basing annotation on maximum edge weight results in the most precise annotation at high coverages. In particular, at low coverage all rules evaluated perform comparably. At coverage above approximately 50%, however, they diverge rapidly. At full coverage, the maximum weight decision rule still has a precision of approximately 70%, whereas for other methods, precision ranges from a high of slightly more than 30%, down to 3%. In addition, a scoring scheme to estimate the precisions of individual predictions is also provided. Finally, tests of the robustness of the framework indicate that our framework can be successfully applied to less studied organisms.
CONCLUSION: We provide a general two-step function-annotation framework, and show that high coverage, high precision annotations can be achieved by constructing a high-coverage and reliable FLN via data integration followed by applying a maximum weight decision rule.
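
The maximum weight decision rule mentioned above could be sketched roughly as follows. This is only one plausible reading of the abstract, assuming the rule transfers the functions of the single annotated neighbor joined by the heaviest edge. The function name, the edge-list input format, and the sample proteins are hypothetical and do not come from the paper.

```python
def annotate_by_max_weight(edges, known_functions):
    """Assign functions to unannotated proteins from their heaviest-edge neighbor.

    edges: iterable of (protein_a, protein_b, weight) tuples (undirected FLN links).
    known_functions: dict mapping annotated proteins to a set of function labels.
    Returns a dict: unannotated protein -> (transferred functions, edge weight).
    The weight is kept so that predictions can later be filtered by confidence.
    """
    best = {}
    for a, b, weight in edges:
        # Treat each undirected edge in both directions.
        for query, neighbor in ((a, b), (b, a)):
            if query in known_functions or neighbor not in known_functions:
                continue
            if query not in best or weight > best[query][1]:
                best[query] = (known_functions[neighbor], weight)
    return best


# Hypothetical toy network, for illustration only.
toy_edges = [("P1", "P2", 0.9), ("P1", "P3", 0.4), ("P4", "P3", 0.7)]
toy_known = {"P2": {"ribosome biogenesis"}, "P3": {"DNA repair"}}
toy_predictions = annotate_by_max_weight(toy_edges, toy_known)
```

Raising a threshold on the returned weight would trade coverage for precision, which is the trade-off the abstract describes.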

Subject(s)

Algorithms , Artificial Intelligence , Database Management Systems , Databases, Protein , Pattern Recognition, Automated/methods , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Reproducibility of Results , Sensitivity and Specificity , Systems Integration

Machine learning for regulatory analysis and transcription factor target prediction in yeast.

Holloway, Dustin T; Kon, Mark; DeLisi, Charles.

Syst Synth Biol ; 1(1): 25-46, 2007 Mar.

Article in English | MEDLINE | ID: mdl-19003435

ABSTRACT

High throughput technologies, including array-based chromatin immunoprecipitation, have rapidly increased our knowledge of transcriptional maps: the identity and location of regulatory binding sites within genomes. Still, the full identification of sites, even in lower eukaryotes, remains largely incomplete. In this paper we develop a supervised learning approach to site identification using support vector machines (SVMs) to combine 26 different data types. A comparison with the standard approach to site identification using position specific scoring matrices (PSSMs) for a set of 104 Saccharomyces cerevisiae regulators indicates that our SVM-based target classification is more sensitive (73 vs. 20%) when specificity and positive predictive value are the same. We have applied our SVM classifier for each transcriptional regulator to all promoters in the yeast genome to obtain thousands of new targets, which are currently being analyzed and refined to limit the risk of classifier over-fitting. For the purpose of illustration we discuss several results, including biochemical pathway predictions for Gcn4 and Rap1. For both transcription factors SVM predictions match well with the known biology of control mechanisms, and possible new roles for these factors are suggested, such as a function for Rap1 in regulating fermentative growth. We also examine the promoter melting temperature curves for the targets of YJR060W, and show that targets of this TF have potentially unique physical properties which distinguish them from other genes. The SVM output automatically provides the means to rank dataset features to identify important biological elements. We use this property to rank classifying k-mers, thereby reconstructing known binding sites for several TFs, and to rank expression experiments, determining the conditions under which Fhl1, the factor responsible for expression of ribosomal protein genes, is active.
We can see that targets of Fhl1 are differentially expressed in the chosen conditions as compared to the expression of average and negative set genes. SVM-based classifiers provide a robust framework for analysis of regulatory networks. Processing of classifier outputs can provide high quality predictions and biological insight into functions of particular transcription factors. Future work on this method will focus on increasing the accuracy and quality of predictions using feature reduction and clustering strategies. Since predictions have been made on only 104 TFs in yeast, new classifiers will be built for the remaining 100 factors which have available binding data.

Identification and characterization of renal cell carcinoma gene markers.

Dalgin, Gul S; Holloway, Dustin T; Liou, Louis S; DeLisi, Charles.

Cancer Inform ; 3: 65-92, 2007 Feb 09.

Article in English | MEDLINE | ID: mdl-19455236

ABSTRACT

Microarray gene expression profiling has been used to distinguish histological subtypes of renal cell carcinoma (RCC), and consequently to identify specific tumor markers. The analytical procedures currently in use find sets of genes whose average differential expression across the two categories differ significantly. In general each of the markers thus identified does not distinguish tumor from normal with 100% accuracy, although the group as a whole might be able to do so. For the purpose of developing a widely used economically viable diagnostic signature, however, large groups of genes are not likely to be useful. Here we use two different methods, one a support vector machine variant, and the other an exhaustive search, to reanalyze data previously generated in our Lab (Lenburg et al. 2003). We identify 158 genes, each having an expression level that is higher (lower) in every tumor sample than in any normal sample, and each having a minimum differential expression across the two categories at a significance of 0.01. The set is highly enriched in cancer related genes (p = 1.6 x 10⁻¹²), containing 43 genes previously associated with either RCC or other types of cancer. Many of the biomarkers appear to be associated with the central alterations known to be required for cancer transformation. These include the oncogenes JAZF1, AXL, ABL2; tumor suppressors RASD1, PTPRO, TFAP2A, CDKN1C; and genes involved in proteolysis or cell-adhesion such as WASF2, and PAPPA.

Integrating genomic data to predict transcription factor binding.

Holloway, Dustin T; Kon, Mark; DeLisi, Charles.

Genome Inform ; 16(1): 83-94, 2005.

Article in English | MEDLINE | ID: mdl-16362910

ABSTRACT

Transcription factor binding sites (TFBS) in gene promoter regions are often predicted by using position specific scoring matrices (PSSMs), which summarize sequence patterns of experimentally determined TF binding sites. Although PSSMs are more reliable than simple consensus string matching in predicting a true binding site, they generally result in high numbers of false positive hits. This study attempts to reduce the number of false positive matches and generate new predictions by integrating various types of genomic data by two methods: a Bayesian allocation procedure, and support vector machine classification. Several methods will be explored to strengthen the prediction of a true TFBS in the Saccharomyces cerevisiae genome: binding site degeneracy, binding site conservation, phylogenetic profiling, TF binding site clustering, gene expression profiles, GO functional annotation, and k-mer counts in promoter regions. Binding site degeneracy (or redundancy) refers to the number of times a particular transcription factor's binding motif is discovered in the upstream region of a gene. Phylogenetic conservation takes into account the number of orthologous upstream regions in other genomes that contain a particular binding site. Phylogenetic profiling refers to the presence or absence of a gene across a large set of genomes. Binding site clusters are statistically significant clusters of TF binding sites detected by the algorithm ClusterBuster. Gene expression takes into account the idea that when the gene expression profiles of a transcription factor and a potential target gene are correlated, then it is more likely that the gene is a genuine target. Also, genes with highly correlated expression profiles are often regulated by the same TF(s). The GO annotation data takes advantage of the idea that common transcription targets often have related function. 
Finally, the distribution of the counts of all k-mers of length 4, 5, and 6 in a gene's promoter region was examined as a means to predict TF binding. In each case the data are compared to known true positives taken from ChIP-chip data, Transfac, and the Saccharomyces Genome Database. First, degeneracy, conservation, expression, and binding site clusters were examined independently and in combination via Bayesian allocation. Then, binding sites were predicted with a support vector machine (SVM) using all methods alone and in combination. The SVM works best when all genomic data are combined, but can also identify which methods contribute the most to accurate classification. On average, a support vector machine can classify binding sites with high sensitivity and an accuracy of almost 80%.
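
The k-mer feature described above (counts of all k-mers of length 4, 5, and 6 in a promoter region) can be illustrated with a short sketch. This is not the authors' code. The function name, the choice to skip windows containing non-ACGT characters, and the sample sequence are assumptions made for illustration.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_counts(sequence, ks=(4, 5, 6)):
    """Count every overlapping k-mer of the given lengths in a DNA sequence.

    Windows containing characters other than A, C, G, T (for example N)
    are skipped; that handling is an assumption, not something the abstract states.
    """
    seq = sequence.upper()
    counts = Counter()
    for k in ks:
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            if set(kmer) <= VALID_BASES:
                counts[kmer] += 1
    return counts


# Hypothetical promoter fragment, for illustration only.
toy_counts = kmer_counts("ACGTACGT")
```

A count vector of this kind, one entry per possible k-mer, is the sort of fixed-length feature a classifier such as an SVM could take as input.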

Subject(s)

Genome, Fungal , Saccharomyces cerevisiae/genetics , Transcription Factors/metabolism , Algorithms , Base Sequence , Bayes Theorem , Binding Sites , Chromatin Immunoprecipitation , Cluster Analysis , Computational Biology , Evolution, Molecular , Gene Expression Profiling , Gene Expression Regulation, Fungal , Genes, Fungal , Phylogeny , Promoter Regions, Genetic , Protein Binding , Transcription Factors/genetics
