Search | VHL Regional Portal

1.

Network motif analysis of a multi-mode genetic-interaction network.

Taylor, R James; Siegel, Andrew F; Galitski, Timothy.

Genome Biol ; 8(8): R160, 2007.

Article in English | MEDLINE | ID: mdl-17683534

ABSTRACT

Different modes of genetic interaction indicate different functional relationships between genes. The extraction of biological information from dense multi-mode genetic-interaction networks demands appropriate statistical and computational methods. We developed such methods and implemented them in open-source software. Motifs extracted from multi-mode genetic-interaction networks form functional subnetworks, highlight genes dominating these subnetworks, and reveal genetic reflections of the underlying biochemical system.

Subject(s)

Computational Biology/methods , Gene Regulatory Networks , Software , Monte Carlo Method

2.

Differential expression of CD10 in prostate cancer and its clinical implication.

Dall'Era, Marc A; True, Lawrence D; Siegel, Andrew F; Porter, Michael P; Sherertz, Tracy M; Liu, Alvin Y.

BMC Urol ; 7: 3, 2007 Mar 02.

Article in English | MEDLINE | ID: mdl-17335564

ABSTRACT

BACKGROUND: CD10 is a transmembrane metallo-endopeptidase that cleaves and inactivates a variety of peptide growth factors. Loss of CD10 expression is a common, early event in human prostate cancer; however, CD10 positive cancer cells frequently appear in lymph node metastasis. We hypothesize that prostate tumors expressing high levels of CD10 have a more aggressive biology with an early propensity towards lymph node metastasis. METHODS: Eighty-seven patients, 53 with and 34 without pathologically organ confined prostate cancer at the time of radical prostatectomy (RP), were used for the study. Fourteen patients with lymph node metastasis found at the time of surgery were identified and included in this study. Serial sections from available frozen tumor specimens in OCT were processed for CD10 immunohistochemistry. Cancer glands were graded for the presence and intensity of CD10 staining, and overall percentage of glands staining positive was estimated. Clinical characteristics including pre- and post-operative PSA and Gleason score were obtained. A similar study as a control for the statistical analysis was performed with CD13 staining. For statistical analysis, strong staining was defined as > 20% positivity based on the observed maximum separation of the cumulative distributions. RESULTS: CD10 expression significantly correlated with Gleason grade, tumor stage, and with pre-operative serum PSA. Seventy percent of RP specimens from patients with node metastasis showed strong staining for CD10, compared to 30% in the entire cohort (OR = 3.4, 95% CI: 1.08-10.75, P = 0.019). Increased staining for CD10 was associated with PSA recurrence after RP. CD13 staining did not correlate significantly with any of these same clinical parameters. CONCLUSION: These results suggest that the expression of CD10 by prostate cancer corresponds to a more aggressive phenotype with a higher malignant potential, described histologically by the Gleason score. CD10 offers potential clinical utility for stratifying prostate cancer to predict biological behavior of the tumor.

Subject(s)

Neprilysin/biosynthesis , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , Biomarkers, Tumor/biosynthesis , Biomarkers, Tumor/genetics , Cell Transformation, Neoplastic/genetics , Cell Transformation, Neoplastic/metabolism , Cell Transformation, Neoplastic/pathology , Gene Expression Regulation, Neoplastic/physiology , Humans , Male , Neoplasm Staging , Neprilysin/genetics , Phenotype , Prostate-Specific Antigen/biosynthesis , Prostate-Specific Antigen/genetics , Prostatic Neoplasms/pathology

3.

Genetic mapping at 3-kilobase resolution reveals inositol 1,4,5-triphosphate receptor 3 as a risk factor for type 1 diabetes in Sweden.

Roach, Jared C; Deutsch, Kerry; Li, Sarah; Siegel, Andrew F; Bekris, Lynn M; Einhaus, Derek C; Sheridan, Colleen M; Glusman, Gustavo; Hood, Leroy; Lernmark, Ake; Janer, Marta.

Am J Hum Genet ; 79(4): 614-27, 2006 Oct.

Article in English | MEDLINE | ID: mdl-16960798

ABSTRACT

We mapped the genetic influences for type 1 diabetes (T1D), using 2,360 single-nucleotide polymorphism (SNP) markers in the 4.4-Mb human major histocompatibility complex (MHC) locus and the adjacent 493 kb centromeric to the MHC, initially in a survey of 363 Swedish T1D cases and controls. We confirmed prior studies showing association with T1D in the MHC, most significantly near HLA-DR/DQ. In the region centromeric to the MHC, we identified a peak of association within the inositol 1,4,5-triphosphate receptor 3 gene (ITPR3; formerly IP3R3). The most significant single SNP in this region was at the center of the ITPR3 peak of association (P = 1.7 × 10⁻⁴ for the survey study). For validation, we typed an additional 761 Swedish individuals. The P value for association computed from all 1,124 individuals was 1.30 × 10⁻⁶ (recessive odds ratio 2.5; 95% confidence interval [CI] 1.7-3.9). The estimated population-attributable risk of 21.6% (95% CI 10.0%-31.0%) suggests that variation within ITPR3 reflects an important contribution to T1D in Sweden. Two-locus regression analysis supports an influence of ITPR3 variation on T1D that is distinct from that of any MHC class II gene.

Subject(s)

Calcium Channels/genetics , Chromosome Mapping/methods , Diabetes Mellitus, Type 1/genetics , Genetic Predisposition to Disease , Receptors, Cytoplasmic and Nuclear/genetics , Adolescent , Adult , Centromere , Child , Child, Preschool , Chromosomes, Human, Pair 6 , Female , Genome, Human , Haplotypes , Humans , Infant , Inositol 1,4,5-Trisphosphate Receptors , Major Histocompatibility Complex/genetics , Male , Polymorphism, Single Nucleotide , Sweden

4.

A third approach to gene prediction suggests thousands of additional human transcribed regions.

Glusman, Gustavo; Qin, Shizhen; El-Gewely, M Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F A.

PLoS Comput Biol ; 2(3): e18, 2006 Mar.

Article in English | MEDLINE | ID: mdl-16543943

ABSTRACT

The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent "genomic deserts."

Subject(s)

Computational Biology/methods , Gene Expression Profiling/methods , Sequence Analysis, DNA/methods , Algorithms , DNA Repair , Evolution, Molecular , Humans , Models, Statistical , Mutation , Software , Time Factors

5.

The mobile nucleoporin Nup2p and chromatin-bound Prp20p function in endogenous NPC-mediated transcriptional control.

Dilworth, David J; Tackett, Alan J; Rogers, Richard S; Yi, Eugene C; Christmas, Rowan H; Smith, Jennifer J; Siegel, Andrew F; Chait, Brian T; Wozniak, Richard W; Aitchison, John D.

J Cell Biol ; 171(6): 955-65, 2005 Dec 19.

Article in English | MEDLINE | ID: mdl-16365162

ABSTRACT

Nuclear pore complexes (NPCs) govern macromolecular transport between the nucleus and cytoplasm and serve as key positional markers within the nucleus. Several protein components of yeast NPCs have been implicated in the epigenetic control of gene expression. Among these, Nup2p is unique as it transiently associates with NPCs and, when artificially tethered to DNA, can prevent the spread of transcriptional activation or repression between flanking genes, a function termed boundary activity. To understand this function of Nup2p, we investigated the interactions of Nup2p with other proteins and with DNA using immunopurifications coupled with mass spectrometry and microarray analyses. These data combined with functional assays of boundary activity and epigenetic variegation suggest that Nup2p and the Ran guanylyl-nucleotide exchange factor, Prp20p, interact at specific chromatin regions and enable the NPC to play an active role in chromatin organization by facilitating the transition of chromatin between activity states.

Subject(s)

Chromatin/metabolism , DNA-Binding Proteins/metabolism , Nuclear Pore Complex Proteins/metabolism , Nuclear Pore/metabolism , Nuclear Proteins/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Transcription, Genetic/physiology , Active Transport, Cell Nucleus/physiology , Chromatin/genetics , DNA-Binding Proteins/genetics , Gene Silencing/physiology , Guanine Nucleotide Exchange Factors , Histones/genetics , Histones/metabolism , Microarray Analysis , Models, Biological , Nuclear Pore/genetics , Nuclear Pore Complex Proteins/genetics , Nuclear Proteins/genetics , Nucleosomes/metabolism , Open Reading Frames/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Telomere/genetics , Telomere/metabolism

6.

A data integration methodology for systems biology: experimental verification.

Hwang, Daehee; Smith, Jennifer J; Leslie, Deena M; Weston, Andrea D; Rust, Alistair G; Ramsey, Stephen; de Atauri, Pedro; Siegel, Andrew F; Bolouri, Hamid; Aitchison, John D; Hood, Leroy.

Proc Natl Acad Sci U S A ; 102(48): 17302-7, 2005 Nov 29.

Article in English | MEDLINE | ID: mdl-16301536

ABSTRACT

The integration of data from multiple global assays is essential to understanding dynamic spatiotemporal interactions within cells. In a companion paper, we reported a data integration methodology, designated Pointillist, that can handle multiple data types from technologies with different noise characteristics. Here we demonstrate its application to the integration of 18 data sets relating to galactose utilization in yeast. These data include global changes in mRNA and protein abundance, genome-wide protein-DNA interaction data, database information, and computational predictions of protein-DNA and protein-protein interactions. We divided the integration task to determine three network components: key system elements (genes and proteins), protein-protein interactions, and protein-DNA interactions. Results indicate that the reconstructed network efficiently focuses on and recapitulates the known biology of galactose utilization. It also provided new insights, some of which were verified experimentally. The methodology described here addresses a critical need across all domains of molecular and cell biology to effectively integrate large and disparate data sets.

Subject(s)

Galactose/genetics , Galactose/metabolism , Informatics/methods , Information Systems , Software , Systems Biology/methods , Chromatin Immunoprecipitation , Microarray Analysis , Monosaccharide Transport Proteins/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Yeasts

7.

A data integration methodology for systems biology.

Hwang, Daehee; Rust, Alistair G; Ramsey, Stephen; Smith, Jennifer J; Leslie, Deena M; Weston, Andrea D; de Atauri, Pedro; Aitchison, John D; Hood, Leroy; Siegel, Andrew F; Bolouri, Hamid.

Proc Natl Acad Sci U S A ; 102(48): 17296-301, 2005 Nov 29.

Article in English | MEDLINE | ID: mdl-16301537

ABSTRACT

Different experimental technologies measure different aspects of a system and to differing depth and breadth. High-throughput assays have inherently high false-positive and false-negative rates. Moreover, each technology includes systematic biases of a different nature. These differences make network reconstruction from multiple data sets difficult and error-prone. Additionally, because of the rapid rate of progress in biotechnology, there is usually no curated exemplar data set from which one might estimate data integration parameters. To address these concerns, we have developed data integration methods that can handle multiple data sets differing in statistical power, type, size, and network coverage without requiring a curated training data set. Our methodology is general in purpose and may be applied to integrate data from any existing and future technologies. Here we outline our methods and then demonstrate their performance by applying them to simulated data sets. The results show that these methods select true-positive data elements much more accurately than classical approaches. In a companion paper, we demonstrate the applicability of our approach to biological data. We have integrated our methodology into a free, open-source software package named POINTILLIST.

Subject(s)

Informatics/methods , Information Systems , Models, Theoretical , Software , Systems Biology/methods

8.

Reverse engineering galactose regulation in yeast through model selection.

Thorsson, Vesteinn; Hörnquist, Michael; Siegel, Andrew F; Hood, Leroy.

Stat Appl Genet Mol Biol ; 4: Article28, 2005.

Article in English | MEDLINE | ID: mdl-16646846

ABSTRACT

We examine the application of statistical model selection methods to reverse-engineering the control of galactose utilization in yeast from DNA microarray experiment data. In these experiments, relationships among gene expression values are revealed through modifications of galactose sugar level and genetic perturbations through knockouts. For each gene variable, we select predictors using a variety of methods, taking into account the variance in each measurement. These methods include maximization of log-likelihood with Cp, AIC, and BIC penalties, bootstrap and cross-validation error estimation, and coefficient shrinkage via the Lasso.
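
The penalized log-likelihood criteria named in the abstract above can be illustrated with a minimal sketch. It assumes an ordinary Gaussian least-squares fit with equal variance for every measurement, which is simpler than the per-measurement variance handling the abstract mentions. The data, the function names, and the two candidate models are hypothetical and are not taken from the paper.

```python
import math

def rss_intercept_only(y):
    """Residual sum of squares for the model y = constant."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)

def rss_simple_regression(x, y):
    """Residual sum of squares for the least-squares line y = a + b*x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

def aic(rss, n, k):
    """AIC for a Gaussian model with k fitted coefficients (up to a constant)."""
    return n * math.log(rss / n) + 2 * k

def bic(rss, n, k):
    """BIC for a Gaussian model with k fitted coefficients (up to a constant)."""
    return n * math.log(rss / n) + k * math.log(n)

# Hypothetical toy data: a response that depends roughly linearly on one predictor.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9, 12.2, 13.8]
n = len(y)

# Each candidate maps to (residual sum of squares, number of coefficients).
candidates = {
    "intercept only": (rss_intercept_only(y), 1),
    "intercept + x": (rss_simple_regression(x, y), 2),
}

best_aic = min(candidates, key=lambda name: aic(candidates[name][0], n, candidates[name][1]))
best_bic = min(candidates, key=lambda name: bic(candidates[name][0], n, candidates[name][1]))
print(best_aic, best_bic)
```

Both criteria reward a lower residual error and penalize extra coefficients. BIC's penalty grows with the sample size, so it tends to prefer smaller predictor sets than AIC.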

9.

Control of yeast filamentous-form growth by modules in an integrated molecular network.

Prinz, Susanne; Avila-Campillo, Iliana; Aldridge, Christine; Srinivasan, Ajitha; Dimitrov, Krassen; Siegel, Andrew F; Galitski, Timothy.

Genome Res ; 14(3): 380-90, 2004 Mar.

Article in English | MEDLINE | ID: mdl-14993204

ABSTRACT

On solid growth media with limiting nitrogen source, diploid budding-yeast cells differentiate from the yeast form to a filamentous, adhesive, and invasive form. Genomic profiles of mRNA levels in Saccharomyces cerevisiae yeast-form and filamentous-form cells were compared. Disparate data types, including genes implicated by expression change, filamentation genes known previously through a phenotype, protein-protein interaction data, and protein-metabolite interaction data were integrated as the nodes and edges of a filamentation-network graph. Application of a network-clustering method revealed 47 clusters in the data. The correspondence of the clusters to modules is supported by significant coordinated expression change among cluster co-member genes, and the quantitative identification of collective functions controlling cell properties. The modular abstraction of the filamentation network enables the association of filamentous-form cell properties with the activation or repression of specific biological processes, and suggests hypotheses. A module-derived hypothesis was tested. It was found that the 26S proteasome regulates filamentous-form growth.

Subject(s)

Gene Expression Regulation, Fungal/genetics , Saccharomyces cerevisiae/growth & development , Cell Cycle/genetics , Cyclins/biosynthesis , Cyclins/genetics , Cyclins/metabolism , Cyclins/physiology , Cytoskeletal Proteins/biosynthesis , Cytoskeletal Proteins/genetics , Cytoskeletal Proteins/metabolism , Cytoskeletal Proteins/physiology , DNA-Binding Proteins/biosynthesis , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/physiology , Gene Deletion , Genes, Fungal/genetics , Genes, Fungal/physiology , Proteasome Endopeptidase Complex , Protein Interaction Mapping , RNA, Fungal/genetics , RNA, Messenger/genetics , RNA, Messenger/physiology , Saccharomyces cerevisiae/cytology , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/biosynthesis , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae Proteins/physiology , Transcription Factors/biosynthesis , Transcription Factors/genetics , Transcription Factors/metabolism , Transcription Factors/physiology

10.

Initial proteome analysis of model microorganism Haemophilus influenzae strain Rd KW20.

Kolker, Eugene; Purvine, Samuel; Galperin, Michael Y; Stolyar, Serg; Goodlett, David R; Nesvizhskii, Alexey I; Keller, Andrew; Xie, Tao; Eng, Jimmy K; Yi, Eugene; Hood, Leroy; Picone, Alex F; Cherny, Tim; Tjaden, Brian C; Siegel, Andrew F; Reilly, Thomas J; Makarova, Kira S; Palsson, Bernhard O; Smith, Arnold L.

J Bacteriol ; 185(15): 4593-602, 2003 Aug.

Article in English | MEDLINE | ID: mdl-12867470

ABSTRACT

The proteome of Haemophilus influenzae strain Rd KW20 was analyzed by liquid chromatography (LC) coupled with ion trap tandem mass spectrometry (MS/MS). This approach does not require a gel electrophoresis step and provides a rapidly developed snapshot of the proteome. In order to gain insight into the central metabolism of H. influenzae, cells were grown microaerobically and anaerobically in a rich medium and soluble and membrane proteins of strain Rd KW20 were proteolyzed with trypsin and directly examined by LC-MS/MS. Several different experimental and computational approaches were utilized to optimize the proteome coverage and to ensure statistically valid protein identification. Approximately 25% of all predicted proteins (open reading frames) of H. influenzae strain Rd KW20 were identified with high confidence, as their component peptides were unambiguously assigned to tandem mass spectra. Approximately 80% of the predicted ribosomal proteins were identified with high confidence, compared to the 33% of the predicted ribosomal proteins detected by previous two-dimensional gel electrophoresis studies. The results obtained in this study are generally consistent with those obtained from computational genome analysis, two-dimensional gel electrophoresis, and whole-genome transposon mutagenesis studies. At least 15 genes originally annotated as conserved hypothetical were found to encode expressed proteins. Two more proteins, previously annotated as predicted coding regions, were detected with high confidence; these proteins also have close homologs in related bacteria. The direct proteomics approach to studying protein expression in vivo reported here is a powerful method that is applicable to proteome analysis of any (micro)organism.

Subject(s)

Bacterial Proteins/analysis , Haemophilus influenzae/chemistry , Proteome , Aerobiosis , Anaerobiosis , Animals , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Cattle , Chromatography, Liquid , Gene Expression Regulation, Bacterial , Haemophilus influenzae/genetics , Haemophilus influenzae/growth & development , Humans , Rabbits , Sensitivity and Specificity , Spectrometry, Mass, Electrospray Ionization , Trypsin

11.

Discovering regulatory and signalling circuits in molecular interaction networks.

Ideker, Trey; Ozier, Owen; Schwikowski, Benno; Siegel, Andrew F.

Bioinformatics ; 18 Suppl 1: S233-40, 2002.

Article in English | MEDLINE | ID: mdl-12169552

ABSTRACT

MOTIVATION: In model organisms such as yeast, large databases of protein-protein and protein-DNA interactions have become an extremely important resource for the study of protein function, evolution, and gene regulatory dynamics. In this paper we demonstrate that by integrating these interactions with widely-available mRNA expression data, it is possible to generate concrete hypotheses for the underlying mechanisms governing the observed changes in gene expression. To perform this integration systematically and at large scale, we introduce an approach for screening a molecular interaction network to identify active subnetworks, i.e., connected regions of the network that show significant changes in expression over particular subsets of conditions. The method we present here combines a rigorous statistical measure for scoring subnetworks with a search algorithm for identifying subnetworks with high score. RESULTS: We evaluated our procedure on a small network of 332 genes and 362 interactions and a large network of 4160 genes containing all 7462 protein-protein and protein-DNA interactions in the yeast public databases. In the case of the small network, we identified five significant subnetworks that covered 41 out of 77 (53%) of all significant changes in expression. Both network analyses returned several top-scoring subnetworks with good correspondence to known regulatory mechanisms in the literature. These results demonstrate how large-scale genomic approaches may be used to uncover signalling and regulatory pathways in a systematic, integrative fashion.

Subject(s)

Algorithms , Gene Expression Profiling/methods , Gene Expression Regulation/physiology , Models, Biological , Proteome/genetics , Proteome/metabolism , Signal Transduction/physiology , Computer Simulation , Yeasts/physiology
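
The abstract for this entry describes scoring connected subnetworks by their expression changes. One simple way to sketch such a score is to sum per-gene z-scores, scale by the square root of the subnetwork size, and then calibrate against random gene sets of the same size. That choice of score is an assumption made here for illustration, since the abstract does not give the formula. The gene names, z-scores, and edges below are hypothetical, and the search step is omitted.

```python
import math
import random
import statistics

def aggregate_z(z_scores, genes):
    """Sum of per-gene z-scores scaled by sqrt(k), where k is the subnetwork size."""
    return sum(z_scores[g] for g in genes) / math.sqrt(len(genes))

def is_connected(edges, genes):
    """Check that the chosen genes form one connected region of the network."""
    genes = set(genes)
    if not genes:
        return False
    adjacency = {g: set() for g in genes}
    for a, b in edges:
        if a in genes and b in genes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(genes))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == genes

def calibrated_score(z_scores, genes, trials=1000, seed=0):
    """Compare the aggregate score with random gene sets of the same size."""
    rng = random.Random(seed)
    pool = list(z_scores)
    k = len(genes)
    background = [aggregate_z(z_scores, rng.sample(pool, k)) for _ in range(trials)]
    mean = statistics.mean(background)
    spread = statistics.stdev(background)
    return (aggregate_z(z_scores, genes) - mean) / spread

# Hypothetical toy network: genes A-C change strongly, D-F barely change.
z_scores = {"A": 3.0, "B": 2.5, "C": 2.0, "D": 0.1, "E": -0.3, "F": 0.2}
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]

candidate = ["A", "B", "C"]
print(is_connected(edges, candidate), calibrated_score(z_scores, candidate))
```

A search procedure would then propose many connected candidates and keep the highest-scoring ones. Calibrating against random sets of equal size makes scores comparable across subnetworks of different sizes.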

12.

Spectral analysis of distributions: finding periodic components in eukaryotic enzyme length data.

Kolker, Eugene; Tjaden, Brian C; Hubley, Robert; Trifonov, Edward N; Siegel, Andrew F.

OMICS ; 6(1): 123-30, 2002.

Article in English | MEDLINE | ID: mdl-11881830

ABSTRACT

We introduce the spectral analysis of distributions (SAD), a method for detecting and evaluating possible periodicity in experimental data distributions (histograms) of arbitrary shape. SAD determines whether a given empirical distribution contains a periodic component. We also propose a system of probabilistic mixture distributions to model a histogram consisting of a smooth background together with peaks at periodic intervals, with each peak corresponding to a fixed number of subunits added together. This mixture distribution model allows us to estimate the parameters of the data and to test the statistical significance of the estimated peaks. The analysis is applied to the length distribution of eukaryotic enzymes.

Subject(s)

Enzymes/chemistry , Algorithms , Fourier Analysis
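
The idea of looking for a periodic component in a histogram can be sketched with a plain discrete Fourier transform of the bin counts. This is only an illustration of the general Fourier approach under the assumption of a flat background. It is not the SAD procedure or the mixture model from the paper, and the function names and synthetic histogram are hypothetical.

```python
import cmath
import math

def power_spectrum(counts):
    """Power at each nonzero frequency index k of the mean-centered histogram."""
    n = len(counts)
    mean = sum(counts) / n
    centered = [c - mean for c in counts]
    spectrum = {}
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            value * cmath.exp(-2j * math.pi * k * t / n)
            for t, value in enumerate(centered)
        )
        spectrum[k] = abs(coefficient) ** 2 / n
    return spectrum

def dominant_period(counts):
    """Period, in bins, of the frequency with the largest power."""
    spectrum = power_spectrum(counts)
    strongest = max(spectrum, key=spectrum.get)
    return len(counts) / strongest

# Hypothetical histogram: a flat background of 20 per bin plus a ripple
# that repeats every 12 bins.
counts = [20 + 5 * math.cos(2 * math.pi * t / 12) for t in range(120)]
print(dominant_period(counts))
```

A real length distribution has a smooth, non-flat background, so the background would have to be modeled or removed before the spectrum is inspected, and the strength of any peak would still need a significance test.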
