Search | VHL Regional Portal

1.

Stochastics of Cellular Differentiation Explained by Epigenetics: The Case of T-Cell Differentiation and Functional Plasticity.

Bhat, J; Helmuth, J; Chitadze, G; Kouakanou, L; Peters, C; Vingron, M; Ammerpohl, O; Kabelitz, D.

Scand J Immunol ; 86(4): 184-195, 2017 Oct.

Article in English | MEDLINE | ID: mdl-28799233

ABSTRACT

Epigenetic marks including histone modifications and DNA methylation are associated with the regulation of gene expression and activity. In addition, an increasing number of non-coding RNAs with regulatory activity on gene expression have been identified. In parallel, technological advancements allow for the analysis of these mechanisms with high resolution up to the single-cell level. For instance, the assay for transposase-accessible chromatin using sequencing (ATAC-seq) simultaneously probes for chromatin accessibility and nucleosome positioning. Thus, it provides information on two levels of epigenetic regulation. Development and differentiation of T cells into functional subset cells including memory T cells are dynamic processes driven by environmental signals. Here, we briefly review the current knowledge of how epigenetic regulation contributes to subset specification, differentiation and memory development in T cells. Specifically, we focus on epigenetic mechanisms differentially active in the two distinct T cell populations expressing αβ or γδ T cell receptors. We also discuss examples of epigenetic alterations of T cells in autoimmune diseases. DNA methylation and histone acetylation are subject to modification by several classes of 'epigenetic modifiers', some of which are in clinical use or in preclinical development. Therefore, we address the impact of some epigenetic modifiers on T-cell activation and differentiation, and discuss possible synergies with T cell-based immunotherapeutic strategies.

Subject(s)

Cell Differentiation , Cell Plasticity , Epigenesis, Genetic , Epigenomics , T-Lymphocytes/physiology , Animals , DNA Methylation , Humans , Lymphocyte Activation , Protein Processing, Post-Translational

2.

Developments in CORG: a gene-centric comparative genomics resource.

Dieterich, C; Franz, M W; Vingron, M.

Nucleic Acids Res ; 35(Database issue): D32-5, 2007 Jan.

Article in English | MEDLINE | ID: mdl-17135197

ABSTRACT

The CORG resource (Comparative Regulatory Genomics, http://corg.eb.tuebingen.mpg.de) provides extensive cross-species comparisons of promoter regions in particular and whole gene loci in general. Pairwise as well as multiple alignments of 10 vertebrate species form the key component of CORG. We implemented a rapid alignment approach based on weight matrix motif anchors to ensure efficient computation and biologically informative alignments. All CORG workbench components have been enhanced towards more flexibility and interactivity. Reference sequence based data presentation and analysis was put into the well-known and modular Generic Genome Browser framework. Herein, various plugins facilitate online data analysis and integration with static conservation data. Main emphasis was put on the design of a new JAVA WebStart application for comparative data display. Flexible data import and export options for standard formats complete the provided services.

Subject(s)

Databases, Genetic , Genes , Genomics , Promoter Regions, Genetic , Animals , Cattle , Computer Graphics , DNA, Intergenic/chemistry , Humans , Internet , Mice , Rats , Sequence Alignment , User-Computer Interface

3.

SVC: structured visualization of evolutionary sequence conservation.

Roepcke, S; Fiziev, P; Seeburg, P H; Vingron, M.

Nucleic Acids Res ; 33(Web Server issue): W271-3, 2005 Jul 01.

Article in English | MEDLINE | ID: mdl-15991338

ABSTRACT

We have developed a web application for the detailed analysis and visualization of evolutionary sequence conservation in complex vertebrate genes. Given a pair of orthologous genes, the protein-coding sequences are aligned. When these sequences are mapped back onto their encoding exons in the genomes, a scaffold of the conserved gene structure naturally emerges. Sequence similarity between exons and introns is analysed and embedded into the gene structure scaffold. The visualization on the SVC server provides detailed information about evolutionarily conserved features of these genes. It further allows concise representation of complex splice patterns in the context of evolutionary conservation. A particular application of our tool arises from the fact that around mRNA editing sites both exonic and intronic sequences are highly conserved. This aids in delineation of these sites. SVC is available at http://svc.molgen.mpg.de.

Subject(s)

Evolution, Molecular , Genomics/methods , Sequence Alignment/methods , Software , Animals , Base Sequence , Computer Graphics , Conserved Sequence , Exons , Humans , Internet , Introns , Mice , RNA Editing , Receptor, Metabotropic Glutamate 5 , Receptors, Metabotropic Glutamate/genetics

4.

Genome wide identification and classification of alternative splicing based on EST data.

Gupta, S; Zink, D; Korn, B; Vingron, M; Haas, S A.

Bioinformatics ; 20(16): 2579-85, 2004 Nov 01.

Article in English | MEDLINE | ID: mdl-15117759

ABSTRACT

MOTIVATION: Alternative splicing is currently seen to explain the vast disparity between the number of predicted genes in the human genome and the highly diverse proteome. The mapping of expressed sequence tag (EST) consensus sequences derived from the GeneNest database onto the genome provides an efficient way of predicting exon-intron boundaries, gene structure and alternative splicing events. However, the alternative splicing events are obscured by a large number of putatively artificial exon boundaries arising due to genomic contamination or alignment errors. The current work describes a methodology to associate quality values to the predicted exon-intron boundaries. High quality exon-intron boundaries are used to predict constitutive and alternative splicing ranked by confidence values, aiming to facilitate large-scale analysis of alternative splicing and splicing in general. RESULTS: Applying the current methodology, constitutive splicing is observed in 33,270 EST clusters, out of which 45% are alternatively spliced. The classification derived from the computed confidence values for 17 of these splice events frequently correlates (15/17) with RT-PCR experiments performed for 40 different tissue samples. As an application of the confidence measure, an evaluation of the distribution of alternative splicing revealed that the majority of variants correspond to the coding regions of the genes. However, a significant fraction still maps to non-coding regions, thereby indicating a functional relevance of alternative splicing in untranslated regions. AVAILABILITY: The predicted alternative splice variants are visualized in the SpliceNest database at http://splicenest.molgen.mpg.de

Subject(s)

Algorithms , Alternative Splicing/genetics , Chromosome Mapping/methods , Expressed Sequence Tags , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Artificial Intelligence , Chromosomes, Human/genetics , Consensus Sequence/genetics , Humans

5.

The Helmholtz Network for Bioinformatics: an integrative web portal for bioinformatics resources.

Crass, T; Antes, I; Basekow, R; Bork, P; Buning, C; Christensen, M; Claussen, H; Ebeling, C; Ernst, P; Gailus-Durner, V; Glatting, K-H; Gohla, R; Gössling, F; Grote, K; Heidtke, K; Herrmann, A; O'Keeffe, S; Kiesslich, O; Kolibal, S; Korbel, J O; Lengauer, T; Liebich, I; van der Linden, M; Luz, H; Meissner, K; von Mering, C; Mevissen, H-T; Mewes, H-W; Michael, H; Mokrejs, M; Müller, T; Pospisil, H; Rarey, M; Reich, J G; Schneider, R; Schomburg, D; Schulze-Kremer, S; Schwarzer, K; Sommer, I; Springstubbe, S; Suhai, S; Thoppae, G; Vingron, M; Warfsmann, J; Werner, T; Wetzler, D; Wingender, E; Zimmer, R.

Bioinformatics ; 20(2): 268-70, 2004 Jan 22.

Article in English | MEDLINE | ID: mdl-14734319

ABSTRACT

SUMMARY: The Helmholtz Network for Bioinformatics (HNB) is a joint venture of eleven German bioinformatics research groups that offers convenient access to numerous bioinformatics resources through a single web portal. The 'Guided Solution Finder' which is available through the HNB portal helps users to locate the appropriate resources to answer their queries by employing a detailed, tree-like questionnaire. Furthermore, automated complex tool cascades ('tasks'), involving resources located on different servers, have been implemented, allowing users to perform comprehensive data analyses without the requirement of further manual intervention for data transfer and re-formatting. Currently, automated cascades for the analysis of regulatory DNA segments as well as for the prediction of protein functional properties are provided. AVAILABILITY: The HNB portal is available at http://www.hnbioinfo.de

Subject(s)

Algorithms , Computational Biology/methods , Database Management Systems , Information Storage and Retrieval/methods , Internet , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , User-Computer Interface , Computational Biology/organization & administration , Germany , Interinstitutional Relations , Software

6.

Exploring potential target genes of signaling pathways by predicting conserved transcription factor binding sites.

Dieterich, C; Herwig, R; Vingron, M.

Bioinformatics ; 19 Suppl 2: ii50-6, 2003 Oct.

Article in English | MEDLINE | ID: mdl-14534171

ABSTRACT

Many cellular signaling pathways induce gene expression by activating specific transcription factor complexes. Conventional approaches to the prediction of transcription factor binding sites lead to a notoriously high number of false discoveries. To alleviate this problem, we consider only binding sites that are conserved in man-mouse genomic sequence comparisons. We employ two alternative methods for predicting binding sites: exact matches to validated binding site sequences and weight matrix scans. We then ask the question whether there is a characteristic association between a transcription factor or set thereof to a particular group of genes. Our approach is tested on genes, which are induced in dendritic cells in response to the cells' exposure to LPS. We chose this example because the underlying signaling pathways are well understood. We demonstrate the benefit of conserved predicted binding sites in interpreting the LPS experiment. Additionally, we find that both methods for the prediction of conserved binding sites complement one another. Finally, our results suggest a distinct role for SRF in the context of LPS-induced gene expression.
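As a rough illustration of the weight matrix scan and the conservation filter described in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes a toy count matrix, a uniform background, an arbitrary score threshold, and two sequences that are already aligned without gaps; all names (log_odds_matrix, scan, conserved_hits) and all values are hypothetical.

```python
import math

def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into log2-odds scores.

    A uniform background and a pseudocount of 1 are illustrative assumptions.
    """
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return start positions of windows whose summed score meets the threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous characters
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append(start)
    return hits

def conserved_hits(seq_a, seq_b, matrix, threshold):
    """Keep only hits found at the same position in both sequences.

    Assumes seq_a and seq_b are rows of an ungapped alignment, which is a
    strong simplification of a real genomic comparison.
    """
    return sorted(set(scan(seq_a, matrix, threshold)) & set(scan(seq_b, matrix, threshold)))

# Hypothetical toy motif with consensus TGAC, observed 8 times at each position.
toy_counts = [{"T": 8}, {"G": 8}, {"A": 8}, {"C": 8}]
toy_matrix = log_odds_matrix(toy_counts)

human = "AATGACGGTTGACA"   # made-up sequence with two motif occurrences
mouse = "CATGACGGTTCACA"   # made-up sequence; the second occurrence is mutated

print(scan(human, toy_matrix, 5.0))
print(conserved_hits(human, mouse, toy_matrix, 5.0))
```

In this toy setting the scan of the single sequence reports both motif occurrences, while the conservation filter retains only the one shared by both sequences, which is the intuition behind using conservation to reduce false discoveries.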

Subject(s)

Conserved Sequence/genetics , Gene Targeting/methods , Regulatory Elements, Transcriptional/genetics , Sequence Analysis, DNA/methods , Signal Transduction/genetics , Transcription Factors/genetics , Transcription, Genetic/genetics , Algorithms , Base Sequence , Binding Sites , Chromosome Mapping/methods , Molecular Sequence Data , Protein Binding , Software

7.

CORG: a database for COmparative Regulatory Genomics.

Dieterich, C; Wang, H; Rateitschak, K; Luz, H; Vingron, M.

Nucleic Acids Res ; 31(1): 55-7, 2003 Jan 01.

Article in English | MEDLINE | ID: mdl-12519946

ABSTRACT

Sequence conservation in non-coding, upstream regions of orthologous genes from man and mouse is likely to reflect common regulatory DNA sites. Motivated by this assumption we have delineated a catalogue of conserved non-coding sequence blocks and provide the CORG-'COmparative Regulatory Genomics'-database. The data were computed based on statistically significant local suboptimal alignments of 15 kb regions upstream of the translation start sites of, currently, 10 793 pairs of orthologous genes. The resulting conserved non-coding blocks were annotated with EST matches for easier detection of non-coding mRNA and with hits to known transcription factor binding sites. CORG data are accessible from the ENSEMBL web site via a DAS service as well as a specially developed web service (http://corg.molgen.mpg.de) for query and interactive visualization of the conserved blocks and their annotation.

Subject(s)

Databases, Nucleic Acid , Genomics , Regulatory Sequences, Nucleic Acid , Animals , Conserved Sequence , Gene Expression Regulation , Genome, Human , Humans , Internet , Mice , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Transcription, Genetic

8.

Mathematical tree models for cytogenetic development in solid tumors.

von Heydebreck, A; Gunawan, B; Huber, W; Vingron, M; Füzesi, L.

Verh Dtsch Ges Pathol ; 87: 188-92, 2003.

Article in English | MEDLINE | ID: mdl-16888912

ABSTRACT

We present a new approach for modeling the occurrence of genetic changes in human tumors over time. In solid tumors, data on genetic alterations are usually only available at a single point in time, allowing no direct insight into the sequential order of genetic events. In our approach, genetic tumor development and progression is assumed to follow a probabilistic tree model. We use maximum likelihood estimation to reconstruct a tree model for the genetic evolution of a given tumor type. The use of the proposed method is illustrated by an application to cytogenetic data from 173 cases of clear cell renal cell carcinoma, which results in a model for the karyotypic evolution of this tumor.

Subject(s)

Chromosome Aberrations , Models, Genetic , Neoplasms/genetics , Neoplasms/pathology , Carcinoma, Renal Cell/genetics , Carcinoma, Renal Cell/pathology , Cytogenetic Analysis , Decision Trees , Humans , Kidney Neoplasms/genetics , Kidney Neoplasms/pathology , Likelihood Functions

9.

An integrated gene annotation and transcriptional profiling approach towards the full gene content of the Drosophila genome.

Hild, M; Beckmann, B; Haas, S A; Koch, B; Solovyev, V; Busold, C; Fellenberg, K; Boutros, M; Vingron, M; Sauer, F; Hoheisel, J D; Paro, R.

Genome Biol ; 5(1): R3, 2003.

Article in English | MEDLINE | ID: mdl-14709175

ABSTRACT

BACKGROUND: While the genome sequences for a variety of organisms are now available, the precise number of the genes encoded is still a matter of debate. For the human genome several stringent annotation approaches have resulted in the same number of potential genes, but a careful comparison revealed only limited overlap. This indicates that only the combination of different computational prediction methods and experimental evaluation of such in silico data will provide more complete genome annotations. In order to get a more complete gene content of the Drosophila melanogaster genome, we based our new D. melanogaster whole-transcriptome microarray, the Heidelberg FlyArray, on the combination of the Berkeley Drosophila Genome Project (BDGP) annotation and a novel ab initio gene prediction of lower stringency using the Fgenesh software. RESULTS: Here we provide evidence for the transcription of approximately 2,600 additional genes predicted by Fgenesh. Validation of the developmental profiling data by RT-PCR and in situ hybridization indicates a lower limit of 2,000 novel annotations, thus substantially raising the number of genes that make a fly. CONCLUSIONS: The successful design and application of this novel Drosophila microarray on the basis of our integrated in silico/wet biology approach confirms our expectation that in silico approaches alone will always tend to be incomplete. The identification of at least 2,000 novel genes highlights the importance of gathering experimental evidence to discover all genes within a genome. Moreover, as such an approach is independent of homology criteria, it will allow the discovery of novel genes unrelated to known protein families or those that have not been strictly conserved between species.

Subject(s)

Drosophila melanogaster/genetics , Gene Expression Profiling/methods , Genes, Insect/physiology , Genome , Oligonucleotide Array Sequence Analysis/methods , Animals , Cluster Analysis , Computational Biology/methods , Computational Biology/statistics & numerical data , Gene Expression Profiling/statistics & numerical data , In Situ Hybridization/methods , Models, Genetic , Molecular Sequence Data , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Predictive Value of Tests , Pseudogenes/genetics , RNA Interference/physiology , Reverse Transcriptase Polymerase Chain Reaction/methods

10.

Proteome analysis based on motif statistics.

Nicodème, P; Doerks, T; Vingron, M.

Bioinformatics ; 18 Suppl 2: S161-71, 2002.

Article in English | MEDLINE | ID: mdl-12385999

ABSTRACT

MOTIVATION: Even for the amino acid motifs collected in the Prosite database there may be chance occurrences as opposed to those occurrences where the motif is involved in fold or function of a protein. With recent mathematical advances in assessing the significance of observing such a motif a particular number of times, we can now study the over- or under-representation of particular motifs in a complete genome and attempt to make functional deductions. RESULTS: We demonstrate that statistical over- or under-representation of motifs in complete proteomes may be an indicator of whether, in that organism, we are looking at chance occurrences of the motif or whether the occurrences are sufficiently numerous to suggest a systematic, and thus functionally important occurrence. This has important implications for databank annotations. AVAILABILITY: The complete dataset comprising the plotted statistics of 266 Prosite motifs on 42 proteomes is available at http://algo.inria.fr/nicodeme/proteomes/proteocomp.html. The software used to compute this data has been described by Nicodème (2000, 2001). It is available either by web access as mentioned in these articles or by direct request from Pierre Nicodème.

Subject(s)

Chromosome Mapping/methods , Databases, Protein , Models, Chemical , Proteome/analysis , Proteome/chemistry , Sequence Analysis, Protein/methods , Amino Acid Motifs , Amino Acid Sequence , Computer Simulation , Data Interpretation, Statistical , Models, Genetic , Models, Statistical , Molecular Sequence Data , Proteome/genetics , Sequence Homology, Amino Acid

11.

Transcription profiling of renal cell carcinoma.

Huber, W; Boer, J M; von Heydebreck, A; Gunawan, B; Vingron, M; Füzesi, L; Poustka, A; Sültmann, H.

Verh Dtsch Ges Pathol ; 86: 153-64, 2002.

Article in English | MEDLINE | ID: mdl-12647365

ABSTRACT

AIMS: Our aim was to prepare a comprehensive catalogue of the changes in gene expression accompanying the development and progression of renal cell carcinoma, and to correlate these with histo-pathological, cytogenetic and clinical findings. METHODS: mRNA samples from paired neoplastic and non-cancerous human kidney tissue were labeled and hybridized in duplicate against high-density cDNA arrays. Two array technologies were used: 31,500-element transcriptome-wide nylon arrays for hybridization with 37 radioactively labelled sample pairs, and 4200-element kidney- and cancer-specific glass microarrays for hybridization with 19 fluorescently labelled sample pairs. RESULTS: We identified more than 1700 cDNA clones that show differential transcription levels in kidney tumor tissue compared to normal kidney tissue. The functional classification of 389 annotated genes provided views of the changes in the activities of specific biological processes in renal cancer. Among the biological processes with a large proportion of up-regulated genes we found cell adhesion, signal transduction, and nucleotide metabolism. Down-regulated processes included small molecule transport, ion homeostasis, and oxygen and radical metabolism. Furthermore, we explored the feasibility of molecular diagnosis for renal cell tumors using cDNA microarrays on glass slides, investigating the association of transcription levels with tumor type, progression, and a putative prognostic variable. The experimental data is available from the GEO gene expression database (http://www.ncbi.nlm.nih.gov/geo; accession no. GSE3), and a comprehensive presentation of the results is available in the web supplement (http://www.dkfz-heidelberg.de/abt0840/whuber/rcc). CONCLUSION: Transcription profiling using high-density cDNA arrays is a powerful method with the potential to improve cancer diagnosis and prognosis. The identification and classification of differentially transcribed genes, as described in our study, is the beginning of a more complete understanding of kidney cancer.

Subject(s)

Carcinoma, Renal Cell/genetics , Gene Expression Profiling/methods , Kidney Neoplasms/genetics , Transcription, Genetic , Gene Expression Regulation, Neoplastic , Humans

12.

Minimum information about a microarray experiment (MIAME)-toward standards for microarray data.

Brazma, A; Hingamp, P; Quackenbush, J; Sherlock, G; Spellman, P; Stoeckert, C; Aach, J; Ansorge, W; Ball, C A; Causton, H C; Gaasterland, T; Glenisson, P; Holstege, F C; Kim, I F; Markowitz, V; Matese, J C; Parkinson, H; Robinson, A; Sarkans, U; Schulze-Kremer, S; Stewart, J; Taylor, R; Vilo, J; Vingron, M.

Nat Genet ; 29(4): 365-71, 2001 Dec.

Article in English | MEDLINE | ID: mdl-11726920

ABSTRACT

Microarray analysis has become a widely used tool for the generation of gene expression data on a genomic scale. Although many significant results have been derived from microarray studies, one limitation has been the lack of standards for presenting and exchanging such data. Here we present a proposal, the Minimum Information About a Microarray Experiment (MIAME), that describes the minimum information required to ensure that microarray data can be easily interpreted and that results derived from its analysis can be independently verified. The ultimate goal of this work is to establish a standard for recording and reporting microarray-based gene expression data, which will in turn facilitate the establishment of databases and public repositories and enable the development of data analysis tools. With respect to MIAME, we concentrate on defining the content and structure of the necessary information rather than the technical format for capturing it.

Subject(s)

Computational Biology , Oligonucleotide Array Sequence Analysis/standards , Gene Expression Profiling/methods

13.

Phylogenetic information improves homology detection.

Rehmsmeier, M; Vingron, M.

Proteins ; 45(4): 360-71, 2001 Dec 01.

Article in English | MEDLINE | ID: mdl-11746684

ABSTRACT

We present a database search method that is based on phylogenetic trees (treesearch). The method is used to search a protein sequence database for homologs to a protein family. In preparation for the search, a phylogenetic tree is constructed from a given multiple alignment of the family. During the search, each database sequence is temporarily inserted into the tree, thus adding a new edge to the tree. Homology between family and sequence is then judged from the length of this edge. In a comparison of our method to profiles (ISREC pfsearch), two implementations of hidden Markov models (HMMER hmmsearch and SAM hmmscore), and to the family pairwise search (FPS) method on 43 families from the SCOP database based on minimum false-positive counts (min-FPCs), we found a considerable gain in sensitivity. In 69% of the test cases, treesearch showed a min-FPC of at most 50, whereas the two second best methods (hmmsearch and FPS) showed this performance only in 53% cases. A similar impression holds for a large range of min-FPC thresholds. The results demonstrate that phylogenetic information can significantly improve the detection of distant homologies and justify our method as a useful alternative to existing methods.

Subject(s)

Phylogeny , Sequence Homology, Amino Acid , Algorithms , Animals , Databases, Protein , Humans , Methods , Proteins/chemistry , Proteins/genetics , Sequence Alignment

14.

Identification and classification of differentially expressed genes in renal cell carcinoma by expression profiling on a global human 31,500-element cDNA array.

Boer, J M; Huber, W K; Sültmann, H; Wilmer, F; von Heydebreck, A; Haas, S; Korn, B; Gunawan, B; Vente, A; Füzesi, L; Vingron, M; Poustka, A.

Genome Res ; 11(11): 1861-70, 2001 Nov.

Article in English | MEDLINE | ID: mdl-11691851

ABSTRACT

We investigated the changes in gene expression accompanying the development and progression of kidney cancer by use of 31,500-element complementary DNA arrays. We measured expression profiles for paired neoplastic and noncancerous renal epithelium samples from 37 individuals. Using an experimental design optimized for factoring out technological and biological noise, and an adapted statistical test, we found 1738 differentially expressed cDNAs with an expected number of six false positives. Functional annotation of these genes provided views of the changes in the activities of specific biological pathways in renal cancer. Cell adhesion, signal transduction, and nucleotide metabolism were among the biological processes with a large proportion of genes overexpressed in renal cell carcinoma. Down-regulated pathways in the kidney tumor cells included small molecule transport, ion homeostasis, and oxygen and radical metabolism. Our expression profiling data uncovered gene expression changes shared with other epithelial tumors, as well as a unique signature for renal cell carcinoma. [Expression data for the differentially expressed cDNAs are available as a Web supplement at http://www.dkfz-heidelberg.de/abt0840/whuber/rcc.]

Subject(s)

Carcinoma, Renal Cell/classification , Carcinoma, Renal Cell/genetics , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic/genetics , Kidney Neoplasms/classification , Kidney Neoplasms/genetics , Oligonucleotide Array Sequence Analysis/methods , Clone Cells , Down-Regulation/genetics , Genes, Neoplasm/genetics , Humans , Organ Specificity/genetics , Signal Transduction/genetics

15.

Correspondence analysis applied to microarray data.

Fellenberg, K; Hauser, N C; Brors, B; Neutzner, A; Hoheisel, J D; Vingron, M.

Proc Natl Acad Sci U S A ; 98(19): 10781-6, 2001 Sep 11.

Article in English | MEDLINE | ID: mdl-11535808

ABSTRACT

Correspondence analysis is an explorative computational method for the study of associations between variables. Much like principal component analysis, it displays a low-dimensional projection of the data, e.g., into a plane. It does this, though, for two variables simultaneously, thus revealing associations between them. Here, we demonstrate the applicability of correspondence analysis to and high value for the analysis of microarray data, displaying associations between genes and experiments. To introduce the method, we show its application to the well-known Saccharomyces cerevisiae cell-cycle synchronization data by Spellman et al. [Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D. & Futcher, B. (1998) Mol. Biol. Cell 9, 3273-3297], allowing for comparison with their visualization of this data set. Furthermore, we apply correspondence analysis to a non-time-series data set of our own, thus supporting its general applicability to microarray data of different complexity, underlying structure, and experimental strategy (both two-channel fluorescence-tag and radioactive labeling).

Subject(s)

Data Interpretation, Statistical , Gene Expression , Oligonucleotide Array Sequence Analysis/methods , Protein Tyrosine Phosphatases , Saccharomyces cerevisiae Proteins , Transcription, Genetic , Cell Cycle , Cell Cycle Proteins/genetics , Saccharomyces cerevisiae/genetics

16.

Identifying splits with clear separation: a new class discovery method for gene expression data.

von Heydebreck, A; Huber, W; Poustka, A; Vingron, M.

Bioinformatics ; 17 Suppl 1: S107-14, 2001.

Article in English | MEDLINE | ID: mdl-11472999

ABSTRACT

We present a new class discovery method for microarray gene expression data. Based on a collection of gene expression profiles from different tissue samples, the method searches for binary class distinctions in the set of samples that show clear separation in the expression levels of specific subsets of genes. Several mutually independent class distinctions may be found, which is difficult to obtain from most commonly used clustering algorithms. Each class distinction can be biologically interpreted in terms of its supporting genes. The mathematical characterization of the favored class distinctions is based on statistical concepts. By analyzing three data sets from cancer gene expression studies, we demonstrate that our method is able to detect biologically relevant structures, for example cancer subtypes, in an unsupervised fashion.

Subject(s)

Algorithms , Gene Expression Profiling/statistics & numerical data , Computational Biology , Databases, Factual , Gene Expression , Humans , Leukemia/genetics , Melanoma/genetics , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Oncogenes

17.

Bioinformatics needs to adopt statistical thinking.

Vingron, M.

Bioinformatics ; 17(5): 389-90, 2001 May.

Article in English | MEDLINE | ID: mdl-11331232

Subject(s)

Computational Biology/statistics & numerical data , Biometry , Computational Biology/trends , Data Interpretation, Statistical , Databases, Factual , Humans , Sequence Alignment/statistics & numerical data

18.

Limits of homology detection by pairwise sequence comparison.

Spang, R; Vingron, M.

Bioinformatics ; 17(4): 338-42, 2001 Apr.

Article in English | MEDLINE | ID: mdl-11301302

ABSTRACT

MOTIVATION: Noise in database searches resulting from random sequence similarities increases as the databases expand rapidly. The noise problems are not a technical shortcoming of the database search programs, but a logical consequence of the idea of homology searches. The effect can be observed in simulation experiments. RESULTS: We have investigated noise levels in pairwise alignment based database searches. The noise levels of 38 releases of the SwissProt database display perfect logarithmic growth with the total length of the databases. Clustering of real biological sequences reduces noise levels, but the effect is marginal.

Subject(s)

Databases, Factual , Proteins/analysis , Sequence Alignment , Sequence Homology, Nucleic Acid , Computer Simulation , Mathematical Computing , Models, Statistical

19.

Contig selection in physical mapping.

Heber, S; Stoye, J; Frohme, M; Hoheisel, J; Vingron, M.

J Comput Biol ; 7(3-4): 395-408, 2000.

Article in English | MEDLINE | ID: mdl-11108470

ABSTRACT

In physical mapping, one orders a set of genetic landmarks or a library of cloned fragments of DNA according to their position in the genome. Our approach to physical mapping divides the problem into smaller and easier subproblems by partitioning the probe set into independent parts (probe contigs). For this purpose we introduce a new distance function between probes, the averaged rank distance (ARD) derived from bootstrap resampling of the raw data. The ARD measures the pairwise distances of probes within a contig and smoothes the distances of probes across different contigs. It shows distinct jumps at contig borders. This makes it appropriate for contig selection by clustering. We have designed a physical mapping algorithm that makes use of these observations and seems to be particularly well suited to the delineation of reliable contigs. We evaluated our method on data sets from two physical mapping projects. On data from the recently sequenced bacterium Xylella fastidiosa, the probe contig set produced by the new method was evaluated using the probe order derived from the sequence information. Our approach yielded a basically correct contig set. On this data we also compared our method to an approach which uses the number of supporting clones to determine contigs. Our map is much more accurate. In comparison to a physical map of Pasteurella haemolytica that was computed using simulated annealing, the newly computed map is considerably cleaner. The results of our method have already proven helpful for the design of experiments aimed at further improving the quality of a map.

Subject(s)

Algorithms , Contig Mapping/statistics & numerical data , Cluster Analysis , Computational Biology , DNA, Bacterial/genetics , Databases, Factual , Gammaproteobacteria/genetics , Mannheimia haemolytica/genetics

20.

Application of bootstrap techniques to physical mapping.

Heber, S; Hoheisel, J; Vingron, M.

Genomics ; 69(2): 235-41, 2000 Oct 15.

Article in English | MEDLINE | ID: mdl-11031106

ABSTRACT

Ordering genetic markers or clones from a genomic library into a physical map is a central problem in genetics. In the presence of errors, there is no efficient algorithm known that solves this problem. Based on a standard heuristic algorithm for it, we present a method to construct a confidence neighborhood for a computed solution. We compute a confidence value for putative local solutions derived from bootstrap replicates of the original solution. In the reliable parts, the confidence neighborhood and the computed solution tend to coincide. In regions that are ill-defined by the data, the neighborhood contains additional reasonable alternatives. This offers the possibility of designing further experiments for the badly defined regions to improve the quality of the physical map. We analyze our approach by a simulation study and by application to a dataset of the genome of the bacterium Xylella fastidiosa.

Subject(s)

Algorithms , Physical Chromosome Mapping/methods , Confidence Intervals , Genome, Bacterial , Gram-Negative Bacteria/genetics
