Search | VHL Regional Portal

1.

Ultra-sensitive molecular residual disease detection through whole genome sequencing with single-read error correction.

Li, Xinxing; Liu, Tao; Bacchiocchi, Antonella; Li, Mengxing; Cheng, Wen; Wittkop, Tobias; Mendez, Fernando; Wang, Yingyu; Tang, Paul; Yao, Qianqian; Bosenberg, Marcus W; Sznol, Mario; Yan, Qin; Faham, Malek; Weng, Li; Halaban, Ruth; Jin, Hai; Hu, Zhiqian.

medRxiv ; 2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38260271

ABSTRACT

While whole genome sequencing (WGS) of cell-free DNA (cfDNA) holds enormous promise for molecular residual disease (MRD) detection, its performance is limited by WGS error rate. Here we introduce AccuScan, an efficient cfDNA WGS technology that enables genome-wide error correction at single read level, achieving an error rate of 4.2×10⁻⁷, which is about two orders of magnitude lower than a read-centric de-noising method. When applied to MRD detection, AccuScan demonstrated analytical sensitivity down to 10⁻⁶ circulating tumor allele fraction at 99% sample level specificity. In colorectal cancer, AccuScan showed 90% landmark sensitivity for predicting relapse. It also showed robust MRD performance with esophageal cancer using samples collected as early as 1 week after surgery, and predictive value for immunotherapy monitoring with melanoma patients. Overall, AccuScan provides a highly accurate WGS solution for MRD, empowering circulating tumor DNA detection at parts per million range without high sample input or personalized reagents. One Sentence Summary: AccuScan showed remarkable ultra-low limit of detection with a short turnaround time, low sample requirement and a simple workflow for MRD detection.

2.

Multiplex Identification of Antigen-Specific T Cell Receptors Using a Combination of Immune Assays and Immune Receptor Sequencing.

Klinger, Mark; Pepin, Francois; Wilkins, Jen; Asbury, Thomas; Wittkop, Tobias; Zheng, Jianbiao; Moorhead, Martin; Faham, Malek.

PLoS One ; 10(10): e0141561, 2015.

Article in English | MEDLINE | ID: mdl-26509579

ABSTRACT

Monitoring antigen-specific T cells is critical for the study of immune responses and development of biomarkers and immunotherapeutics. We developed a novel multiplex assay that combines conventional immune monitoring techniques and immune receptor repertoire sequencing to enable identification of T cells specific to large numbers of antigens simultaneously. We multiplexed 30 different antigens and identified 427 antigen-specific clonotypes from 5 individuals with frequencies as low as 1 per million T cells. The clonotypes identified were validated several ways including repeatability, concordance with published clonotypes, and high correlation with ELISPOT. Applying this technology we have shown that the vast majority of shared antigen-specific clonotypes identified in different individuals display the same specificity. We also showed that shared antigen-specific clonotypes are simpler sequences and are present at higher frequencies compared to non-shared clonotypes specific to the same antigen. In conclusion this technology enables sensitive and quantitative monitoring of T cells specific for hundreds or thousands of antigens simultaneously allowing the study of T cell responses with an unprecedented resolution and scale.

Subject(s)

Enzyme-Linked Immunospot Assay , Epitopes, T-Lymphocyte/immunology , High-Throughput Nucleotide Sequencing , Receptors, Antigen, T-Cell/genetics , Receptors, Immunologic/genetics , T-Cell Antigen Receptor Specificity/genetics , T-Cell Antigen Receptor Specificity/immunology , Clonal Evolution/genetics , Clonal Evolution/immunology , Enzyme-Linked Immunospot Assay/methods , Enzyme-Linked Immunospot Assay/standards , Humans , Reproducibility of Results

3.

Genome and proteome annotation using automatically recognized concepts and functional networks.

Bivol, Adrian; Wittkop, Tobias; Davis, Darcy; Mooney, Sean D.

AMIA Jt Summits Transl Sci Proc ; 2013: 26, 2013.

Article in English | MEDLINE | ID: mdl-24303290

ABSTRACT

Many tools have been developed for prediction of the function or disease association of genes and proteins, and this continues to be a highly active area of bioinformatics research. Typically, these methods predict which concepts should be annotated to genes or proteins, using terms from ontologies such as Gene Ontology (GO), largely overlooking other ontologies that are available. Here, we set out to broadly evaluate novel, automatically retrieved, gene-term annotations and identify those concepts of publicly available ontologies that can be predicted using a generalized tool for prediction of annotations. We identified terms that perform better than expected by chance using randomly generated gene sets and show that both manually curated terms in GO and automatically recognized terms can be used to develop reasonable predictive models. In all, we characterize terms in over 250 ontologies and identify more than 127,000 statistically significant terms that can be predicted on human genes.

4.

STOP using just GO: a multi-ontology hypothesis generation tool for high throughput experimentation.

Wittkop, Tobias; TerAvest, Emily; Evani, Uday S; Fleisch, K Mathew; Berman, Ari E; Powell, Corey; Shah, Nigam H; Mooney, Sean D.

BMC Bioinformatics ; 14: 53, 2013 Feb 14.

Article in English | MEDLINE | ID: mdl-23409969

ABSTRACT

BACKGROUND: Gene Ontology (GO) enrichment analysis remains one of the most common methods for hypothesis generation from high throughput datasets. However, we believe that researchers strive to test other hypotheses that fall outside of GO. Here, we developed and evaluated a tool for hypothesis generation from gene or protein lists using ontological concepts present in manually curated text that describes those genes and proteins. RESULTS: As a consequence we have developed the method Statistical Tracking of Ontological Phrases (STOP) that expands the realm of testable hypotheses in gene set enrichment analyses by integrating automated annotations of genes to terms from over 200 biomedical ontologies. While not as precise as manually curated terms, we find that the additional enriched concepts have value when coupled with traditional enrichment analyses using curated terms. CONCLUSION: Multiple ontologies have been developed for gene and protein annotation; by using a dataset of both manually curated GO terms and automatically recognized concepts from curated text, we can expand the realm of hypotheses that can be discovered. The web application STOP is available at http://mooneygroup.org/stop/.

Subject(s)

Genes , Molecular Sequence Annotation , Proteins , Software , Vocabulary, Controlled , Humans , Huntington Disease/genetics , Huntington Disease/metabolism , Internet , Parkinson Disease/genetics , Parkinson Disease/metabolism , Protein Interaction Mapping

5.

A large-scale evaluation of computational protein function prediction.

Radivojac, Predrag; Clark, Wyatt T; Oron, Tal Ronnen; Schnoes, Alexandra M; Wittkop, Tobias; Sokolov, Artem; Graim, Kiley; Funk, Christopher; Verspoor, Karin; Ben-Hur, Asa; Pandey, Gaurav; Yunes, Jeffrey M; Talwalkar, Ameet S; Repo, Susanna; Souza, Michael L; Piovesan, Damiano; Casadio, Rita; Wang, Zheng; Cheng, Jianlin; Fang, Hai; Gough, Julian; Koskinen, Patrik; Törönen, Petri; Nokso-Koivisto, Jussi; Holm, Liisa; Cozzetto, Domenico; Buchan, Daniel W A; Bryson, Kevin; Jones, David T; Limaye, Bhakti; Inamdar, Harshal; Datta, Avik; Manjari, Sunitha K; Joshi, Rajendra; Chitale, Meghana; Kihara, Daisuke; Lisewski, Andreas M; Erdin, Serkan; Venner, Eric; Lichtarge, Olivier; Rentzsch, Robert; Yang, Haixuan; Romero, Alfonso E; Bhat, Prajwal; Paccanaro, Alberto; Hamp, Tobias; Kaßner, Rebecca; Seemayer, Stefan; Vicedo, Esmeralda; Schaefer, Christian.

Nat Methods ; 10(3): 221-7, 2013 Mar.

Article in English | MEDLINE | ID: mdl-23353650

ABSTRACT

Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.

Subject(s)

Computational Biology/methods , Molecular Biology/methods , Molecular Sequence Annotation , Proteins/physiology , Algorithms , Animals , Databases, Protein , Exoribonucleases/classification , Exoribonucleases/genetics , Exoribonucleases/physiology , Forecasting , Humans , Proteins/chemistry , Proteins/classification , Proteins/genetics , Species Specificity

6.

Density parameter estimation for finding clusters of homologous proteins--tracing actinobacterial pathogenicity lifestyles.

Röttger, Richard; Kalaghatgi, Prabhav; Sun, Peng; Soares, Siomar de Castro; Azevedo, Vasco; Wittkop, Tobias; Baumbach, Jan.

Bioinformatics ; 29(2): 215-22, 2013 Jan 15.

Article in English | MEDLINE | ID: mdl-23142964

ABSTRACT

MOTIVATION: Homology detection is a long-standing challenge in computational biology. To tackle this problem, typically all-versus-all BLAST results are coupled with data partitioning approaches resulting in clusters of putative homologous proteins. One of the main problems, however, has been widely neglected: all clustering tools need a density parameter that adjusts the number and size of the clusters. This parameter is crucial but hard to estimate without gold standard data at hand. Developing a gold standard, however, is a difficult and time consuming task. Having a reliable method for detecting clusters of homologous proteins between a huge set of species would open opportunities for better understanding the genetic repertoire of bacteria with different lifestyles. RESULTS: Our main contribution is a method for identifying a suitable and robust density parameter for protein homology detection without a given gold standard. Therefore, we study the core genome of 89 actinobacteria. This allows us to incorporate background knowledge, i.e. the assumption that a set of evolutionarily closely related species should share a comparably high number of evolutionarily conserved proteins (emerging from phylum-specific housekeeping genes). We apply our strategy to find genes/proteins that are specific for certain actinobacterial lifestyles, i.e. different types of pathogenicity. The whole study was performed with transitivity clustering, as it only requires a single intuitive density parameter and has been shown to be well applicable for the task of protein sequence clustering. Note, however, that the presented strategy generally does not depend on our clustering method but can easily be adapted to other clustering approaches. AVAILABILITY: All results are publicly available at http://transclust.mmci.uni-saarland.de/actino_core/ or as Supplementary Material of this article. 
CONTACT: roettger@mpi-inf.mpg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Actinobacteria/classification , Bacterial Proteins/chemistry , Sequence Homology, Amino Acid , Actinobacteria/genetics , Actinobacteria/pathogenicity , Algorithms , Bacterial Proteins/genetics , Cluster Analysis , Genome, Bacterial , Models, Genetic , Phylogeny , Sequence Alignment

7.

Genetic correction of Huntington's disease phenotypes in induced pluripotent stem cells.

An, Mahru C; Zhang, Ningzhe; Scott, Gary; Montoro, Daniel; Wittkop, Tobias; Mooney, Sean; Melov, Simon; Ellerby, Lisa M.

Cell Stem Cell ; 11(2): 253-63, 2012 Aug 03.

Article in English | MEDLINE | ID: mdl-22748967

ABSTRACT

Huntington's disease (HD) is caused by a CAG expansion in the huntingtin gene. Expansion of the polyglutamine tract in the huntingtin protein results in massive cell death in the striatum of HD patients. We report that human induced pluripotent stem cells (iPSCs) derived from HD patient fibroblasts can be corrected by the replacement of the expanded CAG repeat with a normal repeat using homologous recombination, and that the correction persists in iPSC differentiation into DARPP-32-positive neurons in vitro and in vivo. Further, correction of the HD-iPSCs normalized pathogenic HD signaling pathways (cadherin, TGF-β, BDNF, and caspase activation) and reversed disease phenotypes such as susceptibility to cell death and altered mitochondrial bioenergetics in neural stem cells. The ability to make patient-specific, genetically corrected iPSCs from HD patients will provide relevant disease models in identical genetic backgrounds and is a critical step for the eventual use of these cells in cell replacement therapy.

Subject(s)

Huntington Disease/genetics , Huntington Disease/pathology , Induced Pluripotent Stem Cells/cytology , Induced Pluripotent Stem Cells/metabolism , Cell Differentiation , Cells, Cultured , Humans , Induced Pluripotent Stem Cells/pathology , Phenotype

8.

DEFOG: discrete enrichment of functionally organized genes.

Wittkop, Tobias; Berman, Ari E; Fleisch, K Mathew; Mooney, Sean D.

Integr Biol (Camb) ; 4(7): 795-804, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22706384

ABSTRACT

High-throughput biological experiments commonly result in a list of genes or proteins of interest. In order to understand the observed changes of the genes and to generate new hypotheses, one needs to understand the functions and roles of the genes and how those functions relate to the experimental conditions. Typically, statistical tests are performed in order to detect enriched Gene Ontology categories or pathways, i.e. the categories are observed in the genes of interest more often than is expected by chance. Depending on the number of genes and the complexity and quantity of functions in which they are involved, such an analysis can easily result in hundreds of enriched terms. To this end we developed DEFOG, a web-based application that facilitates the functional analysis of gene sets by hierarchically organizing the genes into functionally related modules. Our computational pipeline utilizes three powerful tools to achieve this goal: (1) GeneMANIA creates a functional consensus network of the genes of interest based on gene-list-specific data fusion of hundreds of genomic networks from publicly available sources; (2) Transitivity Clustering organizes those genes into a clear hierarchy of functionally related groups, and (3) Ontologizer performs a Gene Ontology enrichment analysis on the resulting gene clusters. DEFOG integrates this computational pipeline within an easy-to-use web interface, thus allowing for a novel visual analysis of gene sets that aids in the discovery of potentially important biological mechanisms and facilitates the creation of new hypotheses. DEFOG is available at http://www.mooneygroup.org/defog.

Subject(s)

Cluster Analysis , Computational Biology/methods , Databases, Genetic , Genomics/methods , Aging/genetics , Algorithms , Animals , Computer Graphics , Gene Expression Profiling/methods , Gene Regulatory Networks , Humans , Internet , Multigene Family , Oligonucleotide Array Sequence Analysis , Software

9.

clusterMaker: a multi-algorithm clustering plugin for Cytoscape.

Morris, John H; Apeltsin, Leonard; Newman, Aaron M; Baumbach, Jan; Wittkop, Tobias; Su, Gang; Bader, Gary D; Ferrin, Thomas E.

BMC Bioinformatics ; 12: 436, 2011 Nov 09.

Article in English | MEDLINE | ID: mdl-22070249

ABSTRACT

BACKGROUND: In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL. RESULTS: Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section. 
CONCLUSIONS: The Cytoscape plugin clusterMaker provides a number of clustering algorithms and visualizations that can be used independently or in combination for analysis and visualization of biological data sets, and for confirming or generating hypotheses about biological function. Several of these visualizations and algorithms are only available to Cytoscape users through the clusterMaker plugin. clusterMaker is available via the Cytoscape plugin manager.

Subject(s)

Algorithms , Saccharomyces cerevisiae/genetics , Software , Animals , Cluster Analysis , Genomics , Mice , Protein Interaction Maps , Racemases and Epimerases/genetics , Saccharomyces cerevisiae/enzymology

10.

Comprehensive cluster analysis with Transitivity Clustering.

Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan.

Nat Protoc ; 6(3): 285-95, 2011 Mar.

Article in English | MEDLINE | ID: mdl-21372810

ABSTRACT

Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ~1 h to complete.

Subject(s)

Cluster Analysis , Computational Biology/methods , Pattern Recognition, Automated/methods , Sequence Alignment/methods , Software , Databases, Nucleic Acid , Databases, Protein , Gene Expression Profiling , Internet , Molecular Sequence Data , Sequence Analysis/methods , Sequence Homology , User-Computer Interface

11.

Partitioning biological data with transitivity clustering.

Wittkop, Tobias; Emig, Dorothea; Lange, Sita; Rahmann, Sven; Albrecht, Mario; Morris, John H; Böcker, Sebastian; Stoye, Jens; Baumbach, Jan.

Nat Methods ; 7(6): 419-20, 2010 Jun.

Article in English | MEDLINE | ID: mdl-20508635

Subject(s)

Cluster Analysis , Data Interpretation, Statistical , Animals , Humans

12.

Efficient online transcription factor binding site adjustment by integrating transitive graph projection with MoRAine 2.0.

Wittkop, Tobias; Rahmann, Sven; Baumbach, Jan.

J Integr Bioinform ; 7(3), 2010 Mar 25.

Article in English | MEDLINE | ID: mdl-20375458

ABSTRACT

We investigated the problem of imprecisely determined prokaryotic transcription factor (TF) binding sites (TFBSs). We found that the identification and reinvestigation of questionable binding motifs may result in improved models of these motifs. Subsequent model-based predictions of gene regulatory interactions may be performed with increased accuracy when the TFBS annotation underlying these models has been re-adjusted. We present MoRAine 2.0, a significantly improved version of MoRAine. It can automatically identify cases of unfavorable TFBS strand annotations and imprecisely determined TFBS positions. With release 2.0, we close the gap between reasonable running time and high accuracy. Furthermore, it requires only minimal input from the user: (1) the input TFBS sequences and (2) the length of the flanking sequences. CONCLUSIONS: MoRAine 2.0 is an easy-to-use, integrated, and publicly available web tool for the re-annotation of questionable TFBSs. It can be used online or downloaded as a stand-alone version from http://moraine.cebitec.uni-bielefeld.de.

Subject(s)

Computational Biology/methods , Internet , Software , Transcription Factors/metabolism , Base Sequence , Binding Sites , Molecular Sequence Data , Position-Specific Scoring Matrices

13.

Integrated analysis and reconstruction of microbial transcriptional gene regulatory networks using CoryneRegNet.

Baumbach, Jan; Wittkop, Tobias; Kleindt, Christiane Katja; Tauch, Andreas.

Nat Protoc ; 4(6): 992-1005, 2009.

Article in English | MEDLINE | ID: mdl-19498379

ABSTRACT

CoryneRegNet is the reference database and analysis platform for corynebacterial gene regulatory networks. It provides web-based access to integrated data on gene regulatory interactions of corynebacteria relevant to human medicine and biotechnology, Escherichia coli and Mycobacterium tuberculosis. To facilitate the analysis and reconstruction of the corresponding networks, CoryneRegNet provides user-friendly interfaces for bioinformatics analysis and network visualization tools. This protocol describes four major workflows: (1) querying the regulatory network of a gene of interest, (2) prediction and interspecies transfer of gene regulatory interactions, (3) visualization and comparison of predicted or known networks and (4) integration of gene expression data analysis and visualization. This protocol guides the user through the most important features of CoryneRegNet and takes 45-60 min to complete.

Subject(s)

Computational Biology/methods , Corynebacterium/genetics , Databases, Genetic , Gene Expression Regulation, Bacterial , Gene Regulatory Networks , Internet , Transcription Factors/genetics , Transcription, Genetic , User-Computer Interface

14.

MoRAine--a web server for fast computational transcription factor binding motif re-annotation.

Baumbach, Jan; Wittkop, Tobias; Weile, Jochen; Kohl, Thomas; Rahmann, Sven.

J Integr Bioinform ; 5(2), 2008 Aug 25.

Article in English | MEDLINE | ID: mdl-20134062

ABSTRACT

BACKGROUND: A precise experimental identification of transcription factor binding motifs (TFBMs), accurate to a single base pair, is time-consuming and difficult. For several databases, TFBM annotations are extracted from the literature and stored 5' --> 3' relative to the target gene. Mixing the two possible orientations of a motif results in poor information content of subsequently computed position frequency matrices (PFMs) and sequence logos. Since these PFMs are used to predict further TFBMs, we address the question whether the TFBMs underlying a PFM can be re-annotated automatically to improve both the information content of the PFM and subsequent classification performance. RESULTS: We present MoRAine, an algorithm that re-annotates transcription factor binding motifs. Each motif with experimental evidence underlying a PFM is compared against each other such motif. The goal is to re-annotate TFBMs by possibly switching their strands and shifting them a few positions in order to maximize the information content of the resulting adjusted PFM. We present two heuristic strategies to perform this optimization and subsequently show that MoRAine significantly improves the corresponding sequence logos. Furthermore, we justify the method by evaluating specificity, sensitivity, true positive, and false positive rates of PFM-based TFBM predictions for E. coli using the original database motifs and the MoRAine-adjusted motifs. The classification performance is considerably increased if MoRAine is used as a preprocessing step. CONCLUSIONS: MoRAine is integrated into a publicly available web server and can be used online or downloaded as a stand-alone version from http://moraine.cebitec.uni-bielefeld.de.

Subject(s)

Internet , Software , Transcription Factors/chemistry , Transcription Factors/metabolism , Algorithms , Base Sequence , Binding Sites , Molecular Sequence Data

15.

Large scale clustering of protein sequences with FORCE - A layout based heuristic for weighted cluster editing.

Wittkop, Tobias; Baumbach, Jan; Lobo, Francisco P; Rahmann, Sven.

BMC Bioinformatics ; 8: 396, 2007 Oct 17.

Article in English | MEDLINE | ID: mdl-17941985

ABSTRACT

BACKGROUND: Detecting groups of functionally related proteins from their amino acid sequence alone has been a long-standing challenge in computational genome research. Several clustering approaches, following different strategies, have been published to attack this problem. Today, new sequencing technologies provide huge amounts of sequence data that have to be efficiently clustered with constant or increased accuracy, at increased speed. RESULTS: We advocate that the model of weighted cluster editing, also known as transitive graph projection, is well-suited to protein clustering. We present the FORCE heuristic that is based on transitive graph projection and clusters arbitrary sets of objects, given pairwise similarity measures. In particular, we apply FORCE to the problem of protein clustering and show that it outperforms the most popular existing clustering tools (Spectral clustering, TribeMCL, GeneRAGE, Hierarchical clustering, and Affinity Propagation). Furthermore, we show that FORCE is able to handle huge datasets by calculating clusters for all 192 187 prokaryotic protein sequences (66 organisms) obtained from the COG database. Finally, FORCE is integrated into the corynebacterial reference database CoryneRegNet. CONCLUSION: FORCE is an applicable alternative to existing clustering algorithms. Its theoretical foundation, weighted cluster editing, can outperform other clustering paradigms on protein homology clustering. FORCE is open source and implemented in Java. The software, including the source code, the clustering results for COG and CoryneRegNet, and all evaluation datasets are available at http://gi.cebitec.uni-bielefeld.de/comet/force/.

Subject(s)

Algorithms , Cluster Analysis , Pattern Recognition, Automated/methods , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software , Amino Acid Sequence , Artificial Intelligence , Molecular Sequence Data

16.

Exact and heuristic algorithms for weighted cluster editing.

Rahmann, Sven; Wittkop, Tobias; Baumbach, Jan; Martin, Marcel; Truss, Anke; Böcker, Sebastian.

Comput Syst Bioinformatics Conf ; 6: 391-401, 2007.

Article in English | MEDLINE | ID: mdl-17951842

ABSTRACT

Clustering objects according to given similarity or distance values is a ubiquitous problem in computational biology with diverse applications, e.g., in defining families of orthologous genes, or in the analysis of microarray experiments. While there exists a plenitude of methods, many of them produce clusterings that can be further improved. "Cleaning up" initial clusterings can be formalized as projecting a graph on the space of transitive graphs; it is also known as the cluster editing or cluster partitioning problem in the literature. In contrast to previous work on cluster editing, we allow arbitrary weights on the similarity graph. To solve the so-defined weighted transitive graph projection problem, we present (1) the first exact fixed-parameter algorithm, (2) a polynomial-time greedy algorithm that returns the optimal result on a well-defined subset of "close-to-transitive" graphs and works heuristically on other graphs, and (3) a fast heuristic that uses ideas similar to those from the Fruchterman-Reingold graph layout algorithm. We compare quality and running times of these algorithms on both artificial graphs and protein similarity graphs derived from the 66 organisms of the COG dataset.

Subject(s)

Algorithms , Cluster Analysis , Documentation/methods , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods

17.

CoryneRegNet 3.0--an interactive systems biology platform for the analysis of gene regulatory networks in corynebacteria and Escherichia coli.

Baumbach, Jan; Wittkop, Tobias; Rademacher, Katrin; Rahmann, Sven; Brinkrolf, Karina; Tauch, Andreas.

J Biotechnol ; 129(2): 279-89, 2007 Apr 30.

Article in English | MEDLINE | ID: mdl-17229482

ABSTRACT

CoryneRegNet is an ontology-based data warehouse for the reconstruction and visualization of transcriptional regulatory interactions in prokaryotes. To extend the biological content of CoryneRegNet, we added comprehensive data on transcriptional regulations in the model organism Escherichia coli K-12, originally deposited in the international reference database RegulonDB. The enhanced web interface of CoryneRegNet offers several types of search options. The results of a search are displayed in a table-based style and include a visualization of the genetic organization of the respective gene region. Information on DNA binding sites of transcriptional regulators is depicted by sequence logos. The results can also be displayed by several layouters implemented in the graphical user interface GraphVis, allowing, for instance, the visualization of genome-wide network reconstructions and the homology-based inter-species comparison of reconstructed gene regulatory networks. In an application example, we compare the composition of the gene regulatory networks involved in the SOS response of E. coli and Corynebacterium glutamicum. CoryneRegNet is available at the following URL: http://www.cebitec.uni-bielefeld.de/groups/gi/software/coryneregnet/.

Subject(s)

Corynebacterium glutamicum/genetics , Databases, Genetic , Escherichia coli/genetics , Gene Regulatory Networks/genetics , Systems Biology , Gene Expression Regulation
