1.
Bioinformatics ; 30(1): 50-60, 2014 Jan 01.
Article in English | MEDLINE | ID: mdl-24177718

ABSTRACT

MOTIVATION: Several cancer types consist of multiple genetically and phenotypically distinct subpopulations. The underlying mechanism for this intra-tumoral heterogeneity can be explained by the clonal evolution model, whereby growth advantageous mutations cause the expansion of cancer cell subclones. The recurrent phenotype of many cancers may be a consequence of these coexisting subpopulations responding unequally to therapies. Methods to computationally infer tumor evolution and subpopulation diversity are emerging and they hold the promise to improve the understanding of genetic and molecular determinants of recurrence. RESULTS: To address cellular subpopulation dynamics within human tumors, we developed a bioinformatic method, EXPANDS. It estimates the proportion of cells harboring specific mutations in a tumor. By modeling cellular frequencies as probability distributions, EXPANDS predicts mutations that accumulate in a cell before its clonal expansion. We assessed the performance of EXPANDS on one whole genome sequenced breast cancer and performed SP analyses on 118 glioblastoma multiforme samples obtained from TCGA. Our results inform about the extent of subclonal diversity in primary glioblastoma, subpopulation dynamics during recurrence and provide a set of candidate genes mutated in the most well-adapted subpopulations. In summary, EXPANDS predicts tumor purity and subclonal composition from sequencing data. AVAILABILITY AND IMPLEMENTATION: EXPANDS is available for download at http://code.google.com/p/expands (MATLAB version, used in this manuscript) and http://cran.r-project.org/web/packages/expands (R version).


Subject(s)
Gene Frequency , Glioblastoma/genetics , Ploidies , Glioblastoma/pathology , Humans , Mutation , Neoplasms/genetics , Probability , Recurrence
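As a rough illustration of what estimating the proportion of cells harboring a mutation involves, the sketch below applies the standard mixture relation between variant allele frequency (VAF), tumor purity, and copy number. It is not the EXPANDS model, which treats cellular frequencies as probability distributions; the function name, its parameters, and the diploid-normal assumption are all hypothetical choices made for this example.

```python
def mutated_cell_fraction(vaf, purity=1.0, tumor_copy_number=2, mutated_copies=1):
    """Point estimate of the fraction of tumor cells carrying a mutation.

    Illustrative assumptions: normal cells are diploid, and
    expected VAF = purity * fraction * mutated_copies
                   / (purity * tumor_copy_number + 2 * (1 - purity)).
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    # Average number of copies of the locus per cell in the mixed sample.
    total_copies = purity * tumor_copy_number + 2 * (1 - purity)
    fraction = vaf * total_copies / (purity * mutated_copies)
    # Sampling noise can push the estimate above 1, so cap it.
    return min(fraction, 1.0)


# A heterozygous mutation seen at VAF 0.25 in a pure, copy-neutral sample.
print(mutated_cell_fraction(0.25))
```

A point estimate like this ignores read-count uncertainty, which is one reason a distribution-based approach such as the one described in the abstract is attractive.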
2.
Proteomics ; 9(10): 2740-9, 2009 May.
Article in English | MEDLINE | ID: mdl-19405022

ABSTRACT

Recent advances in experimental technologies allow for the detection of a complete cell proteome. Proteins that are expressed at a particular cell state or in a particular compartment as well as proteins with differential expression between various cell states are commonly delivered by many proteomics studies. Once a list of proteins is derived, a major challenge is to interpret the identified set of proteins in the biological context. Protein-protein interaction (PPI) data represents abundant information that can be employed for this purpose. However, these data have not yet been fully exploited due to the absence of a methodological framework that can integrate this type of information. Here, we propose to infer a network model from an experimentally identified protein list based on the available information about the topology of the global PPI network. We propose to use a Monte Carlo simulation procedure to compute the statistical significance of the inferred models. The method has been implemented as a freely available web-based tool, PPI spider (http://mips.helmholtz-muenchen.de/proj/ppispider). To support the practical significance of PPI spider, we collected several hundred recently published experimental proteomics studies that reported lists of proteins in various biological contexts. We reanalyzed them using PPI spider and demonstrated that in most cases PPI spider could provide statistically significant hypotheses that are helpful for understanding the protein list.


Subject(s)
Protein Interaction Mapping/methods , Proteome/analysis , Proteomics/methods , Software , Antineoplastic Agents, Phytogenic/pharmacology , Cell Line, Tumor , Databases, Protein , Drug Resistance, Multiple , Drug Resistance, Neoplasm , Female , Genes, MDR , Humans , Influenza A Virus, H9N2 Subtype , Influenza, Human/metabolism , Internet , Models, Statistical , Monte Carlo Method , Uterine Cervical Neoplasms/chemistry , Vincristine/pharmacology
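The Monte Carlo idea in the PPI spider abstract can be sketched in a few lines. The example below is a simplified stand-in, not the published procedure: it scores a protein list by the number of interactions among its members and compares that score with random lists of the same size. The scoring rule, the function names, and the toy network are assumptions made for illustration.

```python
import random


def count_internal_edges(proteins, edges):
    """Number of interactions whose two partners are both in the list."""
    members = set(proteins)
    return sum(1 for a, b in edges if a in members and b in members)


def monte_carlo_p_value(query, edges, n_iter=1000, seed=0):
    """Empirical p-value for the connectivity of `query` in the network."""
    rng = random.Random(seed)
    universe = sorted({p for edge in edges for p in edge})
    observed = count_internal_edges(query, edges)
    hits = 0
    for _ in range(n_iter):
        # Random protein list of the same size drawn from the network.
        sample = rng.sample(universe, len(query))
        if count_internal_edges(sample, edges) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_iter + 1)


# Toy network: a triangle A-B-C attached to nothing, plus a chain D..J.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"),
             ("D", "E"), ("E", "F"), ("F", "G"),
             ("G", "H"), ("H", "I"), ("I", "J")]
observed, p_value = monte_carlo_p_value(["A", "B", "C"], toy_edges)
print(observed, p_value)
```

Sampling uniformly from all proteins is the simplest null model; a degree-preserving null would be stricter but is beyond this sketch.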
3.
Nucleic Acids Res ; 37(Web Server issue): W323-8, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19420064

ABSTRACT

GeneSet2miRNA is the first web-based tool which is able to identify whether or not a gene list has a signature of miRNA-regulatory activity. As input, GeneSet2miRNA accepts a list of genes. As output, a list of miRNA-regulatory models is provided. A miRNA-regulatory model is a group of miRNAs (single, pair, triplet or quadruplet) that is predicted to regulate a significant subset of genes from the submitted list. GeneSet2miRNA provides a user friendly dialog-driven web page submission available for several model organisms. GeneSet2miRNA is freely available at http://mips.helmholtz-muenchen.de/proj/gene2mir/.


Subject(s)
Gene Expression Regulation , MicroRNAs/metabolism , Software , Animals , Genes , Humans , Internet , Mice , Rats
4.
FEBS J ; 276(7): 2084-94, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19292876

ABSTRACT

High-throughput metabolomics is a dynamically developing technology that enables the mass separation of complex mixtures at very high resolution. Metabolic profiling has begun to be widely used in clinical research to study the molecular mechanisms of complex cell disorders. Similar to transcriptomics, which is capable of detecting genes at differential states, metabolomics is able to deliver a list of compounds differentially present between explored cell physiological conditions. The bioinformatics challenge lies in a statistically valid interpretation of the functional context for identified sets of metabolites. Here, we present TICL, a web tool for the automatic interpretation of lists of compounds. The major advance of TICL is that it not only provides a model of possible compound transformations related to the input list, but also implements a robust statistical framework to estimate the significance of the inferred model. The TICL web tool is freely accessible at http://mips.helmholtz-muenchen.de/proj/cmp.


Subject(s)
Metabolomics/methods , Software , Computational Biology/methods , Databases, Genetic , Gene Expression Profiling/methods , Internet
5.
J Proteome Res ; 8(3): 1193-7, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19216535

ABSTRACT

The spectrum of problems covered by proteomics studies ranges from the discovery of compartment specific cell proteomes to clinical applications, including the identification of diagnostic markers and monitoring the effects of drug treatments. In most cases, the ultimate results of a proteomics study are lists of proteins found to be present (or differentially present) at the cell physiological conditions under study. Normally, the results are published directly in the article in one or several tables. In many cases, this type of information remains disseminated in hundreds of proteomics publications. We have developed a Web mining tool which allows the collection of this information by searching through full text papers and automatically selecting tables, which report a list of protein identifiers. By searching through major proteomics journals, we have collected approximately 800 independent studies published recently, which reported about 1000 different protein lists. On the basis of this data, we developed a computational tool PLIPS (Protein Lists Identified in Proteomics Studies). PLIPS accepts as input a list of protein/gene identifiers. With the use of statistical analyses, PLIPS infers recently published proteomics studies, which report protein lists that significantly intersect with a query list. PLIPS is a freely available Web-based tool ( http://mips.helmholtz-muenchen.de/proj/plips ).


Subject(s)
Computational Biology , Databases, Protein , Software , Proteomics/methods
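The PLIPS abstract does not say which statistic decides that two protein lists "significantly intersect". One common choice for this kind of question is the hypergeometric tail probability, sketched below under that assumption; the function name and the notion of a fixed identifier universe are hypothetical.

```python
from math import comb


def overlap_p_value(universe_size, list_a, list_b):
    """Probability of an overlap at least as large as observed, by chance.

    Assumes both lists are drawn from the same universe of
    `universe_size` identifiers (hypergeometric null model).
    """
    a, b = set(list_a), set(list_b)
    k = len(a & b)
    n_a, n_b = len(a), len(b)
    total = comb(universe_size, n_b)
    # Sum P(overlap = i) for i from the observed overlap upwards.
    tail = sum(comb(n_a, i) * comb(universe_size - n_a, n_b - i)
               for i in range(k, min(n_a, n_b) + 1))
    return k, tail / total


# Two identical three-protein lists in a universe of ten identifiers.
print(overlap_p_value(10, ["P1", "P2", "P3"], ["P1", "P2", "P3"]))
```

When a query is compared against many stored lists, the resulting p-values would also need a multiple-testing correction.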
6.
Genome Biol ; 9(12): R179, 2008.
Article in English | MEDLINE | ID: mdl-19094223

ABSTRACT

KEGG spider is a web-based tool for interpretation of experimentally derived gene lists in order to gain understanding of metabolism variations at a genomic level. KEGG spider implements a 'pathway-free' framework that overcomes a major bottleneck of enrichment analyses: it provides global models uniting genes from different metabolic pathways. Analyzing a number of experimentally derived gene lists, we demonstrate that KEGG spider provides deeper insights into metabolism variations in comparison to existing methods.


Subject(s)
Genomics , Metabolic Networks and Pathways , Software , Animals , Gallstones/genetics , Gene Expression Regulation, Neoplastic , Humans , Liver/physiopathology , Stomach Neoplasms/genetics
7.
Comput Biol Chem ; 32(6): 412-6, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18753010

ABSTRACT

We have developed a computational technique referred to as complex phylogenetic profiling. Our approach combines logic analyses of gene phylogenetic profiles and phenotype data. Logic analysis of phylogenetic profiles identifies sets of proteins whose presence or absence follows certain logic relationships. Our approach identifies phenotype specific logic, i.e. it identifies sets of proteins simultaneously present or absent only in genomes with a given phenotype. For example, for most genomes expressing phenotype A, the presence of protein C presumes the presence of protein B, while for other genomes (not expressing phenotype A) the presence of protein C presumes the absence of protein B. Application of complex phylogenetic profiling to bacterial data and several well studied phenotypes reveals genotype-phenotype associations on the level of fundamental biochemical pathways.


Subject(s)
Gene Expression Profiling , Phylogeny , Bacillus subtilis/genetics , Genes, Bacterial , Genotype , Oxygen/metabolism , Phenotype
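The example in the complex phylogenetic profiling abstract (C implies B in phenotype-A genomes, C implies not-B elsewhere) can be checked mechanically. The sketch below represents each genome as the set of proteins it encodes and measures how often an implication holds; the data layout, the function name, and the toy genomes are assumptions, and the published method may score relationships differently.

```python
def implication_support(genomes, antecedent, consequent, consequent_present=True):
    """Fraction of genomes containing `antecedent` in which `consequent`
    is present (or absent, if consequent_present is False).

    Returns None when no genome contains the antecedent.
    """
    relevant = [g for g in genomes if antecedent in g]
    if not relevant:
        return None
    agreeing = sum(1 for g in relevant
                   if (consequent in g) == consequent_present)
    return agreeing / len(relevant)


# Toy profiles: each set lists the proteins present in one genome.
with_phenotype = [{"B", "C"}, {"B", "C"}, {"B"}, {"C"}]
without_phenotype = [{"C"}, {"C", "D"}, {"B"}]

# "C presumes B" among genomes with the phenotype.
print(implication_support(with_phenotype, "C", "B"))
# "C presumes absence of B" among the remaining genomes.
print(implication_support(without_phenotype, "C", "B", consequent_present=False))
```

A phenotype-specific rule is one whose support is high in one group of genomes and low (or reversed) in the other.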
8.
Nucleic Acids Res ; 36(Web Server issue): W347-51, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18460543

ABSTRACT

ProfCom is a web-based tool for the functional interpretation of a gene list that was identified to be related by experiments. A trait which makes ProfCom a unique tool is its ability to profile enrichments of not only available Gene Ontology (GO) terms but also of 'complex functions'. A 'complex function' is constructed as a Boolean combination of available GO terms. The complex functions inferred by ProfCom are more specific in comparison to single terms and describe more accurately the functional role of genes. ProfCom provides a user friendly dialog-driven web page submission available for several model organisms and supports most available gene identifiers. In addition, the web service interface allows the submission of any kind of annotation data. ProfCom is freely available at http://webclu.bio.wzw.tum.de/profcom/.


Subject(s)
Gene Expression Profiling , Genes/physiology , Software , Animals , Humans , Internet , Mice , Rabbits
9.
J Mol Biol ; 363(1): 289-96, 2006 Oct 13.
Article in English | MEDLINE | ID: mdl-16959266

ABSTRACT

Relating experimental data to biological knowledge is necessary to cope with the avalanches of new data emerging from recent developments in high-throughput technologies. Automatic functional profiling becomes the de facto standard approach for the secondary analysis of high-throughput data. A number of tools employing available gene functional annotations have been developed for this purpose. However, current annotations are derived mostly from traditional analysis of the individual gene function. The complex biological phenomena carried out by the concerted activity of many genes often require the definition of new complex functionality (related to a group of genes), which is, in many cases, not available in current annotation vocabularies. Functional profiling with annotation terms related to the description of individual biological functions of a gene may fail to provide reasonable interpretation of biological relationships in a set of genes involved in complex biological phenomena. We introduce a novel procedure to profile a complex functionality of a gene set. Complex functionality is constructed as a combination of available annotation terms. By profiling ChIP-chip data from Saccharomyces cerevisiae we demonstrate that this technique produces deeper insights into the results of high-throughput experiments that are beyond the known facts described in the functional classifications.


Subject(s)
Gene Expression Profiling , Gene Expression Regulation/physiology , Saccharomyces cerevisiae/genetics , Animals , Gene Expression Profiling/statistics & numerical data , Humans , Saccharomyces cerevisiae/physiology , Saccharomyces cerevisiae Proteins/biosynthesis , Saccharomyces cerevisiae Proteins/genetics
10.
Nucleic Acids Res ; 34(1): e6, 2006 Jan 10.
Article in English | MEDLINE | ID: mdl-16407322

ABSTRACT

The development of high-throughput technologies has generated the need for bioinformatics approaches to assess the biological relevance of gene networks. Although several tools have been proposed for analysing the enrichment of functional categories in a set of genes, none of them is suitable for evaluating the biological relevance of the gene network. We propose a procedure and develop a web-based resource (BIOREL) to estimate the functional bias (biological relevance) of any given genetic network by integrating different sources of biological information. The weights of the edges in the network may be either binary or continuous. These essential features make our web tool unique among many similar services. BIOREL provides standardized estimations of the network biases extracted from independent data. By the analyses of real data we demonstrate that the potential application of BIOREL ranges from various benchmarking purposes to systematic analysis of the network biology.


Subject(s)
Computational Biology/methods , Genes , Genomics/methods , Models, Genetic , Software , Gene Expression Profiling , Internet , Models, Statistical , Oligonucleotide Array Sequence Analysis , Saccharomyces cerevisiae/genetics , Two-Hybrid System Techniques
11.
FEBS Lett ; 580(3): 844-8, 2006 Feb 06.
Article in English | MEDLINE | ID: mdl-16414044

ABSTRACT

The progress of high-throughput methodologies in functional genomics has led to the development of statistical procedures to infer gene networks from various types of high-throughput data. However, due to the lack of common standards, the biological significance of the results of the different studies is hard to compare. To overcome this problem we propose a benchmark procedure and have developed a web resource (BIOREL), which is useful for estimating the biological relevance of any genetic network by integrating different sources of biological information. The associations of each gene from the network are classified as biologically relevant or not. The proportion of genes in the network classified as "relevant" is used as the overall network relevance score. Employing synthetic data we demonstrated that such a score ranks the networks fairly in respect to the relevance level. Using BIOREL as the benchmark resource we compared the quality of experimental and theoretically predicted protein interaction data.


Subject(s)
Genomics , Internet , Proteins/genetics , Software , Computational Biology , Humans , Protein Binding/genetics
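The scoring scheme in the BIOREL abstract (classify each gene's associations as relevant or not, then report the proportion of relevant genes) can be sketched as follows. The abstract does not give the classification rule, so the one used here, a gene counts as relevant when at least half of its network partners share an annotation with it, is purely an assumption, as are the function name and the threshold parameter.

```python
def network_relevance(edges, annotations, min_shared_fraction=0.5):
    """Proportion of genes whose associations are classified as relevant.

    edges: iterable of (gene, gene) pairs.
    annotations: dict mapping gene -> set of annotation terms.
    Assumed rule: a gene is relevant if at least `min_shared_fraction`
    of its partners share one or more annotation terms with it.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if not neighbours:
        return 0.0
    relevant = 0
    for gene, partners in neighbours.items():
        own_terms = annotations.get(gene, set())
        shared = sum(1 for p in partners
                     if own_terms & annotations.get(p, set()))
        if shared / len(partners) >= min_shared_fraction:
            relevant += 1
    return relevant / len(neighbours)


toy_network = [("a", "b"), ("b", "c"), ("c", "d")]
toy_terms = {"a": {"x"}, "b": {"x"}, "c": {"y"}, "d": {"z"}}
print(network_relevance(toy_network, toy_terms))
```

In the toy network, genes a and b pass the assumed rule and genes c and d do not, giving a score of 0.5.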
12.
Comput Biol Chem ; 29(1): 37-46, 2005 Feb.
Article in English | MEDLINE | ID: mdl-15680584

ABSTRACT

A DNA microarray can track the expression levels of thousands of genes simultaneously. Previous research has demonstrated that this technology can be useful in the classification of cancers. Cancer microarray data normally contains a small number of samples which have a large number of gene expression levels as features. To select relevant genes involved in different types of cancer remains a challenge. In order to extract useful gene information from cancer microarray data and reduce dimensionality, feature selection algorithms were systematically investigated in this study. Using a correlation-based feature selector combined with machine learning algorithms such as decision trees, naïve Bayes and support vector machines, we show that classification performance at least as good as published results can be obtained on acute leukemia and diffuse large B-cell lymphoma microarray data sets. We also demonstrate that a combined use of different classification and feature selection approaches makes it possible to select relevant genes with high confidence. This is also the first paper which discusses both computational and biological evidence for the involvement of zyxin in leukaemogenesis.


Subject(s)
Artificial Intelligence , Leukemia, Myeloid, Acute/classification , Oligonucleotide Array Sequence Analysis/methods , Precursor Cell Lymphoblastic Leukemia-Lymphoma/classification , Algorithms , Cytoskeletal Proteins , Gene Expression Profiling , Glycoproteins/genetics , Humans , Leukemia, Myeloid, Acute/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Zyxin
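The abstract names a correlation-based feature selector without describing it. A widely used formulation scores a feature subset by a merit that rewards correlation with the class and penalizes correlation among the features themselves. The sketch below computes that merit with Pearson correlation as the correlation measure, which is an assumption (other measures are common for discrete data); the function names are hypothetical and the study's actual configuration is not stated.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(var_x * var_y)


def cfs_merit(features, labels):
    """Merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff), where r_cf is the
    mean |feature-class correlation| and r_ff the mean
    |feature-feature correlation| over the subset of k features."""
    k = len(features)
    r_cf = sum(abs(pearson(f, labels)) for f in features) / k
    if k == 1:
        return r_cf
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = sum(abs(pearson(features[i], features[j]))
               for i, j in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]   # tracks the class exactly
noise = [0, 1, 0, 1]         # unrelated to the class
print(cfs_merit([informative], labels))
print(cfs_merit([informative, noise], labels))
```

Adding the uninformative feature lowers the merit, which is the behaviour a subset search would exploit to discard irrelevant genes.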
13.
Bioinformatics ; 21(8): 1383-8, 2005 Apr 15.
Article in English | MEDLINE | ID: mdl-15585526

ABSTRACT

MOTIVATION: Discovery of host and pathogen genes expressed at the plant-pathogen interface often requires the construction of mixed libraries that contain sequences from both genomes. Sequence identification requires high-throughput and reliable classification of genome origin. When using single-pass cDNA sequences difficulties arise from the short sequence length, the lack of sufficient taxonomically relevant sequence data in public databases and ambiguous sequence homology between plant and pathogen genes. RESULTS: A novel method is described, which is independent of the availability of homologous genes and relies on subtle differences in codon usage between plant and fungal genes. We used support vector machines (SVMs) to identify the probable origin of sequences. SVMs were compared to several other machine learning techniques and to a probabilistic algorithm (PF-IND) for expressed sequence tag (EST) classification also based on codon bias differences. Our software (Eclat) has achieved a classification accuracy of 93.1% on a test set of 3217 EST sequences from Hordeum vulgare and Blumeria graminis, which is a significant improvement compared to PF-IND (prediction accuracy of 81.2% on the same test set). EST sequences with at least 50 nt of coding sequence can be classified using Eclat with high confidence. Eclat allows training of classifiers for any host-pathogen combination for which there are sufficient classified training sequences. AVAILABILITY: Eclat is freely available on the Internet (http://mips.gsf.de/proj/est) or on request as a standalone version. CONTACT: friedel@informatik.uni-muenchen.de.


Subject(s)
Artificial Intelligence , Ascomycota/genetics , Chromosome Mapping/methods , Codon/genetics , Hordeum/genetics , Hordeum/microbiology , Host-Parasite Interactions/genetics , Sequence Analysis, DNA/methods , Algorithms , Ascomycota/pathogenicity , DNA, Plant/genetics , Pattern Recognition, Automated/methods , Plants/genetics , Plants/parasitology , Sequence Alignment/methods
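Classifying sequences by codon usage, as the Eclat abstract describes, starts from turning each sequence into a numeric feature vector. A minimal sketch of that step follows: it computes the relative frequency of all 64 codons in a given reading frame. The exact features and preprocessing used by Eclat are not given in the abstract, so the encoding, the function name, and the frame handling here are assumptions.

```python
from itertools import product

# All 64 codons in a fixed order, so every vector has the same layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(sequence, frame=0):
    """Relative codon frequencies of `sequence` read in `frame` (0, 1 or 2).

    Codons containing characters other than A, C, G, T are skipped.
    """
    sequence = sequence.upper()
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    for i in range(frame, len(sequence) - 2, 3):
        codon = sequence[i:i + 3]
        if codon in counts:
            counts[codon] += 1
            total += 1
    return [counts[c] / total if total else 0.0 for c in CODONS]


vector = codon_frequencies("ATGATGAAA")
print(vector[CODONS.index("ATG")], vector[CODONS.index("AAA")])
```

Vectors of this kind, labelled by genome of origin, could then be passed to any support vector machine implementation for training.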
14.
Bioinformatics ; 20(17): 3284-5, 2004 Nov 22.
Article in English | MEDLINE | ID: mdl-15217811

ABSTRACT

The Maximal Margin (MAMA) linear programming classification algorithm has recently been proposed and tested for cancer classification based on expression data. It demonstrated sound performance on publicly available expression datasets. We developed a web interface to allow potential users easy access to the MAMA classification tool. Basic and advanced options provide flexibility in exploitation. The input data format is the same as that used in most publicly available datasets. This makes the web resource particularly convenient for non-expert machine learning users working in the field of expression data analysis.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Internet , Oligonucleotide Array Sequence Analysis/methods , Programming, Linear , Software , Artificial Intelligence , User-Computer Interface
15.
Bioinformatics ; 20(5): 644-52, 2004 Mar 22.
Article in English | MEDLINE | ID: mdl-15033871

ABSTRACT

MOTIVATION: Microarray data appear particularly useful to investigate mechanisms in cancer biology and represent one of the most powerful tools to uncover the genetic mechanisms causing loss of cell cycle control. Recently, several different methods to employ microarray data as a diagnostic tool in cancer classification have been proposed. These procedures take changes in the expression of particular genes into account but do not consider disruptions in certain gene interactions caused by the tumor. It is probable that some genes participating in tumor development do not change their expression level dramatically. Thus, they cannot be detected by simple classification approaches used previously. For these reasons, a classification procedure exploiting information related to changes in gene interactions is needed. RESULTS: We propose a MAximal MArgin Linear Programming (MAMA) method for the classification of tumor samples based on microarray data. This procedure detects groups of genes and constructs models (features) that strongly correlate with particular tumor types. The detected features include genes whose functional relations are changed for particular cancer types. The proposed method was tested on two publicly available datasets and demonstrated a prediction ability superior to previously employed classification schemes. AVAILABILITY: The MAMA system was developed using the linear programming system LINDO http://www.lindo.com. A Perl script that specifies the optimization problem for this software is available upon request from the authors.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Models, Genetic , Neoplasms/classification , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis/methods , Protein Interaction Mapping/methods , Animals , Gene Expression Regulation, Neoplastic/genetics , Humans , Neoplasms/diagnosis , Numerical Analysis, Computer-Assisted , Pattern Recognition, Automated , Programming, Linear