1.
Nucleic Acids Res ; 35(Database issue): D747-50, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17132828

ABSTRACT

ArrayExpress is a public database for high-throughput functional genomics data. ArrayExpress consists of two parts: the ArrayExpress Repository, which is a MIAME-supportive public archive of microarray data, and the ArrayExpress Data Warehouse, which is a database of gene expression profiles selected from the repository and consistently re-annotated. Archived experiments can be queried by experiment attributes, such as keywords, species, array platform, authors, journals or accession numbers. Gene expression profiles can be queried by gene names and properties, such as Gene Ontology terms, and can be visualized. ArrayExpress is a rapidly growing database; it currently contains data from >50,000 hybridizations and >1,500,000 individual expression profiles. ArrayExpress supports community standards, including MIAME, MAGE-ML and, more recently, the proposal for a spreadsheet-based data exchange format, MAGE-TAB. AVAILABILITY: www.ebi.ac.uk/arrayexpress.


Subject(s)
Databases, Genetic , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Animals , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Fungal Proteins/genetics , Fungal Proteins/metabolism , Humans , Internet , Mice , Rats , User-Computer Interface
2.
Nucleic Acids Res ; 33(Database issue): D553-5, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608260

ABSTRACT

ArrayExpress is a public repository for microarray data that supports the MIAME (Minimum Information About a Microarray Experiment) requirements and stores well-annotated raw and normalized data. As of November 2004, ArrayExpress contains data from approximately 12,000 hybridizations covering 35 species. Data can be submitted online or directly from local databases or LIMS in a standard format, and password-protected access to prepublication data is provided for reviewers and authors. The data can be retrieved by accession number or queried by various parameters such as species, author and array platform. A facility to query experiments by gene and sample properties is provided for a growing subset of curated data that is loaded into the ArrayExpress data warehouse. Data can be visualized and analysed using Expression Profiler, the integrated data analysis tool. ArrayExpress is available at http://www.ebi.ac.uk/arrayexpress.


Subject(s)
Databases, Genetic , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Animals , Computational Biology , Europe , Humans , Mice , User-Computer Interface
4.
Bioinformatics ; 18 Suppl 2: S202-10, 2002.
Article in English | MEDLINE | ID: mdl-12386004

ABSTRACT

MOTIVATION: Microarray experiments comparing expression levels of all genes in yeast for hundreds of mutants allow us to examine properties of gene regulatory networks on a genomic scale. We can investigate questions such as network modularity and connectivity, and look for genes with particular roles in the network structure. RESULTS: We have built genome-wide disruption networks for yeast, using a representation of gene expression data as directed labelled graphs. Nodes represent genes and arcs connect nodes if the disruption of the source gene significantly alters the expression of the target gene. We are interested in features of the resulting disruption networks that are robust over a range of significance cutoffs. The networks show a significant overlap with analogous networks constructed from scientific literature. In disruption networks the number of arcs adjacent to a node is distributed roughly according to a power law, as in many complex systems where robustness against perturbations is important. The networks are dominated by a single large component and do not have an obvious modular structure. Genes with the highest outdegrees often encode proteins with regulatory functions, whereas genes with the highest indegrees are predominantly involved in metabolism. The local structure of the networks is meaningful: genes involved in the same cellular processes are close together in the network. AVAILABILITY: http://www.ebi.ac.uk/microarray/networks


Subject(s)
Gene Expression Profiling/methods , Gene Expression Regulation, Fungal/physiology , Models, Biological , Proteome/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/physiology , Signal Transduction/physiology , Chromosome Mapping/methods , Gene Silencing , Genome, Fungal , Oligonucleotide Array Sequence Analysis/methods , Protein Interaction Mapping/methods , Proteome/genetics , Saccharomyces cerevisiae Proteins/genetics
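The graph representation described in the abstract above can be illustrated with a minimal sketch. It assumes the expression data have already been reduced to one significance score per (disrupted gene, target gene) pair; the function names, the gene labels and the scores are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def build_disruption_network(effects, cutoff):
    """Return arcs source -> set of targets whose |score| reaches the cutoff.

    `effects` maps (disrupted_gene, target_gene) to a significance score
    (a hypothetical input format chosen for this sketch).
    """
    arcs = defaultdict(set)
    for (source, target), score in effects.items():
        if source != target and abs(score) >= cutoff:
            arcs[source].add(target)
    return arcs

def degrees(arcs):
    """Compute the outdegree and indegree of every gene that has an arc."""
    outdeg = {gene: len(targets) for gene, targets in arcs.items()}
    indeg = defaultdict(int)
    for targets in arcs.values():
        for target in targets:
            indeg[target] += 1
    return outdeg, dict(indeg)

# Toy data: made-up genes and scores, for illustration only.
toy_effects = {
    ("geneA", "geneB"): 3.1,
    ("geneA", "geneC"): 2.4,
    ("geneB", "geneC"): 0.5,   # below the cutoff, so no arc
    ("geneC", "geneA"): -2.9,  # strong down-regulation still counts
}
toy_arcs = build_disruption_network(toy_effects, cutoff=2.0)
toy_outdeg, toy_indeg = degrees(toy_arcs)
```

Rebuilding the network at several cutoffs and comparing the degree tables is one simple way to check which features stay stable as the significance threshold changes.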
5.
Nat Genet ; 29(4): 365-71, 2001 Dec.
Article in English | MEDLINE | ID: mdl-11726920

ABSTRACT

Microarray analysis has become a widely used tool for the generation of gene expression data on a genomic scale. Although many significant results have been derived from microarray studies, one limitation has been the lack of standards for presenting and exchanging such data. Here we present a proposal, the Minimum Information About a Microarray Experiment (MIAME), that describes the minimum information required to ensure that microarray data can be easily interpreted and that results derived from its analysis can be independently verified. The ultimate goal of this work is to establish a standard for recording and reporting microarray-based gene expression data, which will in turn facilitate the establishment of databases and public repositories and enable the development of data analysis tools. With respect to MIAME, we concentrate on defining the content and structure of the necessary information rather than the technical format for capturing it.


Subject(s)
Computational Biology , Oligonucleotide Array Sequence Analysis/standards , Gene Expression Profiling/methods
6.
Microbes Infect ; 3(10): 823-9, 2001 Aug.
Article in English | MEDLINE | ID: mdl-11580977

ABSTRACT

Microarrays are one of the latest breakthroughs in experimental molecular biology, which allow monitoring of gene expression for tens of thousands of genes in parallel and are already producing huge amounts of valuable data. Analysis and handling of such data is becoming one of the major bottlenecks in the utilization of the technology. The raw microarray data are images, which have to be transformed into gene expression matrices, tables where rows represent genes, columns represent various samples such as tissues or experimental conditions, and numbers in each cell characterize the expression level of the particular gene in the particular sample. These matrices have to be analyzed further if any knowledge about the underlying biological processes is to be extracted. In this paper we concentrate on discussing bioinformatics methods used for such analysis. We briefly discuss supervised and unsupervised data analysis and its applications, such as predicting gene function classes and cancer classification as well as some possible future directions.


Subject(s)
Gene Expression Regulation , Statistics as Topic/methods , Animals , Computational Biology , Humans , Oligonucleotide Array Sequence Analysis
8.
Article in English | MEDLINE | ID: mdl-10977099

ABSTRACT

We have developed a set of methods and tools for automatic discovery of putative regulatory signals in genome sequences. The analysis pipeline consists of gene expression data clustering, sequence pattern discovery from upstream sequences of genes, a control experiment for pattern significance threshold limit detection, selection of interesting patterns, grouping of these patterns, representing the pattern groups in a concise form and evaluating the discovered putative signals against existing databases of regulatory signals. The pattern discovery is computationally the most expensive and crucial step. Our tool performs a rapid exhaustive search for a priori unknown statistically significant sequence patterns of unrestricted length. The statistical significance is determined for a set of sequences in each cluster with respect to a set of background sequences, allowing the detection of subtle regulatory signals specific for each cluster. The potentially large number of significant patterns is reduced to a small number of groups by clustering them by mutual similarity. Automatically derived consensus patterns of these groups represent the results in a comprehensive way for a human investigator. We have performed a systematic analysis for the yeast Saccharomyces cerevisiae. We created a large number of independent clusterings of expression data, simultaneously assessing the "goodness" of each cluster. For each of the over 52,000 clusters acquired in this way we discovered significant patterns in the upstream sequences of respective genes. We selected nearly 1,500 significant patterns by formal criteria and matched them against the experimentally mapped transcription factor binding sites in the SCPD database. We clustered the 1,500 patterns into 62 groups for which we automatically derived alignments and consensus patterns. Of these 62 groups, 48 had patterns that have matching sites in the SCPD database.


Subject(s)
Gene Expression Profiling , Gene Expression Regulation , Genome, Fungal , Sequence Analysis, DNA/methods , Humans , Multigene Family , Oligonucleotide Array Sequence Analysis
9.
FEBS Lett ; 480(1): 17-24, 2000 Aug 25.
Article in English | MEDLINE | ID: mdl-10967323

ABSTRACT

Microarrays are one of the latest breakthroughs in experimental molecular biology, which allow monitoring of gene expression for tens of thousands of genes in parallel and are already producing huge amounts of valuable data. Analysis and handling of such data is becoming one of the major bottlenecks in the utilization of the technology. The raw microarray data are images, which have to be transformed into gene expression matrices--tables where rows represent genes, columns represent various samples such as tissues or experimental conditions, and numbers in each cell characterize the expression level of the particular gene in the particular sample. These matrices have to be analyzed further, if any knowledge about the underlying biological processes is to be extracted. In this paper we concentrate on discussing bioinformatics methods used for such analysis. We briefly discuss supervised and unsupervised data analysis and its applications, such as predicting gene function classes and cancer classification. Then we discuss how the gene expression matrix can be used to predict putative regulatory signals in the genome sequences. In conclusion we discuss some possible future directions.


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Animals , Genes/genetics , Genes/physiology , Humans , Neoplasms/classification , Neoplasms/genetics , Phylogeny , Regulatory Sequences, Nucleic Acid/genetics , Statistics as Topic/methods
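The gene expression matrix described in the abstract above (rows are genes, columns are samples) can be sketched in a few lines. This is only an illustration of the data structure and of one common similarity measure, Pearson correlation, which is an assumption here rather than a method named in the abstract; all gene names, sample names and values are made up.

```python
import math

# Columns of the matrix: samples such as tissues or conditions (hypothetical).
samples = ["cond1", "cond2", "cond3", "cond4"]

# Rows of the matrix: one expression profile per gene (hypothetical values).
matrix = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [2.0, 4.1, 5.9, 8.0],
    "gene3": [4.0, 3.0, 2.0, 1.0],
}

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def most_similar(gene, matrix):
    """Return the other gene whose profile correlates best with `gene`."""
    others = (g for g in matrix if g != gene)
    return max(others, key=lambda g: pearson(matrix[gene], matrix[g]))

closest_to_gene1 = most_similar("gene1", matrix)
```

Grouping genes by profile similarity in this way is the starting point for unsupervised analyses such as clustering; supervised methods would additionally use known class labels for the rows or columns.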
11.
Genome Res ; 8(11): 1202-15, 1998 Nov.
Article in English | MEDLINE | ID: mdl-9847082

ABSTRACT

We performed a systematic analysis of gene upstream regions in the yeast genome for occurrences of regular expression-type patterns with the goal of identifying potential regulatory elements. To achieve this goal, we have developed a new sequence pattern discovery algorithm that searches exhaustively for a priori unknown regular expression-type patterns that are over-represented in a given set of sequences. We applied the algorithm in two cases: (1) discovery of patterns in the complete set of >6000 sequences taken upstream of the putative yeast genes and (2) discovery of patterns in the regions upstream of the genes with similar expression profiles. In the first case, we looked for patterns that occur more frequently in the gene upstream regions than in the genome overall. In the second case, we first clustered the upstream regions of all the genes by similarity of their expression profiles on the basis of publicly available gene expression data and then looked for sequence patterns that are over-represented in each cluster. In both cases we considered each pattern that occurred in at least some minimum number of sequences, and rated them on the basis of their over-representation. Among the highest-rated patterns, most have matches to substrings in known yeast transcription factor-binding sites. Moreover, several of them are known to be relevant to the expression of the genes from the respective clusters. Experiments on simulated data show that the majority of the discovered patterns are not expected to occur by chance.


Subject(s)
Algorithms , Genes, Fungal/genetics , Genome, Fungal , Regulatory Sequences, Nucleic Acid , Gene Expression , Saccharomyces cerevisiae/genetics
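The idea of rating patterns by over-representation, as described in the abstract above, can be sketched under strong simplifications. The sketch below considers only fixed-length substrings rather than regular expression-type patterns, and it uses a plain ratio of sequence fractions with a pseudocount as the rating; both choices are assumptions made for illustration and are not the algorithm or the score from the paper. The sequences are invented.

```python
from collections import Counter

def sequences_containing(seqs, k):
    """Count, for every k-long substring, how many sequences contain it."""
    counts = Counter()
    for seq in seqs:
        # A set ensures each sequence is counted at most once per substring.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts

def rate_patterns(cluster, background, k, min_support):
    """Rate substrings found in at least `min_support` cluster sequences.

    The rating is the fraction of cluster sequences containing the
    substring divided by a pseudocount-smoothed background fraction.
    """
    in_cluster = sequences_containing(cluster, k)
    in_background = sequences_containing(background, k)
    rated = []
    for pattern, support in in_cluster.items():
        if support < min_support:
            continue
        cluster_frac = support / len(cluster)
        background_frac = (in_background[pattern] + 1) / (len(background) + 1)
        rated.append((cluster_frac / background_frac, pattern))
    return sorted(rated, reverse=True)

# Invented upstream sequences, for illustration only.
toy_cluster = ["ACGTGACC", "TTGTGACA", "GTGACGGG"]
toy_background = ["AAAAAAAA", "CCCCTTTT", "ACACACAC", "GGGGTTTT"]
toy_rated = rate_patterns(toy_cluster, toy_background, k=5, min_support=2)
```

The minimum-support filter mirrors the requirement that a pattern occur in at least some minimum number of sequences before it is rated at all.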
12.
J Comput Biol ; 5(2): 279-305, 1998.
Article in English | MEDLINE | ID: mdl-9672833

ABSTRACT

This paper surveys approaches to the discovery of patterns in biosequences and places these approaches within a formal framework that systematises the types of patterns and the discovery algorithms. Patterns with expressive power in the class of regular languages are considered, and a classification of pattern languages in this class is developed, covering the patterns that are the most frequently used in molecular bioinformatics. A formulation is given of the problem of the automatic discovery of such patterns from a set of sequences, and an analysis is presented of the ways in which an assessment can be made of the significance of the discovered patterns. It is shown that the problem is related to problems studied in the field of machine learning. The major part of this paper comprises a review of a number of existing methods developed to solve the problem and how these relate to each other, focusing on the algorithms underlying the approaches. A comparison is given of the algorithms, and examples are given of patterns that have been discovered using the different methods.


Subject(s)
Algorithms , Databases, Factual , Models, Theoretical , Proteins , Sequence Alignment/methods , Base Composition , Mathematical Computing , Software
13.
Article in English | MEDLINE | ID: mdl-9322017

ABSTRACT

We have examined methods and developed a general software tool for finding and analyzing combinations of transcription factor binding sites that occur relatively often in gene upstream regions (putative promoter regions) in the yeast genome. Such frequently occurring combinations may be essential parts of possible promoter classes. The regions upstream of all genes were first isolated from the yeast genome database MIPS using the information in the annotation files of the database. The ones that do not overlap with coding regions were chosen for further studies. Next, all occurrences of the yeast transcription factor binding sites, as given in the IMD database, were located in the genome and in the selected regions in particular. Finally, by using general-purpose data mining software in combination with our own software, which parametrizes the search, we can find the combinations of binding sites that occur in the upstream regions more frequently than would be expected on the basis of the frequency of individual sites. The procedure also finds so-called association rules present in such combinations. The developed tool is available for use through the WWW.


Subject(s)
Genes, Regulator , Genome, Fungal , Saccharomyces cerevisiae/genetics , Software , Binding Sites/genetics , Chromosomes, Fungal/genetics , Databases, Factual , Open Reading Frames , Promoter Regions, Genetic , Saccharomyces cerevisiae/metabolism , Transcription Factors/metabolism
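The comparison of observed and expected co-occurrence mentioned in the abstract above can be sketched for pairs of sites. The sketch assumes each upstream region is given as the set of binding sites found in it, and it uses the ratio of observed to expected co-occurrence under independence (often called lift in association-rule mining) as the measure; this measure, the function name and the site labels are assumptions for illustration, not details confirmed by the source.

```python
from collections import Counter
from itertools import combinations

def pair_lift(regions, min_count=2):
    """Return observed/expected co-occurrence ratios for pairs of sites.

    `regions` is a list of sets of site names, one set per upstream region.
    The expected count assumes the two sites occur independently.
    """
    n_regions = len(regions)
    single = Counter()
    pairs = Counter()
    for sites in regions:
        single.update(sites)
        pairs.update(combinations(sorted(sites), 2))
    lifts = {}
    for (site_a, site_b), observed in pairs.items():
        if observed < min_count:
            continue
        expected = single[site_a] * single[site_b] / n_regions
        lifts[(site_a, site_b)] = observed / expected
    return lifts

# Invented regions and site names, for illustration only.
toy_regions = [
    {"siteA", "siteB"},
    {"siteA", "siteB"},
    {"siteC"},
    {"siteA", "siteC"},
    {"siteB"},
    {"siteC"},
]
toy_lifts = pair_lift(toy_regions)
```

A ratio above 1 indicates a pair that appears together more often than the individual site frequencies alone would predict; larger combinations would need a frequent-itemset search rather than this pairwise count.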
14.
Article in English | MEDLINE | ID: mdl-8877502

ABSTRACT

We consider the problem of automatic discovery of patterns and the corresponding subfamilies in a set of biosequences. The sequences are unaligned and may contain noise of unknown level. The patterns are of the type used in the PROSITE database. In our approach we discover patterns and the respective subfamilies simultaneously. We develop a theoretically substantiated significance measure for a set of such patterns and an algorithm approximating the best pattern set and the subfamilies. The approach is based on the minimum description length (MDL) principle. We report a computing experiment correctly finding subfamilies in the family of chromo domains and revealing new strong patterns.


Subject(s)
Models, Molecular , Protein Conformation , Algorithms , Animals , Phylogeny , Software