Results 1 - 20 of 48
1.
Nucleic Acids Res ; 41(Database issue): D530-5, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23161678

ABSTRACT

The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new 'phylogenetic annotation' process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself have increased by using automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources.


Subject(s)
Databases, Genetic , Genes , Molecular Sequence Annotation , Vocabulary, Controlled , Internet , Phylogeny
2.
Nucleic Acids Res ; 35(Database issue): D561-5, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17145710

ABSTRACT

IntAct is an open source database and software suite for modeling, storing and analyzing molecular interaction data. The data available in the database originates entirely from published literature and is manually annotated by expert biologists to a high level of detail, including experimental methods, conditions and interacting domains. The database features over 126,000 binary interactions extracted from over 2100 scientific publications and makes extensive use of controlled vocabularies. The web site provides tools allowing users to search, visualize and download data from the repository. IntAct supports and encourages local installations as well as direct data submission and curation collaborations. IntAct source code and data are freely available from http://www.ebi.ac.uk/intact.


Subject(s)
DNA/chemistry , Databases, Genetic , Proteins/chemistry , RNA/chemistry , Databases, Genetic/standards , Internet , Protein Interaction Mapping , Protein Structure, Tertiary , Quality Control , Software , User-Computer Interface , Vocabulary, Controlled
3.
Science ; 309(5740): 1559-63, 2005 Sep 02.
Article in English | MEDLINE | ID: mdl-16141072

ABSTRACT

This study describes comprehensive polling of transcription start and termination sites and analysis of previously unidentified full-length complementary DNAs derived from the mouse genome. We identify the 5' and 3' boundaries of 181,047 transcripts with extensive variation in transcripts arising from alternative promoter usage, splicing, and polyadenylation. There are 16,247 new mouse protein-coding transcripts, including 5154 encoding previously unidentified proteins. Genomic mapping of the transcriptome reveals transcriptional forests, with overlapping transcription on both strands, separated by deserts in which few transcripts are observed. The data provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.


Subject(s)
Genome , Mice/genetics , Terminator Regions, Genetic , Transcription Initiation Site , Transcription, Genetic , 3' Untranslated Regions , Animals , Base Sequence , Conserved Sequence , DNA, Complementary/chemistry , Genome, Human , Genomics , Humans , Promoter Regions, Genetic , Proteins/genetics , RNA/chemistry , RNA/classification , RNA Splicing , RNA, Untranslated/chemistry , Regulatory Sequences, Ribonucleic Acid
4.
Nucleic Acids Res ; 33(Web Server issue): W116-20, 2005 Jul 01.
Article in English | MEDLINE | ID: mdl-15980438

ABSTRACT

InterProScan [E. M. Zdobnov and R. Apweiler (2001) Bioinformatics, 17, 847-848] is a tool that combines different protein signature recognition methods from the InterPro [N. J. Mulder, R. Apweiler, T. K. Attwood, A. Bairoch, A. Bateman, D. Binns, P. Bradley, P. Bork, P. Bucher, L. Cerutti et al. (2005) Nucleic Acids Res., 33, D201-D205] consortium member databases into one resource. At the time of writing there are 10 distinct publicly available databases in the application. Protein as well as DNA sequences can be analysed. A web-based version is accessible for academic and commercial organizations from the EBI (http://www.ebi.ac.uk/InterProScan/). In addition, a standalone Perl version and a SOAP Web Service [J. Snell, D. Tidwell and P. Kulchenko (2001) Programming Web Services with SOAP, 1st edn. O'Reilly Publishers, Sebastopol, CA, http://www.w3.org/TR/soap/] are also available to the users. Various output formats are supported and include text tables, XML documents, as well as various graphs to help interpret the results.


Subject(s)
Protein Structure, Tertiary , Sequence Analysis, Protein , Software , Databases, Protein , Internet , Sequence Analysis, DNA , User-Computer Interface
5.
Nucleic Acids Res ; 33(Database issue): D262-5, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608192

ABSTRACT

The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up to date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and Interpro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, Interpro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the 'Structure Integration with Function, Taxonomy and Sequences (SIFTS)' initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group.


Subject(s)
Computational Biology , Databases, Protein , Proteins/chemistry , Amino Acid Sequence , Proteins/classification , Systems Integration
6.
Bioinformatics ; 20 Suppl 1: i342-7, 2004 Aug 04.
Article in English | MEDLINE | ID: mdl-15262818

ABSTRACT

MOTIVATION: Automatically generated annotation on protein data of UniProt (Universal Protein Resource) is planned to be publicly available on the UniProt web pages in April 2004. It is expected that the data content of over 500,000 protein entries in the TrEMBL section will be enhanced by the output of an automated annotation pipeline. However, a part of the automatically added data will be erroneous, as are parts of the information coming from other sources. We present a post-processing system called Xanthippe that is based on a simple exclusion mechanism and a decision tree approach using the C4.5 data-mining algorithm. RESULTS: It is shown that Xanthippe detects and flags a large part of the annotation errors and considerably increases the reliability of both automatically generated data and annotation from other sources. As a cross-validation to Swiss-Prot shows, errors in protein descriptions, comments and keywords are successfully filtered out. Xanthippe is a contradictive application that can be combined seamlessly with predictive systems. It can be used either to improve the precision of automated annotation at a constant level of recall or increase the recall at a constant level of precision. AVAILABILITY: The application of the Xanthippe rules can be browsed at http://www.ebi.uniprot.org/


Subject(s)
Algorithms , Databases, Protein , Documentation/methods , Information Storage and Retrieval/methods , Proteins/chemistry , Proteins/classification , Sequence Analysis, Protein/methods , Amino Acid Sequence , Molecular Sequence Data , Software
7.
J Biomed Inform ; 37(1): 30-42, 2004 Feb.
Article in English | MEDLINE | ID: mdl-15016384

ABSTRACT

In this paper, we review the results of BIOINFOMED, a study funded by the European Commission (EC) with the purpose of analysing the different issues and challenges in the area where Medical Informatics and Bioinformatics meet. Traditionally, Medical Informatics has been focused on the intersection between computer science and clinical medicine, whereas Bioinformatics has been predominantly centered on the intersection between computer science and biological research. Although researchers from both areas have occasionally collaborated, their training, objectives and interests have been quite different. The results of the Human Genome and related projects have attracted the interest of many professionals, and introduced new challenges that will transform biomedical research and health care. A characteristic of the 'post genomic' era will be to correlate essential genotypic information with expressed phenotypic information. In this context, Biomedical Informatics (BMI) has emerged to describe the technology that brings both disciplines (BI and MI) together to support genomic medicine. In recognition of the dynamic nature of BMI, institutions such as the EC have launched several initiatives in support of a research agenda, including the BIOINFOMED study.


Subject(s)
Computational Biology/methods , Delivery of Health Care/methods , Genetic Testing/methods , Genetic Therapy/methods , Genomics/methods , Medical Informatics/methods , Research Design , Biotechnology/methods , Biotechnology/trends , Computational Biology/trends , Delivery of Health Care/trends , European Union , Forecasting , Gene Expression Profiling/methods , Gene Expression Profiling/trends , Genetic Testing/trends , Genetic Therapy/trends , Genomics/instrumentation , Government Programs , Medical Informatics/trends , Research/trends , Technology Assessment, Biomedical
8.
Nucleic Acids Res ; 32(Database issue): D258-61, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681407

ABSTRACT

The Gene Ontology (GO) project (http://www.geneontology.org/) provides structured, controlled vocabularies and classifications that cover several domains of molecular and cellular biology and are freely available for community use in the annotation of genes, gene products and sequences. Many model organism databases and genome annotation groups use the GO and contribute their annotation sets to the GO resource. The GO database integrates the vocabularies and contributed annotations and provides full access to this information in several formats. Members of the GO Consortium continually work collectively, involving outside experts as needed, to expand and update the GO vocabularies. The GO Web resource also provides access to extensive documentation about the GO project and links to applications that use GO data for functional analyses.


Subject(s)
Databases, Genetic , Genes , Terminology as Topic , Animals , Bibliographies as Topic , Electronic Mail , Genomics , Humans , Information Storage and Retrieval , Internet , Molecular Biology , Proteins/classification , Proteins/genetics , Software
9.
Methods Inf Med ; 42(2): 154-60, 2003.
Article in English | MEDLINE | ID: mdl-12743652

ABSTRACT

OBJECTIVES: The increasing production of molecular biology data in the post-genomic era, and the proliferation of databases that store it, require the development of an integrative layer in database services to facilitate the synthesis of related information. The solution of this problem is made more difficult by the absence of universal identifiers for biological entities, and the breadth and variety of available data. METHODS: Integr8 was modelled using UML (Unified Modeling Language). Integr8 is being implemented as an n-tier system using a modern object-oriented programming language (Java). An object-relational mapping tool, OJB, is being used to specify the interface between the upper layers and an underlying relational database. RESULTS: The European Bioinformatics Institute is launching the Integr8 project. Integr8 will be an automatically populated database in which we will maintain stable identifiers for biological entities, describe their relationships with each other (in accordance with the central dogma of biology), and store equivalences between identified entities in the source databases. Only core data will be stored in Integr8, with web links to the source databases providing further information. CONCLUSIONS: Integr8 will provide the integrative layer of the next generation of bioinformatics services from the EBI. Web-based interfaces will be developed to offer gene-centric views of the integrated data, presenting (where known) the links between genome, proteome and phenotype.


Subject(s)
Computational Biology , Databases, Genetic , Medical Informatics , Molecular Biology , Systems Integration , Europe , Genomics , Humans , Proteome , User-Computer Interface
10.
Nucleic Acids Res ; 31(1): 388-9, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520029

ABSTRACT

The CluSTr database (http://www.ebi.ac.uk/clustr/) offers an automatic classification of SWISS-PROT+TrEMBL proteins into groups of related proteins. The clustering is based on analysis of all pair-wise sequence comparisons between proteins using the Smith-Waterman algorithm. The analysis, carried out on different levels of protein similarity, yields a hierarchical organization of clusters. Information about domain content of the clustered proteins is provided via the InterPro resource. The introduced InterPro 'condensed graphical view' simplifies the visual analysis of represented domain architectures. Integrated applications allow users to visualize and edit multiple alignments and build sequence divergence trees. Links to the relevant structural data in Protein Data Bank (PDB) and Homology derived Secondary Structure of Proteins (HSSP) are also provided.
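
The Smith-Waterman comparison named in this abstract can be sketched as follows. This is a minimal illustration, not the CluSTr implementation: the match, mismatch and linear gap scores are assumed values chosen for readability, whereas protein comparisons normally use a substitution matrix and affine gap penalties. The function names are hypothetical.

```python
from itertools import combinations


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not CluSTr's parameters.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def all_pairwise_scores(sequences):
    """Score every unordered pair of sequences, keyed by their index pair."""
    return {
        (i, j): smith_waterman_score(sequences[i], sequences[j])
        for i, j in combinations(range(len(sequences)), 2)
    }
```

Calling `all_pairwise_scores(["ACGT", "XXACGTXX", "TTTT"])` yields one score per pair; a table of this kind is the sort of input a clustering step would consume.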


Subject(s)
Databases, Protein , Proteins/chemistry , Proteins/classification , Animals , Cluster Analysis , Computer Graphics , Internet , Protein Structure, Tertiary , Sequence Alignment , User-Computer Interface
11.
Bioinformatics ; 18(2): 374-5, 2002 Feb.
Article in English | MEDLINE | ID: mdl-11847096

ABSTRACT

MOTIVATION: The SWISS-PROT group at the EBI has developed the Proteome Analysis Database utilizing existing resources and providing comprehensive and integrated comparative analysis of the predicted protein coding sequences of the complete genomes of bacteria, archaea and eukaryotes. The Proteome Analysis Database is accompanied by a program that has been designed to carry out interactive InterPro proteome comparisons for any one proteome against any other one or more of the proteomes in the database.


Subject(s)
Databases, Protein , Proteome , Software , Computational Biology , Genome
12.
Bioinformatics ; 17(9): 847-8, 2001 Sep.
Article in English | MEDLINE | ID: mdl-11590104

ABSTRACT

UNLABELLED: InterProScan is a tool that scans given protein sequences against the protein signatures of the InterPro member databases, currently PROSITE, PRINTS, Pfam, ProDom and SMART. The number of signature databases and their associated scanning tools as well as the further refinement procedures make the problem complex. InterProScan is designed to be a scalable and extensible system with a robust internal architecture. AVAILABILITY: The Perl-based InterProScan implementation is available from the EBI ftp server (ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/) and the SRS-based InterProScan is available upon request. We provide the public web interface (http://www.ebi.ac.uk/interpro/scan.html) as well as an email submission server (interproscan@ebi.ac.uk).


Subject(s)
Database Management Systems , Databases, Protein
13.
Bioinformatics ; 17(10): 920-6, 2001 Oct.
Article in English | MEDLINE | ID: mdl-11673236

ABSTRACT

MOTIVATION: The gap between the amount of newly submitted protein data and reliable functional annotation in public databases is growing. Traditional manual annotation by literature curation and sequence analysis tools without the use of automated annotation systems is not able to keep up with the ever increasing quantity of data that is submitted. Automated supplements to manually curated databases such as TrEMBL or GenPept cover raw data but provide only limited annotation. To improve this situation automatic tools are needed that support manual annotation, automatically increase the amount of reliable information and help to detect inconsistencies in manually generated annotations. RESULTS: A standard data mining algorithm was successfully applied to gain knowledge about the Keyword annotation in SWISS-PROT. 11 306 rules were generated, which are provided in a database and can be applied to yet unannotated protein sequences and viewed using a web browser. They rely on the taxonomy of the organism in which the protein was found and on signature matches of its sequence. The statistical evaluation of the generated rules by cross-validation suggests that by applying them on arbitrary proteins 33% of their keyword annotation can be generated with an error rate of 1.5%. The coverage rate of the keyword annotation can be increased to 60% by tolerating a higher error rate of 5%. AVAILABILITY: The results of the automatic data mining process can be browsed on http://golgi.ebi.ac.uk:8080/Spearmint/. Source code is available upon request. CONTACT: kretsch@ebi.ac.uk.


Subject(s)
Algorithms , Databases, Protein/statistics & numerical data , Proteins/genetics , Computational Biology , Reproducibility of Results , Software
14.
Bioinformatics ; 17(7): 646-53, 2001 Jul.
Article in English | MEDLINE | ID: mdl-11448883

ABSTRACT

MOTIVATION: A variety of tools are available to predict the topology of transmembrane proteins. To date no independent evaluation of the performance of these tools has been published. A better understanding of the strengths and weaknesses of the different tools would guide both the biologist and the bioinformatician to make better predictions of membrane protein topology. RESULTS: Here we present an evaluation of the performance of the currently best known and most widely used methods for the prediction of transmembrane regions in proteins. Our results show that TMHMM is currently the best performing transmembrane prediction program.


Subject(s)
Membrane Proteins/chemistry , Software , Computational Biology , Databases as Topic , Membrane Proteins/genetics , Models, Molecular , Protein Conformation , Protein Sorting Signals/genetics , Solubility
15.
Brief Bioinform ; 2(1): 9-18, 2001 Mar.
Article in English | MEDLINE | ID: mdl-11465066

ABSTRACT

With the rapid growth of sequence databases, there is an increasing need for reliable functional characterisation and annotation of newly predicted proteins. To cope with such large data volumes, faster and more effective means of protein sequence characterisation and annotation are required. One promising approach is automatic large-scale functional characterisation and annotation, which is generated with limited human interaction. However, such an approach is heavily dependent on reliable data sources. The SWISS-PROT protein sequence database plays an essential role here owing to its high level of functional information.


Subject(s)
Databases, Factual , Proteins/genetics , Animals , Computational Biology , Humans , Proteins/physiology , Sequence Analysis, Protein
16.
J Biosci ; 26(2): 277-84, 2001 Jun.
Article in English | MEDLINE | ID: mdl-11426064

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is an integrated documentation resource for protein families, domains and sites, developed initially as a means of rationalizing the complementary efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. It is a useful resource that aids the functional classification of proteins. Almost 90% of the actinopterygii protein sequences from SWISS-PROT and TrEMBL can be classified using InterPro. Over 30% of the actinopterygii protein sequences currently in SWISS-PROT and TrEMBL are of mitochondrial origin, the majority of which belong to the cytochrome b/b6 family. InterPro also gives insights into the domain composition of the classified proteins and has applications in the functional classification of newly determined sequences lacking biochemical characterization, and in comparative genome analysis. A comparison of the actinopterygii protein sequences against the sequences of other eukaryotes confirms the high representation of eukaryotic protein kinase in the organisms studied. The comparisons also show that, based on InterPro families, the trans-species evolution of MHC class I and II molecules in mammals and teleost fish can be recognized.


Subject(s)
Databases, Factual , Fishes , Information Management , Proteins/classification , Animals , Fishes/genetics , Information Management/methods , Internet , Sequence Analysis, Protein
17.
Curr Opin Struct Biol ; 11(3): 334-9, 2001 Jun.
Article in English | MEDLINE | ID: mdl-11406384

ABSTRACT

Various sequence-motif and sequence-cluster databases have been integrated into a new resource known as InterPro. Because the contributing databases have different clustering principles and scoring sensitivities, the combined assignments complement each other for grouping protein families and delineating domains. InterPro and new developments in the analysis of both the phylogenetic profiles of protein families and domain fusion events improve the prediction of specific functions for numerous proteins.


Subject(s)
Amino Acid Motifs , Databases, Factual , Phylogeny , Proteins , Sequence Alignment/methods
18.
Trends Biotechnol ; 19(5): 178-81, 2001 May.
Article in English | MEDLINE | ID: mdl-11301130

ABSTRACT

The availability of the human genome sequence has enabled the exploration and exploitation of the human genome and proteome to begin. Research has now focussed on the annotation of the genome and in particular of the proteome. With expert annotation extracted from the literature by biologists as the foundation, it has been possible to expand into the areas of data mining and automatic annotation. With further development and integration of pattern recognition methods and the application of alignments clustering, proteome analysis can now be provided in a meaningful way. These various approaches have been integrated to attach, extract and combine as much relevant information as possible to the proteome. This resource should be valuable to users from both research and industry.


Subject(s)
Databases, Factual , Genome , Proteins/chemistry , Algorithms , Humans , Internet , Models, Statistical
19.
Nucleic Acids Res ; 29(1): 33-6, 2001 Jan 01.
Article in English | MEDLINE | ID: mdl-11125042

ABSTRACT

The CluSTr (Clusters of SWISS-PROT and TrEMBL proteins) database offers an automatic classification of SWISS-PROT and TrEMBL proteins into groups of related proteins. The clustering is based on analysis of all pairwise comparisons between protein sequences. Analysis has been carried out for different levels of protein similarity, yielding a hierarchical organisation of clusters. The database provides links to InterPro, which integrates information on protein families, domains and functional sites from PROSITE, PRINTS, Pfam and ProDom. Links to the InterPro graphical interface allow users to see at a glance whether proteins from the cluster share particular functional sites. CluSTr also provides cross-references to HSSP and PDB. The database is available for querying and browsing at http://www.ebi.ac.uk/clustr.
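
One way to picture how clustering at different similarity levels gives a hierarchy is the sketch below. It is an assumption-laden illustration, not the CluSTr method: it links any two proteins whose pairwise score meets a threshold (single-linkage style, via union-find), and the function name, score values and thresholds are all hypothetical.

```python
def clusters_at_threshold(n_items, scores, threshold):
    """Group items 0..n_items-1, joining pairs whose score is >= threshold.

    `scores` maps (i, j) index pairs to a similarity score. The linkage
    rule is an assumption made for illustration.
    """
    parent = list(range(n_items))

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), score in scores.items():
        if score >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    groups = {}
    for x in range(n_items):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


# Hypothetical similarity scores between four proteins.
example_scores = {(0, 1): 90, (1, 2): 50, (2, 3): 80, (0, 3): 10}
levels = {t: clusters_at_threshold(4, example_scores, t) for t in (95, 80, 50)}
```

Lowering the threshold can only merge clusters, never split them, so the clusterings obtained at successive levels nest inside one another and form a hierarchy.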


Subject(s)
Databases, Factual , Proteins , Animals , Carrier Proteins/genetics , Humans , Information Services , Internet , Proteins/genetics , Sequence Alignment , Sodium/metabolism
20.
Nucleic Acids Res ; 29(1): 37-40, 2001 Jan 01.
Article in English | MEDLINE | ID: mdl-11125043

ABSTRACT

Signature databases are vital tools for identifying distant relationships in novel sequences and hence for inferring protein function. InterPro is an integrated documentation resource for protein families, domains and functional sites, which amalgamates the efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. Each InterPro entry includes a functional description, annotation, literature references and links back to the relevant member database(s). Release 2.0 of InterPro (October 2000) contains over 3000 entries, representing families, domains, repeats and sites of post-translational modification encoded by a total of 6804 different regular expressions, profiles, fingerprints and Hidden Markov Models. Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (more than 1,000,000 hits from 462,500 proteins in SWISS-PROT and TrEMBL). The database is accessible for text- and sequence-based searches at http://www.ebi.ac.uk/interpro/. Questions can be emailed to interhelp@ebi.ac.uk.
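
Of the signature types listed in this abstract, regular expressions are the simplest to illustrate. The sketch below translates a PROSITE-style pattern into a Python regular expression and searches a sequence with it. It is a simplified illustration rather than the scanning code used by InterPro or PROSITE: the helper name is hypothetical, the example patterns and sequences are illustrative, and anchors placed inside square brackets are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern string into a Python regex string.

    Handles: '-' separators, 'x' wildcards, [..] allowed residues,
    {..} excluded residues, (n) and (n,m) repeats, and '<' / '>' anchors
    outside brackets. A simplified sketch, not a full parser.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Excluded residues: {P} becomes the negated class [^P].
        element = element.replace("{", "[^").replace("}", "]")
        # Repeat counts: (2,4) becomes the regex quantifier {2,4}.
        element = element.replace("(", "{").replace(")", "}")
        # Lower-case x means any residue; residue codes are upper case.
        element = element.replace("x", ".")
        # Sequence-start and sequence-end anchors.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


# Illustrative pattern and sequence.
zinc_finger = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")
hit = re.search(zinc_finger, "MKCAACAAALAAAAAAAAHAAAHGG")
```

Here `hit` is a match object when the sequence contains the motif and `None` otherwise; `hit.start()` and `hit.end()` give the matched region.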


Subject(s)
Databases, Factual , Proteins , Information Services , Internet , Protein Structure, Tertiary , Proteins/chemistry , Proteins/genetics