1.
Bioinformatics ; 38(17): 4194-4199, 2022 09 02.
Article in English | MEDLINE | ID: mdl-35801937

ABSTRACT

MOTIVATION: Understanding life cannot be accomplished without making full use of biological data, which are scattered across databases of diverse categories in life sciences. To connect such data seamlessly, identifier (ID) conversion plays a key role. However, existing ID conversion services have disadvantages, such as covering only a limited range of biological categories of databases, not keeping up with the updates of the original databases and producing outputs that are hard to interpret in the context of biological relations, especially when converting IDs in multiple steps. RESULTS: TogoID is an ID conversion service implementing unique features with an intuitive web interface and an application programming interface (API) for programmatic access. TogoID currently supports 65 datasets covering various biological categories. TogoID users can perform exploratory multistep conversions to find a path among IDs. To guide the interpretation of biological meanings in the conversions, we crafted an ontology that defines the semantics of the dataset relations. AVAILABILITY AND IMPLEMENTATION: The TogoID service is freely available on the TogoID website (https://togoid.dbcls.jp/) and the API is also provided to allow programmatic access. To encourage developers to add new dataset pairs, the system stores the configurations of pairs at the GitHub repository (https://github.com/togoid/togoid-config) and accepts requests for additional pairs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Data Management , Software , Databases, Factual
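
The multistep conversion described in this abstract can be pictured as a route search over a graph whose nodes are datasets and whose edges are configured dataset pairs. The following is a minimal sketch of that idea only, not the TogoID implementation or its API: the dataset names, the `PAIRS` list and the `find_route` function are hypothetical, and treating every pair as convertible in both directions is an assumption.

```python
from collections import deque

# Hypothetical dataset pairs, for illustration only; these are placeholder
# names, not entries from the togoid-config repository.
PAIRS = [
    ("gene_db", "transcript_db"),
    ("transcript_db", "protein_db"),
    ("protein_db", "structure_db"),
    ("gene_db", "pathway_db"),
]


def build_graph(pairs):
    """Build an adjacency map, assuming each pair can be traversed both ways."""
    graph = {}
    for left, right in pairs:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph


def find_route(pairs, source, target):
    """Return the shortest chain of datasets from source to target, or None."""
    graph = build_graph(pairs)
    if source not in graph or target not in graph:
        return None
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        # Sorted iteration keeps the result deterministic.
        for neighbor in sorted(graph[path[-1]]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


route = find_route(PAIRS, "gene_db", "structure_db")
```

Breadth-first search is used here because it returns a route with the fewest conversion steps; a real service might rank routes by other criteria.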
2.
F1000Res ; 9: 136, 2020.
Article in English | MEDLINE | ID: mdl-32308977

ABSTRACT

We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.


Subject(s)
Biological Science Disciplines , Computational Biology , Semantic Web , Data Mining , Metadata , Reproducibility of Results
3.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30624651

ABSTRACT

TogoGenome is a genome database that is purely based on the Semantic Web technology, which enables the integration of heterogeneous data and flexible semantic searches. All the information is stored as Resource Description Framework (RDF) data, and the reporting web pages are generated on the fly using SPARQL Protocol and RDF Query Language (SPARQL) queries. TogoGenome provides a semantic-faceted search system by gene functional annotation, taxonomy, phenotypes and environment based on the relevant ontologies. TogoGenome also serves as an interface to conduct semantic comparative genomics by which a user can observe pan-organism or organism-specific genes based on the functional aspect of gene annotations and the combinations of organisms from different taxa. The TogoGenome database has a modular structure: each module in the report pages is served separately as a TogoStanza, a generic framework for rendering an information block as IFRAME/Web Components. Unlike in several monolithic databases, these modules can also be reused to construct other databases. TogoGenome and TogoStanza have been under development since 2012 and are freely available along with their source code on the GitHub repositories at https://github.com/togogenome/ and https://github.com/togostanza/, respectively, under the MIT license.


Subject(s)
Databases, Genetic , Genomics/methods , Semantic Web , Software , Humans
4.
Nucleic Acids Res ; 47(D1): D382-D389, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30462302

ABSTRACT

The Microbial Genome Database for Comparative Analysis (MBGD) is a database for comparative genomics based on comprehensive orthology analysis of bacteria, archaea and unicellular eukaryotes. MBGD now contains 6318 genomes. To utilize the database for both closely related and distantly related genomes, MBGD previously provided two types of ortholog tables: the standard ortholog table containing one representative genome from each genus covering the entire taxonomic range and the taxon specific ortholog tables for each taxon. However, this approach has a drawback in that the standard ortholog table contains only genes that are conserved in the representative genomes. To address this problem, we developed a stepwise procedure to construct ortholog tables hierarchically in a bottom-up manner. By using this approach, the new standard ortholog table now covers the entire gene repertoire stored in MBGD. In addition, we have enhanced several functionalities, including rapid and flexible keyword searching, profile-based sequence searching for orthology assignment to a user query sequence, and displaying a phylogenetic tree of each taxon based on the concatenated core gene sequences. For integrative database searching, the core data in MBGD are represented in Resource Description Framework (RDF) and a SPARQL interface is provided to search them. MBGD is available at http://mbgd.genome.ad.jp/.


Subject(s)
Genome, Archaeal , Genome, Bacterial , Genome, Fungal , Genome, Protozoan , Genomics , Sequence Homology, Nucleic Acid , Cluster Analysis , Databases, Genetic , Software , User-Computer Interface
5.
BMC Bioinformatics ; 18(1): 93, 2017 Feb 08.
Article in English | MEDLINE | ID: mdl-28178937

ABSTRACT

BACKGROUND: Toward improved interoperability of distributed biological databases, an increasing number of datasets have been published in the standardized Resource Description Framework (RDF). Although the powerful SPARQL Protocol and RDF Query Language (SPARQL) provides a basis for exploiting RDF databases, writing SPARQL code is burdensome for users including bioinformaticians. Thus, an easy-to-use interface is necessary. RESULTS: We developed SPANG, a SPARQL client that has unique features for querying RDF datasets. SPANG dynamically generates typical SPARQL queries according to specified arguments. It can also call SPARQL template libraries constructed in a local system or published on the Web. Further, it enables combinatorial execution of multiple queries, each with a distinct target database. These features facilitate easy and effective access to RDF datasets and integrative analysis of distributed data. CONCLUSIONS: SPANG helps users to exploit RDF datasets by generation and reuse of SPARQL queries through a simple interface. This client will enhance integrative exploitation of biological RDF datasets distributed across the Web. This software package is freely available at http://purl.org/net/spang .


Subject(s)
Computer Communication Networks , Databases, Factual , Internet
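
The abstract states that SPANG generates typical SPARQL queries from specified arguments. The sketch below shows the general idea of turning optional arguments into a triple pattern. It is not SPANG's actual interface: the `build_query` function and its parameters are hypothetical, and only standard SPARQL syntax (`SELECT`, `WHERE`, `LIMIT`, and the `a` shorthand for `rdf:type`) is assumed.

```python
def build_query(subject=None, predicate=None, obj=None, limit=10):
    """Build a simple SELECT query; unspecified positions become variables."""
    # Each fixed term must already be valid SPARQL, e.g. <http://...> or "a".
    terms = [subject or "?s", predicate or "?p", obj or "?o"]
    pattern = " ".join(terms)
    return "SELECT * WHERE { " + pattern + " } LIMIT " + str(limit)


# Ask for up to five resources together with their classes.
query = build_query(predicate="a", limit=5)
```

With these arguments the function returns `SELECT * WHERE { ?s a ?o } LIMIT 5`, which could then be sent to any SPARQL endpoint by a client of the reader's choice.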
6.
J Biomed Semantics ; 7(1): 34, 2016 06 04.
Article in English | MEDLINE | ID: mdl-27259657

ABSTRACT

BACKGROUND: Computational comparative analysis of multiple genomes provides valuable opportunities to biomedical research. In particular, orthology analysis can play a central role in comparative genomics; it guides establishing evolutionary relations among genes of organisms and allows functional inference of gene products. However, the wide variations in current orthology databases necessitate research toward the shareability of content that is generated by different tools and stored in different structures. Exchanging the content with other research communities requires making the meaning of the content explicit. DESCRIPTION: The need for a common ontology has led to the creation of the Orthology Ontology (ORTH) following the best practices in ontology construction. Here, we describe our model and major entities of the ontology that is implemented in the Web Ontology Language (OWL), followed by the assessment of the quality of the ontology and the application of the ORTH to existing orthology datasets. This shareable ontology makes it possible to develop Linked Orthology Datasets and a meta-predictor of orthology through standardization for the representation of orthology databases. The ORTH is freely available in OWL format to all users at http://purl.org/net/orth . CONCLUSIONS: The Orthology Ontology can serve as a framework for the semantic standardization of orthology content and it will contribute to a better exploitation of orthology resources in biomedical research. The results demonstrate the feasibility of developing shareable datasets using this ontology. Further applications will maximize the usefulness of this ontology.


Subject(s)
Gene Ontology , Genomics/methods , Genomics/standards , Reference Standards
7.
PLoS One ; 10(4): e0122802, 2015.
Article in English | MEDLINE | ID: mdl-25875762

ABSTRACT

Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using the Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of the ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While the OrthO is a compact ontology for general use, it is designed to be extended to the description of database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the form of Resource Description Framework (RDF) and made it available through the SPARQL endpoint, which accepts arbitrary queries specified by users. In this framework based on the OrthO, the biological data of different organisms can be integrated using the ortholog information as a hub. Besides, the ortholog information from different data sources can be compared with each other using the OrthO as a shared ontology. Here we show some examples demonstrating that the ortholog information described in RDF can be used to link various biological data such as taxonomy information and Gene Ontology. Thus, the ortholog database using the Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis.


Subject(s)
Computational Biology/methods , Databases, Genetic , Gene Ontology , Genome , Animals , Bacteria/genetics , Computational Biology/statistics & numerical data , Datasets as Topic , Fungi/genetics , Humans , Internet , Plants/genetics , Semantics
8.
Nucleic Acids Res ; 43(Database issue): D270-6, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25398900

ABSTRACT

The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information.


Subject(s)
Databases, Genetic , Genome, Microbial , Animals , Genomics , Humans , Protein Structure, Tertiary , Sequence Alignment
9.
BMC Bioinformatics ; 15: 148, 2014 May 18.
Article in English | MEDLINE | ID: mdl-24885064

ABSTRACT

BACKGROUND: Identification of ortholog groups is a crucial step in comparative analysis of multiple genomes. Although several computational methods have been developed to create ortholog groups, most of those methods do not evaluate orthology at the sub-gene level. In our method for domain-level ortholog clustering, DomClust, proteins are split into domains on the basis of alignment boundaries identified by all-against-all pairwise comparison, but it often fails to determine appropriate boundaries. RESULTS: We developed a method to improve domain-level ortholog classification using multiple alignment information. This method is based on a scoring scheme, the domain-specific sum-of-pairs (DSP) score, which evaluates ortholog clustering results at the domain level as the sum total of domain-level alignment scores. We developed a refinement pipeline to improve domain-level clustering, DomRefine, by optimizing the DSP score. We applied DomRefine to domain-level ortholog groups created by DomClust using a dataset obtained from the Microbial Genome Database for Comparative Analysis (MBGD), and evaluated the results using COG clusters and TIGRFAMs models as the reference data. We observed that the agreement between the resulting classification and the classifications in the reference databases is improved at almost every step in the refinement pipeline. Moreover, the refined classification showed better agreement than the classifications in the eggNOG databases when TIGRFAMs was used as the reference database. CONCLUSIONS: DomRefine is a useful tool for improving the quality of domain-level ortholog classification among microbial genomes. Combined with a rapid domain-level ortholog clustering method, such as DomClust, it can be used to create a high-quality ortholog database that can serve as a solid basis for various comparative genome analyses.


Subject(s)
Protein Structure, Tertiary , Sequence Alignment/methods , Sequence Analysis, Protein , Cluster Analysis , Genomics , Phylogeny , Software
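
To make the notion of a domain-level sum-of-pairs score concrete, the following sketch sums pairwise column scores within each domain of a multiple alignment, where a domain is a set of member sequences plus a column range. It is a simplified illustration, not the DSP score as defined by the DomRefine authors: the match, mismatch and gap values, the function names and the toy alignment are all assumptions.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned characters (illustrative values only)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(sequences, start, end):
    """Sum pair scores over all sequence pairs in columns start..end-1."""
    total = 0
    for col in range(start, end):
        column = [seq[col] for seq in sequences]
        total += sum(pair_score(a, b) for a, b in combinations(column, 2))
    return total


def domain_sum_of_pairs(alignment, domains):
    """Add up sum-of-pairs scores over domains given as (rows, start, end)."""
    total = 0
    for rows, start, end in domains:
        members = [alignment[r] for r in rows]
        total += sum_of_pairs(members, start, end)
    return total


# Toy alignment of three sequences, five columns each.
alignment = ["ACG-T", "ACGAT", "A-GAT"]

# Two candidate domain organizations of the same alignment.
split = [((0, 1, 2), 0, 2), ((1, 2), 2, 5)]
merged = [((0, 1), 0, 5)]
score_split = domain_sum_of_pairs(alignment, split)
score_merged = domain_sum_of_pairs(alignment, merged)
```

A refinement procedure in this spirit would compare such scores across candidate domain organizations and keep the higher-scoring one.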
10.
J Biomed Semantics ; 5(1): 5, 2014 Feb 05.
Article in English | MEDLINE | ID: mdl-24495517

ABSTRACT

The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.

11.
Nucleic Acids Res ; 41(Database issue): D631-5, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23118485

ABSTRACT

The microbial genome database for comparative analysis (MBGD, available at http://mbgd.genome.ad.jp/) is a platform for microbial genome comparison based on orthology analysis. As its unique feature, MBGD allows users to conduct orthology analysis among any specified set of organisms; this flexibility allows MBGD to adapt to a variety of microbial genomic studies. Reflecting the huge diversity of the microbial world, the number of microbial genome projects has now reached several thousand. To efficiently explore the diversity of the entire microbial genomic data, MBGD now provides summary pages for pre-calculated ortholog tables among various taxonomic groups. For some closely related taxa, MBGD also provides the conserved synteny information (core genome alignment) pre-calculated using the CoreAligner program. In addition, an efficient incremental updating procedure can create an extended ortholog table by adding genomes to the default ortholog table generated from the representative set of genomes. Combined with the dynamic orthology calculation for any specified set of organisms, MBGD is an efficient and flexible tool for exploring microbial genome diversity.


Subject(s)
Databases, Nucleic Acid , Genetic Variation , Genome , Genome, Bacterial , Internet
12.
Database (Oxford) ; 2011: bar046, 2011.
Article in English | MEDLINE | ID: mdl-22039163

ABSTRACT

CELLPEDIA is a repository database for current knowledge about human cells. It contains various types of information, such as cell morphologies, gene expression and literature references. The major role of CELLPEDIA is to provide a digital dictionary of human cells for the biomedical field, including support for the characterization of artificially generated cells in regenerative medicine. CELLPEDIA features (i) its own cell classification scheme, in which whole human cells are classified by their physical locations in addition to conventional taxonomy; and (ii) cell differentiation pathways compiled from biomedical textbooks and journal papers. Currently, human differentiated cells and stem cells are classified into 2260 and 66 cell taxonomy keys, respectively, from which 934 parent-child relationships reported in cell differentiation or transdifferentiation pathways are retrievable. As far as we know, this is the first attempt to develop a digital cell bank to function as a public resource for the accumulation of current knowledge about human cells. The CELLPEDIA homepage is freely accessible except for the data submission pages that require authentication (please send a password request to cell-info@cbrc.jp). Database URL: http://cellpedia.cbrc.jp/


Subject(s)
Cell Physiological Phenomena , Cells/classification , Database Management Systems , Databases, Factual , Cell Differentiation , Humans , User-Computer Interface
13.
PLoS One ; 5(8): e11881, 2010 Aug 27.
Article in English | MEDLINE | ID: mdl-20806061

ABSTRACT

How to identify true transcription factor binding sites on the basis of sequence motif information (e.g., motif pattern, location, combination, etc.) is an important question in bioinformatics. We present "PeakRegressor," a system that identifies binding motifs by combining DNA-sequence data and ChIP-Seq data. PeakRegressor uses L1-norm log linear regression in order to predict peak values from binding motif candidates. Our approach successfully predicts the peak values of STAT1 and RNA Polymerase II with correlation coefficients as high as 0.65 and 0.66, respectively. Using PeakRegressor, we could identify composite motifs for STAT1, as well as potential regulatory SNPs (rSNPs) involved in the regulation of transcription levels of neighboring genes. In addition, we show that among five regression methods, L1-norm log linear regression achieves the best performance with respect to binding motif identification, biological interpretability and computational efficiency.


Subject(s)
Computational Biology , Polymorphism, Single Nucleotide/genetics , Regulatory Sequences, Nucleic Acid/genetics , Repetitive Sequences, Nucleic Acid/genetics , STAT1 Transcription Factor/metabolism , Base Sequence , Binding Sites , Linear Models , Principal Component Analysis , RNA Polymerase II/metabolism
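
L1-norm regression on log-transformed values, as named in this abstract, can be sketched with a small coordinate-descent solver. This is a generic lasso illustration under stated assumptions, not the PeakRegressor code: the function names, the absence of an intercept term, the penalty value and the toy data are all hypothetical.

```python
import math


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def fit_l1_log_linear(features, peaks, lam=0.1, sweeps=100):
    """Fit log(peak) ~ features . weights with an L1 penalty (no intercept).

    Minimizes (1 / 2n) * sum(residual^2) + lam * sum(|w_j|) by cyclic
    coordinate descent.
    """
    n = len(peaks)
    n_features = len(features[0])
    target = [math.log(p) for p in peaks]
    weights = [0.0] * n_features
    for _ in range(sweeps):
        for j in range(n_features):
            rho = 0.0
            norm = 0.0
            for row, t in zip(features, target):
                # Prediction from all features except j.
                partial = sum(row[k] * weights[k]
                              for k in range(n_features) if k != j)
                rho += row[j] * (t - partial)
                norm += row[j] ** 2
            if norm == 0.0:
                weights[j] = 0.0
            else:
                weights[j] = soft_threshold(rho / n, lam) / (norm / n)
    return weights


# Toy data: rows are peaks, columns are motif indicators (hypothetical).
features = [[1, 0], [1, 0], [0, 1], [0, 1]]
peaks = [math.exp(2.0), math.exp(2.0), math.exp(0.1), math.exp(0.1)]
weights = fit_l1_log_linear(features, peaks, lam=0.1)
```

In this toy case the second motif's weight is shrunk to exactly zero, which illustrates why an L1 penalty helps select a small set of motif candidates.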
14.
BMC Genomics ; 9: 152, 2008 Apr 02.
Article in English | MEDLINE | ID: mdl-18384671

ABSTRACT

BACKGROUND: Interspecies sequence comparison is a powerful tool to extract functional or evolutionary information from the genomes of organisms. A number of studies have compared protein sequences or promoter sequences between mammals, which provided many insights into genomics. However, the correlation between protein conservation and promoter conservation remains controversial. RESULTS: We examined promoter conservation as well as protein conservation for 6,901 human and mouse orthologous genes, and observed a very weak correlation between them. We further investigated their relationship by decomposing it based on functional categories, and identified categories with significant tendencies. Remarkably, the 'ribosome' category showed significantly low promoter conservation, despite its high protein conservation, and the 'extracellular matrix' category showed significantly high promoter conservation, in spite of its low protein conservation. CONCLUSION: Our results show the relation of gene function to protein conservation and promoter conservation, and reveal that there seem to be nonparallel components between protein and promoter sequence evolution.


Subject(s)
Conserved Sequence/genetics , Evolution, Molecular , Genes/genetics , Promoter Regions, Genetic/genetics , Animals , Computational Biology , Humans , Mice , Sequence Alignment , Sequence Homology , Species Specificity
15.
Dalton Trans ; (9): 1213-7, 2006 Mar 07.
Article in English | MEDLINE | ID: mdl-16482359

ABSTRACT

Two novel salts of lacunary tungstosilicates with guanidinium and alkali metal cations, (CH6N3)7Na[SiW11O39].(CH3)2CO.8H2O (1) and (CH6N3)K6Na[SiW11O39].11.5H2O (2), have been synthesized and their crystal structures have been determined by synchrotron X-ray diffraction. In both crystals, the Na+ cations link the lacunary Keggin-type tungstosilicate anions into linear structures. The neighboring [SiW11O39]8- anions are related by two-fold screw and translational operations in compounds 1 and 2, respectively. Second harmonic generation was observed for compound 1.

16.
Dalton Trans ; (16): 2726-30, 2005 Aug 21.
Article in English | MEDLINE | ID: mdl-16075112

ABSTRACT

New mixed metal clusters with M19 metal frameworks have been synthesized by NaBH4 reduction of Au(NO3)(PMe2Ph) together with AgNO3 in ethanol. Single crystal X-ray diffraction has revealed Au12Ag7 and Au17Ag2 metal skeletons for these clusters, which are best described in terms of bicapped pentagonal antiprismatic cages with a staggered-staggered M(5) ring configuration. These clusters connect the missing link between M13 icosahedral and M25 biicosahedral clusters providing a view of the cluster growth process. A TEM image of this cluster has been observed, which has clearly demonstrated single-sized nano-particles of less than 1.0 nm.


Subject(s)
Gold/chemistry , Organometallic Compounds/chemistry , Silver/chemistry , Crystallography, X-Ray , Microscopy, Electron, Transmission/methods , Models, Molecular , Organometallic Compounds/chemical synthesis , Sensitivity and Specificity , Spectrophotometry/methods , X-Rays