1.
Bioinformatics ; 38(17): 4194-4199, 2022 09 02.
Article in English | MEDLINE | ID: mdl-35801937

ABSTRACT

MOTIVATION: Understanding life cannot be accomplished without making full use of biological data, which are scattered across databases of diverse categories in life sciences. To connect such data seamlessly, identifier (ID) conversion plays a key role. However, existing ID conversion services have disadvantages, such as covering only a limited range of biological categories of databases, not keeping up with the updates of the original databases and outputs being hard to interpret in the context of biological relations, especially when converting IDs in multiple steps. RESULTS: TogoID is an ID conversion service implementing unique features with an intuitive web interface and an application programming interface (API) for programmatic access. TogoID currently supports 65 datasets covering various biological categories. TogoID users can perform exploratory multistep conversions to find a path among IDs. To guide the interpretation of biological meanings in the conversions, we crafted an ontology that defines the semantics of the dataset relations. AVAILABILITY AND IMPLEMENTATION: The TogoID service is freely available on the TogoID website (https://togoid.dbcls.jp/) and the API is also provided to allow programmatic access. To encourage developers to add new dataset pairs, the system stores the configurations of pairs at the GitHub repository (https://github.com/togoid/togoid-config) and accepts the request of additional pairs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Data Management , Software , Factual Databases
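
The exploratory multistep conversion described in the abstract above can be pictured as a path search over a graph whose nodes are datasets and whose edges are directly convertible dataset pairs. The following is a minimal sketch under that assumption only; the dataset names and the `PAIRS` table are hypothetical placeholders, not TogoID's actual configuration or API.

```python
from collections import deque

# Hypothetical dataset pairs (illustrative only, not TogoID's real configuration).
PAIRS = [
    ("ncbigene", "ensembl_gene"),
    ("ensembl_gene", "uniprot"),
    ("uniprot", "pdb"),
    ("ncbigene", "hgnc"),
]


def build_graph(pairs):
    """Build an undirected adjacency map from dataset pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def find_route(pairs, source, target):
    """Return the shortest list of datasets from source to target, or None."""
    graph = build_graph(pairs)
    if source not in graph or target not in graph:
        return None
    queue = deque([[source]])  # breadth-first search over partial routes
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


route = find_route(PAIRS, "ncbigene", "pdb")
```

Breadth-first search is used here because it returns a route with the fewest conversion steps; a real service might weight routes differently.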
2.
F1000Res ; 9: 136, 2020.
Article in English | MEDLINE | ID: mdl-32308977

ABSTRACT

We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.


Subjects
Biological Science Disciplines , Computational Biology , Semantic Web , Data Mining , Metadata , Reproducibility of Results
3.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30624651

ABSTRACT

TogoGenome is a genome database that is purely based on the Semantic Web technology, which enables the integration of heterogeneous data and flexible semantic searches. All the information is stored as Resource Description Framework (RDF) data, and the reporting web pages are generated on the fly using SPARQL Protocol and RDF Query Language (SPARQL) queries. TogoGenome provides a semantic-faceted search system by gene functional annotation, taxonomy, phenotypes and environment based on the relevant ontologies. TogoGenome also serves as an interface to conduct semantic comparative genomics by which a user can observe pan-organism or organism-specific genes based on the functional aspect of gene annotations and the combinations of organisms from different taxa. The TogoGenome database exhibits a modularized structure, and each module in the report pages is separately served as TogoStanza, which is a generic framework for rendering an information block as IFRAME/Web Components, which can, unlike several other monolithic databases, also be reused to construct other databases. TogoGenome and TogoStanza have been under development since 2012 and are freely available along with their source codes on the GitHub repositories at https://github.com/togogenome/ and https://github.com/togostanza/, respectively, under the MIT license.


Subjects
Genetic Databases , Genomics/methods , Semantic Web , Software , Humans
4.
Nucleic Acids Res ; 47(D1): D382-D389, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30462302

ABSTRACT

The Microbial Genome Database for Comparative Analysis (MBGD) is a database for comparative genomics based on comprehensive orthology analysis of bacteria, archaea and unicellular eukaryotes. MBGD now contains 6318 genomes. To utilize the database for both closely related and distantly related genomes, MBGD previously provided two types of ortholog tables: the standard ortholog table containing one representative genome from each genus covering the entire taxonomic range and the taxon specific ortholog tables for each taxon. However, this approach has a drawback in that the standard ortholog table contains only genes that are conserved in the representative genomes. To address this problem, we developed a stepwise procedure to construct ortholog tables hierarchically in a bottom-up manner. By using this approach, the new standard ortholog table now covers the entire gene repertoire stored in MBGD. In addition, we have enhanced several functionalities, including rapid and flexible keyword searching, profile-based sequence searching for orthology assignment to a user query sequence, and displaying a phylogenetic tree of each taxon based on the concatenated core gene sequences. For integrative database searching, the core data in MBGD are represented in Resource Description Framework (RDF) and a SPARQL interface is provided to search them. MBGD is available at http://mbgd.genome.ad.jp/.


Subjects
Archaeal Genome , Bacterial Genome , Fungal Genome , Protozoan Genome , Genomics , Nucleic Acid Sequence Homology , Cluster Analysis , Genetic Databases , Software , User-Computer Interface
5.
BMC Bioinformatics ; 18(1): 93, 2017 Feb 08.
Article in English | MEDLINE | ID: mdl-28178937

ABSTRACT

BACKGROUND: Toward improved interoperability of distributed biological databases, an increasing number of datasets have been published in the standardized Resource Description Framework (RDF). Although the powerful SPARQL Protocol and RDF Query Language (SPARQL) provides a basis for exploiting RDF databases, writing SPARQL code is burdensome for users including bioinformaticians. Thus, an easy-to-use interface is necessary. RESULTS: We developed SPANG, a SPARQL client that has unique features for querying RDF datasets. SPANG dynamically generates typical SPARQL queries according to specified arguments. It can also call SPARQL template libraries constructed in a local system or published on the Web. Further, it enables combinatorial execution of multiple queries, each with a distinct target database. These features facilitate easy and effective access to RDF datasets and integrative analysis of distributed data. CONCLUSIONS: SPANG helps users to exploit RDF datasets by generation and reuse of SPARQL queries through a simple interface. This client will enhance integrative exploitation of biological RDF datasets distributed across the Web. This software package is freely available at http://purl.org/net/spang .


Subjects
Computer Communication Networks , Factual Databases , Internet
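
The idea in the abstract above of generating typical SPARQL queries from specified arguments can be illustrated with a small helper that fills a triple pattern. This is a sketch under stated assumptions: `build_select` and its parameters are hypothetical names introduced here for illustration and do not reflect SPANG's actual interface or command-line options.

```python
def build_select(subject=None, predicate=None, obj=None, limit=10):
    """Build a SPARQL SELECT query; unspecified positions become variables.

    Terms are expected in SPARQL syntax already, e.g. '<http://example.org/x>'
    for an IRI, or the keyword 'a' (shorthand for rdf:type) as a predicate.
    """
    s = subject or "?s"
    p = predicate or "?p"
    o = obj or "?o"
    # Project only the positions left open by the caller.
    variables = [term for term in (s, p, o) if term.startswith("?")]
    projection = " ".join(variables) if variables else "*"
    return f"SELECT {projection}\nWHERE {{ {s} {p} {o} . }}\nLIMIT {limit}"


# List a few typed resources together with their classes.
query = build_select(predicate="a", limit=5)
```

The resulting string could then be sent to any SPARQL endpoint with an HTTP client; that step is omitted here because the endpoint URL would be an assumption.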
6.
J Biomed Semantics ; 7(1): 34, 2016 06 04.
Article in English | MEDLINE | ID: mdl-27259657

ABSTRACT

BACKGROUND: Computational comparative analysis of multiple genomes provides valuable opportunities to biomedical research. In particular, orthology analysis can play a central role in comparative genomics; it guides establishing evolutionary relations among genes of organisms and allows functional inference of gene products. However, the wide variations in current orthology databases necessitate the research toward the shareability of the content that is generated by different tools and stored in different structures. Exchanging the content with other research communities requires making the meaning of the content explicit. DESCRIPTION: The need for a common ontology has led to the creation of the Orthology Ontology (ORTH) following the best practices in ontology construction. Here, we describe our model and major entities of the ontology that is implemented in the Web Ontology Language (OWL), followed by the assessment of the quality of the ontology and the application of the ORTH to existing orthology datasets. This shareable ontology enables the possibility to develop Linked Orthology Datasets and a meta-predictor of orthology through standardization for the representation of orthology databases. The ORTH is freely available in OWL format to all users at http://purl.org/net/orth . CONCLUSIONS: The Orthology Ontology can serve as a framework for the semantic standardization of orthology content and it will contribute to a better exploitation of orthology resources in biomedical research. The results demonstrate the feasibility of developing shareable datasets using this ontology. Further applications will maximize the usefulness of this ontology.


Subjects
Gene Ontology , Genomics/methods , Genomics/standards , Reference Standards
7.
PLoS One ; 10(4): e0122802, 2015.
Article in English | MEDLINE | ID: mdl-25875762

ABSTRACT

Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using the Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of the ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While the OrthO is a compact ontology for general use, it is designed to be extended to the description of database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the form of Resource Description Framework (RDF) and made it available through the SPARQL endpoint, which accepts arbitrary queries specified by users. In this framework based on the OrthO, the biological data of different organisms can be integrated using the ortholog information as a hub. Besides, the ortholog information from different data sources can be compared with each other using the OrthO as a shared ontology. Here we show some examples demonstrating that the ortholog information described in RDF can be used to link various biological data such as taxonomy information and Gene Ontology. Thus, the ortholog database using the Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis.


Subjects
Computational Biology/methods , Genetic Databases , Gene Ontology , Genome , Animals , Bacteria/genetics , Computational Biology/statistics & numerical data , Datasets as Topic , Fungi/genetics , Humans , Internet , Plants/genetics , Semantics
8.
Nucleic Acids Res ; 43(Database issue): D270-6, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25398900

ABSTRACT

The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information.


Subjects
Genetic Databases , Microbial Genome , Animals , Genomics , Humans , Protein Tertiary Structure , Sequence Alignment
9.
BMC Bioinformatics ; 15: 148, 2014 May 18.
Article in English | MEDLINE | ID: mdl-24885064

ABSTRACT

BACKGROUND: Identification of ortholog groups is a crucial step in comparative analysis of multiple genomes. Although several computational methods have been developed to create ortholog groups, most of those methods do not evaluate orthology at the sub-gene level. In our method for domain-level ortholog clustering, DomClust, proteins are split into domains on the basis of alignment boundaries identified by all-against-all pairwise comparison, but it often fails to determine appropriate boundaries. RESULTS: We developed a method to improve domain-level ortholog classification using multiple alignment information. This method is based on a scoring scheme, the domain-specific sum-of-pairs (DSP) score, which evaluates ortholog clustering results at the domain level as the sum total of domain-level alignment scores. We developed a refinement pipeline to improve domain-level clustering, DomRefine, by optimizing the DSP score. We applied DomRefine to domain-level ortholog groups created by DomClust using a dataset obtained from the Microbial Genome Database for Comparative Analysis (MBGD), and evaluated the results using COG clusters and TIGRFAMs models as the reference data. Thus, we observed that the agreement between the resulting classification and the classifications in the reference databases is improved at almost every step in the refinement pipeline. Moreover, the refined classification showed better agreement than the classifications in the eggNOG databases when TIGRFAMs was used as the reference database. CONCLUSIONS: DomRefine is a useful tool for improving the quality of domain-level ortholog classification among microbial genomes. Combining with a rapid domain-level ortholog clustering method, such as DomClust, it can be used to create a high-quality ortholog database that can serve as a solid basis for various comparative genome analyses.


Subjects
Protein Tertiary Structure , Sequence Alignment/methods , Protein Sequence Analysis , Cluster Analysis , Genomics , Phylogeny , Software
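
The abstract above describes a score that evaluates a clustering as the sum total of domain-level alignment scores. The sketch below illustrates only that general shape, using a plain sum-of-pairs score per domain cluster. The match, mismatch and gap values are arbitrary assumptions chosen for illustration, and the function names are hypothetical; this is not the DSP definition used by DomRefine.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score two equal-length aligned rows column by column (toy scheme)."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # gap-gap columns contribute nothing
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


def sum_of_pairs(alignment):
    """Sum pair scores over every pair of rows in one multiple alignment."""
    return sum(pair_score(a, b) for a, b in combinations(alignment, 2))


def clustering_score(domain_alignments):
    """Sum the sum-of-pairs scores over all domain-level clusters."""
    return sum(sum_of_pairs(aln) for aln in domain_alignments)


# Two hypothetical domain clusters, each given as aligned rows.
clusters = [["ACGT", "ACGA", "AC-T"], ["GG", "GG"]]
score = clustering_score(clusters)
```

Under such a scheme, moving a domain boundary or reassigning a sequence changes the per-cluster alignments, so candidate refinements can be compared by their total score.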
10.
J Biomed Semantics ; 5(1): 5, 2014 Feb 05.
Article in English | MEDLINE | ID: mdl-24495517

ABSTRACT

The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.

11.
Nucleic Acids Res ; 41(Database issue): D631-5, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23118485

ABSTRACT

The microbial genome database for comparative analysis (MBGD, available at http://mbgd.genome.ad.jp/) is a platform for microbial genome comparison based on orthology analysis. As its unique feature, MBGD allows users to conduct orthology analysis among any specified set of organisms; this flexibility allows MBGD to adapt to a variety of microbial genomic study. Reflecting the huge diversity of microbial world, the number of microbial genome projects now becomes several thousands. To efficiently explore the diversity of the entire microbial genomic data, MBGD now provides summary pages for pre-calculated ortholog tables among various taxonomic groups. For some closely related taxa, MBGD also provides the conserved synteny information (core genome alignment) pre-calculated using the CoreAligner program. In addition, efficient incremental updating procedure can create extended ortholog table by adding additional genomes to the default ortholog table generated from the representative set of genomes. Combining with the functionalities of the dynamic orthology calculation of any specified set of organisms, MBGD is an efficient and flexible tool for exploring the microbial genome diversity.


Subjects
Nucleic Acid Databases , Genetic Variation , Genome , Bacterial Genome , Internet
12.
Database (Oxford) ; 2011: bar046, 2011.
Article in English | MEDLINE | ID: mdl-22039163

ABSTRACT

CELLPEDIA is a repository database for current knowledge about human cells. It contains various types of information, such as cell morphologies, gene expression and literature references. The major role of CELLPEDIA is to provide a digital dictionary of human cells for the biomedical field, including support for the characterization of artificially generated cells in regenerative medicine. CELLPEDIA features (i) its own cell classification scheme, in which whole human cells are classified by their physical locations in addition to conventional taxonomy; and (ii) cell differentiation pathways compiled from biomedical textbooks and journal papers. Currently, human differentiated cells and stem cells are classified into 2260 and 66 cell taxonomy keys, respectively, from which 934 parent-child relationships reported in cell differentiation or transdifferentiation pathways are retrievable. As far as we know, this is the first attempt to develop a digital cell bank to function as a public resource for the accumulation of current knowledge about human cells. The CELLPEDIA homepage is freely accessible except for the data submission pages that require authentication (please send a password request to cell-info@cbrc.jp). Database URL: http://cellpedia.cbrc.jp/


Subjects
Cell Physiological Phenomena , Cells/classification , Database Management Systems , Factual Databases , Cell Differentiation , Humans , User-Computer Interface
13.
PLoS One ; 5(8): e11881, 2010 Aug 27.
Article in English | MEDLINE | ID: mdl-20806061

ABSTRACT

How to identify true transcription factor binding sites on the basis of sequence motif information (e.g., motif pattern, location, combination, etc.) is an important question in bioinformatics. We present "PeakRegressor," a system that identifies binding motifs by combining DNA-sequence data and ChIP-Seq data. PeakRegressor uses L1-norm log linear regression in order to predict peak values from binding motif candidates. Our approach successfully predicts the peak values of STAT1 and RNA Polymerase II with correlation coefficients as high as 0.65 and 0.66, respectively. Using PeakRegressor, we could identify composite motifs for STAT1, as well as potential regulatory SNPs (rSNPs) involved in the regulation of transcription levels of neighboring genes. In addition, we show that among five regression methods, L1-norm log linear regression achieves the best performance with respect to binding motif identification, biological interpretability and computational efficiency.


Subjects
Computational Biology , Single Nucleotide Polymorphism/genetics , Nucleic Acid Regulatory Sequences/genetics , Nucleic Acid Repetitive Sequences/genetics , STAT1 Transcription Factor/metabolism , Base Sequence , Binding Sites , Linear Models , Principal Component Analysis , RNA Polymerase II/metabolism
14.
BMC Genomics ; 9: 152, 2008 Apr 02.
Article in English | MEDLINE | ID: mdl-18384671

ABSTRACT

BACKGROUND: Interspecies sequence comparison is a powerful tool to extract functional or evolutionary information from the genomes of organisms. A number of studies have compared protein sequences or promoter sequences between mammals, which provided many insights into genomics. However, the correlation between protein conservation and promoter conservation remains controversial. RESULTS: We examined promoter conservation as well as protein conservation for 6,901 human and mouse orthologous genes, and observed a very weak correlation between them. We further investigated their relationship by decomposing it based on functional categories, and identified categories with significant tendencies. Remarkably, the 'ribosome' category showed significantly low promoter conservation, despite its high protein conservation, and the 'extracellular matrix' category showed significantly high promoter conservation, in spite of its low protein conservation. CONCLUSION: Our results show the relation of gene function to protein conservation and promoter conservation, and revealed that there seem to be nonparallel components between protein and promoter sequence evolution.


Subjects
Conserved Sequence/genetics , Molecular Evolution , Genes/genetics , Genetic Promoter Regions/genetics , Animals , Computational Biology , Humans , Mice , Sequence Alignment , Sequence Homology , Species Specificity
15.
Dalton Trans ; (9): 1213-7, 2006 Mar 07.
Article in English | MEDLINE | ID: mdl-16482359

ABSTRACT

Two novel salts of lacunary tungstosilicates with guanidinium and alkali metal cations, (CH6N3)7Na[SiW11O39]·(CH3)2CO·8H2O (1) and (CH6N3)K6Na[SiW11O39]·11.5H2O (2), have been synthesized and their crystal structures have been determined by synchrotron X-ray diffraction. In both crystals, the Na+ cations link the lacunary Keggin-type tungstosilicate anions into linear structures. The neighboring [SiW11O39]8− anions are related by two-fold screw and translational operations in compounds 1 and 2, respectively. Second harmonic generation was observed for compound 1.

16.
Dalton Trans ; (16): 2726-30, 2005 Aug 21.
Article in English | MEDLINE | ID: mdl-16075112

ABSTRACT

New mixed metal clusters with M19 metal frameworks have been synthesized by NaBH4 reduction of Au(NO3)(PMe2Ph) together with AgNO3 in ethanol. Single crystal X-ray diffraction has revealed Au12Ag7 and Au17Ag2 metal skeletons for these clusters, which are best described in terms of bicapped pentagonal antiprismatic cages with a staggered-staggered M(5) ring configuration. These clusters connect the missing link between M13 icosahedral and M25 biicosahedral clusters providing a view of the cluster growth process. A TEM image of this cluster has been observed, which has clearly demonstrated single-sized nano-particles of less than 1.0 nm.


Subjects
Gold/chemistry , Organometallic Compounds/chemistry , Silver/chemistry , X-Ray Crystallography , Transmission Electron Microscopy/methods , Molecular Models , Organometallic Compounds/chemical synthesis , Sensitivity and Specificity , Spectrophotometry/methods , X-Rays