Results 1 - 4 of 4
1.
Mol Inform ; 33(11-12): 790-801, 2014 Dec.
Article in English | MEDLINE | ID: mdl-27485425

ABSTRACT

Developing database systems that connect diverse species on the basis of omics data is the most important theme in big data biology. To this end, we have developed the KNApSAcK Family Databases, which are used in many metabolomics studies. In the present study, we developed a network-based approach to analyze relationships between the 3D structure and the biological activity of metabolites. The approach consists of four steps: construction of a network of metabolites based on structural similarity (Step 1), classification of metabolites into structure groups (Step 2), assessment of statistically significant relations between structure groups and biological activities (Step 3), and 2-dimensional clustering of the constructed data matrix based on those statistically significant relations (Step 4). Applying this method to a data set of 2072 secondary metabolites and 140 biological activities reported in the KNApSAcK Metabolite Activity DB, we obtained 983 statistically significant structure group-biological activity pairs. Overall, we systematically analyzed the relationship between the 3D chemical structures of metabolites and their biological activities.
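Steps 1 to 3 can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the similarity measure, the grouping rule, or the statistical test, so Tanimoto similarity on feature sets, connected components of the thresholded network, and an upper-tail hypergeometric test are stand-ins chosen here. All function names and the toy feature sets are hypothetical.

```python
from itertools import combinations
from math import comb


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets (assumed measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def structure_groups(features, threshold):
    """Steps 1-2: link metabolites whose similarity reaches the threshold,
    then return the connected components of that network as structure groups."""
    parent = {name: name for name in features}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in combinations(features, 2):
        if tanimoto(features[u], features[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for name in features:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in groups.values())


def enrichment_p(total, with_activity, group_size, hits):
    """Step 3 (assumed test): probability of drawing at least `hits` active
    metabolites when `group_size` metabolites are sampled without replacement."""
    upper = min(with_activity, group_size)
    favourable = sum(
        comb(with_activity, k) * comb(total - with_activity, group_size - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(total, group_size)


# Hypothetical toy data: each metabolite is described by a set of feature ids.
toy = {"m1": {1, 2, 3}, "m2": {1, 2, 3, 4}, "m3": {7, 8}, "m4": {7, 8, 9}}
groups = structure_groups(toy, threshold=0.6)
```

With real data, Step 4 would then cluster the structure group by activity matrix of significant pairs; that step is not sketched here.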

2.
Comput Struct Biotechnol J ; 4: e201301010, 2013.
Article in English | MEDLINE | ID: mdl-24688691

ABSTRACT

Molecular biological data have increased rapidly with recent progress in the omics fields (genomics, transcriptomics, proteomics and metabolomics), which necessitates the development of databases and methods for efficient storage, retrieval, integration and analysis of massive data. The present study reviews the usage of the KNApSAcK Family DB in metabolomics and related areas, discusses several statistical methods for handling multivariate data, and shows their application to Indonesian blended herbal medicines (Jamu) as a case study. Exploration using a biplot reveals that many plants are rarely utilized, while some plants are highly utilized toward a specific efficacy. Furthermore, the ingredients of Jamu formulas are modeled using Partial Least Squares Discriminant Analysis (PLS-DA) in order to predict their efficacy. The plants used in each Jamu medicine serve as the predictors, whereas the efficacy of each Jamu provides the responses. This model produces 71.6% correct classification in predicting efficacy. A permutation test is then used to determine which plants serve as main ingredients in a Jamu formula by evaluating the significance of the PLS-DA coefficients. Next, in order to explain the role of the plants that serve as main ingredients in Jamu medicines, information on the pharmacological activity of the plants is added to the predictor block. An N-PLS-DA model, a multiway version of PLS-DA, is then used to handle the three-dimensional array of the predictor block. The resulting N-PLS-DA model reveals that the effects of some pharmacological activities are specific to a certain efficacy, while other activities are spread across many efficacies. The mathematical modeling introduced in the present study can be utilized in the global analysis of big data aimed at revealing the underlying biology.
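The permutation-test idea can be illustrated with a small sketch. Note the substitution: the study evaluates the significance of PLS-DA coefficients, whereas this example uses a much simpler statistic, the difference in how often a plant is used between formulas with and without a given efficacy. The function names, the number of permutations, and the toy vectors are all assumptions made for illustration.

```python
import random


def usage_gap(uses_plant, has_efficacy):
    """Share of formulas using the plant among those with the efficacy,
    minus the same share among the remaining formulas."""
    with_eff = [u for u, e in zip(uses_plant, has_efficacy) if e]
    without = [u for u, e in zip(uses_plant, has_efficacy) if not e]
    return sum(with_eff) / len(with_eff) - sum(without) / len(without)


def permutation_p(uses_plant, has_efficacy, n_perm=1000, seed=0):
    """Two-sided permutation p-value: how often a random relabelling of the
    efficacy gives a gap at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(usage_gap(uses_plant, has_efficacy))
    labels = list(has_efficacy)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # breaks any real plant-efficacy association
        if abs(usage_gap(uses_plant, labels)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 20 formulas, the first 10 having the efficacy.
efficacy = [1] * 10 + [0] * 10
main_ingredient = [1] * 10 + [0] * 10   # used only where the efficacy occurs
background_plant = [1, 0] * 10          # used equally in both halves

p_main = permutation_p(main_ingredient, efficacy)
p_background = permutation_p(background_plant, efficacy)
```

In the study's setting the shuffled quantity would be fed back through the PLS-DA fit and the coefficient of each plant compared with its permuted distribution; the relabelling logic is the same.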

3.
J Mass Spectrom ; 45(7): 703-14, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20623627

ABSTRACT

MassBank is the first public repository of mass spectra of small chemical compounds for life sciences (<3000 Da). The database contains 605 electron-ionization mass spectrometry (EI-MS), 137 fast atom bombardment MS and 9276 electrospray ionization (ESI)-MS(n) data of 2337 authentic compounds of metabolites, 11,545 EI-MS and 834 other-MS data of 10,286 volatile natural and synthetic compounds, and 3045 ESI-MS(2) data of 679 synthetic drugs contributed by 16 research groups (January 2010). ESI-MS(2) data were analyzed under nonstandardized, independent experimental conditions. MassBank is a distributed database: each research group provides data from its own MassBank data servers distributed on the Internet. MassBank users can access either all of the MassBank data or a subset of the data by specifying one or more experimental conditions. In a spectral search to retrieve mass spectra similar to a query mass spectrum, the similarity score is calculated by a weighted cosine correlation in which the weighting exponents on peak intensity and mass-to-charge ratio are optimized for the ESI-MS(2) data. MassBank also provides a merged spectrum for each compound, prepared by merging the analyzed ESI-MS(2) data on an identical compound under different collision-induced dissociation conditions. Data merging has significantly improved the precision of the identification of a chemical compound, by 21-23% at a similarity score of 0.6. Thus, MassBank is useful for the identification of chemical compounds and the publication of experimental data.
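A weighted cosine correlation of the kind described can be sketched as below. The abstract does not give the optimized exponents or the peak-matching rule, so the default exponents here are placeholders and peaks are matched only when their m/z keys are identical; both are assumptions, and the function name is hypothetical rather than part of any MassBank interface.

```python
from math import sqrt


def weighted_cosine(query, reference, intensity_exp=0.5, mz_exp=2.0):
    """Cosine correlation between two spectra after weighting each peak by
    intensity**intensity_exp * mz**mz_exp.

    Each spectrum is a dict mapping m/z to intensity. The exponent defaults
    are placeholders, not the values optimized for MassBank.
    """
    def weights(spectrum):
        return {
            mz: (intensity ** intensity_exp) * (mz ** mz_exp)
            for mz, intensity in spectrum.items()
        }

    wq, wr = weights(query), weights(reference)
    # Only peaks present in both spectra contribute to the dot product.
    dot = sum(wq[mz] * wr[mz] for mz in wq.keys() & wr.keys())
    norm = sqrt(sum(w * w for w in wq.values())) * sqrt(sum(w * w for w in wr.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy spectra with integer m/z values.
spectrum_a = {100: 50.0, 150: 999.0, 200: 120.0}
spectrum_b = {100: 40.0, 150: 999.0, 210: 300.0}
spectrum_c = {300: 10.0, 310: 20.0}

same = weighted_cosine(spectrum_a, spectrum_a)
partial = weighted_cosine(spectrum_a, spectrum_b)
disjoint = weighted_cosine(spectrum_a, spectrum_c)
```

Raising the m/z exponent gives more influence to heavier fragments, while an intensity exponent below 1 dampens dominant peaks; real measured spectra would also need a tolerance-based peak matching step that is omitted here.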


Subjects
Databases, Factual , Information Dissemination/methods , Mass Spectrometry , Mass Spectrometry/methods , Pharmaceutical Preparations/chemistry , Spectrometry, Mass, Electrospray Ionization/methods
4.
Genome Inform ; 16(2): 174-82, 2005.
Article in English | MEDLINE | ID: mdl-16901100

ABSTRACT

Our work on building a lexicon of relatively short oligopeptides has been one of the first steps toward viewing the proteome from the perspective of oligopeptides. As an application of the lexicon, we propose a new method for predicting protein function, specifically Gene Ontology terms (GO terms), based on the statistical characteristics of oligopeptides. In the lexicon, a known function of a protein is inherited by its oligopeptides, and the correspondence between oligopeptides and the function is calculated across all proteins. In our method, unknown functions of proteins are predicted automatically by means of this correspondence. We measured the prediction performance for several GO terms with recall-precision graphs, using all 28,520 human proteins registered in RefSeq. The GO terms include 'membrane', 'nucleus', 'ATP binding', 'hydrolase activity', 'GTP binding', 'intracellular signaling cascade' and 'ubiquitin cycle'. In most cases, the method scores 70% recall at 80% precision. The predictions for ATP binding and GTP binding perform particularly well, scoring 80% recall at 80% precision. Even in the worst case (ubiquitin cycle), it scores 62.6% recall at 80% precision. These results suggest that the proposed method is quite efficient for predicting GO terms.
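The lexicon idea can be sketched in a few lines. The abstract does not specify the oligopeptide length or how the correspondence is turned into a prediction, so this example assumes fixed-length oligopeptides, stores for each one the fraction of its source proteins that carry the function, and scores a new protein by the mean of those fractions. All names and the toy sequences are hypothetical.

```python
from collections import defaultdict


def oligopeptides(sequence, k):
    """All overlapping length-k oligopeptides of a protein sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_lexicon(proteins, k):
    """Map each oligopeptide to the fraction of proteins containing it that
    are annotated with the function.

    `proteins` is a list of (sequence, has_function) pairs; each protein
    passes its annotation on to every oligopeptide it contains.
    """
    total = defaultdict(int)
    positive = defaultdict(int)
    for sequence, has_function in proteins:
        for peptide in set(oligopeptides(sequence, k)):
            total[peptide] += 1
            if has_function:
                positive[peptide] += 1
    return {peptide: positive[peptide] / total[peptide] for peptide in total}


def score(sequence, lexicon, k):
    """Mean lexicon value over the oligopeptides already in the lexicon
    (assumed scoring rule); 0.0 when none of them is known."""
    known = [lexicon[p] for p in oligopeptides(sequence, k) if p in lexicon]
    return sum(known) / len(known) if known else 0.0


# Hypothetical toy training set for one GO term.
training = [("MKKL", True), ("MKKV", True), ("GGGA", False)]
lexicon = build_lexicon(training, k=3)

likely = score("MKKA", lexicon, k=3)
unlikely = score("GGGL", lexicon, k=3)
unknown = score("WWWW", lexicon, k=3)
```

Sweeping a decision threshold over such scores and counting true and false positives at each value is what yields a recall-precision graph of the kind reported above.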


Subjects
Computational Biology/methods , Oligopeptides/physiology , Proteins/physiology , Proteome/chemistry , Proteome/physiology , Proteomics , Computational Biology/statistics & numerical data , Humans , Oligopeptides/chemistry , Predictive Value of Tests , Proteins/chemistry