Search | Portal Regional da BVS

Calling on a million minds for community annotation in WikiProteins.

Mons, Barend; Ashburner, Michael; Chichester, Christine; van Mulligen, Erik; Weeber, Marc; den Dunnen, Johan; van Ommen, Gert-Jan; Musen, Mark; Cockerill, Matthew; Hermjakob, Henning; Mons, Albert; Packer, Abel; Pacheco, Roberto; Lewis, Suzanna; Berkeley, Alfred; Melton, William; Barris, Nickolas; Wales, Jimmy; Meijssen, Gerard; Moeller, Erik; Roes, Peter Jan; Borner, Katy; Bairoch, Amos.

Genome Biol ; 9(5): R89, 2008.

Article in English | MEDLINE | ID: mdl-18507872

ABSTRACT

WikiProteins enables community annotation in a Wiki-based system. Extracts of major data sources have been fused into an editable environment that links out to the original sources. Data from community edits create automatic copies of the original data. Semantic technology captures concepts co-occurring in one sentence and thus potential factual statements. In addition, indirect associations between concepts have been calculated. We call on a 'million minds' to annotate a 'million concepts' and to collect facts from the literature with the reward of collaborative knowledge discovery. The system is available for beta testing at http://www.wikiprofessional.org.
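The abstract's idea of treating concepts that co-occur in one sentence as candidate factual statements can be illustrated with a minimal sketch. This is not the WikiProteins implementation: the function name `cooccurring_pairs`, the naive sentence splitter, the substring-based concept matching, and the toy concept names are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurring_pairs(text, concepts):
    """Count pairs of concepts that appear together in the same sentence."""
    counts = Counter()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        # Simple case-insensitive substring matching (an assumption; a real
        # system would map text to concept identifiers via a thesaurus).
        found = sorted(c for c in concepts if c.lower() in lowered)
        for pair in combinations(found, 2):
            counts[pair] += 1
    return counts


# Toy input with made-up concept names.
toy_text = "geneA binds geneB. geneB regulates processX. geneA is mentioned alone."
toy_concepts = ["geneA", "geneB", "processX"]
pairs = cooccurring_pairs(toy_text, toy_concepts)
for pair, n in sorted(pairs.items()):
    print(pair, n)
```

Each counted pair is only a potential statement; in the system described above, it is the community that is asked to confirm or correct such candidates.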

Subjects

Databases, Protein , Proteins/genetics , Software , Information Storage and Retrieval , Internet

Literature-based concept profiles for gene annotation: the issue of weighting.

Jelier, Rob; Schuemie, Martijn J; Roes, Peter-Jan; van Mulligen, Erik M; Kors, Jan A.

Int J Med Inform ; 77(5): 354-62, 2008 May.

Article in English | MEDLINE | ID: mdl-17827057

ABSTRACT

BACKGROUND: Text-mining has been used to link biomedical concepts, such as genes or biological processes, to each other for annotation purposes or the generation of new hypotheses. To relate two concepts to each other several authors have used the vector space model, as vectors can be compared efficiently and transparently. Using this model, a concept is characterized by a list of associated concepts, together with weights that indicate the strength of the association. The associated concepts in the vectors and their weights are derived from a set of documents linked to the concept of interest. An important issue with this approach is the determination of the weights of the associated concepts. Various schemes have been proposed to determine these weights, but no comparative studies of the different approaches are available. Here we compare several weighting approaches in a large scale classification experiment.

METHODS: Three different techniques were evaluated: (1) weighting based on averaging, an empirical approach; (2) the log likelihood ratio, a test-based measure; (3) the uncertainty coefficient, an information-theory based measure. The weighting schemes were applied in a system that annotates genes with Gene Ontology codes. As the gold standard for our study we used the annotations provided by the Gene Ontology Annotation project. Classification performance was evaluated by means of the receiver operating characteristics (ROC) curve using the area under the curve (AUC) as the measure of performance.

RESULTS AND DISCUSSION: All methods performed well with median AUC scores greater than 0.84, and scored considerably higher than a binary approach without any weighting. Especially for the more specific Gene Ontology codes excellent performance was observed. The differences between the methods were small when considering the whole experiment. However, the number of documents that were linked to a concept proved to be an important variable. When larger amounts of texts were available for the generation of the concepts' vectors, the performance of the methods diverged considerably, with the uncertainty coefficient then outperforming the two other methods.
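Two of the weighting schemes named in the abstract, the log likelihood ratio and the uncertainty coefficient, can be sketched from a 2x2 document count table. The abstract does not give the authors' exact formulations, so the code below uses the textbook definitions as an assumption: the log likelihood ratio as the G statistic, and the uncertainty coefficient as mutual information divided by the entropy of the "linked to the concept" variable. The function names and the cell layout (`a`, `b`, `c`, `d`) are hypothetical.

```python
import math


def mutual_information(a, b, c, d):
    """Mutual information (in nats) of a 2x2 table of document counts.

    a: linked to the concept and mentions the associated term
    b: linked to the concept, term absent
    c: not linked to the concept, term present
    d: neither
    """
    n = a + b + c + d
    # Each cell paired with its row total and column total.
    cells = [
        (a, a + b, a + c),
        (b, a + b, b + d),
        (c, c + d, a + c),
        (d, c + d, b + d),
    ]
    mi = 0.0
    for observed, row_total, col_total in cells:
        if observed > 0:  # empty cells contribute nothing
            mi += (observed / n) * math.log(observed * n / (row_total * col_total))
    return mi


def log_likelihood_ratio(a, b, c, d):
    """G statistic: 2 * N * mutual information (natural log)."""
    return 2.0 * (a + b + c + d) * mutual_information(a, b, c, d)


def uncertainty_coefficient(a, b, c, d):
    """Fraction of the concept variable's entropy explained by the term."""
    n = a + b + c + d
    probs = [(a + b) / n, (c + d) / n]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return mutual_information(a, b, c, d) / entropy if entropy > 0 else 0.0


# Toy counts: an independent term and a perfectly associated term.
independent = (25, 25, 25, 25)
perfect = (50, 0, 0, 50)
print(log_likelihood_ratio(*independent), uncertainty_coefficient(*independent))
print(log_likelihood_ratio(*perfect), uncertainty_coefficient(*perfect))
```

Under these definitions the log likelihood ratio grows with the number of documents while the uncertainty coefficient stays between 0 and 1, which is one plausible reason the two could behave differently as more text becomes available.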

Subjects

Abstracting and Indexing/methods , Database Management Systems , Natural Language Processing , Neural Networks, Computer , Artificial Intelligence , Confidence Intervals , Database Management Systems/statistics & numerical data , Databases, Genetic , Gene Expression Profiling/statistics & numerical data , Genes , Information Theory , Likelihood Functions , Pattern Recognition, Automated/methods , Protein Interaction Mapping , PubMed , ROC Curve , Terminology as Topic , Uncertainty , Vocabulary, Controlled

Assignment of protein function and discovery of novel nucleolar proteins based on automatic analysis of MEDLINE.

Schuemie, Martijn; Chichester, Christine; Lisacek, Frederique; Coute, Yohann; Roes, Peter-Jan; Sanchez, Jean Charles; Kors, Jan; Mons, Barend.

Proteomics ; 7(6): 921-31, 2007 Mar.

Article in English | MEDLINE | ID: mdl-17370270

ABSTRACT

Attribution of the most probable functions to proteins identified by proteomics is a significant challenge that requires extensive literature analysis. We have developed a system for automated prediction of implicit and explicit biologically meaningful functions for a proteomics study of the nucleolus. This approach uses a set of vocabulary terms to map and integrate the information from the entire MEDLINE database. Based on a combination of cross-species sequence homology searches and the corresponding literature, our approach facilitated the direct association between sequence data and information from biological texts describing function. Comparison of our automated functional assignment to manual annotation demonstrated our method to be highly effective. To establish the sensitivity, we defined the functional subtleties within a family containing a highly conserved sequence. Clustering of the DEAD-box protein family of RNA helicases confirmed that these proteins shared similar morphology although functional subfamilies were accurately identified by our approach. We visualized the nucleolar proteome in terms of protein functions using multi-dimensional scaling, showing functional associations between nucleolar proteins that were not previously realized. Finally, by clustering the functional properties of the established nucleolar proteins, we predicted novel nucleolar proteins. Subsequently, nonproteomics studies confirmed the predictions of previously unidentified nucleolar proteins.
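Clustering and multi-dimensional scaling both start from pairwise distances between proteins. A minimal sketch of that first step follows, assuming each protein is represented by a weighted profile of vocabulary terms and that profiles are compared by cosine similarity. The abstract does not state which representation or similarity measure was used, so both are assumptions, as are the function names and the toy profiles.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse term-weight profiles (dicts)."""
    shared = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def distance_matrix(profiles):
    """Return sorted protein names and their pairwise cosine distances."""
    names = sorted(profiles)
    matrix = [
        [1.0 - cosine_similarity(profiles[a], profiles[b]) for b in names]
        for a in names
    ]
    return names, matrix


# Toy profiles with made-up proteins and weights.
toy_profiles = {
    "p1": {"rna helicase": 1.0, "ribosome biogenesis": 0.5},
    "p2": {"rna helicase": 1.0, "ribosome biogenesis": 0.5},
    "p3": {"dna repair": 1.0},
}
names, dist = distance_matrix(toy_profiles)
for name, row in zip(names, dist):
    print(name, [round(x, 3) for x in row])
```

A matrix like `dist` could then be passed to any standard clustering or scaling routine; proteins with near-zero distance would be candidates for sharing a function.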

Subjects

MEDLINE , Nuclear Proteins , Amino Acid Sequence , Animals , DEAD-box RNA Helicases/chemistry , DEAD-box RNA Helicases/genetics , DEAD-box RNA Helicases/metabolism , Databases, Protein , Humans , Molecular Sequence Data , Nuclear Proteins/chemistry , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Proteome
