Results 1 - 3 of 3
1.
Int J Med Inform ; 74(2-4): 317-24, 2005 Mar.
Article in English | MEDLINE | ID: mdl-15694638

ABSTRACT

Bio-medical knowledge bases are valuable resources for the research community. Original scientific publications are the main source used to annotate them. Medical annotation in Swiss-Prot is specifically targeted at finding and extracting data about human genetic diseases and polymorphisms. Curators have to scan through hundreds of publications to select the relevant ones. This workload can be greatly reduced by using bio-text mining techniques. Using a combination of natural language processing (NLP) techniques and statistical classifiers, we achieve recall of up to 84% on the potentially interesting documents and a precision of more than 96% in detecting irrelevant documents. Careful analysis of the document pre-processing chain allows us to measure the impact of some steps on the overall result, as well as test different classifier configurations. The best combination was used to create a prototype of a search and classification tool that is currently being tested by the database curators.
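The pipeline described above (pre-processing followed by a statistical classifier that separates relevant from irrelevant documents) can be illustrated with a minimal sketch. The abstract does not name the classifier or the pre-processing steps, so everything here is an assumption: a multinomial Naive Bayes model stands in for the authors' classifier, and the stopword list, the `preprocess` function, and the `NaiveBayesRelevance` class are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter

# Assumed toy stopword list; the abstract does not specify one.
STOPWORDS = {"the", "of", "in", "a", "and", "is", "to", "with"}


def preprocess(text):
    """Lowercase, split into alphanumeric tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


class NaiveBayesRelevance:
    """Stand-in classifier (not the authors' method): multinomial Naive Bayes."""

    def fit(self, texts, labels, alpha=1.0):
        # labels are booleans: True = relevant, False = irrelevant.
        # Both classes must occur at least once in the training data.
        self.alpha = alpha
        self.vocab = set()
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = {True: 0, False: 0}
        for text, label in zip(texts, labels):
            tokens = preprocess(text)
            self.counts[label].update(tokens)
            self.n_docs[label] += 1
            self.vocab.update(tokens)
        return self

    def _log_likelihood(self, tokens, label):
        # Laplace-smoothed log P(label) + sum of log P(token | label).
        total = sum(self.counts[label].values()) + self.alpha * len(self.vocab)
        prior = math.log(self.n_docs[label] / sum(self.n_docs.values()))
        return prior + sum(
            math.log((self.counts[label][t] + self.alpha) / total)
            for t in tokens
            if t in self.vocab  # tokens never seen in training are ignored
        )

    def score(self, text):
        """Log-odds that the text is relevant; higher means more relevant."""
        tokens = preprocess(text)
        return self._log_likelihood(tokens, True) - self._log_likelihood(tokens, False)


def rerank(model, texts):
    """Order texts from most to least likely relevant."""
    return sorted(texts, key=model.score, reverse=True)


# Invented training sentences, for illustration only.
train_texts = [
    "A missense mutation in the gene causes the disease",
    "Polymorphism associated with inherited disease",
    "Crystal structure of the protein binding domain",
    "Protein folding kinetics in solution",
]
train_labels = [True, True, False, False]
model = NaiveBayesRelevance().fit(train_texts, train_labels)
```

A curator-facing tool could then call `rerank(model, abstracts)` to show the most promising abstracts first. Changing `preprocess` (for example, adding stemming) and re-measuring is one simple way to study the effect of individual pre-processing steps.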


Subject(s)
Databases, Protein; Statistics as Topic; Genetic Diseases, Inborn/genetics; Humans; Polymorphism, Genetic
2.
Stud Health Technol Inform ; 95: 421-6, 2003.
Article in English | MEDLINE | ID: mdl-14664023

ABSTRACT

The goal of medical annotation of human proteins in Swiss-Prot is to add features specifically intended for researchers working on genetic diseases and polymorphisms. For this purpose, it is necessary to search through a vast number of publications containing relevant information. Promising results have been obtained by applying natural language processing and machine learning techniques to solve this problem. By using the Probabilistic Latent Categorizer on representative query sets, 69% recall and 59% precision were achieved for relevant documents. This classifier also rejected irrelevant abstracts with more than 96% precision. Better linguistic pre-processing of source documents can further improve such a computational approach.
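The figures above are reported separately for two classes: recall and precision for relevant documents, and precision for rejected (irrelevant) ones. A minimal sketch of how such per-class figures are computed from gold labels and predictions follows. The function name `precision_recall` and the toy labels are illustrative assumptions and do not reproduce the study's data.

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels, invented for illustration: 1 = relevant, 0 = irrelevant.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

relevant_precision, relevant_recall = precision_recall(y_true, y_pred, positive=1)
irrelevant_precision, irrelevant_recall = precision_recall(y_true, y_pred, positive=0)
```

Precision on the irrelevant class answers the question a curator cares about when skipping documents: of the abstracts the classifier rejected, what fraction were truly irrelevant.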


Subject(s)
Databases, Protein; Information Storage and Retrieval/statistics & numerical data; Probability; Switzerland
3.
Bioinformatics ; 19 Suppl 1: i91-4, 2003.
Article in English | MEDLINE | ID: mdl-12855443

ABSTRACT

MOTIVATION: Searching for relevant publications for manual database annotation is a tedious task. In this paper, we apply a combination of Natural Language Processing (NLP) and probabilistic classification to re-rank documents returned by PubMed according to their relevance to Swiss-Prot annotation, and to identify significant terms in the documents. RESULTS: With a Probabilistic Latent Categoriser (PLC) we obtained 69% recall and 59% precision for relevant documents in a representative query. As the PLC technique provides the relative contribution of each term to the final document score, we used the Kullback-Leibler symmetric divergence to determine the most discriminating words for Swiss-Prot medical annotation. This information should allow curators to understand classification results better. It also has great value for fine-tuning the linguistic pre-processing of documents, which in turn can improve the overall classifier performance.
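The symmetric Kullback-Leibler divergence between two term distributions P (relevant) and Q (irrelevant) is the sum over terms w of (P(w) - Q(w)) * log(P(w) / Q(w)), so each term's summand can serve as a measure of how discriminating it is. The sketch below ranks terms this way. It is a simplification under stated assumptions: the distributions are Laplace-smoothed unigram frequencies rather than the term contributions produced by the PLC model, and the function names and toy documents are hypothetical.

```python
import math
from collections import Counter


def term_distribution(docs, vocab, alpha=1.0):
    """Laplace-smoothed unigram distribution over `vocab` (smoothing is an assumption)."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def symmetric_kl_contributions(relevant_docs, irrelevant_docs, alpha=1.0):
    """Per-term summand of the symmetric KL divergence: (p - q) * log(p / q)."""
    vocab = {tok for doc in relevant_docs + irrelevant_docs for tok in doc}
    p = term_distribution(relevant_docs, vocab, alpha)
    q = term_distribution(irrelevant_docs, vocab, alpha)
    return {t: (p[t] - q[t]) * math.log(p[t] / q[t]) for t in vocab}


def top_terms(contributions, k):
    """The k terms with the largest contribution to the divergence."""
    return sorted(contributions, key=contributions.get, reverse=True)[:k]


# Toy tokenized documents, invented for illustration.
relevant = [["mutation", "disease", "protein"], ["mutation", "variant", "protein"]]
irrelevant = [["structure", "protein", "binding"], ["structure", "protein", "fold"]]

contributions = symmetric_kl_contributions(relevant, irrelevant)
most_discriminating = top_terms(contributions, 2)
```

Every summand is non-negative, and a term that is equally frequent in both classes (here `protein`) contributes zero, which matches the intuition that it carries no information for the relevance decision.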


Subject(s)
Abstracting and Indexing/methods; Databases, Protein; Models, Statistical; Natural Language Processing; Periodicals as Topic/classification; Proteins/chemistry; PubMed; Algorithms; Artificial Intelligence; Documentation/methods; Pattern Recognition, Automated; Proteins/genetics