Results 1 - 4 of 4
1.
Methods Mol Biol ; 1807: 203-209, 2018.
Article in English | MEDLINE | ID: mdl-30030813

ABSTRACT

The US National Library of Medicine (NLM) uses the Medical Subject Headings (MeSH) (see Note 1) to index almost all 24 million citations in MEDLINE, which greatly facilitates biomedical information retrieval and text mining. Large-scale automatic MeSH indexing has two challenging aspects: the MeSH side and the citation side. On the MeSH side, each citation is annotated with only 12 (on average) of the 28,000 MeSH terms. On the citation side, all existing methods, including the Medical Text Indexer (MTI) by NLM, represent text as a bag of words, which cannot capture semantic and context-dependent information well. To address these two challenges, we developed MeSHLabeler and DeepMeSH. Using a "learning to rank" (LTR) framework, MeSHLabeler integrates multiple types of information to address the challenge on the MeSH side, while DeepMeSH integrates a deep semantic representation to address the challenge on the citation side. MeSHLabeler took first place in both the BioASQ2 and BioASQ3 challenges, and DeepMeSH took first place in both the BioASQ4 and BioASQ5 challenges. DeepMeSH is available at http://datamining-iip.fudan.edu.cn/deepmesh.


Subject(s)
Abstracting and Indexing, Algorithms, Medical Subject Headings, User-Computer Interface
2.
Bioinformatics ; 32(12): i70-i79, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27307646

ABSTRACT

MOTIVATION: Medical Subject Headings (MeSH) indexing, the task of assigning a set of MeSH main headings to citations, is crucial for many important tasks in biomedical text mining and information retrieval. Large-scale MeSH indexing has two challenging aspects: the citation side and the MeSH side. On the citation side, all existing methods, including the Medical Text Indexer (MTI) by the National Library of Medicine and the state-of-the-art method, MeSHLabeler, represent text as a bag of words, which cannot capture semantic and context-dependent information well. METHODS: We propose DeepMeSH, which incorporates deep semantic information for large-scale MeSH indexing. It addresses the challenges on both the citation and MeSH sides. The citation-side challenge is addressed by a new deep semantic representation, D2V-TFIDF, which concatenates sparse and dense semantic representations. The MeSH-side challenge is addressed by using the 'learning to rank' framework of MeSHLabeler, which integrates various types of evidence generated from the new semantic representation. RESULTS: DeepMeSH achieved a Micro F-measure of 0.6323, 2% higher than the 0.6218 of MeSHLabeler and 12% higher than the 0.5637 of MTI, on BioASQ3 challenge data with 6000 citations. AVAILABILITY AND IMPLEMENTATION: The software is available upon request. CONTACT: zhusf@fudan.edu.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Medical Subject Headings, Semantics, Software, Abstracting and Indexing, Data Mining, MEDLINE, National Library of Medicine (U.S.), United States
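
The D2V-TFIDF idea in the DeepMeSH abstract above, concatenating a sparse and a dense representation of a citation, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the smoothed IDF formula, the L2 normalisation of each half before concatenation, the function names, and the toy dense vector (standing in for a document embedding produced elsewhere) are all assumptions made for illustration.

```python
import math
from collections import Counter


def l2_normalise(vec):
    """Scale a vector to unit length; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def tfidf_vectors(docs):
    """Return (vocabulary, unit-length TF-IDF vectors) for tokenised documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: how many documents contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption) keeps every weight strictly positive.
        vec = [tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t in vocab]
        vectors.append(l2_normalise(vec))
    return vocab, vectors


def concat_representation(sparse_vec, dense_vec):
    """Concatenate the sparse and dense halves, each normalised on its own.

    Normalising both halves first is an assumption of this sketch; it keeps
    one half from dominating distance computations purely because of scale.
    """
    return l2_normalise(sparse_vec) + l2_normalise(dense_vec)


# Toy usage: two tokenised "citations" and a hypothetical 2-dimensional embedding.
docs = [["mesh", "indexing", "citation"], ["mesh", "semantic", "representation"]]
vocab, sparse = tfidf_vectors(docs)
combined = concat_representation(sparse[0], [0.6, 0.8])
```

The combined vector has one entry per vocabulary term followed by the dense dimensions, so nearest-neighbour search or a classifier can use both kinds of evidence at once.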
3.
J Bioinform Comput Biol ; 13(6): 1542002, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26471719

ABSTRACT

Currently, all MEDLINE documents are indexed by Medical Subject Headings (MeSH). Computing the semantic similarity between two MeSH headings, as well as between two documents, has become very important for many biomedical text mining applications. We develop an R package, MeSHSim, which can compute nine similarity measures between MeSH nodes, from which the similarity between MeSH headings as well as between MEDLINE documents can be easily computed. MeSHSim also supports querying the hierarchy information of a MeSH heading and retrieving the MeSH headings of a query document, and it can be easily integrated into pipelines for biomedical text analysis tasks. MeSHSim is released under the GNU General Public License (GPL) and is available through Bioconductor and from GitHub at https://github.com/JingZhou2015/MeSHSim.


Subject(s)
Algorithms, Data Mining/methods, MEDLINE, Medical Subject Headings
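
To make the notion of similarity between MeSH nodes in the MeSHSim abstract above more concrete, here is a minimal Python sketch of one common path-based measure, a Wu-Palmer style score computed from dotted tree numbers. It does not use the MeSHSim R package or reproduce its API, and it is not claimed to match any of the nine measures the package implements. The tree numbers are illustrative placeholders, not real MeSH entries, and treating each dot-separated segment as one level of the hierarchy is an assumption of the sketch.

```python
def tree_depth(tree_number):
    """Depth of a node, counted as the number of dot-separated segments."""
    return len(tree_number.split("."))


def common_prefix_depth(a, b):
    """Depth of the deepest common ancestor of two tree numbers."""
    depth = 0
    for seg_a, seg_b in zip(a.split("."), b.split(".")):
        if seg_a != seg_b:
            break
        depth += 1
    return depth


def wu_palmer_similarity(a, b):
    """Wu-Palmer style similarity: 2 * depth(LCA) / (depth(a) + depth(b)).

    Returns 1.0 for identical nodes and 0.0 for nodes whose first segments differ.
    """
    lca_depth = common_prefix_depth(a, b)
    return 2.0 * lca_depth / (tree_depth(a) + tree_depth(b))


# Illustrative (made-up) tree numbers: two siblings under the same parent.
sibling_score = wu_palmer_similarity("A01.100.200", "A01.100.300")
```

A document-level similarity could then be built by aggregating such node scores over the headings assigned to each document, for example by averaging each heading's best match; that aggregation choice is likewise an assumption rather than something stated in the abstract.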
4.
Bioinformatics ; 31(12): i339-47, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-26072501

ABSTRACT

MOTIVATION: Medical Subject Headings (MeSH) are used by the National Library of Medicine (NLM) to index almost all citations in MEDLINE, which greatly facilitates applications of biomedical information retrieval and text mining. To reduce the time and financial cost of manual annotation, NLM has developed a software package, Medical Text Indexer (MTI), to assist MeSH annotation; it uses k-nearest neighbors (KNN), pattern matching and indexing rules. Other types of information, such as predictions by MeSH classifiers (trained separately), can also be used for automatic MeSH annotation. However, existing methods cannot effectively integrate multiple types of evidence for MeSH annotation. METHODS: We propose a novel framework, MeSHLabeler, to integrate multiple types of evidence for accurate MeSH annotation by using 'learning to rank'. Evidence includes numerous predictions from MeSH classifiers, KNN, pattern matching, MTI and the correlation between different MeSH terms, etc. Each MeSH classifier is trained independently, and thus prediction scores from different classifiers are not comparable. To address this issue, we have developed an effective score normalization procedure to improve the prediction accuracy. RESULTS: MeSHLabeler won first place in Task 2A of the 2014 BioASQ challenge, achieving a Micro F-measure of 0.6248 on the 9,040 citations provided by the BioASQ challenge. Note that this accuracy is around 9.15% higher than the 0.5724 obtained by MTI. AVAILABILITY AND IMPLEMENTATION: The software is available upon request.


Subject(s)
Abstracting and Indexing/methods, Medical Subject Headings, Software, Algorithms, Data Mining, MEDLINE, Reproducibility of Results
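
The Micro F-measure reported in the MeSHLabeler abstract above is the standard micro-averaged F1: true positives, false positives and false negatives are pooled over all citations before precision and recall are computed. The Python sketch below shows that computation on toy label sets, plus the relative-improvement arithmetic behind the "around 9.15%" figure. The function names and the toy labels are hypothetical, and this is not the BioASQ evaluation code.

```python
def micro_f_measure(predicted, gold):
    """Micro-averaged F1 over parallel lists of predicted and gold label sets."""
    tp = fp = fn = 0
    for pred, true in zip(predicted, gold):
        pred, true = set(pred), set(true)
        tp += len(pred & true)   # labels predicted and correct
        fp += len(pred - true)   # labels predicted but not in the gold set
        fn += len(true - pred)   # gold labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Toy example: two citations with placeholder labels.
score = micro_f_measure([{"a", "b"}, {"c"}], [{"a"}, {"c", "d"}])

# Figures quoted in the abstract: 0.6248 (MeSHLabeler) against 0.5724 (MTI).
gain_percent = round(relative_gain(0.6248, 0.5724) * 100, 2)  # 9.15
```

Because counts are pooled before averaging, citations with many headings weigh more heavily than they would under a per-citation (macro) average.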