ABSTRACT
Many digital libraries use hierarchical indexing schemas, such as MeSH, to enable concept-based search in the retrieval phase. However, improving on traditional full-text search is not trivial. We present an extensive set of experiments using a hierarchical concept-based retrieval method, applied in addition to several baselines, within the Vaidurya search and retrieval framework. Concept-based search applied in addition to a low baseline significantly outperforms that baseline, especially when queries use concepts at the third level of the hierarchy and disjunction within the hierarchical trees.
Subjects
Information Storage and Retrieval/methods; Natural Language Processing; Subject Headings; Abstracting and Indexing; Medical Subject Headings; Vocabulary, Controlled
ABSTRACT
OBJECTIVES: To compare (1) concept-based search, using documents pre-indexed by a conceptual hierarchy; (2) context-sensitive search, using structured, labeled documents; and (3) traditional full-text search. The hypotheses were that (1) more contexts lead to better retrieval accuracy and (2) adding concept-based search to the other searches would improve upon their baseline performance.
DESIGN: We used our Vaidurya architecture for search and retrieval evaluation of structured documents classified by a conceptual hierarchy, on a clinical guidelines test collection.
MEASUREMENTS: Precision was computed at different levels of recall to assess the contribution of each retrieval method. Precisions were compared with recall set at 0.5, using t-tests.
RESULTS: Performance increased monotonically with the number of query context elements. Adding context-sensitive elements yielded a mean improvement of 11.1% at recall 0.5. With three contexts, mean query precision was 42% ± 17% (95% confidence interval [CI], 31% to 53%); with two contexts, 32% ± 13% (95% CI, 27% to 38%); and with one context, 20% ± 9% (95% CI, 15% to 24%). Adding context-based queries to full-text queries monotonically improved precision beyond the 0.4 level of recall; the mean improvement was 4.5% at recall 0.5. Adding concept-based search to full-text search improved precision to 19.4% at recall 0.5.
CONCLUSIONS: The study demonstrated the usefulness of concept-based and context-sensitive queries for enhancing the precision of retrieval from a digital library of semi-structured clinical guideline documents. Concept-based searches outperformed free-text queries, especially when baseline precision was low. In general, the more ontological elements used in the query, the greater the resulting precision.
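
The measurement described above, precision at a fixed level of recall, can be illustrated with a minimal Python sketch. It assumes the common interpolated definition (the highest precision observed at any rank where recall has reached the target level); the abstract does not state which variant the authors used. The function name `precision_at_recall` and the document identifiers are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Set


def precision_at_recall(ranked: Sequence[str],
                        relevant: Set[str],
                        recall_level: float) -> float:
    """Interpolated precision at a recall level (an assumed definition).

    Returns the highest precision seen at any rank where recall is at
    least `recall_level`, or 0.0 if that recall is never reached.
    """
    if not relevant:
        raise ValueError("the set of relevant documents must not be empty")
    hits = 0
    best = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision can only peak at a relevant document, so it is
            # enough to check the recall threshold here.
            if hits / len(relevant) >= recall_level:
                best = max(best, hits / rank)
    return best


# Hypothetical ranked result list for one query (not data from the study).
ranked_results = ["g1", "g7", "g3", "g9", "g4", "g2"]
relevant_docs = {"g1", "g2", "g3", "g4"}
p_at_half = precision_at_recall(ranked_results, relevant_docs, 0.5)
print(f"precision at recall 0.5: {p_at_half:.3f}")
```

Averaging this value over all queries of one retrieval method gives a per-method mean that could then be compared across methods, for example with a t-test as the abstract reports.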