Search | Global Index Medicus

An Enhanced Focused Web Crawler for Biomedical Topics Using Attention Enhanced Siamese Long Short Term Memory Networks

Mary, Joe Dhanith Pal Nesamony Rose; Balasubramanian, Surendiran; Raj, Raja Soosaimarian Peter.

Braz. arch. biol. technol ; 64: e21210163, 2021. tab, graf

Article in English | LILACS-Express | LILACS | ID: biblio-1355796

ABSTRACT

The Internet has become one of the primary sources of biomedical information. To retrieve the necessary biomedical information, a search engine needs an efficient focused crawler. Research on focused crawlers for biomedical topics, however, is notably scarce, while the quantity, momentum, diversity, and quality of the biomedical information available online make crawling challenging and call for better crawling support. This paper addresses these challenges and proposes a new learning approach for focused web crawling that adopts Attention Enhanced Siamese Long Short Term Memory (AE-SLSTM) networks with peephole connections to predict the topical relevance of a web page. The proposed AE-SLSTM model accurately computes the semantic similarity between the topic and the web pages. The performance of the newly designed crawler is assessed using two well-known metrics, harvest rate (h_rate) and irrelevance ratio (p_rate). The presented crawler surpasses the existing focused crawlers, with an average h_rate of 0.39 and an average p_rate of 0.61 after crawling 5,000 web pages relating to biomedical topics. The results indicate that the proposed methodology helps download more biomedical web pages relevant to a particular topic from the Internet.
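The two metrics can be illustrated with a minimal sketch. It assumes the commonly used definitions (harvest rate as the fraction of crawled pages judged relevant, irrelevance ratio as the fraction judged irrelevant); the abstract does not spell out the formulas, and the function names and the count of 1,950 relevant pages are hypothetical, chosen only so that a single 5,000-page crawl reproduces the reported averages.

```python
def harvest_rate(relevant_pages: int, crawled_pages: int) -> float:
    """Fraction of crawled pages judged relevant (assumed definition of h_rate)."""
    if crawled_pages <= 0:
        raise ValueError("crawled_pages must be positive")
    return relevant_pages / crawled_pages


def irrelevance_ratio(relevant_pages: int, crawled_pages: int) -> float:
    """Fraction of crawled pages judged irrelevant (assumed definition of p_rate)."""
    if crawled_pages <= 0:
        raise ValueError("crawled_pages must be positive")
    return (crawled_pages - relevant_pages) / crawled_pages


# Hypothetical counts: 1,950 relevant pages out of 5,000 crawled.
h_rate = harvest_rate(1950, 5000)
p_rate = irrelevance_ratio(1950, 5000)
print(round(h_rate, 2), round(p_rate, 2))
```

Under these definitions the two values always sum to 1, which is consistent with the reported pair of 0.39 and 0.61.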

A Critique Empirical Evaluation of Relevance Computation for Focused Web Crawlers

Mary, Joe Dhanith Pal Nesamony Rose; Balasubramanian, Surendiran; Raj, Raja Soosaimarian Peter.

Braz. arch. biol. technol ; 64: e21210223, 2021. tab, graf

Article in English | LILACS-Express | LILACS | ID: biblio-1355799

ABSTRACT

In step with the spectacular growth of the information superhighway, the Internet, the demand for coherent and economical crawling methods is clearly rising. Consequently, many innovative techniques have been put forth for efficient crawling, the most significant of which is the focused crawler. Focused crawlers are capable of searching for web pages that suit topics defined in advance. They attract several search engines on the grounds of efficient filtering and reduced memory and time consumption. This paper presents a survey of web crawling based on relevance computation. Fifty-two focused crawlers from the existing literature are categorized into four classes: classic focused crawler, semantic focused crawler, learning focused crawler, and ontology learning focused crawler. The prerequisites and strengths of each are discussed with respect to harvest rate, target recall, precision, and F1-score. Shortcomings, strategies, and future outlooks are also suggested.
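As a rough illustration of the evaluation metrics named above, the sketch below computes precision, target recall, and F1-score from sets of URLs. The definitions are the standard information-retrieval ones and are an assumption here, since the abstract does not give formulas; the function names and example URLs are hypothetical.

```python
def precision(crawled: set, relevant: set) -> float:
    """Share of crawled pages that are relevant."""
    return len(crawled & relevant) / len(crawled) if crawled else 0.0


def target_recall(crawled: set, targets: set) -> float:
    """Share of a known set of target pages that the crawler reached."""
    return len(crawled & targets) / len(targets) if targets else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


# Hypothetical crawl: four pages fetched, three known target pages.
crawled = {"a.example/1", "a.example/2", "b.example/1", "c.example/1"}
targets = {"a.example/1", "a.example/2", "d.example/1"}

p = precision(crawled, targets)
r = target_recall(crawled, targets)
print(p, r, f1_score(p, r))
```

In this toy case the target set doubles as the relevance judgement; a real evaluation would typically use a separate relevance classifier or manual labels for precision.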
