An Enhanced Focused Web Crawler for Biomedical Topics Using Attention Enhanced Siamese Long Short Term Memory Networks
Mary, Joe Dhanith Pal Nesamony Rose; Balasubramanian, Surendiran; Raj, Raja Soosaimarian Peter.
  • Mary, Joe Dhanith Pal Nesamony Rose; National Institute of Technology Puducherry. Karaikal. IN
  • Balasubramanian, Surendiran; National Institute of Technology Puducherry. Karaikal. IN
  • Raj, Raja Soosaimarian Peter; Vellore Institute of Technology. School of Computer Science and Engineering. Vellore. IN
Braz. arch. biol. technol; 64: e21210163, 2021. Tables, graphs.
Article in English | LILACS-Express | LILACS | ID: biblio-1355796
ABSTRACT
The Internet is one of the primary sources of biomedical information. To retrieve the necessary biomedical information, a search engine needs an efficient focused crawler. However, research on focused crawlers for biomedical topics remains notably scarce, even though the volume, velocity, diversity, and quality of the biomedical information available online pose challenges that call for better crawling support. This paper tackles these challenges and proposes a new learning approach to focused web crawling that uses Attention Enhanced Siamese Long Short Term Memory (AE-SLSTM) networks with peephole connections to predict the topical relevance of a web page. The proposed AE-SLSTM model accurately computes the semantic similarity between the topic and the web pages. The performance of the new crawler is assessed with two well-known metrics: harvest rate ($h_{rate}$) and irrelevance ratio ($p_{rate}$). After crawling 5,000 web pages related to biomedical topics, the presented crawler surpasses existing focused crawlers, with an average $h_{rate}$ of 0.39 and an average $p_{rate}$ of 0.61. The results show that the proposed methodology helps download more relevant biomedical web pages on a particular topic from the Internet.
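The two evaluation metrics, and the kind of best-first crawl loop they evaluate, can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the standard definitions: $h_{rate}$ is relevant pages divided by all downloaded pages, and $p_{rate}$ is irrelevant pages divided by all downloaded pages. The function names `focused_crawl` and `harvest_and_irrelevance`, the toy link graph, and the fixed scores are all hypothetical. The `score` callable stands in for the AE-SLSTM similarity model, which is not reproduced here. Giving each child link its parent page's similarity as its frontier priority is an assumed, commonly used best-first heuristic.

```python
import heapq
from typing import Callable, Dict, List, Set, Tuple


def harvest_and_irrelevance(relevance_flags: List[bool]) -> Tuple[float, float]:
    """Return (h_rate, p_rate) given one relevance flag per downloaded page."""
    total = len(relevance_flags)
    if total == 0:
        raise ValueError("no pages were crawled")
    relevant = sum(relevance_flags)  # True counts as 1
    h_rate = relevant / total  # relevant pages / all downloaded pages
    p_rate = (total - relevant) / total  # irrelevant pages / all downloaded pages
    return h_rate, p_rate


def focused_crawl(
    seeds: List[str],
    outlinks: Dict[str, List[str]],
    score: Callable[[str], float],
    threshold: float,
    max_pages: int,
) -> List[bool]:
    """Best-first crawl over a toy link graph; returns one relevance flag per visited page.

    Hypothetical stand-ins: `outlinks` replaces fetching and parsing real pages,
    and `score` replaces a topic-similarity model such as AE-SLSTM.
    """
    # heapq is a min-heap, so priorities are negated to pop the highest score first.
    frontier: List[Tuple[float, str]] = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    visited: Set[str] = set()
    flags: List[bool] = []
    while frontier and len(flags) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue  # the same link may have been queued by several parents
        visited.add(url)
        similarity = score(url)
        flags.append(similarity >= threshold)
        # Assumed heuristic: child links inherit the parent page's similarity as priority.
        for link in outlinks.get(url, []):
            if link not in visited:
                heapq.heappush(frontier, (-similarity, link))
    return flags


if __name__ == "__main__":
    graph = {"seed": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}
    toy_scores = {"seed": 0.9, "a": 0.8, "b": 0.2, "c": 0.7, "d": 0.1}
    flags = focused_crawl(["seed"], graph, toy_scores.__getitem__, threshold=0.5, max_pages=5000)
    print(flags)  # [True, True, False, True, False]
    print(harvest_and_irrelevance(flags))  # (0.6, 0.4)
```

Because every downloaded page is counted as either relevant or irrelevant, $p_{rate} = 1 - h_{rate}$ in this sketch, which agrees with the averages reported above (0.39 + 0.61 = 1). A higher $h_{rate}$ therefore means the crawler spent less of its download budget on off-topic pages.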


Full text: Available | Index: LILACS (Americas) | Type of study: Prognostic study | Language: English | Journal: Braz. arch. biol. technol | Journal subject: Biology | Year: 2021 | Type: Article | Affiliation country: India | Institution/Affiliation country: National Institute of Technology Puducherry/IN / Vellore Institute of Technology/IN
