Dataset Selection for Transfer Learning in Information Retrieval

Rughbeer, Y.; Pillay, A. W.; Jembere, E.

ABSTRACT

Information Retrieval is the task of satisfying an information need by retrieving relevant information from large collections. Recently, deep neural networks have achieved several performance breakthroughs in the field, owing to the availability of large-scale training sets. When training data is limited, however, neural retrieval systems vastly underperform. To compensate for the lack of training data, researchers have turned to transfer learning, relying on labelled data from other search domains. Although several datasets are publicly available, researchers currently have little guidance on selecting the best training set for their particular applications. To address this knowledge gap, we propose a rigorous method to select an optimal training set for a specific search domain. We validate this method on the TREC-COVID challenge, which was organized by the Allen Institute for Artificial Intelligence and the National Institute of Standards and Technology. Our neural model ranked first among 143 competing systems. More importantly, it achieved this result by training on a dataset that was selected using our proposed method. This work highlights the performance gains that may be achieved through careful dataset selection in transfer learning. © 2020, Springer Nature Switzerland AG.
