Establishing Strong Baselines for TripClick Health Retrieval

Transformer-based models and dense retrieval techniques outperform traditional BM25 methods in the TripClick health ad-hoc retrieval task, demonstrating their effectiveness through improved re-ranking and domain-specific pre-training.

Year
2022
Venue
arXiv 2022
Authors
4
Hosting
Abstract only (ARXIV-DEFAULT)

Attribution

Abstract & full text
arxiv.org/abs/2201.00365 (ARXIV-DEFAULT)
TL;DR
Semantic Scholar

Abstract

We present strong Transformer-based re-ranking and dense retrieval baselines for the recently released TripClick health ad-hoc retrieval collection. We improve the training data, which was originally too noisy, with a simple negative sampling policy. We achieve large gains over BM25 in the re-ranking task of TripClick, which were not achieved with the original baselines. Furthermore, we study the impact of different domain-specific pre-trained models on TripClick. Finally, we show that dense retrieval outperforms BM25 by considerable margins, even with simple training procedures.
