Tutorial

IR from Bag-of-words to BERT and Beyond with PyTerrier

Nicola Tonellotto

Advances from the natural language processing community have recently sparked a renaissance in the task of ad hoc search. In particular, large contextualized language modeling techniques, such as BERT, have equipped ranking models with a far deeper understanding of language than was possible with previous bag-of-words (BoW) models. Applying these techniques to a new task is tricky, requiring knowledge of deep learning frameworks as well as significant scripting and data munging. In this full-day tutorial, we provide background on classical (e.g., BoW), modern (e.g., learning to rank) and contemporary (e.g., BERT, doc2query) search ranking and re-ranking techniques. Going further, we detail and demonstrate how these can be easily applied experimentally to new search tasks in a new declarative style of conducting experiments, exemplified by the PyTerrier and OpenNIR search toolkits. This tutorial is interactive in nature: each session mixes explanatory presentation with hands-on activities using prepared Jupyter notebooks running on the Google Colab platform.