pyterrier_splade

An example of SPLADE indexing and retrieval using PyTerrier transformers.

Installation

We use Naver's SPLADE repository as a dependency:

%pip install -q python-terrier
%pip install -q git+https://github.com/naver/splade.git git+https://github.com/cmacdonald/pyt_splade.git

Indexing

Indexing is expressed as a pipeline: the SPLADE document encoder maps raw text to a dictionary of BERT WordPiece tokens and their corresponding weights, and the result is passed to the underlying indexer, Terrier. The indexer is configured with pretokenised=True, so it accepts arbitrary word counts and indexes the tokens unchanged, without further tokenisation.

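import pyterrier as pt
import pyt_splade

# Dataset providing the corpus iterator (the same corpus used in the PISA example)
dataset = pt.get_dataset('irds:msmarco-passage')

splade = pyt_splade.Splade()
# pretokenised=True: index the SPLADE tokens as given, without further tokenisation
indexer = pt.IterDictIndexer('./msmarco_psg', pretokenised=True)

indxr_pipe = splade.doc_encoder() >> indexer
index_ref = indxr_pipe.index(dataset.get_corpus_iter(), batch_size=128)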
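To make the shape of a pretokenised record concrete, here is a minimal sketch in plain Python, with no SPLADE model involved. It assumes the document encoder emits a `toks` dictionary per document and that float weights are scaled and rounded to integers before indexing. The helper name `to_pseudo_tf`, the scale factor of 100, and the example weights are illustrative assumptions, not values taken from pyt_splade.

```python
# Toy illustration only: hypothetical weights, not real SPLADE output.

def to_pseudo_tf(weights, scale=100):
    """Scale float term weights to integer pseudo term frequencies.

    Terms whose scaled weight rounds to zero are dropped, since a zero
    frequency carries no information in an inverted index.
    """
    toks = {}
    for term, weight in weights.items():
        tf = int(round(weight * scale))
        if tf > 0:
            toks[term] = tf
    return toks

# Hypothetical sparse representation of one passage.
raw_weights = {"chemical": 1.52, "reaction": 1.31, "##s": 0.004, "bond": 0.47}

# One record as a pretokenised indexer might receive it (assumed layout).
record = {"docno": "d1", "toks": to_pseudo_tf(raw_weights)}
print(record)
```

The point of the sketch is that the indexer never sees raw text: it receives tokens that are already final, together with their counts.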

Retrieval

Similarly, SPLADE encodes the query into BERT WordPieces and corresponding weights. We apply this as a query encoding transformer.

splade_retr = splade.query_encoder() >> pt.terrier.Retriever('./msmarco_psg', wmodel='Tf')
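Conceptually, retrieval with a raw term-frequency weighting model amounts to a sparse dot product between the query weights and the stored document weights. The sketch below illustrates that idea in plain Python over a toy in-memory inverted index. It is not how Terrier is implemented, and the index contents, the `search` helper, and all weights are hypothetical.

```python
from collections import defaultdict

# Toy inverted index: term -> {docno: stored integer weight}. Hypothetical data.
inverted_index = {
    "chemical": {"d1": 152, "d2": 40},
    "reaction": {"d1": 131, "d3": 90},
    "bond": {"d2": 75},
}

def search(query_toks, index, k=10):
    """Score documents by the dot product of query and document weights."""
    scores = defaultdict(float)
    for term, q_weight in query_toks.items():
        for docno, d_weight in index.get(term, {}).items():
            scores[docno] += q_weight * d_weight
    # Highest score first; ties broken by docno for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]

# Hypothetical encoded query: WordPiece -> weight.
query_toks = {"chemical": 2.0, "reaction": 1.5}
results = search(query_toks, inverted_index)
print(results)
```

Only documents sharing at least one token with the query receive a score, which is what lets an inverted index serve SPLADE representations efficiently.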

Scoring

SPLADE can also be used as a text scoring function, for example to re-score the results of a first-stage retriever.

first_stage = ... # e.g., BM25, dense retrieval, etc.
splade_scorer = first_stage >> pt.text.get_text(dataset, 'text') >> splade.scorer()
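As a rough picture of what a scoring stage does in such a pipeline, the sketch below re-scores a first-stage candidate list and reassigns ranks. It uses plain Python and a stand-in scoring function based on word overlap. `toy_score` and `rerank` are hypothetical helpers standing in for the model and are not part of pyt_splade.

```python
def toy_score(query, text):
    """Stand-in scorer: counts distinct query words that occur in the text.

    A real SPLADE scorer would compare learned sparse representations.
    """
    text_words = set(text.lower().split())
    return sum(1.0 for word in set(query.lower().split()) if word in text_words)

def rerank(query, candidates, score_fn):
    """Re-score candidates and assign new zero-based ranks."""
    scored = [dict(c, score=score_fn(query, c["text"])) for c in candidates]
    # list.sort() is stable, so ties keep their first-stage order.
    scored.sort(key=lambda c: -c["score"])
    for rank, c in enumerate(scored):
        c["rank"] = rank
    return scored

# Hypothetical first-stage output, already joined with passage text.
candidates = [
    {"docno": "d7", "text": "History of the periodic table"},
    {"docno": "d1", "text": "A chemical reaction rearranges atoms"},
    {"docno": "d3", "text": "Reaction rates depend on temperature"},
]

reranked = rerank("chemical reaction", candidates, toy_score)
print([(c["docno"], c["score"], c["rank"]) for c in reranked])
```

The text lookup step matters because a scorer of this kind needs the passage text, which a first-stage retriever does not necessarily return.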

PISA

For faster retrieval with SPLADE, you can use the PISA retrieval backend provided by PyTerrier_PISA:

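import pyterrier as pt
import pyt_splade
from pyterrier_pisa import PisaIndex

splade = pyt_splade.Splade()
dataset = pt.get_dataset('irds:msmarco-passage')
# stemmer='none' disables stemming, so WordPiece tokens are indexed unchanged
index = PisaIndex('./msmarco-passage-splade', stemmer='none')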

# indexing
idx_pipeline = splade.doc_encoder() >> index.toks_indexer()
idx_pipeline.index(dataset.get_corpus_iter())

# retrieval
retr_pipeline = splade.query_encoder() >> index.quantized()

Demo

We have a demo of PyTerrier_SPLADE at https://huggingface.co/spaces/terrierteam/splade

Credits

Craig Macdonald
Sean MacAvaney
