- Briefly introduce the history and trends in IR
- Traditional pipeline: retrieve then rank (e.g., DPR)
- Introduce an alternative IR architecture, the differentiable search index (DSI): a seq2seq model that maps a query directly to a document ID
Two Major Components
- Indexing
- Retrieval
- Pros
- End-to-end training
- Cons
- Hard to scale DSI systems to large document collections
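
The two components above can be illustrated with a minimal sketch of how training examples might be assembled. Indexing teaches the model to map document text to its ID, and retrieval teaches it to map a query to the same ID. The function name `build_training_pairs`, the `index:` / `query:` prefixes, and the `max_doc_tokens` cutoff are assumptions made for illustration, not details taken from any of the papers listed here.

```python
from typing import Dict, List, Tuple


def build_training_pairs(
    docs: Dict[str, str],
    queries: List[Tuple[str, str]],
    max_doc_tokens: int = 32,
) -> List[Tuple[str, str]]:
    """Return (input_text, target_docid) pairs for a seq2seq model.

    The task prefixes below are hypothetical; any scheme that lets the
    model tell the two tasks apart would serve the same purpose.
    """
    pairs: List[Tuple[str, str]] = []

    # Indexing task: (truncated) document text -> docid.
    for docid, text in docs.items():
        truncated = " ".join(text.split()[:max_doc_tokens])
        pairs.append((f"index: {truncated}", docid))

    # Retrieval task: query text -> docid of a relevant document.
    for query, docid in queries:
        pairs.append((f"query: {query}", docid))

    return pairs
```

Because both tasks share one output space (the document IDs), a single model can be trained on the combined list, which is what makes the setup end-to-end.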
- Transformer Memory as a Differentiable Search Index
- DSI++: Updating Transformer Memory with New Documents
- Learning to Tokenize for Generative Retrieval
- Autoregressive Search Engines: Generating Substrings as Document Identifiers
- Semantic-Enhanced Differentiable Search Index Inspired by Learning Strategies
- Learning to Rank in Generative Retrieval
- TOME: A Two-stage Approach for Model-based Retrieval
- Multiview Identifiers Enhanced Generative Retrieval
- Generating more training data
- Document representation as queries
- DSI
- Direct Indexing
- Set Indexing
- Inverted Index
- Bridging the Gap
- Queries as representation
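
The direct, set, and inverted-index strategies listed under DSI above differ only in which tokens of a document are paired with its ID. The sketch below is one reading of those three strategies, under stated assumptions: whitespace-style token lists, a caller-supplied stopword set, and first-occurrence order kept during de-duplication (an illustrative choice). All function names are hypothetical.

```python
import random
from typing import List, Optional, Set


def direct_indexing(tokens: List[str], max_len: int) -> List[str]:
    """Keep the first max_len tokens, order preserved."""
    return tokens[:max_len]


def set_indexing(tokens: List[str], stopwords: Set[str], max_len: int) -> List[str]:
    """Drop stopwords and repeated terms, then truncate."""
    seen: Set[str] = set()
    kept: List[str] = []
    for tok in tokens:
        if tok in stopwords or tok in seen:
            continue
        seen.add(tok)
        kept.append(tok)
    return kept[:max_len]


def inverted_index_chunk(
    tokens: List[str], chunk_len: int, rng: Optional[random.Random] = None
) -> List[str]:
    """Sample one contiguous chunk of chunk_len tokens to pair with the docid."""
    rng = rng or random.Random()
    if len(tokens) <= chunk_len:
        return list(tokens)
    start = rng.randrange(len(tokens) - chunk_len + 1)
    return tokens[start:start + chunk_len]
```

Each function returns the token sequence that would be paired with the document ID as one indexing example.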
- Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation
- Multiview Identifiers Enhanced Generative Retrieval
- DSI++: Updating Transformer Memory with New Documents
- Continual Learning for Generative Retrieval over Dynamic Corpora
Compare the performance of traditional IR systems with DSI-based IR systems
- Datasets
- IR tasks
- Knowledge-intensive tasks (e.g., QA)
- Metrics
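
The outline does not name specific metrics. As an assumption, the sketch below uses two that are common for ranked retrieval, Hits@k and mean reciprocal rank (MRR), computed over one ranked list of document IDs per query with a single gold ID each. The function names are illustrative.

```python
from typing import Sequence


def hits_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of queries whose gold docid appears in the top-k results."""
    hits = sum(1 for results, g in zip(ranked, gold) if g in list(results)[:k])
    return hits / len(gold)


def mean_reciprocal_rank(ranked: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Average of 1/rank of the gold docid; 0 when it is not retrieved."""
    total = 0.0
    for results, g in zip(ranked, gold):
        results = list(results)
        if g in results:
            total += 1.0 / (results.index(g) + 1)
    return total / len(gold)
```

The same functions apply unchanged to a traditional retriever and to a DSI model, since both ultimately emit a ranked list of document IDs per query.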
- Challenges with DSI: Address potential issues
- Generating non-existent document IDs (can be addressed by constraining generation, e.g., with an FM-index)
- Scaling DSI systems to handle large data volumes
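
One way to picture the first challenge is constrained decoding: at each step the model may only emit tokens that keep the output a prefix of some real document ID. The sketch below uses a plain prefix trie rather than an FM-index, treats each character of a docid as one token, and takes a caller-supplied `score` function as a stand-in for the model's next-token scores. All of these are simplifying assumptions, and the trie must hold at least one docid.

```python
from typing import Callable, Dict, List, Sequence

END = "<eos>"  # hypothetical end-of-sequence marker


def build_trie(docids: Sequence[str]) -> Dict:
    """Build a nested-dict trie over the character tokens of each docid."""
    root: Dict = {}
    for docid in docids:
        node = root
        for tok in list(docid) + [END]:
            node = node.setdefault(tok, {})
    return root


def allowed_next(trie: Dict, prefix: Sequence[str]) -> List[str]:
    """Tokens that extend prefix toward at least one valid docid."""
    node = trie
    for tok in prefix:
        node = node.get(tok)
        if node is None:
            return []  # prefix matches no valid docid
    return sorted(node.keys())


def constrained_greedy_decode(
    trie: Dict, score: Callable[[List[str], str], float]
) -> str:
    """Greedy decoding restricted to valid docids (trie must be non-empty)."""
    prefix: List[str] = []
    while True:
        candidates = allowed_next(trie, prefix)
        best = max(candidates, key=lambda tok: score(prefix, tok))
        if best == END:
            return "".join(prefix)
        prefix.append(best)
```

With a real model, the same idea is usually applied inside beam search by masking the scores of disallowed tokens. The trie itself grows with the corpus, which connects this issue to the scaling challenge.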
This structure aims to provide a comprehensive overview of Generative IR and the pivotal role of Differentiable Search Indexes, making it accessible to newcomers while detailing the progress and challenges in the field.