Self-Retrieval: An LLM-Driven Information Retrieval Architecture for the Era of Large Language Models #768

Open
1 task
irthomasthomas opened this issue Mar 16, 2024 · 1 comment
Labels
llm: Large Language Models
llm-applications: Topics related to practical applications of Large Language Models in various fields
llm-evaluation: Evaluating Large Language Models performance and behavior through human-written evaluation sets
New-Label: Choose this option if the existing labels are insufficient to describe the content accurately
Papers: Research papers
RAG: Retrieval Augmented Generation for LLMs

Comments

@irthomasthomas
Owner

Self-Retrieval: An LLM-Driven Information Retrieval Architecture for the Era of Large Language Models

Title: "Self-Retrieval: An LLM-Driven Information Retrieval Architecture for the Era of Large Language Models"

Description:

"The rise of large language models (LLMs) has transformed the role of information retrieval (IR) systems in the way humans access information. Because of their isolated architecture and limited interaction, existing IR systems cannot fully accommodate the shift from directly providing information to humans to indirectly serving large language models. In this paper, we propose Self-Retrieval, an end-to-end, LLM-driven information retrieval architecture that fully internalizes the required abilities of IR systems into a single LLM and deeply leverages the capabilities of LLMs during the IR process. Specifically, Self-Retrieval internalizes the corpus to be retrieved into an LLM via a natural language indexing architecture. The entire retrieval process is then redefined as a procedure of document generation and self-assessment, which can be executed end-to-end using a single large language model. Experimental results demonstrate that Self-Retrieval not only outperforms previous retrieval approaches by a large margin, but also significantly boosts the performance of LLM-driven downstream applications such as retrieval-augmented generation."

Authors:

Qiaoyu Tang1,3†, Jiawei Chen1,3, Bowen Yu4, Yaojie Lu1, Cheng Fu4, Haiyang Yu4, Hongyu Lin1†, Fei Huang4, Ben He1,3, Xianpei Han1,2, Le Sun1,2, Yongbin Li4

  1. Chinese Information Processing Laboratory
  2. State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China
  3. University of Chinese Academy of Sciences, Beijing, China
  4. Alibaba Group

Contact emails:

{tangqiaoyu2020,jiawei2020,luyaojie,hongyu,xianpei,sunle}@iscas.ac.cn
{yubowen.ybw,fucheng.fuc,yifei.yhy,f.huang,shuide.lyb}@alibaba-inc.com
[email protected]

Figures:

Illustration of the proposed Self-Retrieval architecture. Self-Retrieval (bottom) is compared with sparse retrieval, dense retrieval, and generative retrieval (top).

The Self-Retrieval process. The model first builds an index of the given corpus through self-supervised learning. Then, for an input query, it generates the index, described in natural language, and the passage. Finally, self-assessment is used to score and rank the generated passages.
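The three stages in the caption above (index, generate, self-assess) can be sketched as a small control-flow example. This is only a toy illustration under loud assumptions, not the paper's implementation: no LLM is involved, `CORPUS`, `toy_generate`, `toy_assess`, and `self_retrieve` are hypothetical names, a plain dictionary stands in for the corpus the model would internalize, and token overlap stands in for both generation and self-assessment.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy corpus: natural-language index (title) -> passage.
# In the paper the corpus is internalized into the LLM; a dict stands in here.
CORPUS: Dict[str, str] = {
    "Sparse retrieval": "Sparse retrieval matches queries and documents by exact term overlap.",
    "Dense retrieval": "Dense retrieval embeds queries and documents into a shared vector space.",
    "Generative retrieval": "Generative retrieval makes a model generate document identifiers directly.",
}


def tokens(text: str) -> Set[str]:
    """Lowercase whitespace tokens with trailing punctuation stripped."""
    return {word.strip(".,").lower() for word in text.split()}


def toy_generate(query: str, num_candidates: int) -> List[Tuple[str, str]]:
    """Stand-in for the LLM generating (index, passage) pairs for a query.

    Assumption: candidates are picked by token overlap between query and title.
    """
    query_tokens = tokens(query)
    ranked = sorted(
        CORPUS.items(),
        key=lambda item: len(query_tokens & tokens(item[0])),
        reverse=True,
    )
    return ranked[:num_candidates]


def toy_assess(query: str, passage: str) -> float:
    """Stand-in for self-assessment: fraction of query tokens found in the passage."""
    query_tokens = tokens(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokens(passage)) / len(query_tokens)


def self_retrieve(
    query: str,
    generate: Callable[[str, int], List[Tuple[str, str]]],
    assess: Callable[[str, str], float],
    num_candidates: int = 3,
    top_k: int = 1,
) -> List[Tuple[float, str, str]]:
    """Generate candidate passages, score each one, and return the top_k by score."""
    candidates = generate(query, num_candidates)
    scored = [(assess(query, passage), title, passage) for title, passage in candidates]
    scored.sort(key=lambda entry: entry[0], reverse=True)
    return scored[:top_k]


results = self_retrieve("dense vector space", toy_generate, toy_assess)
```

Passing `generate` and `assess` as callables keeps the ranking loop independent of how candidates are produced and scored, so the toy stand-ins could in principle be swapped for calls to a real model.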

URL:

https://arxiv.org/html/2403.00801v1

Suggested labels

{'label-name': 'Information Retrieval Paradigm', 'label-description': 'Describes the shift from traditional information retrieval systems to LLM-driven IR architectures like Self-Retrieval.', 'confidence': 67.55}

@irthomasthomas irthomasthomas added llm Large Language Models llm-applications Topics related to practical applications of Large Language Models in various fields llm-evaluation Evaluating Large Language Models performance and behavior through human-written evaluation sets New-Label Choose this option if the existing labels are insufficient to describe the content accurately Papers Research papers RAG Retrieval Augmented Generation for LLMs labels Mar 16, 2024
@irthomasthomas
Owner Author

Related content

#680 similarity score: 0.87
#333 similarity score: 0.87
#316 similarity score: 0.86
#363 similarity score: 0.86
#706 similarity score: 0.86
#170 similarity score: 0.86

@irthomasthomas irthomasthomas changed the title Title: "Self-Retrieval: An LLM-Driven Information Retrieval Architecture for the Era of Large Language Models" Self-Retrieval: An LLM-Driven Information Retrieval Architecture for the Era of Large Language Models Mar 16, 2024