HAGRID: A Human-LLM Collaborative Dataset for Generative Information-seeking with Attribution

HAGRID (Human-in-the-loop Attributable Generative Retrieval for Information-seeking Dataset) is a dataset for generative information-seeking scenarios. It is constructed on top of MIRACL 🌍🙌🌏, an information retrieval dataset that consists of queries along with a set of manually labelled relevant passages (quotes).

We collect attributed explanations for each question by prompting GPT-3.5 with the question and its relevant passages. The explanations follow an in-context citation style, similar to scientific articles, in which statements reference the supporting quotes. We then ask human annotators to judge each explanation on two criteria:

Informativeness: whether they provide a direct answer to the question.
Attributability: whether they are attributable to the source passages.
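The in-context citation style can be illustrated with a small Python sketch. It assumes a hypothetical format, not specified here, in which citations are bracketed numbers such as [1] that point at 1-based positions in the list of quotes. The helper names (cited_indices, dangling_citations, quotes_by_citation) are illustrative only. The sketch merely checks that each citation resolves to some quote; whether that quote actually supports the statement is the attributability judgment left to human annotators.

```python
import re
from typing import Dict, List, Set

# Hypothetical citation format: markers like "[1]" or "[2][3]" that refer to
# 1-based positions in the list of supporting quotes.
CITATION = re.compile(r"\[(\d+)\]")


def cited_indices(answer: str) -> Set[int]:
    """Return the set of quote indices cited anywhere in the answer."""
    return {int(m) for m in CITATION.findall(answer)}


def dangling_citations(answer: str, num_quotes: int) -> List[int]:
    """Return cited indices that do not point at any of the num_quotes quotes."""
    return sorted(i for i in cited_indices(answer) if not 1 <= i <= num_quotes)


def quotes_by_citation(answer: str, quotes: List[str]) -> Dict[int, str]:
    """Map each valid cited index to the text of the quote it references."""
    return {
        i: quotes[i - 1]
        for i in sorted(cited_indices(answer))
        if 1 <= i <= len(quotes)
    }


if __name__ == "__main__":
    quotes = ["Paris is the capital of France.", "The Eiffel Tower is in Paris."]
    answer = "Paris is the capital of France [1] and home to the Eiffel Tower [2][3]."
    print(cited_indices(answer))                    # {1, 2, 3}
    print(dangling_citations(answer, len(quotes)))  # [3]
```

A regular expression is enough here because the assumed markers are simple; a real citation format may need a different pattern.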

Data

HAGRID is hosted on the Hugging Face Hub 🤗 as miracl/hagrid and can be loaded with the datasets library:

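```python
import datasets
# Download the training split of HAGRID from the Hugging Face Hub.
hagrid = datasets.load_dataset("miracl/hagrid", split="train")
print(hagrid[0])  # inspect the first example
```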
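Once records are loaded, a common step is keeping only answers that annotators judged both informative and attributable. The sketch below is a hypothetical illustration: the field names ("query", "quotes", "answers", "answer", "informative", "attributable") and the 0/1/None label encoding are assumptions, so compare them with the keys printed by print(hagrid[0]) before relying on them.

```python
from typing import Any, Dict, List

# Hypothetical record layout: field names and label values are assumptions,
# not the confirmed HAGRID schema.
example = {
    "query": "What is the capital of France?",
    "quotes": [{"docid": "d1", "text": "Paris is the capital of France."}],
    "answers": [
        {"answer": "Paris [1].", "informative": 1, "attributable": 1},
        {"answer": "It is a city [1].", "informative": 0, "attributable": None},
    ],
}


def usable_answers(record: Dict[str, Any]) -> List[str]:
    """Keep answers labelled both informative (1) and attributable (1).

    A missing label (None or absent key) is treated as "not judged", so the
    answer is filtered out rather than assumed to be good.
    """
    return [
        a["answer"]
        for a in record.get("answers", [])
        if a.get("informative") == 1 and a.get("attributable") == 1
    ]


print(usable_answers(example))  # ['Paris [1].']
```

If the real schema matches these assumptions, the same function could be applied to every record, e.g. [usable_answers(r) for r in hagrid].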

Split   #Q      #A      #Informativeness   #Attributability
Train   1,922   3,214   3,214              754
Dev     716     1,318   1,157              826
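A few derived figures follow directly from the table. The sketch below copies the counts verbatim and computes, per split, the number of answers per question and the ratio of the #Attributability column to #A; it does not assume anything further about what exactly those label counts represent.

```python
from typing import Dict, Tuple

# Counts copied from the split statistics table (#Q, #A, #Informativeness, #Attributability).
STATS: Dict[str, Dict[str, int]] = {
    "Train": {"questions": 1922, "answers": 3214, "informativeness": 3214, "attributability": 754},
    "Dev": {"questions": 716, "answers": 1318, "informativeness": 1157, "attributability": 826},
}


def summarize(s: Dict[str, int]) -> Tuple[float, float]:
    """Return (answers per question, #Attributability / #A) for one split."""
    return s["answers"] / s["questions"], s["attributability"] / s["answers"]


for split, s in STATS.items():
    per_q, attr_ratio = summarize(s)
    print(f"{split}: {per_q:.2f} answers/question, #Attributability/#A = {attr_ratio:.1%}")
# Train: 1.67 answers/question, #Attributability/#A = 23.5%
# Dev: 1.84 answers/question, #Attributability/#A = 62.7%
```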

Baselines (Coming soon!)

We are planning to release baseline models soon! Stay tuned!

Contact

If you have any questions, feel free to email us (project.miracl [at] gmail.com) or open a GitHub issue in this repository.

License

This work is licensed under the Apache License 2.0. See LICENSE for details.

Citation

If you find this dataset and repository helpful, please cite HAGRID as follows:

@article{hagrid,
      title={{HAGRID}: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution}, 
      author={Ehsan Kamalloo and Aref Jafari and Xinyu Zhang and Nandan Thakur and Jimmy Lin},
      year={2023},
      journal={arXiv:2307.16883},
}
