This is the source code for the EMNLP 2023 paper *Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation* [paper].
```shell
git clone https://github.com/anastasiakrith/multimodal-retrieval-for-vwsd.git
cd multimodal-retrieval-for-vwsd
```
In the project folder, run the following commands:

```shell
$ virtualenv venv                  # create a virtual environment
$ source venv/bin/activate         # activate the environment
$ pip install -r requirements.txt  # install the required packages
```

- Create a `.env` file with the environment variables. The project needs an `OPENAI_API_KEY` with the API key corresponding to your OpenAI account, and optionally a `DATASET_PATH` with the absolute path of the VWSD dataset.
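For reference, a `.env` file could look like the following; the values shown are placeholders for your own key and local dataset location:

```shell
OPENAI_API_KEY=sk-...                # your OpenAI API key
DATASET_PATH=/path/to/vwsd_dataset   # optional: absolute path to the VWSD dataset
```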
To evaluate retrieval with a VL transformer, optionally combined with an LLM:

```shell
python vl_retrieval_eval.py -llm "gpt-3.5" -vl "clip" -baseline -penalty
```
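As a rough illustration of what VL-transformer retrieval does, the sketch below scores a phrase against candidate images with CLIP and picks the best match; the model checkpoint, phrase, and file names are hypothetical, not the repository's exact setup:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

phrase = "andromeda tree"             # hypothetical ambiguous phrase
paths = ["cand_0.jpg", "cand_1.jpg"]  # hypothetical candidate images
images = [Image.open(p) for p in paths]

# Score the phrase against every candidate image and pick the highest match
inputs = processor(text=[phrase], images=images, return_tensors="pt", padding=True)
scores = model(**inputs).logits_per_image.squeeze(1)  # one similarity score per image
print("predicted image:", paths[scores.argmax().item()])
```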
To evaluate LLM-based question answering over generated image captions:

```shell
python qa_retrieval_eval.py -llm "gpt-3.5" -captioner "git" -strategy "greedy" -prompt "no_CoT" -zero_shot
```
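The caption-then-ask idea can be sketched as follows, assuming a GiT checkpoint for captioning and the OpenAI chat API for answering; the model names, prompt wording, and file names are illustrative rather than the repository's exact code:

```python
from openai import OpenAI
from transformers import pipeline

# Caption each candidate image with a GiT model (checkpoint is an assumption)
captioner = pipeline("image-to-text", model="microsoft/git-base-coco")
paths = ["cand_0.jpg", "cand_1.jpg"]  # hypothetical candidate images
captions = [captioner(p)[0]["generated_text"] for p in paths]

# Ask the LLM to pick the caption that matches the phrase
prompt = "Which image matches the phrase 'andromeda tree'? Answer with a number.\n"
prompt += "\n".join(f"({i}) {c}" for i, c in enumerate(captions))

client = OpenAI()  # reads OPENAI_API_KEY from the environment
reply = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(reply.choices[0].message.content)
```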
To evaluate image-to-image and text-to-text retrieval with a cosine-similarity metric:

```shell
python image_retrieval_eval.py -vl "clip" -wiki "wikipedia" -metric "cosine"
python text_retrieval_eval.py -captioner "git" -strategy "greedy" -extractor "clip" -metric "cosine"
```
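For the `-metric "cosine"` option, the comparison boils down to cosine similarity between embeddings. Below is a minimal sketch with CLIP text features; any embedding extractor works the same way, and the example texts are made up:

```python
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

texts = ["andromeda tree", "a shrub with drooping white flowers"]  # phrase vs. caption
inputs = processor(text=texts, return_tensors="pt", padding=True)
with torch.no_grad():
    emb = model.get_text_features(**inputs)  # (2, dim) text embeddings

print("cosine similarity:", F.cosine_similarity(emb[0], emb[1], dim=0).item())
```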
The implementation relies on the OpenAI API and Hugging Face Transformers.