Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding
Note
We are in the process of adding the material described in our paper to this repo.
Repository for the paper "Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding" to be presented at INLG 2024. Please cite the following work if you use anything from this repository or from our paper:
@inproceedings{willemsen-skantze-2024-referring-expression,
title = "Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding",
author = "Willemsen, Bram and
Skantze, Gabriel",
editor = "Mahamood, Saad and
Minh, Nguyen Le and
Ippolito, Daphne",
booktitle = "Proceedings of the 17th International Natural Language Generation Conference",
month = sep,
year = "2024",
address = "Tokyo, Japan",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.inlg-main.38",
pages = "453--469"
}
In this paper, we ...
... propose an approach to referring expression generation (REG) in visually grounded dialogue that is designed to produce referring expressions (REs) that are both discriminative and discourse-appropriate. Our method is a two-stage process. First, we model REG as a text- and image-conditioned next-token prediction task: REs are generated autoregressively based on their preceding linguistic context and a visual representation of the referent. Second, we propose the use of discourse-aware comprehension guiding as part of a generate-and-rerank strategy, in which candidate REs generated with our REG model are reranked based on their discourse-dependent discriminatory power.
We fine-tune IDEFICS, a generative vision-language model (VLM), to serve as our REG model. We repurpose the conversational referent description generator (CRDG) framework of Willemsen et al. (2023) for discourse-aware comprehension guiding: we use the CRDG to score and, subsequently, rerank candidate REs based on their discourse-dependent discriminatory power. Figure 1 provides a visualization of the proposed two-stage, four-step framework.
Figure 1: Visualization of the proposed two-stage, four-step framework. In the first stage, we generate candidate REs with a fine-tuned VLM, conditioning the generation of tokens on the preceding linguistic context and a visual representation of the referent. In the second stage, we use the CRDG framework to score candidate REs on their discourse-dependent discriminatory power: the candidate with the highest pooled score is selected.
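As a rough, non-authoritative illustration of the pipeline in Figure 1, the sketch below samples candidate REs with a publicly available IDEFICS checkpoint and reranks them with a placeholder scorer. The checkpoint name, prompt format, decoding settings, and the `score_with_crdg` stub are assumptions made for illustration only; they are not the fine-tuned model or the CRDG implementation used in the paper.

```python
# Illustrative sketch of the two-stage framework; not the released code.
# The checkpoint, prompt format, decoding settings, and the CRDG stub are
# assumptions made for illustration purposes only.
from typing import List

from PIL import Image
from transformers import AutoProcessor, IdeficsForVisionText2Text

CHECKPOINT = "HuggingFaceM4/idefics-9b"  # assumed base VLM; the REG model is a fine-tuned variant
processor = AutoProcessor.from_pretrained(CHECKPOINT)
model = IdeficsForVisionText2Text.from_pretrained(CHECKPOINT)


def generate_candidates(context: str, referent: Image.Image, n: int = 5) -> List[str]:
    """Stage 1: sample candidate REs conditioned on the preceding linguistic
    context and a visual representation of the referent."""
    prompts = [[referent, context]]  # interleave the referent image with the dialogue context
    inputs = processor(prompts, return_tensors="pt")
    outputs = model.generate(**inputs, do_sample=True, num_return_sequences=n, max_new_tokens=20)
    # Note: the decoded strings include the prompt prefix; strip it in practice.
    return processor.batch_decode(outputs, skip_special_tokens=True)


def score_with_crdg(candidate: str, context: str, scene: List[Image.Image]) -> float:
    """Stage 2 (placeholder): pooled, discourse-dependent discriminatory power
    of a candidate RE, as estimated by the repurposed CRDG."""
    raise NotImplementedError("Stand-in for CRDG-based comprehension guiding.")


def select_re(context: str, referent: Image.Image, scene: List[Image.Image]) -> str:
    candidates = generate_candidates(context, referent)
    # Rerank: keep the candidate with the highest pooled score.
    return max(candidates, key=lambda c: score_with_crdg(c, context, scene))
```

In the actual setup, the generator is our fine-tuned IDEFICS model and the scorer is the CRDG from the reference resolution work linked below.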
For more details, we refer the reader to our paper.
We use data from the visually grounded dialogue task "A Game Of Sorts" for the fine-tuning and evaluation of our proposed method.
In order to reproduce our work, you will need the "A Game Of Sorts" data and the additional annotations:
git clone https://github.com/willemsenbram/a-game-of-sorts.git
git clone https://github.com/willemsenbram/reference-resolution-via-text-generation.git
For more information about the original dataset, we refer the reader to the "Collecting Visually-Grounded Dialogue with A Game Of Sorts" paper. For more information about the additional annotations, we refer the reader to the "Resolving References in Visually-Grounded Dialogue via Text Generation" paper.
The generated output on which the results of the experiments reported in the paper are based can be found in ./experiments/output.