
clue prompt templates #808

Open: wants to merge 7 commits into main

Conversation

@yongzx (Contributor) commented Jul 27, 2022

Add prompt templates to the CLUE benchmark tasks.

Currently:

  • 3 original task prompt templates
  • 1 non-original task prompt template

WIP:

  • NLI (ocnli, cmnli)
  • IFLYTEK: it has 119 label names, and I haven’t populated the answer-choices yet.
  • ChID: unusual dataset structure where each example contains 4 sub-examples; the model needs to find the correct idiom for each of them. The metric is accuracy, and I am thinking of linearizing the dataset.
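The linearization idea above could look something like the sketch below. The field names and the record shape are illustrative assumptions, not the actual ChID schema:

```python
# Hypothetical ChID-style record: one record bundles several sub-examples,
# each with its own blank, candidate idioms, and gold answer index.
# All field names here are assumptions for illustration.
record = {
    "contents": ["sentence with blank #1", "sentence with blank #2"],
    "candidates": [["idiom_a", "idiom_b"], ["idiom_c", "idiom_d"]],
    "answers": [0, 1],
}

def linearize(record):
    """Flatten one multi-part record into independent (text, choices, label) examples,
    so each sub-example can be prompted and scored (accuracy) on its own."""
    return [
        {"text": text, "choices": choices, "label": label}
        for text, choices, label in zip(
            record["contents"], record["candidates"], record["answers"]
        )
    ]

flat = linearize(record)
```

Each flattened example can then be templated like any other single-answer multiple-choice task.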

Comment on lines 23 to 25
jinja: "Question: \"{{question}}\"\nAnswer choices: {{ answer_choices[:-1] | join(',\
\ ') }}, or {{ answer_choices[-1] }}?\nPassage: {% for statement in context\
\ %} \n{{ statement }}\n{% endfor %}\n|||\n{{ answer }}"
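For reference, a minimal sketch of how a template like this renders, using jinja2 directly on a toy example (promptsource's own machinery does more, e.g. answer-choice parsing; here `|||` just separates input from target):

```python
from jinja2 import Template

# Toy example adapted from the rendered sample in this thread; the field
# names (question, answer_choices, context, answer) follow the template above.
example = {
    "question": "根据对话,可以知道什么?",
    "answer_choices": ["今天天气不好", "比赛时间变了", "校长忘了时间"],
    "context": [
        "男:足球比赛是明天上午八点开始吧?",
        "女:因为天气不好,比赛改到后天下午三点了。",
    ],
    "answer": "比赛时间变了",
}

template = Template(
    'Question: "{{question}}"\n'
    "Answer choices: {{ answer_choices[:-1] | join(', ') }}, or {{ answer_choices[-1] }}?\n"
    "Passage: {% for statement in context %} \n{{ statement }}\n{% endfor %}\n"
    "|||\n{{ answer }}"
)

rendered = template.render(**example)
prompt, target = (part.strip() for part in rendered.split("|||"))
```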


This renders as e.g. ['Given the dialogue / passage below, what is the answer for the question "根据对话,可以知道什么?"\nAnswer choices: 今天天气不好, 比赛时间变了, or 校长忘了时间?\n \n男:足球比赛是明天上午八点开始吧?\n \n女:因为天气不好,比赛改到后天下午三点了。', '比赛时间变了']

How should the model know where the passage actually ends?
It may be reasonable for it to just continue the previous passage. I think fine-tuning on such examples may degrade generation quality. cc @thomasw21

Member


I don't understand the point you're making. I don't know why the rendering is a list ...


My bad, the first one is the input, the second the answer. So if we separate them with a whitespace, the model will get:

Given the dialogue / passage below, what is the answer for the question "根据对话,可以知道什么?"\nAnswer choices: 今天天气不好, 比赛时间变了, or 校长忘了时间?\n \n男:足球比赛是明天上午八点开始吧?\n \n女:因为天气不好,比赛改到后天下午三点了。 比赛时间变了

Member


Ah I understand, yeah good point, I think putting the passage above makes more sense

{passage} 

Given the dialogue / passage below, what is the answer for the question "根据对话,可以知道什么?"\nAnswer choices: 今天天气不好, 比赛时间变了, or 校长忘了时间?

One other way of doing it is have a between input and target, which we've been avoiding but in this case it might make sense?

Nit: you could also remove the answer choices and have the model figure them out (a much harder task, but it might help training quite a bit?)


By "have a between", do you mean have an EOS token?

Agreed, let's add another prompt without answer choices if you agree @yongzx?

Member


Yes woops

@yongzx (Contributor, Author) commented Jul 30, 2022


Yep, I can do that (moving the passage before the task description and adding prompts without answer choices).

Contributor Author


@Muennighoff @thomasw21 For prompts without answer choices, should I mark them as non-original? I think we should use ROUGE or other generation metrics instead of the original metric (accuracy), since we are no longer choosing an answer from the given options.

Member


I don't know; I'm not too familiar with how that terminology is used. Perhaps @VictorSanh can help with this kind of thing.

subset: csl
templates:
219679f8-a02f-4ee3-91c7-9ed4726dd828: !Template
answer_choices: no ||| yes


Suggested change
answer_choices: no ||| yes
answer_choices: yes ||| no

Currently I get the below:
['Do these keywords "纳米粒子, 干细胞, 氧化物, 顺磁性" represent key concepts in the abstract "目的探讨常见氧化铁纳米粒子几种神经干细胞标记技术的标记效率.材料与方法使用超顺磁性氧化铁纳米粒子(SPIO)和超微超顺磁性氧化铁纳米粒子(USPIO)以25μgFe/ml分别单独标记、与多聚赖氨酸(PLL)及脂质体联合标记神经干细胞,以未标记细胞做对照,采用普鲁士蓝染色评价细胞标记率,并采用4.7TMRIT2WI多回波序列测量T2弛豫率(R2)评价细胞内的铁摄取量,比较各组R2的差异.结果①普鲁士蓝染色结果:SPIO及USPIO单独标记组标记率为60%~70%,低于联合标记组的100%;②MRI结果:未标记细胞R2为(2.10±0.11)/s,SPIO、USPIO单独标记组细胞R2分别为(3.39±0.21)/s、(3.16±0.32)/s,SPIO-脂质体联合标记组及USPIO-脂质体联合标记组R2分别为(4.03±025)/s、(3.61±0.32)/s,SPIO-PLL联合标记组及USPIO-PLL联合标记组R2分别为(5.38±0.52)/s、(4.44±0.35)/s,SPIO、USPIO与PLL联合标记组R2大于SPIO、USPIO与脂质体联合标记组(P<0.05);而与脂质体联合标记组R2大于单独标记组(P<0.05);SPIO与USPIO单独标记细胞时R2差异无统计学意义(P>0.05),SPIO与脂质体或PLL联合标记时R2高于USPIO(P<0.05).结论SPIO、USPIO单独标记及与PLL、脂质体联合标记均可以成功标记神经干细胞,提高R2,其中SPIO与PLL联合标记效率最高."?', 'no']

I think it should be yes.

Contributor Author


I follow the labeling described in the CLUE paper: https://arxiv.org/pdf/2004.05986.pdf (Table 5), and label 0 corresponds to false.
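The disagreement here comes down to which index in `answer_choices` the integer label selects. Under the CLUE convention cited above (label 0 corresponds to false), the `no ||| yes` ordering is the consistent one. A tiny sketch of the mapping, mimicking how a promptsource template's `{{ answer_choices[label] }}` resolves:

```python
# answer_choices in promptsource templates is a "|||"-separated string;
# the template indexes into it with the dataset's integer label.
answer_choices = "no ||| yes".split(" ||| ")

def verbalize(label: int) -> str:
    """Map the dataset's integer label to its verbalized answer choice."""
    return answer_choices[label]

# Per the CLUE paper (Table 5), label 0 = false, so index 0 must be "no".
first = verbalize(0)
second = verbalize(1)
```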


Hmm, so CSL appears to be very noisy, see this issue.
The example from the paper also appears with both labels.
Let's leave it as is, but I'm not sure we should use it for fine-tuning.


name: write_keywords_after_abstract
reference: ''
2e851dd2-2677-415a-ad90-5d885aa91fdc: !Template
answer_choices: no ||| yes


Suggested change
answer_choices: no ||| yes
answer_choices: yes ||| no

Contributor Author


Disagree, for the reason given above (in the CLUE paper, label 0 corresponds to false).

name: generate_keywords
reference: ''
aaf47f6f-fd8f-4180-8d85-e4c7df088ac6: !Template
answer_choices: no ||| yes


Suggested change
answer_choices: no ||| yes
answer_choices: yes ||| no

Contributor Author


Disagree, for the reason given above (in the CLUE paper, label 0 corresponds to false).

Comment on lines +7 to +11
jinja: 'Do "{{ sentence1 }}" and "{{ sentence2 }}" express the same thing?

|||

{{ answer_choices[label] }}'
Member


Stupid question: how exactly does this generate samples, in particular with the \n and whitespace at the beginning and the end? Does it always get trimmed?


Yeah, the \n and whitespace before and after ||| get trimmed away.
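That trimming behavior can be sketched as follows. This is a simplified stand-in for what the rendering machinery does when splitting on `|||`; the actual implementation may differ in details:

```python
# A rendered template string as produced by the jinja above: the input and
# target are separated by "|||", with stray newlines around the separator.
rendered = 'Do "A" and "B" express the same thing?\n\n|||\n\nyes'

# Split on the separator and strip surrounding whitespace/newlines from
# each side, which is why the blank lines in the template are harmless.
input_text, target = (part.strip() for part in rendered.split("|||"))
```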
