Split pragmatics into presuppositions and scalar implicatures #2938
Hi @weiqipedia, for your info.
Looks good overall. Note that you have to change schema_bhasa.yaml to reflect the changes (but that can be done in a separate pull request).
instruction=instruction.format(row["choices_translated"]),
)
# Split "True or False" into ["True", "or", "False"]
choices = row["choices"].split()
Optional: For English, you can do row["choices"].split(" or ")
)
# Split "True or False" into ["True", "or", "False"]
choices = row["choices"].split()
choices_translated = row["choices_translated"].split()
Does this work consistently across every (supported) language?
That's a good question! For now we only have Indonesian (and Tamil), and splitting on whitespace and taking the first and third elements of the list works for both languages. FYI, this will not work for Thai, which does not put spaces between words, so there we would need something closer to your suggestion of splitting on " or " (but we will not be adding Thai any time soon).
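The two splitting strategies discussed in this thread can be compared with a minimal sketch. The helper names and the sample strings are hypothetical and chosen only for illustration; the scenario code in the pull request may differ.

```python
from typing import List


def split_choices_by_whitespace(choices: str) -> List[str]:
    """Split on whitespace and keep the first and third tokens.

    Assumes the shape "<choice> <connective> <choice>" with
    single-word choices, e.g. "True or False".
    """
    tokens = choices.split()
    return [tokens[0], tokens[2]]


def split_choices_by_separator(choices: str, separator: str) -> List[str]:
    """Split on an explicit connective such as " or ".

    Handles multi-word choices, but the separator has to be
    supplied for each language.
    """
    return choices.split(separator)


# Hypothetical sample inputs (not taken from the dataset).
english = "True or False"
multi_word = "Not sure or False"

whitespace_result = split_choices_by_whitespace(english)
separator_result = split_choices_by_separator(multi_word, " or ")
```

The whitespace approach needs no per-language configuration but breaks as soon as a choice contains a space or the language does not separate words with spaces; the separator approach is more robust at the cost of one extra string per language.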

export HF_HOME=/mnt/fs-arf-01/railey4/cache
export HF_DATASETS_CACHE=/mnt/fs-arf-01/railey4/cache
export HF_TOKEN=<redacted>
Careful with exposing secrets to the public. You should invalidate this token and avoid adding other tokens to the pull request.
If you'd like to add bash scripts to the repository, could you:
- put this in the scripts/bhasa or scripts/aisingapore folder, and
- add comments to the script that explain its purpose?
@@ -606,14 +607,14 @@ def get_lindsea_pragmatics_pragmatic_reasoning_single_spec(language="id") -> Run
scenario_spec=scenario_spec,
adapter_spec=adapter_spec,
metric_specs=get_exact_match_metric_specs(),
groups=["bhasa_linguistic", f"lindsea_pragmatics_pragmatic_reasoning_single_{language}"],
groups=["bhasa_linguistic", f"lindsea_pragmatics_presuppositions_{subset}_{language}"],
At least one of these strings has to match the group name in schema_bhasa.yaml, which is currently "lindsea_pragmatics_presuppositions_id". I'd suggest doing:
groups=["bhasa_linguistic", f"lindsea_pragmatics_presuppositions_{language}", f"lindsea_pragmatics_presuppositions_{subset}_{language}"],
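As a rough illustration of why the extra entry matters, the sketch below builds the suggested list and checks it against a set of known group names. The function `build_groups`, the set `schema_group_names`, and the subset value "single" are assumptions made for this example; only the group name "lindsea_pragmatics_presuppositions_id" comes from the comment above.

```python
from typing import List


def build_groups(language: str, subset: str) -> List[str]:
    """Return the run groups, including the language-level group
    that is expected to match an entry in schema_bhasa.yaml."""
    return [
        "bhasa_linguistic",
        f"lindsea_pragmatics_presuppositions_{language}",
        f"lindsea_pragmatics_presuppositions_{subset}_{language}",
    ]


# Hypothetical stand-in for the group names declared in schema_bhasa.yaml.
schema_group_names = {"lindsea_pragmatics_presuppositions_id"}

groups = build_groups(language="id", subset="single")
matched = [group for group in groups if group in schema_group_names]
```

With only the subset-specific name in the list, `matched` would be empty for this stand-in schema, which is the situation the review comment warns about.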
if self.language not in self.prompts.keys():
raise (Exception(f"Unsupported language {self.language} - supported languages are {self.prompts.keys()}"))
else:
self.prompt_components = self.prompts[self.language]

def download_dataset(self, output_path: str):
BASE_URL = "https://raw.githubusercontent.com/aisingapore/BHASA/main/lindsea/"
Optional: You can pin this to a specific commit hash so that future changes to the repository won't cause this scenario to change, e.g.
BASE_URL = "https://raw.githubusercontent.com/aisingapore/BHASA/10e34008e8142bef400cf8ffab15b2b6aaf3aa7f/lindsea/"
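One way to keep the pinned commit easy to update is to hold the hash in its own constant, as in this minimal sketch. The constant `PINNED_COMMIT`, the helper `dataset_url`, and the file name passed to it are hypothetical; the repository and hash are the ones quoted in the comment above.

```python
# Commit hash quoted in the review comment; update it deliberately
# when the scenario should pick up new data.
PINNED_COMMIT = "10e34008e8142bef400cf8ffab15b2b6aaf3aa7f"
BASE_URL = f"https://raw.githubusercontent.com/aisingapore/BHASA/{PINNED_COMMIT}/lindsea/"


def dataset_url(relative_path: str) -> str:
    """Build the raw-file URL for a path under the lindsea/ folder."""
    return BASE_URL + relative_path


# Hypothetical file name, used only to show the resulting URL shape.
example_url = dataset_url("example.jsonl")
```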
if self.language not in self.prompts.keys():
raise (Exception(f"Unsupported language {self.language} - supported languages are {self.prompts.keys()}"))
else:
self.prompt_componets = self.prompts[self.language]
prompt_componets is misspelled - it should be prompt_components
question = self.prompt_components["single_question"]
instruction = self.prompt_components["single_instruction"]

passage = "{question}\nPernyataan: {text}\n{instruction}".format(
Move Pernyataan into prompt components?
instruction = self.prompt_components["pair_instruction"]
label = self.prompt_components[str(row["label"])]

passage = "Situasi: {premise}\n{question}\nPernyataan: {conclusion}\n{instruction}".format(
Move Situasi into prompt components.
question = self.prompt_components["single_question"]
instruction = self.prompt_components["single_instruction"]

passage = "{question}\nPernyataan: {text}\n{instruction}".format(
Move Pernyataan into prompt components.
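A minimal sketch of what moving these labels into the prompt components could look like is shown below. The keys "statement_label" and "situation_label", the helper functions, and the placeholder texts are hypothetical; only the label words "Pernyataan" and "Situasi" and the passage layouts come from the diff under review.

```python
from typing import Dict

# Per-language prompt pieces; the two label keys are assumed names.
prompt_components_by_language: Dict[str, Dict[str, str]] = {
    "id": {
        "statement_label": "Pernyataan",
        "situation_label": "Situasi",
    },
}


def build_single_passage(components: Dict[str, str], question: str, text: str, instruction: str) -> str:
    """Format a single-sentence passage without hard-coded labels."""
    return "{question}\n{statement_label}: {text}\n{instruction}".format(
        question=question,
        statement_label=components["statement_label"],
        text=text,
        instruction=instruction,
    )


def build_pair_passage(
    components: Dict[str, str], premise: str, question: str, conclusion: str, instruction: str
) -> str:
    """Format a premise/conclusion passage without hard-coded labels."""
    return "{situation_label}: {premise}\n{question}\n{statement_label}: {conclusion}\n{instruction}".format(
        situation_label=components["situation_label"],
        premise=premise,
        question=question,
        statement_label=components["statement_label"],
        conclusion=conclusion,
        instruction=instruction,
    )


components = prompt_components_by_language["id"]
single = build_single_passage(components, "Q", "T", "I")
pair = build_pair_passage(components, "P", "Q", "C", "I")
```

Keeping the labels next to the other per-language strings means adding a language only requires a new dictionary entry rather than edits to the formatting code.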
@@ -171,7 +171,7 @@ def __init__(self, language: str):
super().__init__()
self.language = language
self.splits = {"train": TRAIN_SPLIT, "test": TEST_SPLIT}
self.map = {
self.prompts = {
Rename to self.language_to_prompt_components. Same below.
No description provided.