Running the code on custom dataset without the FEVER DB file #2
Hi, the intermediate output from the maskers (with IR-selected evidence) has been added to the Google Drive folder. With IR evidence, they don't actually need the FEVER database, but the dataset loader opens the database connection anyway. An easy fix is to comment out line 27 on the
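As an alternative to commenting the line out, the connection could be opened lazily, so that records which already carry their evidence never touch the database. The following is only a minimal sketch under stated assumptions: the class `LazyEvidenceDB`, the function `evidence_for`, the record key `evidence_text`, and the `documents(id, text)` schema are all hypothetical names chosen for illustration and are not taken from the repository.

```python
import sqlite3
from typing import Optional


class LazyEvidenceDB:
    """Hypothetical wrapper: opens the sqlite3 connection only on first lookup."""

    def __init__(self, db_path: Optional[str] = None):
        self.db_path = db_path
        self._conn: Optional[sqlite3.Connection] = None

    def _connect(self) -> sqlite3.Connection:
        # Nothing is opened until a lookup is actually requested.
        if self._conn is None:
            if self.db_path is None:
                raise RuntimeError(
                    "No database configured; evidence must be supplied in the data."
                )
            self._conn = sqlite3.connect(self.db_path)
        return self._conn

    def get_text(self, page_id: str) -> Optional[str]:
        # Assumed schema: a table documents(id, text).
        row = self._connect().execute(
            "SELECT text FROM documents WHERE id = ?", (page_id,)
        ).fetchone()
        return row[0] if row else None


def evidence_for(record: dict, db: LazyEvidenceDB) -> Optional[str]:
    # Prefer evidence already present in the record (hypothetical key).
    if record.get("evidence_text"):
        return record["evidence_text"]
    # Fall back to the database only when the record has no evidence.
    return db.get_text(record["page_id"])
```

With this arrangement, a custom dataset whose records all include their evidence text can be loaded with `LazyEvidenceDB(None)`, and no database file is needed.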
@j6mes Okay. Considering this example from the file
Is the evidence text stored in the field If yes, does this mean that the fields
if
@j6mes Do you mean Also, what about the fields
Yes, I meant pipeline_text. I think any extra values are just passed through to the
Thank you!
@j6mes Hi, I have a couple of questions about the data format of your model. In the following example,
What does the field "actual: correction" mean? I assumed it would be the correct statement that the model should have generated, but instead it is the mutated, incorrect version of the correct statement. Also, the 'source' field contains the masked sentence that is input to the model. But the 'target' field contains the incorrect, mutated sentence and not the correct sentence that the model is supposed to learn to generate. In this case, does the model never see the correct version of the mutated/masked statement, except in the evidence? Thank you.
There are a few caveats to this. For the distant supervision objective, it's assumed that the model doesn't have access to the reference correction. Instead, it's trying to recover the input sentence, as an auto-encoder would. For scoring, we have to use the info in the metadata to compare what was predicted against what the claim was before correction. I'll see if I can make this clearer in the documentation. I had to do a lot of cleaning before making the repo public, and perhaps there's an easier way I can present all this info and ensure that it's consistent.
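The auto-encoder objective described above can be illustrated with a minimal sketch. The function name `build_masked_instance`, the `[MASK]` placeholder, the `###` separator, and the example sentences are all assumptions made for illustration; the actual serialisation used by the repository may differ. The point is only that the training target is the original input claim, not a reference correction.

```python
from typing import Dict, List, Set

MASK = "[MASK]"   # hypothetical placeholder token
SEP = " ### "     # hypothetical separator between claim and evidence


def build_masked_instance(
    claim_tokens: List[str],
    masked_positions: Set[int],
    evidence: List[str],
) -> Dict[str, str]:
    # Replace the selected claim tokens with the mask placeholder.
    masked = [
        MASK if i in masked_positions else tok
        for i, tok in enumerate(claim_tokens)
    ]
    # Source: masked claim followed by the retrieved evidence sentences.
    source = SEP.join([" ".join(masked)] + evidence)
    # Target: the *original* claim, so the model learns to reconstruct
    # its input from the evidence rather than to copy a gold correction.
    target = " ".join(claim_tokens)
    return {"source": source, "target": target}


# Made-up example: the claim is wrong, and the wrong token is masked.
instance = build_masked_instance(
    "The Eiffel Tower is in Berlin".split(),
    {5},
    ["The Eiffel Tower is a tower in Paris ."],
)
```

At training time the target still contains the original (possibly incorrect) claim. At test time the masked positions have to be filled from the evidence, which is where a correction can come from.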
@j6mes I see. So does this mean that both the "actual: correction" field and 'target' field from the above example contain the incorrect, mutated version of the input statement? If I have access to the correct reference statement, can I provide it as input to the model as part of training? If yes, how could I do that?
There's a supervised version as well, which doesn't use any masking (see finetune_supervised.sh and finetune_supervised_pipeline.sh). If you want to mix supervision and masks, you could either train a supervised model first and then fine-tune on masks, or combine the supervised and mask_based_reader from this folder to make a reader that understands both tasks: https://github.com/j6mes/2021-acl-factual-error-correction/blob/main/src/error_correction/modelling/reader/supervised_correction_reader.py
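One way to picture a reader that handles both tasks is a function that dispatches on whether a record carries a reference correction. This is a rough sketch only and does not mirror the repository's reader classes: the record keys `claim_tokens`, `masked_positions`, `evidence`, and `reference`, as well as the `[MASK]` and `###` conventions, are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

MASK = "[MASK]"   # hypothetical placeholder token
SEP = " ### "     # hypothetical separator between claim and evidence


def to_instance(record: Dict) -> Dict[str, str]:
    tokens: List[str] = record["claim_tokens"]
    if record.get("reference"):
        # Supervised case: unmasked claim in, reference correction out.
        source_claim = " ".join(tokens)
        target = record["reference"]
    else:
        # Mask-based case: masked claim in, original claim out.
        masked = set(record.get("masked_positions", []))
        source_claim = " ".join(
            MASK if i in masked else tok for i, tok in enumerate(tokens)
        )
        target = " ".join(tokens)
    source = SEP.join([source_claim] + list(record.get("evidence", [])))
    return {"source": source, "target": target}


def read_mixed(records: Iterable[Dict]) -> Iterator[Dict[str, str]]:
    # Yield one training instance per record, whichever task it belongs to.
    for record in records:
        yield to_instance(record)
```

Both kinds of instance share the same `source`/`target` shape, so under these assumptions a single sequence-to-sequence model could be trained on the mixed stream.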
@j6mes Thank you for replying. I will look into the supervised version. Regarding the
Both the So it seems that the model being trained in the masked version never has access to the correct reference sentence. In such a case, could you please clarify how it could make a correction to a masked input sentence at test time? Would it just use information from the evidence for this?
Hi,
Is it possible to run the masker-corrector module of this code without using the FEVER sqlite3 database file, in the following code file: src/error_correction/modelling/error_correction_module.py?
I have my own dataset with the evidence text already retrieved, so I am hoping to avoid the step of retrieving information from the FEVER database. By any chance, are any intermediate output files generated after text has been retrieved from the FEVER database that I can look at?
Thank you!