Implement bad inputs for input/output format validators #86

Open · wants to merge 5 commits into develop
79 changes: 79 additions & 0 deletions problemtools/verifyproblem.py
@@ -323,6 +323,7 @@ def check(self, args):

infiles = glob.glob(os.path.join(self._datadir, '*.in'))
ansfiles = glob.glob(os.path.join(self._datadir, '*.ans'))
wafiles = glob.glob(os.path.join(self._datadir, '*.ans.wrong-*'))

if self._parent is None:
seen_secret = False
@@ -361,10 +362,15 @@ def check(self, args):
for f in infiles:
if not f[:-3] + '.ans' in ansfiles:
self.error("No matching answer file for input '%s'" % f)

for f in ansfiles:
if not f[:-4] + '.in' in infiles:
self.error("No matching input file for answer '%s'" % f)

for f in wafiles:
if f.split('.ans.wrong-')[0] + '.in' not in infiles:
self.error("No matching input file for wrong answer '%s'" % f)

for subdata in self._items:
if subdata.matches_filter(args.data_filter):
subdata.check(args)
@@ -795,8 +801,44 @@ def modified_input_validates(applicable, modifier):

os.unlink(file_name)

self._verify_invalid_inputs()

return self._check_res

def _verify_invalid_inputs(self):
"""Check that input format validators decline invalid input files
given in input_format_validators/bad_inputs
"""

path_to_invalid_inputs = os.path.join(self._problem.probdir, 'input_format_validators', 'bad_inputs')

# verify only if invalid inputs are given, otherwise nothing to check in this function
if not os.path.exists(path_to_invalid_inputs):
return

# verify that invalid inputs are given in a directory, not a file
if not os.path.isdir(path_to_invalid_inputs):
self.error("%s should be a directory containing invalid inputs, not a file" % path_to_invalid_inputs)
return

for invalid_input_filename in os.listdir(path_to_invalid_inputs):
invalid_infile = os.path.join(path_to_invalid_inputs, invalid_input_filename)
if not invalid_infile.endswith('.in'):
self.warning('Input file %s is not an input file' % invalid_input_filename)
continue

for val in self._validators:
status, _ = val.run(invalid_infile, args=None)
Member:

Hm, it's a bit sad not to be able to pass any flags here... It means we can't use this feature with IOI-style problems, where testgroup-specific limits are checked through flags. (Not that this is a feature I suspect we'll use for at least the Swedish IOI qualifiers, but still...)

I guess that's an argument for using *.in.wrong-* files in data/ instead, even if I agree that they don't completely belong there.
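(A minimal, hypothetical sketch of the difference being discussed here; nothing below is problemtools code. It only illustrates that testgroup-specific limits normally reach the input validator as command-line flags, which the bad_inputs check above cannot supply.)

import subprocess

def run_input_validator(validator_cmd, infile, flags):
    """Feed infile to an input validator on stdin; by convention 42 means accept, 43 reject."""
    with open(infile, 'rb') as f:
        return subprocess.run(validator_cmd + flags, stdin=f).returncode

# IOI-style problem: flags carry per-testgroup limits, e.g.
#   run_input_validator(['./validator'], 'data/secret/group1/1.in', ['--max_n', '1000'])
# Bad-inputs check in this PR: the files live outside data/, so there are no group flags:
#   run_input_validator(['./validator'], 'input_format_validators/bad_inputs/huge.in', [])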

Contributor Author:

I noticed that while implementing this; I didn't use the feature myself, and I didn't know it was important for IOI. Of course, implementing it with wrong-input files alongside the test data, like the output validator does, is possible. Like you said, they don't completely belong there. Another option is to mirror the directory structure of /data, but I don't know whether that is worthwhile.

I think a third opinion here would be nice.

Member:

I don't know what the best choice is here either. Agreed that a third opinion would be nice.

Contributor:

Simon, what flags would you pass here?

IMHO, the data directory should be cluttered as little as possible. The bad inputs don't have any relationship to the other test cases, so they don't belong there.
The bad outputs are related to a single test case, so it makes more sense to store them next to the real thing, although storing them in the output_validator section with the same data/{sample,secret,whatever} structure may also be an option, to clearly separate test data that is relevant for the contest(ants) from data that only exists to test the validators.
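(A purely illustrative sketch of the two storage layouts being weighed in this thread, written as glob patterns; neither is implemented by this PR, and the directory names are assumptions.)

import glob
import os

# Layout A: wrong answers stored next to the real answer files in data/,
# e.g. data/secret/1.ans.wrong-offbyone (what this PR implements for outputs).
def wrong_answers_next_to_testcases(datadir):
    return glob.glob(os.path.join(datadir, '**', '*.ans.wrong-*'), recursive=True)

# Layout B: wrong answers kept under an output-validator directory that mirrors
# data/{sample,secret,...}, keeping contest data and validator-test data apart.
def wrong_answers_in_validator_dir(probdir):
    return glob.glob(os.path.join(probdir, 'output_validators', 'data', '**', '*.ans.wrong-*'),
                     recursive=True)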

Member:

Flags either have to be None or be taken from a .yaml file. I don't know the best option either.

I think the solution I'd lean towards is to put everything in data/. Say we add a directory to data/validator-tests with the same structure as secret and sample, and additionally files *.{in,ans}.{invalid,valid} (or -wrong or -wrong-* or whatever). This gives us input- and output validator flags, makes the structure uniform for both sorts of tests, is somewhat self-explanatory when you see it for the first time, and also makes it possible to add output validator tests without adding them as actual test cases. (For the Swedish IOI qualifiers for instance, we keep the secret/ directory entirely auto-generated, while a validator-tests directory would be under version control.) We might want to support .ans.invalid in the actual data directories as well, of course.

FWIW, I recently tried out Polygon and it has the following UI for validator tests (same for input and output):
[screenshot: Polygon's validator-test UI]
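(Likewise only a sketch: roughly what scanning the proposed data/validator-tests layout could look like. The *.in.invalid / *.ans.invalid names follow the suggestion above and are not implemented anywhere.)

import glob
import os

def find_validator_tests(probdir):
    """Collect invalid-input and invalid-answer files under data/validator-tests."""
    root = os.path.join(probdir, 'data', 'validator-tests')
    invalid_inputs = glob.glob(os.path.join(root, '**', '*.in.invalid'), recursive=True)
    invalid_answers = glob.glob(os.path.join(root, '**', '*.ans.invalid'), recursive=True)
    return invalid_inputs, invalid_answers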

Contributor Author:

After some testing, I think that putting everything in data/ with the same structure as the other test cases is rather hard. We would have to exclude this directory from the regular test cases, make data/testcase-validator/sample use the same flags as sample, decide whether wrong-answer files should be able to refer to input files from the real test data, and more. That is all possible, but it requires changes to the current testcase class (e.g. extending it with wrong answers, which still couldn't model invalid inputs) or other major restructuring.

Should we check whether the current state is good enough for a merge, or spend some more time seeing whether we can fulfill all the requested features here?

if not os.WIFEXITED(status):
self.error('Input format validator %s crashed on input %s' % (val, invalid_infile))

# If an input validator declines the input file, everything is fine and we break.
if os.WEXITSTATUS(status) == 43:
break
else:
# Will only be executed if loop wasn't ended by break.
# No input validator declined the invalid input file, this is an error.
self.error('Input format validators accepted invalid input %s' % invalid_infile)

def validate(self, testcase):
flags = testcase.testcasegroup.config['input_validator_flags'].split()
@@ -894,6 +936,8 @@ def grade(self, sub_results, testcasegroup, shadow_result=False):
class OutputValidators(ProblemAspect):
_default_validator = run.get_tool('default_validator')

WA_GLOB = "%s.wrong-*" # %s is the answerfile to a test case, i.e. 1.ans


def __init__(self, problem):
self._problem = problem
@@ -947,8 +991,43 @@ def check(self, args):
self.warning('%s gets AC' % (desc))
os.unlink(file_name)

self._verify_invalid_outputs()

return self._check_res

def _verify_invalid_outputs(self):
"""Check that output validators decline invalid answer files"""

val_timelim = self._problem.config.get('limits')['validation_time']
val_memlim = self._problem.config.get('limits')['validation_memory']
flags = self._problem.config.get('validator_flags').split()

for testcase in self._problem.testdata.get_all_testcases():
if not os.path.exists(testcase.infile):
# .in-file doesn't exist, already reported by testcase-check, so 'ignore' here
continue

wrong_answers = glob.glob(OutputValidators.WA_GLOB % testcase.ansfile)
Member:

(I know this isn't in the hot path, but this linear-time, syscall-based, noop-in-99%-of-cases globbing still tears at me. Maybe we could loop over all wrong answers and pick out their associated testcases, or store wrong answer lists as part of testcase objects.)

Contributor Author:

Like the check for whether there is a corresponding input file? Yeah, nice idea. I thought this would work fine, and with at most 100 test cases it won't be a big problem, but of course your approach is better here.
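(A sketch of the suggestion above, with illustrative names only: collect the wrong-answer files once, when the test case object is built, instead of re-globbing inside the output validator check.)

import glob

class TestCaseWithWrongAnswers:
    def __init__(self, infile, ansfile):
        self.infile = infile
        self.ansfile = ansfile
        # For data/secret/1.ans this picks up data/secret/1.ans.wrong-*
        self.wrong_answer_files = sorted(glob.glob(ansfile + '.wrong-*'))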

if wrong_answers:
if self._problem.is_interactive:
self.warning('Output validator check with wrong answers for interactive problems is currently not supported (skipping)')
return

for wafile in wrong_answers:
for val in self._actual_validators():
feedbackdir = tempfile.mkdtemp(prefix='feedback', dir=self._problem.tmpdir)
status, runtime = val.run(wafile,
args=[testcase.infile, testcase.ansfile, feedbackdir] + flags,
timelim=val_timelim, memlim=val_memlim)
res = self._parse_validator_results(val, status, feedbackdir, testcase)
shutil.rmtree(feedbackdir)
if res.verdict != 'AC':
# One output validator declined this wrong answer, so stop validating
break
else:
# Will only be executed if loop wasn't ended by break
# No output validator declined wrong answer, this is an error.
self.error("Wrong answer file %s was not declined by output validators" % wafile)
Member:

Like with the input validator, I don't like this duplication much (pretty complex code, too).

Contributor Author:

I think a function here would be nice. Maybe an abstract base class 'Validators' would help here; that would also make the check for non-compiling validators easier.

Member:

To abstract both input and output validators? I'm not sure, they don't share that much logic, and inheritance tends to be confusing. Maybe a helper function could be better (if anything; some small amount of duplication is fine IMO).
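(For reference, one possible shape of such a helper, shared by the input- and output-validator checks; the predicate-based design and the names are assumptions, not the actual problemtools API.)

def some_validator_declines(validators, run_one, declines):
    """run_one(val) runs a single validator and returns its result;
    declines(result) reports whether that result counts as rejecting the file."""
    return any(declines(run_one(val)) for val in validators)

# Hypothetical usage mirroring the two call sites in this diff:
#   input validators:  declines = lambda status: os.WIFEXITED(status) and os.WEXITSTATUS(status) == 43
#   output validators: declines = lambda res: res.verdict != 'AC'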


def _parse_validator_results(self, val, status, feedbackdir, testcase):
custom_score = self._problem.config.get('grading')['custom_scoring']