Autograder reporting roadmap #6126

Open · 8 of 10 tasks
cysjonathan (Contributor) opened this issue Jun 5, 2023 · 0 comments

Linked to #5949

This issue documents the types of autograder reporting we already have, as well as other possible enhancements we can make.

Below, we reference another online evaluator/autograder's verdict list to identify improvements we can make to error reporting for students.

  • In Queue (QU): The judge is busy and cannot process your submission yet. It will be judged as soon as possible.
    We have implemented this in the form of a spinner with a "grading in progress" message.
  • Submission Error (SE): The submission was not successful, due to some error during the submission process or data corruption.
    This is a generic fallback error, which we already implement as "Encountered an error. The server may be busy now and you may want to try again..."
  • Accepted (AC): OK! Your program is correct! It produced the right answer in reasonable time and within the memory limit. Congratulations!
    Test case(s) passed
  • Wrong Answer (WA): A correct solution was not produced for the inputs. The inputs and outputs we use to test programs are not public, so you will have to spot the bug yourself (it is recommended to get accustomed to a true contest dynamic ;-)). If you truly think your code is correct, you can contact us using the link on the left. The judge's outputs are not always correct...
    Test case(s) failed
  • Compile Error (CE): The compiler could not compile your program. Of course, warning messages are not error messages. The compiler output messages are reported to you by e-mail.
    We report stderr (same as for runtime errors). It might be instructive to separate compile errors from runtime errors, so this is something we can look into (see the compile/runtime sketch after this list).
  • Runtime Error (RE): Your program failed during execution (segmentation fault, floating point exception...). The exact cause is not reported to the user to avoid hacking. Be sure that your program returns a 0 code to the shell. If you're using Java, please follow all the submission specifications.
    We report stderr (same as for compile errors). Instructors can also choose to hide this error to prevent hacking of test cases.
  • Time Limit Exceeded (TL): Your program ran for too long; this verdict does not tell you whether your program would have reached the correct solution or not.
    We report a Time Limit Exceeded (TLE) error.
  • Memory Limit Exceeded (ML): Your program tried to use more memory than the judge allows. If you are sure that such a problem needs more memory, please contact us.
    We report a Memory Limit Exceeded (MLE) error.
  • Output Limit Exceeded (OL): Your program tried to write too much information. This usually occurs if it goes into an infinite loop.
    Something to look into; it would also make the containers more robust by preventing large writes to the container/database when storing results (see the output-cap sketch after this list).
  • Presentation Error (PE): Your program's outputs are correct but are not presented in the correct way. Check for spaces, justification, line feeds...
    Something to look into; it would be instructive for students who make whitespace-related errors (see the output-comparison sketch after this list).
  • Restricted Function (RF): Your program is trying to use a function that we consider harmful to the system. If you get this verdict you probably know why...
    We currently have no blacklist of functions/routines in our autograder. Since each script is executed in isolation within a container, we will not look into this for the time being.
  • Can't Be Judged (CJ): The judge doesn't have test input and outputs for the selected problem. While choosing a problem be careful to ensure that the judge will be able to judge it!
    Not applicable, since we create test cases for autograded questions before publishing.
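
A minimal sketch of the compile/runtime split mentioned under Compile Error above, assuming a Python grading wrapper around a gcc-compiled submission; the command, file names, and verdict strings are illustrative, not our actual implementation:

```python
import subprocess

def grade(source_path: str) -> dict:
    # Stage 1: compile. A non-zero exit status here maps to Compile Error (CE),
    # and only the compiler's stderr is reported.
    compiled = subprocess.run(
        ["gcc", source_path, "-o", "solution"],
        capture_output=True, text=True,
    )
    if compiled.returncode != 0:
        return {"verdict": "CE", "stderr": compiled.stderr}

    # Stage 2: run. A non-zero exit status here maps to Runtime Error (RE),
    # so students see compile and runtime failures as distinct verdicts.
    ran = subprocess.run(["./solution"], capture_output=True, text=True)
    if ran.returncode != 0:
        return {"verdict": "RE", "stderr": ran.stderr}

    return {"verdict": "OK", "stdout": ran.stdout}
```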
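
For Output Limit Exceeded, a sketch of capping captured output so a runaway print loop cannot flood the container or the results store; the 64 KiB cap and function name are hypothetical:

```python
import subprocess

OUTPUT_LIMIT = 64 * 1024  # illustrative cap, not a real configured value

def run_with_output_cap(cmd: list[str]) -> tuple[str, bytes]:
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    # Read at most one byte past the cap; an infinite print loop stops here
    # instead of filling the container's disk or the database row.
    out = proc.stdout.read(OUTPUT_LIMIT + 1)
    if len(out) > OUTPUT_LIMIT:
        proc.kill()
        proc.wait()
        return "OL", out[:OUTPUT_LIMIT]
    proc.wait()
    return "OK", out
```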
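
And for Presentation Error, a sketch of an output comparison that distinguishes wrong answers from whitespace-only mismatches; the verdict strings are again illustrative:

```python
def verdict_for(expected: str, actual: str) -> str:
    if actual == expected:
        return "AC"
    # Same tokens in the same order but different spacing or line breaks
    # means the answer is right and only the presentation is wrong.
    if actual.split() == expected.split():
        return "PE"
    return "WA"

# e.g. verdict_for("1 2 3\n", "1  2 3") == "PE"
```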

At the same time, we can consider outsourcing the autograding/evaluation of code to a third-party service in the future.

We should also consider a rewrite of the current test framework to support independently timed test cases; currently, a timeout at the job level causes all test cases to report failure. A per-test-case timeout sketch follows.
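
A sketch of what independently timed test cases could look like, assuming a Python runner and subprocess-based execution; the 2-second default and the function names are hypothetical:

```python
import subprocess

def run_tests(cmd, test_cases, per_case_timeout=2.0):
    # Each test case gets its own timeout, so one hanging case is reported
    # as TLE on its own instead of failing the whole job.
    results = []
    for stdin_data, expected in test_cases:
        try:
            proc = subprocess.run(
                cmd, input=stdin_data, capture_output=True,
                text=True, timeout=per_case_timeout,
            )
            results.append("AC" if proc.stdout == expected else "WA")
        except subprocess.TimeoutExpired:
            results.append("TLE")
    return results
```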
