[PROPOSAL] Switch to pytest style test classes, use plain asserts #4204
Conversation
The "Check Models" workflow will fail until the
allennlp/common/testing/test_case.py
Outdated

```diff
 _available_devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
-multi_device = parametrize(("device",), [(device,) for device in _available_devices])
+multi_device = lambda f: pytest.mark.parametrize("device", _available_devices)(pytest.mark.gpu(f))
```
The `pytest.mark.gpu(f)` ensures these tests are run in the "GPU Checks" workflow.
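A minimal sketch of how the proposed decorator composes the two marks. To keep the snippet self-contained, a plain boolean stands in for the `torch.cuda.is_available()` check used in the real code:

```python
import pytest

# Stand-in for torch.cuda.is_available(); the actual code checks torch.
cuda_available = False

_available_devices = ["cpu"] + (["cuda"] if cuda_available else [])

# Parametrize each test over the available devices AND tag it with the
# custom `gpu` mark, so a GPU-only CI job can select it with `pytest -m gpu`.
multi_device = lambda f: pytest.mark.parametrize("device", _available_devices)(
    pytest.mark.gpu(f)
)

@multi_device
def test_add_on_device(device):
    assert device in ("cpu", "cuda")
```

Applying both decorators attaches both marks to the test function, so the same test runs once per device locally and is also selectable by the `gpu` marker expression in CI.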
I have been wanting to do this for a while. It's especially confusing in the models repo, because
This is a bit different from what I thought you were doing. I thought you were changing everything to top-level functions named `test_*`, with common setup and tear-down in pytest fixtures. The base class `AllenNlpTestCase` would then go away. Any reason not to do that?
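For reference, the fixture-based alternative being described might look something like this (a sketch; `test_dir` is a hypothetical fixture name, not AllenNLP's actual API):

```python
import tempfile
from pathlib import Path

import pytest

@pytest.fixture
def test_dir():
    # Common setup/teardown as a fixture: `yield` hands the temp directory
    # to the test, and cleanup runs automatically when the test finishes.
    with tempfile.TemporaryDirectory() as d:
        yield Path(d)

def test_writes_file(test_dir):
    (test_dir / "out.txt").write_text("ok")
    assert (test_dir / "out.txt").read_text() == "ok"
```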
```diff
-        self.assertEqual(
-            expected_output,
-            actual_output,
+        assert expected_output == actual_output, (
```
Will this give me those pretty comparisons, that show immediately where the difference is, even in large data?
Yes sir!
I think having a base class is beneficial because it wraps up all of its fixtures. Without it, you would have to import the fixtures (the temp directory and whatever else the base class comes with) into each test module or into a … That said, if you don't need those fixtures then it's just unnecessary overhead. So we could probably just use …
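A sketch of what such a base class looks like without `unittest.TestCase` (the `TEST_DIR` attribute name is illustrative):

```python
import shutil
import tempfile
from pathlib import Path

class AllenNlpTestCase:
    # A plain pytest-style class: pytest calls setup_method/teardown_method
    # around every test method, with no unittest.TestCase inheritance needed.
    def setup_method(self):
        self.TEST_DIR = Path(tempfile.mkdtemp(prefix="allennlp_tests"))

    def teardown_method(self):
        shutil.rmtree(self.TEST_DIR, ignore_errors=True)

class TestExample(AllenNlpTestCase):
    # Subclasses inherit the fixtures for free, with no imports required
    # in the test module itself.
    def test_dir_exists(self):
        assert self.TEST_DIR.exists()
```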
.github/workflows/pull_request.yml
Outdated

```diff
@@ -116,7 +116,9 @@ jobs:
   ALLENNLP_VERSION_OVERRIDE: ""  # Don't replace the core library.
   run: |
     git clone https://github.com/allenai/allennlp-models.git
-    cd allennlp-models && pip install --upgrade --upgrade-strategy eager -e . -r dev-requirements.txt
+    cd allennlp-models
+    git checkout test-case  # TODO remove this line
```
Just putting this here temporarily to make sure the models tests pass on that branch
LGTM! I would vote strongly against moving everything to top-level test methods, by the way. In addition to what @epwalsh said, it makes model tests more annoying.
```diff
@@ -65,25 +66,25 @@ def test_sequence_label_field_raises_on_incorrect_type(self):
         with pytest.raises(ConfigurationError):
             _ = SequenceLabelField([[], [], [], [], []], self.text)

-    def test_class_variables_for_namespace_warnings_work_correctly(self):
+    def test_class_variables_for_namespace_warnings_work_correctly(self, caplog):
```
Nit: `caplog` is not a great variable name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually an out-of-the-box pytest fixture so unfortunately I think we're stuck with that name
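For context, `caplog` is pytest's built-in log-capture fixture, injected by matching the parameter name. A sketch of how it's used (the warning text and helper function here are made up):

```python
import logging

logger = logging.getLogger(__name__)

def emit_namespace_warning():
    # Hypothetical stand-in for the code under test.
    logger.warning("namespace is missing from non_padded_namespaces")

def test_warning_is_logged(caplog):
    # The parameter must be spelled exactly `caplog`: pytest resolves
    # fixtures by argument name, which is why it can't be renamed.
    with caplog.at_level(logging.WARNING):
        emit_namespace_warning()
    assert "non_padded_namespaces" in caplog.text
```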
Oh, hmm, didn't realize that pytest did this magic. Ok 🤷‍♂️
```diff
         self.model = torch.nn.Sequential(torch.nn.Linear(10, 10))

     def test_reduce_on_plateau_error_throw_when_no_metrics_exist(self):
-        with self.assertRaises(ConfigurationError) as context:
+        with pytest.raises(ConfigurationError) as context:
```
I think this should just be something like `match="learning rate ..."`, instead of `as context`, right?
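The `match=` argument applies `re.search` against the string of the raised exception, which replaces the `as context` capture plus a separate assertion on the message. A sketch with a stand-in exception (the real code raises `ConfigurationError`, and the message text here is made up):

```python
import pytest

def configure_scheduler():
    # Hypothetical stand-in for the call under test.
    raise ValueError("learning rate scheduler requires a validation metric")

def test_raises_with_message():
    # No `as context` needed: match= verifies the message in one step.
    with pytest.raises(ValueError, match="learning rate"):
        configure_scheduler()
```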
Good point. Just updated
Looks reasonable to me!
As far as I can tell, since we're using `pytest` to run tests, there's no benefit of using `unittest.TestCase` as the base class for `AllenNlpTestCase` as opposed to plain `pytest`-style classes. There are, however, several disadvantages:

- `@pytest.mark.parametrize` does not work with methods on a `unittest.TestCase` class. Given that parametrizing tests is such a powerful tool, I think this is a major issue.
- `unittest.TestCase` assertion methods (`self.assertEqual`, `self.assertSetEqual`, etc.) are less concise than plain `assert` statements, which are recommended by `pytest`.
- `setUp` and `tearDown` methods do not have PEP 8-compatible names, and it's not clear from the names when these methods run. The `pytest` equivalents (`setup_method` and `teardown_method`, or `setup_class` and `teardown_class`), on the other hand, have snake-case names that make it perfectly clear when they run.

I know this looks like a huge PR, but it's mostly just replacing `unittest` assertion methods with the `pytest` equivalents.
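To illustrate the first point above: a parametrized method that pytest refuses to run on a `unittest.TestCase` subclass works fine on a plain pytest-style class (a sketch with made-up names):

```python
import pytest

class TestDevices:
    # Plain pytest-style class: no unittest.TestCase base, so
    # @pytest.mark.parametrize works on its methods and this test
    # is collected once per device value.
    @pytest.mark.parametrize("device", ["cpu", "cuda"])
    def test_device_is_known(self, device):
        assert device in ("cpu", "cuda")
```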