[UnitTests] Automatic parametrization over targets, with explicit opt-out #8010
Conversation
Thanks @Lunderberg for this great PR! I think one major thing that would really improve the error handling situation would be a document that describes how to use the testing infrastructure. Something that we could point developers to when we are receiving PRs.
python/tvm/testing.py
Outdated
logging.warning(
    "None of the following targets are supported by this build of TVM: %s."
    " Try setting TVM_TEST_TARGETS to a supported target. Defaulting to llvm.",
    target_str,
)
return {"llvm"}
return _get_targets("llvm")
Does this loop forever if llvm is not enabled?
Good catch, it would. Updating to check tvm.runtime.enabled('llvm'): if enabled, maintain the current behavior; otherwise, raise an exception.
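The fallback logic described here can be sketched in isolation. This is a hypothetical illustration of the described behavior, not TVM's actual code: plain sets stand in for queries to tvm.runtime.enabled(), and the helper name is invented.

```python
# Hypothetical sketch of the described fallback: if none of the requested
# targets are enabled in this build, fall back to llvm, and raise an
# exception if even llvm is unavailable (avoiding the infinite loop).
def fallback_targets(requested, enabled):
    usable = [t for t in requested if t in enabled]
    if usable:
        return usable
    if "llvm" in enabled:
        return ["llvm"]
    raise RuntimeError(
        "None of the requested targets are enabled, "
        "and llvm is not available as a fallback."
    )
```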
xfail_targets = set()

target_marks = []
for t in _get_targets():
Doesn't _get_targets filter out all non-runnable targets? So we are not including unrunnable targets here?
The updated implementation of _get_targets returns all targets without filtering, but marks them as runnable or un-runnable. This allows enabled_targets() to maintain its current behavior of filtering out un-runnable targets, while _pytest_target_params can return all targets, marked with pytest.skipif to indicate which ones cannot run on the current platform.
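As a rough sketch of that scheme (the function name and the enabled-set argument are assumptions for illustration, not TVM's actual internals):

```python
import pytest

# Return every target, marking un-runnable ones with skipif rather than
# filtering them out, so they appear as skipped in pytest's summary.
def pytest_target_params(all_targets, enabled):
    params = []
    for target in all_targets:
        if target in enabled:
            marks = []
        else:
            marks = [pytest.mark.skipif(
                True, reason="Target %s not enabled" % target)]
        params.append(pytest.param(target, marks=marks))
    return params
```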
python/tvm/testing.py
Outdated
Use this decorator when you want your test to be run over a
variety of targets and devices (including cpu and gpu devices).

Alternatively, a test that accepts the "target" and "dev" will
Maybe specify that you want to use parametrize_targets when you have a specific set of targets you want to run over. Otherwise users should not use the decorator. Also mention that exclude_targets may be a better option.
Good point, edited the documentation to reflect the new intended usage, and to recommend that exclude_targets or known_failing_targets should typically be used instead.
metafunc.parametrize(names, value_sets, indirect=True)


def fixture(func=None, *, cache_return_value=False):
Are you allowed to have an optional parameter before regular arguments? I think lint will not be happy with this one.
In python2 it would be an error, but in python3 it is allowed, and passes the linter both locally and on the CI. I did this intentionally so that cache_return_value would be a keyword-only argument. My goal is to make it as obvious as possible at the fixture-definition site whether a fixture is going to be cached or not. Mandating fixture(cache_return_value=True) makes that obvious, where fixture(True) may not be.
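A minimal illustration of the keyword-only pattern being discussed (the attribute used to record the flag is hypothetical, and the body is a sketch rather than the PR's implementation):

```python
# The bare `*` makes cache_return_value keyword-only: callers must write
# fixture(cache_return_value=True) explicitly, and fixture(None, True)
# raises a TypeError. Works both bare (@fixture) and with arguments.
def fixture(func=None, *, cache_return_value=False):
    def decorate(f):
        f.cache_return_value = cache_return_value  # illustrative attribute
        return f
    if func is not None:
        return decorate(func)
    return decorate
```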
python/tvm/testing.py
Outdated
>>> @tvm.testing.parametrize("llvm", "cuda")
>>> @tvm.testing.parametrize_targets
>>> def test_mytest(target, dev):
>>> ...  # do something
I'd just remove this example as we want people to only use the decorator with arguments.
Sounds reasonable, and removed.
Rebased on main to start CI again, now that the CI fix #8160 is in.
Thanks for all the hard work @Lunderberg!
Added one more bugfix. The first implementation of removing fixture functions from module scope was a bit overzealous, and also removed any objects that implement __getattr__.
LGTM, thanks for these changes @Lunderberg. My only concern is that you've introduced a lot of new features but not all of them are used yet in the codebase. Do you think its worth adding a meta-test file to make sure things like known_failing_targets work as expected?
@jwfromm That's a good point. I had initially thought that there were few enough features that they could be implicitly tested by their use in other tests, but with the additional features that I added following discussion, it would be good to have dedicated tests for the testing features. I will add them.
Great PR - LGTM! One note is that the error message link won't work until apache/tvm-rfcs#7 gets merged, but that should happen fairly soonish, so I don't see this as blocking. Given that only the relu tests are being ported to the parameterized tests, this is a low-risk merge.
@jwfromm And added meta-tests for all the new functionality. @tmoreau89 Good point, that was an intentional choice to point to the main branch of tvm-rfcs. I figured that since the main discussion was on the intended behavior, it would be likely that the two would be accepted or rejected together. Thank you both for the reviews, and I think the only thing remaining is the CI.
python/tvm/testing.py
Outdated
# Optional cls parameter in case a parameter is defined inside a
# class scope.
@pytest.fixture(params=values, ids=ids)
def as_fixture(*cls, request):
@Lunderberg looks like the linter was not too happy about the unused argument here, that's the only thing blocking CI
Whoops, modified to _cls, which passes the linter when running locally.
python/tvm/testing.py
Outdated
# Optional cls parameter in case a parameter is defined inside a
# class scope.
def fixture_func(*cls, request):
same here
And same change made here.
…TS but were skipped Previously, these were removed by a filter in tvm.testing._get_targets(), and weren't listed at all. With this change, they are instead removed by pytest.skipif, and show up as explicitly skipped tests in pytest's summary when using tvm.testing.parametrize_targets.
…dev) Should make it easier to convert tests from using tvm.testing.enabled_targets to use pytest's parametrized tests instead.
…cular test Uses tvm_exclude_targets variable, which can be set (1) in the conftest.py to apply to a test directory, (2) in a test script to apply to that module, or (3) on an individual test function to apply to it. The @tvm.testing.exclude_targets decorator is provided for readability in case apache#3.
Intended to mark tests that fail for a particular target, and are intended to be fixed in the future. Typically, these would result either from implementing a new test, or from an in-progress implementation of a new target.
These were implemented to exclude or mark as failing an entire file or directory of tests. In https://discuss.tvm.apache.org/t/rfc-parametrized-unit-tests/9946/4, it was pointed out that the global variables would be vulnerable to typos in the names, resulting in the option being silently ignored. The decorators `@tvm.testing.exclude_targets` and `@tvm.testing.known_failing_targets` do not have this failure mode, and are the preferred version.
- tvm.testing.parameter() defines a parameter that can be passed to tests. Tests that accept more than one parameter are run for all combinations of parameter values.
- tvm.testing.parameters() defines multiple sets of parameter values. Tests that accept more than one parameter are run once for each set of parameter values.
- tvm.testing.fixture() is a decorator that defines setup code. The `cache=True` argument can be passed to avoid repeating expensive setup across multiple tests.
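The difference between the two parametrization behaviors above can be modeled with plain Python. This is a sketch of the semantics only, with invented helper names, not TVM's implementation:

```python
import itertools

# parameter(): independent axes; tests run over the cartesian product.
def product_of_parameters(**axes):
    names = sorted(axes)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(axes[n] for n in names))]

# parameters(): explicit value sets; one test run per set.
def explicit_value_sets(names, sets):
    return [dict(zip(names, values)) for values in sets]
```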
Previously, if the @parametrize_targets were present, but had other @pytest.mark.parametrize after it, "target" would get parametrized a second time. Now, it checks more than just the closest "parametrize" marker.
As recommended by @tkonolige: - Avoid infinite loop if LLVM target isn't enabled - Update documentation for preferred use cases of tvm.testing.parametrize_targets, and recommended alternatives.
- Documentation, removed previous example usage of tvm.testing.parametrize_targets
- Previously, a fixture function defined in a module was accessible through the global scope, and the function definition is accessible if a test function uses that name but fails to declare the fixture as a parameter. Now, it will result in a NameError instead.
…bal scope. - Initial implementation only checked hasattr(obj, "_pytestfixturefunction") before removing obj, which gave false positives for objects that implement __getattr__, such as caffe.layers. Now, also check that the value contained is a FixtureFunctionMarker.
…eturn_value=True) To avoid unit tests being able to influence each other through a shared cache, all cached fixtures are passed through copy.deepcopy prior to use.
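The deepcopy-on-read behavior described in that commit can be sketched as follows (names are illustrative, not TVM's internals):

```python
import copy

# Cache the setup result once, but hand each test its own deep copy, so a
# test that mutates the cached value cannot influence later tests.
_fixture_cache = {}

def cached_fixture_value(key, setup):
    if key not in _fixture_cache:
        _fixture_cache[key] = setup()
    return copy.deepcopy(_fixture_cache[key])
```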
Thanks for adding the tests, this is an excellent PR.
Thank you @tkonolige @jwfromm @Lunderberg the PR is now merged!
try:
    cached_value = cache[cache_key]
except KeyError:
    cached_value = cache[cache_key] = func(*args, **kwargs)
is the exception case tested here?
e.g. what happens if func itself raises another exception?
If the fixture definition func raises an exception, then the exception gets passed on to pytest, and it gets treated as a failure to generate the fixture. These still result in the test failing, but are recorded as a failed setup. The test itself is never run in that case. This behavior is pytest's default, and is the same in both the cached and uncached versions of tvm.testing.fixture.
I don't have a unit test yet to verify this behavior, but I'll add one.
Unit test added in #8343
# numpy array as input, then uses a slow method to
# compute a known correct output for that input. Therefore,
# including a fallback for serializable types.
def get_cache_key(*args, **kwargs):
is this guaranteed to be deterministic? pickle.dumps and maybe hash with stuff like dicts might not be...though maybe the dict thing is fixed now?
in any case, tbh i think this is pretty complicated for a cache key function. since we are trying to use this with parameterizable test cases, can't we just whitelist types that have an obvious, stable conversion to a cache key, and then error on the rest? i am not going to ever run python tests/python/unittest/test_bar.py --param=<pickled data>.
For hash, it is guaranteed to be deterministic, but pickle.dumps is not. For numpy arrays, pickle.dumps is, but that isn't guaranteed across all types.
The difficult part here is that the caching should work for fixtures that are based on other fixtures. For example, consider the following case. If we want to cache correct_output, then the cache needs to be based on the input_data argument. I agree that I don't think anybody will ever input pickled data from the command line, but this pattern of comparing to the correct output feels like it would be pretty common.
arr_size = tvm.testing.parameter(1, 16, 256)

@tvm.testing.fixture
def input_data(arr_size):
    return np.random.uniform(size=arr_size)

@tvm.testing.fixture
def correct_output(input_data):
    return run_very_slow_method(input_data)

def test_func(target, dev, input_data, correct_output):
    output = func(target, dev, input_data)
    tvm.testing.assert_allclose(output, correct_output)
The other scheme I considered was to look up which parameters were indirectly involved in computing a particular fixture and caching based on that parameter or parameters. In this case, correct_output is indirectly based on arr_size. However, that would have introduced a potential failure mode if correct_output is cached but input_data is not. In that case, the second target to use arr_size==1 would look up the cached version of correct_output associated with arr_size==1, but would generate a new random value for input_data. This felt like a worse failure mode than the current one of repeating the fixture setup, which is why I used pickle.dumps as the fallback.
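A sketch of that fallback approach (illustrative, not the exact TVM code): hash ordinary hashable arguments, and fall back to pickle.dumps for unhashable ones such as lists or numpy arrays.

```python
import pickle

def get_cache_key(*args, **kwargs):
    # Prefer hash() for plain hashable values; fall back to pickling for
    # unhashable ones (e.g. numpy arrays passed between fixtures).
    try:
        return hash((args, tuple(sorted(kwargs.items()))))
    except TypeError:
        return pickle.dumps((args, kwargs))
```

Within a single process this is deterministic for both branches; as noted above, pickle.dumps is not guaranteed to be stable across all types or interpreter runs.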
…-out (apache#8010)
* [UnitTests] Explicitly list tests that were enabled by TVM_TEST_TARGETS but were skipped. Previously, these were removed by a filter in tvm.testing._get_targets(), and weren't listed at all. With this change, they are instead removed by pytest.skipif, and show up as explicitly skipped tests in pytest's summary when using tvm.testing.parametrize_targets.
* [UnitTests] Automatic parametrize_targets for tests that use (target, dev). Should make it easier to convert tests from using tvm.testing.enabled_targets to use pytest's parametrized tests instead.
* [UnitTests] Added ability to explicitly exclude a target from a particular test. Uses tvm_exclude_targets variable, which can be set (1) in the conftest.py to apply to a test directory, (2) in a test script to apply to that module, or (3) on an individual test function to apply to it. The @tvm.testing.exclude_targets decorator is provided for readability in case apache#3.
* [UnitTests] Refactored test_topi_relu.py to use pytest.mark.parametrize.
* [UnitTests] Added tvm_known_failing_targets option for the unittests. Intended to mark tests that fail for a particular target, and are intended to be fixed in the future. Typically, these would result either from implementing a new test, or from an in-progress implementation of a new target.
* [UnitTests] Known failing targets now marked with xfail instead of skipif.
* [UnitTests] Removed tvm_excluded_targets and tvm_known_failing_targets. These were implemented to exclude or mark as failing an entire file or directory of tests. In https://discuss.tvm.apache.org/t/rfc-parametrized-unit-tests/9946/4, it was pointed out that the global variables would be vulnerable to typos in the names, resulting in the option being silently ignored. The decorators `@tvm.testing.exclude_targets` and `@tvm.testing.known_failing_targets` do not have this failure mode, and are the preferred version.
* [UnitTests] Added helper functions to tvm.testing.
  - tvm.testing.parameter() defines a parameter that can be passed to tests. Tests that accept more than one parameter are run for all combinations of parameter values.
  - tvm.testing.parameters() defines multiple sets of parameter values. Tests that accept more than one parameter are run once for each set of parameter values.
  - tvm.testing.fixture() is a decorator that defines setup code. The `cache=True` argument can be passed to avoid repeating expensive setup across multiple tests.
* [UnitTests] Bugfix for auto parametrizing of "target". Previously, if the @parametrize_targets were present, but had other @pytest.mark.parametrize after it, "target" would get parametrized a second time. Now, it checks more than just the closest "parametrize" marker.
* [UnitTests] Renamed "cache" argument of tvm.testing.fixture to "cache_return_value".
* [UnitTests] Minor updates to parametrized test implementation. As recommended by @tkonolige: avoid infinite loop if LLVM target isn't enabled; update documentation for preferred use cases of tvm.testing.parametrize_targets, and recommended alternatives.
* [UnitTests] Minor updates to parametrized test implementation. Documentation: removed previous example usage of tvm.testing.parametrize_targets.
* [UnitTests] Changed accidental use of pytest fixtures to a NameError. Previously, a fixture function defined in a module was accessible through the global scope, and the function definition is accessible if a test function uses that name but fails to declare the fixture as a parameter. Now, it will result in a NameError instead.
* [UnitTests] More careful removal of fixture functions from module global scope. Initial implementation only checked hasattr(obj, "_pytestfixturefunction") before removing obj, which gave false positives for objects that implement __getattr__, such as caffe.layers. Now, also check that the value contained is a FixtureFunctionMarker.
* [UnitTests] Copy cached values when using tvm.testing.fixture(cache_return_value=True). To avoid unit tests being able to influence each other through a shared cache, all cached fixtures are passed through copy.deepcopy prior to use.
* [UnitTests] Added meta-tests for tvm.testing functionality.

Co-authored-by: Eric Lunderberg <[email protected]>
Implemented features for the python tests to automatically parametrize over enabled targets, and to explicitly list the targets that were skipped. PR includes testing framework changes, along with changes to a single test file (test_topi_relu.py) as a proof of concept.
Link to RFC on tvm-rfcs, documenting differences in the testing style, advantages of the proposed style, and changes needed to use the new style.