Deep integration between Hypothesis and py.test is currently impossible #916

Open
DRMacIver opened this issue Aug 5, 2015 · 66 comments
Labels: topic: collection, topic: parametrize, type: backward compatibility, type: enhancement

@DRMacIver

Context: I write Hypothesis, a randomized testing library for Python. It works "well" under py.test, but only in the sense that it ignores py.test almost completely other than doing its best to expose functions in a way that py.test fixtures can understand.

A major problem with using Hypothesis with py.test is that function level fixtures get evaluated once per top-level function, not once per example. When these fixtures are mutable and mutated by the test this is really bad, because you end up running the test against the fixture many times, changing it each time.

People keep running into this as an issue, but currently it seems to be impossible to fix without significant changes to py.test. @RonnyPfannschmidt asked me to write a ticket about this as an example use-case of subtests, so here I am.
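
To make the failure mode concrete, here is a minimal sketch (a hypothetical test, not taken from this thread); on old Hypothesis versions this silently misbehaves, while newer versions detect the function-scoped fixture and complain instead (see later in the thread):

import pytest
from hypothesis import given, strategies as st

@pytest.fixture
def items():
    # Function-scoped: created once per collected test function, not per example.
    return []

@given(x=st.integers())
def test_append_is_isolated(items, x):
    items.append(x)
    # Passes for the first example only: later examples see leftovers from
    # earlier ones, because the fixture is never re-created between examples.
    assert len(items) == 1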

So what's the problem?

A test using Hypothesis looks something like:

@given(b=integers())
def test_some_stuff(a, b):
    ...

This translates into something approximately like:

def test_some_stuff(a, b=special_default):
    if b == special_default:
        for b in examples():
            ...
    else:
        ...

The key problem here is that examples() cannot be evaluated at collect time because it depends on the results of previous test execution.
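
For contrast, a hedged illustration (not from the original comment): with @pytest.mark.parametrize the full set of examples is fixed at collection time, so pytest can create one test item per example up front - exactly what examples() above cannot offer.

import pytest

@pytest.mark.parametrize("b", [0, 1, -17])  # list known at collect time
def test_some_stuff_parametrized(b):
    ...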

The reasons for this, in decreasing order of "this seems to be impossible" (i.e. with the current feature set of py.test I have no idea how to solve the first and neither does anyone else, we could maybe solve the second, and we could definitely do something about the third):

  1. The fundamental blocker is that this is a two-phase process. You've got an initial generate phase, but then if a failure is found you have a "simplify" phase, which runs multiple simplify passes over the failing example. The space of possible examples to explore here is essentially infinite and depends intimately on the structure of the failing test.
  2. The number of examples run depends on both timing (Hypothesis stops running examples after a configurable timeout) and what the test does. In particular tests can throw an UnsatisfiedAssumption exception which causes the example to not count towards the maximum number of examples to run (there is an additional cap which is larger but does count these).
  3. Some examples may be skipped if they come from the same batch as something that produced an UnsatisfiedAssumption error.
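
A rough sketch of the two-phase process described in point 1, as illustrative pseudocode rather than Hypothesis' real engine (generate and shrink_candidates are hypothetical parameters):

def run_property(test, generate, shrink_candidates, max_examples=200):
    # Generate phase: run the test against freshly generated examples.
    failing = None
    for _ in range(max_examples):
        example = generate()
        try:
            test(example)
        except AssertionError:
            failing = example
            break
    if failing is None:
        return None

    # Simplify ("shrink") phase: only runs if a failure was found, and its
    # length depends on the structure of the failing example itself, so the
    # total number of test invocations cannot be known at collection time.
    improved = True
    while improved:
        improved = False
        for candidate in shrink_candidates(failing):
            try:
                test(candidate)
            except AssertionError:
                failing = candidate  # simpler input that still fails
                improved = True
                break
    return failing
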
@The-Compiler
Member

There's a somewhat similar issue in pytest-dev/pytest-qt#63 - it adds a way for pytest-qt to test Qt models and ensure they behave correctly.

The tests can't easily change the data in the model (a model has a defined interface for getting data out of it, but not necessarily for adding/removing/changing data), so the approach of the original C++ tests is to re-run all checks whenever the model changes: you "attach" the tester, then make the model do something, and the checks re-run as soon as the model changes.

I've not found a satisfying way to do that yet, since the tests aren't known at collection time. What the code currently does is provide a qtmodeltester.setup_and_run(model) method which runs the tests once and listens for changes, and the user then modifies the model as part of their (single) test.

This, however, poses several problems, e.g. how to tell the user which of the "sub-tests" failed and which checks actually ran.
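
For illustration, usage of the approach described above might look roughly like this (the setup_and_run name is taken from the comment, MyListModel is assumed user code, and the pytest-qt API that was eventually released differs):

def test_my_model(qtmodeltester):
    model = MyListModel(["a", "b", "c"])   # assumed user-defined Qt model
    qtmodeltester.setup_and_run(model)     # runs the checks once, then listens
    model.add_item("d")                    # ...and the checks re-run on change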

/cc @nicoddemus

@RonnyPfannschmidt added the type: enhancement, topic: parametrize, topic: collection, and type: backward compatibility labels on Aug 5, 2015
@RonnyPfannschmidt
Member

@The-Compiler I think your use-case is fundamentally different.

As far as I understand, @DRMacIver needs sub-test-level operations (setup/teardown), while you need something that's more like a set of attached checks that run per model change.

@The-Compiler
Member

I think both use-cases would be satisfied by having a way to generate new tests (or sub-tests) while a test is running. Then pytest would take care of running the new tests and handling setup/teardown for each one.
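
(For readers arriving later: the separately distributed pytest-subtests plugin eventually grew an API along roughly these lines, though as far as I know it only covers the reporting side and does not set up fixtures per sub-test. A short sketch, assuming that plugin is installed:)

def test_generated_checks(subtests):        # `subtests` comes from pytest-subtests
    for b in range(5):                      # stand-in for examples generated at run time
        with subtests.test(msg="check", b=b):
            assert b * b >= 0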

@untitaker
Contributor

Generating new first-class tests while the tests are already running will be awkward for the UI, so I think subtests are the only option (for a start only the parent test is visible in the UI).

I wonder if, for Hypothesis' case, there's an upper bound on the number of test runs needed that could be determined at collection time.

@DRMacIver
Author

There isn't right now, but one could be added. However, it would be somewhere between 10 and 100 times larger than the typical number of runs.

@DRMacIver
Author

Also note that Hypothesis in default configuration runs 200 subtests per test as part of its typical run, so if you want to display those in the UI it's already going to be um, fun.
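
For reference, the number of examples per test is controlled by Hypothesis settings (the default was 200 at the time of this comment; newer versions default to 100):

from hypothesis import given, settings, strategies as st

@settings(max_examples=50)      # fewer sub-tests per top-level test
@given(x=st.integers())
def test_with_fewer_examples(x):
    assert isinstance(x, int)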

@untitaker
Contributor

I see. The idea was, as a workaround, to generate as many test cases as Hypothesis could possibly need, and then just skip the ones that aren't needed.

@DRMacIver
Author

Yeah, I figured it would be something like that. It's... sort of possible, but the problem is also that Hypothesis can't really know in advance what each example is going to be, so there'd have to be a bunch of work to match the two up. I think I would rather simply not support the feature than use this workaround.

@untitaker
Contributor

I'm currently fooling around with this. Would it be an OK API if there's a way to instantiate sub-sessions (on the same config)?

@nicoddemus
Member

@untitaker you mean subtests (#153)? or something else?

@untitaker
Contributor

No, I meant actually instantiating a new _pytest.Session within the existing test session. Never mind, it seems to be unnecessary.

Meanwhile I've come up with https://gist.github.com/untitaker/49a05d4ea9c426b179e9; it works for function-scoped fixtures only.
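
(The gist later became the pytest-subtesthack plugin mentioned further down. As a rough reconstruction of the idea - my own sketch, not the gist's actual code, using pytest internals that are not a stable API: build a throwaway Function item around an inner callable and push it through the normal runtest protocol, so function-scoped fixtures are set up freshly for it.)

from _pytest.python import Function
from _pytest.runner import runtestprotocol

def run_as_subtest(request, inner):
    # Wrap the inner callable in a new test item under the same parent...
    item = Function.from_parent(
        request.node.parent,
        name=request.node.name + "_subtest",
        callobj=inner,
    )
    # ...and run it through the full setup/call/teardown protocol. The
    # nextitem argument decides how much teardown happens afterwards; this is
    # exactly the cost discussed in the following comments.
    runtestprotocol(item, nextitem=request.node, log=False)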

@RonnyPfannschmidt
Member

@untitaker that looks pretty much like what I mean by subtests; however, the way it's implemented might add extra unnecessary setup/teardown cost due to nextitem.

@untitaker
Contributor

I'm not sure if we can set nextitem properly without changes to at least Hypothesis.

@DRMacIver
Author

I'm not expecting this to work automatically. :-) Hypothesis doesn't depend on py.test by default, but I can either hook into things from the hypothesis-pytest plugin or provide people with a decorator they can use to make this work (the former would be better).

What sort of unnecessary teardown/setup cost did you have in mind? Does it just run the fixtures an extra time?

@untitaker
Contributor

Currently it seems that module-level fixtures are set up and torn down for each subtest. I wonder if that's because of the incorrect nextitem value.

@DRMacIver
Author

Ah, yes, that would be unfortunate.

@RonnyPfannschmidt
Member

@untitaker that's exactly the problem, but I consider that a pytest bug - unfortunately it's a structural one, so it's hard to fix before 3.0.

As a hack you could perhaps use the parent as nextitem; that way the teardown_towards mechanism should keep things intact.

@RonnyPfannschmidt
Member

@untitaker in future I'd like to see a subtest mechanism help with those details.

@untitaker
Contributor

I'm currently experimenting with this; I fear that it might leak state to subsequent test functions in different modules/classes.

@RonnyPfannschmidt
Member

The state leak should be prevented by the outer runtest_protocol of the actual real test function.

Since it does a teardown_towards with a nextitem there, cleanup should be expected, but to make sure it works, an acceptance test with a fnmatch_lines check is needed.
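
(A hedged sketch of the kind of acceptance test meant here, assuming the pytester plugin is enabled and a hypothetical subtest helper fixture from the gist; all names in the generated test file are illustrative:)

pytest_plugins = ["pytester"]

def test_module_fixture_survives_subtests(testdir):
    testdir.makepyfile(
        """
        import pytest

        @pytest.fixture(scope="module")
        def mod():
            print("SETUP-MOD")
            yield
            print("TEARDOWN-MOD")

        def test_outer(subtest, mod):
            for _ in range(3):
                @subtest
                def inner():
                    pass
        """
    )
    result = testdir.runpytest("-s", "-v")
    result.stdout.fnmatch_lines(["*test_outer*PASSED*"])
    # The module-scoped fixture must have been set up exactly once:
    assert result.stdout.str().count("SETUP-MOD") == 1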

@untitaker
Contributor

I've updated the gist.

@untitaker
Contributor

BTW should this hack rather go into hypothesis-pytest for trying it out, or do you already want to stabilize an API in pytest?

@untitaker
Contributor

Also I'd like to hide the generated tests from the UI.

@DRMacIver
Author

Yeah I was just about to ask if there was a way to do that. This looks great (just tried it locally), but I'd rather not spam the UI with 200 tests, particularly for people like me who typically run in verbose mode.

@RonnyPfannschmidt
Member

@untitaker it should go into something external, and we should later on figure out a feature to kill it off.

@DRMacIver the proper solution is still a bit away (it would hide the number of sub-tests).

However, making that happen is a bit major, and between personal life and a job I can't make any promises for quick progress.

Right now I'm not even putting the needed amount of time into the pytest-cache merge and the yield test refactoring.

@Zac-HD
Member

Zac-HD commented Sep 18, 2017

Will Hypothesis work with pytest if I completely ignore @given decorators and use only strategies? I mainly want to use the data-generation features of Hypothesis and pass that data through params, or even call them directly inside the test?

Unfortunately this won't work - to get reproducible and minimal examples, you need to use the strategy with @given or find. a_strategy.example() is great for interactive exploration, but so terrible for use in tests that we raise an error if we detect it.

You can construct strategies and draw values interactively inside a test, and it's as powerful as you might think, but does require @given.
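
A short sketch of what "drawing values interactively inside a test" looks like - the data() strategy lets you draw from other strategies mid-test, but still requires @given:

from hypothesis import given, strategies as st

@given(st.data())
def test_interactive_draw(data):
    n = data.draw(st.integers(min_value=0, max_value=10))
    xs = data.draw(st.lists(st.integers(), min_size=n, max_size=n))
    assert len(xs) == n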

@fkromer

fkromer commented Mar 15, 2018

Could we summarize the current suggestions for how to use Hypothesis with pytest somewhere? Probably in some follow-up article to "How do I use pytest fixtures with Hypothesis?". This could help "Seeking funding for deeper integration between Hypothesis and pytest" as well. Hypothesis is great, but if it doesn't run nicely with many people's favorite test framework, pytest, that could hold back a lot of potential users.

@DRMacIver
Author

DRMacIver commented Mar 16, 2018

Could we summarize the current suggestions for how to use Hypothesis with pytest somewhere?

There isn't really a current suggestion for how to use Hypothesis with pytest, because there's nothing to suggest. Hypothesis works fine with pytest as long as you don't use function-scoped fixtures, so "How to use Hypothesis with pytest" is "use Hypothesis and pytest normally, but don't use function-scoped fixtures".
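
A minimal illustration of that advice (my own example): higher-scoped fixtures work fine with @given, since they are not expected to be re-created per example anyway.

import pytest
from hypothesis import given, strategies as st

@pytest.fixture(scope="module")
def squares():
    return {i: i * i for i in range(100)}   # read-only, shared across examples

@given(x=st.integers(min_value=0, max_value=99))
def test_lookup(squares, x):
    assert squares[x] == x * x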

@untitaker
Contributor

@fkromer I still use pytest-subtesthack, #916 (comment)

@Sup3rGeo
Member

Sup3rGeo commented Oct 7, 2018

The fundamental blocker is that this is a two-phase process. You've got an initial generate phase, but then if a failure is found you have a "simplify" phase, which runs multiple simplify passes over the failing example. The space of possible examples to explore here is essentially infinite and depends intimately on the structure of the failing test.

Maybe it's nonsense, but what if, instead of having this work within the first test run, it worked across subsequent test runs? pytest has a cache that persists across test runs, so maybe Hypothesis could work based on the failed tests from the previous run?
At least that's what I can think of for this problem of having separate collection and test-run phases.
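
(For reference, this is the cross-run cache being referred to - pytest's cache fixture, backed by config.cache; whether Hypothesis could use it is a separate question, and Hypothesis' own example database already fills a similar role:)

def test_remembers_previous_failure(cache):
    previous = cache.get("demo/last-failure", None)   # value from an earlier run, if any
    cache.set("demo/last-failure", {"x": 3})          # persisted for the next run
    assert previous is None or "x" in previous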

@RonnyPfannschmidt
Member

@Sup3rGeo unfortunately wrong integration point

@lazarillo

Just curious about the status of this. I saw that @DRMacIver was quite close in his article requesting further funding, written two years ago.

But I was just using Hypothesis and saw that there is still an issue with function-scoped fixtures and Hypothesis. I was hoping there was just something I needed to do to integrate them better. I had trouble using a context manager, but that might be because I could not find good examples to mimic and I was doing something wrong. 😏

@Zac-HD
Member

Zac-HD commented Sep 21, 2020

No change to report, beyond "Hypothesis now detects and warns if you use a function-scoped fixture in an @given test". Sorry!
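
For anyone who understands the once-per-test-function behaviour and wants to proceed anyway, the check can be suppressed; a sketch assuming a recent Hypothesis version (older versions only emit a generic warning, and the health-check name may differ):

from hypothesis import HealthCheck, given, settings, strategies as st

@settings(suppress_health_check=[HealthCheck.function_scoped_fixture])
@given(content=st.text(alphabet="abcdef"))
def test_roundtrip(tmp_path, content):
    # NB: tmp_path is still created once per test function, not per example.
    f = tmp_path / "data.txt"
    f.write_text(content)
    assert f.read_text() == content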

@nicoddemus
Member

@Zac-HD btw, do you know if any work has been done to try to use Hypothesis with pytest-subtests?

@Zac-HD
Member

Zac-HD commented Sep 21, 2020

I do not know, but wouldn't expect it to work.

@lazarillo

Thanks, @Zac-HD! Do you know of any specific examples I could follow and try to mimic? Specifically, I need to use pytest's built-in, function-scoped fixtures. I have found workarounds for my own creations, but I'm not sure how to adapt the built-ins. ☹️

@Zac-HD
Member

Zac-HD commented Sep 22, 2020

It really depends on which fixtures you mean - for creating temporary files you can use the stdlib tempfile and pathlib modules; for monkeypatching, the pytest implementation doesn't have to be used via a fixture (see e.g.); etc. For more detail, StackOverflow is probably more appropriate than this issue.
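
A sketch of those substitutions (my own example; pytest.MonkeyPatch.context() requires a reasonably recent pytest):

import os
import tempfile
from pathlib import Path

import pytest
from hypothesis import given, strategies as st

@given(name=st.text(alphabet="abc", min_size=1, max_size=20))
def test_without_function_scoped_fixtures(name):
    # Instead of the tmp_path fixture: a fresh temporary directory per example.
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / name
        path.write_text("hello")
        assert path.read_text() == "hello"
    # Instead of the monkeypatch fixture: an explicit MonkeyPatch context.
    with pytest.MonkeyPatch.context() as mp:
        mp.setenv("MY_VAR", name)
        assert os.environ["MY_VAR"] == name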

@fkromer

fkromer commented Sep 22, 2020

@Zac-HD Perhaps you guys will have some time to create a MOOC or a cookbook-style book at some point ... if you have enough time aside from your usual consulting work, of course 😉

@untitaker
Contributor

I still use pytest-subtesthack in some projects. I would recommend avoiding unnecessarily heavy computation or IO-bound work in your tests, though; doing anything but unit testing makes the overall test run too slow.

@Zac-HD
Member

Zac-HD commented Sep 22, 2020

Probably you guys have some time to create a MOOC or cookbook style book at some point in time

I'd love to have a cookbook kind of thing, but (a) who has free time in 2020? I'm fitting in maintenance around a PhD, and technical writing is a whole 'nother skillset and doing it well takes time; and (b) it's really not clear what non-intro-level recipes people would need - it's often domain-specific.

StackOverflow is actually a decent start because at least we don't have to guess, but my dream solution would be to get a grant and hire someone for ~six months to overhaul our docs.

... if you've enough time asides your usual consulting work of course

Haha, I wish that was the problem - between us we get ~one consulting gig per year. Nice when it happens, but I can literally count them on one hand. It paid for the stickers etc. I give away at conferences, but I've spent more attending than I've made from consulting... and if I calculated my effective hourly rate on open source it would be measured in cents. Don't give away software if you're in it for the money 😅

(I'm hoping that making HypoFuzz commercially-licenced will redirect enough money from corporate users to improve the situation, but it's early days and zero customers so far... we'll see)

@fkromer

fkromer commented Sep 22, 2020

I'd love to have a cookbook kind of thing, but (a) who has free time in 2020? I'm fitting in maintenance around a PhD, and technical writing is a whole 'nother skillset and doing it well takes time; and (b) it's really not clear what non-intro-level recipes people would need - it's often domain-specific.

I began to write some things on leanpub.com but decided it's not worth the effort. The same is true for creating MOOCs. If it's not something with a huge user base (usually frontend topics like JavaScript, TypeScript, Angular, etc.), it's hard to earn enough money per hour for it to be profitable.

StackOverflow is actually a decent start because at least we don't have to guess, but my dream solution would be to get a grant and hire someone for ~six months to overhaul our docs.

Yepp.

Haha, I wish that was the problem - between us we get ~one consulting gig per year. Nice when it happens, but I can literally count them on one hand. It paid for the stickers etc. I give away at conferences, but I've spent more attending than I've made from consulting... and if I calculated my effective hourly rate on open source it would be measured in cents. Don't give away software if you're in it for the money 😅

The problem is that most people don't understand the power of Hypothesis, AND that quality is the last thing people care about when money is involved. You are so right. I'd never open-source something I own, and contributions to other people's projects are something I only think about if I'm forced to, e.g. when I have to customize the sources.

Someone who contributed massively to pandas has created LibreSelery. I'm pretty interested in seeing how it will evolve.

@fkromer

fkromer commented Sep 22, 2020

@Zac-HD Good luck with https://hypofuzz.com/ btw! 👍

@Stranger6667
Contributor

if any work has been done to try to use Hypothesis with pytest-subtests?

@nicoddemus FYI. Schemathesis uses it in its "lazy-loading" workflow and it seems to work alright with the underlying Hypothesis test.
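
For context, combining the two looks roughly like this (a sketch assuming pytest-subtests is installed; recent Hypothesis may also flag the function-scoped subtests fixture, and as discussed below the interaction with shrinking is unclear):

from hypothesis import given, strategies as st

@given(x=st.integers())
def test_with_subtests(subtests, x):
    with subtests.test(msg="square is non-negative", x=x):
        assert x * x >= 0
    with subtests.test(msg="double is even", x=x):
        assert (2 * x) % 2 == 0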

@Zac-HD
Member

Zac-HD commented Dec 16, 2020

FWIW I'd expect the subtest fixture to work OK when generating test inputs, but I'm not sure how it would interact with shrinking (might report unshrunk things?) or the Hypothesis example database (likely key collisions, which is allowed by design but still bad).

But if @Stranger6667 reports it works 🤷‍♂️, it must work at least a bit - even if not at our 'officially supported' level.

@aragilar

What is the status of this? I've got pytest-subtesthack working to load the fixture for each example; would it make sense to include it in pytest? I couldn't get pytest-subtests working, but I think that's because I couldn't work out how to get the fixtures loaded (I was getting NameErrors).
