Deep integration between Hypothesis and py.test is currently impossible #916

Open
DRMacIver opened this issue Aug 5, 2015 · 66 comments
Labels: topic: collection, topic: parametrize, type: backward compatibility, type: enhancement

@DRMacIver

Context: I write Hypothesis, a randomized testing library for Python. It works "well" under py.test, but only in the sense that it ignores py.test almost completely other than doing its best to expose functions in a way that py.test fixtures can understand.

A major problem with using Hypothesis with py.test is that function level fixtures get evaluated once per top-level function, not once per example. When these fixtures are mutable and mutated by the test this is really bad, because you end up running the test against the fixture many times, changing it each time.

People keep running into this as an issue, but currently it seems to be impossible to fix without significant changes to py.test. @RonnyPfannschmidt asked me to write a ticket about this as an example use-case of subtests, so here I am.
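
To make the failure mode concrete, here is a minimal sketch (a hypothetical test, not taken from this thread); on old Hypothesis versions this silently misbehaves, while newer versions detect the function-scoped fixture and complain instead (see later in the thread):

import pytest
from hypothesis import given, strategies as st

@pytest.fixture
def items():
    # Function-scoped: created once per collected test function, not per example.
    return []

@given(x=st.integers())
def test_append_is_isolated(items, x):
    items.append(x)
    # Passes for the first example only: later examples see leftovers from
    # earlier ones, because the fixture is never re-created between examples.
    assert len(items) == 1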

So what's the problem?

A test using Hypothesis looks something like:

@given(b=integers())
def test_some_stuff(a, b):
    ...

This translates into something approximately like:

def test_some_stuff(a, b=special_default):
    if b == special_default:
        for b in examples():
            ...
    else:
        ...

The key problem here is that examples() cannot be evaluated at collect time because it depends on the results of previous test execution.
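
For contrast, a hedged illustration (not from the original comment): with @pytest.mark.parametrize the full set of examples is fixed at collection time, so pytest can create one test item per example up front - exactly what examples() above cannot offer.

import pytest

@pytest.mark.parametrize("b", [0, 1, -17])  # list known at collect time
def test_some_stuff_parametrized(b):
    ...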

The reasons for this, in decreasing order of "this seems to be impossible" (i.e. with the current feature set of py.test I have no idea how to solve the first and neither does anyone else, we could maybe solve the second, and we could definitely do something about the third):

  1. The fundamental blocker is that this is a two-phase process. You've got an initial generate phase, but then if a failure is found you have a "simplify" phase, which runs multiple simplify passes over the failing example. The space of possible examples to explore here is essentially infinite and depends intimately on the structure of the failing test.
  2. The number of examples run depends on both timing (Hypothesis stops running examples after a configurable timeout) and what the test does. In particular tests can throw an UnsatisfiedAssumption exception which causes the example to not count towards the maximum number of examples to run (there is an additional cap which is larger but does count these).
  3. Some examples may be skipped if they come from the same batch as something that produced an UnsatisfiedAssumption error.
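
A rough sketch of the two-phase process described in point 1, as illustrative pseudocode rather than Hypothesis' real engine (generate and shrink_candidates are hypothetical parameters):

def run_property(test, generate, shrink_candidates, max_examples=200):
    # Generate phase: run the test against freshly generated examples.
    failing = None
    for _ in range(max_examples):
        example = generate()
        try:
            test(example)
        except AssertionError:
            failing = example
            break
    if failing is None:
        return None

    # Simplify ("shrink") phase: only runs if a failure was found, and its
    # length depends on the structure of the failing example itself, so the
    # total number of test invocations cannot be known at collection time.
    improved = True
    while improved:
        improved = False
        for candidate in shrink_candidates(failing):
            try:
                test(candidate)
            except AssertionError:
                failing = candidate  # simpler input that still fails
                improved = True
                break
    return failing
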
@The-Compiler
Member

There's a somewhat similar issue in pytest-dev/pytest-qt#63 - it adds a way for pytest-qt to test Qt models and ensure they behave correctly.

The tests can't easily change the data in the model (a model has a defined interface for getting data out of it, but not necessarily for adding/removing/changing data), so the approach of the original C++ tests is to re-run all checks whenever the model changes: you "attach" the tester, then make the model do something, and the checks re-run as soon as the model changes.

I've not found a satisfying way to do that yet, since the tests aren't known at collection time. What the code currently does is provide a qtmodeltester.setup_and_run(model) method which runs the tests once and listens for changes, and the user then modifies the model as part of their (single) test.

This, however, poses several problems, e.g. how to tell the user which of the "sub-tests" failed and which checks actually ran.
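
For illustration, usage of the approach described above might look roughly like this (the setup_and_run name is taken from the comment, MyListModel is assumed user code, and the pytest-qt API that was eventually released differs):

def test_my_model(qtmodeltester):
    model = MyListModel(["a", "b", "c"])   # assumed user-defined Qt model
    qtmodeltester.setup_and_run(model)     # runs the checks once, then listens
    model.add_item("d")                    # ...and the checks re-run on change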

/cc @nicoddemus

@RonnyPfannschmidt added the type: enhancement, topic: parametrize, topic: collection, and type: backward compatibility labels on Aug 5, 2015
@RonnyPfannschmidt
Member

@The-Compiler I think your use-case is fundamentally different.

As far as I understand, @DRMacIver needs sub-test-level operations (setup/teardown), while you need something that's more like a set of attached checks that run per model change.

@The-Compiler
Member

I think both use-cases would be satisfied by having a way to generate new tests (or sub-tests) while a test is running. Then pytest would take care of running the new tests and handling setup/teardown for each one.
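
(For readers arriving later: the separately distributed pytest-subtests plugin eventually grew an API along roughly these lines, though as far as I know it only covers the reporting side and does not set up fixtures per sub-test. A short sketch, assuming that plugin is installed:)

def test_generated_checks(subtests):        # `subtests` comes from pytest-subtests
    for b in range(5):                      # stand-in for examples generated at run time
        with subtests.test(msg="check", b=b):
            assert b * b >= 0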

@untitaker
Contributor

Generating new first-class tests while the tests are already running will be awkward for the UI, so I think subtests are the only option (for a start only the parent test is visible in the UI).

I wonder if, for Hypothesis' case, there's an upper bound on the number of test runs needed that could be determined at collection time.

@DRMacIver
Author

There isn't right now, but one could be added. However, it would be somewhere between 10 and 100 times larger than the typical number of runs.

@DRMacIver
Author

Also note that Hypothesis in default configuration runs 200 subtests per test as part of its typical run, so if you want to display those in the UI it's already going to be um, fun.
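
For reference, the number of examples per test is controlled by Hypothesis settings (the default was 200 at the time of this comment; newer versions default to 100):

from hypothesis import given, settings, strategies as st

@settings(max_examples=50)      # fewer sub-tests per top-level test
@given(x=st.integers())
def test_with_fewer_examples(x):
    assert isinstance(x, int)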

@untitaker
Contributor

I see. The idea was, as a workaround, to generate as many test cases as Hypothesis could possibly need, and then just skip the ones that aren't needed.

@DRMacIver
Author

Yeah, I figured it would be something like that. It's... sort of possible, but the problem is also that Hypothesis can't really know in advance what each example is going to be, so there'd have to be a bunch of work to match the two up. I think I would rather simply not support the feature than use this workaround.

@untitaker
Contributor

I'm currently fooling around with this. Would it be an OK API if there's a way to instantiate sub-sessions (on the same config)?

@nicoddemus
Member

@untitaker you mean subtests (#153)? or something else?

@untitaker
Contributor

No, I meant actually instantiating a new _pytest.Session within the existing test session. Never mind, it seems to be unnecessary.

Meanwhile I've come up with https://gist.github.com/untitaker/49a05d4ea9c426b179e9; it works for function-scoped fixtures only.
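
(The gist later became the pytest-subtesthack plugin mentioned further down. As a rough reconstruction of the idea - my own sketch, not the gist's actual code, using pytest internals that are not a stable API: build a throwaway Function item around an inner callable and push it through the normal runtest protocol, so function-scoped fixtures are set up freshly for it.)

from _pytest.python import Function
from _pytest.runner import runtestprotocol

def run_as_subtest(request, inner):
    # Wrap the inner callable in a new test item under the same parent...
    item = Function.from_parent(
        request.node.parent,
        name=request.node.name + "_subtest",
        callobj=inner,
    )
    # ...and run it through the full setup/call/teardown protocol. The
    # nextitem argument decides how much teardown happens afterwards; this is
    # exactly the cost discussed in the following comments.
    runtestprotocol(item, nextitem=request.node, log=False)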

@RonnyPfannschmidt
Member

@untitaker that looks pretty much like what I mean by subtests; however, the way it's implemented might add extra unnecessary setup/teardown cost due to nextitem.

@untitaker
Contributor

I'm not sure if we can set nextitem properly without changes to at least Hypothesis.

@DRMacIver
Author

I'm not expecting this to work automatically. :-) Hypothesis doesn't depend on py.test by default, but I can either hook into things from the hypothesis-pytest plugin or provide people with a decorator they can use to make this work (the former would be better).

What sort of unnecessary teardown/setup cost did you have in mind? Does it just run the fixtures an extra time?

@untitaker
Contributor

Currently it seems that module-level fixtures are set up and torn down for each subtest. I wonder if that's because of the incorrect nextitem value.

@DRMacIver
Author

Ah, yes, that would be unfortunate.

@RonnyPfannschmidt
Member

@untitaker that's exactly the problem, but I consider that a pytest bug - unfortunately it's a structural one, so it's hard to fix before 3.0.

As a hack you could perhaps use the parent as nextitem; that way the teardown_towards mechanism should keep things intact.

@RonnyPfannschmidt
Member

@untitaker in future I'd like to see a subtest mechanism help with those details.

@untitaker
Contributor

I'm currently experimenting with this; I fear that it might leak state to subsequent test functions in different modules/classes.

@RonnyPfannschmidt
Member

The state leak should be prevented by the outer runtest_protocol of the actual real test function.

Since it does a teardown_towards with a nextitem there, cleanup should be expected, but to make sure it works, an acceptance test with a fnmatch_lines check is needed.
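
(A hedged sketch of the kind of acceptance test meant here, assuming the pytester plugin is enabled and a hypothetical subtest helper fixture from the gist; all names in the generated test file are illustrative:)

pytest_plugins = ["pytester"]

def test_module_fixture_survives_subtests(testdir):
    testdir.makepyfile(
        """
        import pytest

        @pytest.fixture(scope="module")
        def mod():
            print("SETUP-MOD")
            yield
            print("TEARDOWN-MOD")

        def test_outer(subtest, mod):
            for _ in range(3):
                @subtest
                def inner():
                    pass
        """
    )
    result = testdir.runpytest("-s", "-v")
    result.stdout.fnmatch_lines(["*test_outer*PASSED*"])
    # The module-scoped fixture must have been set up exactly once:
    assert result.stdout.str().count("SETUP-MOD") == 1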

@untitaker
Contributor

I've updated the gist.

@untitaker
Contributor

BTW should this hack rather go into hypothesis-pytest for trying it out, or do you already want to stabilize an API in pytest?

@untitaker
Contributor

Also I'd like to hide the generated tests from the UI.

@DRMacIver
Author

Yeah I was just about to ask if there was a way to do that. This looks great (just tried it locally), but I'd rather not spam the UI with 200 tests, particularly for people like me who typically run in verbose mode.

@RonnyPfannschmidt
Member

@untitaker it should go into something external, and we should later on figure out a feature to kill it off.

@DRMacIver the proper solution is still a bit away (it would hide the number of sub-tests).

However, making that happen is a bit major, and between personal life and a job I can't make any promises for quick progress.

Right now I'm not even putting the needed amount of time into the pytest-cache merge and the yield test refactoring.

@Zac-HD
Member

Zac-HD commented Sep 18, 2017

Will Hypothesis work with pytest if I completely ignore @given decorators and use only strategies? I mainly want to use the data-generation features of Hypothesis and pass that data through params, or even call them directly inside the test?

Unfortunately this won't work - to get reproducible and minimal examples, you need to use the strategy with @given or find. a_strategy.example() is great for interactive exploration, but so terrible for use in tests that we raise an error if we detect it.

You can construct strategies and draw values interactively inside a test, and it's as powerful as you might think, but does require @given.
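
A short sketch of what "drawing values interactively inside a test" looks like - the data() strategy lets you draw from other strategies mid-test, but still requires @given:

from hypothesis import given, strategies as st

@given(st.data())
def test_interactive_draw(data):
    n = data.draw(st.integers(min_value=0, max_value=10))
    xs = data.draw(st.lists(st.integers(), min_size=n, max_size=n))
    assert len(xs) == n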

@fkromer

fkromer commented Mar 15, 2018

Could we summarize the current suggestions for how to use Hypothesis with pytest somewhere? Probably in some follow-up article to "How do I use pytest fixtures with Hypothesis?". This could help "Seeking funding for deeper integration between Hypothesis and pytest" as well. Hypothesis is great, but if it doesn't run nicely with many people's favorite test framework, pytest, that could hold back a lot of potential users.

@DRMacIver
Author

DRMacIver commented Mar 16, 2018

Could we summarize the current suggestions for how to use Hypothesis with pytest somewhere?

There isn't really a current suggestion for how to use Hypothesis with pytest, because there's nothing to suggest. Hypothesis works fine with pytest as long as you don't use function-scoped fixtures, so "How to use Hypothesis with pytest" is "use Hypothesis and pytest normally, but don't use function-scoped fixtures".
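
A minimal illustration of that advice (my own example): higher-scoped fixtures work fine with @given, since they are not expected to be re-created per example anyway.

import pytest
from hypothesis import given, strategies as st

@pytest.fixture(scope="module")
def squares():
    return {i: i * i for i in range(100)}   # read-only, shared across examples

@given(x=st.integers(min_value=0, max_value=99))
def test_lookup(squares, x):
    assert squares[x] == x * x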

@untitaker
Contributor

@fkromer I still use pytest-subtesthack, #916 (comment)

@Sup3rGeo
Member

Sup3rGeo commented Oct 7, 2018

The fundamental blocker is that this is a two-phase process. You've got an initial generate phase, but then if a failure is found you have a "simplify" phase, which runs multiple simplify passes over the failing example. The space of possible examples to explore here is essentially infinite and depends intimately on the structure of the failing test.

Maybe it's nonsense, but what if, instead of having this work within the first test run, it worked across subsequent test runs? pytest has a cache that persists across test runs, so maybe Hypothesis could work based on the failed tests from the previous run?
At least that's what I can think of for this problem of having separate collection and test-run phases.
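
(For reference, this is the cross-run cache being referred to - pytest's cache fixture, backed by config.cache; whether Hypothesis could use it is a separate question, and Hypothesis' own example database already fills a similar role:)

def test_remembers_previous_failure(cache):
    previous = cache.get("demo/last-failure", None)   # value from an earlier run, if any
    cache.set("demo/last-failure", {"x": 3})          # persisted for the next run
    assert previous is None or "x" in previous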

@RonnyPfannschmidt
Member

@Sup3rGeo unfortunately wrong integration point

@lazarillo

Just curious about the status of this. I saw that @DRMacIver was quite close in his article requesting further funding, written two years ago.

But I was just using Hypothesis and saw that there is still an issue with function-scoped fixtures and Hypothesis. I was hoping there was just something I needed to do to integrate them better. I had trouble using a context manager, but that might be because I could not find good examples to mimic and I was doing something wrong. 😏

@Zac-HD
Member

Zac-HD commented Sep 21, 2020

No change to report, beyond "Hypothesis now detects and warns if you use a function-scoped fixture in an @given test". Sorry!
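
For anyone who understands the once-per-test-function behaviour and wants to proceed anyway, the check can be suppressed; a sketch assuming a recent Hypothesis version (older versions only emit a generic warning, and the health-check name may differ):

from hypothesis import HealthCheck, given, settings, strategies as st

@settings(suppress_health_check=[HealthCheck.function_scoped_fixture])
@given(content=st.text(alphabet="abcdef"))
def test_roundtrip(tmp_path, content):
    # NB: tmp_path is still created once per test function, not per example.
    f = tmp_path / "data.txt"
    f.write_text(content)
    assert f.read_text() == content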

@nicoddemus
Member

@Zac-HD btw, do you know if any work has been done to try to use Hypothesis with pytest-subtests?

@Zac-HD
Member

Zac-HD commented Sep 21, 2020

I do not know, but wouldn't expect it to work.

@lazarillo

Thanks, @Zac-HD! Do you know of any specific examples I could follow and try to mimic? Specifically, I need to use pytest's built-in, function-scoped fixtures. I have found workarounds for my own creations, but I'm not sure how to adapt the built-ins. ☹️

@Zac-HD
Member

Zac-HD commented Sep 22, 2020

It really depends on which fixtures you mean - for creating temporary files you can use the stdlib tempfile and pathlib modules; for monkeypatching, the pytest implementation doesn't have to be used via a fixture (see e.g.); etc. For more detail, StackOverflow is probably more appropriate than this issue.
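
A sketch of those substitutions (my own example; pytest.MonkeyPatch.context() requires a reasonably recent pytest):

import os
import tempfile
from pathlib import Path

import pytest
from hypothesis import given, strategies as st

@given(name=st.text(alphabet="abc", min_size=1, max_size=20))
def test_without_function_scoped_fixtures(name):
    # Instead of the tmp_path fixture: a fresh temporary directory per example.
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / name
        path.write_text("hello")
        assert path.read_text() == "hello"
    # Instead of the monkeypatch fixture: an explicit MonkeyPatch context.
    with pytest.MonkeyPatch.context() as mp:
        mp.setenv("MY_VAR", name)
        assert os.environ["MY_VAR"] == name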

@fkromer

fkromer commented Sep 22, 2020

@Zac-HD Perhaps you guys will have some time to create a MOOC or a cookbook-style book at some point ... if you have enough time aside from your usual consulting work, of course 😉

@untitaker
Contributor

I still use pytest-subtesthack in some projects. I would recommend avoiding unnecessarily heavy computation or IO-bound work in your tests, though; doing anything but unit testing makes the overall test run too slow.

@Zac-HD
Member

Zac-HD commented Sep 22, 2020

Probably you guys have some time to create a MOOC or cookbook style book at some point in time

I'd love to have a cookbook kind of thing, but (a) who has free time in 2020? I'm fitting in maintenance around a PhD, and technical writing is a whole 'nother skillset and doing it well takes time; and (b) it's really not clear what non-intro-level recipes people would need - it's often domain-specific.

StackOverflow is actually a decent start because at least we don't have to guess, but my dream solution would be to get a grant and hire someone for ~six months to overhaul our docs.

... if you've enough time asides your usual consulting work of course

Haha, I wish that was the problem - between us we get ~one consulting gig per year. Nice when it happens, but I can literally count them on one hand. It paid for the stickers etc. I give away at conferences, but I've spent more attending than I've made from consulting... and if I calculated my effective hourly rate on open source it would be measured in cents. Don't give away software if you're in it for the money 😅

(I'm hoping that making HypoFuzz commercially-licenced will redirect enough money from corporate users to improve the situation, but it's early days and zero customers so far... we'll see)

@fkromer

fkromer commented Sep 22, 2020

I'd love to have a cookbook kind of thing, but (a) who has free time in 2020? I'm fitting in maintenance around a PhD, and technical writing is a whole 'nother skillset and doing it well takes time; and (b) it's really not clear what non-intro-level recipes people would need - it's often domain-specific.

I began to write some things on leanpub.com but decided it's not worth the effort. The same is true for creating MOOCs. If it's not something with a huge user base (usually frontend topics like JavaScript, TypeScript, Angular, etc.), it's hard to earn enough money per hour for it to be profitable.

StackOverflow is actually a decent start because at least we don't have to guess, but my dream solution would be to get a grant and hire someone for ~six months to overhaul our docs.

Yepp.

Haha, I wish that was the problem - between us we get ~one consulting gig per year. Nice when it happens, but I can literally count them on one hand. It paid for the stickers etc. I give away at conferences, but I've spent more attending than I've made from consulting... and if I calculated my effective hourly rate on open source it would be measured in cents. Don't give away software if you're in it for the money 😅

The problem is that most people don't understand the power of Hypothesis, AND that quality is the last thing people care about when money is involved. You are so right. I'd never open-source something I own, and contributions to other people's projects are something I only think about if I'm forced to, e.g. when I have to customize the sources.

Someone who contributed massively to pandas has created LibreSelery. I'm pretty interested in seeing how it will evolve.

@fkromer

fkromer commented Sep 22, 2020

@Zac-HD Good luck with https://hypofuzz.com/ btw! 👍

@Stranger6667
Contributor

if any work has been done to try to use Hypothesis with pytest-subtests?

@nicoddemus FYI. Schemathesis uses it in its "lazy-loading" workflow and it seems to work alright with the underlying Hypothesis test.
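
For context, combining the two looks roughly like this (a sketch assuming pytest-subtests is installed; recent Hypothesis may also flag the function-scoped subtests fixture, and as discussed below the interaction with shrinking is unclear):

from hypothesis import given, strategies as st

@given(x=st.integers())
def test_with_subtests(subtests, x):
    with subtests.test(msg="square is non-negative", x=x):
        assert x * x >= 0
    with subtests.test(msg="double is even", x=x):
        assert (2 * x) % 2 == 0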

@Zac-HD
Member

Zac-HD commented Dec 16, 2020

FWIW I'd expect the subtest fixture to work OK when generating test inputs, but I'm not sure how it would interact with shrinking (might report unshrunk things?) or the Hypothesis example database (likely key collisions, which is allowed by design but still bad).

But if @Stranger6667 reports it works 🤷‍♂️, it must work at least a bit - even if not at our 'officially supported' level.

@aragilar

What is the status of this? I've got pytest-subtesthack working to load the fixture for each example; would it make sense to include it in pytest? I couldn't get pytest-subtests working, but I think that's because I couldn't work out how to get the fixtures loaded (I was getting NameErrors).
