Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fixtures to better distribute tests via load scheduling #18

Open
nchammas opened this issue Dec 2, 2015 · 41 comments · May be fixed by #500
Open

Use fixtures to better distribute tests via load scheduling #18

nchammas opened this issue Dec 2, 2015 · 41 comments · May be fixed by #500

Comments

@nchammas
Copy link
Contributor

nchammas commented Dec 2, 2015

xdist currently supports 2 ways of distributing tests across a number of workers:

  1. "Each" Scheduling: Given a set of tests, "each" scheduling sends each test in the set to each available worker.
  2. Load Scheduling: Given a set of tests, load scheduling distributes the tests among the available workers. Each test runs only once, and xdist tries to give each worker the same amount of work.

Problems with Load Scheduling's current implementation

The current load scheduling implementation distributes tests naively across the workers. Often this means that two tests which depend on the same fixture get assigned to different runners, and the fixture has to be created on each runner.

This is a problem. Fixtures often capture expensive operations. When multiple tests depend on the same fixture, the author typically expects the expensive operation represented by that fixture to happen only once and be reused by all dependent tests. When tests that depend on the same fixture are sent to different workers, that expensive operation is executed multiple times. This is wasteful and can add significantly to the overall testing runtime.

Our goal should be to reuse the existing pytest concept of the fixture to better distribute tests and reduce overall testing time, preferably without adding new options or APIs. This benefits the most users and strengthens pytest's declarative style.

Proposed solution

We can solve this problem in 3 phases:

  1. Let's formalize the concept of a "test chunk" or "test block". A test chunk is a group of tests that always execute on the same worker. This is an internal xdist abstraction that the user normally doesn't have to know about.

    The master will only send complete test chunks to workers, not individual tests. Initially, each test will be assigned to its own chunk, so this won't change xdist's behavior at first. But it will pave the way for us to chunk tests by their attributes, like what fixtures they depend on.

    Once we have this internal abstraction, we can optionally also expose a hook that lets users define their own chunking algorithm to replace the initial default of "1 test -> 1 chunk". The hook won't be very useful until more information about each test is made available, which brings us to the next phase.

  2. We need to pass in additional information to the xdist master about the tests it's running so it can better distribute them. Specifically the master needs to be able to identify unique instances of every fixture that each test depends on. Tests that depend on distinct fixture instances can be assigned to different chunks and thus sent to different workers.

    To identify the distinct fixture instances that each test depends on, we need the following pieces of information for each test:

    1. Test name.
    2. For each fixture that the test depends on:
      1. Fixture name.
      2. Fixture scope and "scope instance".
        • e.g. This fixture is module-scoped and this particular instance of the fixture is for module test_a.py.
      3. Fixture parameter inputs, if the fixture is parameterized.
        • e.g. We need to distinguish fixture(request.param=1) and fixture(request.param=2) as separate fixture instances.

    Initially this information won't be used for anything. It will just be made available to the master. At this point, we'll be ready for the final phase.

  3. Using the new information we have about tests, along with the new internal abstraction of a test chunk, we can now chunk tests up by the list of unique fixture instances they depend on.

    Tests that depend on the same instance of a fixture will now always be sent to the same worker. 🎉

Advantages over other solutions

This approach has two major advantages over other proposals.

  1. This approach adds no new configuration options or public APIs that users have to learn. Everyone using pytest automatically gets a better, arguably even more correct, distribution of their tests to workers without having to do any work.
  2. This approach promotes a declarative style of writing tests over an imperative one. The goal we should strive for is: Capture your ideas correctly, and pytest will figure out the appropriate execution details. In practice, this is probably not always feasible, and the user will want to exercise control in specific cases. But the closer we can stick to this goal the better the pytest user experience will be.

How this approach addresses common use cases

There are several use cases described in the original issue requesting some way to control parallel execution. This is how the approach described here addresses those use cases:

  • Use case: "I want all the tests in a module to go to the same worker."

    Solution: Make all the tests in that module depend on the same module-scoped fixture.

    If the tests don't need to depend on the same fixture, then why do they need to go to the same worker? (The most likely reason is that they are not correctly isolated.)

  • Use case: "I want all the tests in a class to go the same worker."

    Solution: The solution here is the same. If the tests need to go to the same worker, then they should depend on a common fixture.

    To cleanly address this case, we may need to implement the concept of a class-scoped fixture. (Fixtures can currently be scoped only by function, module, or session.) Otherwise, the tests can depend on a common, module-scoped fixture and achieve the same result.

  • Use case: "I want all the tests in X to go to the same worker."

    Solution: You know the drill. If tests belong on the same worker, we are betting that there is an implied, shared dependency. Express that shared dependency as a fixture and pytest will take care of the rest for you.

Counter-proposal to #17.
Addresses pytest-dev/pytest#175.
Adapted from this comment by @hpk42.

@nicoddemus
Copy link
Member

@nchammas wow, congratulations on a very well written and thought out proposal! 😄

It clearly states the problems we have today, the solution and perhaps most importantly the steps to achieve it. I will give it some more thought and post my comments later.

Thanks!

@hpk42
Copy link
Contributor

hpk42 commented Dec 3, 2015

i think it would be instructive to write a plugin that hacks into the internal data structures to get the "feature instance" thing in association with tests. So one could print a mapping of "fixture-instance -> [testids]" and get it over some existing test suites to know what we are dealing with. I am not sure how easy it is to get this information - getting the fixturenames per test ("request.fixturenames") and probably the fixture definition function (working with the internal registry) for each fixturename should be doable but that doesn't provide the instances yet. the "instance" structure, so to say, is more obvious during execution i think. So maybe the plugin (or pytest hack) could first try to get the information during execution instead of during collection.

My other thought is that i am not sure if we can just send the tests for each fixture instance to one worker. If you have an autouse - db-transaction session-scoped fixture it will be present in most tests but you still want to send them to different workers. Maybe it's easier to consider first chunking by class and module scoped fixtures and ignore session/function.

so much for now.

@nicoddemus
Copy link
Member

the "instance" structure, so to say, is more obvious during execution i think. So maybe the plugin (or pytest hack) could first try to get the information during execution instead of during collection.

Could you please clarify what you mean by instance structure? Also, what phase do you mean by "during execution*? As I understand it, we have collection and test execution phases only, with the test execution happening at the slaves...

If you have an autouse - db-transaction session-scoped fixture it will be present in most tests but you still want to send them to different workers.

Good point, but I think autouse fixtures should not be taken in account by default, since they would for all tests affected by it into the same slave. Perhaps they should opt-in using some flag or mark?

@hpk42
Copy link
Contributor

hpk42 commented Dec 3, 2015

"instance structure" -- i meant which fixture instances map to which tests. With "during execution" i mean indeed test execution. The plugin is not meant for implementing the proposal but for getting a better idea which kind of "fixture instances versus tests" we have.

@nicoddemus
Copy link
Member

I see, thanks for the clarification. 😄

@nicoddemus
Copy link
Member

Just to make my point clear about autouse fixtures maybe being opt-in to be taken into consideration for chunking, it's possible to have session-autouse fixtures in a conftest.py deep in the test hierarchy, affecting only a few tests so in this case it might make sense to chunk those tests together.

But of course that is a refinement that might be done later on.

@hpk42
Copy link
Contributor

hpk42 commented Dec 4, 2015

sure, there are all kinds of situations with fixture setups. The decision how to distribute given all the information is probably not a simple "just send all tests using a particular fixture to the same worker" because there are many fixtures involved in real-world projects. We might want to also inform the master about the dependencies between fixtures (or even fixture instances).

@nchammas
Copy link
Contributor Author

nchammas commented Dec 4, 2015

It sounds like we agree on the overall approach described here, and we want to start by tackling phase 2 (whether by writing a plugin, hacking pytest, etc.). Is that correct?

I'm trying to get a clear sense of whether there is consensus on the next step to take before moving forward.

@nicoddemus
Copy link
Member

sounds like we agree on the overall approach described here, and we want to start by tackling phase 2 (whether by writing a plugin, hacking pytest, etc.). Is that correct?

I think so. I would hack it into a pytest or pytest-xdist fork which could discover and dump fixture information into a json file containing test ids and fixtures used by each test id along with their scopes. Perhaps a challenge here will be how to identify that two fixtures with the same name are actually different fixtures because fixtures can be overwritten in different hierarchies.

With this in place, we can ask people to run their test suites with it, and then take a look at the results. 😁

@RonnyPfannschmidt
Copy link
Member

Note that itwill break for remote usage

@nchammas
Copy link
Contributor Author

nchammas commented Dec 5, 2015

@RonnyPfannschmidt - Once we have an implementation that collects this additional information about tests, I'm sure we'll uncover many issues. It'll probably be easier to understand them if we have an implementation to refer to, or at least the beginnings of one.

@nicoddemus / @hpk42 - I have a few questions which will help me get started on making this additional information available to the xdist master. Since we don't have any architecture docs, I can even use your answers to start building minimal documentation of how xdist works, if we want.

  1. Is what we call "the xdist master" simply represented by NodeManager?
  2. Where is the interface through which pytest gives the xdist master information about what tests to run? This is the interface we'll want to work through to make the additional information available to xdist.
  3. I'm guessing we'll want to get the information we need about fixtures from FixtureManager. Is that correct? From your knowledge of how fixtures work, is all the information listed in "phase 2" already available via FixtureManager?

@RonnyPfannschmidt
Copy link
Member

I would like to suggest a video conference for technical introduction and brainstorming

@nchammas
Copy link
Contributor Author

nchammas commented Dec 5, 2015

That's fine with me, but I'll just add that when we communicate via GitHub, 1) it makes it easier for us to summarize our discussions in a document (because everything is already in writing), and 2) it makes it easier for others to follow along. I do appreciate though that a video discussion is more easy going and could help kickstart things.

@RonnyPfannschmidt
Copy link
Member

True, but I think we can reasonably connect that gap by logging the discussion in a etherpad and copying to a wiki page later

@nchammas
Copy link
Contributor Author

nchammas commented Dec 8, 2015

What's the status here? Are we waiting for someone to setup a call, or are we OK with discussing this via GitHub? Or should I just try to figure things out on my own for now?

@nicoddemus
Copy link
Member

I will try to write something regarding how xdist works tomorrow... I think @RonnyPfannschmidt will setup a chat as well.

@RonnyPfannschmidt
Copy link
Member

Due to personal constraints I won't be able to do it before Wednesday evening European time :(

@nchammas
Copy link
Contributor Author

Hey folks,

Sorry I haven't been able to continue working on this like I originally planned. It likely won't be until several months from now that I can revisit this, if at all. I hope some other motivated person who comes across the proposal here takes a good look at it and considers putting together an implementation. There is quite a bit of work to be done, but I think it will make xdist much better.

@nicoddemus Thanks for putting together the overview in #28. My 3 questions from this comment still stand, though I suppose now it's not important to answer them unless someone is in a position to act on the answers.

@nicoddemus
Copy link
Member

@nchammas thanks for bringing this up and rekindling the discussion! 😄

@nicoddemus
Copy link
Member

I think implementing #19 is a good 1st step independently of pursuing the "fixture" based approach now. It seems simple to implement and might even sprout 3rd party plugins that use different heuristics for chunking tests together.

If everybody agrees I can find some time to tackle that.

@piwai
Copy link

piwai commented Jan 5, 2016

@nicoddemus / @nchammas thanks for pursuing the effort! (I'm the author of the initial PoC commented by @hpk42 on which this is based, but I must admit I haven't found time to improve it myself).

@leycec
Copy link

leycec commented May 27, 2016

Has there been any recent movement on this critical feature request? It's been four or five years since the initial round of relevant issues (e.g., #175) and stackoverflow queries (e.g., 12411431). As a thoroughly biased external observer, there appears to have been no progress on any of this. At all.

Plans were drafted. Words were spoken. But nothing was done. At this late hour, I'd happily except any semblance of a working solution – a new third-party plugin, a new test decorator, a new CLI option. I'd happily accept an unofficial and presumably untested xdist rewrite, fork, or diff implementing this. Anything.

In our bioinformatics-based use case, functional tests are fundamentally serial. Tests depend on the success of other tests. Happily, @hpk42 solved this frequently asked question several years ago with an ingenious pair of py.test hooks. Unhappily, these hooks assume interdependent tests to be run on the same slave. Due to the computational nature of our tests, the cost of repeating work performed by prior tests on different slaves is substantially higher than the benefits of parallelizing tests in the first place. Ergo, we gain far more than we lose by abandoning xdist. Which is self-evidently sad panda. 🐼

Most use cases (including ours) do not require full-blown support for general-purpose, industrial-strength, patent-ready test chunking; they only require the capacity to force specific tests to be run on the same slave. That's it. That can't be as hard as the miserable silence on these issues suggests.

This is the sound of silence. In code space, no one can hear you scream.

@nchammas
Copy link
Contributor Author

Plans were drafted. Words were spoken. But nothing was done. At this late hour, I'd happily except any semblance of a working solution – a new third-party plugin, a new test decorator, a new CLI option. I'd happily accept an unofficial and presumably untested xdist rewrite, fork, or diff implementing this. Anything.
...
Most use cases (including ours) do not require full-blown support for general-purpose, industrial-strength, patent-ready test chunking; they only require the capacity to force specific tests to be run on the same slave. That's it. That can't be as hard as the miserable silence on these issues suggests.

Please don't use this kind of tone. You are talking down to volunteers who are working in their spare time for fun. Perhaps you didn't mean to come off that way, but that's how your comment reads.

If you are so burned up over not having this feature, please implement it yourself and submit a PR instead of making a patronizing comment like this.

@RonnyPfannschmidt
Copy link
Member

@timj the new loadscope feature may be able to help you before the exact case can be supported

@timj
Copy link
Contributor

timj commented Sep 2, 2017

Thank you. I will try that. It seems like a good stop gap. It's less than ideal in the sense that it will force all the tests to be locked to nodes by class/module but it will let me include this test without doing something special.

@hroncok
Copy link
Member

hroncok commented May 28, 2018

A poor man's solution to this problem:

from xdist.scheduler import LoadScopeScheduling


class FixtureScheduling(LoadScopeScheduling):
    """Split by [] value. This is very hackish and might blow up any time!
    """
    def _split_scope(self, nodeid):
        if '[' in nodeid:
            return nodeid.rsplit('[')[-1].replace(']', '')
        return None


def pytest_xdist_make_scheduler(log, config):
    return FixtureScheduling(config, log)

(In this case I only use fixtures based on the parameter, so I can use this approach.)

hroncok added a commit to hroncok/taskotron-python-versions that referenced this issue May 28, 2018
This is an attempt to fix fedora-python#45

It does the following:

 * uses pytest-xdist to paralelize the tests (3 workers)
 * utilizes custom scheduler that splits the tests acroding to the fixture name
   * this is needed not to use 1 fixture on multiple workers
     see pytest-dev/pytest-xdist#18
 * use parametrize an all integration tests to enbale our hackish scheduler
 * mock now creates the roots in pwd (not to pollute the filesystem on /)
   * note that the roots can take sveral GBs
hroncok added a commit to hroncok/taskotron-python-versions that referenced this issue May 28, 2018
This is an attempt to fix fedora-python#45

It does the following:

 * uses pytest-xdist to paralellize the tests (3 workers)
 * utilizes custom scheduler that splits the tests according to the fixture name
   * this is needed not to use 1 fixture on multiple workers
     see pytest-dev/pytest-xdist#18
 * use parametrize an all integration tests to enable our hackish scheduler
 * mock now creates the roots in pwd (not to pollute the filesystem on /)
   * note that the roots can take several GBs
@RonnyPfannschmidt
Copy link
Member

indeed - https://github.com/ManageIQ/integration_tests/tree/master/cfme/fixtures/parallelizer uses a similar approach for its own parallelism (not xdist based)

hroncok added a commit to fedora-python/taskotron-python-versions that referenced this issue May 30, 2018
This is an attempt to fix #45

It does the following:

 * uses pytest-xdist to paralellize the tests (3 workers)
 * utilizes custom scheduler that splits the tests according to the fixture name
   * this is needed not to use 1 fixture on multiple workers
     see pytest-dev/pytest-xdist#18
 * use parametrize an all integration tests to enable our hackish scheduler
 * mock now creates the roots in pwd (not to pollute the filesystem on /)
   * note that the roots can take several GBs
@Kkevsterrr
Copy link

I've been watching this issue for the last year or two - just wanted to touch base and see if any progress has been made on this - having this capability would be very helpful! Huge thanks to all of the hardworking volunteers that have already invested so much time into pytest and xdist to bring it where it is today.

@nicoddemus
Copy link
Member

Since then there's been two new scope modes: --dist=loadscope and --dist=loadfile. They are not based on fixtures, but they allow to distribute tests using a new criteria and seem to be useful to a number of people.

But other than that I'm afraid nobody has put the time to work on this.

@SalmonMode
Copy link

While the --dist argument is incredibly helpful, sometimes it just won't work, even if you're following best practices in terms of isolating tests and their resources.

However, I really like @hroncok's approach here, and I link to their comment in this issue every time someone asks how to do this. It's basically analyzing every test's nodeid as it's determining how to split them up, and whatever is returned by _split_scope is effectively a "group" for that test where the tests associated with that group won't be split across different processes.

This approach not only lets you easily specify a de facto logic based on normal scopes (i.e. session, package, module, class, function) by simply targeting the part of the nodeid relevant to the desired scope, but it also lets you set arbitrary criteria based on things like parametrization at the same time, using very straightforward logic.

For example, if I want to default to splitting based on class, but handle a specific parameter should it come up (e.g. maybe I want all tests involving Google Chrome to be run together), I could do something like this:

def _split_scope(self, nodeid):
    if '[' in nodeid:
        params = nodeid.rsplit('[')[-1].replace(']', '').split('-')
        if "chrome" in params:
            return "chrome"
    class_nodeid = nodeid.rsplit("::", 1)[0]
    return class_nodeid

I think this approach is solid, but could be refined by making it a hook that is basically just a standalone callable, e.g.:

def pytest_xdist_split_scope(nodeid):
    if '[' in nodeid:
        params = nodeid.rsplit('[')[-1].replace(']', '').split('-')
        if "chrome" in params:
            return "chrome"
    class_nodeid = nodeid.rsplit("::", 1)[0]
    return class_nodeid

The default behavior could just be as it is now, including the --dist flag, but defining this hook would override that behavior.

If I get a chance, I'll start working on a PR, but what does everyone think?

@fruch
Copy link

fruch commented Nov 21, 2019

Would getting only the markers during collection, is any easier then getting the fixtures ?, cause for my use case I want to mark the test, and schedule according to specific marks.

@SalmonMode
Copy link

SalmonMode commented Nov 22, 2019

@fruch it doesn't really have to do with fixtures. The way it's currently implemented, the easiest and most straightforward approach would be to base it off of nodeid. Anything else would require much more refactoring. It's also fairly easy to just group tests together based on which ones you want to run together (e.g. putting them in classes, modules, or packages, or even param sets like with the browser example), or even just have a common bit of text anywhere in the nodeid (be it package, module, class, or function name), and then sort them into work groups based on that.

It's important to note that this doesn't really "schedule" tests in a timing sense. There's no real control over the order the tests are run in, what time they get run, or restrictions around which ones can run at certain times or when they can't run relative to other tests. It's strictly determining which tests get grouped together to get run by the same worker in the same batch of tests. But even if two groups of tests happen to get run by the same worker it would be best to assume this wouldn't happen as it's non-deterministic, so each test grouping should be treated and designed as though it's running in complete isolation, but could also be running at the same time as any other test grouping. And even within a test grouping, you should assume the order those tests run in is non-deterministic as well.

Actually scheduling the tests based on order and timing would be a much greater challenge, and doesn't seem like something a plugin designed to run tests in parallel would be concerned with (IMO), as this could just be handled in a more meta fashion by just running multiple test suites in sequence, or even just multiple subsets of tests within a suite in sequence through multiple pytest calls. This seems like something that's more the responsibility of the CI/CD pipeline and how it calls the tests. Anything that could be implemented by a plugin at the test suite level to handle this would really only get worse results than if it was handled by an external coordinating mechanism like a CI/CD pipeline. If a test group is loaded by a worker that can't be run because another worker is running a test group that can't be run at the same time as the other one, then that first worker (assuming it's aware of this) will have to either go searching for another test group that it can run at that time, or it will have to sit there doing nothing until the other worker is done with that test group. Worse yet, if a worker is running a test group that can't be run at the same time as any other tests, then every other worker will have to sit there doing nothing until it's done.

One other addition that someone suggested to me elsewhere would be to control the sorting order in which test groups are placed in the dict of work groups (especially since dicts are ordered in the latest versions of python), as this would let you place a particularly long running work group at the beginning to more evenly distribute the workload/run time across all workers for faster test runs. But this would probably be best done as a separate PR than the one made as a solution for this ticket.

@SalmonMode SalmonMode linked a pull request Jan 19, 2020 that will close this issue
@fpiccione
Copy link

fpiccione commented Feb 9, 2020

Great write up!

Let's formalize the concept of a "test chunk" or "test block".

I thought that these concepts were already inherited from pytest, so for example a class was a group of tests. Could we not stay with those to keep it simple?

To cleanly address this case, we may need to implement the concept of a class-scoped fixture. (Fixtures can currently be scoped only by function, module, or session.)

Doh! I thought this was already implemented, and created an issue as couldn't get a class scoped fixture to work properly with xdist and loadscope option:
#501
Someone pls confirm and I will close it.

@SalmonMode
Copy link

SalmonMode commented Feb 9, 2020

@fpiccione "test/work groups/chunks" in this context are lists of test nodeids that are bundled together. They don't have to be part of the same logical structure like a class.

For example, I could have an entire e2e test suite using browsers, but parameterize them all with the different browser types, so they all would run once for each browser type. These tests would span numerous logical structures, but wouldn't necessarily be included in the same group/chunk depending on the distribution used. I could theoretically group them all based only on the browser they use, so all the Firefox tests go into one group, and all the chrome tests go into another.

These groups are what the workers grab when they're ready to do more work. They just pop one off the queue and start going.

The slashes I'm using are because there's no formalized terminology yet. My vote is for either "test group" or "test work group" though.

As for the issue you opened, I added a comment there clarifying how the plugin works and what the loadscope distribution does/is meant for.

Also, this ticket is very old, so it looks like the class scoped grouping had been implemented since this was created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.