Test Synapse in worker mode under Complement #12638
@anoadragon453 could you braindump what you think needs to get done here?
matrix-org/pipelines#123 (comment) is probably a good start
#10065 (comment) is probably the best summary at the moment. The main takeaway is signalling to Complement that all processes have completed startup before testing begins.
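For illustration only, here is a minimal sketch of what that signalling could look like; it is not the mechanism Synapse or Complement actually use. The idea is that a startup script in the container polls each worker's `/health` listener (Synapse exposes a `GET /health` endpoint on its listeners) and only reports readiness once they all respond. The worker list and port numbers below are hypothetical.

```bash
# Hypothetical readiness gate: wait until every worker answers /health before
# declaring the container ready for Complement. Ports are illustrative only.
WORKER_PORTS="8081 8082 8083"
for port in $WORKER_PORTS; do
    until curl -fsS "http://localhost:${port}/health" > /dev/null; do
        sleep 0.5
    done
done
echo "all workers up"
```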
Was this done?
@kegsay Nope, still a TODO :/
#10065 is now resolved, and it is relatively easy to run complement against a synapse-with-workers by setting the relevant option when invoking the Complement script. However, I just tried to do so, and (after 15 minutes) it exploded with a pile of fail, so more work to be done here :(.
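For reference, the invocation would look roughly like the following, assuming the `WORKERS` and `COMPLEMENT_DIR` environment variables understood by Synapse's `scripts-dev/complement.sh` wrapper (treat the exact names as an assumption and check the script):

```bash
# Run Complement against a workerised Synapse, assuming a Complement checkout
# sitting next to the Synapse one. WORKERS=1 selects the worker-mode image.
COMPLEMENT_DIR=../complement WORKERS=1 ./scripts-dev/complement.sh
```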
ok, it works much better if you bump up the timeout for starting the homeserver (#12637). The failing tests are now:
... which doesn't sound so bad. Presumably there is some snafu with the routing of registration, which would explain 6 out of 7 of those failures.
oh hum, it looks like the whole of the /csapi package failed:
right, if you bump the timeout right up, you still get this list of test failures:
Which still isn't so bad. (Though a runtime of getting on for an hour is obviously sub-optimal.)
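For anyone reproducing this, bumping the startup timeout can also be done from the command line. A sketch, assuming Complement's `COMPLEMENT_SPAWN_HS_TIMEOUT_SECS` setting (the variable name is my assumption from Complement's config, so double-check it) together with the `WORKERS` flag from above:

```bash
# Give each homeserver (and its workers) longer to start before Complement
# gives up on it; 120s is an arbitrary illustrative value.
COMPLEMENT_SPAWN_HS_TIMEOUT_SECS=120 WORKERS=1 ./scripts-dev/complement.sh
```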
I ran Complement with workers¹, obtaining a runtime of 11m 7s and the following failures:
(Not sure what happened between Rich's run taking most of an hour and my run!) I also get some failures with no workers²; so there might be something wrong with my setup (as CI doesn't reproduce these failures):
So the summary of 'What's the state of workerised Synapse under Complement?' is:
Footnotes
they all take a long time. Every single test seems to take a good 30 seconds to spin up Synapse: AFAICT it's just busy spinning up 14(?) Python processes - or 28, if it's a federated test. It's entirely possible the difference is in hardware. FTR my machine is an i7-6560U with 16 GB of RAM, though it's a bit venerable and I suspect it gets thermally throttled when it's busy.
Why is SyTest not bitten by this problem? Come to think of it, does SyTest not spin up a new homeserver for every test? I also wonder if supervisor isn't starting up the workers at the same time, or something like that.
That's a good point; the machine I use is very performant (Ryzen 7 5800H with 32 GB RAM). The cooling is also very competent, so I suspect it's not thermally throttled. It has enough threads for all the workers to get their own (crazy, I know). A 6x difference in runtime doesn't sound so crazy taking all that into account. What does Complement's runtime look like for you when using a monolithic deployment? As a point of comparison, …
No, it does not, and that is why it is not bitten by this problem :). It spins up two HS instances and uses them for each test. On the one hand, it's faster. On the other, it's very easy for state to leak from one test to another.
it seems to be starting them together, or near enough. Ironically I wonder if it would be better if it staggered them. But 🤷♂️
Apparently, more than 10 minutes, since it timed out. Something I forgot here: it's not just spinning up two homeservers at once (for the federated tests), it's doing three (because Complement likes to run the …).
I'm down to just a couple of test failures (on a custom branch with all my in-flight PRs merged, that is):
These two tests are tracked in #12822 and #12825. After that, I'm trying to get workerised Complement enabled in CI at #12810 (looks like it's going to involve a bit of fighting GitHub Actions since my first attempt didn't just work right away). Fixing the slowness would be nice anyway, to help Rich and likely others out, but if CI turns out to be tolerable (i.e. no slower than the slowest of SyTest or Trial?) then it'd be good to get this running in CI as soon as possible.
#12810 is giving 52 minutes for running the suite in CI. That's not good; easily twice as slow as the next slowest thing in CI. I will note that this figure will almost certainly get worse as new tests are added (so aiming for SyTest parity will make it worse!).
Since Complement tests are self-contained, one way to handle this is to run the test suite across multiple GHA workers. It's not ideal, but would work.
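A rough sketch of how that sharding could work, where `SHARD` and `TOTAL_SHARDS` would come from a hypothetical GHA matrix and the usual Complement environment (e.g. `COMPLEMENT_BASE_IMAGE`) still has to be set as for a normal run:

```bash
# Split the suite across TOTAL_SHARDS CI jobs by taking every n-th test name.
cd complement

# List all test names without running anything.
all_tests=$(go test -list '.*' ./tests/... | grep '^Test')

# Pick this shard's tests (SHARD in 0..TOTAL_SHARDS-1) and build a -run regex.
shard_tests=$(echo "$all_tests" | awk -v n="$TOTAL_SHARDS" -v i="$SHARD" 'NR % n == i')
regex="^($(echo "$shard_tests" | paste -sd'|' -))$"

go test -v -run "$regex" ./tests/...
```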
I think we need to confirm what, exactly, is taking so long. I think it's Python loading all the …
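One quick way to check the "Python import time" theory (just a sketch; `-X importtime` needs Python 3.7+) is to dump per-module import timings for the main Synapse entry point and look at the worst offenders:

```bash
# -X importtime writes per-module import times to stderr; sort by the
# cumulative-time column to find the slowest imports.
python -X importtime -c "import synapse.app.homeserver" 2> import-times.log
sort -t'|' -k2 -rn import-times.log | head -n 20
```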
There's been some work to support this in matrix-org/complement#62, matrix-org/complement#105, and matrix-org/complement#116 but it still isn't possible to run complement against Synapse in worker mode. We should finish this up.
This is a blocker for removing tests from sytest / switching to Complement.