
Volatility introduced in tests since approx September 18th - potentially package sync #446

Open
ncclementi opened this issue Oct 13, 2022 · 16 comments

@ncclementi
Contributor

Some tests that were fairly stable became very volatile around mid-September. This makes it very hard to trust the regression detection system we have in place, but there is a pattern:

- q3 [5GB parquet] - upstream [screenshot]
- q5 [5GB parquet] - upstream - hard to see, but the red line became more volatile around the same time [screenshot]
- q7 [5GB parquet] - upstream [screenshot]
- q8 [5GB parquet] - upstream [screenshot]
- test_dataframe_align - volatile and definitely a regression [screenshot]
- test_shuffle - became more volatile [screenshot]

The interesting thing is that this volatility also happens on latest (2022.6.1), which makes me think the issue was introduced by something that happened in package sync (PR merged 09/15).

- q3 [5GB parquet] - latest [screenshot]
- q7 [5GB parquet] - latest [screenshot]
- test_dataframe_align - latest [screenshot]

cc: @ian-r-rose @jrbourbeau @shughes-uk

@ncclementi
Contributor Author

@fjetter and @ntabris for visibility

@fjetter
Member

fjetter commented Oct 18, 2022

Comparing conda environments between the pre- and post-merge runs, the most interesting diff is coiled itself, which jumped from 0.2.27 to 0.2.30, but I only compared two runs. I'd be interested to see a comparison of more commits, particularly the spikes.
Do we have an easy way of checking what versions were installed on all the runs? Is this something we should consider adding to the benchmark database, e.g. the output of client.get_versions?
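
A minimal sketch of what recording this could look like, assuming a plain SQLite table purely for illustration (the table name, schema, `run_id` parameter, and database path are hypothetical and not part of the existing benchmark database):

```python
import json
import sqlite3

from distributed import Client


def record_versions(client: Client, run_id: str, db_path: str = "benchmark.db") -> None:
    """Store the client/scheduler/worker package versions reported for one benchmark run."""
    versions = client.get_versions()  # dict with "client", "scheduler", and "workers" entries
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS run_versions (run_id TEXT, versions_json TEXT)"
        )
        conn.execute(
            "INSERT INTO run_versions VALUES (?, ?)",
            (run_id, json.dumps(versions, default=str)),
        )
```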

@ntabris
Member

ntabris commented Oct 18, 2022

> Do we have an easy way of checking what versions were installed on all the runs? Is this something we should consider adding to the benchmark database, e.g. the output of client.get_versions?

Two thoughts.

  1. I'm interested in having better tagging/tracking on the platform side, rather than something that's specifically for coiled-runtime tests.
  2. I'll take a quick look and see if I can get some data about version changes around Sept 19.

@ntabris
Member

ntabris commented Oct 18, 2022

It's very hard to find data for old tests. So here's what I can get without ridiculous amounts of work.

Here's a test_array cluster from after volatility increased:

https://cloud.coiled.io/dask-engineering/clusters/83375/details

That cluster has py3.9 and dask/distributed 2022.6.0. You can follow that link to see other software versions.

Here's a cluster from before, which also has py3.9 and dask/distributed 2022.6.0:

https://cloud.coiled.io/dask-engineering/clusters/75547/details

I see changes in pandas and pyarrow. Would that be relevant? Anyone else is welcome to dig in further by comparing those clusters; I don't have much more insight.
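
For anyone who does want to dig in, a rough sketch of how the two clusters' package lists could be compared, assuming `before` and `after` are the per-package version mappings taken from `client.get_versions()` output (or copied from the cluster detail pages):

```python
def diff_packages(before: dict, after: dict) -> dict:
    """Return {package: (old_version, new_version)} for every package whose version changed."""
    changed = {}
    for pkg in sorted(set(before) | set(after)):
        old, new = before.get(pkg), after.get(pkg)
        if old != new:
            changed[pkg] = (old, new)
    return changed


# e.g. diff_packages(old_versions["client"]["packages"], new_versions["client"]["packages"])
# would surface version bumps such as the pandas and pyarrow changes mentioned above.
```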

@shughes-uk
Contributor

I've identified a couple of issues with package sync, both in detecting packages on the cluster and on the client side. Staging has a ton of bugfixes I'm eager to get out, but it's currently held up by making sure we've tested some of the infra work.

@ntabris
Member

ntabris commented Oct 18, 2022

> I've identified a couple of issues with package sync, both in detecting packages on the cluster and on the client side. Staging has a ton of bugfixes I'm eager to get out, but it's currently held up by making sure we've tested some of the infra work.

Any reason to think package sync bugs are causing this?

@shughes-uk
Contributor

shughes-uk commented Oct 18, 2022

I suspect it might be causing release versions to be installed instead of the main branch. If this is happening even on legacy tests, though, then that's unlikely to be the cause.
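
One way to check that hypothesis directly, sketched here on the assumption that a from-main install of distributed reports a versioneer-style local version (e.g. `2022.9.1+23.gabc1234`) while a release reports a plain version string:

```python
from importlib.metadata import version

from distributed import Client


def worker_distributed_versions(client: Client) -> dict:
    """Ask every worker which distributed version is actually installed."""
    return client.run(lambda: version("distributed"))


# Versions containing "+" suggest a git/main install; plain version strings
# suggest a released package was installed instead.
```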

@ntabris
Member

ntabris commented Oct 18, 2022

> I suspect it might be causing release versions to be installed instead of the main branch. If this is happening even on legacy tests, though, then that's unlikely to be the cause.

Yeah, you can see the change on clusters running coiled-runtime=0.1.0: https://benchmarks.coiled.io/coiled-0.1.0-py3.9.html

The examples I listed in #446 (comment) are runtime 0.1.0 with py3.9.

@ncclementi
Contributor Author

@shughes-uk Should we have an "environment" in the runtime where we test against staging?
It would be dask/distributed upstream + coiled staging. Or is this something y'all have covered on your end?

@shughes-uk
Contributor

If we can identify a subsection of the tests for this, it might be worthwhile; otherwise the cost might be prohibitive.

@ncclementi
Contributor Author

It seems most of the volatility has calmed down, except in the case of test_dataframe_align. I would be inclined to close this issue and discuss dataframe align in a separate one. Thoughts?

- q3 [5GB parquet] - upstream [screenshot]
- q5 [5GB parquet] - upstream [screenshot]
- q7 [5GB parquet] - upstream [screenshot]
- q8 [5GB parquet] - upstream [screenshot]
- test_dataframe_align - upstream (still shows large jumps) [screenshot]
- test_shuffle - upstream [screenshot]
- q3 [5GB parquet] - latest [screenshot]
- q7 [5GB parquet] - latest [screenshot]
- test_dataframe_align - latest [screenshot]

@ntabris
Member

ntabris commented Oct 31, 2022

vorticity also shows non-trivial variability

@ncclementi
Contributor Author

> vorticity also shows non-trivial variability

Yes, but it has been that way since the beginning; maybe @gjoseph92 knows more about it.

[screenshot: vorticity benchmark history]

@gjoseph92
Contributor

I don't think that's too surprising; I think that one spills a bit? I'd expect it to get better after dask/distributed#7213.

@ntabris
Member

ntabris commented Nov 1, 2022

> I don't think that's too surprising; I think that one spills a bit?

Yep. Spill is also why test_dataframe_align is sometimes quick (when there's no spill) and sometimes slow (when there's lots of spill).
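
As an illustration of how the spill hypothesis could be isolated, the benchmark could be rerun with spilling disabled via the standard worker memory settings. This is a sketch only: depending on how the cluster is created, the configuration may need to be set before the workers start so it actually reaches them, and disabling spill risks workers running out of memory.

```python
import dask

dask.config.set(
    {
        # Don't spill based on the managed-memory estimate
        "distributed.worker.memory.target": False,
        # Don't spill based on measured process memory
        "distributed.worker.memory.spill": False,
    }
)
# ...then create the cluster/client and run test_dataframe_align as usual,
# comparing the spread of runtimes against the default configuration.
```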

@gjoseph92
Contributor

This could even be the same thing as test_dataframe_align: if work stealing decides to mess with the co-assignment, that could also affect this test, I think.
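
Along the same lines, a quick way to probe the work-stealing hypothesis would be to rerun the affected tests with stealing turned off and compare the variance (again, a sketch; the setting needs to reach the scheduler when the cluster is created for it to take effect):

```python
import dask

dask.config.set({"distributed.scheduler.work-stealing": False})
# ...then create the cluster/client and rerun the volatile tests; if the large
# jumps disappear, work stealing interfering with co-assignment is a likely cause.
```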
