Plotly.purge in parcoords_test #2406

alexcjohnson · 2018-02-27T03:55:30Z

@etpinard this seems to make parcoords_test pass reliably - and I can even take out the @flaky tag I added in #2399. (I did get a gl2d failure in one of these runs though... https://circleci.com/gh/plotly/plotly.js/7227)

I seem to recall trying to add Plotly.purge directly to destroyGraphDiv once before, and running into problems that looked like I was purging after the next test case had already started... anyway that's why I added the delay(50)().then(done), and if that's really required it doesn't seem like a good idea to add it to every destroyGraphDiv - though perhaps we can make this into a variant just for use with gl tests? Or maybe we can detect gl contexts and only purge/delay if we find one?

A little more detail on my investigations:

Even when the tests failed in normal CI, they would consistently work if I ssh into the same CI container and run them right after they fail. But I can see the tests that fail in the normal run take a very long time in ssh.
The gl tests run much more slowly on CI than locally. The non-gl tests also run slower but only a little, whereas the gl tests run far slower on CI:
- the full gl suite goes about 3x slower:
  -npm run test-jasmine -- --tags=gl --skip-tags=noCI,flaky
  - local: ~45 sec
  - CI ssh ~2 min 32 sec
- just parcoords is an extreme case >6x slower!
  - npm run test-jasmine -- --tags=gl --skip-tags=noCI,flaky parcoords:
  - local ~9 sec
  - CI ssh ~57 sec
- Adding Plotly.purge doesn't make things go faster - unsurprisingly it adds a couple of seconds - but it does seem to make the timing more consistent from one test case to the next, as though the need to purge was building up and then when the system finally decided to do it on its own, it took an enormous amount of time and caused a timeout. If that's the case, then we could also consider increasing the 30-sec no activity timeout.

etpinard · 2018-02-27T16:18:55Z

Even when the tests failed in normal CI, they would consistently work if I ssh into the same CI container and run them right after they fail. But I can see the tests that fail in the normal run take a very long time in ssh.

That's consistent with what I noticed in previous investigations, and the reason why I gave up debugging flaky tests on CI and added @flaky 😛

The gl tests run much more slowly on CI than locally.

Yeah, that's what you get for faking GPU processing using xvfb on a headless CI machine 😓

The parcoord tests are extremely resource intensive, that's why we originally skipped them on CI. Maybe we could try to 🔒 the same set of features with mocks with less dimensions and values?

though perhaps we can make this into a variant just for use with gl tests?

Maybe we could add a destroyGlGraphDiv method to assets/? Or perhaps we could start replacing all the Plotly.plot by Plotly.newPlot (which should clean up old gl contexts) - which will slow down our test, but should make things easier to maintain. ⚖️

All in all, thanks for investigating this! 💃 💃

try purge in afterEach in parcoords_test

ec69aa6

etpinard added type: maintenance labels Feb 27, 2018

alexcjohnson merged commit e0e3ee1 into master Feb 27, 2018

alexcjohnson deleted the grid-ci-test branch February 27, 2018 16:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Plotly.purge in parcoords_test #2406

Plotly.purge in parcoords_test #2406

alexcjohnson commented Feb 27, 2018

etpinard commented Feb 27, 2018

Plotly.purge in parcoords_test #2406

Plotly.purge in parcoords_test #2406

Conversation

alexcjohnson commented Feb 27, 2018

etpinard commented Feb 27, 2018