

Explore running smaller, automated crawls in CI to detect regressions #49

Closed
englehardt opened this issue Aug 27, 2019 · 3 comments

@englehardt (Contributor)

In the past we've run small test crawls to check for regressions in profile handling. This test has since been disabled alongside the rest of our stateful crawling support (see: https://github.com/mozilla/OpenWPM/projects/2).

As proposed in #28 (review), CI-only crawls would be helpful in detecting regressions in the overall site crash rate, timeout rate, and error rate. Setting this up wouldn't be entirely straightforward, so I'm opening this issue for discussion purposes.
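To make the regression criteria concrete, here is a minimal sketch of how a CI step might aggregate per-site crawl outcomes and fail on a regression. All names and thresholds below are hypothetical illustrations, not part of OpenWPM:

```python
from collections import Counter

def regression_check(outcomes, max_crash=0.05, max_timeout=0.10, max_error=0.10):
    """Return (passed, rates) for a list of per-site crawl outcomes.

    `outcomes` is a hypothetical list where each entry is one of
    "ok", "crash", "timeout", or "error" for a crawled site.
    """
    total = len(outcomes)
    counts = Counter(outcomes)
    rates = {k: counts.get(k, 0) / total for k in ("crash", "timeout", "error")}
    passed = (rates["crash"] <= max_crash
              and rates["timeout"] <= max_timeout
              and rates["error"] <= max_error)
    return passed, rates

# Example: 100 sites with 2 crashes, 5 timeouts, 3 errors is within thresholds.
outcomes = ["ok"] * 90 + ["crash"] * 2 + ["timeout"] * 5 + ["error"] * 3
passed, rates = regression_check(outcomes)
print(passed, rates)
```

A CI job could run a small crawl, feed the outcomes into a check like this, and exit non-zero when `passed` is false.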

@motin motin changed the title Explore running smaller, automated crawls in CI to detect regresssions Explore running smaller, automated crawls in CI to detect regressions Aug 28, 2019
@motin (Contributor) commented Aug 28, 2019

To minimize crawl-by-crawl variation while still resembling a real crawl (without actually crawling real URLs, as that would be impolite), we ought to set up an isolated mirrored snapshot of a set of URLs to crawl. Ideally this set should be fairly large, at least 1k URLs, and the crawl should be executed in GCP with kubernetes+docker to resemble the way production crawls are executed.
We can have a small cluster running with auto-scaling enabled and an associated GCP service account which can trigger the crawl from CI. I wonder if this should be part of this repo or set up in https://github.com/mozilla/openwpm-crawler? Maybe we should have a very small test crawl as part of the ordinary OpenWPM tests, as before, and let longitudinal health monitoring of more sizable crawls be implemented in a separate repo?

@vringar (Contributor) commented May 7, 2020

I think we should explore running the webserver that we use in local testing in its own docker container and then have the crawlers run against it in a docker-compose environment.
It isn't quite the real thing, but it should be close enough, seeing as we'll have localstack and redis running as well.
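As a rough illustration of what such a docker-compose setup could look like (a sketch only: the service names, images, build paths, and environment variables below are hypothetical and not something OpenWPM ships):

```yaml
# Hypothetical docker-compose sketch: the crawler runs against the
# local test webserver instead of real sites.
version: "3"
services:
  testserver:
    build: ./test/webserver   # the webserver used in local testing (assumed path)
  redis:
    image: redis:6
  localstack:
    image: localstack/localstack
  crawler:
    build: .
    depends_on: [testserver, redis, localstack]
    environment:
      CRAWL_TARGET: http://testserver:8000   # assumed port
      REDIS_HOST: redis
```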

@birdsarah (Contributor)

Note that we can conda install redis, so once openwpm/OpenWPM#648 is done, we can probably do this without spawning docker containers in Travis.
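For example, a Travis config along these lines could provide redis from conda rather than from a docker service (a sketch under assumptions: the environment file name, conda channel contents, and test path are hypothetical):

```yaml
# Hypothetical .travis.yml fragment: redis installed via conda, no docker.
install:
  - conda env create -f environment.yaml   # assumed to list redis as a dependency
  - conda activate openwpm
before_script:
  - redis-server --daemonize yes           # start redis in the background
script:
  - pytest test/
```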

@vringar vringar transferred this issue from openwpm/OpenWPM Nov 11, 2020
@vringar vringar closed this as completed Nov 11, 2020