

Explore running smaller, automated crawls in CI to detect regressions #49

Closed
englehardt opened this issue Aug 27, 2019 · 3 comments

@englehardt (Contributor)

In the past we've run small test crawls to check for regressions in profile handling. This test has since been disabled alongside the rest of our stateful crawling support (see: https://github.com/mozilla/OpenWPM/projects/2).

As proposed in #28 (review), CI-only crawls would be helpful in detecting regressions in the overall site crash rate, timeout rate, and error rate. Setting this up wouldn't be entirely straightforward, so I'm opening this issue for discussion purposes.
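To make the regression criteria concrete, here is a minimal sketch of how a CI step might aggregate per-site crawl outcomes and fail on a regression. All names and thresholds below are hypothetical illustrations, not part of OpenWPM:

```python
from collections import Counter

def regression_check(outcomes, max_crash=0.05, max_timeout=0.10, max_error=0.10):
    """Return (passed, rates) for a list of per-site crawl outcomes.

    `outcomes` is a hypothetical list where each entry is one of
    "ok", "crash", "timeout", or "error" for a crawled site.
    """
    total = len(outcomes)
    counts = Counter(outcomes)
    rates = {k: counts.get(k, 0) / total for k in ("crash", "timeout", "error")}
    passed = (rates["crash"] <= max_crash
              and rates["timeout"] <= max_timeout
              and rates["error"] <= max_error)
    return passed, rates

# Example: 100 sites with 2 crashes, 5 timeouts, 3 errors is within thresholds.
outcomes = ["ok"] * 90 + ["crash"] * 2 + ["timeout"] * 5 + ["error"] * 3
passed, rates = regression_check(outcomes)
print(passed, rates)
```

A CI job could run a small crawl, feed the outcomes into a check like this, and exit non-zero when `passed` is false.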

@motin motin changed the title Explore running smaller, automated crawls in CI to detect regresssions Explore running smaller, automated crawls in CI to detect regressions Aug 28, 2019
@motin (Contributor) commented Aug 28, 2019

To minimize crawl-by-crawl variation while still resembling a real crawl (without actually crawling real URLs, as that would be impolite), we ought to set up an isolated mirrored snapshot of a set of URLs to crawl. Ideally this set should be fairly large, at least 1k URLs, and the crawl should be executed in GCP with kubernetes+docker to resemble the way production crawls are executed.
We can have a small cluster running with auto-scaling enabled and an associated GCP service account which can trigger the crawl from CI. I wonder if this should be part of this repo or set up in https://github.com/mozilla/openwpm-crawler? Maybe we should have a very small test crawl as part of the ordinary OpenWPM tests, as before, and let longitudinal health monitoring of more sizable crawls be implemented in a separate repo?

@vringar (Contributor) commented May 7, 2020

I think we should explore running the webserver that we use in local testing in its own docker container and then have the crawlers run against it in a docker-compose environment.
It isn't quite the real thing, but it should be close enough, seeing as we'll have localstack and redis running as well.
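As a rough illustration of what such a docker-compose setup could look like (a sketch only: the service names, images, build paths, and environment variables below are hypothetical and not something OpenWPM ships):

```yaml
# Hypothetical docker-compose sketch: the crawler runs against the
# local test webserver instead of real sites.
version: "3"
services:
  testserver:
    build: ./test/webserver   # the webserver used in local testing (assumed path)
  redis:
    image: redis:6
  localstack:
    image: localstack/localstack
  crawler:
    build: .
    depends_on: [testserver, redis, localstack]
    environment:
      CRAWL_TARGET: http://testserver:8000   # assumed port
      REDIS_HOST: redis
```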

@birdsarah (Contributor)

Note that we can conda install redis, so once openwpm/OpenWPM#648 is done, we can probably do this without spawning docker containers in Travis.
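For example, a Travis config along these lines could provide redis from conda rather than from a docker service (a sketch under assumptions: the environment file name, conda channel contents, and test path are hypothetical):

```yaml
# Hypothetical .travis.yml fragment: redis installed via conda, no docker.
install:
  - conda env create -f environment.yaml   # assumed to list redis as a dependency
  - conda activate openwpm
before_script:
  - redis-server --daemonize yes           # start redis in the background
script:
  - pytest test/
```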

@vringar vringar transferred this issue from openwpm/OpenWPM Nov 11, 2020
@vringar vringar closed this as completed Nov 11, 2020