
Add helpers to extract crawl metrics / data verification #12

Open
motin opened this issue Sep 1, 2019 · 0 comments


motin commented Sep 1, 2019

Currently, after each crawl, we run data verification through a rather manual process that requires quite a lot of notebook copying/cloning.

Ideally, it should be enough to run something like crawl_metrics(s3_bucket, crawl_directory) or similar to get relevant metrics, including those from https://github.com/citp/openwpm-data-release/blob/master/Crawl-Data-Metrics.ipynb and those in the notebook linked in openwpm/openwpm-crawler#30 (comment).
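
A minimal sketch of what such a helper could compute, assuming the crawl tables have already been loaded into memory. The S3 loading step (`s3_bucket`, `crawl_directory`) is omitted, and the row schema below (`visit_id`, `site_url`, a boolean `success`) is a hypothetical simplification, not the actual OpenWPM schema. The metrics shown are illustrative and are not the exact set from the linked notebooks.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def crawl_metrics(site_visits: Iterable[Dict[str, Any]],
                  http_requests: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Compute basic health metrics for one crawl (hypothetical helper).

    Assumed input schema: each site visit is a dict with 'visit_id',
    'site_url' and a boolean 'success'; each HTTP request is a dict with
    at least 'visit_id'. Fetching these rows from S3 is left out.
    """
    visits: List[Dict[str, Any]] = list(site_visits)
    # Count recorded requests per visit; Counter returns 0 for unseen ids.
    requests_per_visit = Counter(r["visit_id"] for r in http_requests)

    n_visits = len(visits)
    n_success = sum(1 for v in visits if v["success"])
    # Visits with no recorded HTTP requests often indicate a broken page load.
    n_without_requests = sum(
        1 for v in visits if requests_per_visit[v["visit_id"]] == 0
    )
    # Extra visits to a site URL that was already visited in this crawl.
    site_counts = Counter(v["site_url"] for v in visits)
    duplicate_sites = sum(c - 1 for c in site_counts.values() if c > 1)

    return {
        "visits": n_visits,
        "successful_visits": n_success,
        "success_rate": n_success / n_visits if n_visits else 0.0,
        "visits_without_requests": n_without_requests,
        "duplicate_site_visits": duplicate_sites,
        "http_requests": sum(requests_per_visit.values()),
    }
```

Returning a plain dict keeps the result easy to store per crawl and to compare across crawls later.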

A companion crawl_metrics_summary(crawl_metrics) method could be included to print out the most relevant metrics in human-readable form.
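
A possible shape for the summary helper, sketched under the assumption that the metrics arrive as a plain dict with the hypothetical keys `visits`, `successful_visits`, `success_rate` and `http_requests`. It prints the report and also returns it, so it can be reused outside notebooks.

```python
from typing import Any, Dict


def crawl_metrics_summary(metrics: Dict[str, Any]) -> str:
    """Render a crawl-metrics dict as a short human-readable report.

    Assumes the hypothetical keys 'visits', 'successful_visits',
    'success_rate' and 'http_requests'; any other keys are listed as-is.
    """
    headline_keys = ("visits", "successful_visits", "success_rate", "http_requests")
    lines = [
        f"Site visits:        {metrics['visits']}",
        f"Successful visits:  {metrics['successful_visits']} "
        f"({metrics['success_rate']:.1%})",
        f"HTTP requests:      {metrics['http_requests']}",
    ]
    # Append any remaining metrics so nothing computed is silently hidden.
    for key in sorted(k for k in metrics if k not in headline_keys):
        lines.append(f"{key}: {metrics[key]}")
    report = "\n".join(lines)
    print(report)
    return report
```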

Use cases:

  • Include at the top of every crawl-analysis notebook to understand the nature of the gathered crawl dataset
  • To easily set up notebooks that analyze crawl datasets longitudinally and/or compare individual crawl datasets
  • Include in OpenWPM CI to spot regressions in crawl performance/health (related: https://github.com/mozilla/OpenWPM/issues/479)
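
For the CI use case, one option is to compare a new crawl's metrics against a stored baseline and flag any metric that dropped by more than a relative tolerance. The helper name, the 5% default tolerance, and the "higher is better" assumption are all hypothetical choices for illustration, not part of OpenWPM.

```python
from typing import Dict, List


def find_regressions(baseline: Dict[str, float],
                     current: Dict[str, float],
                     tolerance: float = 0.05) -> List[str]:
    """Return names of metrics that dropped by more than `tolerance`
    (relative to the baseline value) in the current crawl.

    Hypothetical helper: only pass metrics where higher is better.
    """
    regressions = []
    for name, base_value in baseline.items():
        if name not in current:
            # A metric disappearing from the output is treated as a regression.
            regressions.append(name)
            continue
        if base_value > 0 and (base_value - current[name]) / base_value > tolerance:
            regressions.append(name)
    return regressions
```

In a CI job, a non-empty result could be turned into a failing exit status (e.g. via `sys.exit(1)`), so a drop in crawl health is caught before release.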