Fail scrapper when there are too many errors while retrieving xblocks #178

Merged (3 commits) on Jul 14, 2023

benoit74 (Collaborator)

Rationale

Fix #160

Changes

  • scraper will fail when there are too many errors while retrieving xblocks

Implementation details

  • there is now a global watcher for all xblocks, with

    • total_count: the total number of xblocks to retrieve
    • dl_count: the total number of xblock download attempts so far
    • success_count: the total number of successful xblock downloads so far
    • failed_xblocks: details about failed xblocks
  • there are two parameters to control when the scraper stops:

    • watcher_min_ratio: the minimum ratio of successful downloads (compared to the number of download attempts)
    • watcher_min_dl_count: the minimum number of xblock downloads that must have been attempted before the scraper can be stopped (otherwise the ratio could be misleadingly low simply because of an unlucky xblock download order)
  • these two parameters can be set at the CLI level
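
To illustrate how the counters and the two thresholds described above could fit together, here is a minimal sketch. It is not the code merged in this PR: the class name `XBlockWatcher`, the `record` and `should_stop` methods, and the default values are assumptions made for illustration. Only the field and parameter names come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class XBlockWatcher:
    """Hypothetical global watcher; field names follow the PR description."""

    total_count: int = 0       # total number of xblocks to retrieve
    dl_count: int = 0          # download attempts so far
    success_count: int = 0     # successful downloads so far
    failed_xblocks: list = field(default_factory=list)  # details about failures
    watcher_min_ratio: float = 0.9    # assumed default, not taken from the PR
    watcher_min_dl_count: int = 50    # assumed default, not taken from the PR

    def record(self, xblock_id, error=None):
        """Register one download attempt and its outcome."""
        self.dl_count += 1
        if error is None:
            self.success_count += 1
        else:
            self.failed_xblocks.append((xblock_id, error))

    def should_stop(self):
        """Return True once enough attempts were made and too few succeeded."""
        if self.dl_count < self.watcher_min_dl_count:
            # Too few attempts: the ratio is not yet meaningful.
            return False
        return self.success_count / self.dl_count < self.watcher_min_ratio


watcher = XBlockWatcher(total_count=10, watcher_min_ratio=0.5, watcher_min_dl_count=4)
for xblock_id in ("a", "b", "c", "d"):
    watcher.record(xblock_id, error="timeout")
    if watcher.should_stop():
        print(f"stopping after {watcher.dl_count} attempts")
```

In this sketch the ratio is only evaluated once `watcher_min_dl_count` attempts have been made, so a few early failures do not stop the scraper on their own.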

@rgaudin (Member) left a comment

Thank you; please see inline comments.

openedx2zim/scraper.py (outdated, resolved)
openedx2zim/xblocks_extractor/libcast.py (outdated, resolved)
openedx2zim/xblocks_extractor/base_xblock.py (outdated, resolved)
openedx2zim/xblocks_extractor/base_xblock.py (resolved)
openedx2zim/entrypoint.py (outdated, resolved)
@benoit74 benoit74 requested a review from rgaudin July 13, 2023 05:32
@rgaudin (Member) left a comment

LGTM 👍

@benoit74 benoit74 merged commit 53db630 into main Jul 14, 2023
1 check passed
@benoit74 benoit74 deleted the fail_on_too_many_errors branch July 14, 2023 13:07
Successfully merging this pull request may close these issues.

Fail on missing resources