Fail on missing resources #160
Comments
@rgaudin should we fail on all types of resources (which might be a bit tricky to implement, as we have some invalid URLs too), or should we fail only on major resources like videos?
Can you describe the failures we currently get? Which URLs, and for what reasons?
Yep. Some of them happen during the subtitle download for some videos, which fails with a 404 because of invalid links in the HTML (these can be quite random). We currently check whether each download was successful and rewrite the links only for successful downloads. One solution would be to handle this explicitly for the different xblocks and asset types. A better solution would be to fail when we get errors and have exhausted all retry attempts. But then we need to make sure the URL actually exists and is not some random invalid URL, otherwise a single bad link would fail the whole scraper. Moreover, for some links, the content might no longer be available. An example is video 8 on https://mooc.phzh.ch/courses/course-v1:PHZH+W-IB+2019_E/9a122b295d484793bbf1a33ab0217a69/ , which has been removed from YouTube, so youtube_dl throws an error.
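To make the "fail once retries are exhausted" idea concrete, here is a minimal sketch. It is not the scraper's actual code: `download_with_retries`, `ResourceDownloadError`, and the `fetch` callable are hypothetical names used only for illustration, and the retry count and delay are arbitrary defaults.

```python
import time
from typing import Callable, Optional


class ResourceDownloadError(Exception):
    """Hypothetical error: a resource could not be fetched after all retries."""


def download_with_retries(
    fetch: Callable[[str], bytes],
    url: str,
    attempts: int = 3,
    delay: float = 0.0,
) -> bytes:
    """Call fetch(url) up to `attempts` times, then raise instead of skipping."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:  # broad on purpose: this is only a sketch
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)  # wait before the next attempt
    # All attempts failed: surface the failure to the caller
    raise ResourceDownloadError(
        f"{url} failed after {attempts} attempts"
    ) from last_error
```

The caller can then decide whether a `ResourceDownloadError` aborts the run or is only counted, which is where the invalid-URL and removed-video cases above would need special handling.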
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
Would it make sense to allow some failed resources, like I did on iFixit? I mean we should probably not fail on the first missing resource, but an absolute and/or a relative threshold might make sense: e.g. if more than 10% of resources are missing, we have a significant bug that should fail the scraper run. Does that make sense?
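A rough sketch of what such a threshold check could look like. The `ResourceStats` class, its method names, and the limit parameters are assumptions made for illustration; they come neither from this scraper nor from the iFixit one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceStats:
    """Hypothetical counter for attempted and failed resource downloads."""

    attempted: int = 0
    failed: int = 0

    def record(self, success: bool) -> None:
        self.attempted += 1
        if not success:
            self.failed += 1

    def exceeds(
        self,
        max_failed: Optional[int] = None,
        max_ratio: Optional[float] = None,
    ) -> bool:
        """Return True if either the absolute or the relative limit is passed."""
        if max_failed is not None and self.failed > max_failed:
            return True
        if max_ratio is not None and self.attempted > 0:
            return self.failed / self.attempted > max_ratio
        return False
```

At the end of a run, the scraper could exit with a non-zero status when something like `stats.exceeds(max_ratio=0.10)` is true, and otherwise only log the missing resources.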
As seen in #159, there are cases where we failed to download resources, yet the scraper run still succeeded. We should fail on missing resources.